 * For more information about the collation service see
 * the User Guide.
 *
 * Collation service provides correct sorting orders for most locales supported in ICU.
 * If specific data for a locale is not available, the order eventually falls back
 * to the CLDR root sort order.
 *
 * Sort ordering may be customized by providing your own set of rules. For more on
 * this subject see the Collation Customization section of the User Guide.
 *
 * Note, RuleBasedCollator is not to be subclassed.
 * @see Collator
 */
class U_I18N_API RuleBasedCollator final : public Collator {
public:
    /**
     * RuleBasedCollator constructor. This takes the table rules and builds a
     * collation table out of them. Please see RuleBasedCollator class
     * description for more details on the collation rule syntax.
     * @param rules the collation rules to build the collation table from.
     * @param status reporting a success or an error.
     * @stable ICU 2.0
     */
    RuleBasedCollator(const UnicodeString& rules, UErrorCode& status);

    /**
     * RuleBasedCollator constructor. This takes the table rules and builds a
     * collation table out of them. Please see RuleBasedCollator class
     * description for more details on the collation rule syntax.
     * @param rules the collation rules to build the collation table from.
     * @param collationStrength strength for comparison
     * @param status reporting a success or an error.
     * @stable ICU 2.0
     */
    RuleBasedCollator(const UnicodeString& rules,
                      ECollationStrength collationStrength,
                      UErrorCode& status);

    /**
     * RuleBasedCollator constructor. This takes the table rules and builds a
     * collation table out of them. Please see RuleBasedCollator class
     * description for more details on the collation rule syntax.
     * @param rules the collation rules to build the collation table from.
     * @param decompositionMode the normalisation mode
     * @param status reporting a success or an error.
     * @stable ICU 2.0
     */
    RuleBasedCollator(const UnicodeString& rules,
                      UColAttributeValue decompositionMode,
                      UErrorCode& status);

    /**
     * RuleBasedCollator constructor. This takes the table rules and builds a
     * collation table out of them. Please see RuleBasedCollator class
     * description for more details on the collation rule syntax.
     * @param rules the collation rules to build the collation table from.
     * @param collationStrength strength for comparison
     * @param decompositionMode the normalisation mode
     * @param status reporting a success or an error.
     * @stable ICU 2.0
     */
    RuleBasedCollator(const UnicodeString& rules,
                      ECollationStrength collationStrength,
                      UColAttributeValue decompositionMode,
                      UErrorCode& status);

#ifndef U_HIDE_INTERNAL_API
    /**
     * TODO: document & propose as public API
     * @internal
     */
    RuleBasedCollator(const UnicodeString &rules,
                      UParseError &parseError, UnicodeString &reason,
                      UErrorCode &errorCode);
#endif  /* U_HIDE_INTERNAL_API */

    /**
     * Copy constructor.
     * @param other the RuleBasedCollator object to be copied
     * @stable ICU 2.0
     */
    RuleBasedCollator(const RuleBasedCollator& other);


    /** Opens a collator from a collator binary image created using
     *  cloneBinary. Binary image used in instantiation of the
     *  collator remains owned by the user and should stay around for
     *  the lifetime of the collator. The API also takes a base collator
     *  which must be the root collator.
     *  @param bin binary image owned by the user and required through the
     *             lifetime of the collator
     *  @param length size of the image. If negative, the API will try to
     *                figure out the length of the image
     *  @param base Base collator, for lookup of untailored characters.
     *              Must be the root collator, must not be nullptr.
     *              The base is required to be present through the lifetime of the collator.
     *  @param status for catching errors
     *  @return newly created collator
     *  @see cloneBinary
     *  @stable ICU 3.4
     */
    RuleBasedCollator(const uint8_t *bin, int32_t length,
                      const RuleBasedCollator *base,
                      UErrorCode &status);

    /**
     * Destructor.
     * @stable ICU 2.0
     */
    virtual ~RuleBasedCollator();

    /**
     * Assignment operator.
     * @param other other RuleBasedCollator object to copy from.
     * @stable ICU 2.0
     */
    RuleBasedCollator& operator=(const RuleBasedCollator& other);

    /**
     * Returns true if argument is the same as this object.
     * @param other Collator object to be compared.
     * @return true if the argument is the same as this object.
     * @stable ICU 2.0
     */
    virtual bool operator==(const Collator& other) const override;

    /**
     * Makes a copy of this object.
     * @return a copy of this object, owned by the caller
     * @stable ICU 2.0
     */
    virtual RuleBasedCollator* clone() const override;

    /**
     * Creates a collation element iterator for the source string. The caller of
     * this method is responsible for the memory management of the returned
     * pointer.
     * @param source the string over which the CollationElementIterator will
     *        iterate.
     * @return the collation element iterator of the source string using this as
     *         the base Collator.
     * @stable ICU 2.2
     */
    virtual CollationElementIterator* createCollationElementIterator(
                                           const UnicodeString& source) const;

    /**
     * Creates a collation element iterator for the source. The caller of this
     * method is responsible for the memory management of the returned pointer.
     * @param source the CharacterIterator which produces the characters over
     *        which the CollationElementIterator will iterate.
     * @return the collation element iterator of the source using this as the
     *         base Collator.
     * @stable ICU 2.2
     */
    virtual CollationElementIterator* createCollationElementIterator(
                                         const CharacterIterator& source) const;

    // Make deprecated versions of Collator::compare() visible.
    using Collator::compare;

    /**
    * The comparison function compares the character data stored in two
    * different strings. Returns information about whether a string is less
    * than, greater than or equal to another string.
    * @param source the source string to be compared with.
    * @param target the string that is to be compared with the source string.
    * @param status possible error code
    * @return Returns an enum value. UCOL_GREATER if source is greater
    * than target; UCOL_EQUAL if source is equal to target; UCOL_LESS if source is less
    * than target
    * @stable ICU 2.6
    **/
    virtual UCollationResult compare(const UnicodeString& source,
                                     const UnicodeString& target,
                                     UErrorCode &status) const override;

    /**
    * Does the same thing as compare but limits the comparison to a specified
    * length
    * @param source the source string to be compared with.
    * @param target the string that is to be compared with the source string.
    * @param length the length the comparison is limited to
    * @param status possible error code
    * @return Returns an enum value. UCOL_GREATER if source (up to the specified
    *         length) is greater than target; UCOL_EQUAL if source (up to specified
    *         length) is equal to target; UCOL_LESS if source (up to the specified
    *         length) is less than target.
    * @stable ICU 2.6
    */
    virtual UCollationResult compare(const UnicodeString& source,
                                     const UnicodeString& target,
                                     int32_t length,
                                     UErrorCode &status) const override;

    /**
    * The comparison function compares the character data stored in two
    * different string arrays. Returns information about whether a string array
    * is less than, greater than or equal to another string array.
    * @param source the source string array to be compared with.
    * @param sourceLength the length of the source string array. If this value
    *        is equal to -1, the string array is null-terminated.
    * @param target the string that is to be compared with the source string.
    * @param targetLength the length of the target string array. If this value
    *        is equal to -1, the string array is null-terminated.
    * @param status possible error code
    * @return Returns an enum value. UCOL_GREATER if source is greater
    * than target; UCOL_EQUAL if source is equal to target; UCOL_LESS if source is less
    * than target
    * @stable ICU 2.6
    */
    virtual UCollationResult compare(const char16_t* source, int32_t sourceLength,
                                     const char16_t* target, int32_t targetLength,
                                     UErrorCode &status) const override;

    /**
     * Compares two strings using the Collator.
     * Returns whether the first one compares less than/equal to/greater than
     * the second one.
     * This version takes UCharIterator input.
     * @param sIter the first ("source") string iterator
     * @param tIter the second ("target") string iterator
     * @param status ICU status
     * @return UCOL_LESS, UCOL_EQUAL or UCOL_GREATER
     * @stable ICU 4.2
     */
    virtual UCollationResult compare(UCharIterator &sIter,
                                     UCharIterator &tIter,
                                     UErrorCode &status) const override;

    /**
     * Compares two UTF-8 strings using the Collator.
     * Returns whether the first one compares less than/equal to/greater than
     * the second one.
     * This version takes UTF-8 input.
     * Note that a StringPiece can be implicitly constructed
     * from a std::string or a NUL-terminated const char * string.
     * @param source the first UTF-8 string
     * @param target the second UTF-8 string
     * @param status ICU status
     * @return UCOL_LESS, UCOL_EQUAL or UCOL_GREATER
     * @stable ICU 51
     */
    virtual UCollationResult compareUTF8(const StringPiece &source,
                                         const StringPiece &target,
                                         UErrorCode &status) const override;

    /**
     * Transforms the string into a series of characters
     * that can be compared with CollationKey.compare().
     *
     * Note that sort keys are often less efficient than simply doing comparison.
     * For more details, see the ICU User Guide.
     *
     * @param source the source string.
     * @param key the transformed key of the source string.
     * @param status the error code status.
     * @return the transformed key.
     * @see CollationKey
     * @stable ICU 2.0
     */
    virtual CollationKey& getCollationKey(const UnicodeString& source,
                                          CollationKey& key,
                                          UErrorCode& status) const override;

    /**
     * Transforms a specified region of the string into a series of characters
     * that can be compared with CollationKey.compare.
     *
     * Note that sort keys are often less efficient than simply doing comparison.
     * For more details, see the ICU User Guide.
     *
     * @param source the source string.
     * @param sourceLength the length of the source string.
     * @param key the transformed key of the source string.
     * @param status the error code status.
     * @return the transformed key.
     * @see CollationKey
     * @stable ICU 2.0
     */
    virtual CollationKey& getCollationKey(const char16_t *source,
                                          int32_t sourceLength,
                                          CollationKey& key,
                                          UErrorCode& status) const override;

    /**
     * Generates the hash code for the rule-based collation object.
     * @return the hash code.
     * @stable ICU 2.0
     */
    virtual int32_t hashCode() const override;

#ifndef U_FORCE_HIDE_DEPRECATED_API
    /**
    * Gets the locale of the Collator
    * @param type can be either requested, valid or actual locale. For more
    *             information see the definition of ULocDataLocaleType in
    *             uloc.h
    * @param status the error code status.
    * @return locale where the collation data lives. If the collator
    *         was instantiated from rules, locale is empty.
    * @deprecated ICU 2.8 likely to change in ICU 3.0, based on feedback
    */
    virtual Locale getLocale(ULocDataLocaleType type, UErrorCode& status) const override;
#endif  // U_FORCE_HIDE_DEPRECATED_API

    /**
     * Gets the tailoring rules for this collator.
     * @return the collation tailoring from which this collator was created
     * @stable ICU 2.0
     */
    const UnicodeString& getRules() const;

    /**
     * Gets the version information for a Collator.
     * @param info the version # information, the result will be filled in
     * @stable ICU 2.0
     */
    virtual void getVersion(UVersionInfo info) const override;

#ifndef U_HIDE_DEPRECATED_API
    /**
     * Returns the maximum length of any expansion sequences that end with the
     * specified comparison order.
     *
     * This is specific to the kind of collation element values and sequences
     * returned by the CollationElementIterator.
     * Call CollationElementIterator::getMaxExpansion() instead.
     *
     * @param order a collation order returned by CollationElementIterator::previous
     *              or CollationElementIterator::next.
     * @return maximum size of the expansion sequences ending with the collation
     *         element, or 1 if the collation element does not occur at the end of
     *         any expansion sequence
     * @see CollationElementIterator#getMaxExpansion
     * @deprecated ICU 51 Use CollationElementIterator::getMaxExpansion() instead.
     */
    int32_t getMaxExpansion(int32_t order) const;
#endif  /* U_HIDE_DEPRECATED_API */

    /**
     * Returns a unique class ID POLYMORPHICALLY. Pure virtual override. This
     * method is to implement a simple version of RTTI, since not all C++
     * compilers support genuine RTTI. Polymorphic operator==() and clone()
     * methods call this method.
     * @return The class ID for this object. All objects of a given class have
     * the same class ID. Objects of other classes have different class
     * IDs.
     * @stable ICU 2.0
     */
    virtual UClassID getDynamicClassID(void) const override;

    /**
     * Returns the class ID for this class. This is useful only for comparing to
     * a return value from getDynamicClassID(). For example:
     *
     *      Base* polymorphic_pointer = createPolymorphicObject();
     *      if (polymorphic_pointer->getDynamicClassID() ==
     *          Derived::getStaticClassID()) ...
     *