Collator
 * Collator is an abstract base class. Subclasses implement
 * specific collation strategies. One subclass,
 * RuleBasedCollator, is currently provided and is applicable
 * to a wide set of languages. Other subclasses may be created to handle more
 * specialized needs.
 *
 * Like other locale-sensitive classes, you can use the static factory method,
 * createInstance, to obtain the appropriate
 * Collator object for a given locale. You will only need to
 * look at the subclasses of Collator if you need to
 * understand the details of a particular collation strategy or if you need to
 * modify that strategy.
 *
 * The following example shows how to compare two strings using the
 * Collator for the default locale.
 *
 * \code
 * // Compare two strings in the default locale
 * UErrorCode success = U_ZERO_ERROR;
 * Collator* myCollator = Collator::createInstance(success);
 * if (U_FAILURE(success)) return;  // no collator was created
 * if (myCollator->compare("abc", "ABC") < 0)
 *   cout << "abc is less than ABC" << endl;
 * else
 *   cout << "abc is greater than or equal to ABC" << endl;
 * delete myCollator;  // the caller owns the returned collator
 * \endcode
 *
 * You can set a Collator's strength attribute to
 * determine the level of difference considered significant in comparisons.
 * Five strengths are provided: PRIMARY, SECONDARY,
 * TERTIARY, QUATERNARY and IDENTICAL.
 * The exact assignment of strengths to language features is locale dependent.
 * For example, in Czech, "e" and "f" are considered primary differences,
 * while "e" and "\u00EA" are secondary differences, "e" and "E" are tertiary
 * differences and "e" and "e" are identical. The following shows how both case
 * and accents could be ignored for US English.
 *
 * \code
 * // Get the Collator for US English and set its strength to PRIMARY
 * UErrorCode success = U_ZERO_ERROR;
 * Collator* usCollator = Collator::createInstance(Locale::getUS(), success);
 * if (U_FAILURE(success)) return;  // no collator was created
 * usCollator->setStrength(Collator::PRIMARY);
 * if (usCollator->compare("abc", "ABC") == 0)
 *   cout << "'abc' and 'ABC' strings are equivalent with strength PRIMARY" << endl;
 * delete usCollator;  // the caller owns the returned collator
 * \endcode
 *
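To make the strength levels concrete, here is a minimal standalone sketch of multi-level comparison. It does not use ICU: the `Strength` enum, the `Weights` table, `weightsOf`, `weightAt` and `compareAt` are hypothetical names invented for this illustration, and the toy weights cover only ASCII letters plus U+00EA. Real collators take their weights from locale data and support more levels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical levels for this sketch only; not ICU's ECollationStrength.
enum class Strength { Primary = 1, Secondary = 2, Tertiary = 3 };

// Toy weights per character: base letter, accent, case.
struct Weights {
    char32_t primary;
    int secondary;
    int tertiary;
};

// Assumed toy table: ASCII letters plus U+00EA (e with circumflex).
Weights weightsOf(char32_t c) {
    if (c == U'\u00EA') return {U'e', 1, 0};  // accent: secondary difference
    if (c >= U'A' && c <= U'Z') {
        // Upper case shares the base letter and differs only at the tertiary level.
        return {static_cast<char32_t>(c - U'A' + U'a'), 0, 1};
    }
    return {c, 0, 0};
}

// Picks the weight for one level (1 = primary, 2 = secondary, 3 = tertiary).
long weightAt(const Weights& w, int level) {
    assert(level >= 1 && level <= 3);
    if (level == 1) return static_cast<long>(w.primary);
    return level == 2 ? w.secondary : w.tertiary;
}

// Returns -1, 0 or 1, ignoring every level above `strength`.
int compareAt(const std::u32string& a, const std::u32string& b, Strength strength) {
    const std::size_t n = std::min(a.size(), b.size());
    // A whole level is compared across the string before the next level is consulted.
    for (int level = 1; level <= static_cast<int>(strength); ++level) {
        for (std::size_t i = 0; i < n; ++i) {
            const long wa = weightAt(weightsOf(a[i]), level);
            const long wb = weightAt(weightsOf(b[i]), level);
            if (wa != wb) return wa < wb ? -1 : 1;
        }
        // With equal prefixes, the shorter string sorts first.
        if (a.size() != b.size()) return a.size() < b.size() ? -1 : 1;
    }
    return 0;
}
```

With this toy table, `compareAt(U"abc", U"ABC", Strength::Primary)` returns 0, while the same call with `Strength::Tertiary` returns -1, because case only shows up at the tertiary level.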
getSortKey
strcmp()
CollationKey
 * Note: Collators with different Locale
 * and CollationStrength settings will return different sort
 * orders for the same set of strings. Locales have specific collation rules,
 * and the way in which secondary and tertiary differences are taken into
 * account, for example, will result in a different sorting order for the same
 * strings.
 *
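The effect described in this note can be reproduced with toy sort keys. The sketch below is standalone and does not call ICU: `Strength`, `toySortKey` and `sortByKey` are hypothetical names, the key layout (lower-cased letters, a 0x01 separator, then one case byte per character) is an assumption made for the illustration, and only ASCII input is handled. It shows two things: keys are compared as plain bytes, and keys built at different strengths order the same strings differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

enum class Strength { Primary, Tertiary };  // hypothetical; not ICU's enum

// Builds a toy sort key: primary bytes first, then (optionally) case bytes.
std::string toySortKey(const std::string& s, Strength strength) {
    std::string key;
    for (unsigned char c : s) {
        assert(c < 0x80);  // this sketch handles ASCII only
        key += static_cast<char>(std::tolower(c));  // primary level: case removed
    }
    if (strength == Strength::Tertiary) {
        key += '\x01';  // level separator, lower than any weight byte
        for (unsigned char c : s) {
            key += (std::isupper(c) ? '\x03' : '\x02');  // lower case sorts first
        }
    }
    return key;
}

// Computes each key once, then sorts on the keys with byte-wise comparison.
std::vector<std::string> sortByKey(const std::vector<std::string>& words,
                                   Strength strength) {
    typedef std::pair<std::string, std::string> Keyed;  // (key, original)
    std::vector<Keyed> keyed;
    for (const std::string& w : words) {
        keyed.push_back(Keyed(toySortKey(w, strength), w));
    }
    // stable_sort keeps the input order of strings whose keys are equal.
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const Keyed& x, const Keyed& y) { return x.first < y.first; });
    std::vector<std::string> out;
    for (const Keyed& kw : keyed) out.push_back(kw.second);
    return out;
}
```

For the input `{"ABC", "abd", "abc"}`, primary-strength keys leave "ABC" and "abc" tied, so they stay in input order, while tertiary-strength keys place "abc" before "ABC". The primary keys of "abc" and "ABC" are identical, which also illustrates why the original string cannot be recovered from a key.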
     * Example of use:
     * \code
     * char16_t ABC[] = {0x41, 0x42, 0x43, 0};  // = "ABC"
     * char16_t abc[] = {0x61, 0x62, 0x63, 0};  // = "abc"
     * UErrorCode status = U_ZERO_ERROR;
     * Collator *myCollation =
     *     Collator::createInstance(Locale::getUS(), status);
     * if (U_FAILURE(status)) return;
     * myCollation->setStrength(Collator::PRIMARY);
     * // result would be Collator::EQUAL ("abc" == "ABC")
     * // (no primary difference between "abc" and "ABC")
     * Collator::EComparisonResult result =
     *     myCollation->compare(abc, 3, ABC, 3);
     * myCollation->setStrength(Collator::TERTIARY);
     * // result would be Collator::LESS ("abc" <<< "ABC")
     * // (with tertiary difference between "abc" and "ABC")
     * result = myCollation->compare(abc, 3, ABC, 3);
     * \endcode
     *
     * Use CollationKey::equals or CollationKey::compare to compare the
     * generated sort keys.
     * If the source string is null, a null collation key will be returned.
     *
     * Note that sort keys are often less efficient than simply doing comparison.
     * For more details, see the ICU User Guide.
     *
     * @param source the source string to be transformed into a sort key.
     * @param key the collation key to be filled in
     * @param status the error code status.
     * @return the collation key of the string based on the collation rules.
     * @see CollationKey#compare
     * @stable ICU 2.0
     */
    virtual CollationKey& getCollationKey(const UnicodeString& source,
                                          CollationKey& key,
                                          UErrorCode& status) const = 0;

    /**
     * Transforms the string into a series of characters that can be compared
     * with CollationKey::compareTo. It is not possible to restore the original
     * string from the chars in the sort key.
     *
     * Use CollationKey::equals or CollationKey::compare to compare the
     * generated sort keys.
     *
     * If the source string is null, a null collation key will be returned.
     *
     * Note that sort keys are often less efficient than simply doing comparison.
     * For more details, see the ICU User Guide.
     *
     * @param source the source string to be transformed into a sort key.
     * @param sourceLength length of the source string
     * @param key the collation key to be filled in
     * @param status the error code status.
     * @return the collation key of the string based on the collation rules.
     * @see CollationKey#compare
     * @stable ICU 2.0
     */
    virtual CollationKey& getCollationKey(const char16_t* source,
                                          int32_t sourceLength,
                                          CollationKey& key,
                                          UErrorCode& status) const = 0;
    /**
     * Generates the hash code for the collation object
     * @stable ICU 2.0
     */
    virtual int32_t hashCode(void) const = 0;

#ifndef U_FORCE_HIDE_DEPRECATED_API
    /**
     * Gets the locale of the Collator
     *
     * @param type can be either requested, valid or actual locale. For more
     *             information see the definition of ULocDataLocaleType in
     *             uloc.h
     * @param status the error code status.
     * @return locale where the collation data lives. If the collator
     *         was instantiated from rules, locale is empty.
     * @deprecated ICU 2.8 This API is under consideration for revision
     * in ICU 3.0.
     */
    virtual Locale getLocale(ULocDataLocaleType type, UErrorCode& status) const = 0;
#endif  // U_FORCE_HIDE_DEPRECATED_API

    /**
     * Convenience method for comparing two strings based on the collation rules.
     * @param source the source string to be compared with.
     * @param target the target string to be compared with.
     * @return true if the first string is greater than the second one,
     *         according to the collation rules. false, otherwise.
     * @see Collator#compare
     * @stable ICU 2.0
     */
    UBool greater(const UnicodeString& source, const UnicodeString& target)
                  const;

    /**
     * Convenience method for comparing two strings based on the collation rules.
     * @param source the source string to be compared with.
     * @param target the target string to be compared with.
     * @return true if the first string is greater than or equal to the second
     *         one, according to the collation rules. false, otherwise.
     * @see Collator#compare
     * @stable ICU 2.0
     */
    UBool greaterOrEqual(const UnicodeString& source,
                         const UnicodeString& target) const;

    /**
     * Convenience method for comparing two strings based on the collation rules.
     * @param source the source string to be compared with.
     * @param target the target string to be compared with.
     * @return true if the strings are equal according to the collation rules.
     *         false, otherwise.
     * @see Collator#compare
     * @stable ICU 2.0
     */
    UBool equals(const UnicodeString& source, const UnicodeString& target) const;

#ifndef U_FORCE_HIDE_DEPRECATED_API
    /**
     * Determines the minimum strength that will be used in comparison or
     * transformation.
     *
     * E.g. with strength == SECONDARY, the tertiary difference is ignored
     *
     * E.g. with strength == PRIMARY, the secondary and tertiary difference
     * are ignored.
     * @return the current comparison level.
     * @see Collator#setStrength
     * @deprecated ICU 2.6 Use getAttribute(UCOL_STRENGTH...) instead
     */
    virtual ECollationStrength getStrength(void) const;

    /**
     * Sets the minimum strength to be used in comparison or transformation.
     *
     * Example of use:
     *
     * \code
     *  UErrorCode status = U_ZERO_ERROR;
     *  Collator* myCollation = Collator::createInstance(Locale::getUS(), status);
     *  if (U_FAILURE(status)) return;
     *  myCollation->setStrength(Collator::PRIMARY);
     *  // result will be "abc" == "ABC"
     *  // tertiary differences will be ignored
     *  Collator::EComparisonResult result = myCollation->compare("abc", "ABC");
     * \endcode
     *
     * The reordering codes are a combination of script codes and reorder codes.
     * @param reorderCodes An array of script codes in the new order. This can be nullptr if the
     * length is also set to 0. An empty array will clear any reordering codes on the collator.
     * @param reorderCodesLength The length of reorderCodes.
     * @param status error code
     * @see ucol_setReorderCodes
     * @see Collator#getReorderCodes
     * @see Collator#getEquivalentReorderCodes
     * @see UScriptCode
     * @see UColReorderCode
     * @stable ICU 4.8
     */
    virtual void setReorderCodes(const int32_t* reorderCodes,
                                 int32_t reorderCodesLength,
                                 UErrorCode& status) ;

    /**
     * Retrieves the reorder codes that are grouped with the given reorder code. Some reorder
     * codes will be grouped and must reorder together.
     * Beginning with ICU 55, scripts only reorder together if they are primary-equal,
     * for example Hiragana and Katakana.
     *
     * @param reorderCode The reorder code to determine equivalence for.
     * @param dest The array to fill with the script equivalence reordering codes.
     * @param destCapacity The length of dest. If it is 0, then dest may be nullptr and the
     * function will only return the length of the result without writing any codes (pre-flighting).
     * @param status A reference to an error code value, which must not indicate
     * a failure before the function call.
     * @return The length of the reordering code equivalence array.
     * @see ucol_setReorderCodes
     * @see Collator#getReorderCodes
     * @see Collator#setReorderCodes
     * @see UScriptCode
     * @see UColReorderCode
     * @stable ICU 4.8
     */
    static int32_t U_EXPORT2 getEquivalentReorderCodes(int32_t reorderCode,
                                int32_t* dest,
                                int32_t destCapacity,
                                UErrorCode& status);

    /**
     * Get name of the object for the desired Locale, in the desired language
     * @param objectLocale must be from getAvailableLocales
     * @param displayLocale specifies the desired locale for output
     * @param name the fill-in parameter of the return value
     * @return display-able name of the object for the object locale in the
     *         desired language
     * @stable ICU 2.0
     */
    static UnicodeString& U_EXPORT2 getDisplayName(const Locale& objectLocale,
                                         const Locale& displayLocale,
                                         UnicodeString& name);

    /**
     * Get name of the object for the desired Locale, in the language of the
     * default locale.
     * @param objectLocale must be from getAvailableLocales
     * @param name the fill-in parameter of the return value
     * @return name of the object for the desired locale in the default language
     * @stable ICU 2.0
     */
    static UnicodeString& U_EXPORT2 getDisplayName(const Locale& objectLocale,
                                         UnicodeString& name);

    /**
     * Get the set of Locales for which Collations are installed.
     *
     * Note this does not include locales supported by registered collators.
     * If collators might have been registered, use the overload of getAvailableLocales
     * that returns a StringEnumeration.
 * If standard locale display names are sufficient, Collator instances can
 * be registered using registerInstance instead.
 *
 * Note: if the collators are to be used from C APIs, they must be instances
 * of RuleBasedCollator.
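The pre-flighting convention documented for getEquivalentReorderCodes (a destCapacity of 0 with a null dest returns only the required length) follows a common two-call pattern. The sketch below illustrates that pattern with a hypothetical stand-in, `toyGetCodes`, and a hypothetical `ToyStatus` enum; neither is an ICU function or type, the code values are arbitrary placeholders, and reporting a buffer-overflow status on the sizing call is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::int32_t;

// Hypothetical status type for this sketch; not ICU's UErrorCode.
enum ToyStatus { TOY_OK, TOY_BUFFER_OVERFLOW };

// Hypothetical stand-in that follows the documented calling convention.
int32_t toyGetCodes(int32_t* dest, int32_t destCapacity, ToyStatus& status) {
    static const int32_t kCodes[] = {20, 22};  // arbitrary placeholder values
    const int32_t length = 2;
    if (status != TOY_OK) return 0;  // status must not indicate a failure on entry
    assert(destCapacity >= 0 && (dest != nullptr || destCapacity == 0));
    if (destCapacity < length) {
        status = TOY_BUFFER_OVERFLOW;  // assumption: sizing call reports overflow
        return length;                 // nothing is written, only the length is returned
    }
    std::copy(kCodes, kCodes + length, dest);
    return length;
}

// Two-call pattern: ask for the length first, then fetch into a sized buffer.
std::vector<int32_t> fetchCodes() {
    ToyStatus status = TOY_OK;
    const int32_t needed = toyGetCodes(nullptr, 0, status);  // pre-flight
    status = TOY_OK;  // the overflow from the sizing call is expected, so reset it
    std::vector<int32_t> codes(static_cast<std::size_t>(needed));
    toyGetCodes(codes.data(), needed, status);
    if (status != TOY_OK) codes.clear();  // report failure as an empty result
    return codes;
}
```

The design choice here is to let the callee decide the size, so the caller never guesses a buffer length. The cost is a second call, which is usually acceptable for small result arrays.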