Characters can be accessed in two ways: as code units or as code points. Unicode code points are 21-bit integers and are the scalar values of Unicode characters. ICU uses the type UChar32 for them. Unicode code units are the storage units of a given Unicode/UCS Transformation Format (a character encoding scheme). With UTF-16, all code points can be represented with either one code unit or a pair of code units ("surrogates"). String storage is typically based on code units, while properties of characters are typically determined using code point values. Some processes may be designed to work with sequences of code units, or it may be known that all characters that are important to an algorithm can be represented with single code units. Other processes will need to use the code point access functions.
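To make the distinction concrete, the following sketch counts code units and code points for a UTF-16 string. It uses only the C++ standard library rather than ICU, and the helper names (isLead, isTrail, combineSurrogates, countCodePoints) are hypothetical, introduced here for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not part of ICU.

// True if u is a lead (high) surrogate, U+D800..U+DBFF.
inline bool isLead(char16_t u) { return (u & 0xFC00) == 0xD800; }

// True if u is a trail (low) surrogate, U+DC00..U+DFFF.
inline bool isTrail(char16_t u) { return (u & 0xFC00) == 0xDC00; }

// Combine a surrogate pair into the supplementary code point it encodes.
inline char32_t combineSurrogates(char16_t lead, char16_t trail) {
    assert(isLead(lead) && isTrail(trail));
    return 0x10000 + ((char32_t(lead) - 0xD800) << 10) + (char32_t(trail) - 0xDC00);
}

// Count code points: a well-formed surrogate pair counts once, and every
// other code unit (including an unpaired surrogate) also counts once.
inline std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        bool pair = isLead(s[i]) && i + 1 < s.size() && isTrail(s[i + 1]);
        i += pair ? 2 : 1;
        ++count;
    }
    return count;
}
```

For a string holding "a", U+1F600 and "b", s.size() reports four code units while countCodePoints(s) reports three code points, because the supplementary character occupies two code units.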
ForwardCharacterIterator provides nextPostInc() to access a code unit and advance an internal position into the text object, similar to the statement return text[position++]. It provides next32PostInc() to access a code point and advance an internal position.
next32PostInc() assumes that the current position is that of the beginning of a code point, i.e., of its first code unit. After next32PostInc(), this will be true again. In general, access to code units and code points in the same iteration loop should not be mixed. In UTF-16, if the current position is on a second code unit (Low Surrogate), then only that code unit is returned even by next32PostInc().
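The behavior described above can be modeled with a small standalone function. This is a sketch using only the standard library, not ICU's implementation: next32PostIncModel and its explicit pos parameter are hypothetical stand-ins for the iterator and its internal position, and returning an unpaired lead surrogate as-is is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of next32PostInc(); `pos` plays the role of the
// iterator's internal position. Not ICU code.
inline char32_t next32PostIncModel(const std::u16string& text, std::size_t& pos) {
    assert(pos < text.size());            // the caller checks for the end first
    char16_t unit = text[pos++];          // like nextPostInc(): read, then advance
    bool isLead = (unit & 0xFC00) == 0xD800;
    if (isLead && pos < text.size() && (text[pos] & 0xFC00) == 0xDC00) {
        char16_t trail = text[pos++];     // consume the second code unit as well
        return 0x10000 + ((char32_t(unit) - 0xD800) << 10) + (char32_t(trail) - 0xDC00);
    }
    // A BMP code unit, an unpaired lead surrogate (assumption: returned as-is),
    // or a trail surrogate reached by starting in the middle of a pair.
    return unit;
}
```

With the text "x", U+10400, "y", starting at position 0 yields the three code points in turn and leaves the position at 1, 3 and 4. Starting at position 2, on the low surrogate, yields only U+DC00, which illustrates why code unit and code point access should not be mixed in one loop.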
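For iteration with either function, there are two ways to check for the end of the iteration. When there are no more characters in the text object, hasNext() returns false, and nextPostInc() and next32PostInc() return DONE when one attempts to read beyond the end.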
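The two end-of-iteration checks, hasNext() returning false and the read function returning DONE, can be sketched with a minimal stand-in class. ModelIterator and the counting helpers are hypothetical and only mimic the two functions involved; they are not ICU classes. The value 0xFFFF for DONE is an assumption here, since the text above does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a forward iterator over code units; not ICU code.
class ModelIterator {
public:
    static constexpr char16_t DONE = 0xFFFF;  // assumed sentinel value
    explicit ModelIterator(std::u16string text) : text_(std::move(text)) {}
    bool hasNext() const { return pos_ < text_.size(); }
    // Return the current code unit and advance, or DONE past the end.
    char16_t nextPostInc() {
        if (!hasNext()) return DONE;
        return text_[pos_++];
    }
private:
    std::u16string text_;
    std::size_t pos_ = 0;
};

// Style 1: ask hasNext() before every read.
inline std::size_t countWithHasNext(ModelIterator it) {
    std::size_t n = 0;
    while (it.hasNext()) { it.nextPostInc(); ++n; }
    return n;
}

// Style 2: read until the DONE sentinel comes back.
// Caveat: text that itself contains U+FFFF would end this loop early.
inline std::size_t countUntilDone(ModelIterator it) {
    std::size_t n = 0;
    while (it.nextPostInc() != ModelIterator::DONE) ++n;
    return n;
}

// Both styles visit the same number of code units for text without U+FFFF.
inline void checkStylesAgree(const std::u16string& s) {
    assert(countWithHasNext(ModelIterator(s)) == countUntilDone(ModelIterator(s)));
}
```

The hasNext() style is the safer of the two in this model, because it does not depend on the sentinel value never appearing in the text.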
     * Despite the fact that this function is public,
     * DO NOT CONSIDER IT PART OF CHARACTERITERATOR'S API!
     * @return a UClassID for this ForwardCharacterIterator
     * @stable ICU 2.0
     */
    virtual UClassID getDynamicClassID(void) const override = 0;

    /**
     * Gets the current code unit for returning and advances to the next code unit
     * in the iteration range
     * (toward endIndex()). If there are
     * no more code units to return, returns DONE.
     * @return the current code unit.
     * @stable ICU 2.0
     */
    virtual char16_t nextPostInc(void) = 0;

    /**
     * Gets the current code point for returning and advances to the next code point
     * in the iteration range
     * (toward endIndex()). If there are
     * no more code points to return, returns DONE.
     * @return the current code point.
     * @stable ICU 2.0
     */
    virtual UChar32 next32PostInc(void) = 0;

    /**
     * Returns false if there are no more code units or code points
     * at or after the current position in the iteration range.
     * This is used with nextPostInc() or next32PostInc() in forward
     * iteration.
     * @returns false if there are no more code units or code points
     * at or after the current position in the iteration range.
     * @stable ICU 2.0
     */
    virtual UBool hasNext() = 0;

protected:
    /** Default constructor to be overridden in the implementing class. @stable ICU 2.0*/
    ForwardCharacterIterator();

    /** Copy constructor to be overridden in the implementing class. @stable ICU 2.0*/
    ForwardCharacterIterator(const ForwardCharacterIterator &other);

    /**
     * Assignment operator to be overridden in the implementing class.
     * @stable ICU 2.0
     */
    ForwardCharacterIterator &operator=(const ForwardCharacterIterator&) { return *this; }
};

/**
 * Abstract class that defines an API for iteration
 * on text objects.
 * This is an interface for forward and backward iteration
 * and random access into a text object.
The API provides backward compatibility to the Java and older ICU CharacterIterator classes but extends them significantly:
Examples for some of the new functions:
Examples, especially for the old API:
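\code
#include <iostream>
using namespace std;
// Prints one code unit. Streaming a char16_t directly would print a number
// (or fail to compile in C++20), so narrow it; meaningful for ASCII text only.
void processChar( char16_t c )
{
    cout << " " << static_cast<char>(c);
}
\endcode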
\code
// Visits every code unit from the first to the last.
void traverseForward(CharacterIterator& iter)
{
    for(char16_t c = iter.first(); c != CharacterIterator::DONE; c = iter.next()) {
        processChar(c);
    }
}
\endcode
\code
// Visits every code unit from the last to the first.
void traverseBackward(CharacterIterator& iter)
{
    for(char16_t c = iter.last(); c != CharacterIterator::DONE; c = iter.previous()) {
        processChar(c);
    }
}
\endcode
\code
// Expands from pos in both directions over letters and digits, then
// processes the range found. u_isalpha() and u_isdigit() are declared
// in unicode/uchar.h.
void traverseOut(CharacterIterator& iter, int32_t pos)
{
    char16_t c;
    for (c = iter.setIndex(pos);
         c != CharacterIterator::DONE && (u_isalpha(c) || u_isdigit(c));
         c = iter.next()) {}
    int32_t end = iter.getIndex();
    for (c = iter.setIndex(pos);
         c != CharacterIterator::DONE && (u_isalpha(c) || u_isdigit(c));
         c = iter.previous()) {}
    // If the scan ran off the start (DONE), the index is already the first
    // character of the run; otherwise it rests on the non-matching character.
    int32_t start = (c == CharacterIterator::DONE) ? iter.getIndex() : iter.getIndex() + 1;

    cout << "start: " << start << " end: " << end << endl;
    for (c = iter.setIndex(start); iter.getIndex() < end; c = iter.next() ) {
        processChar(c);
    }
}
\endcode
\code
// Runs the three traversals above over a short sample string.
// UnicodeString and StringCharacterIterator are declared in
// unicode/unistr.h and unicode/schriter.h (namespace icu).
void CharacterIterator_Example( void )
{
    cout << endl << "===== CharacterIterator_Example: =====" << endl;
    UnicodeString text("Ein kleiner Satz.");
    StringCharacterIterator iterator(text);
    cout << "----- traverseForward: -----------" << endl;
    traverseForward( iterator );
    cout << endl << endl << "----- traverseBackward: ----------" << endl;
    traverseBackward( iterator );
    cout << endl << endl << "----- traverseOut: ---------------" << endl;
    traverseOut( iterator, 7 );
    cout << endl << endl << "-----" << endl;
}
\endcode