ICU 68.2
uiter.h File Reference

C API: Unicode Character Iteration.

#include "unicode/utypes.h"


Data Structures

struct  UCharIterator
 C API for code unit iteration.
 

Namespaces

 icu
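 ICU C++ namespace (contains icu::CharacterIterator and icu::Replaceable, which are used by uiter_setCharacterIterator() and uiter_setReplaceable()).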
 

Macros

#define UITER_NO_STATE   ((uint32_t)0xffffffff)
 Constant for UCharIterator getState() indicating an error or an unknown state.
 

Typedefs

typedef struct UCharIterator UCharIterator
 C typedef for struct UCharIterator.
 
typedef enum UCharIteratorOrigin UCharIteratorOrigin
 Origin constants for UCharIterator.getIndex() and UCharIterator.move().
 
typedef int32_t UCharIteratorGetIndex(UCharIterator *iter, UCharIteratorOrigin origin)
 Function type declaration for UCharIterator.getIndex().
 
typedef int32_t UCharIteratorMove(UCharIterator *iter, int32_t delta, UCharIteratorOrigin origin)
 Function type declaration for UCharIterator.move().
 
typedef UBool UCharIteratorHasNext(UCharIterator *iter)
 Function type declaration for UCharIterator.hasNext().
 
typedef UBool UCharIteratorHasPrevious(UCharIterator *iter)
 Function type declaration for UCharIterator.hasPrevious().
 
typedef UChar32 UCharIteratorCurrent(UCharIterator *iter)
 Function type declaration for UCharIterator.current().
 
typedef UChar32 UCharIteratorNext(UCharIterator *iter)
 Function type declaration for UCharIterator.next().
 
typedef UChar32 UCharIteratorPrevious(UCharIterator *iter)
 Function type declaration for UCharIterator.previous().
 
typedef int32_t UCharIteratorReserved(UCharIterator *iter, int32_t something)
 Function type declaration for UCharIterator.reservedFn().
 
typedef uint32_t UCharIteratorGetState(const UCharIterator *iter)
 Function type declaration for UCharIterator.getState().
 
typedef void UCharIteratorSetState(UCharIterator *iter, uint32_t state, UErrorCode *pErrorCode)
 Function type declaration for UCharIterator.setState().
 

Enumerations

enum  UCharIteratorOrigin {
  UITER_START, UITER_CURRENT, UITER_LIMIT, UITER_ZERO,
  UITER_LENGTH
}
 Origin constants for UCharIterator.getIndex() and UCharIterator.move().
 
enum  {
  UMSGPAT_ARG_NAME_NOT_NUMBER =-1, UMSGPAT_ARG_NAME_NOT_VALID =-2, U_PARSE_CONTEXT_LEN = 16, UIDNA_DEFAULT =0,
  UIDNA_ALLOW_UNASSIGNED =1, UIDNA_USE_STD3_RULES =2, UIDNA_CHECK_BIDI =4, UIDNA_CHECK_CONTEXTJ =8,
  UIDNA_NONTRANSITIONAL_TO_ASCII =0x10, UIDNA_NONTRANSITIONAL_TO_UNICODE =0x20, UIDNA_CHECK_CONTEXTO =0x40, UITER_UNKNOWN_INDEX =-2,
  UNORM_UNICODE_3_2 =0x20, USET_IGNORE_SPACE = 1, USET_CASE_INSENSITIVE = 2, USET_ADD_CASE_MAPPINGS = 4,
  UTEXT_PROVIDER_LENGTH_IS_EXPENSIVE = 1, UTEXT_PROVIDER_STABLE_CHUNKS = 2, UTEXT_PROVIDER_WRITABLE = 3, UTEXT_PROVIDER_HAS_META_DATA = 4,
  UTEXT_PROVIDER_OWNS_TEXT = 5
}
 Constants for UCharIterator.
 

Functions

U_CAPI UChar32 uiter_current32 (UCharIterator *iter)
 Helper function for UCharIterator to get the code point at the current index.
 
U_CAPI UChar32 uiter_next32 (UCharIterator *iter)
 Helper function for UCharIterator to get the next code point.
 
U_CAPI UChar32 uiter_previous32 (UCharIterator *iter)
 Helper function for UCharIterator to get the previous code point.
 
U_CAPI uint32_t uiter_getState (const UCharIterator *iter)
 Get the "state" of the iterator in the form of a single 32-bit word.
 
U_CAPI void uiter_setState (UCharIterator *iter, uint32_t state, UErrorCode *pErrorCode)
 Restore the "state" of the iterator using a state word from a getState() call.
 
U_CAPI void uiter_setString (UCharIterator *iter, const UChar *s, int32_t length)
 Set up a UCharIterator to iterate over a string.
 
U_CAPI void uiter_setUTF16BE (UCharIterator *iter, const char *s, int32_t length)
 Set up a UCharIterator to iterate over a UTF-16BE string (byte vector with a big-endian pair of bytes per UChar).
 
U_CAPI void uiter_setUTF8 (UCharIterator *iter, const char *s, int32_t length)
 Set up a UCharIterator to iterate over a UTF-8 string.
 
U_CAPI void uiter_setCharacterIterator (UCharIterator *iter, icu::CharacterIterator *charIter)
 Set up a UCharIterator to wrap around a C++ CharacterIterator.
 
U_CAPI void uiter_setReplaceable (UCharIterator *iter, const icu::Replaceable *rep)
 Set up a UCharIterator to iterate over a C++ Replaceable.
 

Detailed Description

C API: Unicode Character Iteration.

See also
UCharIterator

Definition in file uiter.h.

Macro Definition Documentation

◆ UITER_NO_STATE

#define UITER_NO_STATE   ((uint32_t)0xffffffff)

Constant for UCharIterator getState() indicating an error or an unknown state.

Returned by uiter_getState()/UCharIteratorGetState when an error occurs. Also, some UCharIterator implementations may not be able to return a valid state for each position. This will be clearly documented for each such iterator (none of the public ones here).

Stable:
ICU 2.6

Definition at line 86 of file uiter.h.

Typedef Documentation

◆ UCharIterator

typedef struct UCharIterator UCharIterator

C typedef for struct UCharIterator.

Stable:
ICU 2.1

Definition at line 1 of file uiter.h.

◆ UCharIteratorCurrent

typedef UChar32 UCharIteratorCurrent(UCharIterator *iter)

Function type declaration for UCharIterator.current().

Return the code unit at the current position, or U_SENTINEL if there is none (index is at the limit).

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the current code unit
See also
UCharIterator
Stable:
ICU 2.1

Definition at line 188 of file uiter.h.

◆ UCharIteratorGetIndex

typedef int32_t UCharIteratorGetIndex(UCharIterator *iter, UCharIteratorOrigin origin)

Function type declaration for UCharIterator.getIndex().

Gets the current position, or the start or limit of the iteration range.

This function may perform slowly for UITER_CURRENT after setState() was called, or for UITER_LENGTH, because an iterator implementation may have to count UChars if the underlying storage is not UTF-16.

Parameters
iter: the UCharIterator structure ("this pointer")
origin: get the 0, start, limit, length, or current index
Returns
the requested index, or U_SENTINEL in an error condition
See also
UCharIteratorOrigin
UCharIterator
Stable:
ICU 2.1

Definition at line 107 of file uiter.h.

◆ UCharIteratorGetState

typedef uint32_t UCharIteratorGetState(const UCharIterator *iter)

Function type declaration for UCharIterator.getState().

Get the "state" of the iterator in the form of a single 32-bit word. It is recommended that the state value be calculated to be as small as is feasible. For strings with limited lengths, fewer than 32 bits may be sufficient.

This is used together with setState()/UCharIteratorSetState to save and restore the iterator position more efficiently than with getIndex()/move().

The iterator state is defined as a uint32_t value because it is designed for use in ucol_nextSortKeyPart() which provides 32 bits to store the state of the character iterator.

With some UCharIterator implementations (e.g., UTF-8), getting and setting the UTF-16 index with existing functions (getIndex(UITER_CURRENT) followed by move(pos, UITER_ZERO)) is possible but relatively slow because the iterator has to "walk" from a known index to the requested one. This takes more time the farther it needs to go.

An opaque state value allows an iterator implementation to provide an internal index (UTF-8: the source byte array index) for fast, constant-time restoration.

After calling setState(), a getIndex(UITER_CURRENT) may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.

Some UCharIterator implementations may not be able to return a valid state for each position, in which case they return UITER_NO_STATE instead. This will be clearly documented for each such iterator (none of the public ones here).

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the state word
See also
UCharIterator
UCharIteratorSetState
UITER_NO_STATE
Stable:
ICU 2.6

Definition at line 281 of file uiter.h.
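To illustrate the "walking" cost described above, the following standalone C sketch converts a UTF-8 byte offset into a UTF-16 index. It does not call ICU, and the function name is hypothetical. It assumes well-formed UTF-8 and an offset that falls on a code point boundary. Because the loop must visit every preceding byte, its cost grows with the distance from the start of the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of ICU: count the UTF-16 code units that
 * precede byteOffset in a well-formed UTF-8 string. The work is linear in
 * byteOffset, which is why recovering a UTF-16 index can be slow. */
int32_t utf16_index_of_utf8_offset(const char *s, int32_t byteOffset) {
    int32_t utf16Index = 0;
    assert(s != NULL && byteOffset >= 0);
    for (int32_t i = 0; i < byteOffset; ++i) {
        uint8_t b = (uint8_t)s[i];
        if ((b & 0xC0) == 0x80) {
            continue; /* trail byte: already counted with its lead byte */
        }
        /* A 4-byte sequence encodes a supplementary code point, which
         * needs a surrogate pair (2 UTF-16 units); all others need 1. */
        utf16Index += (b >= 0xF0) ? 2 : 1;
    }
    return utf16Index;
}
```

An opaque state word that records the byte offset directly avoids this scan, which is the trade-off the getState()/setState() pair is built around.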

◆ UCharIteratorHasNext

typedef UBool UCharIteratorHasNext(UCharIterator *iter)

Function type declaration for UCharIterator.hasNext().

Check if current() and next() can still return another code unit.

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
boolean value for whether current() and next() can still return another code unit
See also
UCharIterator
Stable:
ICU 2.1

Definition at line 159 of file uiter.h.

◆ UCharIteratorHasPrevious

typedef UBool UCharIteratorHasPrevious(UCharIterator *iter)

Function type declaration for UCharIterator.hasPrevious().

Check if previous() can still return another code unit.

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
boolean value for whether previous() can still return another code unit
See also
UCharIterator
Stable:
ICU 2.1

Definition at line 173 of file uiter.h.

◆ UCharIteratorMove

typedef int32_t UCharIteratorMove(UCharIterator *iter, int32_t delta, UCharIteratorOrigin origin)

Function type declaration for UCharIterator.move().

To set an absolute index, use iter->move(iter, index, UITER_ZERO), which corresponds to CharacterIterator::setIndex(index).

Moves the current position relative to index 0, the start or limit of the iteration range, the length, or the current position itself. The movement is expressed in numbers of code units, forward or backward, by specifying a positive or negative delta. Out-of-bounds movement is pinned to the start or limit.

This function may perform slowly for moving relative to UITER_LENGTH because an iterator implementation may have to count the rest of the UChars if the native storage is not UTF-16.

When moving relative to the limit or length, or relative to the current position after setState() was called, move() may return UITER_UNKNOWN_INDEX (-2) to avoid an inefficient determination of the actual UTF-16 index. The actual index can be determined with getIndex(UITER_CURRENT) which will count the UChars if necessary. See UITER_UNKNOWN_INDEX for details.

Parameters
iter: the UCharIterator structure ("this pointer")
delta: can be positive, zero, or negative
origin: move relative to the 0, start, limit, length, or current index
Returns
the new index, or U_SENTINEL on an error condition, or UITER_UNKNOWN_INDEX when the index is not known.
See also
UCharIteratorOrigin
UCharIterator
UITER_UNKNOWN_INDEX
Stable:
ICU 2.1

Definition at line 144 of file uiter.h.
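The origin arithmetic and pinning rules above can be modeled in plain C. This is a hypothetical, ICU-independent sketch of how an iterator over UTF-16 storage might compute the new index. The Origin enum and the IndexState struct are illustrative stand-ins for UCharIteratorOrigin and for the start, index, limit, and length fields. The sketch never returns UITER_UNKNOWN_INDEX because, as with uiter_setString(), a UTF-16 iterator always knows its final index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UCharIteratorOrigin constants
 * (UITER_START, UITER_CURRENT, UITER_LIMIT, UITER_ZERO, UITER_LENGTH). */
typedef enum { ORIGIN_START, ORIGIN_CURRENT, ORIGIN_LIMIT, ORIGIN_ZERO, ORIGIN_LENGTH } Origin;

/* Hypothetical model of the index fields of a UTF-16 string iterator. */
typedef struct {
    int32_t length, start, index, limit;
} IndexState;

/* Move by delta code units relative to origin, pinning the result to
 * [start, limit]. Returns the new index, or -1 (the value of U_SENTINEL)
 * for an unrecognized origin. Overflow of base + delta is not handled. */
int32_t model_move(IndexState *st, int32_t delta, Origin origin) {
    int32_t pos;
    assert(st != NULL);
    switch (origin) {
    case ORIGIN_ZERO:    pos = delta; break;
    case ORIGIN_START:   pos = st->start + delta; break;
    case ORIGIN_CURRENT: pos = st->index + delta; break;
    case ORIGIN_LIMIT:   pos = st->limit + delta; break;
    case ORIGIN_LENGTH:  pos = st->length + delta; break;
    default:             return -1;
    }
    if (pos < st->start) {
        pos = st->start;      /* pin to the start of the range */
    } else if (pos > st->limit) {
        pos = st->limit;      /* pin to the limit of the range */
    }
    st->index = pos;
    return pos;
}
```

For example, model_move(&st, 3, ORIGIN_ZERO) plays the role of iter->move(iter, 3, UITER_ZERO).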

◆ UCharIteratorNext

typedef UChar32 UCharIteratorNext(UCharIterator *iter)

Function type declaration for UCharIterator.next().

Return the code unit at the current index and increment the index (post-increment, like s[i++]), or return U_SENTINEL if there is none (index is at the limit).

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the current code unit (and post-increment the current index)
See also
UCharIterator
Stable:
ICU 2.1

Definition at line 204 of file uiter.h.

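◆ UCharIteratorOrigin

typedef enum UCharIteratorOrigin UCharIteratorOrigin

Origin constants for UCharIterator.getIndex() and UCharIterator.move().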

◆ UCharIteratorPrevious

typedef UChar32 UCharIteratorPrevious(UCharIterator *iter)

Function type declaration for UCharIterator.previous().

Decrement the index and return the code unit from there (pre-decrement, like s[--i]), or return U_SENTINEL if there is none (index is at the start).

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the previous code unit (after pre-decrementing the current index)
See also
UCharIterator
Stable:
ICU 2.1

Definition at line 220 of file uiter.h.
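The post-increment convention of next() and the pre-decrement convention of previous() can be sketched without ICU. The UnitIter type and its functions below are hypothetical and operate on a plain uint16_t array. MODEL_SENTINEL uses the value -1, which is how ICU defines U_SENTINEL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SENTINEL (-1) /* same value as ICU's U_SENTINEL */

/* Hypothetical model of a UTF-16 code unit iterator over s[start..limit). */
typedef struct {
    const uint16_t *s;
    int32_t start, index, limit;
} UnitIter;

/* Like s[i++]: return the unit at index, then advance. */
int32_t unit_next(UnitIter *it) {
    assert(it != NULL);
    if (it->index >= it->limit) {
        return MODEL_SENTINEL; /* at the limit: nothing to return */
    }
    return it->s[it->index++];
}

/* Like s[--i]: step back, then return the unit at the new index. */
int32_t unit_previous(UnitIter *it) {
    assert(it != NULL);
    if (it->index <= it->start) {
        return MODEL_SENTINEL; /* at the start: nothing to return */
    }
    return it->s[--it->index];
}
```

Calling unit_next() and then unit_previous() returns the same code unit twice and leaves the index where it started.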

◆ UCharIteratorReserved

typedef int32_t UCharIteratorReserved(UCharIterator *iter, int32_t something)

Function type declaration for UCharIterator.reservedFn().

Reserved for future use.

Parameters
iter: the UCharIterator structure ("this pointer")
something: some integer argument
Returns
some integer
See also
UCharIterator
Stable:
ICU 2.1

Definition at line 234 of file uiter.h.

◆ UCharIteratorSetState

typedef void UCharIteratorSetState(UCharIterator *iter, uint32_t state, UErrorCode *pErrorCode)

Function type declaration for UCharIterator.setState().

Restore the "state" of the iterator using a state word from a getState() call. The iterator object need not be the same one as for which getState() was called, but it must be of the same type (set up using the same uiter_setXYZ function) and it must iterate over the same string (binary identical regardless of memory address). For more about the state word see UCharIteratorGetState.

After calling setState(), a getIndex(UITER_CURRENT) may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.

Parameters
iter: the UCharIterator structure ("this pointer")
state: the state word from a getState() call on a same-type, same-string iterator
pErrorCode: Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
UCharIterator
UCharIteratorGetState
Stable:
ICU 2.6

Definition at line 309 of file uiter.h.

Enumeration Type Documentation

◆ anonymous enum

anonymous enum

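Constants for UCharIterator. In this generated listing, only UITER_UNKNOWN_INDEX belongs to uiter.h; the remaining enumerators come from anonymous enums in other ICU headers (message patterns, parse errors, IDNA, normalization, UnicodeSet, and UText) and were merged into this entry by the documentation generator.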

Stable:
ICU 2.6
Enumerator
UMSGPAT_ARG_NAME_NOT_NUMBER 

Return value from MessagePattern.validateArgumentName() for when the string is a valid "pattern identifier" but not a number.

Stable:
ICU 4.8
UMSGPAT_ARG_NAME_NOT_VALID 

Return value from MessagePattern.validateArgumentName() for when the string is invalid.

It might not be a valid "pattern identifier", or it might consist only of ASCII digits but have a leading zero or represent a number that is too large.

Stable:
ICU 4.8
UIDNA_DEFAULT 

Default options value: None of the other options are set.

For use in static worker and factory methods.

Stable:
ICU 2.6
UIDNA_ALLOW_UNASSIGNED 

Option to allow unassigned code points in domain names and labels.

For use in static worker and factory methods.

This option is ignored by the UTS46 implementation. (UTS #46 disallows unassigned code points.)

Deprecated:
ICU 55 Use UTS #46 instead via uidna_openUTS46() or class IDNA.
UIDNA_USE_STD3_RULES 

Option to check whether the input conforms to the STD3 ASCII rules, for example the restriction of labels to LDH characters (ASCII Letters, Digits and Hyphen-Minus).

For use in static worker and factory methods.

Stable:
ICU 2.6
UIDNA_CHECK_BIDI 

IDNA option to check whether the input conforms to the BiDi rules.

For use in static worker and factory methods.

This option is ignored by the IDNA2003 implementation. (IDNA2003 always performs a BiDi check.)

Stable:
ICU 4.6
UIDNA_CHECK_CONTEXTJ 

IDNA option to check whether the input conforms to the CONTEXTJ rules.

For use in static worker and factory methods.

This option is ignored by the IDNA2003 implementation. (The CONTEXTJ check is new in IDNA2008.)

Stable:
ICU 4.6
UIDNA_NONTRANSITIONAL_TO_ASCII 

IDNA option for nontransitional processing in ToASCII().

For use in static worker and factory methods.

By default, ToASCII() uses transitional processing.

This option is ignored by the IDNA2003 implementation. (This is only relevant for compatibility of newer IDNA implementations with IDNA2003.)

Stable:
ICU 4.6
UIDNA_NONTRANSITIONAL_TO_UNICODE 

IDNA option for nontransitional processing in ToUnicode().

For use in static worker and factory methods.

By default, ToUnicode() uses transitional processing.

This option is ignored by the IDNA2003 implementation. (This is only relevant for compatibility of newer IDNA implementations with IDNA2003.)

Stable:
ICU 4.6
UIDNA_CHECK_CONTEXTO 

IDNA option to check whether the input conforms to the CONTEXTO rules.

For use in static worker and factory methods.

This option is ignored by the IDNA2003 implementation. (The CONTEXTO check is new in IDNA2008.)

This is for use by registries for IDNA2008 conformance. UTS #46 does not require the CONTEXTO check.

Stable:
ICU 49
UITER_UNKNOWN_INDEX 

Constant value that may be returned by UCharIteratorMove indicating that the final UTF-16 index is not known, but that the move succeeded.

This can occur when moving relative to limit or length, or when moving relative to the current index after a setState() when the current UTF-16 index is not known.

It would be very inefficient to have to count from the beginning of the text just to get the current/limit/length index after moving relative to it. The actual index can be determined with getIndex(UITER_CURRENT) which will count the UChars if necessary.

Stable:
ICU 2.6
UNORM_UNICODE_3_2 

Options bit set value to select Unicode 3.2 normalization (except NormalizationCorrections).

At most one Unicode version can be selected at a time.

Deprecated:
ICU 56 Use unorm2.h instead.
USET_IGNORE_SPACE 

Ignore white space within patterns unless quoted or escaped.

Stable:
ICU 2.4
USET_CASE_INSENSITIVE 

Enable case-insensitive matching.

E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'. This performs a full closure over case mappings, e.g. U+017F for s.

The resulting set is a superset of the input for the code points but not for the strings. It performs a case mapping closure of the code points and adds full case folding strings for the code points, and reduces strings of the original set to their full case folding equivalents.

This is designed for case-insensitive matches, for example in regular expressions. The full code point case closure allows checking of an input character directly against the closure set. Strings are matched by comparing the case-folded form from the closure set with an incremental case folding of the string in question.

The closure set will also contain single code points if the original set contained case-equivalent strings (like U+00DF for "ss" or "Ss" etc.). This is not necessary (that is, redundant) for the above matching method but results in the same closure sets regardless of whether the original set contained the code point or a string.

Stable:
ICU 2.4
USET_ADD_CASE_MAPPINGS 

Enable case-insensitive matching.

E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'. This adds the lower-, title-, and uppercase mappings as well as the case folding of each existing element in the set.

Stable:
ICU 3.2
UTEXT_PROVIDER_LENGTH_IS_EXPENSIVE 

It is potentially time-consuming for the provider to determine the length of the text.

Stable:
ICU 3.4
UTEXT_PROVIDER_STABLE_CHUNKS 

Text chunks remain valid and usable until the text object is modified or deleted, not just until the next time the access() function is called (which is the default).

Stable:
ICU 3.4
UTEXT_PROVIDER_WRITABLE 

The provider supports modifying the text via the replace() and copy() functions.

See also
Replaceable
Stable:
ICU 3.4
UTEXT_PROVIDER_HAS_META_DATA 

There is meta data associated with the text.

See also
Replaceable::hasMetaData()
Stable:
ICU 3.4
UTEXT_PROVIDER_OWNS_TEXT 

Text provider owns the text storage.

Generally occurs as the result of a deep clone of the UText. When closing the UText, the associated text must also be closed, deleted, freed, or otherwise released as appropriate.

Stable:
ICU 3.6

Definition at line 56 of file uiter.h.

◆ UCharIteratorOrigin

Origin constants for UCharIterator.getIndex() and UCharIterator.move().

See also
UCharIteratorMove
UCharIterator
Stable:
ICU 2.1

Definition at line 51 of file uiter.h.

Function Documentation

◆ uiter_current32()

U_CAPI UChar32 uiter_current32 (UCharIterator *iter)

Helper function for UCharIterator to get the code point at the current index.

Return the code point that includes the code unit at the current position, or U_SENTINEL if there is none (index is at the limit). If the current code unit is a lead or trail surrogate, then the following or preceding surrogate is used to form the code point value.

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the current code point
See also
UCharIterator
U16_GET
UnicodeString::char32At()
Stable:
ICU 2.1
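The surrogate pairing rule for the current position can be illustrated with a standalone function. It is a hypothetical model, not ICU code (ICU provides U16_GET for the equivalent operation on a UChar array). It takes a uint16_t array with explicit start, index, and limit values and, like current(), leaves the index unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of uiter_current32() over s[start..limit).
 * Returns the code point that contains the unit at index, or -1
 * (the value of U_SENTINEL) at the limit. The index is not changed. */
int32_t model_current32(const uint16_t *s, int32_t start, int32_t index, int32_t limit) {
    assert(s != NULL);
    if (index >= limit) {
        return -1;
    }
    uint32_t c = s[index];
    if (c >= 0xD800 && c <= 0xDBFF && index + 1 < limit) {
        /* Lead surrogate: pair it with a following trail surrogate. */
        uint32_t trail = s[index + 1];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            return (int32_t)(((c - 0xD800) << 10) + (trail - 0xDC00) + 0x10000);
        }
    } else if (c >= 0xDC00 && c <= 0xDFFF && index > start) {
        /* Trail surrogate: pair it with a preceding lead surrogate. */
        uint32_t lead = s[index - 1];
        if (lead >= 0xD800 && lead <= 0xDBFF) {
            return (int32_t)(((lead - 0xD800) << 10) + (c - 0xDC00) + 0x10000);
        }
    }
    return (int32_t)c; /* BMP code point or unpaired surrogate */
}
```

Both halves of a surrogate pair therefore report the same supplementary code point.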

◆ uiter_getState()

U_CAPI uint32_t uiter_getState (const UCharIterator *iter)

Get the "state" of the iterator in the form of a single 32-bit word.

This is a convenience function that calls iter->getState(iter) if iter->getState is not NULL; if it is NULL or any other error occurs, then UITER_NO_STATE is returned.

Some UCharIterator implementations may not be able to return a valid state for each position, in which case they return UITER_NO_STATE instead. This will be clearly documented for each such iterator (none of the public ones here).

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the state word
See also
UCharIterator
UCharIteratorGetState
UITER_NO_STATE
Stable:
ICU 2.6

◆ uiter_next32()

U_CAPI UChar32 uiter_next32 (UCharIterator *iter)

Helper function for UCharIterator to get the next code point.

Return the code point at the current index and increment the index (post-increment, like s[i++]), or return U_SENTINEL if there is none (index is at the limit).

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the current code point (and post-increment the current index)
See also
UCharIterator
U16_NEXT
Stable:
ICU 2.1
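As a hypothetical, ICU-independent model of the next32() semantics described above, the function below reads one code unit. If that unit is a lead surrogate followed by a trail surrogate before the limit, it consumes both and combines them into a supplementary code point. ICU's U16_NEXT macro performs the analogous step on a UChar array. The function name and the plain uint16_t array are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of uiter_next32(): return the code point starting at
 * *index and advance past it (post-increment), or return -1 (the value of
 * U_SENTINEL) when *index is at the limit. */
int32_t model_next32(const uint16_t *s, int32_t *index, int32_t limit) {
    assert(s != NULL && index != NULL);
    if (*index >= limit) {
        return -1;
    }
    uint32_t c = s[(*index)++];
    if (c >= 0xD800 && c <= 0xDBFF && *index < limit) {
        uint32_t trail = s[*index];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++*index; /* consume the trail surrogate too */
            return (int32_t)(((c - 0xD800) << 10) + (trail - 0xDC00) + 0x10000);
        }
    }
    return (int32_t)c; /* BMP code point or unpaired surrogate */
}
```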

◆ uiter_previous32()

U_CAPI UChar32 uiter_previous32 (UCharIterator *iter)

Helper function for UCharIterator to get the previous code point.

Decrement the index and return the code point from there (pre-decrement, like s[--i]), or return U_SENTINEL if there is none (index is at the start).

Parameters
iter: the UCharIterator structure ("this pointer")
Returns
the previous code point (after pre-decrementing the current index)
See also
UCharIterator
U16_PREV
Stable:
ICU 2.1
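The backward direction can be modeled the same way. This hypothetical sketch, which does not call ICU, steps back one code unit. If that unit is a trail surrogate preceded by a lead surrogate at or after the start, it steps back once more and returns the combined code point, which is the behavior ICU's U16_PREV macro provides for a UChar array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of uiter_previous32(): pre-decrement *index and return
 * the code point that ends there, or return -1 (the value of U_SENTINEL)
 * when *index is already at start. */
int32_t model_previous32(const uint16_t *s, int32_t start, int32_t *index) {
    assert(s != NULL && index != NULL);
    if (*index <= start) {
        return -1;
    }
    uint32_t c = s[--*index];
    if (c >= 0xDC00 && c <= 0xDFFF && *index > start) {
        uint32_t lead = s[*index - 1];
        if (lead >= 0xD800 && lead <= 0xDBFF) {
            --*index; /* step back over the lead surrogate too */
            return (int32_t)(((lead - 0xD800) << 10) + (c - 0xDC00) + 0x10000);
        }
    }
    return (int32_t)c; /* BMP code point or unpaired surrogate */
}
```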

◆ uiter_setCharacterIterator()

U_CAPI void uiter_setCharacterIterator (UCharIterator *iter, icu::CharacterIterator *charIter)

Set up a UCharIterator to wrap around a C++ CharacterIterator.

Sets the UCharIterator function pointers for iteration using the CharacterIterator charIter.

The CharacterIterator pointer charIter is set into UCharIterator.context without copying or cloning the CharacterIterator object. The other "protected" UCharIterator fields are set to 0 and will be ignored. The iteration index and boundaries are controlled by the CharacterIterator.

getState() simply returns the current index. move() will always return the final index.

Parameters
iter: UCharIterator structure to be set for iteration
charIter: CharacterIterator to wrap
See also
UCharIterator
Stable:
ICU 2.1

◆ uiter_setReplaceable()

U_CAPI void uiter_setReplaceable (UCharIterator *iter, const icu::Replaceable *rep)

Set up a UCharIterator to iterate over a C++ Replaceable.

Sets the UCharIterator function pointers for iteration over the Replaceable rep with iteration boundaries start=index=0 and length=limit=rep->length(). The "provider" may set the start, index, and limit values at any time within the range 0..length=rep->length(). The length field will be ignored.

The Replaceable pointer rep is set into UCharIterator.context without copying or cloning/reallocating the Replaceable object.

getState() simply returns the current index. move() will always return the final index.

Parameters
iter: UCharIterator structure to be set for iteration
rep: Replaceable to iterate over
See also
UCharIterator
Stable:
ICU 2.1

◆ uiter_setState()

U_CAPI void uiter_setState (UCharIterator *iter, uint32_t state, UErrorCode *pErrorCode)

Restore the "state" of the iterator using a state word from a getState() call.

This is a convenience function that calls iter->setState(iter, state, pErrorCode) if iter->setState is not NULL; if it is NULL, then U_UNSUPPORTED_ERROR is set.

Parameters
iter: the UCharIterator structure ("this pointer")
state: the state word from a getState() call on a same-type, same-string iterator
pErrorCode: Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
UCharIterator
UCharIteratorSetState
Stable:
ICU 2.6

◆ uiter_setString()

U_CAPI void uiter_setString (UCharIterator *iter, const UChar *s, int32_t length)

Set up a UCharIterator to iterate over a string.

Sets the UCharIterator function pointers for iteration over the string s with iteration boundaries start=index=0 and length=limit=string length. The "provider" may set the start, index, and limit values at any time within the range 0..length. The length field will be ignored.

The string pointer s is set into UCharIterator.context without copying or reallocating the string contents.

getState() simply returns the current index. move() will always return the final index.

Parameters
iter: UCharIterator structure to be set for iteration
s: String to iterate over
length: Length of s, or -1 if NUL-terminated
See also
UCharIterator
Stable:
ICU 2.1

◆ uiter_setUTF16BE()

U_CAPI void uiter_setUTF16BE (UCharIterator *iter, const char *s, int32_t length)

Set up a UCharIterator to iterate over a UTF-16BE string (byte vector with a big-endian pair of bytes per UChar).

Everything works just like with a normal UChar iterator (uiter_setString), except that UChars are assembled from byte pairs, and that the length argument here indicates an even number of bytes.

getState() simply returns the current index. move() will always return the final index.

Parameters
iter: UCharIterator structure to be set for iteration
s: UTF-16BE string to iterate over
length: Length of s as an even number of bytes, or -1 if NUL-terminated (NUL means a pair of 0 bytes at an even index from s)
See also
UCharIterator
uiter_setString
Stable:
ICU 2.6
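As a rough illustration of how UChars are "assembled from byte pairs", the standalone sketch below combines each even/odd byte pair (high byte first) into one UTF-16 code unit. It also measures a NUL-terminated UTF-16BE string the way the length parameter describes. This is not ICU code, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the UTF-16 code unit at unit index i of a
 * big-endian byte vector (high byte at the even offset). */
uint16_t be_unit_at(const char *s, int32_t i) {
    assert(s != NULL && i >= 0);
    return (uint16_t)(((uint8_t)s[2 * i] << 8) | (uint8_t)s[2 * i + 1]);
}

/* Hypothetical helper: length in bytes of a NUL-terminated UTF-16BE string,
 * where NUL is a pair of 0 bytes starting at an even offset. A single 0 byte
 * (for example the high byte of U+0041) does not terminate the string. */
int32_t be_length_in_bytes(const char *s) {
    int32_t n = 0;
    assert(s != NULL);
    while (s[n] != 0 || s[n + 1] != 0) {
        n += 2;
    }
    return n;
}
```

The cast to uint8_t matters because plain char may be signed, and sign extension would corrupt the high byte.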

◆ uiter_setUTF8()

U_CAPI void uiter_setUTF8 (UCharIterator *iter, const char *s, int32_t length)

Set up a UCharIterator to iterate over a UTF-8 string.

Sets the UCharIterator function pointers for iteration over the UTF-8 string s with UTF-8 iteration boundaries 0 and length. The implementation counts the UTF-16 index on the fly and lazily evaluates the UTF-16 length of the text.

The start field is used as the UTF-8 offset, the limit field as the UTF-8 length. When the reservedField is not 0, then it contains a supplementary code point and the UTF-16 index is between the two corresponding surrogates. At that point, the UTF-8 index is behind that code point.

The UTF-8 string pointer s is set into UCharIterator.context without copying or reallocating the string contents.

getState() returns a state value consisting of

  • the current UTF-8 source byte index (bits 31..1)
  • a flag (bit 0) that indicates whether the UChar position is in the middle of a surrogate pair (from a 4-byte UTF-8 sequence for the corresponding supplementary code point)

getState() cannot also encode the UTF-16 index in the state value. move(relative to limit or length), or move(relative to current) after setState(), may return UITER_UNKNOWN_INDEX.

Parameters
iter: UCharIterator structure to be set for iteration
s: UTF-8 string to iterate over
length: Length of s in bytes, or -1 if NUL-terminated
See also
UCharIterator
Stable:
ICU 2.6
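The state-word layout above can be expressed as a pair of encode/decode helpers. This is a hypothetical sketch for illustration only. The header describes the state as opaque, so real code would pass uiter_getState() results to uiter_setState() unchanged rather than decoding them, and the helper names here are not ICU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder for the layout described above:
 * bits 31..1 hold the UTF-8 source byte index, and bit 0 is set when the
 * UTF-16 position lies between the two surrogates of a supplementary
 * code point (from a 4-byte UTF-8 sequence). */
uint32_t make_utf8_state(int32_t byteIndex, int inSurrogatePair) {
    assert(byteIndex >= 0);
    return ((uint32_t)byteIndex << 1) | (inSurrogatePair ? 1u : 0u);
}

/* Hypothetical decoder: recover the UTF-8 byte index (bits 31..1). */
int32_t state_byte_index(uint32_t state) {
    return (int32_t)(state >> 1);
}

/* Hypothetical decoder: recover the mid-surrogate-pair flag (bit 0). */
int state_in_surrogate_pair(uint32_t state) {
    return (int)(state & 1u);
}
```

Because the byte index is stored directly, restoring it takes constant time, while the UTF-16 index is deliberately absent, which is why move() may later report UITER_UNKNOWN_INDEX.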