ICU 68.2
C API: Parse Error Information.
#include "unicode/utypes.h"
Data Structures | |
struct | UParseError |
A UParseError struct is used to return detailed information about parsing errors. | |
Typedefs | |
typedef struct UParseError | UParseError |
A UParseError struct is used to return detailed information about parsing errors. | |
Enumerations | |
enum | { UMSGPAT_ARG_NAME_NOT_NUMBER =-1, UMSGPAT_ARG_NAME_NOT_VALID =-2, U_PARSE_CONTEXT_LEN = 16, UIDNA_DEFAULT =0, UIDNA_ALLOW_UNASSIGNED =1, UIDNA_USE_STD3_RULES =2, UIDNA_CHECK_BIDI =4, UIDNA_CHECK_CONTEXTJ =8, UIDNA_NONTRANSITIONAL_TO_ASCII =0x10, UIDNA_NONTRANSITIONAL_TO_UNICODE =0x20, UIDNA_CHECK_CONTEXTO =0x40, UITER_UNKNOWN_INDEX =-2, UNORM_UNICODE_3_2 =0x20, USET_IGNORE_SPACE = 1, USET_CASE_INSENSITIVE = 2, USET_ADD_CASE_MAPPINGS = 4, UTEXT_PROVIDER_LENGTH_IS_EXPENSIVE = 1, UTEXT_PROVIDER_STABLE_CHUNKS = 2, UTEXT_PROVIDER_WRITABLE = 3, UTEXT_PROVIDER_HAS_META_DATA = 4, UTEXT_PROVIDER_OWNS_TEXT = 5 } |
The capacity of the context strings in UParseError. | |
Definition in file parseerr.h.
typedef struct UParseError UParseError |
A UParseError struct is used to return detailed information about parsing errors.
It is used by ICU parsing engines that parse long rules, patterns, or programs, where the text being parsed is long enough that more information than a UErrorCode is needed to localize the error.
The line, offset, and context fields are optional; parsing engines may choose not to use them.
The preContext and postContext strings include some part of the context surrounding the error. If the source text is "let for=7" and "for" is the error (e.g., because it is a reserved word), then some examples of what a parser might produce are the following:
preContext | postContext | |
---|---|---|
"" | "" | The parser does not support context |
"let " | "=7" | Pre- and post-context only |
"let " | "for=7" | Pre- and post-context and error text |
"" | "for" | Error text only |
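The "pre- and post-context and error text" case can be sketched in C as follows. This is a minimal illustration, not ICU's implementation: `ParseErrorSketch`, `CONTEXT_LEN`, `copy_context`, and `report_error` are hypothetical names, the struct is a local stand-in for UParseError so that the block builds without ICU headers, and `uint16_t` stands in for ICU's 16-bit UChar. The sketch assumes that a parser without line tracking leaves `line` at 0 and reports `offset` from the start of the text.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for U_PARSE_CONTEXT_LEN (16) from parseerr.h. */
enum { CONTEXT_LEN = 16 };

/* Hypothetical local mirror of UParseError; not the ICU type itself. */
typedef struct {
    int32_t  line;                      /* 0 here: this sketch does not track lines */
    int32_t  offset;                    /* offset of the error from the start of the text */
    uint16_t preContext[CONTEXT_LEN];   /* text before the error, NUL-terminated */
    uint16_t postContext[CONTEXT_LEN];  /* error text and what follows, NUL-terminated */
} ParseErrorSketch;

/* Copy text[start, limit) into dest and NUL-terminate it.
 * The caller must keep the span shorter than CONTEXT_LEN. */
void copy_context(uint16_t *dest, const uint16_t *text,
                  int32_t start, int32_t limit)
{
    int32_t n = limit - start;
    assert(n >= 0 && n < CONTEXT_LEN);
    for (int32_t i = 0; i < n; ++i) {
        dest[i] = text[start + i];
    }
    dest[n] = 0;
}

/* Hypothetical helper: record an error that starts at errStart in a text of
 * the given length. Each context holds at most CONTEXT_LEN - 1 code units,
 * leaving room for the terminating NUL. A real parser would also avoid
 * splitting a surrogate pair at the truncation point; this sketch does not. */
void report_error(ParseErrorSketch *pe, const uint16_t *text,
                  int32_t length, int32_t errStart)
{
    int32_t preStart = errStart - (CONTEXT_LEN - 1);
    if (preStart < 0) {
        preStart = 0;
    }
    int32_t postLimit = errStart + (CONTEXT_LEN - 1);
    if (postLimit > length) {
        postLimit = length;
    }

    pe->line = 0;
    pe->offset = errStart;
    copy_context(pe->preContext, text, preStart, errStart);
    copy_context(pe->postContext, text, errStart, postLimit);
}
```

Calling `report_error` on the nine code units of "let for=7" with `errStart` set to 4 leaves "let " in `preContext` and "for=7" in `postContext`, matching the third row of the table.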
Examples of engines which use UParseError (or may use it in the future) are Transliterator, RuleBasedBreakIterator, and RegexPattern.
anonymous enum |
The capacity of the context strings in UParseError.
Enumerator | |
---|---|
UMSGPAT_ARG_NAME_NOT_NUMBER | Return value from MessagePattern.validateArgumentName() for when the string is a valid "pattern identifier" but not a number.
|
UMSGPAT_ARG_NAME_NOT_VALID | Return value from MessagePattern.validateArgumentName() for when the string is invalid. It might not be a valid "pattern identifier", or it might consist only of ASCII digits but have a leading zero or represent a number that is too large.
|
UIDNA_DEFAULT | Default options value: None of the other options are set. For use in static worker and factory methods.
|
UIDNA_ALLOW_UNASSIGNED | Option to allow unassigned code points in domain names and labels. For use in static worker and factory methods. This option is ignored by the UTS46 implementation. (UTS #46 disallows unassigned code points.)
|
UIDNA_USE_STD3_RULES | Option to check whether the input conforms to the STD3 ASCII rules, for example the restriction of labels to LDH characters (ASCII Letters, Digits and Hyphen-Minus). For use in static worker and factory methods.
|
UIDNA_CHECK_BIDI | IDNA option to check whether the input conforms to the BiDi rules. For use in static worker and factory methods. This option is ignored by the IDNA2003 implementation. (IDNA2003 always performs a BiDi check.)
|
UIDNA_CHECK_CONTEXTJ | IDNA option to check whether the input conforms to the CONTEXTJ rules. For use in static worker and factory methods. This option is ignored by the IDNA2003 implementation. (The CONTEXTJ check is new in IDNA2008.)
|
UIDNA_NONTRANSITIONAL_TO_ASCII | IDNA option for nontransitional processing in ToASCII(). For use in static worker and factory methods. By default, ToASCII() uses transitional processing. This option is ignored by the IDNA2003 implementation. (This is only relevant for compatibility of newer IDNA implementations with IDNA2003.)
|
UIDNA_NONTRANSITIONAL_TO_UNICODE | IDNA option for nontransitional processing in ToUnicode(). For use in static worker and factory methods. By default, ToUnicode() uses transitional processing. This option is ignored by the IDNA2003 implementation. (This is only relevant for compatibility of newer IDNA implementations with IDNA2003.)
|
UIDNA_CHECK_CONTEXTO | IDNA option to check whether the input conforms to the CONTEXTO rules. For use in static worker and factory methods. This option is ignored by the IDNA2003 implementation. (The CONTEXTO check is new in IDNA2008.) This is for use by registries for IDNA2008 conformance. UTS #46 does not require the CONTEXTO check.
|
UITER_UNKNOWN_INDEX | Constant value that may be returned by UCharIteratorMove indicating that the final UTF-16 index is not known, but that the move succeeded. This can occur when moving relative to limit or length, or when moving relative to the current index after a setState() when the current UTF-16 index is not known. It would be very inefficient to have to count from the beginning of the text just to get the current/limit/length index after moving relative to it. The actual index can be determined with getIndex(UITER_CURRENT) which will count the UChars if necessary.
|
UNORM_UNICODE_3_2 | Options bit set value to select Unicode 3.2 normalization (except NormalizationCorrections). At most one Unicode version can be selected at a time.
|
USET_IGNORE_SPACE | Ignore white space within patterns unless quoted or escaped.
|
USET_CASE_INSENSITIVE | Enable case insensitive matching. E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'. This performs a full closure over case mappings, e.g. U+017F for s. The resulting set is a superset of the input for the code points but not for the strings. It performs a case mapping closure of the code points and adds full case folding strings for the code points, and reduces strings of the original set to their full case folding equivalents. This is designed for case-insensitive matches, for example in regular expressions. The full code point case closure allows checking of an input character directly against the closure set. Strings are matched by comparing the case-folded form from the closure set with an incremental case folding of the string in question. The closure set will also contain single code points if the original set contained case-equivalent strings (like U+00DF for "ss" or "Ss" etc.). This is not necessary (that is, redundant) for the above matching method but results in the same closure sets regardless of whether the original set contained the code point or a string.
|
USET_ADD_CASE_MAPPINGS | Enable case insensitive matching. E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'. This adds the lower-, title-, and uppercase mappings as well as the case folding of each existing element in the set.
|
UTEXT_PROVIDER_LENGTH_IS_EXPENSIVE | It is potentially time consuming for the provider to determine the length of the text.
|
UTEXT_PROVIDER_STABLE_CHUNKS | Text chunks remain valid and usable until the text object is modified or deleted, not just until the next time the access() function is called (which is the default).
|
UTEXT_PROVIDER_WRITABLE | The provider supports modifying the text via the replace() and copy() functions.
|
UTEXT_PROVIDER_HAS_META_DATA | There is meta data associated with the text.
|
UTEXT_PROVIDER_OWNS_TEXT | Text provider owns the text storage. Generally occurs as the result of a deep clone of the UText. When closing the UText, the associated text must also be closed, deleted, or freed, whichever is appropriate.
|
Definition at line 27 of file parseerr.h.
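The enumerators listed above fall into two numeric styles, which the following C sketch illustrates. The UIDNA_* values (1, 2, 4, 8, 0x10, ...) are single-bit masks that can be combined with bitwise OR. The UTEXT_PROVIDER_* values run 1 through 5, and since 3 and 5 are not single bits, the sketch assumes they are bit positions that must be shifted into a mask before use. The `SKETCH_` constants are local copies of the listed values so that the block builds without ICU headers, and `has_option` and `property_flag` are hypothetical helpers, not ICU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of some UIDNA_* values from the enumerator list;
 * the SKETCH_ prefix marks them as stand-ins, not ICU symbols. */
enum {
    SKETCH_UIDNA_DEFAULT                  = 0,
    SKETCH_UIDNA_USE_STD3_RULES           = 2,
    SKETCH_UIDNA_CHECK_BIDI               = 4,
    SKETCH_UIDNA_CHECK_CONTEXTJ           = 8,
    SKETCH_UIDNA_NONTRANSITIONAL_TO_ASCII = 0x10
};

/* Local copies of some UTEXT_PROVIDER_* values from the enumerator list. */
enum {
    SKETCH_UTEXT_PROVIDER_STABLE_CHUNKS = 2,
    SKETCH_UTEXT_PROVIDER_WRITABLE      = 3
};

/* Mask-style options: an option is set when its bit survives a bitwise AND. */
int has_option(uint32_t options, uint32_t mask)
{
    return (options & mask) != 0;
}

/* Bit-position-style properties (assumption: value n selects bit n):
 * turn the position into a single-bit mask. */
uint32_t property_flag(int bitNumber)
{
    assert(bitNumber >= 0 && bitNumber < 32);
    return (uint32_t)1 << bitNumber;
}
```

Under these assumptions, `SKETCH_UIDNA_CHECK_BIDI | SKETCH_UIDNA_CHECK_CONTEXTJ` yields a combined option word in which `has_option` reports both checks as set, while a provider-properties word would be built from `property_flag(SKETCH_UTEXT_PROVIDER_WRITABLE)` rather than from the raw value 3.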