ICU 68.2
messagepattern.h File Reference

C++ API: MessagePattern class: Parses and represents ICU MessageFormat patterns.

#include "unicode/utypes.h"
#include "unicode/parseerr.h"
#include "unicode/unistr.h"

Data Structures

class  icu::MessagePattern
 Parses and represents ICU MessageFormat patterns.
 
class  icu::MessagePattern::Part
 A message pattern "part", representing a pattern parsing event.
 

Namespaces

 icu
 

Macros

#define UMSGPAT_ARG_TYPE_HAS_PLURAL_STYLE(argType)    ((argType)==UMSGPAT_ARG_TYPE_PLURAL || (argType)==UMSGPAT_ARG_TYPE_SELECTORDINAL)
 Returns true if the argument type has a plural style part sequence and semantics, for example UMSGPAT_ARG_TYPE_PLURAL and UMSGPAT_ARG_TYPE_SELECTORDINAL.
 
#define UMSGPAT_NO_NUMERIC_VALUE   ((double)(-123456789))
 Special value that is returned by getNumericValue(Part) when no numeric value is defined for a part.
 

Typedefs

typedef enum UMessagePatternApostropheMode UMessagePatternApostropheMode
 
typedef enum UMessagePatternPartType UMessagePatternPartType
 
typedef enum UMessagePatternArgType UMessagePatternArgType
 

Enumerations

enum  UMessagePatternApostropheMode { UMSGPAT_APOS_DOUBLE_OPTIONAL, UMSGPAT_APOS_DOUBLE_REQUIRED }
 Mode for when an apostrophe starts quoted literal text for MessageFormat output.
 
enum  UMessagePatternPartType {
  UMSGPAT_PART_TYPE_MSG_START, UMSGPAT_PART_TYPE_MSG_LIMIT, UMSGPAT_PART_TYPE_SKIP_SYNTAX, UMSGPAT_PART_TYPE_INSERT_CHAR,
  UMSGPAT_PART_TYPE_REPLACE_NUMBER, UMSGPAT_PART_TYPE_ARG_START, UMSGPAT_PART_TYPE_ARG_LIMIT, UMSGPAT_PART_TYPE_ARG_NUMBER,
  UMSGPAT_PART_TYPE_ARG_NAME, UMSGPAT_PART_TYPE_ARG_TYPE, UMSGPAT_PART_TYPE_ARG_STYLE, UMSGPAT_PART_TYPE_ARG_SELECTOR,
  UMSGPAT_PART_TYPE_ARG_INT, UMSGPAT_PART_TYPE_ARG_DOUBLE
}
 MessagePattern::Part type constants.
 
enum  UMessagePatternArgType {
  UMSGPAT_ARG_TYPE_NONE, UMSGPAT_ARG_TYPE_SIMPLE, UMSGPAT_ARG_TYPE_CHOICE, UMSGPAT_ARG_TYPE_PLURAL,
  UMSGPAT_ARG_TYPE_SELECT, UMSGPAT_ARG_TYPE_SELECTORDINAL
}
 Argument type constants.
 
enum  {
  UMSGPAT_ARG_NAME_NOT_NUMBER =-1, UMSGPAT_ARG_NAME_NOT_VALID =-2, U_PARSE_CONTEXT_LEN = 16, UIDNA_DEFAULT =0,
  UIDNA_ALLOW_UNASSIGNED =1, UIDNA_USE_STD3_RULES =2, UIDNA_CHECK_BIDI =4, UIDNA_CHECK_CONTEXTJ =8,
  UIDNA_NONTRANSITIONAL_TO_ASCII =0x10, UIDNA_NONTRANSITIONAL_TO_UNICODE =0x20, UIDNA_CHECK_CONTEXTO =0x40, UITER_UNKNOWN_INDEX =-2,
  UNORM_UNICODE_3_2 =0x20, USET_IGNORE_SPACE = 1, USET_CASE_INSENSITIVE = 2, USET_ADD_CASE_MAPPINGS = 4,
  UTEXT_PROVIDER_LENGTH_IS_EXPENSIVE = 1, UTEXT_PROVIDER_STABLE_CHUNKS = 2, UTEXT_PROVIDER_WRITABLE = 3, UTEXT_PROVIDER_HAS_META_DATA = 4,
  UTEXT_PROVIDER_OWNS_TEXT = 5
}
 

Detailed Description

C++ API: MessagePattern class: Parses and represents ICU MessageFormat patterns.

Definition in file messagepattern.h.

Macro Definition Documentation

◆ UMSGPAT_ARG_TYPE_HAS_PLURAL_STYLE

#define UMSGPAT_ARG_TYPE_HAS_PLURAL_STYLE (   argType)     ((argType)==UMSGPAT_ARG_TYPE_PLURAL || (argType)==UMSGPAT_ARG_TYPE_SELECTORDINAL)

Returns true if the argument type has a plural style part sequence and semantics, for example UMSGPAT_ARG_TYPE_PLURAL and UMSGPAT_ARG_TYPE_SELECTORDINAL.

Stable:
ICU 50

Definition at line 272 of file messagepattern.h.
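As a rough illustration of what the macro evaluates to, the following self-contained sketch mirrors the argument type constants and the macro body locally. The sketch namespace, SKETCH_HAS_PLURAL_STYLE, and pluralKind() are hypothetical names introduced only for this example; real code would include unicode/messagepattern.h and use UMSGPAT_ARG_TYPE_HAS_PLURAL_STYLE directly.

```cpp
#include <cassert>
#include <string>

namespace sketch {
// Local mirror of the UMessagePatternArgType constants, in the order they are
// listed in this file reference. This is NOT the ICU enum itself.
enum ArgType {
    ARG_TYPE_NONE, ARG_TYPE_SIMPLE, ARG_TYPE_CHOICE,
    ARG_TYPE_PLURAL, ARG_TYPE_SELECT, ARG_TYPE_SELECTORDINAL
};
}  // namespace sketch

// Same expression as the macro body shown above, applied to the local mirror.
#define SKETCH_HAS_PLURAL_STYLE(argType) \
    ((argType) == sketch::ARG_TYPE_PLURAL || (argType) == sketch::ARG_TYPE_SELECTORDINAL)

// Hypothetical helper: names the kind of plural rules an argument type uses,
// or "none" if the type has no plural style.
inline std::string pluralKind(sketch::ArgType t) {
    if (!SKETCH_HAS_PLURAL_STYLE(t)) {
        return "none";
    }
    return t == sketch::ARG_TYPE_PLURAL ? "cardinal" : "ordinal";
}
```

Only the two plural-style types satisfy the check; select and choice arguments do not, even though they also carry selectors.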

◆ UMSGPAT_NO_NUMERIC_VALUE

#define UMSGPAT_NO_NUMERIC_VALUE   ((double)(-123456789))

Special value that is returned by getNumericValue(Part) when no numeric value is defined for a part.

See also
MessagePattern.getNumericValue()
Stable:
ICU 4.8

Definition at line 299 of file messagepattern.h.
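The sentinel can be pictured with a small self-contained sketch. It models a part as a type plus an integer value, following the ARG_INT and ARG_DOUBLE descriptions in this file reference (an integer is stored directly, a double is stored as an index into an array of numeric values). The Part struct, kNoNumericValue, numericValueOf(), and hasNumericValue() are hypothetical stand-ins for this example, not the ICU classes or their implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same value as UMSGPAT_NO_NUMERIC_VALUE above, under a local name.
constexpr double kNoNumericValue = -123456789.0;

// Local mirror of the few part types this sketch needs; NOT the ICU enum.
enum PartType { ARG_INT, ARG_DOUBLE, ARG_NAME };

// Hypothetical, simplified model of a pattern part.
struct Part {
    PartType type;
    int value;  // integer value for ARG_INT, array index for ARG_DOUBLE
};

// Returns the numeric value of a part, or the sentinel if it has none.
inline double numericValueOf(const Part& part, const std::vector<double>& numericValues) {
    if (part.type == ARG_INT) {
        return part.value;
    }
    if (part.type == ARG_DOUBLE && part.value >= 0 &&
        static_cast<std::size_t>(part.value) < numericValues.size()) {
        return numericValues[static_cast<std::size_t>(part.value)];
    }
    return kNoNumericValue;
}

// Callers compare against the sentinel before using the result.
inline bool hasNumericValue(double v) {
    return v != kNoNumericValue;
}
```

Because the sentinel is an ordinary double, a caller has to compare against it explicitly before treating the result as a real offset or selector value.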

Typedef Documentation

◆ UMessagePatternApostropheMode

Stable:
ICU 4.8

Definition at line 1 of file messagepattern.h.

◆ UMessagePatternArgType

Stable:
ICU 4.8

Definition at line 1 of file messagepattern.h.

◆ UMessagePatternPartType

Stable:
ICU 4.8

Definition at line 1 of file messagepattern.h.

Enumeration Type Documentation

◆ anonymous enum

anonymous enum
Enumerator
UMSGPAT_ARG_NAME_NOT_NUMBER 

Return value from MessagePattern.validateArgumentName() for when the string is a valid "pattern identifier" but not a number.

Stable:
ICU 4.8
UMSGPAT_ARG_NAME_NOT_VALID 

Return value from MessagePattern.validateArgumentName() for when the string is invalid.

It might not be a valid "pattern identifier", or it might have only ASCII digits but with a leading zero or a number that is too large.

Stable:
ICU 4.8
UIDNA_DEFAULT 

Default options value: None of the other options are set.

For use in static worker and factory methods.

Stable:
ICU 2.6
UIDNA_ALLOW_UNASSIGNED 

Option to allow unassigned code points in domain names and labels.

For use in static worker and factory methods.

This option is ignored by the UTS46 implementation. (UTS #46 disallows unassigned code points.)

Deprecated:
ICU 55 Use UTS #46 instead via uidna_openUTS46() or class IDNA.
UIDNA_USE_STD3_RULES 

Option to check whether the input conforms to the STD3 ASCII rules, for example the restriction of labels to LDH characters (ASCII Letters, Digits and Hyphen-Minus).

For use in static worker and factory methods.

Stable:
ICU 2.6
UIDNA_CHECK_BIDI 

IDNA option to check whether the input conforms to the BiDi rules.

For use in static worker and factory methods.

This option is ignored by the IDNA2003 implementation. (IDNA2003 always performs a BiDi check.)

Stable:
ICU 4.6
UIDNA_CHECK_CONTEXTJ 

IDNA option to check whether the input conforms to the CONTEXTJ rules.

For use in static worker and factory methods.

This option is ignored by the IDNA2003 implementation. (The CONTEXTJ check is new in IDNA2008.)

Stable:
ICU 4.6
UIDNA_NONTRANSITIONAL_TO_ASCII 

IDNA option for nontransitional processing in ToASCII().

For use in static worker and factory methods.

By default, ToASCII() uses transitional processing.

This option is ignored by the IDNA2003 implementation. (This is only relevant for compatibility of newer IDNA implementations with IDNA2003.)

Stable:
ICU 4.6
UIDNA_NONTRANSITIONAL_TO_UNICODE 

IDNA option for nontransitional processing in ToUnicode().

For use in static worker and factory methods.

By default, ToUnicode() uses transitional processing.

This option is ignored by the IDNA2003 implementation. (This is only relevant for compatibility of newer IDNA implementations with IDNA2003.)

Stable:
ICU 4.6
UIDNA_CHECK_CONTEXTO 

IDNA option to check whether the input conforms to the CONTEXTO rules.

For use in static worker and factory methods.

This option is ignored by the IDNA2003 implementation. (The CONTEXTO check is new in IDNA2008.)

This is for use by registries for IDNA2008 conformance. UTS #46 does not require the CONTEXTO check.

Stable:
ICU 49
UITER_UNKNOWN_INDEX 

Constant value that may be returned by UCharIteratorMove indicating that the final UTF-16 index is not known, but that the move succeeded.

This can occur when moving relative to limit or length, or when moving relative to the current index after a setState() when the current UTF-16 index is not known.

It would be very inefficient to have to count from the beginning of the text just to get the current/limit/length index after moving relative to it. The actual index can be determined with getIndex(UITER_CURRENT) which will count the UChars if necessary.

Stable:
ICU 2.6
UNORM_UNICODE_3_2 

Options bit set value to select Unicode 3.2 normalization (except NormalizationCorrections).

At most one Unicode version can be selected at a time.

Deprecated:
ICU 56 Use unorm2.h instead.
USET_IGNORE_SPACE 

Ignore white space within patterns unless quoted or escaped.

Stable:
ICU 2.4
USET_CASE_INSENSITIVE 

Enable case insensitive matching.

E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'. This performs a full closure over case mappings, e.g. U+017F for s.

The resulting set is a superset of the input for the code points but not for the strings. It performs a case mapping closure of the code points and adds full case folding strings for the code points, and reduces strings of the original set to their full case folding equivalents.

This is designed for case-insensitive matches, for example in regular expressions. The full code point case closure allows checking of an input character directly against the closure set. Strings are matched by comparing the case-folded form from the closure set with an incremental case folding of the string in question.

The closure set will also contain single code points if the original set contained case-equivalent strings (like U+00DF for "ss" or "Ss" etc.). This is not necessary (that is, redundant) for the above matching method but results in the same closure sets regardless of whether the original set contained the code point or a string.

Stable:
ICU 2.4
USET_ADD_CASE_MAPPINGS 

Enable case insensitive matching.

E.g., "[ab]" with this flag will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will match all except 'a', 'A', 'b', and 'B'. This adds the lower-, title-, and uppercase mappings as well as the case folding of each existing element in the set.

Stable:
ICU 3.2
UTEXT_PROVIDER_LENGTH_IS_EXPENSIVE 

It is potentially time consuming for the provider to determine the length of the text.

Stable:
ICU 3.4
UTEXT_PROVIDER_STABLE_CHUNKS 

Text chunks remain valid and usable until the text object is modified or deleted, not just until the next time the access() function is called (which is the default).

Stable:
ICU 3.4
UTEXT_PROVIDER_WRITABLE 

The provider supports modifying the text via the replace() and copy() functions.

See also
Replaceable
Stable:
ICU 3.4
UTEXT_PROVIDER_HAS_META_DATA 

There is meta data associated with the text.

See also
Replaceable::hasMetaData()
Stable:
ICU 3.4
UTEXT_PROVIDER_OWNS_TEXT 

Text provider owns the text storage.

Generally occurs as the result of a deep clone of the UText. When closing the UText, the associated text must also be closed, deleted, or freed, whichever is appropriate.

Stable:
ICU 3.6

Definition at line 275 of file messagepattern.h.

◆ UMessagePatternApostropheMode

Mode for when an apostrophe starts quoted literal text for MessageFormat output.

The default is DOUBLE_OPTIONAL unless overridden via uconfig.h (UCONFIG_MSGPAT_DEFAULT_APOSTROPHE_MODE).

A pair of adjacent apostrophes always results in a single apostrophe in the output, even when the pair is between two single, text-quoting apostrophes.

The following table shows examples of desired MessageFormat.format() output with the pattern strings that yield that output.

Desired output     DOUBLE_OPTIONAL                   DOUBLE_REQUIRED
I see {many}       I see '{many}'                    (same)
I said {'Wow!'}    I said '{''Wow!''}'               (same)
I don't know       I don't know OR I don''t know     I don''t know
Stable:
ICU 4.8
See also
UCONFIG_MSGPAT_DEFAULT_APOSTROPHE_MODE
Enumerator
UMSGPAT_APOS_DOUBLE_OPTIONAL 

A literal apostrophe is represented by either a single or a double apostrophe pattern character.

Within a MessageFormat pattern, a single apostrophe only starts quoted literal text if it immediately precedes a curly brace {}, or a pipe symbol | if inside a choice format, or a pound symbol # if inside a plural format.

This is the default behavior starting with ICU 4.8.

Stable:
ICU 4.8
UMSGPAT_APOS_DOUBLE_REQUIRED 

A literal apostrophe must be represented by a double apostrophe pattern character.

A single apostrophe always starts quoted literal text.

This is the behavior of ICU 4.6 and earlier, and of the JDK.

Stable:
ICU 4.8

Definition at line 70 of file messagepattern.h.
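The two modes can be illustrated with a minimal, self-contained sketch that resolves apostrophes in literal message text. It is a simplification under stated assumptions, not the ICU parser: it does not parse arguments, it only treats '{' and '}' as quote-triggering characters in the DOUBLE_OPTIONAL case (ignoring '|' and '#'), and AposMode and resolveApostrophes() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Local stand-in for the two UMessagePatternApostropheMode constants.
enum class AposMode { DoubleOptional, DoubleRequired };

// Hypothetical helper: returns the literal text that results from applying
// the apostrophe rules described above to a pattern without arguments.
inline std::string resolveApostrophes(const std::string& pattern, AposMode mode) {
    std::string out;
    bool inQuote = false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c != '\'') {
            out += c;  // ordinary character, quoted or not
            continue;
        }
        const bool hasNext = i + 1 < pattern.size();
        if (hasNext && pattern[i + 1] == '\'') {
            out += '\'';  // adjacent pair always yields one apostrophe
            ++i;
            continue;
        }
        if (inQuote) {
            inQuote = false;  // single apostrophe ends the quoted text
            continue;
        }
        const bool startsQuote =
            mode == AposMode::DoubleRequired ||
            (hasNext && (pattern[i + 1] == '{' || pattern[i + 1] == '}'));
        if (startsQuote) {
            inQuote = true;   // apostrophe is syntax, not output
        } else {
            out += '\'';      // DOUBLE_OPTIONAL: lone apostrophe is literal
        }
    }
    return out;
}
```

Run against the rows of the table, the sketch yields "I see {many}" and "I said {'Wow!'}" in both modes, and it keeps the apostrophe of "I don't know" only in the DOUBLE_OPTIONAL mode.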

◆ UMessagePatternArgType

Argument type constants.

Returned by Part.getArgType() for ARG_START and ARG_LIMIT parts.

Messages nested inside an argument are each delimited by MSG_START and MSG_LIMIT, with a nesting level one greater than the surrounding message.

Stable:
ICU 4.8
Enumerator
UMSGPAT_ARG_TYPE_NONE 

The argument has no specified type.

Stable:
ICU 4.8
UMSGPAT_ARG_TYPE_SIMPLE 

The argument has a "simple" type which is provided by the ARG_TYPE part.

An ARG_STYLE part might follow that.

Stable:
ICU 4.8
UMSGPAT_ARG_TYPE_CHOICE 

The argument is a ChoiceFormat with one or more ((ARG_INT | ARG_DOUBLE), ARG_SELECTOR, message) tuples.

Stable:
ICU 4.8
UMSGPAT_ARG_TYPE_PLURAL 

The argument is a cardinal-number PluralFormat with an optional ARG_INT or ARG_DOUBLE offset (e.g., offset:1) and one or more (ARG_SELECTOR [explicit-value] message) tuples.

If the selector has an explicit value (e.g., =2), then that value is provided by the ARG_INT or ARG_DOUBLE part preceding the message. Otherwise the message immediately follows the ARG_SELECTOR.

Stable:
ICU 4.8
UMSGPAT_ARG_TYPE_SELECT 

The argument is a SelectFormat with one or more (ARG_SELECTOR, message) pairs.

Stable:
ICU 4.8
UMSGPAT_ARG_TYPE_SELECTORDINAL 

The argument is an ordinal-number PluralFormat with the same style parts sequence and semantics as UMSGPAT_ARG_TYPE_PLURAL.

Stable:
ICU 50

Definition at line 221 of file messagepattern.h.
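The plural tuple layout described above (a selector, an optional explicit value, then the message) can be sketched as a small lookup over a list of part types. The example is self-contained and uses a local mirror of the part type names; PartType and messageStartForSelector() are hypothetical and only model the ordering stated in this file reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local mirror of the part types this sketch needs; NOT the ICU enum.
enum PartType { ARG_SELECTOR, ARG_INT, ARG_DOUBLE, MSG_START, MSG_LIMIT };

// Given the index of an ARG_SELECTOR part in a plural style, returns the
// index of the MSG_START of its message, or -1 if the parts do not follow
// the (ARG_SELECTOR [explicit-value] message) layout.
inline int messageStartForSelector(const std::vector<PartType>& parts,
                                   std::size_t selectorIndex) {
    if (selectorIndex >= parts.size() || parts[selectorIndex] != ARG_SELECTOR) {
        return -1;
    }
    std::size_t next = selectorIndex + 1;
    // An explicit value such as =2 sits between the selector and the message.
    if (next < parts.size() && (parts[next] == ARG_INT || parts[next] == ARG_DOUBLE)) {
        ++next;
    }
    if (next < parts.size() && parts[next] == MSG_START) {
        return static_cast<int>(next);
    }
    return -1;
}
```

For a select argument the same walk applies without the optional value step, since its tuples are plain (ARG_SELECTOR, message) pairs.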

◆ UMessagePatternPartType

MessagePattern::Part type constants.

Stable:
ICU 4.8
Enumerator
UMSGPAT_PART_TYPE_MSG_START 

Start of a message pattern (main or nested).

The length is 0 for the top-level message and for a choice argument sub-message, otherwise 1 for the '{'. The value indicates the nesting level, starting with 0 for the main message.

There is always a later MSG_LIMIT part.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_MSG_LIMIT 

End of a message pattern (main or nested).

The length is 0 for the top-level message and the last sub-message of a choice argument, otherwise 1 for the '}' or (in a choice argument style) the '|'. The value indicates the nesting level, starting with 0 for the main message.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_SKIP_SYNTAX 

Indicates a substring of the pattern string which is to be skipped when formatting.

For example, an apostrophe that begins or ends quoted text would be indicated with such a part. The value is undefined and currently always 0.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_INSERT_CHAR 

Indicates that a syntax character needs to be inserted for auto-quoting.

The length is 0. The value is the character code of the insertion character. (U+0027=APOSTROPHE)

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_REPLACE_NUMBER 

Indicates a syntactic (non-escaped) # symbol in a plural variant.

When formatting, replace this part's substring with the (value-offset) for the plural argument value. The value is undefined and currently always 0.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_START 

Start of an argument.

The length is 1 for the '{'. The value is the ordinal value of the ArgType. Use getArgType().

This part is followed by either an ARG_NUMBER or ARG_NAME, followed by optional argument sub-parts (see UMessagePatternArgType constants) and finally an ARG_LIMIT part.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_LIMIT 

End of an argument.

The length is 1 for the '}'. The value is the ordinal value of the ArgType. Use getArgType().

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_NUMBER 

The argument number, provided by the value.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_NAME 

The argument name.

The value is undefined and currently always 0.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_TYPE 

The argument type.

The value is undefined and currently always 0.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_STYLE 

The argument style text.

The value is undefined and currently always 0.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_SELECTOR 

A selector substring in a "complex" argument style.

The value is undefined and currently always 0.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_INT 

An integer value, for example the offset or an explicit selector value in a PluralFormat style.

The part value is the integer value.

Stable:
ICU 4.8
UMSGPAT_PART_TYPE_ARG_DOUBLE 

A numeric value, for example the offset or an explicit selector value in a PluralFormat style.

The part value is an index into an internal array of numeric values; use getNumericValue().

Stable:
ICU 4.8

Definition at line 102 of file messagepattern.h.
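Since MSG_START and MSG_LIMIT parts carry their nesting level as the value, a matching limit can be located by scanning forward for the next MSG_LIMIT at the same level. The following self-contained sketch shows that idea on a simplified part list; the Part struct and findMsgLimit() are hypothetical names for this example and do not reproduce the ICU class or its own lookup methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local mirror of the part types this sketch needs; NOT the ICU enum.
enum PartType { MSG_START, MSG_LIMIT, OTHER };

// Hypothetical, simplified model of a pattern part.
struct Part {
    PartType type;
    int value;  // nesting level for MSG_START and MSG_LIMIT
};

// Returns the index of the MSG_LIMIT that closes the MSG_START at `start`,
// or -1 if `start` is not a MSG_START or no matching limit is found.
inline int findMsgLimit(const std::vector<Part>& parts, std::size_t start) {
    if (start >= parts.size() || parts[start].type != MSG_START) {
        return -1;
    }
    const int level = parts[start].value;
    for (std::size_t i = start + 1; i < parts.size(); ++i) {
        // Nested messages have a greater level, so they are skipped here.
        if (parts[i].type == MSG_LIMIT && parts[i].value == level) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The scan relies on the statement above that there is always a later MSG_LIMIT part for every MSG_START; the -1 result only covers malformed input to the sketch.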