Class KoreanTokenizerFactory

  • All Implemented Interfaces:
    ResourceLoaderAware

    public class KoreanTokenizerFactory
    extends TokenizerFactory
    implements ResourceLoaderAware
    Factory for KoreanTokenizer.
     <fieldType name="text_ko" class="solr.TextField">
       <analyzer>
         <tokenizer class="solr.KoreanTokenizerFactory"
                    decompoundMode="discard"
                    userDictionary="user.txt"
                    userDictionaryEncoding="UTF-8"
                    outputUnknownUnigrams="false"
                    discardPunctuation="true"
         />
       </analyzer>
     </fieldType>
     

    Supports the following attributes:

    • userDictionary: User dictionary path.
    • userDictionaryEncoding: User dictionary encoding.
    • decompoundMode: Decompound mode. One of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
    • outputUnknownUnigrams: If true, outputs unigrams for unknown words.
    • discardPunctuation: If true, punctuation tokens are dropped from the output.
    Since:
    7.4.0
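
    The attributes listed above can also be supplied programmatically through the Map-based constructor. The following is a minimal sketch, not part of the library: the class name KoreanTokenizerArgs and its build method are hypothetical helpers, and the map is kept mutable on the assumption that the factory may consume entries from it. The sketch only assembles and validates the argument map, so it runs without Lucene on the classpath.

    ```java
    import java.util.LinkedHashMap;
    import java.util.Locale;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical helper (not part of Lucene): assembles the attribute map
    // for the KoreanTokenizerFactory(Map<String,String>) constructor.
    public class KoreanTokenizerArgs {

        // Values documented for the decompoundMode attribute.
        private static final Set<String> DECOMPOUND_MODES = Set.of("none", "discard", "mixed");

        public static Map<String, String> build(String decompoundMode,
                                                boolean outputUnknownUnigrams,
                                                boolean discardPunctuation) {
            // Normalize to lower case and reject anything outside the documented values.
            String mode = decompoundMode.toLowerCase(Locale.ROOT);
            if (!DECOMPOUND_MODES.contains(mode)) {
                throw new IllegalArgumentException("Unknown decompoundMode: " + decompoundMode);
            }
            // Mutable map; keys are the attribute names from the list above.
            Map<String, String> args = new LinkedHashMap<>();
            args.put("decompoundMode", mode);
            args.put("outputUnknownUnigrams", Boolean.toString(outputUnknownUnigrams));
            args.put("discardPunctuation", Boolean.toString(discardPunctuation));
            return args;
        }

        public static void main(String[] argv) {
            Map<String, String> factoryArgs = build("mixed", false, true);
            System.out.println(factoryArgs);
            // With the Korean analysis module on the classpath, the map would
            // then be handed to the documented constructor:
            //   KoreanTokenizerFactory factory = new KoreanTokenizerFactory(factoryArgs);
        }
    }
    ```

    The userDictionary and userDictionaryEncoding attributes are left out of the sketch because they refer to a resource that must exist at load time; they would be added to the same map under those names.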
    • Field Detail

      • USER_DICT_ENCODING

        private static final java.lang.String USER_DICT_ENCODING
        See Also:
        Constant Field Values
      • OUTPUT_UNKNOWN_UNIGRAMS

        private static final java.lang.String OUTPUT_UNKNOWN_UNIGRAMS
        See Also:
        Constant Field Values
      • DISCARD_PUNCTUATION

        private static final java.lang.String DISCARD_PUNCTUATION
        See Also:
        Constant Field Values
      • userDictionaryPath

        private final java.lang.String userDictionaryPath
      • userDictionaryEncoding

        private final java.lang.String userDictionaryEncoding
      • outputUnknownUnigrams

        private final boolean outputUnknownUnigrams
      • discardPunctuation

        private final boolean discardPunctuation
    • Constructor Detail

      • KoreanTokenizerFactory

        public KoreanTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)
        Creates a new KoreanTokenizerFactory.
      • KoreanTokenizerFactory

        public KoreanTokenizerFactory()
        Default constructor for compatibility with SPI.