Class HMMChineseTokenizerFactory


  • public final class HMMChineseTokenizerFactory
    extends TokenizerFactory
    Factory for HMMChineseTokenizer.

    Note: the tokenizer created by this factory currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenation turned off), or use the SmartChinese stoplist with a StopFilterFactory via: words="org/apache/lucene/analysis/cn/smart/stopwords.txt"

    Since:
    4.10.0
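
    The effect of the stoplist approach in the note above can be illustrated with a minimal plain-Java sketch. It deliberately uses no Lucene classes: the class name PunctuationStopSketch, the method dropStopTokens, the small STOP set, and the hand-written token list are all hypothetical stand-ins, not the real SmartChinese stoplist or real tokenizer output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PunctuationStopSketch {
    // Hypothetical stand-in for a stoplist; the real SmartChinese
    // stopwords.txt is NOT loaded here.
    static final Set<String> STOP = Set.of(",", ".", "，", "。", "！", "？");

    // Keep only the tokens that are not in the stop set, preserving order.
    static List<String> dropStopTokens(List<String> tokens, Set<String> stop) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stop.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-written list imitating a token stream that still
        // contains punctuation tokens (an assumption for illustration).
        List<String> tokens = List.of("我", "喜欢", "读书", "，", "你", "呢", "？");
        System.out.println(dropStopTokens(tokens, STOP));
    }
}
```

    In an actual analysis chain the same role would be played by a stop filter placed after the tokenizer and configured with the words resource named in the note.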
    • Constructor Detail

      • HMMChineseTokenizerFactory

        public HMMChineseTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)
        Creates a new HMMChineseTokenizerFactory.
      • HMMChineseTokenizerFactory

        public HMMChineseTokenizerFactory()
        Default constructor for compatibility with SPI.