Class WordSegmenter


  • class WordSegmenter
    extends java.lang.Object
    Segment a sentence of Chinese text into words.
    • Constructor Detail

      • WordSegmenter

        WordSegmenter()
    • Method Detail

      • segmentSentence

        public java.util.List<SegToken> segmentSentence(java.lang.String sentence,
                                                        int startOffset)
        Segment a sentence into words with HHMMSegmenter
        Parameters:
        sentence - input sentence
        startOffset - start offset of sentence
        Returns:
        List of SegToken
      • convertSegToken

        public SegToken convertSegToken(SegToken st,
                                        java.lang.String sentence,
                                        int sentenceStartOffset)
        Process a SegToken so that it is ready for indexing.

        This method calculates offsets and normalizes the token with SegTokenFilter.

        Parameters:
        st - input SegToken
        sentence - sentence that the token belongs to
        sentenceStartOffset - start offset of the sentence
        Returns:
        SegToken that is ready for indexing
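
The offset calculation described for convertSegToken can be illustrated with a minimal sketch. It assumes that tokens come out of segmentation with offsets relative to the sentence and that the sentence start offset is added to make them relative to the full text. The names OffsetSketch, Token, and toDocumentOffsets are hypothetical stand-ins and are not part of Lucene; the segmentation itself is written by hand here rather than produced by HHMMSegmenter, and the normalization done by SegTokenFilter is not modeled.

```java
/** Illustrative sketch only; this is not Lucene's implementation. */
public class OffsetSketch {

    /** Hypothetical stand-in for SegToken: token text plus start/end offsets. */
    static final class Token {
        final String text;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive

        Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Shifts sentence-relative offsets by the sentence's start offset. */
    static Token toDocumentOffsets(Token t, int sentenceStartOffset) {
        return new Token(t.text,
                t.startOffset + sentenceStartOffset,
                t.endOffset + sentenceStartOffset);
    }

    public static void main(String[] args) {
        String document = "你好。我爱北京";
        int sentenceStartOffset = 3; // the second sentence starts at index 3

        // Hand-written segmentation of "我爱北京" with sentence-relative offsets
        // (an assumption for illustration, not real segmenter output).
        Token[] relative = {
            new Token("我", 0, 1),
            new Token("爱", 1, 2),
            new Token("北京", 2, 4),
        };

        for (Token t : relative) {
            Token shifted = toDocumentOffsets(t, sentenceStartOffset);
            // After shifting, the offsets index into the full document text.
            String slice = document.substring(shifted.startOffset, shifted.endOffset);
            System.out.println(slice + " [" + shifted.startOffset + "," + shifted.endOffset + ")");
        }
    }
}
```

After the shift, each token's offsets select the same characters in the full text that they selected in the sentence, which is what a consumer such as a highlighter would need.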