Class PersianStemmer


  • public class PersianStemmer
    extends java.lang.Object
    Stemmer for Persian.

    Stemming is done in place for efficiency, operating directly on a term buffer.

    Stemming is defined as:

    • Removal of attached definite article, conjunction, and prepositions.
    • Stemming of common suffixes.

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static char ALEF  
      private static char HEH  
      private static char NOON  
      private static char REH  
      private static char[][] suffixes  
      private static char TEH  
      private static char YEH  
      private static char ZWNJ  
    • Constructor Summary

      Constructors 
      Constructor Description
      PersianStemmer()  
    • Method Summary

      Modifier and Type Method Description
      private boolean endsWithCheckLength(char[] s, int len, char[] suffix)
      Returns true if the buffer ends with the given suffix and the suffix can be stemmed.
      int stem(char[] s, int len)
      Stem an input buffer of Persian text.
      private int stemSuffix(char[] s, int len)
      Stem suffix(es) off a Persian word.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • PersianStemmer

        public PersianStemmer()
    • Method Detail

      • stem

        public int stem(char[] s,
                        int len)
        Stem an input buffer of Persian text.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        length of input buffer after stemming
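
The in-place contract means the caller owns the buffer and must read only the first returned-length characters after the call. The following is a minimal sketch of that calling pattern, not code from this class: `StemCallSketch`, `InPlaceStemmer`, and `applyStem` are hypothetical names, and the toy stemmer in `main` is an assumption used only to exercise the helper.

```java
/** Hypothetical helper illustrating the caller side of an in-place stem call. */
final class StemCallSketch {

    /** Mirrors the documented signature: int stem(char[] s, int len). */
    interface InPlaceStemmer {
        int stem(char[] s, int len);
    }

    /** Copies the term into a scratch buffer, stems it in place, and keeps the valid prefix. */
    static String applyStem(InPlaceStemmer stemmer, String term) {
        char[] buffer = term.toCharArray();
        int newLen = stemmer.stem(buffer, buffer.length);
        // Characters at index >= newLen are stale and must be ignored.
        return new String(buffer, 0, newLen);
    }

    public static void main(String[] args) {
        // Toy stand-in (an assumption): drops one trailing ALEF + NOON pair if present.
        InPlaceStemmer toy = (s, len) ->
            (len >= 4 && s[len - 2] == '\u0627' && s[len - 1] == '\u0646') ? len - 2 : len;
        String stemmed = applyStem(toy, "\u062F\u0631\u062E\u062A\u0627\u0646");
        System.out.println(stemmed.length()); // 4
    }
}
```

If the documented class is on the classpath, `new PersianStemmer()::stem` should fit the same interface, since the signature matches.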
      • stemSuffix

        private int stemSuffix(char[] s,
                               int len)
        Stem suffix(es) off a Persian word.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        new length of input buffer after stemming
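
One way suffix stripping over a char buffer can work is sketched below. This is an illustration under stated assumptions, not the documented implementation: the class name `SuffixStripSketch`, the two-entry suffix table, and the rule that each suffix is tried once in table order are all hypothetical, since the contents of the `suffixes` field are not described here.

```java
/** Hypothetical sketch of suffix stripping on a char buffer. */
final class SuffixStripSketch {

    // Assumed suffix table for illustration only.
    private static final char[][] SUFFIXES = {
        "\u0647\u0627".toCharArray(), // HEH + ALEF
        "\u0627\u0646".toCharArray(), // ALEF + NOON
    };

    /** Tries each suffix once, in table order, and returns the shortened logical length. */
    static int stemSuffix(char[] s, int len) {
        for (char[] suffix : SUFFIXES) {
            if (endsWith(s, len, suffix)) {
                len -= suffix.length; // "delete" by shrinking the logical length
            }
        }
        return len;
    }

    /** True if the first len chars of s end with suffix and something remains after removal. */
    private static boolean endsWith(char[] s, int len, char[] suffix) {
        if (len <= suffix.length) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (s[len - suffix.length + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // KEHEH TEH ALEF BEH followed by HEH ALEF
        char[] word = "\u06A9\u062A\u0627\u0628\u0647\u0627".toCharArray();
        int newLen = stemSuffix(word, word.length);
        System.out.println(newLen); // 4
    }
}
```

The buffer itself is never reallocated; only the returned length changes, which is what makes the operation cheap.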
      • endsWithCheckLength

        private boolean endsWithCheckLength(char[] s,
                                            int len,
                                            char[] suffix)
        Returns true if the buffer ends with the given suffix and the suffix can be stemmed.
        Parameters:
        s - input buffer
        len - length of input buffer
        suffix - suffix to check
        Returns:
        true if the buffer ends with the suffix and the suffix can be stemmed
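
The method name suggests that "can be stemmed" involves a length check in addition to the suffix match. A minimal sketch of such a check follows. The class name `SuffixCheckSketch` and the threshold `MIN_STEM_LENGTH = 2` are assumptions chosen for illustration; the actual rule used by this class is not stated in this page.

```java
/** Hypothetical sketch of a suffix check with a length guard. */
final class SuffixCheckSketch {

    // Assumed: at least this many characters must remain after removal.
    private static final int MIN_STEM_LENGTH = 2;

    static boolean endsWithCheckLength(char[] s, int len, char[] suffix) {
        // Length guard: refuse to stem if too little of the word would remain.
        if (len < suffix.length + MIN_STEM_LENGTH) {
            return false;
        }
        // Compare the tail of the logical buffer with the suffix.
        int offset = len - suffix.length;
        for (int i = 0; i < suffix.length; i++) {
            if (s[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] suffix = "\u0647\u0627".toCharArray(); // HEH + ALEF
        char[] longWord = "\u06A9\u062A\u0627\u0628\u0647\u0627".toCharArray();
        char[] shortWord = "\u0628\u0647\u0627".toCharArray(); // one letter plus the suffix
        System.out.println(endsWithCheckLength(longWord, longWord.length, suffix));   // true
        System.out.println(endsWithCheckLength(shortWord, shortWord.length, suffix)); // false
    }
}
```

Note that the comparison uses `len`, not `s.length`, so a buffer that has already been shortened by an earlier step is checked against its logical end.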