Class Stemmer
- java.lang.Object
-
- org.apache.lucene.analysis.hunspell.Stemmer
-
final class Stemmer extends java.lang.Object
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word. It follows the original Hunspell algorithm, including recursive suffix stripping.
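Recursive suffix stripping can be pictured with a toy example. The sketch below is not Lucene code: the class ToySuffixStemmer, its ROOTS and SUFFIXES tables, and the MAX_DEPTH cap are all assumptions invented for illustration, and it ignores prefixes, flags, strip data, and affix conditions that a real Dictionary would supply.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ToySuffixStemmer {
  // Hypothetical dictionary roots and suffix rules, invented for this sketch.
  private static final Set<String> ROOTS = Set.of("work", "walk");
  private static final List<String> SUFFIXES = List.of("s", "ing", "er");
  // Assumed cap on how many suffixes may be stripped from one word.
  private static final int MAX_DEPTH = 2;

  /** Returns every known root reachable by stripping up to MAX_DEPTH suffixes. */
  static List<String> stem(String word) {
    Set<String> stems = new LinkedHashSet<>();
    strip(word, 0, stems);
    return new ArrayList<>(stems);
  }

  private static void strip(String word, int depth, Set<String> out) {
    if (ROOTS.contains(word)) {
      out.add(word); // the current form is itself a dictionary root
    }
    if (depth == MAX_DEPTH) {
      return; // stop recursing once the depth cap is reached
    }
    for (String suffix : SUFFIXES) {
      if (word.length() > suffix.length() && word.endsWith(suffix)) {
        // Remove the suffix and try again on the shorter form.
        strip(word.substring(0, word.length() - suffix.length()), depth + 1, out);
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(stem("workers")); // strips "s", then "er"
    System.out.println(stem("walking")); // strips "ing"
    System.out.println(stem("table"));   // no rule applies
  }
}
```

In this sketch, "workers" reaches the root "work" only because the second recursive step is allowed to strip "er" after "s" has already been removed.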
-
-
Nested Class Summary
Modifier and Type Class Description
(package private) static interface
Stemmer.CaseVariationProcessor
(package private) static interface
Stemmer.RootProcessor
-
Field Summary
Modifier and Type Field Description
private Dictionary
dictionary
private int
formStep
-
Constructor Summary
Constructor Description
Stemmer(Dictionary dictionary)
Constructs a new Stemmer which will use the provided Dictionary to create its stems.
-
Method Summary
Modifier and Type Method Description
private boolean
applyAffix(char[] strippedWord, int offset, int length, WordContext context, int affix, int previousAffix, int prefixId, int recursionDepth, boolean prefix, Stemmer.RootProcessor processor)
Applies the affix rule to the given word, producing a list of stems if any are found.
private boolean
callProcessor(char[] word, int offset, int length, Stemmer.RootProcessor processor, IntsRef forms, int i)
private static char[]
capitalizeAfterApostrophe(char[] word, int length)
private char[]
caseFoldLower(char[] word, int length)
Folds lowercase variant of word (title cased) to lowerBuffer.
private char[]
caseFoldTitle(char[] word, int length)
Folds titlecase variant of word to titleBuffer.
(package private) WordCase
caseOf(char[] word, int length)
Returns EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
(package private) boolean
doStem(char[] word, int offset, int length, WordContext context, Stemmer.RootProcessor processor)
private boolean
isAffixCompatible(int affix, char prevFlag, int recursionDepth, boolean isPrefix, boolean previousWasPrefix, WordContext context)
private boolean
isFlagAppendedByAffix(int affixId, char flag)
private boolean
isRootCompatibleWithContext(WordContext context, int lastAffix, int entryId)
private boolean
needsAnotherAffix(int affix, int previousAffix, boolean isSuffix, int prefixId)
private CharsRef
newStem(CharsRef stem, int morphDataId)
java.util.List<CharsRef>
stem(char[] word, int length)
Find the stem(s) of the provided word.
private boolean
stem(char[] word, int offset, int length, WordContext context, int previous, char prevFlag, int prefixId, int recursionDepth, boolean doPrefix, boolean previousWasPrefix, Stemmer.RootProcessor processor)
Generates a list of stems for the provided word.
java.util.List<CharsRef>
stem(java.lang.String word)
Find the stem(s) of the provided word.
private java.lang.String
stemException(int morphDataId)
private char[]
stripAffix(char[] word, int offset, int length, int affixLen, int affix, boolean isPrefix)
java.util.List<CharsRef>
uniqueStems(char[] word, int length)
Find the unique stem(s) of the provided word.
(package private) boolean
varyCase(char[] word, int length, WordCase wordCase, Stemmer.CaseVariationProcessor processor)
private boolean
varySharpS(char[] word, int length, Stemmer.CaseVariationProcessor processor)
-
-
-
Field Detail
-
dictionary
private final Dictionary dictionary
-
formStep
private final int formStep
-
-
Constructor Detail
-
Stemmer
public Stemmer(Dictionary dictionary)
Constructs a new Stemmer which will use the provided Dictionary to create its stems.
Parameters:
dictionary - Dictionary that will be used to create the stems
-
-
Method Detail
-
stem
public java.util.List<CharsRef> stem(java.lang.String word)
Find the stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
-
stem
public java.util.List<CharsRef> stem(char[] word, int length)
Find the stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
-
varyCase
boolean varyCase(char[] word, int length, WordCase wordCase, Stemmer.CaseVariationProcessor processor)
-
caseOf
WordCase caseOf(char[] word, int length)
Returns the EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
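One plausible way to classify a word into these three categories is sketched below. This is an assumption, not the actual implementation: the ToyWordCase enum and ToyCaseClassifier class are hypothetical stand-ins, and the real method may also consult dictionary settings that this sketch ignores.

```java
class ToyCaseClassifier {
  // Hypothetical stand-in for the case categories named in the description above.
  enum ToyWordCase { EXACT_CASE, TITLE_CASE, UPPER_CASE }

  /** Classifies the first 'length' chars of 'word' by their capitalization pattern. */
  static ToyWordCase caseOf(char[] word, int length) {
    // Words that do not start with an uppercase letter are kept exactly as written.
    if (length == 0 || !Character.isUpperCase(word[0])) {
      return ToyWordCase.EXACT_CASE;
    }
    boolean seenUpper = false;
    boolean seenNonUpper = false;
    for (int i = 1; i < length; i++) {
      boolean upper = Character.isUpperCase(word[i]);
      seenUpper |= upper;
      seenNonUpper |= !upper;
    }
    if (!seenNonUpper) {
      return ToyWordCase.UPPER_CASE; // every remaining char is uppercase
    }
    if (!seenUpper) {
      return ToyWordCase.TITLE_CASE; // only the first char is uppercase
    }
    return ToyWordCase.EXACT_CASE; // mixed capitalization such as "McDonald"
  }

  public static void main(String[] args) {
    for (String s : new String[] {"HELLO", "Hello", "hello", "McDonald"}) {
      System.out.println(s + " -> " + caseOf(s.toCharArray(), s.length()));
    }
  }
}
```

Under this sketch, a single uppercase letter such as "A" is reported as UPPER_CASE, because no non-uppercase character follows the first one.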
-
caseFoldTitle
private char[] caseFoldTitle(char[] word, int length)
Folds the title-case variant of the word into titleBuffer.
-
caseFoldLower
private char[] caseFoldLower(char[] word, int length)
Folds the lowercase variant of the (title-cased) word into lowerBuffer.
-
capitalizeAfterApostrophe
private static char[] capitalizeAfterApostrophe(char[] word, int length)
-
varySharpS
private boolean varySharpS(char[] word, int length, Stemmer.CaseVariationProcessor processor)
-
doStem
boolean doStem(char[] word, int offset, int length, WordContext context, Stemmer.RootProcessor processor)
-
uniqueStems
public java.util.List<CharsRef> uniqueStems(char[] word, int length)
Find the unique stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
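The idea of returning only unique stems can be illustrated with a small order-preserving deduplication. This is a sketch under assumptions: ToyUniqueStems, its plain String stems, and the ignoreCase switch are hypothetical, and the real method works on CharsRef values with whatever comparison the Dictionary dictates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToyUniqueStems {
  /** Keeps the first occurrence of each stem, optionally comparing case-insensitively. */
  static List<String> unique(List<String> stems, boolean ignoreCase) {
    Set<String> seen = new HashSet<>();
    List<String> result = new ArrayList<>();
    for (String stem : stems) {
      // The key decides equality; the original spelling is what gets returned.
      String key = ignoreCase ? stem.toLowerCase(Locale.ROOT) : stem;
      if (seen.add(key)) {
        result.add(stem);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> stems = List.of("walk", "Walk", "walk");
    System.out.println(unique(stems, false)); // case-sensitive comparison
    System.out.println(unique(stems, true));  // case-insensitive comparison
  }
}
```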
-
stemException
private java.lang.String stemException(int morphDataId)
-
stem
private boolean stem(char[] word, int offset, int length, WordContext context, int previous, char prevFlag, int prefixId, int recursionDepth, boolean doPrefix, boolean previousWasPrefix, Stemmer.RootProcessor processor)
Generates a list of stems for the provided word.
Parameters:
word - Word to generate the stems for
previous - previous affix that was removed (so we don't remove the same one twice)
prevFlag - Flag from a previous stemming step that needs to be cross-checked with any affixes in this recursive step
prefixId - ID of the innermost removed prefix, so that when removing a suffix, it's also checked against the word
recursionDepth - current recursion depth
doPrefix - true if we should remove prefixes
previousWasPrefix - true if the previous removal was a prefix: if we are removing a suffix and it has no continuation requirements, that's OK, but two prefixes (COMPLEXPREFIXES) or two suffixes must have continuation requirements to recurse
Returns:
whether the processing should be continued
-
stripAffix
private char[] stripAffix(char[] word, int offset, int length, int affixLen, int affix, boolean isPrefix)
Returns:
null if the affix conditions aren't met; a reference to the same char[] if the affix has no strip data and can thus be simply removed; or a new char[] containing the word after affix removal
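The three-way return contract can be made concrete with a suffix-only sketch. Everything here is hypothetical: ToyAffixStripper, the strip string, and the Predicate-based condition stand in for the affix data that the real method looks up by affix ID, and offset and prefix handling are left out.

```java
import java.util.function.Predicate;

class ToyAffixStripper {
  // Hypothetical condition: the restored root must end in a consonant followed by 'y'.
  static final Predicate<String> CONSONANT_Y = s -> s.matches(".*[^aeiou]y");
  // Hypothetical condition that always holds.
  static final Predicate<String> ALWAYS = s -> true;

  /**
   * Removes a suffix of 'affixLen' chars from the first 'length' chars of 'word'.
   * 'strip' is the text the rule removed from the root when the suffix was added,
   * so it is appended again here. Returns null if the condition fails, the same
   * array if there is nothing to restore, or a new array holding the restored root.
   */
  static char[] stripSuffix(char[] word, int length, int affixLen,
                            String strip, Predicate<String> condition) {
    int rootLen = length - affixLen;
    String restored = new String(word, 0, rootLen) + strip;
    if (!condition.test(restored)) {
      return null; // the rule does not apply to this word
    }
    if (strip.isEmpty()) {
      return word; // no copy needed: the caller reads only the first rootLen chars
    }
    return restored.toCharArray();
  }

  public static void main(String[] args) {
    char[] walked = "walked".toCharArray();
    System.out.println(stripSuffix(walked, 6, 2, "", ALWAYS) == walked);
    System.out.println(new String(stripSuffix("ponies".toCharArray(), 6, 3, "y", CONSONANT_Y)));
    System.out.println(stripSuffix("keies".toCharArray(), 5, 3, "y", CONSONANT_Y));
  }
}
```

Returning the original array when no strip data exists avoids an allocation; in that case the caller has to track the shortened length itself.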
-
isAffixCompatible
private boolean isAffixCompatible(int affix, char prevFlag, int recursionDepth, boolean isPrefix, boolean previousWasPrefix, WordContext context)
-
applyAffix
private boolean applyAffix(char[] strippedWord, int offset, int length, WordContext context, int affix, int previousAffix, int prefixId, int recursionDepth, boolean prefix, Stemmer.RootProcessor processor)
Applies the affix rule to the given word, producing a list of stems if any are found.
Parameters:
strippedWord - Char array containing the word with the affix removed and the strip added
offset - where the word actually starts in the array
length - the length of the stripped word
affix - HunspellAffix representing the affix rule itself
prefixId - when we have already stripped a prefix, we can't simply recurse and check the suffix unless both are compatible, so we must check the dictionary form against both to add it as a stem
recursionDepth - current recursion depth
prefix - true if we are removing a prefix (false if it's a suffix)
Returns:
whether the processing should be continued
-
isRootCompatibleWithContext
private boolean isRootCompatibleWithContext(WordContext context, int lastAffix, int entryId)
-
callProcessor
private boolean callProcessor(char[] word, int offset, int length, Stemmer.RootProcessor processor, IntsRef forms, int i)
-
needsAnotherAffix
private boolean needsAnotherAffix(int affix, int previousAffix, boolean isSuffix, int prefixId)
-
isFlagAppendedByAffix
private boolean isFlagAppendedByAffix(int affixId, char flag)
-
-