Class Hunspell
- java.lang.Object
  - org.apache.lucene.analysis.hunspell.Hunspell
public class Hunspell extends java.lang.Object
A spell checker based on Hunspell dictionaries. This class can be used in place of native Hunspell for many languages, for both spell-checking and suggestions. Note that not all languages and features are supported yet, for example:
- Hungarian (as it doesn't rely only on dictionaries, but also has some logic directly in the source code)
- Languages with Unicode characters outside of the Basic Multilingual Plane
- The PHONE affix file option for suggestions
The objects of this class are thread-safe.
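Since words containing characters outside the Basic Multilingual Plane are listed as unsupported, a caller might want to detect such input before handing it to the checker. The following is a minimal sketch using only the JDK; BmpCheck, containsNonBmp and bmpOnly are hypothetical helpers, not part of the Lucene API.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Lucene): detects characters outside the Basic Multilingual Plane. */
final class BmpCheck {
    private BmpCheck() {}

    /** True if the word contains any code point above U+FFFF (stored as a surrogate pair in a Java String). */
    static boolean containsNonBmp(String word) {
        // codePoints() decodes surrogate pairs back into single supplementary code points.
        return word.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    /** Keeps only the words made entirely of BMP characters, preserving their order. */
    static List<String> bmpOnly(List<String> words) {
        return words.stream()
                .filter(w -> !containsNonBmp(w))
                .collect(Collectors.toList());
    }
}
```

A caller could, for instance, skip spell-checking or fall back to another checker for words where containsNonBmp returns true; whether that is appropriate depends on the application.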
Nested Class Summary
- private class Hunspell.CompoundPart
Field Summary
- (package private) java.lang.Runnable checkCanceled
- (package private) Dictionary dictionary
- private TimeoutPolicy policy
- (package private) Stemmer stemmer
- (package private) static long SUGGEST_TIME_LIMIT
Constructor Summary
- Hunspell(Dictionary dictionary)
- Hunspell(Dictionary dictionary, TimeoutPolicy policy, java.lang.Runnable checkCanceled)
Method Summary
- private boolean acceptCase(WordCase originalCase, int entryId, CharsRef root)
- (package private) boolean acceptsStem(int formID)
- private boolean canBeBrokenAt(java.lang.String word, java.lang.String breakStr, int breakPos)
- private boolean checkCompoundPatternReplacements(CharsRef word, int pos, WordCase originalCase, Hunspell.CompoundPart prev)
- private boolean checkCompoundRules(char[] wordChars, int offset, int length, java.util.List<IntsRef> words)
- private boolean checkCompounds(char[] wordChars, int length, WordCase originalCase)
- private boolean checkCompounds(CharsRef word, WordCase originalCase, Hunspell.CompoundPart prev)
- private boolean checkCompoundsAfter(WordCase originalCase, Hunspell.CompoundPart prev)
- private boolean checkLastCompoundPart(char[] wordChars, int start, int length, java.util.List<IntsRef> words)
- (package private) java.lang.Boolean checkSimpleWord(char[] wordChars, int length, WordCase originalCase)
- private java.lang.Runnable checkTimeLimit(java.lang.String word, java.util.Set<Suggestion> suggestions, long timeLimitMs)
- private boolean checkWord(char[] wordChars, int length, WordCase originalCase)
- (package private) boolean checkWord(java.lang.String word)
- private boolean containsSharpS(char[] word, int offset, int length)
- private void doSuggest(java.lang.String word, WordCase wordCase, java.util.LinkedHashSet<Suggestion> suggestions, java.lang.Runnable checkCanceled)
- private Root<CharsRef> findStem(char[] wordChars, int offset, int length, WordCase originalCase, WordContext context)
- java.util.List<java.lang.String> getRoots(java.lang.String word)
  Find all roots that could result in the given word after case conversion and adding affixes.
- private boolean hasForceUCaseProblem(Root<?> root, WordCase originalCase, char[] wordChars)
- private boolean hasTooManyBreakOccurrences(java.lang.String word)
- private static boolean isDigit(char c)
- private static boolean isNumber(java.lang.String s)
- private boolean mayBreakIntoCompounds(char[] chars, int offset, int length, int breakPos)
- private java.util.List<java.lang.String> modifyChunksBetweenDashes(java.lang.String word)
- private java.util.List<java.lang.String> postprocess(java.util.Collection<Suggestion> suggestions)
- boolean spell(java.lang.String word)
- private boolean spellClean(java.lang.String word)
- private boolean spellWithTrailingDots(java.lang.String word)
- java.util.List<java.lang.String> suggest(java.lang.String word)
- java.util.List<java.lang.String> suggest(java.lang.String word, long timeLimitMs)
- private boolean tryBreaks(java.lang.String word)
Field Detail
-
SUGGEST_TIME_LIMIT
static final long SUGGEST_TIME_LIMIT
- See Also:
- Constant Field Values
-
dictionary
final Dictionary dictionary
-
stemmer
final Stemmer stemmer
-
policy
private final TimeoutPolicy policy
-
checkCanceled
final java.lang.Runnable checkCanceled
Constructor Detail
-
Hunspell
public Hunspell(Dictionary dictionary)
-
Hunspell
public Hunspell(Dictionary dictionary, TimeoutPolicy policy, java.lang.Runnable checkCanceled)
- Parameters:
policy - a strategy determining what to do when API calls take too much time
checkCanceled - an object that is called periodically and can interrupt spell-checking or suggestion generation by throwing an exception
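The checkCanceled parameter is a plain java.lang.Runnable that signals cancellation by throwing. One way to build such a hook is sketched below, assuming cancellation is requested from another thread through a shared flag or by a wall-clock deadline. CancellationHooks, fromFlag and fromDeadline are hypothetical names; only the contract (called periodically, throws to interrupt) comes from the parameter description above.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical factory for Runnables usable as a "checkCanceled" callback (not part of Lucene). */
final class CancellationHooks {
    private CancellationHooks() {}

    /** Each call is cheap; it throws once the shared flag has been set, e.g. by another thread. */
    static Runnable fromFlag(AtomicBoolean canceled) {
        return () -> {
            if (canceled.get()) {
                throw new CancellationException("spell-checking canceled");
            }
        };
    }

    /** Throws once the given number of milliseconds has elapsed since this hook was created. */
    static Runnable fromDeadline(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return () -> {
            // Subtracting before comparing keeps the check correct even if nanoTime() wraps around.
            if (System.nanoTime() - deadline > 0) {
                throw new CancellationException("deadline of " + timeoutMillis + " ms exceeded");
            }
        };
    }
}
```

Such a hook might be passed as new Hunspell(dictionary, TimeoutPolicy.THROW_EXCEPTION, CancellationHooks.fromFlag(flag)), where dictionary is an already loaded Dictionary and flag an AtomicBoolean shared with the thread that may request cancellation. The choice of CancellationException is an assumption; the documentation only says that an exception is thrown, so callers should be prepared to handle whatever type their hook throws.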
Method Detail
-
spell
public boolean spell(java.lang.String word)
- Returns:
- whether the given word's spelling is considered correct according to Hunspell rules
-
spellClean
private boolean spellClean(java.lang.String word)
-
spellWithTrailingDots
private boolean spellWithTrailingDots(java.lang.String word)
-
checkWord
boolean checkWord(java.lang.String word)
-
checkSimpleWord
java.lang.Boolean checkSimpleWord(char[] wordChars, int length, WordCase originalCase)
-
checkWord
private boolean checkWord(char[] wordChars, int length, WordCase originalCase)
-
checkCompounds
private boolean checkCompounds(char[] wordChars, int length, WordCase originalCase)
-
findStem
private Root<CharsRef> findStem(char[] wordChars, int offset, int length, WordCase originalCase, WordContext context)
-
containsSharpS
private boolean containsSharpS(char[] word, int offset, int length)
-
acceptsStem
boolean acceptsStem(int formID)
-
checkCompounds
private boolean checkCompounds(CharsRef word, WordCase originalCase, Hunspell.CompoundPart prev)
-
checkCompoundPatternReplacements
private boolean checkCompoundPatternReplacements(CharsRef word, int pos, WordCase originalCase, Hunspell.CompoundPart prev)
-
checkCompoundsAfter
private boolean checkCompoundsAfter(WordCase originalCase, Hunspell.CompoundPart prev)
-
hasForceUCaseProblem
private boolean hasForceUCaseProblem(Root<?> root, WordCase originalCase, char[] wordChars)
-
getRoots
public java.util.List<java.lang.String> getRoots(java.lang.String word)
Find all roots that could result in the given word after case conversion and adding affixes. This corresponds to the original hunspell -s (stemming) functionality.
Some affix rules are relaxed in this stemming process: e.g., explicitly forbidden words are still returned. Some of the returned roots may be synthetic and not directly occur in the *.dic file (but differ from some existing entries in case). No roots are returned for compound words.
The returned roots may be used to retrieve morphological data via Dictionary.lookupEntries(java.lang.String).
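To make "roots that could result in the given word after case conversion and adding affixes" concrete, here is a deliberately simplified sketch: a small in-memory word list and a handful of suffixes stand in for the *.dic and *.aff files. This is not Lucene's algorithm (real affix rules carry conditions, flags, prefixes and continuation classes); ToyStemmer, its constructor arguments and its getRoots method are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical toy stemmer illustrating affix stripping plus case conversion (not Lucene's implementation). */
final class ToyStemmer {
    private final Set<String> roots;      // stands in for the *.dic entries
    private final List<String> suffixes;  // stands in for suffix rules from the *.aff file

    ToyStemmer(Set<String> roots, List<String> suffixes) {
        this.roots = roots;
        this.suffixes = suffixes;
    }

    /** Returns every root that yields the word after adding at most one suffix, trying the word as given and lower-cased. */
    List<String> getRoots(String word) {
        // Case conversion: consider the original form and its lower-case variant (duplicates collapse).
        Set<String> forms = new LinkedHashSet<>(List.of(word, word.toLowerCase(Locale.ROOT)));
        Set<String> found = new LinkedHashSet<>();
        for (String form : forms) {
            if (roots.contains(form)) {
                found.add(form); // the form itself is a root (empty affix)
            }
            for (String suffix : suffixes) {
                if (form.length() > suffix.length() && form.endsWith(suffix)) {
                    String stem = form.substring(0, form.length() - suffix.length());
                    if (roots.contains(stem)) {
                        found.add(stem);
                    }
                }
            }
        }
        return new ArrayList<>(found);
    }
}
```

With roots {walk, talk} and suffixes [s, ed, ing], getRoots("Walked") finds "walk" only through the lower-cased form. Unlike this toy, the documented getRoots may also return synthetic roots that differ from dictionary entries only in case, and returns nothing for compound words.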
-
mayBreakIntoCompounds
private boolean mayBreakIntoCompounds(char[] chars, int offset, int length, int breakPos)
-
checkCompoundRules
private boolean checkCompoundRules(char[] wordChars, int offset, int length, java.util.List<IntsRef> words)
-
checkLastCompoundPart
private boolean checkLastCompoundPart(char[] wordChars, int start, int length, java.util.List<IntsRef> words)
-
isNumber
private static boolean isNumber(java.lang.String s)
-
isDigit
private static boolean isDigit(char c)
-
tryBreaks
private boolean tryBreaks(java.lang.String word)
-
hasTooManyBreakOccurrences
private boolean hasTooManyBreakOccurrences(java.lang.String word)
-
canBeBrokenAt
private boolean canBeBrokenAt(java.lang.String word, java.lang.String breakStr, int breakPos)
-
suggest
public java.util.List<java.lang.String> suggest(java.lang.String word) throws SuggestionTimeoutException
- Returns:
- suggestions for the given misspelled word
- Throws:
SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
-
suggest
public java.util.List<java.lang.String> suggest(java.lang.String word, long timeLimitMs) throws SuggestionTimeoutException
- Parameters:
word - the misspelled word to calculate suggestions for
timeLimitMs - the duration limit in milliseconds, after which the associated TimeoutPolicy's effects (exception or partial result) may kick in
- Throws:
SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
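The interplay between timeLimitMs and the policy chosen at construction time can be illustrated with a small, self-contained sketch. The enum Policy is a local stand-in: besides THROW_EXCEPTION, which the documentation names, its constants NO_TIMEOUT and RETURN_PARTIAL_RESULT are assumptions chosen to model "no limit" and the "partial result" effect. SuggestTimeout (standing in for SuggestionTimeoutException) and TimedSuggester are hypothetical, and candidate generation is simulated by filtering a supplied list; this is not the Lucene implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Hypothetical stand-in for a timeout policy (only THROW_EXCEPTION is named in the docs). */
enum Policy { NO_TIMEOUT, RETURN_PARTIAL_RESULT, THROW_EXCEPTION }

/** Hypothetical unchecked exception that carries whatever was found before the limit was hit. */
final class SuggestTimeout extends RuntimeException {
    final List<String> partial;

    SuggestTimeout(List<String> partial) {
        super("suggestion time limit exceeded");
        this.partial = partial;
    }
}

final class TimedSuggester {
    private TimedSuggester() {}

    /** Collects accepted candidates, checking the elapsed time before each one; the policy decides what a timeout does. */
    static List<String> suggest(List<String> candidates, Predicate<String> isCorrect,
                                long timeLimitMs, Policy policy, LongSupplier clockMs) {
        long start = clockMs.getAsLong();
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (policy != Policy.NO_TIMEOUT && clockMs.getAsLong() - start > timeLimitMs) {
                if (policy == Policy.THROW_EXCEPTION) {
                    throw new SuggestTimeout(List.copyOf(result));
                }
                return result; // RETURN_PARTIAL_RESULT: stop early and keep what was found so far
            }
            if (isCorrect.test(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

The clock is injected as a LongSupplier (for example System::currentTimeMillis in real use) so that the timeout behavior can be exercised deterministically with a fake clock.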
-
doSuggest
private void doSuggest(java.lang.String word, WordCase wordCase, java.util.LinkedHashSet<Suggestion> suggestions, java.lang.Runnable checkCanceled)
-
checkTimeLimit
private java.lang.Runnable checkTimeLimit(java.lang.String word, java.util.Set<Suggestion> suggestions, long timeLimitMs)
-
postprocess
private java.util.List<java.lang.String> postprocess(java.util.Collection<Suggestion> suggestions)
-
modifyChunksBetweenDashes
private java.util.List<java.lang.String> modifyChunksBetweenDashes(java.lang.String word)