Class Hunspell


  • public class Hunspell
    extends java.lang.Object
    A spell checker based on Hunspell dictionaries. This class can be used in place of native Hunspell for spell-checking and generating suggestions in many languages. Not all languages and features are supported yet, for example:
    • Hungarian (because it does not rely solely on dictionaries but also has some logic directly in Hunspell's source code)
    • Languages with Unicode characters outside of the Basic Multilingual Plane
    • The PHONE affix file option for suggestions
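    Java strings are UTF-16, so a character outside the Basic Multilingual Plane occupies two chars (a surrogate pair). A caller that wants to detect such words before handing them to this class could use a check like the following minimal sketch; BmpCheck and hasSupplementaryChars are hypothetical names, not part of this API.

```java
/** Hypothetical helper (not part of Hunspell): flags words containing non-BMP characters. */
final class BmpCheck {
    // A code point above U+FFFF is stored as two UTF-16 chars in a Java String.
    static boolean hasSupplementaryChars(String word) {
        return word.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String bmp = "na\u00efve";           // "naïve": all characters are in the BMP
        String gothic = "\uD800\uDF30";      // U+10330 GOTHIC LETTER AHSA
        System.out.println(hasSupplementaryChars(bmp));    // false
        System.out.println(hasSupplementaryChars(gothic)); // true
        // prints "2 chars, 1 code point"
        System.out.println(gothic.length() + " chars, "
            + gothic.codePointCount(0, gothic.length()) + " code point");
    }
}
```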

    Instances of this class are thread-safe.

    • Constructor Detail

      • Hunspell

        public Hunspell(Dictionary dictionary)
      • Hunspell

        public Hunspell(Dictionary dictionary,
                        TimeoutPolicy policy,
                        java.lang.Runnable checkCanceled)
        Parameters:
        policy - a strategy determining what to do when API calls take too much time
        checkCanceled - an object that is called periodically; it can interrupt spell-checking or suggestion generation by throwing an exception
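    The checkCanceled parameter is an ordinary Runnable that interrupts work by throwing. The following minimal sketch assumes cancellation is requested from another thread through an AtomicBoolean; CancelToken is a hypothetical name, and the choice of CancellationException is an assumption (the documentation only says "an exception").

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical cancellation token whose check() method fits the Runnable shape of checkCanceled. */
final class CancelToken {
    private final AtomicBoolean canceled = new AtomicBoolean(false);

    /** Requests cancellation; safe to call from any thread. */
    void cancel() {
        canceled.set(true);
    }

    /** Does nothing until cancel() has been called, then throws an unchecked exception. */
    void check() {
        if (canceled.get()) {
            throw new CancellationException("spell-checking canceled");
        }
    }

    public static void main(String[] args) {
        CancelToken token = new CancelToken();
        Runnable checkCanceled = token::check; // could be the third constructor argument
        checkCanceled.run();                   // no-op: not canceled yet
        token.cancel();
        try {
            checkCanceled.run();
        } catch (CancellationException e) {
            System.out.println("interrupted: " + e.getMessage());
        }
    }
}
```

    With such a token, a caller might construct the checker as new Hunspell(dictionary, policy, token::check) and call token.cancel() from a UI or request-handling thread.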
    • Method Detail

      • spell

        public boolean spell(java.lang.String word)
        Returns:
        whether the given word's spelling is considered correct according to Hunspell rules
      • spellClean

        private boolean spellClean(java.lang.String word)
      • spellWithTrailingDots

        private boolean spellWithTrailingDots(java.lang.String word)
      • checkWord

        boolean checkWord(java.lang.String word)
      • checkSimpleWord

        java.lang.Boolean checkSimpleWord(char[] wordChars,
                                          int length,
                                          WordCase originalCase)
      • checkWord

        private boolean checkWord(char[] wordChars,
                                  int length,
                                  WordCase originalCase)
      • checkCompounds

        private boolean checkCompounds(char[] wordChars,
                                       int length,
                                       WordCase originalCase)
      • acceptCase

        private boolean acceptCase(WordCase originalCase,
                                   int entryId,
                                   CharsRef root)
      • containsSharpS

        private boolean containsSharpS(char[] word,
                                       int offset,
                                       int length)
      • acceptsStem

        boolean acceptsStem(int formID)
      • hasForceUCaseProblem

        private boolean hasForceUCaseProblem(Root<?> root,
                                             WordCase originalCase,
                                             char[] wordChars)
      • getRoots

        public java.util.List<java.lang.String> getRoots(java.lang.String word)
        Find all roots that could result in the given word after case conversion and adding affixes. This corresponds to the stemming functionality of the original hunspell tool (its -s option).

        Some affix rules are relaxed in this stemming process; for example, explicitly forbidden words are still returned. Some of the returned roots may be synthetic: they do not occur directly in the *.dic file but are case variants of existing entries. No roots are returned for compound words.

        The returned roots may be used to retrieve morphological data via Dictionary.lookupEntries(java.lang.String).
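    To make the idea of "case conversion plus affix removal" concrete, here is a toy sketch. It is deliberately much simpler than Hunspell: it assumes a plain set of roots and a list of suffixes that may be appended to any root, and ignores affix flags, conditions, prefixes and compounds. ToyStemmer and its fields are hypothetical names, not this class's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stemmer by suffix stripping; NOT Hunspell's algorithm (no flags, conditions or prefixes). */
final class ToyStemmer {
    private final Set<String> roots;     // hypothetical stand-in for *.dic entries
    private final List<String> suffixes; // hypothetical suffixes allowed on every root

    ToyStemmer(Set<String> roots, List<String> suffixes) {
        this.roots = roots;
        this.suffixes = suffixes;
    }

    /** Returns each root that yields the word after lowercasing and adding at most one suffix. */
    List<String> getRoots(String word) {
        String lower = word.toLowerCase(Locale.ROOT); // case conversion step
        List<String> result = new ArrayList<>();
        if (roots.contains(lower)) {
            result.add(lower); // the word itself is a root
        }
        for (String suffix : suffixes) {
            if (lower.length() > suffix.length() && lower.endsWith(suffix)) {
                String candidate = lower.substring(0, lower.length() - suffix.length());
                if (roots.contains(candidate) && !result.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyStemmer stemmer = new ToyStemmer(Set.of("walk", "walker"), List.of("s", "ers", "er"));
        System.out.println(stemmer.getRoots("Walkers")); // [walker, walk]
    }
}
```

    As with getRoots, one surface form can map to several roots ("walkers" to both "walker" and "walk"), which is why the method returns a list.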

      • mayBreakIntoCompounds

        private boolean mayBreakIntoCompounds(char[] chars,
                                              int offset,
                                              int length,
                                              int breakPos)
      • checkCompoundRules

        private boolean checkCompoundRules(char[] wordChars,
                                           int offset,
                                           int length,
                                           java.util.List<IntsRef> words)
      • checkLastCompoundPart

        private boolean checkLastCompoundPart(char[] wordChars,
                                              int start,
                                              int length,
                                              java.util.List<IntsRef> words)
      • isNumber

        private static boolean isNumber(java.lang.String s)
      • isDigit

        private static boolean isDigit(char c)
      • tryBreaks

        private boolean tryBreaks(java.lang.String word)
      • hasTooManyBreakOccurrences

        private boolean hasTooManyBreakOccurrences(java.lang.String word)
      • canBeBrokenAt

        private boolean canBeBrokenAt(java.lang.String word,
                                      java.lang.String breakStr,
                                      int breakPos)
      • suggest

        public java.util.List<java.lang.String> suggest(java.lang.String word,
                                                        long timeLimitMs)
                                                 throws SuggestionTimeoutException
        Parameters:
        word - the misspelled word to calculate suggestions for
        timeLimitMs - the time limit in milliseconds, after which the TimeoutPolicy passed to the constructor may take effect (throwing an exception or returning a partial result)
        Throws:
        SuggestionTimeoutException - if the computation takes too long and TimeoutPolicy.THROW_EXCEPTION was specified in the constructor
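    The two timeout effects (an exception or a partial result) can be pictured with the following sketch of a time-limited generation loop. It assumes candidates are produced one at a time and the deadline is checked between them; Policy, TimeLimitExceeded, TimedSuggester and the injected clock are hypothetical names, and the real checkTimeLimit may work differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a time-limited suggestion loop; not this class's code. */
final class TimedSuggester {
    enum Policy { THROW, PARTIAL } // stand-ins for the two TimeoutPolicy effects

    static final class TimeLimitExceeded extends RuntimeException {
        TimeLimitExceeded(String msg) { super(msg); }
    }

    /** Collects candidates in generation order until the deadline passes; the clock is injected for testability. */
    static List<String> suggest(List<String> candidates, long timeLimitMs,
                                Policy policy, LongSupplier clockMs) {
        long deadline = clockMs.getAsLong() + timeLimitMs;
        Set<String> found = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (String c : candidates) {
            if (clockMs.getAsLong() > deadline) {
                if (policy == Policy.THROW) {
                    throw new TimeLimitExceeded("gave up after " + found.size() + " suggestions");
                }
                break; // PARTIAL: return what was found so far
            }
            found.add(c);
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        long[] now = {0};
        LongSupplier clock = () -> { now[0] += 10; return now[0]; }; // fake clock: +10 ms per call
        List<String> cands = List.of("hello", "help", "hell", "held");
        System.out.println(suggest(cands, 25, Policy.PARTIAL, clock)); // [hello, help]
    }
}
```

    Injecting the clock keeps the sketch deterministic; a production loop would more likely read System.nanoTime() directly.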
      • doSuggest

        private void doSuggest(java.lang.String word,
                               WordCase wordCase,
                               java.util.LinkedHashSet<Suggestion> suggestions,
                               java.lang.Runnable checkCanceled)
      • checkTimeLimit

        private java.lang.Runnable checkTimeLimit(java.lang.String word,
                                                  java.util.Set<Suggestion> suggestions,
                                                  long timeLimitMs)
      • postprocess

        private java.util.List<java.lang.String> postprocess(java.util.Collection<Suggestion> suggestions)
      • modifyChunksBetweenDashes

        private java.util.List<java.lang.String> modifyChunksBetweenDashes(java.lang.String word)