Class LevenshteinAutomata


  • public class LevenshteinAutomata
    extends java.lang.Object
    Class to construct DFAs that match a word within some edit distance.

    Implements the algorithm described in: Schulz and Mihov: Fast String Correction with Levenshtein Automata

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      (package private) static class  LevenshteinAutomata.ParametricDescription
      A ParametricDescription describes the structure of a Levenshtein DFA for some degree n.
    • Constructor Summary

      Constructors 
      Constructor Description
      LevenshteinAutomata​(int[] word, int alphaMax, boolean withTranspositions)
      Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.
      LevenshteinAutomata​(java.lang.String input, boolean withTranspositions)
      Create a new LevenshteinAutomata for some input String.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private static int[] codePoints​(java.lang.String input)  
      (package private) int getVector​(int x, int pos, int end)
      Get the characteristic vector X(x, V) where V is substring(pos, end)
      Automaton toAutomaton​(int n)
      Compute a DFA that accepts all strings within an edit distance of n.
      Automaton toAutomaton​(int n, java.lang.String prefix)
      Compute a DFA that accepts all strings within an edit distance of n, matching the specified exact prefix.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • MAXIMUM_SUPPORTED_DISTANCE

        public static final int MAXIMUM_SUPPORTED_DISTANCE
        Maximum edit distance this class can generate an automaton for.
        See Also:
        Constant Field Values
      • word

        final int[] word
      • alphabet

        final int[] alphabet
      • alphaMax

        final int alphaMax
      • rangeLower

        final int[] rangeLower
      • rangeUpper

        final int[] rangeUpper
      • numRanges

        int numRanges
    • Constructor Detail

      • LevenshteinAutomata

        public LevenshteinAutomata​(java.lang.String input,
                                   boolean withTranspositions)
        Create a new LevenshteinAutomata for some input String. Optionally count transpositions as a primitive edit.
      • LevenshteinAutomata

        public LevenshteinAutomata​(int[] word,
                                   int alphaMax,
                                   boolean withTranspositions)
        Expert: specify a custom maximum possible symbol (alphaMax); default is Character.MAX_CODE_POINT.
    • Method Detail

      • codePoints

        private static int[] codePoints​(java.lang.String input)
      • toAutomaton

        public Automaton toAutomaton​(int n)
        Compute a DFA that accepts all strings within an edit distance of n.

        All automata have the following properties:

        • They are deterministic (DFA).
        • There are no transitions to dead states.
        • They are not minimal (some transitions could be combined).
      • toAutomaton

        public Automaton toAutomaton​(int n,
                                     java.lang.String prefix)
        Compute a DFA that accepts all strings within an edit distance of n, matching the specified exact prefix.

        All automata have the following properties:

        • They are deterministic (DFA).
        • There are no transitions to dead states.
        • They are not minimal (some transitions could be combined).
      • getVector

        int getVector​(int x,
                      int pos,
                      int end)
        Get the characteristic vector X(x, V) where V is substring(pos, end)