Package org.apache.lucene.util.fst
Class FST<T>
- java.lang.Object
-
- org.apache.lucene.util.fst.FST<T>
-
- All Implemented Interfaces:
Accountable
public final class FST<T> extends java.lang.Object implements Accountable
Represents an finite state machine (FST), using a compact byte[] format.The format is similar to what's used by Morfologik (https://github.com/morfologik/morfologik-stemming).
See the
package documentation
for some simple examples.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
FST.Arc<T>
Represents a single arc.static class
FST.BytesReader
Reads bytes stored in an FST.static class
FST.INPUT_TYPE
Specifies allowed range of each int input label for this FST.
-
Field Summary
Fields Modifier and Type Field Description static byte
ARCS_FOR_BINARY_SEARCH
Value of the arc flags to declare a node with fixed length arcs designed for binary search.(package private) static byte
ARCS_FOR_DIRECT_ADDRESSING
Value of the arc flags to declare a node with fixed length arcs and bit table designed for direct addressing.private static long
BASE_RAM_BYTES_USED
private static int
BIT_ARC_HAS_FINAL_OUTPUT
static int
BIT_ARC_HAS_OUTPUT
This flag is set if the arc has an output.private static int
BIT_FINAL_ARC
(package private) static int
BIT_LAST_ARC
private static int
BIT_STOP_NODE
(package private) static int
BIT_TARGET_NEXT
(package private) BytesStore
bytes
ABytesStore
, used during building, or during reading when the FST is very large (more than 1 GB).private static int
DEFAULT_MAX_BLOCK_BITS
private static float
DIRECT_ADDRESSING_MAX_OVERSIZE_WITH_CREDIT_FACTOR
Maximum oversizing factor allowed for direct addressing compared to binary search when expansion credits allow the oversizing.(package private) T
emptyOutput
static int
END_LABEL
If arc has this label then that arc is final/acceptedprivate static java.lang.String
FILE_FORMAT_NAME
private static long
FINAL_END_NODE
(package private) static int
FIXED_LENGTH_ARC_DEEP_NUM_ARCS
(package private) static int
FIXED_LENGTH_ARC_SHALLOW_DEPTH
(package private) static int
FIXED_LENGTH_ARC_SHALLOW_NUM_ARCS
private FSTStore
fstStore
(package private) FST.INPUT_TYPE
inputType
private static long
NON_FINAL_END_NODE
Outputs<T>
outputs
private long
startNode
private int
version
private static int
VERSION_CURRENT
private static int
VERSION_LITTLE_ENDIAN
private static int
VERSION_START
-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors Constructor Description FST(DataInput metaIn, DataInput in, Outputs<T> outputs)
Load a previously saved FST.FST(DataInput metaIn, DataInput in, Outputs<T> outputs, FSTStore fstStore)
Load a previously saved FST; maxBlockBits allows you to control the size of the byte[] pages used to hold the FST bytes.FST(FST.INPUT_TYPE inputType, Outputs<T> outputs, int bytesPageBits)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description (package private) long
addNode(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn)
FST.Arc<T>
findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Finds an arc leaving the incoming arc, replacing the arc in place.(package private) void
finish(long newStartNode)
private static boolean
flag(int flags, int bit)
FST.BytesReader
getBytesReader()
Returns aFST.BytesReader
for this FST, positioned at position 0.T
getEmptyOutput()
FST.Arc<T>
getFirstArc(FST.Arc<T> arc)
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start nodeprivate static int
getNumPresenceBytes(int labelRange)
Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.(package private) boolean
isExpandedTarget(FST.Arc<T> follow, FST.BytesReader in)
Returns whetherarc
's target points to a node in expanded format (fixed length arcs).long
ramBytesUsed()
Return the memory usage of this object in bytes.static <T> FST<T>
read(java.nio.file.Path path, Outputs<T> outputs)
Reads an automaton from a file.private FST.Arc<T>
readArc(FST.Arc<T> arc, FST.BytesReader in)
Reads an arc.FST.Arc<T>
readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex)
Reads a present direct addressing node arc, with the provided index in the label range.private FST.Arc<T>
readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex, int presenceIndex)
Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).FST.Arc<T>
readArcByIndex(FST.Arc<T> arc, FST.BytesReader in, int idx)
(package private) static <T> FST.Arc<T>
readEndArc(FST.Arc<T> follow, FST.Arc<T> arc)
FST.Arc<T>
readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in)
FST.Arc<T>
readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.int
readLabel(DataInput in)
Reads one BYTE1/2/4 label from the providedDataInput
.FST.Arc<T>
readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in)
Reads the last arc of a direct addressing node.(package private) FST.Arc<T>
readLastTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
Follows thefollow
arc and reads the last arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.FST.Arc<T>
readNextArc(FST.Arc<T> arc, FST.BytesReader in)
In-place read; returns the arc.(package private) int
readNextArcLabel(FST.Arc<T> arc, FST.BytesReader in)
Peeks at next arc's label; does not alter arc.FST.Arc<T>
readNextRealArc(FST.Arc<T> arc, FST.BytesReader in)
Never returns null, but you should never call this if arc.isLast() is true.private void
readPresenceBytes(FST.Arc<T> arc, FST.BytesReader in)
Reads the presence bits of a direct-addressing node.private long
readUnpackedNodeTarget(FST.BytesReader in)
void
save(java.nio.file.Path path)
Writes an automaton to a file.void
save(DataOutput metaOut, DataOutput out)
private void
seekToNextNode(FST.BytesReader in)
(package private) void
setEmptyOutput(T v)
private boolean
shouldExpandNodeWithDirectAddressing(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, int numBytesPerArc, int maxBytesPerArcWithoutLabel, int labelRange)
Returns whether the given node should be expanded with direct addressing instead of binary search.private boolean
shouldExpandNodeWithFixedLengthArcs(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> node)
Returns whether the given node should be expanded with fixed length arcs.static <T> boolean
targetHasArcs(FST.Arc<T> arc)
returns true if the node at this address has any outgoing arcsjava.lang.String
toString()
private void
writeLabel(DataOutput out, int v)
private void
writeNodeForBinarySearch(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, long startAddress, int maxBytesPerArc)
private void
writeNodeForDirectAddressing(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, long startAddress, int maxBytesPerArcWithoutLabel, int labelRange)
private void
writePresenceBits(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, long dest, int numPresenceBytes)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
-
-
-
Field Detail
-
BASE_RAM_BYTES_USED
private static final long BASE_RAM_BYTES_USED
-
BIT_FINAL_ARC
private static final int BIT_FINAL_ARC
- See Also:
- Constant Field Values
-
BIT_LAST_ARC
static final int BIT_LAST_ARC
- See Also:
- Constant Field Values
-
BIT_TARGET_NEXT
static final int BIT_TARGET_NEXT
- See Also:
- Constant Field Values
-
BIT_STOP_NODE
private static final int BIT_STOP_NODE
- See Also:
- Constant Field Values
-
BIT_ARC_HAS_OUTPUT
public static final int BIT_ARC_HAS_OUTPUT
This flag is set if the arc has an output.- See Also:
- Constant Field Values
-
BIT_ARC_HAS_FINAL_OUTPUT
private static final int BIT_ARC_HAS_FINAL_OUTPUT
- See Also:
- Constant Field Values
-
ARCS_FOR_BINARY_SEARCH
public static final byte ARCS_FOR_BINARY_SEARCH
Value of the arc flags to declare a node with fixed length arcs designed for binary search.- See Also:
- Constant Field Values
-
ARCS_FOR_DIRECT_ADDRESSING
static final byte ARCS_FOR_DIRECT_ADDRESSING
Value of the arc flags to declare a node with fixed length arcs and bit table designed for direct addressing.- See Also:
- Constant Field Values
-
FIXED_LENGTH_ARC_SHALLOW_DEPTH
static final int FIXED_LENGTH_ARC_SHALLOW_DEPTH
-
FIXED_LENGTH_ARC_SHALLOW_NUM_ARCS
static final int FIXED_LENGTH_ARC_SHALLOW_NUM_ARCS
-
FIXED_LENGTH_ARC_DEEP_NUM_ARCS
static final int FIXED_LENGTH_ARC_DEEP_NUM_ARCS
-
DIRECT_ADDRESSING_MAX_OVERSIZE_WITH_CREDIT_FACTOR
private static final float DIRECT_ADDRESSING_MAX_OVERSIZE_WITH_CREDIT_FACTOR
Maximum oversizing factor allowed for direct addressing compared to binary search when expansion credits allow the oversizing. This factor prevents expansions that are obviously too costly even if there are sufficient credits.
-
FILE_FORMAT_NAME
private static final java.lang.String FILE_FORMAT_NAME
- See Also:
- Constant Field Values
-
VERSION_START
private static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_LITTLE_ENDIAN
private static final int VERSION_LITTLE_ENDIAN
- See Also:
- Constant Field Values
-
VERSION_CURRENT
private static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
FINAL_END_NODE
private static final long FINAL_END_NODE
- See Also:
- Constant Field Values
-
NON_FINAL_END_NODE
private static final long NON_FINAL_END_NODE
- See Also:
- Constant Field Values
-
END_LABEL
public static final int END_LABEL
If arc has this label then that arc is final/accepted- See Also:
- Constant Field Values
-
inputType
final FST.INPUT_TYPE inputType
-
emptyOutput
T emptyOutput
-
bytes
final BytesStore bytes
ABytesStore
, used during building, or during reading when the FST is very large (more than 1 GB). If the FST is less than 1 GB then bytesArray is set instead.
-
fstStore
private final FSTStore fstStore
-
startNode
private long startNode
-
version
private final int version
-
DEFAULT_MAX_BLOCK_BITS
private static final int DEFAULT_MAX_BLOCK_BITS
-
-
Method Detail
-
flag
private static boolean flag(int flags, int bit)
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface:Accountable
Return the memory usage of this object in bytes. Negative values are illegal.- Specified by:
ramBytesUsed
in interfaceAccountable
-
toString
public java.lang.String toString()
- Overrides:
toString
in classjava.lang.Object
-
finish
void finish(long newStartNode) throws java.io.IOException
- Throws:
java.io.IOException
-
getEmptyOutput
public T getEmptyOutput()
-
setEmptyOutput
void setEmptyOutput(T v)
-
save
public void save(DataOutput metaOut, DataOutput out) throws java.io.IOException
- Throws:
java.io.IOException
-
save
public void save(java.nio.file.Path path) throws java.io.IOException
Writes an automaton to a file.- Throws:
java.io.IOException
-
read
public static <T> FST<T> read(java.nio.file.Path path, Outputs<T> outputs) throws java.io.IOException
Reads an automaton from a file.- Throws:
java.io.IOException
-
writeLabel
private void writeLabel(DataOutput out, int v) throws java.io.IOException
- Throws:
java.io.IOException
-
readLabel
public int readLabel(DataInput in) throws java.io.IOException
Reads one BYTE1/2/4 label from the providedDataInput
.- Throws:
java.io.IOException
-
targetHasArcs
public static <T> boolean targetHasArcs(FST.Arc<T> arc)
returns true if the node at this address has any outgoing arcs
-
addNode
long addNode(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn) throws java.io.IOException
- Throws:
java.io.IOException
-
shouldExpandNodeWithFixedLengthArcs
private boolean shouldExpandNodeWithFixedLengthArcs(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> node)
Returns whether the given node should be expanded with fixed length arcs. Nodes will be expanded depending on their depth (distance from the root node) and their number of arcs.Nodes with fixed length arcs use more space, because they encode all arcs with a fixed number of bytes, but they allow either binary search or direct addressing on the arcs (instead of linear scan) on lookup by arc label.
-
shouldExpandNodeWithDirectAddressing
private boolean shouldExpandNodeWithDirectAddressing(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, int numBytesPerArc, int maxBytesPerArcWithoutLabel, int labelRange)
Returns whether the given node should be expanded with direct addressing instead of binary search.Prefer direct addressing for performance if it does not oversize binary search byte size too much, so that the arcs can be directly addressed by label.
-
writeNodeForBinarySearch
private void writeNodeForBinarySearch(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, long startAddress, int maxBytesPerArc)
-
writeNodeForDirectAddressing
private void writeNodeForDirectAddressing(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, long startAddress, int maxBytesPerArcWithoutLabel, int labelRange)
-
writePresenceBits
private void writePresenceBits(FSTCompiler<T> fstCompiler, FSTCompiler.UnCompiledNode<T> nodeIn, long dest, int numPresenceBytes)
-
getNumPresenceBytes
private static int getNumPresenceBytes(int labelRange)
Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.
-
readPresenceBytes
private void readPresenceBytes(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads the presence bits of a direct-addressing node. Actually we don't read them here, we just keep the pointer to the bit-table start and we skip them.- Throws:
java.io.IOException
-
getFirstArc
public FST.Arc<T> getFirstArc(FST.Arc<T> arc)
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node
-
readLastTargetArc
FST.Arc<T> readLastTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Follows thefollow
arc and reads the last arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.- Returns:
- Returns the second argument (
arc
). - Throws:
java.io.IOException
-
readUnpackedNodeTarget
private long readUnpackedNodeTarget(FST.BytesReader in) throws java.io.IOException
- Throws:
java.io.IOException
-
readFirstTargetArc
public FST.Arc<T> readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Follow thefollow
arc and read the first arc of its target; this changes the providedarc
(2nd arg) in-place and returns it.- Returns:
- Returns the second argument (
arc
). - Throws:
java.io.IOException
-
readFirstRealTargetArc
public FST.Arc<T> readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
- Throws:
java.io.IOException
-
isExpandedTarget
boolean isExpandedTarget(FST.Arc<T> follow, FST.BytesReader in) throws java.io.IOException
Returns whetherarc
's target points to a node in expanded format (fixed length arcs).- Throws:
java.io.IOException
-
readNextArc
public FST.Arc<T> readNextArc(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
In-place read; returns the arc.- Throws:
java.io.IOException
-
readNextArcLabel
int readNextArcLabel(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Peeks at next arc's label; does not alter arc. Do not call this if arc.isLast()!- Throws:
java.io.IOException
-
readArcByIndex
public FST.Arc<T> readArcByIndex(FST.Arc<T> arc, FST.BytesReader in, int idx) throws java.io.IOException
- Throws:
java.io.IOException
-
readArcByDirectAddressing
public FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws java.io.IOException
Reads a present direct addressing node arc, with the provided index in the label range.- Parameters:
rangeIndex
- The index of the arc in the label range. It must be present. The real arc offset is computed based on the presence bits of the direct addressing node.- Throws:
java.io.IOException
-
readArcByDirectAddressing
private FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex, int presenceIndex) throws java.io.IOException
Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).- Throws:
java.io.IOException
-
readLastArcByDirectAddressing
public FST.Arc<T> readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads the last arc of a direct addressing node. This method is equivalent to callreadArcByDirectAddressing(Arc, BytesReader, int)
withrangeIndex
equal toarc.numArcs() - 1
, but it is faster.- Throws:
java.io.IOException
-
readNextRealArc
public FST.Arc<T> readNextRealArc(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Never returns null, but you should never call this if arc.isLast() is true.- Throws:
java.io.IOException
-
readArc
private FST.Arc<T> readArc(FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Reads an arc.
Precondition: The arc flags byte has already been read and set; the given BytesReader is positioned just after the arc flags byte.- Throws:
java.io.IOException
-
findTargetArc
public FST.Arc<T> findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws java.io.IOException
Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.- Throws:
java.io.IOException
-
seekToNextNode
private void seekToNextNode(FST.BytesReader in) throws java.io.IOException
- Throws:
java.io.IOException
-
getBytesReader
public FST.BytesReader getBytesReader()
Returns aFST.BytesReader
for this FST, positioned at position 0.
-
-