Class FSTCompletionLookup
java.lang.Object
    org.apache.lucene.search.suggest.Lookup
        org.apache.lucene.search.suggest.fst.FSTCompletionLookup
All Implemented Interfaces:
    Accountable
public class FSTCompletionLookup extends Lookup
An adapter from the Lookup API to FSTCompletion.
This adapter differs from FSTCompletion in that it attempts to discretize any "weights" as passed in from InputIterator.weight() to match the number of buckets. For the rationale for bucketing, see FSTCompletion.
Note: Discretization requires an additional sorting pass.
The range of weights for bucketing/discretization is determined by sorting the input by weight and then dividing the sorted entries into equal-sized ranges. Entries within each range are then assigned to that range's bucket.
Note that this means that even large differences in weights may be lost during automaton construction, but the overall distinction between "classes" of weights will be preserved regardless of the distribution of weights.
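The rank-based bucketing described above can be sketched in plain Java. This is a minimal illustration, not Lucene's implementation: the class WeightBucketing, its Entry type and the discretize method are hypothetical names. The sketch assumes that weights are sorted in ascending order, so higher weights get higher bucket numbers, and that entries with equal weights share a bucket.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of rank-based weight discretization (not Lucene's code):
 * sort entries by weight, split the sorted sequence into equal-sized ranges,
 * and give every entry in a range that range's bucket number.
 */
final class WeightBucketing {

  /** A (term, weight) pair, standing in for what an input iterator yields. */
  static final class Entry {
    final String term;
    final long weight;

    Entry(String term, long weight) {
      this.term = term;
      this.weight = weight;
    }
  }

  /** Returns bucketOf[i] in [0, buckets) for entries.get(i); higher weight, higher bucket. */
  static int[] discretize(List<Entry> entries, int buckets) {
    int n = entries.size();
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      order[i] = i;
    }
    // The extra sorting pass: order entry indices by ascending weight.
    Arrays.sort(order, (a, b) -> Long.compare(entries.get(a).weight, entries.get(b).weight));

    int[] bucketOf = new int[n];
    for (int rank = 0; rank < n; rank++) {
      int idx = order[rank];
      if (rank > 0 && entries.get(idx).weight == entries.get(order[rank - 1]).weight) {
        // Equal weights never straddle a range boundary: reuse the previous bucket.
        bucketOf[idx] = bucketOf[order[rank - 1]];
      } else {
        // Equal-sized ranges by rank, so the shape of the weight distribution does not matter.
        bucketOf[idx] = (int) ((long) rank * buckets / n);
      }
    }
    return bucketOf;
  }

  public static void main(String[] args) {
    List<Entry> input = Arrays.asList(
        new Entry("apple", 1),
        new Entry("apricot", 1_000_000),
        new Entry("avocado", 2),
        new Entry("banana", 3));
    System.out.println(Arrays.toString(discretize(input, 2))); // [0, 1, 0, 1]
  }
}
```

With two buckets, banana (weight 3) and apricot (weight 1,000,000) end up in the same bucket, which is the loss of large weight differences the note above describes.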
For fine-grained control over which weights are assigned to which buckets, use FSTCompletion directly or TSTLookup, for example.
See Also:
    FSTCompletion

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
    Lookup.LookupPriorityQueue, Lookup.LookupResult

Field Summary
Fields
Modifier and Type        | Field                   | Description
private int              | buckets                 |
private long             | count                   | Number of entries the lookup was built with.
private boolean          | exactMatchFirst         |
private FSTCompletion    | higherWeightsCompletion | Automaton used for completions with higher weights reordering.
private static int       | INVALID_BUCKETS_COUNT   | An invalid bucket count if we're creating an object of this class from an existing FST.
private FSTCompletion    | normalCompletion        | Automaton used for normal completions.
private static int       | sharedTailLength        | Shared tail length for conflating in the created automaton.
private Directory        | tempDir                 |
private java.lang.String | tempFileNamePrefix      |
Fields inherited from class org.apache.lucene.search.suggest.Lookup
    CHARSEQUENCE_COMPARATOR
Fields inherited from interface org.apache.lucene.util.Accountable
    NULL_ACCOUNTABLE

Constructor Summary
Constructors
FSTCompletionLookup()
    This constructor should only be used to read a previously saved suggester.
FSTCompletionLookup(Directory tempDir, java.lang.String tempFileNamePrefix)
    This constructor prepares for creating the suggester's FST using the build(InputIterator) method.
FSTCompletionLookup(Directory tempDir, java.lang.String tempFileNamePrefix, int buckets, boolean exactMatchFirst)
    This constructor prepares for creating the suggester's FST using the build(InputIterator) method.
FSTCompletionLookup(Directory tempDir, java.lang.String tempFileNamePrefix, FSTCompletion completion, boolean exactMatchFirst)
    This constructor takes a pre-built automaton.

Method Summary
void build(InputIterator iterator)
    Builds up a new internal Lookup representation based on the given InputIterator.
private static int encodeWeight(long value)
    weight -> cost
java.lang.Object get(java.lang.CharSequence key)
    Returns the bucket (weight) as a Long for the provided key if it exists, otherwise null if it does not.
java.util.Collection<Accountable> getChildResources()
    Returns nested resources of this class.
long getCount()
    Get the number of entries the lookup was built with.
boolean load(DataInput input)
    Discard current lookup data and load it from a previously saved copy.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, boolean higherWeightsFirst, int num)
    Look up a key and return possible completions for this key.
long ramBytesUsed()
    Return the memory usage of this object in bytes.
boolean store(DataOutput output)
    Persist the constructed lookup data to the given output.

Field Detail

INVALID_BUCKETS_COUNT
private static int INVALID_BUCKETS_COUNT
    An invalid bucket count if we're creating an object of this class from an existing FST.

sharedTailLength
private static final int sharedTailLength
    Shared tail length for conflating in the created automaton. Setting this to larger values (Integer.MAX_VALUE) will create smaller (or minimal) automata at the cost of RAM for keeping the nodes hash in the FST. The value is an empirical pick.
    See Also:
        Constant Field Values

tempDir
private final Directory tempDir

tempFileNamePrefix
private final java.lang.String tempFileNamePrefix

buckets
private int buckets

exactMatchFirst
private boolean exactMatchFirst

higherWeightsCompletion
private FSTCompletion higherWeightsCompletion
    Automaton used for completions with higher weights reordering.

normalCompletion
private FSTCompletion normalCompletion
    Automaton used for normal completions.

count
private volatile long count
    Number of entries the lookup was built with.

Constructor Detail

FSTCompletionLookup
public FSTCompletionLookup()
    This constructor should only be used to read a previously saved suggester.

FSTCompletionLookup
public FSTCompletionLookup(Directory tempDir, java.lang.String tempFileNamePrefix)
    This constructor prepares for creating the suggester's FST using the build(InputIterator) method. The number of weight discretization buckets is set to FSTCompletion.DEFAULT_BUCKETS, and exact matches are promoted to the top of the suggestions list.

FSTCompletionLookup
public FSTCompletionLookup(Directory tempDir, java.lang.String tempFileNamePrefix, int buckets, boolean exactMatchFirst)
    This constructor prepares for creating the suggester's FST using the build(InputIterator) method.
    Parameters:
        buckets - The number of weight discretization buckets (see FSTCompletion for details).
        exactMatchFirst - If true, exact matches are promoted to the top of the suggestions list. Otherwise they appear in the order of discretized weight and alphabetical within the bucket.
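To make the exactMatchFirst rule concrete, here is a small sketch of the ordering described above. It assumes the candidates have already been matched against the key and carry a bucket number; ExactMatchOrdering, Completion and order are hypothetical names, and String.compareTo stands in for whatever alphabetical comparison the real automaton uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the ordering described for exactMatchFirst; not Lucene's code. */
final class ExactMatchOrdering {

  /** A candidate completion with its discretized weight (bucket). */
  static final class Completion {
    final String term;
    final int bucket;

    Completion(String term, int bucket) {
      this.term = term;
      this.bucket = bucket;
    }

    @Override
    public String toString() {
      return term + "/" + bucket;
    }
  }

  static List<Completion> order(List<Completion> candidates, String key, boolean exactMatchFirst) {
    List<Completion> sorted = new ArrayList<>(candidates);
    // Higher bucket first; alphabetical within a bucket.
    sorted.sort(Comparator.comparingInt((Completion c) -> c.bucket).reversed()
        .thenComparing(c -> c.term));
    if (exactMatchFirst) {
      for (int i = 0; i < sorted.size(); i++) {
        if (sorted.get(i).term.equals(key)) {
          // Promote the exact match; everything else keeps its relative order.
          sorted.add(0, sorted.remove(i));
          break;
        }
      }
    }
    return sorted;
  }

  public static void main(String[] args) {
    List<Completion> candidates = Arrays.asList(
        new Completion("cat", 1), new Completion("catalog", 3),
        new Completion("category", 3), new Completion("cater", 1));
    System.out.println(order(candidates, "cat", false)); // [catalog/3, category/3, cat/1, cater/1]
    System.out.println(order(candidates, "cat", true));  // [cat/1, catalog/3, category/3, cater/1]
  }
}
```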

FSTCompletionLookup
public FSTCompletionLookup(Directory tempDir, java.lang.String tempFileNamePrefix, FSTCompletion completion, boolean exactMatchFirst)
    This constructor takes a pre-built automaton.
    Parameters:
        completion - An instance of FSTCompletion.
        exactMatchFirst - If true, exact matches are promoted to the top of the suggestions list. Otherwise they appear in the order of discretized weight and alphabetical within the bucket.

Method Detail

build
public void build(InputIterator iterator) throws java.io.IOException
    Description copied from class: Lookup
    Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.

encodeWeight
private static int encodeWeight(long value)
    weight -> cost

lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, boolean higherWeightsFirst, int num)
    Description copied from class: Lookup
    Look up a key and return possible completions for this key.
    Specified by:
        lookup in class Lookup
    Parameters:
        key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
        contexts - contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match.
        higherWeightsFirst - if true, return higher-weight (more popular) completions first.
        num - maximum number of results to return.
    Returns:
        a list of possible completions, with their relative weight (e.g. popularity)
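A rough in-memory stand-in for lookup() can help show what key, higherWeightsFirst and num do. This sketch is an illustration under assumptions, not the FST traversal FSTCompletionLookup performs: PrefixLookupSketch is hypothetical, the key is treated as a prefix, contexts are omitted, and results are assumed to come back alphabetically when higherWeightsFirst is false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for prefix lookup; the real class walks an FST.
 * Assumes alphabetical results when higherWeightsFirst is false.
 */
final class PrefixLookupSketch {
  private final TreeMap<String, Integer> bucketByTerm = new TreeMap<>();

  void add(String term, int bucket) {
    bucketByTerm.put(term, bucket);
  }

  List<String> lookup(CharSequence key, boolean higherWeightsFirst, int num) {
    String prefix = key.toString();
    List<Map.Entry<String, Integer>> matches = new ArrayList<>();
    // All terms sharing the prefix form one contiguous run starting at the prefix itself.
    for (Map.Entry<String, Integer> e : bucketByTerm.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break;
      }
      matches.add(e);
    }
    if (higherWeightsFirst) {
      // List.sort is stable, so terms with equal buckets stay alphabetical.
      matches.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    }
    List<String> out = new ArrayList<>();
    for (int i = 0; i < Math.min(num, matches.size()); i++) {
      out.add(matches.get(i).getKey());
    }
    return out;
  }

  public static void main(String[] args) {
    PrefixLookupSketch s = new PrefixLookupSketch();
    s.add("cat", 1);
    s.add("catalog", 3);
    s.add("category", 3);
    s.add("cater", 1);
    s.add("dog", 2);
    System.out.println(s.lookup("cat", true, 3));  // [catalog, category, cat]
    System.out.println(s.lookup("cat", false, 2)); // [cat, catalog]
  }
}
```

A TreeMap keeps the example short because all terms sharing a prefix sit next to each other in sorted order; an FST offers the same kind of prefix walk over a compact automaton.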

get
public java.lang.Object get(java.lang.CharSequence key)
    Returns the bucket (weight) as a Long for the provided key if it exists, otherwise null if it does not.
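Because get() is declared to return java.lang.Object, callers have to cast the result. The hypothetical BucketGetSketch below illustrates only the documented contract (a Long for a known key, null otherwise), using a HashMap in place of an automaton.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the documented get() contract; not Lucene's code. */
final class BucketGetSketch {
  private final Map<String, Integer> bucketByTerm = new HashMap<>();

  void put(String term, int bucket) {
    bucketByTerm.put(term, bucket);
  }

  /** Returns the key's bucket boxed as a Long, or null when the key is absent. */
  Object get(CharSequence key) {
    Integer bucket = bucketByTerm.get(key.toString());
    return bucket == null ? null : Long.valueOf(bucket.longValue());
  }

  public static void main(String[] args) {
    BucketGetSketch s = new BucketGetSketch();
    s.put("cat", 3);
    Object hit = s.get(new StringBuilder("cat")); // any CharSequence works as a key
    long bucket = (Long) hit;                      // callers cast the Object result
    System.out.println(bucket);                    // 3
    System.out.println(s.get("dog"));              // null
  }
}
```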

store
public boolean store(DataOutput output) throws java.io.IOException
    Description copied from class: Lookup
    Persist the constructed lookup data to the given output. Optional operation.
    Specified by:
        store in class Lookup
    Parameters:
        output - DataOutput to write the data to.
    Returns:
        true if successful, false if unsuccessful or not supported.
    Throws:
        java.io.IOException - when a fatal I/O error occurs.

load
public boolean load(DataInput input) throws java.io.IOException
    Description copied from class: Lookup
    Discard current lookup data and load it from a previously saved copy. Optional operation.
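store() and load() are meant to be used as a pair: one persists the built data, the other replaces the current data with a saved copy. The sketch below shows such a round trip under stated assumptions. java.io.DataOutputStream and DataInputStream stand in for Lucene's DataOutput and DataInput, StoreLoadSketch is hypothetical, and the byte layout (count, number of entries, then term/bucket pairs) is invented for illustration. It is not the format FSTCompletionLookup writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical store/load round trip. java.io streams stand in for Lucene's
 * DataOutput/DataInput, and the byte layout is invented for illustration.
 */
final class StoreLoadSketch {
  long count;
  TreeMap<String, Integer> bucketByTerm = new TreeMap<>();

  boolean store(DataOutputStream out) throws IOException {
    out.writeLong(count);
    out.writeInt(bucketByTerm.size());
    for (Map.Entry<String, Integer> e : bucketByTerm.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeInt(e.getValue());
    }
    return true;
  }

  boolean load(DataInputStream in) throws IOException {
    long newCount = in.readLong();
    int n = in.readInt();
    TreeMap<String, Integer> fresh = new TreeMap<>();
    for (int i = 0; i < n; i++) {
      String term = in.readUTF();
      fresh.put(term, in.readInt());
    }
    // Current data is discarded only after a complete read succeeds.
    count = newCount;
    bucketByTerm = fresh;
    return true;
  }

  /** Stores src into a byte array and loads it into a new instance. */
  static StoreLoadSketch roundTrip(StoreLoadSketch src) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        src.store(out);
      }
      StoreLoadSketch dst = new StoreLoadSketch();
      try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        dst.load(in);
      }
      return dst;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```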

ramBytesUsed
public long ramBytesUsed()
    Description copied from interface: Accountable
    Return the memory usage of this object in bytes. Negative values are illegal.

getChildResources
public java.util.Collection<Accountable> getChildResources()
    Description copied from interface: Accountable
    Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
    See Also:
        Accountables
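The two Accountable methods usually work together: a parent reports its own size plus its children's sizes, and it lists those children for diagnostics. The sketch below uses a hypothetical SizedResource interface (deliberately not Lucene's Accountable) and a hypothetical TwoAutomataHolder with two automaton fields, loosely mirroring normalCompletion and higherWeightsCompletion. SHALLOW_SIZE is an arbitrary illustrative number, and counting a shared child only once is a choice made by the sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in mirroring the two Accountable methods; not Lucene's interface. */
interface SizedResource {
  long ramBytesUsed();

  Collection<SizedResource> getChildResources();
}

/** A leaf resource with a caller-supplied size. */
final class FixedSizeResource implements SizedResource {
  private final long bytes;

  FixedSizeResource(long bytes) {
    this.bytes = bytes;
  }

  @Override
  public long ramBytesUsed() {
    return bytes;
  }

  @Override
  public Collection<SizedResource> getChildResources() {
    return Collections.emptyList();
  }
}

/** A parent holding two (possibly identical, possibly null) automata. */
final class TwoAutomataHolder implements SizedResource {
  static final long SHALLOW_SIZE = 32; // illustrative constant, not a measured overhead
  private final SizedResource normal;
  private final SizedResource higherWeights;

  TwoAutomataHolder(SizedResource normal, SizedResource higherWeights) {
    this.normal = normal;
    this.higherWeights = higherWeights;
  }

  @Override
  public long ramBytesUsed() {
    long total = SHALLOW_SIZE;
    for (SizedResource child : getChildResources()) {
      total += child.ramBytesUsed();
    }
    return total;
  }

  @Override
  public Collection<SizedResource> getChildResources() {
    List<SizedResource> snapshot = new ArrayList<>(2);
    if (normal != null) {
      snapshot.add(normal);
    }
    // A child shared by both fields is reported (and counted) once.
    if (higherWeights != null && higherWeights != normal) {
      snapshot.add(higherWeights);
    }
    // A fresh, unmodifiable list: callers get a point-in-time view they cannot mutate.
    return Collections.unmodifiableList(snapshot);
  }
}
```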