Package org.apache.lucene.util.hnsw
Class HnswGraphBuilder
- java.lang.Object
-
- org.apache.lucene.util.hnsw.HnswGraphBuilder
-
public final class HnswGraphBuilder extends java.lang.Object
Builder for HNSW graph. SeeHnswGraph
for a gloss on the algorithm and the meaning of the hyperparameters.
-
-
Field Summary
Fields Modifier and Type Field Description private int
beamWidth
private BoundsChecker
bound
private RandomAccessVectorValues
buildVectors
private static long
DEFAULT_RAND_SEED
Default random seed for level generation *private HnswGraphSearcher
graphSearcher
(package private) OnHeapHnswGraph
hnsw
static java.lang.String
HNSW_COMPONENT
A name for the HNSW component for the info-stream *private InfoStream
infoStream
private int
M
private double
ml
private java.util.SplittableRandom
random
static long
randSeed
Random seed for level generation; public to expose for testing *private NeighborArray
scratch
private VectorSimilarityFunction
similarityFunction
private RandomAccessVectorValues
vectorValues
-
Constructor Summary
Constructors Constructor Description HnswGraphBuilder(RandomAccessVectorValuesProducer vectors, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed)
Reads all the vectors from a VectorValues, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private void
addDiverseNeighbors(int level, int node, NeighborQueue candidates)
(package private) void
addGraphNode(int node, float[] value)
Inserts a doc with vector value to the graphOnHeapHnswGraph
build(RandomAccessVectorValues vectors)
Reads all the vectors from two copies of a random access VectorValues.private boolean
diversityCheck(float[] candidate, float score, NeighborArray neighbors, RandomAccessVectorValues vectorValues)
private int
findWorstNonDiverse(NeighborArray neighbors)
Find first non-diverse neighbour among the list of neighbors starting from the most distant neighboursprivate static int
getRandomGraphLevel(double ml, java.util.SplittableRandom random)
private void
popToScratch(NeighborQueue candidates)
private long
printGraphBuildStatus(int node, long start, long t)
private void
selectAndLinkDiverse(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel)
void
setInfoStream(InfoStream infoStream)
Set info-stream to output debugging information *
-
-
-
Field Detail
-
DEFAULT_RAND_SEED
private static final long DEFAULT_RAND_SEED
Default random seed for level generation *- See Also:
- Constant Field Values
-
HNSW_COMPONENT
public static final java.lang.String HNSW_COMPONENT
A name for the HNSW component for the info-stream *- See Also:
- Constant Field Values
-
randSeed
public static long randSeed
Random seed for level generation; public to expose for testing *
-
M
private final int M
-
beamWidth
private final int beamWidth
-
ml
private final double ml
-
scratch
private final NeighborArray scratch
-
similarityFunction
private final VectorSimilarityFunction similarityFunction
-
vectorValues
private final RandomAccessVectorValues vectorValues
-
random
private final java.util.SplittableRandom random
-
bound
private final BoundsChecker bound
-
graphSearcher
private final HnswGraphSearcher graphSearcher
-
hnsw
final OnHeapHnswGraph hnsw
-
infoStream
private InfoStream infoStream
-
buildVectors
private RandomAccessVectorValues buildVectors
-
-
Constructor Detail
-
HnswGraphBuilder
public HnswGraphBuilder(RandomAccessVectorValuesProducer vectors, VectorSimilarityFunction similarityFunction, int M, int beamWidth, long seed) throws java.io.IOException
Reads all the vectors from a VectorValues, builds a graph connecting them by their dense ordinals, using the given hyperparameter settings, and returns the resulting graph.- Parameters:
vectors
- the vectors whose relations are represented by the graph - must provide a different view over those vectors than the one used to add via addGraphNode.M
- – graph fanout parameter used to calculate the maximum number of connections a node can have – M on upper layers, and M * 2 on the lowest level.beamWidth
- the size of the beam search to use when finding nearest neighbors.seed
- the seed for a random number generator used during graph construction. Provide this to ensure repeatable construction.- Throws:
java.io.IOException
-
-
Method Detail
-
build
public OnHeapHnswGraph build(RandomAccessVectorValues vectors) throws java.io.IOException
Reads all the vectors from two copies of a random access VectorValues. Providing two copies enables efficient retrieval without extra data copying, while avoiding collision of the returned values.- Parameters:
vectors
- the vectors for which to build a nearest neighbors graph. Must be an independet accessor for the vectors- Throws:
java.io.IOException
-
setInfoStream
public void setInfoStream(InfoStream infoStream)
Set info-stream to output debugging information *
-
addGraphNode
void addGraphNode(int node, float[] value) throws java.io.IOException
Inserts a doc with vector value to the graph- Throws:
java.io.IOException
-
printGraphBuildStatus
private long printGraphBuildStatus(int node, long start, long t)
-
addDiverseNeighbors
private void addDiverseNeighbors(int level, int node, NeighborQueue candidates) throws java.io.IOException
- Throws:
java.io.IOException
-
selectAndLinkDiverse
private void selectAndLinkDiverse(NeighborArray neighbors, NeighborArray candidates, int maxConnOnLevel) throws java.io.IOException
- Throws:
java.io.IOException
-
popToScratch
private void popToScratch(NeighborQueue candidates)
-
diversityCheck
private boolean diversityCheck(float[] candidate, float score, NeighborArray neighbors, RandomAccessVectorValues vectorValues) throws java.io.IOException
- Parameters:
candidate
- the vector of a new candidate neighbor of a node nscore
- the score of the new candidate and node n, to be compared with scores of the candidate and n's neighborsneighbors
- the neighbors selected so farvectorValues
- source of values used for making comparisons between candidate and existing neighbors- Returns:
- whether the candidate is diverse given the existing neighbors
- Throws:
java.io.IOException
-
findWorstNonDiverse
private int findWorstNonDiverse(NeighborArray neighbors) throws java.io.IOException
Find first non-diverse neighbour among the list of neighbors starting from the most distant neighbours- Throws:
java.io.IOException
-
getRandomGraphLevel
private static int getRandomGraphLevel(double ml, java.util.SplittableRandom random)
-
-