Class WeightedSpanTermExtractor
- java.lang.Object
-
- org.apache.lucene.search.highlight.WeightedSpanTermExtractor
-
public class WeightedSpanTermExtractor extends java.lang.Object
Class used to extract WeightedSpanTerms from a Query based on whether Terms from the Query are contained in a supplied TokenStream.

In order to support additional, by default unsupported queries, subclasses can override extract(Query, float, Map) for extracting wrapped or delegate queries and extractUnknownQuery(Query, Map) to process custom leaf queries:

    WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor() {
        @Override
        protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException {
            if (query instanceof QueryWrapper) {
                // Unwrap the delegate query and extract from it instead.
                extract(((QueryWrapper) query).getQuery(), boost, terms);
            } else {
                super.extract(query, boost, terms);
            }
        }

        @Override
        protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException {
            if (query instanceof CustomTermQuery) {
                // Custom leaf query: register its single term directly.
                Term term = ((CustomTermQuery) query).getTerm();
                terms.put(term.field(), new WeightedSpanTerm(1, term.text()));
            }
        }
    };
-
-
Nested Class Summary
Nested Classes
Modifier and Type   Class   Description
(package private) static class
WeightedSpanTermExtractor.DelegatingLeafReader
protected static class
WeightedSpanTermExtractor.PositionCheckingMap<K>
This class makes sure that if both position sensitive and insensitive versions of the same term are added, the position insensitive one wins.
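The rule described for PositionCheckingMap can be modelled without any Lucene classes. The sketch below is only an illustration: SketchTerm and PositionCheckingSketchMap are hypothetical stand-ins invented here, and the code is not the Lucene implementation. It covers put only; a full version would presumably need the same check for bulk insertion.

```java
// Hypothetical stand-in for a weighted term; NOT Lucene's WeightedSpanTerm.
final class SketchTerm {
    final String text;
    boolean positionSensitive;

    SketchTerm(String text, boolean positionSensitive) {
        this.text = text;
        this.positionSensitive = positionSensitive;
    }
}

// Hypothetical map modelling the documented rule: once a position insensitive
// version of a term has been stored, later versions stay position insensitive.
final class PositionCheckingSketchMap extends java.util.HashMap<String, SketchTerm> {
    @Override
    public SketchTerm put(String key, SketchTerm value) {
        SketchTerm previous = super.put(key, value);
        // The new value replaces the old one; if the old one was position
        // insensitive, the new one is downgraded to match it.
        if (previous != null && !previous.positionSensitive) {
            value.positionSensitive = false;
        }
        return previous;
    }
}
```

With this model, adding a position sensitive term after an insensitive one (or the other way round) leaves a position insensitive entry in the map, while a term that was only ever added as position sensitive keeps that flag.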
-
Field Summary
Fields
Modifier and Type   Field   Description
private boolean
cachedTokenStream
private java.lang.String
defaultField
private boolean
expandMultiTermQuery
private java.lang.String
fieldName
private LeafReader
internalReader
private int
maxDocCharsToAnalyze
private TokenStream
tokenStream
private boolean
usePayloads
private boolean
wrapToCaching
-
Constructor Summary
Constructors
Constructor   Description
WeightedSpanTermExtractor()
WeightedSpanTermExtractor(java.lang.String defaultField)
-
Method Summary
Modifier and Type   Method   Description
protected void
collectSpanQueryFields(SpanQuery spanQuery, java.util.Set<java.lang.String> fieldNames)
protected void
extract(Query query, float boost, java.util.Map<java.lang.String,WeightedSpanTerm> terms)
protected void
extractUnknownQuery(Query query, java.util.Map<java.lang.String,WeightedSpanTerm> terms)
protected void
extractWeightedSpanTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost)
protected void
extractWeightedTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, Query query, float boost)
protected boolean
fieldNameComparator(java.lang.String fieldNameToCheck)
Necessary to implement matches for queries against defaultField
boolean
getExpandMultiTermQuery()
protected LeafReaderContext
getLeafContext()
TokenStream
getTokenStream()
Returns the tokenStream which may have been wrapped in a CachingTokenFilter.
java.util.Map<java.lang.String,WeightedSpanTerm>
getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
java.util.Map<java.lang.String,WeightedSpanTerm>
getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
java.util.Map<java.lang.String,WeightedSpanTerm>
getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName, IndexReader reader)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
boolean
isCachedTokenStream()
protected boolean
isQueryUnsupported(java.lang.Class<? extends Query> clazz)
boolean
isUsePayloads()
protected boolean
mustRewriteQuery(SpanQuery spanQuery)
void
setExpandMultiTermQuery(boolean expandMultiTermQuery)
protected void
setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
A threshold on the number of characters to analyze.
void
setUsePayloads(boolean usePayloads)
void
setWrapIfNotCachingTokenFilter(boolean wrap)
By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset. If you are already using a different caching TokenStream implementation and you don't want it to be wrapped, set this to false.
-
-
-
Field Detail
-
fieldName
private java.lang.String fieldName
-
tokenStream
private TokenStream tokenStream
-
defaultField
private java.lang.String defaultField
-
expandMultiTermQuery
private boolean expandMultiTermQuery
-
cachedTokenStream
private boolean cachedTokenStream
-
wrapToCaching
private boolean wrapToCaching
-
maxDocCharsToAnalyze
private int maxDocCharsToAnalyze
-
usePayloads
private boolean usePayloads
-
internalReader
private LeafReader internalReader
-
-
Method Detail
-
extract
protected void extract(Query query, float boost, java.util.Map<java.lang.String,WeightedSpanTerm> terms) throws java.io.IOException
- Parameters:
query - Query to extract Terms from
terms - Map to place created WeightedSpanTerms in
- Throws:
java.io.IOException - If there is a low-level I/O error
-
isQueryUnsupported
protected boolean isQueryUnsupported(java.lang.Class<? extends Query> clazz)
-
extractUnknownQuery
protected void extractUnknownQuery(Query query, java.util.Map<java.lang.String,WeightedSpanTerm> terms) throws java.io.IOException
- Throws:
java.io.IOException
-
extractWeightedSpanTerms
protected void extractWeightedSpanTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost) throws java.io.IOException
- Parameters:
terms - Map to place created WeightedSpanTerms in
spanQuery - SpanQuery to extract Terms from
- Throws:
java.io.IOException - If there is a low-level I/O error
-
extractWeightedTerms
protected void extractWeightedTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, Query query, float boost) throws java.io.IOException
- Parameters:
terms - Map to place created WeightedSpanTerms in
query - Query to extract Terms from
- Throws:
java.io.IOException - If there is a low-level I/O error
-
fieldNameComparator
protected boolean fieldNameComparator(java.lang.String fieldNameToCheck)
Necessary to implement matches for queries against defaultField
-
getLeafContext
protected LeafReaderContext getLeafContext() throws java.io.IOException
- Throws:
java.io.IOException
-
getWeightedSpanTerms
public java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream) throws java.io.IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
- Parameters:
query - that caused hit
tokenStream - of text to be highlighted
- Returns:
Map containing WeightedSpanTerms
- Throws:
java.io.IOException - If there is a low-level I/O error
-
getWeightedSpanTerms
public java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName) throws java.io.IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
- Parameters:
query - that caused hit
tokenStream - of text to be highlighted
fieldName - restricts Terms used based on field name
- Returns:
Map containing WeightedSpanTerms
- Throws:
java.io.IOException - If there is a low-level I/O error
-
getWeightedSpanTermsWithScores
public java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName, IndexReader reader) throws java.io.IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Uses a supplied IndexReader to properly weight terms (for gradient highlighting).
- Parameters:
query - that caused hit
tokenStream - of text to be highlighted
fieldName - restricts Terms used based on field name
reader - to use for scoring
- Returns:
Map of WeightedSpanTerms with quasi tf/idf scores
- Throws:
java.io.IOException - If there is a low-level I/O error
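The phrase "quasi tf/idf scores" can be made concrete with a small, Lucene-free sketch. The documentation above does not state the formula, so the one used here, log(totalDocs / (docFreq + 1)) + 1, is an assumption chosen for illustration, and QuasiIdfSketch with its methods is a hypothetical name. The idea it shows is that a term's weight is scaled up when the term occurs in few documents.

```java
// Hypothetical helper; the formula is an assumed idf variant, not a statement
// of what getWeightedSpanTermsWithScores computes internally.
final class QuasiIdfSketch {
    private QuasiIdfSketch() {}

    // Inverse document frequency: larger when fewer documents contain the term.
    static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log(totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Scales an existing term weight (for example the query boost) by the idf.
    static float reweight(float weight, int totalNumDocs, int docFreq) {
        return weight * idf(totalNumDocs, docFreq);
    }
}
```

A highlighter that renders stronger colours for higher weights would then emphasise rare terms more than common ones, which is what gradient highlighting relies on.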
-
collectSpanQueryFields
protected void collectSpanQueryFields(SpanQuery spanQuery, java.util.Set<java.lang.String> fieldNames)
-
mustRewriteQuery
protected boolean mustRewriteQuery(SpanQuery spanQuery)
-
getExpandMultiTermQuery
public boolean getExpandMultiTermQuery()
-
setExpandMultiTermQuery
public void setExpandMultiTermQuery(boolean expandMultiTermQuery)
-
isUsePayloads
public boolean isUsePayloads()
-
setUsePayloads
public void setUsePayloads(boolean usePayloads)
-
isCachedTokenStream
public boolean isCachedTokenStream()
-
getTokenStream
public TokenStream getTokenStream()
Returns the tokenStream which may have been wrapped in a CachingTokenFilter. The getWeightedSpanTerms* methods set the tokenStream, so do not call this method before one of them has been called.
-
setWrapIfNotCachingTokenFilter
public void setWrapIfNotCachingTokenFilter(boolean wrap)
By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset. If you are already using a different caching TokenStream implementation and you don't want it to be wrapped, set this to false. This setting is ignored when a term vector based TokenStream is supplied, since it can be reset efficiently.
-
setMaxDocCharsToAnalyze
protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
A threshold on the number of characters to analyze. When a TokenStream based on term vectors with offsets and positions is supplied, this setting does not apply.
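One way to picture a character threshold on analysis is as a cutoff on token offsets. The sketch below is a simplified model under stated assumptions: SketchToken and OffsetLimitSketch are hypothetical names, tokens are assumed to arrive ordered by start offset, and the cutoff rule (stop at the first token starting at or beyond the threshold) is an assumption rather than a description of Lucene's internal behaviour.

```java
// Hypothetical token with character offsets; NOT a Lucene class.
final class SketchToken {
    final String text;
    final int startOffset;
    final int endOffset;

    SketchToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class OffsetLimitSketch {
    private OffsetLimitSketch() {}

    // Keeps tokens until one starts at or beyond maxDocCharsToAnalyze.
    // Assumes the input list is ordered by startOffset.
    static java.util.List<SketchToken> limit(java.util.List<SketchToken> tokens,
                                             int maxDocCharsToAnalyze) {
        java.util.List<SketchToken> kept = new java.util.ArrayList<>();
        for (SketchToken token : tokens) {
            if (token.startOffset >= maxDocCharsToAnalyze) {
                break; // everything after this point lies past the threshold
            }
            kept.add(token);
        }
        return kept;
    }
}
```

The practical effect of such a limit is that very long documents are only partially analyzed, so matches beyond the threshold would not be considered for highlighting.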
-
-