Class CapitalizationFilterFactory
- java.lang.Object
  - org.apache.lucene.analysis.AbstractAnalysisFactory
    - org.apache.lucene.analysis.TokenFilterFactory
      - org.apache.lucene.analysis.miscellaneous.CapitalizationFilterFactory
public class CapitalizationFilterFactory extends TokenFilterFactory
Factory for CapitalizationFilter.
The factory takes parameters:
- "onlyFirstWord" - should only the first word be capitalized, or all of the words?
- "keep" - a keep word list: words that should be left unchanged, separated by whitespace.
- "keepIgnoreCase" - true or false. If true, the keep list is treated as case-insensitive.
- "forceFirstLetter" - force the first letter to be capitalized even if the word is in the keep list.
- "okPrefix" - do not change a word's capitalization if it begins with something in this list. For example, if "McK" is on the okPrefix list, the word "McKinley" should not be changed to "Mckinley".
- "minWordLength" - how long a word needs to be to get capitalization applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or".
- "maxWordCount" - if the token contains more than maxWordCount words, the capitalization is assumed to be correct.
<fieldType name="text_cptlztn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CapitalizationFilterFactory" onlyFirstWord="true" keep="java solr lucene" keepIgnoreCase="false" okPrefix="McK McD McA"/>
  </analyzer>
</fieldType>
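The per-word rules described by "keep", "minWordLength" and "okPrefix" can be illustrated with a small plain-Java sketch. This is not the Lucene implementation: the class name CapitalizationRulesSketch, the method capitalizeWord, and the order in which the checks run are assumptions made for illustration, and the sketch ignores "onlyFirstWord", "forceFirstLetter", "keepIgnoreCase" and "maxWordCount".

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the documented per-word rules; not Lucene code.
public class CapitalizationRulesSketch {

    // Returns the word unchanged if it is in the keep list, is shorter than
    // minWordLength, or starts with an okPrefix entry. Otherwise upper-cases
    // the first letter and lower-cases the rest.
    // The order of these checks is an assumption.
    public static String capitalizeWord(String word, Set<String> keep,
                                        List<String> okPrefix, int minWordLength) {
        if (word.isEmpty() || keep.contains(word)) {
            return word;
        }
        if (word.length() < minWordLength) {
            return word;
        }
        for (String prefix : okPrefix) {
            if (word.startsWith(prefix)) {
                return word;
            }
        }
        return Character.toUpperCase(word.charAt(0))
                + word.substring(1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Values taken from the configuration example and parameter list above.
        Set<String> keep = Set.of("java", "solr", "lucene");
        List<String> okPrefix = List.of("McK", "McD", "McA");
        for (String w : new String[] {"and", "or", "McKinley", "mckinley", "java"}) {
            System.out.println(w + " -> " + capitalizeWord(w, keep, okPrefix, 3));
        }
    }
}
```

With minWordLength set to 3, the sketch turns "and" into "And" and leaves "or" alone, matching the parameter description; "McKinley" is left untouched because it starts with the okPrefix entry "McK".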
- Since: solr 1.3
Field Summary
Fields (Modifier and Type, Field, Description):
- static java.lang.String FORCE_FIRST_LETTER
- (package private) boolean forceFirstLetter
- (package private) CharArraySet keep
- static java.lang.String KEEP
- static java.lang.String KEEP_IGNORE_CASE
- static java.lang.String MAX_TOKEN_LENGTH
- static java.lang.String MAX_WORD_COUNT
- (package private) int maxTokenLength
- (package private) int maxWordCount
- static java.lang.String MIN_WORD_LENGTH
- (package private) int minWordLength
- static java.lang.String NAME: SPI name
- static java.lang.String OK_PREFIX
- (package private) java.util.Collection<char[]> okPrefix
- static java.lang.String ONLY_FIRST_WORD
- (package private) boolean onlyFirstWord
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors (Constructor, Description):
- CapitalizationFilterFactory(): Default ctor for compatibility with SPI
- CapitalizationFilterFactory(java.util.Map<java.lang.String,java.lang.String> args): Creates a new CapitalizationFilterFactory
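As a rough illustration of the Map-based constructor, the sketch below builds an argument map that mirrors the attributes of the fieldType example in the class description. The class name CapitalizationArgsSketch and the method buildArgs are hypothetical. A mutable HashMap is used on the assumption that the factory may consume entries while parsing them. Calling the constructor itself requires Lucene on the classpath, so it appears only in a comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing the shape of the constructor arguments.
public class CapitalizationArgsSketch {

    // Builds the String-to-String arguments using the parameter names
    // listed in the class description.
    public static Map<String, String> buildArgs() {
        Map<String, String> args = new HashMap<>();
        args.put("onlyFirstWord", "true");
        args.put("keep", "java solr lucene");
        args.put("keepIgnoreCase", "false");
        args.put("okPrefix", "McK McD McA");
        return args;
    }

    public static void main(String[] unused) {
        Map<String, String> args = buildArgs();
        System.out.println(args);
        // With Lucene available, the map would be passed along these lines:
        //   CapitalizationFilterFactory factory = new CapitalizationFilterFactory(args);
        //   CapitalizationFilter filter = factory.create(input); // input is a TokenStream
    }
}
```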
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
- CapitalizationFilter create(TokenStream input): Transform the specified input TokenStream
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
-
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
-
NAME
public static final java.lang.String NAME
SPI name
- See Also:
- Constant Field Values
-
KEEP
public static final java.lang.String KEEP
- See Also:
- Constant Field Values
-
KEEP_IGNORE_CASE
public static final java.lang.String KEEP_IGNORE_CASE
- See Also:
- Constant Field Values
-
OK_PREFIX
public static final java.lang.String OK_PREFIX
- See Also:
- Constant Field Values
-
MIN_WORD_LENGTH
public static final java.lang.String MIN_WORD_LENGTH
- See Also:
- Constant Field Values
-
MAX_WORD_COUNT
public static final java.lang.String MAX_WORD_COUNT
- See Also:
- Constant Field Values
-
MAX_TOKEN_LENGTH
public static final java.lang.String MAX_TOKEN_LENGTH
- See Also:
- Constant Field Values
-
ONLY_FIRST_WORD
public static final java.lang.String ONLY_FIRST_WORD
- See Also:
- Constant Field Values
-
FORCE_FIRST_LETTER
public static final java.lang.String FORCE_FIRST_LETTER
- See Also:
- Constant Field Values
-
keep
CharArraySet keep
-
okPrefix
java.util.Collection<char[]> okPrefix
-
minWordLength
final int minWordLength
-
maxWordCount
final int maxWordCount
-
maxTokenLength
final int maxTokenLength
-
onlyFirstWord
final boolean onlyFirstWord
-
forceFirstLetter
final boolean forceFirstLetter
-
-
Method Detail
-
create
public CapitalizationFilter create(TokenStream input)
Description copied from class: TokenFilterFactory
Transform the specified input TokenStream
- Specified by:
- create in class TokenFilterFactory