Package org.apache.lucene.analysis.br
Class BrazilianStemmer
- java.lang.Object
  - org.apache.lucene.analysis.br.BrazilianStemmer
public class BrazilianStemmer extends java.lang.Object
A stemmer for Brazilian Portuguese words.
Constructor Summary
BrazilianStemmer()
-
Method Summary
private java.lang.String changeTerm(java.lang.String value)
1) Turn to lowercase 2) Remove accents 3) ã -> a ; õ -> o 4) ç -> c
private void createCT(java.lang.String term)
Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
private java.lang.String getR1(java.lang.String value)
Gets R1.
private java.lang.String getRV(java.lang.String value)
Gets RV.
private boolean isIndexable(java.lang.String term)
Checks whether a term can be indexed.
private boolean isStemmable(java.lang.String term)
Checks whether a term can be processed correctly.
private boolean isVowel(char value)
Checks whether a character is one of 'a', 'e', 'i', 'o', 'u'.
public java.lang.String log()
For logging and debugging purposes.
private java.lang.String removeSuffix(java.lang.String value, java.lang.String toRemove)
Remove a string suffix.
private java.lang.String replaceSuffix(java.lang.String value, java.lang.String toReplace, java.lang.String changeTo)
Replace a string suffix by another.
protected java.lang.String stem(java.lang.String term)
Stems the given term to a unique discriminator.
private boolean step1()
Standard suffix removal.
private boolean step2()
Verb suffixes.
private void step3()
Delete suffix 'i' if in RV and preceded by 'c'.
private void step4()
Residual suffix.
private void step5()
If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i').
private boolean suffix(java.lang.String value, java.lang.String suffix)
Check if a string ends with a suffix.
private boolean suffixPreceded(java.lang.String value, java.lang.String suffix, java.lang.String preceded)
Checks whether a suffix is preceded by a given string.
Method Detail
-
stem
protected java.lang.String stem(java.lang.String term)
Stems the given term to a unique discriminator.
- Parameters:
- term - The term that should be stemmed.
- Returns:
- Discriminator for term
-
isStemmable
private boolean isStemmable(java.lang.String term)
Checks whether a term can be processed correctly.
- Returns:
- true if, and only if, the given term consists only of letters.
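The "letters only" rule can be sketched as follows. This is a minimal illustration and not the Lucene source: the class name `StemmableSketch` is hypothetical, and both the use of `Character.isLetter` and the handling of `null` are assumptions.

```java
// Hypothetical sketch; not the Lucene implementation.
final class StemmableSketch {

    /** Returns true only if every character of the term is a letter. */
    static boolean isStemmable(String term) {
        if (term == null) {
            return false; // assumption: a missing term is not stemmable
        }
        for (int i = 0; i < term.length(); i++) {
            if (!Character.isLetter(term.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isStemmable("casa"));  // true
        System.out.println(isStemmable("casa2")); // false
    }
}
```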
-
isIndexable
private boolean isIndexable(java.lang.String term)
Checks whether a term can be indexed.
- Returns:
- true if it can be indexed
-
isVowel
private boolean isVowel(char value)
Checks whether a character is one of 'a', 'e', 'i', 'o', 'u'.
- Returns:
- true if the character is a vowel
-
getR1
private java.lang.String getR1(java.lang.String value)
Gets R1.
R1 is the region after the first non-vowel following a vowel, or the null region at the end of the word if there is no such non-vowel.
- Returns:
- null or a string representing R1
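The R1 definition above can be sketched like this. It is an illustration under assumptions, not the Lucene source: the class name `R1Sketch` is hypothetical, the vowel set is limited to the five unaccented vowels listed for `isVowel`, and returning `null` for the "null region" follows the Returns note.

```java
// Hypothetical sketch of the R1 region; not the Lucene implementation.
final class R1Sketch {

    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    /** Region after the first non-vowel that follows a vowel, or null if none. */
    static String r1(String value) {
        if (value == null) {
            return null;
        }
        int i = 0;
        // Skip leading non-vowels to reach the first vowel.
        while (i < value.length() && !isVowel(value.charAt(i))) {
            i++;
        }
        // Skip that vowel run to reach the first following non-vowel.
        while (i < value.length() && isVowel(value.charAt(i))) {
            i++;
        }
        if (i >= value.length()) {
            return null; // no non-vowel follows a vowel
        }
        return value.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(r1("brasil")); // il
        System.out.println(r1("casa"));   // a
        System.out.println(r1("boa"));    // null
    }
}
```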
-
getRV
private java.lang.String getRV(java.lang.String value)
Gets RV.
If the second letter is a consonant, RV is the region after the next following vowel;
or, if the first two letters are vowels, RV is the region after the next consonant;
otherwise (consonant-vowel case), RV is the region after the third letter.
RV is the end of the word if these positions cannot be found.
- Returns:
- null or a string representing RV
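The three RV cases can be sketched as below. This is an illustration only, not the Lucene source: `RvSketch` is a hypothetical name, only the five unaccented vowels count as vowels, and returning `null` when the position cannot be found is an assumption based on the Returns note.

```java
// Hypothetical sketch of the RV region; not the Lucene implementation.
final class RvSketch {

    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    static String rv(String value) {
        if (value == null || value.length() < 2) {
            return null;
        }
        int n = value.length();
        if (!isVowel(value.charAt(1))) {
            // Second letter is a consonant: region after the next vowel.
            for (int i = 2; i < n; i++) {
                if (isVowel(value.charAt(i))) {
                    return value.substring(i + 1);
                }
            }
            return null;
        }
        if (isVowel(value.charAt(0))) {
            // First two letters are vowels: region after the next consonant.
            for (int i = 2; i < n; i++) {
                if (!isVowel(value.charAt(i))) {
                    return value.substring(i + 1);
                }
            }
            return null;
        }
        // Consonant-vowel case: region after the third letter.
        return n >= 3 ? value.substring(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(rv("oliva")); // va
        System.out.println(rv("aureo")); // eo
        System.out.println(rv("macho")); // ho
    }
}
```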
-
changeTerm
private java.lang.String changeTerm(java.lang.String value)
1) Turn to lowercase
2) Remove accents
3) ã -> a ; õ -> o
4) ç -> c
- Returns:
- null or the transformed string
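One way to get the same four transformations is Unicode decomposition with `java.text.Normalizer`. This is a substitute technique chosen for brevity, not necessarily how the class does it; `ChangeTermSketch` is a hypothetical name, and note that this approach strips every combining mark, not only the ones listed.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch; not the Lucene implementation.
final class ChangeTermSketch {

    /** Lowercases the value and strips accents, tildes and cedillas. */
    static String changeTerm(String value) {
        if (value == null) {
            return null;
        }
        String lower = value.toLowerCase(Locale.ROOT);
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Drop the combining marks, leaving only the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00c7 is C with cedilla, \u00c3 is A with tilde.
        System.out.println(changeTerm("A\u00c7\u00c3O")); // acao
    }
}
```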
-
suffix
private boolean suffix(java.lang.String value, java.lang.String suffix)
Check if a string ends with a suffix.
- Returns:
- true if the string ends with the specified suffix
-
replaceSuffix
private java.lang.String replaceSuffix(java.lang.String value, java.lang.String toReplace, java.lang.String changeTo)
Replace a string suffix by another.
- Returns:
- the replaced String
-
removeSuffix
private java.lang.String removeSuffix(java.lang.String value, java.lang.String toRemove)
Remove a string suffix.
- Returns:
- the String without the suffix
-
suffixPreceded
private boolean suffixPreceded(java.lang.String value, java.lang.String suffix, java.lang.String preceded)
Checks whether a suffix is preceded by a given string.
- Returns:
- true if the suffix is preceded by the given string
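The four suffix helpers (suffix, replaceSuffix, removeSuffix, suffixPreceded) are simple string operations. A minimal sketch follows, assuming plain `String.endsWith` semantics and that a value is returned unchanged when the suffix does not match. The class name `SuffixHelpersSketch` is hypothetical and the null handling is an assumption, not documented behavior.

```java
// Hypothetical sketch of the suffix helpers; not the Lucene implementation.
final class SuffixHelpersSketch {

    static boolean suffix(String value, String suffix) {
        return value != null && suffix != null && value.endsWith(suffix);
    }

    static String removeSuffix(String value, String toRemove) {
        if (!suffix(value, toRemove)) {
            return value; // assumption: no match leaves the value unchanged
        }
        return value.substring(0, value.length() - toRemove.length());
    }

    static String replaceSuffix(String value, String toReplace, String changeTo) {
        if (!suffix(value, toReplace) || changeTo == null) {
            return value;
        }
        return removeSuffix(value, toReplace) + changeTo;
    }

    /** True if value ends with suffix and the text before it ends with preceded. */
    static boolean suffixPreceded(String value, String suffix, String preceded) {
        if (!suffix(value, suffix) || preceded == null) {
            return false;
        }
        return removeSuffix(value, suffix).endsWith(preceded);
    }

    public static void main(String[] args) {
        System.out.println(removeSuffix("cantando", "ando"));          // cant
        System.out.println(replaceSuffix("biologia", "logia", "log")); // biolog
        System.out.println(suffixPreceded("publici", "i", "c"));       // true
    }
}
```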
-
createCT
private void createCT(java.lang.String term)
Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
-
step1
private boolean step1()
Standard suffix removal. Search for the longest among the following suffixes, and perform the following actions:
- Returns:
- false if no ending was removed
-
step2
private boolean step2()
Verb suffixes. Search for the longest among the following suffixes in RV, and if found, delete.
- Returns:
- false if no ending was removed
-
step3
private void step3()
Delete suffix 'i' if in RV and preceded by 'c'
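As an illustration of this rule, the sketch below takes the word and its RV region as explicit arguments. That is an assumption made for self-containment: the real method takes no parameters and presumably works on instance state. `Step3Sketch` is a hypothetical name.

```java
// Hypothetical sketch of step 3; not the Lucene implementation.
final class Step3Sketch {

    /** Drops a final 'i' when it lies in RV and is preceded by 'c'. */
    static String step3(String word, String rv) {
        if (word == null || rv == null) {
            return word;
        }
        if (rv.endsWith("i") && word.endsWith("ci")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        // For "publici" the RV region is "lici" (consonant-vowel case).
        System.out.println(step3("publici", "lici")); // public
    }
}
```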
-
step4
private void step4()
Residual suffix. If the word ends with one of the suffixes (os a i o á í ó) in RV, delete it.
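A minimal sketch of the residual-suffix rule, again passing the word and its RV region explicitly (an assumption; the real method takes no parameters). `Step4Sketch` is a hypothetical name, and checking "os" first so that the longest suffix wins is also an assumption.

```java
// Hypothetical sketch of step 4; not the Lucene implementation.
final class Step4Sketch {

    // Longest suffix first; the accented vowels are written as Unicode escapes
    // (a-acute, i-acute, o-acute).
    private static final String[] RESIDUAL = {
        "os", "a", "i", "o", "\u00e1", "\u00ed", "\u00f3"
    };

    /** Deletes a residual suffix when it lies inside RV. */
    static String step4(String word, String rv) {
        if (word == null || rv == null) {
            return word;
        }
        for (String s : RESIDUAL) {
            if (rv.endsWith(s) && word.endsWith(s)) {
                return word.substring(0, word.length() - s.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // For "meninos" the RV region is "inos" (consonant-vowel case).
        System.out.println(step4("meninos", "inos")); // menin
    }
}
```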
-
step5
private void step5()
If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i'). Or, if the word ends with 'ç', remove the cedilla.
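The two branches of this step can be sketched as follows, with the word and its RV region passed as explicit arguments (an assumption; the real method takes no parameters). `Step5Sketch` and `dropLast` are hypothetical names.

```java
// Hypothetical sketch of step 5; not the Lucene implementation.
final class Step5Sketch {

    private static String dropLast(String s) {
        return s.substring(0, s.length() - 1);
    }

    static String step5(String word, String rv) {
        if (word == null || word.isEmpty()) {
            return word;
        }
        if (rv != null && !rv.isEmpty() && word.endsWith(rv)) {
            char last = rv.charAt(rv.length() - 1);
            // 'e', e-acute or e-circumflex at the end of RV.
            if (last == 'e' || last == '\u00e9' || last == '\u00ea') {
                word = dropLast(word);
                rv = dropLast(rv);
                boolean gu = word.endsWith("gu") && rv.endsWith("u");
                boolean ci = word.endsWith("ci") && rv.endsWith("i");
                return (gu || ci) ? dropLast(word) : word;
            }
        }
        // Otherwise, a final c-cedilla loses its cedilla.
        if (word.endsWith("\u00e7")) {
            return dropLast(word) + "c";
        }
        return word;
    }

    public static void main(String[] args) {
        // For "segue" the RV region is "ue" (consonant-vowel case).
        System.out.println(step5("segue", "ue")); // seg
    }
}
```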
-
log
public java.lang.String log()
For logging and debugging purposes.
- Returns:
- TERM, CT, RV, R1 and R2
-
-