public class Stemmer
extends java.lang.Object
The static methods getStem(String term)
and getStems(String[] terms)
can be used to quickly convert a word or words to their root form. Example code:
import org.dlese.dpc.index.Stemmer;
...
String word = "oceanic";
String stem = Stemmer.getStem(word); // stem now equals 'ocean'
String string = "A group of words that need to be stemmed";
String[] words = string.split("\\s+"); // Split on white space
String[] stems = Stemmer.getStems(words);
for (int i = 0; i < stems.length; i++) {
    // ... do something with stems[i] ...
}
For more information about the Porter stemming algorithm, see http://www.tartarus.org/~martin/PorterStemmer .
Constructor and Description |
---|
Stemmer(): Constructor for the Stemmer object. |
Modifier and Type | Method and Description |
---|---|
void | add(char ch): Adds a character to the word being stemmed. |
void | add(char[] w, int wLen): Adds wLen characters from a portion of a char[] array to the word being stemmed. |
char[] | getResultBuffer(): Returns a reference to a character buffer containing the results of the stemming process. |
int | getResultLength(): Returns the length of the word resulting from the stemming process. |
static java.lang.String | getStem(java.lang.String term): Gets the stem of the given English word. |
static java.lang.String[] | getStems(java.lang.String[] terms): Gets the stems of the given English words. |
static void | main(java.lang.String[] args): Test program for demonstrating the Stemmer. |
void | stem(): Stems the word placed into the Stemmer buffer through calls to add(). |
static java.lang.String | stemWordsInLuceneClause(java.lang.String string): Stems each of the words in a given Lucene clause String, returning the same String with the word parts in stemmed form. |
static java.lang.String | stemWordsInString(java.lang.String string): Stems each of the words or tokens in a given String, returning a String of stemmed tokens with all other characters removed. |
java.lang.String | toString(): After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient. |
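The instance methods in the table above follow a fill, stem, read sequence: add() characters, call stem(), then read the result. The sketch below illustrates only that call order. ToyStemmer is a hypothetical stand-in that mirrors the method names listed above but merely strips one trailing 's'; it is not the Porter algorithm and not this class's implementation. With the real class, the same calls would presumably be made on a Stemmer instance.

```java
import java.util.Arrays;

/** Hypothetical stand-in mirroring the Stemmer method names; NOT the Porter algorithm. */
class ToyStemmer {
    private char[] buf = new char[16]; // holds the word, then the stemmed result
    private int len = 0;               // number of valid characters in buf

    /** Appends one character to the word being stemmed. */
    void add(char ch) {
        if (len == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2); // grow when full
        }
        buf[len++] = ch;
    }

    /** Appends wLen characters taken from the start of w. */
    void add(char[] w, int wLen) {
        for (int i = 0; i < wLen; i++) {
            add(w[i]);
        }
    }

    /** Toy rule (assumption): drop one trailing 's' from words longer than 3 characters. */
    void stem() {
        if (len > 3 && buf[len - 1] == 's') {
            len--;
        }
    }

    char[] getResultBuffer() { return buf; }

    int getResultLength() { return len; }

    @Override
    public String toString() { return new String(buf, 0, len); }
}

class ToyStemmerDemo {
    public static void main(String[] args) {
        ToyStemmer s = new ToyStemmer();
        char[] word = "oceans".toCharArray();
        s.add(word, word.length);
        s.stem();

        // Reading the buffer and length directly avoids allocating a new String.
        StringBuilder sb = new StringBuilder();
        sb.append(s.getResultBuffer(), 0, s.getResultLength());
        System.out.println(sb); // prints "ocean"

        // toString() is the simpler route when a String is needed anyway.
        System.out.println(s);  // prints "ocean"
    }
}
```

Only the first getResultLength() characters of the returned buffer are meaningful in this sketch; anything beyond that index is leftover capacity.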
public static final java.lang.String getStem(java.lang.String term)
term - A term in English.
public static final java.lang.String[] getStems(java.lang.String[] terms)
terms - A group of terms in English.
public static final java.lang.String stemWordsInString(java.lang.String string)
Example:
oceans and rain AND 44rains http://dlese.org/oceans
is transformed to
ocean and rain AND 44rain http dlese org ocean
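The transformation in the example above can be approximated as: split on runs of non-alphanumeric characters, stem each token, and join the results with single spaces. The sketch below is one way to write that, under stated assumptions. StringStemSketch, stemTokens, and toyStem are hypothetical names, and toyStem only strips one trailing 's' as a stand-in for the real stemmer; in practice Stemmer.getStem would be passed instead. The splitting rule is inferred from the example, not taken from the class's source.

```java
import java.util.StringJoiner;
import java.util.function.UnaryOperator;

class StringStemSketch {

    /** Toy stand-in for a real stemmer (assumption): drops one trailing 's'. */
    static String toyStem(String token) {
        return (token.length() > 3 && token.endsWith("s"))
                ? token.substring(0, token.length() - 1)
                : token;
    }

    /** Splits on non-alphanumeric runs, stems each token, joins with single spaces. */
    static String stemTokens(String input, UnaryOperator<String> stemmer) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : input.split("[^A-Za-z0-9]+")) {
            // split() yields a leading empty token when the input starts with a separator
            if (!token.isEmpty()) {
                out.add(stemmer.apply(token));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "oceans and rain AND 44rains http://dlese.org/oceans";
        // prints "ocean and rain AND 44rain http dlese org ocean"
        System.out.println(stemTokens(in, StringStemSketch::toyStem));
    }
}
```

Passing the stemmer as a UnaryOperator keeps the tokenizing logic independent of any particular stemming implementation.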
string - A word, phrase, or any arbitrary String.
public static final java.lang.String stemWordsInLuceneClause(java.lang.String string)
Example:
titles:("oceans AND oceans44 OR 44oceans and oceanic")^20 or cooled
is transformed to
titles:("ocean AND oceans44 OR 44ocean and ocean")^20 or cool
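The example above suggests that word tokens are stemmed in place while field names, operators, punctuation, and boosts are left untouched. The sketch below reproduces that behavior under assumptions inferred from the example only: a token followed directly by ':' is treated as a field name and skipped, and every other alphanumeric token is passed whole to the stemmer. ClauseStemSketch, stemClause, and toyStem are hypothetical names, and toyStem strips a few lowercase suffixes as a stand-in for the real stemmer; this is not the class's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClauseStemSketch {
    // An alphanumeric token, plus an optional ':' that marks it as a field name.
    private static final Pattern TOKEN = Pattern.compile("([A-Za-z0-9]+)(:?)");

    /** Toy stand-in for a real stemmer (assumption): strips one lowercase suffix. */
    static String toyStem(String t) {
        for (String suffix : new String[] {"ic", "ed", "s"}) {
            if (t.length() > suffix.length() + 2 && t.endsWith(suffix)) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    /** Stems tokens in place, copying every other character through unchanged. */
    static String stemClause(String clause) {
        Matcher m = TOKEN.matcher(clause);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(clause, last, m.start());     // punctuation and whitespace
            boolean isField = !m.group(2).isEmpty(); // token was followed by ':'
            out.append(isField ? m.group(1) : toyStem(m.group(1)));
            out.append(m.group(2));
            last = m.end();
        }
        out.append(clause, last, clause.length());   // trailing characters
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "titles:(\"oceans AND oceans44 OR 44oceans and oceanic\")^20 or cooled";
        // prints: titles:("ocean AND oceans44 OR 44ocean and ocean")^20 or cool
        System.out.println(stemClause(in));
    }
}
```

Because the whole token is handed to the stemmer, oceans44 is left alone (it does not end in a strippable suffix) while 44oceans becomes 44ocean, matching the example.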
string - A word, phrase, Lucene clause, or any arbitrary String.
public void add(char ch)
ch - The character to add to the word being stemmed.
public void add(char[] w, int wLen)
w - The char[] array containing the characters to add.
wLen - The number of characters from w to add.
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public int getResultLength()
public char[] getResultBuffer()
public void stem()
public static void main(java.lang.String[] args)
args - The command line arguments