public class IndexingTools
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
static java.lang.String | adminDefaultFieldName: Admin default field 'admindefault' |
static java.lang.String | defaultFieldName: Default field 'default' |
static java.lang.String | PHRASE_SEPARATOR: String used to separate and preserve phrases indexed as text, includes leading and trailing white space. |
static java.lang.String | stemsFieldName: Stems field 'stems' |
Constructor and Description |
---|
IndexingTools() |
Modifier and Type | Method and Description |
---|---|
static void | addToAdminDefaultField(org.apache.lucene.document.Document myDoc, java.lang.String content): Indexes the given text into the admin default field. |
static void | addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, java.lang.String content): Indexes the given text into the default and stems fields. |
static java.lang.String | encodeToTerm(java.lang.String text): Same as org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String). |
static java.lang.String | encodeToTerm(java.lang.String text, boolean encodeWildCards): Same as org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String,boolean). |
static java.lang.String[] | extractSeparatePhrasesFromString(java.lang.String separatedPhrases): Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings). |
static java.lang.String[] | extractStringsFromString(java.lang.String separatedWords): Extracts the words from a String that was created using the method makeStringFromNodes(List nodes). |
static java.lang.String[] | getAnalyzedTerms(java.lang.String textToParse, java.lang.String field, org.apache.lucene.analysis.Analyzer analyzer): Extracts all terms in any field from a Lucene query using the given Analyzer. |
static org.apache.lucene.analysis.Token[] | getAnalyzedTokens(java.lang.String textToParse, java.lang.String field, org.apache.lucene.analysis.Analyzer analyzer): Extracts all Tokens from a Lucene query using the given Analyzer. |
static java.lang.StringBuffer | getAnalyzerOutput(java.lang.String textToParse, java.lang.String field, org.apache.lucene.analysis.Analyzer analyzer): Creates a StringBuffer to display the tokens created by a given analyzer. |
static java.lang.String | makeSeparatePhrasesFromNodes(java.util.List nodes): Creates a String separated by the phrase separator term from the text of each of the Element or Attribute dom4j Nodes provided. |
static java.lang.String | makeSeparatePhrasesFromStrings(java.util.List strings): Creates a String separated by the phrase separator term from each of the Strings provided. |
static java.lang.String | makeSeparatePhrasesFromStrings(java.lang.String[] strings): Creates a String separated by the phrase separator term from each of the Strings provided. |
static java.lang.String | makeStringFromNodes(java.util.List nodes): Creates a String separated by spaces from the text of each of the Element or Attribute dom4j Nodes provided. |
static java.lang.String | tokenizeID(java.lang.String ID): Tokenizes a DLESE ID by replacing the char - with a blank space. |
static java.lang.String | tokenizeURI(java.lang.String uri): Tokenizes a URI by replacing the unindexable chars with a blank space. |
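
The two tokenizing methods that close the summary above are described as plain character replacements, which a short stand-alone sketch can make concrete. The code below does not call IndexingTools. The class name TokenizeSketch, its method names, and the sample ID and URL are hypothetical, and the set of characters treated as unindexable is an assumption, because this page does not list the characters that tokenizeURI actually replaces.

```java
class TokenizeSketch {
    // Mirrors the documented behavior of tokenizeID: every '-' becomes a blank space.
    static String tokenizeId(String id) {
        return id.replace('-', ' ');
    }

    // ASSUMPTION: every character that is not a letter or digit counts as unindexable.
    // The real set used by IndexingTools.tokenizeURI is not documented on this page.
    static String tokenizeUri(String uri) {
        return uri.replaceAll("[^\\p{L}\\p{N}]", " ");
    }

    public static void main(String[] args) {
        // Illustrative inputs only.
        System.out.println(tokenizeId("DLESE-000-000-000-001"));
        System.out.println(tokenizeUri("http://example.org/a-b?id=1"));
    }
}
```

Replacing each character with a single space, rather than deleting it, keeps the parts of an ID or URI as separate tokens that a whitespace-aware analyzer can index individually.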
public static final java.lang.String defaultFieldName
public static final java.lang.String stemsFieldName
public static final java.lang.String adminDefaultFieldName
public static final java.lang.String PHRASE_SEPARATOR
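To illustrate what PHRASE_SEPARATOR is for, here is a minimal sketch of the join-and-split round trip that makeSeparatePhrasesFromStrings and extractSeparatePhrasesFromString describe. It does not call IndexingTools. The separator value " xphrasesepx ", the class name PhraseSeparatorSketch, and its method names are hypothetical stand-ins, since the actual value of the constant is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PhraseSeparatorSketch {
    // Hypothetical separator; the real PHRASE_SEPARATOR value is not documented here.
    // Like the real constant, it includes leading and trailing white space.
    static final String SEPARATOR = " xphrasesepx ";

    // Joins phrases with the separator so phrase boundaries survive text indexing.
    // Returning an empty String for null input is this sketch's own choice.
    static String join(List<String> phrases) {
        if (phrases == null) {
            return "";
        }
        return String.join(SEPARATOR, phrases);
    }

    // Recovers the phrases by splitting on the separator token and trimming each part.
    static String[] split(String separated) {
        List<String> out = new ArrayList<>();
        for (String part : separated.split(Pattern.quote(SEPARATOR.trim()))) {
            String phrase = part.trim();
            if (!phrase.isEmpty()) {
                out.add(phrase);
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("ocean currents", "plate tectonics"));
        System.out.println(joined);
        System.out.println(Arrays.toString(split(joined)));
    }
}
```

A separator that is itself an indexable term, padded with white space, keeps multi-word phrases from running together when the joined String is analyzed as ordinary text.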
public static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, java.lang.String content)
myDoc - Document to add to
content - Content to add
public static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc, java.lang.String content)
myDoc - Document to add to
content - Content to add
public static final java.lang.String makeSeparatePhrasesFromNodes(java.util.List nodes)
A call to this method might look like:
String value = makeSeparatePhrasesFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));
nodes - List of Elements or Attributes
public static final java.lang.String makeSeparatePhrasesFromStrings(java.util.List strings)
strings - List of Strings or null
public static final java.lang.String makeSeparatePhrasesFromStrings(java.lang.String[] strings)
strings - Array of Strings or null
public static final java.lang.String[] extractSeparatePhrasesFromString(java.lang.String separatedPhrases)
Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).
separatedPhrases - String that contains the phrase separator to separate phrases
public static final java.lang.String makeStringFromNodes(java.util.List nodes)
A call to this method might look like:
String value = makeStringFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));
nodes - List of dom4j Nodes of Elements or Attributes
public static final java.lang.String[] extractStringsFromString(java.lang.String separatedWords)
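Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).
separatedWords - String of words separated by spaces, as created by makeStringFromNodes(List nodes)
public static final java.lang.String tokenizeID(java.lang.String ID)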
ID - The ID String
public static final java.lang.String tokenizeURI(java.lang.String uri)
uri - A URL or URI
public static final java.lang.String encodeToTerm(java.lang.String text)
text - Text
public static final java.lang.String encodeToTerm(java.lang.String text, boolean encodeWildCards)
text - Text
encodeWildCards - True to encode the '*' wildcard char, false to leave unencoded.
public static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(java.lang.String textToParse, java.lang.String field, org.apache.lucene.analysis.Analyzer analyzer)
Extracts all Tokens from a Lucene query using the given Analyzer.
textToParse - The text to analyze with the analyzer
analyzer - The analyzer to use
field - The field this Analyzer should interpret the text as, or null to use 'default'
public static final java.lang.String[] getAnalyzedTerms(java.lang.String textToParse, java.lang.String field, org.apache.lucene.analysis.Analyzer analyzer)
Extracts all terms in any field from a Lucene query using the given Analyzer.
textToParse - The text to analyze with the analyzer
analyzer - The analyzer to use
field - The field this Analyzer should interpret the text as, or null to use 'default'
public static final java.lang.StringBuffer getAnalyzerOutput(java.lang.String textToParse, java.lang.String field, org.apache.lucene.analysis.Analyzer analyzer)
textToParse - The text to analyze with the analyzer
analyzer - The analyzer to use
field - The Lucene field name, or null to use default