public class XMLIndexer
extends java.lang.Object
Creates a Lucene Document from any well-formed XML. Individual
field names are derived from the xPath to each element and attribute in the XML instance document. Fields
are encoded to support text, keyword and stemmed search. Also creates standard fields for IDs, URLs, title,
description and geospatial bounding box footprint. The 'default' and 'stems' fields are also indexed as text and stemmed text, respectively.
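The idea of deriving field names from xPaths can be sketched with the JDK's built-in DOM parser. This is an illustration only: the class name `XPathFieldNameSketch`, the method `fieldsFor`, and the exact path syntax (`/record/title`, `/record/@id`) are assumptions made for the example, and the actual field-name encoding used by XMLIndexer is not specified on this page.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLIndexer: maps xPath-like names to text values.
final class XPathFieldNameSketch {

    /** Parses the XML and returns xPath-like field names mapped to their text content. */
    static Map<String, String> fieldsFor(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> fields = new LinkedHashMap<>();
            collect(root, "/" + root.getNodeName(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Recursively records attributes and direct text of each element under its path. */
    private static void collect(Element element, String path, Map<String, String> fields) {
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            fields.put(path + "/@" + attribute.getNodeName(), attribute.getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, path + "/" + child.getNodeName(), fields);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getNodeValue());
            }
        }
        // Simplification: repeated sibling elements share a path, so the last value wins.
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.put(path, value);
        }
    }

    public static void main(String[] args) {
        String xml = "<record id=\"r1\"><title>Sea ice</title>"
                + "<geo><north>71.5</north></geo></record>";
        fieldsFor(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A real indexer would typically add each name and value pair to the search document several times (as text, keyword and stemmed forms); the sketch stops at collecting the pairs.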
An XMLIndexerFieldsConfig may be supplied to configure specific search fields for given XML
formats. If a field is defined in the XMLIndexerFieldsConfig and content is available at the given xPath,
it overrides the value set for ids, urls, title or description. In addition, field values configured
by schema override those configured by xmlFormat.
See Also:
XMLIndexerFieldsConfig
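The override order described above can be sketched as a small precedence function. The names `FieldOverrideSketch`, `hasContent` and `resolve` are hypothetical and exist only for this illustration; treating a blank string as "no content available" is also an assumption, since this page does not say how XMLIndexer decides that content is available at an xPath.

```java
// Hypothetical illustration of the documented precedence; not XMLIndexer's actual code.
final class FieldOverrideSketch {

    /** Assumption: a configured xPath "has content" when it yields a non-blank string. */
    static boolean hasContent(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /**
     * Picks the effective value for a standard field such as title.
     * Precedence, highest first: configured by schema, configured by xmlFormat,
     * then the value set directly on the indexer.
     */
    static String resolve(String setValue, String byXmlFormat, String bySchema) {
        if (hasContent(bySchema)) {
            return bySchema;
        }
        if (hasContent(byXmlFormat)) {
            return byXmlFormat;
        }
        return setValue;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Set title", "Format title", "Schema title"));
        System.out.println(resolve("Set title", "Format title", null));
        System.out.println(resolve("Set title", "", null));
    }
}
```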
Constructor and Description |
---|
XMLIndexer(org.dom4j.Document localizedXmlDocument, java.lang.String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig): Constructs an XMLIndexer from a localized dom4j Document. |
XMLIndexer(java.lang.String xmlString, java.lang.String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig): Constructs an XMLIndexer from an XML string. |
XMLIndexer(java.net.URL urlToXml, java.lang.String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig): Constructs an XMLIndexer from the XML document at a URL. |
Modifier and Type | Method and Description |
---|---|
BoundingBox | getBoundingBox(): Returns the value of boundingBox. |
java.lang.String | getDescription(): Returns the value of description. |
java.lang.String | getFullXmlAttributeContent(): Gets the full content of each Attribute in the XML. |
java.lang.String | getFullXmlElementContent(): Gets the full content of each Element in the XML. |
java.lang.String[] | getIds(): Returns the value of ids. |
java.lang.String[] | getIdsEncoded(): Returns unique IDs for the item being indexed, encoded for indexing. |
java.util.List | getRelatedIds(): Gets the ids of related records. |
java.util.Map | getRelatedIdsMap(): Gets the ids of related records. |
java.util.List | getRelatedUrls(): Gets the urls of related records. |
java.util.Map | getRelatedUrlsMap(): Gets the urls of related records. |
java.lang.String | getTitle(): Returns the value of title. |
java.lang.String[] | getUrls(): Returns the value of urls. |
org.dom4j.Document | getXmlDocument(): Gets the localized Dom4j Document for this XML instance. |
java.lang.String | getXPathFieldsPrefix(): Returns the value of xPathFieldsPrefix, or null if none. |
void | indexFields(org.apache.lucene.document.Document luceneDoc): Indexes the contents of the XML, adding fields to the Lucene Document that is supplied. |
boolean | indexJavaBeanFields(org.apache.lucene.document.Document luceneDoc): Indexes Java Bean XML that was encoded with the java.beans.XMLEncoder class, using the bean properties as field names. |
void | indexXpathFields(org.apache.lucene.document.Document luceneDoc): Indexes the content of each element and attribute in the source XML as individual search fields, using the xPath to the element or attribute as the field name. |
void | setBoundingBox(BoundingBox boundingBox): Sets the value of boundingBox. |
void | setDescription(java.lang.String description): Sets the value of description. |
void | setIds(java.lang.String[] ids): Sets the value of ids. |
void | setIndexDefaultAndStemsField(boolean indexDefaultAndStemsField): Sets whether to index the default, admindefault, and stems fields for this record. |
void | setTitle(java.lang.String title): Sets the value of title. |
void | setUrls(java.lang.String[] urls): Sets the value of urls. |
void | setXPathFieldsPrefix(java.lang.String xPathFieldsPrefix): Sets the value of xPathFieldsPrefix, which is prepended to the xPath field names when indexed. |
public XMLIndexer(org.dom4j.Document localizedXmlDocument, java.lang.String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
Parameters:
localizedXmlDocument - A localized XML Document
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used

public XMLIndexer(java.lang.String xmlString, java.lang.String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig) throws java.lang.Exception
Parameters:
xmlString - A valid XML string
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used
Throws:
java.lang.Exception - If an error occurs

public XMLIndexer(java.net.URL urlToXml, java.lang.String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig) throws java.lang.Exception
Parameters:
urlToXml - URL to an XML document
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used
Throws:
java.lang.Exception - If an error occurs

public void setIndexDefaultAndStemsField(boolean indexDefaultAndStemsField) throws java.lang.IllegalStateException
Parameters:
indexDefaultAndStemsField - The value to assign indexDefaultAndStemsField.
Throws:
java.lang.IllegalStateException - If called after indexFields has been called

public java.lang.String getTitle() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public void setTitle(java.lang.String title) throws java.lang.IllegalStateException
Parameters:
title - The value to assign title.
Throws:
java.lang.IllegalStateException - If called after indexFields has been called

public java.lang.String getDescription() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public void setDescription(java.lang.String description) throws java.lang.IllegalStateException
Parameters:
description - The value to assign description.
Throws:
java.lang.IllegalStateException - If called after indexFields has been called

public java.lang.String[] getUrls() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public void setUrls(java.lang.String[] urls) throws java.lang.IllegalStateException
Parameters:
urls - The value to assign urls.
Throws:
java.lang.IllegalStateException - If called after indexFields has been called

public java.lang.String[] getIds() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public void setIds(java.lang.String[] ids) throws java.lang.IllegalStateException
Parameters:
ids - The value to assign ids.
Throws:
java.lang.IllegalStateException - If called after indexFields has been called

public java.lang.String[] getIdsEncoded() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called
See Also:
getIds()
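The calling order implied by the IllegalStateException rules above (setters before indexFields, getters after) can be sketched with a small stand-in class. `IndexerLifecycleSketch` is hypothetical and only mimics the documented guard behaviour for a single title field; it says nothing about how XMLIndexer tracks its state internally, and its no-argument `indexFields()` is a simplification of the real method, which takes a Lucene Document.

```java
// Hypothetical stand-in that mimics the documented call-order rules for one field.
final class IndexerLifecycleSketch {
    private String title;
    private boolean indexed;

    /** Allowed only before indexFields(), mirroring the documented setter rule. */
    void setTitle(String title) {
        if (indexed) {
            throw new IllegalStateException("setTitle called after indexFields");
        }
        this.title = title;
    }

    /** Marks the transition from the configuration phase to the read phase. */
    void indexFields() {
        indexed = true;
    }

    /** Allowed only after indexFields(), mirroring the documented getter rule. */
    String getTitle() {
        if (!indexed) {
            throw new IllegalStateException("getTitle called before indexFields");
        }
        return title;
    }

    public static void main(String[] args) {
        IndexerLifecycleSketch indexer = new IndexerLifecycleSketch();
        indexer.setTitle("Sea ice");   // configure first
        indexer.indexFields();         // then index
        System.out.println(indexer.getTitle()); // read results last
    }
}
```

In practice this means configuring the indexer with its setters, calling indexFields once, and only then reading values such as the title, ids or urls.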
public java.util.List getRelatedIds() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public java.util.List getRelatedUrls() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public java.util.Map getRelatedIdsMap() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public java.util.Map getRelatedUrlsMap() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public java.lang.String getXPathFieldsPrefix()
public void setXPathFieldsPrefix(java.lang.String xPathFieldsPrefix) throws java.lang.IllegalStateException
Parameters:
xPathFieldsPrefix - The value to prepend to the xPath fields, or null for none
Throws:
java.lang.IllegalStateException
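The effect of the prefix on a field name can be pictured as plain string concatenation. This is a sketch under assumptions: `XPathPrefixSketch` and `fieldName` are hypothetical names, and whether XMLIndexer inserts any separator between the prefix and the xPath is not stated on this page.

```java
// Hypothetical illustration: a null prefix leaves the xPath field name unchanged.
final class XPathPrefixSketch {

    /** Returns the field name with the prefix placed in front, or the bare xPath if prefix is null. */
    static String fieldName(String prefix, String xPath) {
        return prefix == null ? xPath : prefix + xPath;
    }

    public static void main(String[] args) {
        System.out.println(fieldName(null, "/record/title"));
        System.out.println(fieldName("related", "/record/title"));
    }
}
```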
public BoundingBox getBoundingBox()
public void setBoundingBox(BoundingBox boundingBox)
Parameters:
boundingBox - The value to assign boundingBox.

public java.lang.String getFullXmlElementContent() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public java.lang.String getFullXmlAttributeContent() throws java.lang.IllegalStateException
Throws:
java.lang.IllegalStateException - If called before indexFields has been called

public org.dom4j.Document getXmlDocument()
public void indexFields(org.apache.lucene.document.Document luceneDoc) throws java.lang.Exception
Parameters:
luceneDoc - The Document to add fields to
Throws:
java.lang.Exception - If an error occurs; provides an appropriate message to display in indexing reports.

public void indexXpathFields(org.apache.lucene.document.Document luceneDoc) throws java.lang.Exception
Parameters:
luceneDoc - The Document to add fields to
Throws:
java.lang.Exception - If an error occurs; provides an appropriate message to display in indexing reports.
See Also:
setXPathFieldsPrefix(java.lang.String)
public boolean indexJavaBeanFields(org.apache.lucene.document.Document luceneDoc) throws java.lang.Exception
Parameters:
luceneDoc - The Document to add fields to
Throws:
java.lang.Exception - If an error occurs; provides an appropriate message to display in indexing reports.
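For readers unfamiliar with Java Bean XML, the sketch below shows the general shape of a document written by java.beans.XMLEncoder and pulls out the bean property names and values with the JDK's DOM parser. The class `BeanXmlFieldsSketch`, the method `propertyFields`, and the sample `Record` bean are assumptions made for illustration; how indexJavaBeanFields itself turns properties into Lucene fields is not described on this page.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLIndexer: reads property names from XMLEncoder-style XML.
final class BeanXmlFieldsSketch {

    /** Hand-written sample in the style XMLEncoder produces for a bean with two properties. */
    static final String SAMPLE =
            "<java version=\"1.8.0\" class=\"java.beans.XMLDecoder\">"
            + "<object class=\"Record\">"
            + "<void property=\"title\"><string>Sea ice</string></void>"
            + "<void property=\"year\"><int>2004</int></void>"
            + "</object></java>";

    /** Returns each bean property name mapped to the text of its encoded value. */
    static Map<String, String> propertyFields(String beanXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(beanXml)));
            Map<String, String> fields = new LinkedHashMap<>();
            // XMLEncoder writes each property assignment as <void property="name">value</void>.
            NodeList voids = doc.getElementsByTagName("void");
            for (int i = 0; i < voids.getLength(); i++) {
                Element element = (Element) voids.item(i);
                if (element.hasAttribute("property")) {
                    fields.put(element.getAttribute("property"), element.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse bean XML", e);
        }
    }

    public static void main(String[] args) {
        propertyFields(SAMPLE).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The sketch only handles simple scalar properties; nested beans and collections appear as nested object elements in XMLEncoder output and would need additional handling.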