public abstract class XMLFileIndexingWriter extends FileIndexingServiceWriter
Abstract writer that creates a Lucene Document from any XML file by stripping the XML tags to extract and index the content. The reader for this type of Document is XMLDocReader.
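As a rough illustration of what "stripping the XML tags" yields, the sketch below parses a string with the JDK's built-in DOM parser (javax.xml.parsers) and keeps only the text content. This is an assumption made for illustration: this class works with dom4j (see getDom4jDoc), and its actual extraction logic is not documented here. XmlTextExtractor and stripTags are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper: extract the text of an XML string with all tags removed. */
class XmlTextExtractor {

    static String stripTags(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() concatenates the text of every descendant node, dropping the markup.
            String text = doc.getDocumentElement().getTextContent();
            // Collapse the whitespace runs left by indentation and element boundaries.
            return text.replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<record><title>Hello</title> <desc>XML <b>world</b></desc></record>";
        System.out.println(stripTags(xml)); // prints "Hello XML world"
    }
}
```

Adjacent elements with no whitespace between them, such as <a>x</a><b>y</b>, run together as "xy" in this sketch; a production indexer may want to insert separators.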
In addition to the fields listed for FileIndexingServiceWriter, this class creates the following Lucene Document fields:
collection - The collection associated with this resource.
See Also: FileIndexingService, XMLDocReader
Constructor and Description |
---|
XMLFileIndexingWriter(): Constructor for the XMLFileIndexingWriter. |
Modifier and Type | Method and Description |
---|---|
protected abstract java.lang.String[] | _getIds(): Returns unique IDs for the item being indexed, one for each collection that catalogs the resource. |
protected void | addCustomFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, java.io.File sourceFile): Adds the full content of the XML to the default search field. |
protected abstract void | addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, java.io.File sourceFile): Adds additional fields that are unique to the document format being indexed. |
protected BoundingBox | getBoundingBox(): Returns the geospatial BoundingBox footprint that represents the resource being indexed, or null if none applies. |
protected java.lang.String[] | getCollections(): Returns unique collection keys for the item being indexed. |
org.apache.lucene.document.Document | getDeletedDoc(org.apache.lucene.document.Document existingDoc): Creates a Lucene Document for the XML that is equal to the existing Document. |
abstract java.lang.String | getDescription(): Returns a description for the document being indexed, or null if none applies. |
java.lang.String | getDocGroup(): Gets the collection specifier, for example 'dcc' or 'comet'. |
protected org.dom4j.Document | getDom4jDoc(): Gets the dom4j Document for use by subclasses. |
protected java.lang.String | getFieldContent(java.lang.String[] values, java.lang.String useVocabMapping, java.lang.String metadataFormat): Gets the vocab-encoded keys for the given values, separated by the '+' symbol. |
protected java.lang.String | getFieldContent(java.lang.String value, java.lang.String useVocabMapping, java.lang.String metadataFormat): Gets the encoded vocab key for the given content. |
protected java.lang.String | getFieldName(java.lang.String vocabFieldString, java.lang.String metadataFormat): Gets the field ID (for example 'gr') for a given vocab field (for example 'gradeRange'). |
java.lang.String[] | getIds(): Returns the IDs for the item being indexed. |
protected SimpleLuceneIndex | getIndex(): Gets the index used by this XML file indexer. |
protected ResultDocList | getMyAnnoResultDocs(): Gets the annotations for this record; returns null or a zero-length list if none are available. |
protected DleseCollectionDocReader | getMyCollectionDoc(): Gets the DleseCollectionDocReader for the collection of which this item is a part, or null if not available. |
static java.lang.String | getOaiModtime(java.io.File sourceFile, org.apache.lucene.document.Document existingDoc): Gets the oaiModtime for the given File or Document, set 3 minutes in the future to account for any delay in indexing updates. |
java.lang.String | getPrimaryId(): Returns the unique primary record ID for the item being indexed. |
protected RecordDataService | getRecordDataService(): Gets the RecordDataService used by this XML file indexer. |
java.util.List | getRelatedIds(): Gets the IDs of related records. |
java.util.Map | getRelatedIdsMap(): Gets the IDs of related records, as a Map. |
java.util.List | getRelatedUrls(): Gets the URLs of related records. |
java.util.Map | getRelatedUrlsMap(): Gets the URLs of related records, as a Map. |
protected java.lang.String | getTermStringFromStringArray(java.lang.String[] vals): Gets the appropriate terms from a string array of metadata fields. |
abstract java.lang.String | getTitle(): Returns a title for the document being indexed, or null if none applies. |
abstract java.lang.String[] | getUrls(): Returns the URL(s) of the resource being indexed, or null if none apply. |
protected abstract java.util.Date | getWhatsNewDate(): Returns the date used to determine "What's new" in the library, or null if none is available. |
protected abstract java.lang.String | getWhatsNewType(): Returns the category type used for "What's new" in the library, or null if none is available. |
protected XMLIndexer | getXmlIndexer(): Gets the XMLIndexer for use by subclasses. |
protected XMLIndexerFieldsConfig | getXmlIndexerFieldsConfig(): Gets the XMLIndexerFieldsConfig to use for XML indexing, or null if none is available. |
abstract boolean | indexFullContentInDefaultAndStems(): Returns true to have the full XML content indexed in the 'default' and 'stems' fields, or false if the subclass handles this itself. |
abstract void | init(java.io.File source, org.apache.lucene.document.Document existingDoc): Called prior to processing; may be used for any necessary set-up. |
Methods inherited from class FileIndexingServiceWriter: abortIndexing, addDocToRemove, addToAdminDefaultField, addToDefaultField, create, destroy, getConfigAttributes, getDocsource, getDocType, getFileContent, getFileIndexingPlugin, getFileIndexingService, getLuceneDoc, getPreviousRecordDoc, getReaderClass, getSessionAttributes, getSourceDir, getSourceFile, getValidationReport, isMakingDeletedDoc, isValidationEnabled, prtln, prtlnErr, setConfigAttributes, setDebug, setFileIndexingPlugin, setFileIndexingService, setIsMakingDeletedDoc, setValidationEnabled
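The getFieldName and getFieldContent entries in the summary describe a vocabulary lookup: a vocab field such as 'gradeRange' maps to a short field ID such as 'gr', and multiple values are each encoded and then joined with '+'. The sketch below models that behavior with in-memory maps. The VocabSketch class, its defineField/defineKey methods, and the 'Middle school'/'High school' values with their 'ms'/'hs' keys are all hypothetical; the real methods also take a metadataFormat argument and consult the library's vocabulary configuration, which is omitted here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical in-memory stand-in for the vocabulary lookups behind getFieldName/getFieldContent. */
class VocabSketch {
    // vocab field name -> short field ID, e.g. "gradeRange" -> "gr"
    private final Map<String, String> fieldIds = new HashMap<>();
    // vocab mapping -> (value -> encoded key); the values and keys used below are made up
    private final Map<String, Map<String, String>> encodedKeys = new HashMap<>();

    void defineField(String vocabField, String fieldId) {
        fieldIds.put(vocabField, fieldId);
    }

    void defineKey(String vocabMapping, String value, String key) {
        encodedKeys.computeIfAbsent(vocabMapping, m -> new HashMap<>()).put(value, key);
    }

    /** Like getFieldName: return the short field ID for a vocab field. */
    String getFieldName(String vocabField) {
        String id = fieldIds.get(vocabField);
        if (id == null) {
            throw new IllegalArgumentException("Unknown vocab field: " + vocabField);
        }
        return id;
    }

    /** Like getFieldContent(String, ...): return the encoded key for one value. */
    String getFieldContent(String value, String vocabMapping) {
        Map<String, String> keys = encodedKeys.get(vocabMapping);
        String key = (keys == null) ? null : keys.get(value);
        if (key == null) {
            throw new IllegalArgumentException("No key for '" + value + "' in " + vocabMapping);
        }
        return key;
    }

    /** Like getFieldContent(String[], ...): encode each value and join the keys with '+'. */
    String getFieldContent(String[] values, String vocabMapping) {
        StringJoiner joined = new StringJoiner("+");
        for (String value : values) {
            joined.add(getFieldContent(value, vocabMapping));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        VocabSketch vocab = new VocabSketch();
        vocab.defineField("gradeRange", "gr");
        vocab.defineKey("gradeRange", "Middle school", "ms");
        vocab.defineKey("gradeRange", "High school", "hs");
        System.out.println(vocab.getFieldName("gradeRange")); // prints "gr"
        System.out.println(vocab.getFieldContent(
                new String[] {"Middle school", "High school"}, "gradeRange")); // prints "ms+hs"
    }
}
```

Throwing on an unknown value is a choice made for this sketch; the real methods declare throws java.lang.Exception, but what they do with a missing key is not documented here.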
public XMLFileIndexingWriter()

public java.lang.String[] getIds() throws java.lang.Exception
Throws:
java.lang.Exception - If an error occurs.
See Also: getIds()

public java.lang.String getPrimaryId() throws java.lang.Exception
Throws:
java.lang.Exception - If an error occurs.
See Also: getIds()
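The summary pairs a protected abstract _getIds() hook with the public getIds() and getPrimaryId() accessors. One plausible way to structure such a split is the template-method pattern sketched below. Everything here is an assumption for illustration: the caching, the empty-result check, and especially the rule that the primary ID is the first element are not stated in this documentation. IdWriterSketch, TwoCollectionRecord, and the REC-00x IDs are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical template-method sketch of the _getIds()/getIds()/getPrimaryId() split. */
abstract class IdWriterSketch {
    private String[] cachedIds; // filled on first access, then reused

    /** Subclass hook, like _getIds(): one ID per collection that catalogs the resource. */
    protected abstract String[] _getIds() throws Exception;

    /** Public accessor, like getIds(): calls the hook once and caches the result. */
    public String[] getIds() throws Exception {
        if (cachedIds == null) {
            String[] ids = _getIds();
            if (ids == null || ids.length == 0) {
                throw new Exception("No IDs found for this record");
            }
            cachedIds = ids;
        }
        return Arrays.copyOf(cachedIds, cachedIds.length); // callers get a copy, not the cache
    }

    /** Assumption for this sketch only: the first ID is treated as the primary one. */
    public String getPrimaryId() throws Exception {
        return getIds()[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new TwoCollectionRecord().getPrimaryId()); // prints "REC-001"
    }
}

/** Hypothetical subclass whose record is cataloged by two collections. */
class TwoCollectionRecord extends IdWriterSketch {
    int hookCalls = 0; // counts hook invocations to show the caching

    @Override
    protected String[] _getIds() {
        hookCalls++;
        return new String[] {"REC-001", "REC-002"};
    }
}
```

Keeping the hook protected and abstract lets each metadata format supply its own IDs while callers only ever see the public accessors.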

public java.util.List getRelatedIds() throws java.lang.IllegalStateException, java.lang.Exception
Throws:
java.lang.IllegalStateException - If called before the indexFields method has run.
java.lang.Exception - If an error occurs.

public java.util.List getRelatedUrls() throws java.lang.IllegalStateException, java.lang.Exception
Throws:
java.lang.IllegalStateException - If called before the indexFields method has run.
java.lang.Exception - If an error occurs.

public java.util.Map getRelatedIdsMap() throws java.lang.IllegalStateException, java.lang.Exception
Throws:
java.lang.IllegalStateException - If called before the indexFields method has run.
java.lang.Exception - If an error occurs.

public java.util.Map getRelatedUrlsMap() throws java.lang.IllegalStateException, java.lang.Exception
Throws:
java.lang.IllegalStateException - If called before the indexFields method has run.
java.lang.Exception - If an error occurs.

protected java.lang.String[] getCollections() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public java.lang.String getDocGroup() throws java.lang.Exception
Overrides: getDocGroup in class FileIndexingServiceWriter
Throws:
java.lang.Exception - If an error occurred.

protected BoundingBox getBoundingBox() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract void init(java.io.File source, org.apache.lucene.document.Document existingDoc) throws java.lang.Exception
Overrides: init in class FileIndexingServiceWriter
Parameters:
source - The source file being indexed.
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present.
Throws:
java.lang.Exception - If an error occurred during set-up.

protected abstract java.lang.String[] _getIds() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract java.lang.String getTitle() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract java.lang.String getDescription() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract java.lang.String[] getUrls() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract boolean indexFullContentInDefaultAndStems()
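indexFullContentInDefaultAndStems() lets a subclass decide whether the full XML content goes into the 'default' and 'stems' fields, while addCustomFields is summarized as adding the full content to the default search field. The sketch below shows one way such a flag could gate that step. FieldBag is a hypothetical stand-in for a Lucene Document, PlainWriter and SelectiveWriter are hypothetical subclasses, and the order of the calls inside addCustomFields is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a subclass flag decides whether the base class indexes the full content. */
abstract class FullContentSketch {
    /** Like indexFullContentInDefaultAndStems(): true means the base class handles full content. */
    abstract boolean indexFullContentInDefaultAndStems();

    /** Subclass hook for format-specific fields, loosely like addFields(...). */
    abstract void addFields(FieldBag doc, String xmlText);

    /** Loosely like addCustomFields(...): full content is added only when the flag is true. */
    void addCustomFields(FieldBag doc, String xmlText) {
        if (indexFullContentInDefaultAndStems()) {
            doc.add("default", xmlText);
            doc.add("stems", xmlText);
        }
        addFields(doc, xmlText);
    }

    public static void main(String[] args) {
        FieldBag doc = new FieldBag();
        new PlainWriter().addCustomFields(doc, "full text");
        System.out.println(doc.fields); // prints {default=[full text], stems=[full text]}
    }
}

/** Hypothetical stand-in for a Lucene Document: field name -> values, in insertion order. */
class FieldBag {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

/** Lets the base class index the full content. */
class PlainWriter extends FullContentSketch {
    @Override boolean indexFullContentInDefaultAndStems() { return true; }
    @Override void addFields(FieldBag doc, String xmlText) { }
}

/** Opts out and indexes a selected value itself instead. */
class SelectiveWriter extends FullContentSketch {
    @Override boolean indexFullContentInDefaultAndStems() { return false; }
    @Override void addFields(FieldBag doc, String xmlText) {
        doc.add("default", "title only"); // hypothetical: a subclass-chosen excerpt
    }
}
```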

protected abstract java.util.Date getWhatsNewDate() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

protected abstract java.lang.String getWhatsNewType() throws java.lang.Exception
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

protected abstract void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, java.io.File sourceFile) throws java.lang.Exception
Use the Document class to add a Field. The following Lucene Field types are available for indexing with the Document:
Field.Text(String name, String value) -- tokenized, indexed, stored
Field.UnStored(String name, String value) -- tokenized, indexed, not stored
Field.Keyword(String name, String value) -- not tokenized, indexed, stored
Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
Field(String name, String string, boolean store, boolean index, boolean tokenize) -- gives explicit control over storing, indexing, and tokenizing
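Example code:
import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Lucene 1.x API: Field.Text creates a tokenized, indexed, stored field.
protected void addFields(Document newDoc, Document existingDoc, File sourceFile) throws Exception {
	String customContent = "Some content";
	newDoc.add(Field.Text("mycustomfield", customContent));
}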
Parameters:
newDoc - The new Document that is being created for this resource.
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present.
sourceFile - The source file that is being indexed.
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

protected void addCustomFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, java.io.File sourceFile) throws java.lang.Exception
Overrides: addCustomFields in class FileIndexingServiceWriter
Parameters:
newDoc - The new Document that is being created for this resource.
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present.
sourceFile - The source file that is being indexed.
Throws:
java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public org.apache.lucene.document.Document getDeletedDoc(org.apache.lucene.document.Document existingDoc) throws java.lang.Throwable
Overrides: getDeletedDoc in class FileIndexingServiceWriter
Parameters:
existingDoc - An existing FileIndexingService Document that currently resides in the index for the given file.
Throws:
java.lang.Throwable - Thrown if an error occurs.

protected ResultDocList getMyAnnoResultDocs() throws java.lang.Exception
Throws:
java.lang.Exception - Not yet documented.

protected XMLIndexerFieldsConfig getXmlIndexerFieldsConfig()

protected java.lang.String getFieldContent(java.lang.String[] values, java.lang.String useVocabMapping, java.lang.String metadataFormat) throws java.lang.Exception
Parameters:
values - The values to encode.
useVocabMapping - The mapping to use, for example 'contentStandards'.
metadataFormat - The metadata format, for example 'adn'.
Throws:
java.lang.Exception - If an error occurs.

protected java.lang.String getFieldContent(java.lang.String value, java.lang.String useVocabMapping, java.lang.String metadataFormat) throws java.lang.Exception
Parameters:
value - The value to encode.
useVocabMapping - The vocab mapping to use, for example 'contentStandard'.
metadataFormat - The metadata format, for example 'adn'.
Throws:
java.lang.Exception - If an error occurs.

protected java.lang.String getFieldName(java.lang.String vocabFieldString, java.lang.String metadataFormat) throws java.lang.Exception
Parameters:
vocabFieldString - The field, for example 'gradeRange'.
metadataFormat - The metadata format, for example 'adn'.
Throws:
java.lang.Exception - If an error occurs.

protected java.lang.String getTermStringFromStringArray(java.lang.String[] vals)
Parameters:
vals - Metadata fields, which must be delimited by colons.

protected XMLIndexer getXmlIndexer() throws java.lang.Exception
Throws:
java.lang.Exception - If an error occurs.

protected org.dom4j.Document getDom4jDoc() throws java.lang.Exception
Throws:
java.lang.Exception - If an error occurs.

protected DleseCollectionDocReader getMyCollectionDoc()

public static final java.lang.String getOaiModtime(java.io.File sourceFile, org.apache.lucene.document.Document existingDoc)
Parameters:
sourceFile - The source file.
existingDoc - The existing Document.

protected RecordDataService getRecordDataService()

protected SimpleLuceneIndex getIndex()
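getOaiModtime is summarized as returning a modtime set 3 minutes in the future to allow for delays in indexing updates. The sketch below shows only that offset arithmetic. It is hypothetical: it works in epoch milliseconds and takes a Long in place of the Lucene Document, whereas the real method takes an org.apache.lucene.document.Document and returns a String whose format is not documented here. The fallback order (file first, then the existing document) and applying the offset to the fallback value are also assumptions. OaiModtimeSketch and its methods are hypothetical names.

```java
import java.io.File;

/** Hypothetical sketch of the "3 minutes in the future" rule described for getOaiModtime. */
class OaiModtimeSketch {
    // 3 minutes in milliseconds, per the method summary.
    static final long INDEXING_DELAY_MS = 3L * 60L * 1000L;

    /** Shifts a modification time forward by the indexing delay. */
    static long adjustedModtime(long modtimeMillis) {
        return modtimeMillis + INDEXING_DELAY_MS;
    }

    /**
     * Assumption: prefer the file's last-modified time; otherwise fall back to a
     * modtime previously read from the existing index document.
     */
    static long oaiModtimeMillis(File sourceFile, Long existingDocModtimeMillis) {
        if (sourceFile != null && sourceFile.exists()) {
            return adjustedModtime(sourceFile.lastModified());
        }
        if (existingDocModtimeMillis != null) {
            return adjustedModtime(existingDocModtimeMillis);
        }
        throw new IllegalArgumentException("Need a source file or an existing modtime");
    }

    public static void main(String[] args) {
        System.out.println(adjustedModtime(0L)); // prints 180000
    }
}
```

Pushing the timestamp forward means a harvester that polls shortly after an update still sees the record as modified, at the cost of modtimes that briefly lie in the future.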