XMLFileIndexingWriter (jOAI Java Documentation v3.3)

java.lang.Object
- org.dlese.dpc.index.writer.FileIndexingServiceWriter
- - org.dlese.dpc.index.writer.XMLFileIndexingWriter

All Implemented Interfaces:

DocWriter

Direct Known Subclasses:

DleseAnnoFileIndexingServiceWriter, DleseCollectionFileIndexingWriter, ItemFileIndexingWriter, NCSCollectionFileIndexingWriter, NewsOppsFileIndexingWriter, SimpleXMLFileIndexingWriter
```
public abstract class XMLFileIndexingWriter
extends FileIndexingServiceWriter
```
Creates a Lucene Document from any XML file by stripping the XML tags to extract and index the content. The reader for this type of Document is XMLDocReader.
The Lucene Document fields that are created by this class are (in addition to the ones listed for FileIndexingServiceWriter):

collection - The collection associated with this resource.

Author:

John Weatherley

See Also:

FileIndexingService, XMLDocReader

Constructor Summary

Constructors
Constructor and Description
`XMLFileIndexingWriter()` Constructor for the XMLFileIndexingWriter.

Method Summary

Modifier and Type	Method and Description
`protected abstract java.lang.String[]`	`_getIds()` Return unique IDs for the item being indexed, one for each collection that catalogs the resource.
`protected void`	`addCustomFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, java.io.File sourceFile)` Adds the full content of the XML to the default search field.
`protected abstract void`	`addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, java.io.File sourceFile)` Adds additional fields that are unique to the document format being indexed.
`protected BoundingBox`	`getBoundingBox()` Return the geospatial BoundingBox footprint that represents the resource being indexed, or null if none apply.
`protected java.lang.String[]`	`getCollections()` Returns unique collection keys for the item being indexed.
`org.apache.lucene.document.Document`	`getDeletedDoc(org.apache.lucene.document.Document existingDoc)` Creates a Lucene Document for the XML that is equal to the existing Document.
`abstract java.lang.String`	`getDescription()` Return a description for the document being indexed, or null if none applies.
`java.lang.String`	`getDocGroup()` Gets the collection specifier, for example 'dcc', 'comet'.
`protected org.dom4j.Document`	`getDom4jDoc()` Gets the dom4j Document for use by sub-classes
`protected java.lang.String`	`getFieldContent(java.lang.String[] values, java.lang.String useVocabMapping, java.lang.String metadataFormat)` Gets the vocab encoded keys for the given values, separated by the '+' symbol.
`protected java.lang.String`	`getFieldContent(java.lang.String value, java.lang.String useVocabMapping, java.lang.String metadataFormat)` Gets the encoded vocab key for the given content.
`protected java.lang.String`	`getFieldName(java.lang.String vocabFieldString, java.lang.String metadataFormat)` Gets the field ID, for example 'gr', for a given vocab, for example 'gradeRange'.
`java.lang.String[]`	`getIds()` Returns the ids for the item being indexed.
`protected SimpleLuceneIndex`	`getIndex()` Gets the index used by this XML File Indexer
`protected ResultDocList`	`getMyAnnoResultDocs()` Gets the annotations for this record, null or zero length if none available.
`protected DleseCollectionDocReader`	`getMyCollectionDoc()` Gets the DleseCollectionDocReader for the collection of which this item is a part, or null if not available.
`static java.lang.String`	`getOaiModtime(java.io.File sourceFile, org.apache.lucene.document.Document existingDoc)` Gets the oaiModtime for the given File or Document, set to 3 minutes in the future to account for any delay in indexing updates.
`java.lang.String`	`getPrimaryId()` Returns the unique primary record ID for the item being indexed.
`protected RecordDataService`	`getRecordDataService()` Gets the recordDataService used by this XML File Indexer
`java.util.List`	`getRelatedIds()` Gets the ids of related records.
`java.util.Map`	`getRelatedIdsMap()` Gets the ids of related records.
`java.util.List`	`getRelatedUrls()` Gets the urls of related records.
`java.util.Map`	`getRelatedUrlsMap()` Gets the urls of related records.
`protected java.lang.String`	`getTermStringFromStringArray(java.lang.String[] vals)` Gets the appropriate terms from a string array of metadata fields.
`abstract java.lang.String`	`getTitle()` Return a title for the document being indexed, or null if none applies.
`abstract java.lang.String[]`	`getUrls()` Return the URL(s) to the resource being indexed, or null if none apply.
`protected abstract java.util.Date`	`getWhatsNewDate()` Returns the date used to determine "What's new" in the library, or null if none is available.
`protected abstract java.lang.String`	`getWhatsNewType()` Returns the type of category for "What's new" in the library, or null if none is available.
`protected XMLIndexer`	`getXmlIndexer()` Gets the XMLIndexer for use by sub-classes
`protected XMLIndexerFieldsConfig`	`getXmlIndexerFieldsConfig()` Gets the XMLIndexerFieldsConfig to use for XML indexing, or null if none available.
`abstract boolean`	`indexFullContentInDefaultAndStems()` Return true to have the full XML content indexed in the 'default' and 'stems' fields, false if handled by the sub-class.
`abstract void`	`init(java.io.File source, org.apache.lucene.document.Document existingDoc)` This method is called prior to processing and may be used for any necessary set-up.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - XMLFileIndexingWriter
```
public XMLFileIndexingWriter()
```
    Constructor for the XMLFileIndexingWriter.
- Method Detail
  - getIds
```
public java.lang.String[] getIds()
                          throws java.lang.Exception
```
    Returns the ids for the item being indexed. If more than one record catalogs the same item, the first ID is the primary ID.
    
    Returns:
    
    The id String
    
    Throws:
    
    java.lang.Exception - If error
    
    See Also:
    
    getIds()
  - getPrimaryId
```
public java.lang.String getPrimaryId()
                              throws java.lang.Exception
```
    Returns the unique primary record ID for the item being indexed. If more than one record catalogs the same item, this represents the primary ID.
    
    Returns:
    
    The id String
    
    Throws:
    
    java.lang.Exception - If error
    
    See Also:
    
    getIds()
  - getRelatedIds
```
public java.util.List getRelatedIds()
                             throws java.lang.IllegalStateException,
                                    java.lang.Exception
```
    Gets the ids of related records.
    
    Returns:
    
    The related ids value, or null if none
    
    Throws:
    
    java.lang.IllegalStateException - If called prior to calling method #indexFields
    
    java.lang.Exception - If error
  - getRelatedUrls
```
public java.util.List getRelatedUrls()
                              throws java.lang.IllegalStateException,
                                     java.lang.Exception
```
    Gets the urls of related records.
    
    Returns:
    
    The related urls value, or null if none
    
    Throws:
    
    java.lang.IllegalStateException - If called prior to calling method #indexFields
    
    java.lang.Exception - If error
  - getRelatedIdsMap
```
public java.util.Map getRelatedIdsMap()
                               throws java.lang.IllegalStateException,
                                      java.lang.Exception
```
    Gets the ids of related records. The Map key contains the relationship (isAnnotatedBy, etc.) and the Map value contains a List of Strings that indicate the ids of the target records.
    
    Returns:
    
    The related ids value, or null if none
    
    Throws:
    
    java.lang.IllegalStateException - If called prior to calling method #indexFields
    
    java.lang.Exception - If error
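
    The Map shape described above (relationship name as key, List of id Strings as value) can be illustrated with a small stand-alone sketch. The class name, the sample ids, and the `flatten` helper are hypothetical; the documentation does not say whether getRelatedIds() is derived from this Map in this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: shows only the documented Map shape, not jOAI code.
class RelatedIdsMapSketch {

    // Collects the ids of every relationship into one list, in encounter order.
    static List<String> flatten(Map<String, List<String>> relatedIdsMap) {
        List<String> all = new ArrayList<>();
        if (relatedIdsMap == null) {
            return all; // covers the documented "null if none" case
        }
        for (List<String> ids : relatedIdsMap.values()) {
            all.addAll(ids);
        }
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<String>> related = new LinkedHashMap<>();
        // The ids below are invented for illustration.
        related.put("isAnnotatedBy", Arrays.asList("ANNO-000-000-001", "ANNO-000-000-002"));
        for (Map.Entry<String, List<String>> entry : related.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        System.out.println(flatten(related));
    }
}
```

    Because the method returns a raw java.util.Map, callers would need to cast keys and values themselves; the generic types above are an assumption based on the description.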
  - getRelatedUrlsMap
```
public java.util.Map getRelatedUrlsMap()
                                throws java.lang.IllegalStateException,
                                       java.lang.Exception
```
    Gets the urls of related records. The Map key contains the relationship (isAnnotatedBy, etc.) and the Map value contains a List of Strings that indicate the urls of the target records.
    
    Returns:
    
    The related urls value, or null if none
    
    Throws:
    
    java.lang.IllegalStateException - If called prior to calling method #indexFields
    
    java.lang.Exception - If error
  - getCollections
```
protected java.lang.String[] getCollections()
                                     throws java.lang.Exception
```
    Returns unique collection keys for the item being indexed. For example "dcc" (single collection) or "dcc dwel" (multiple collections). If more than one collection is provided, the first one must be the primary collection. May be overridden by sub-classes as appropriate (overridden by ADNFileIndexingWriter).
    
    Returns:
    
    The collection keys
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
  - getDocGroup
```
public java.lang.String getDocGroup()
                             throws java.lang.Exception
```
    Gets the collection specifier, for example 'dcc', 'comet'.
    
    Specified by:
    
    getDocGroup in class FileIndexingServiceWriter
    
    Returns:
    
    The collection specifier
    
    Throws:
    
    java.lang.Exception - If error occurred
  - getBoundingBox
```
protected BoundingBox getBoundingBox()
                              throws java.lang.Exception
```
    Return the geospatial BoundingBox footprint that represents the resource being indexed, or null if none apply. Override if necessary.
    
    Returns:
    
    BoundingBox, or null
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
  - init
```
public abstract void init(java.io.File source,
                          org.apache.lucene.document.Document existingDoc)
                   throws java.lang.Exception
```
    This method is called prior to processing and may be used for any necessary set-up. This method should throw an exception with an appropriate message if an error occurs.
    
    Specified by:
    
    init in class FileIndexingServiceWriter
    
    Parameters:
    
    source - The source file being indexed
    
    existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
    
    Throws:
    
    java.lang.Exception - If an error occurred during set-up.
  - _getIds
```
protected abstract java.lang.String[] _getIds()
                                       throws java.lang.Exception
```
    Return unique IDs for the item being indexed, one for each collection that catalogs the resource. For example "DLESE-000-000-000-001" (single ID) or "DLESE-000-000-000-036 COMET-60" (multiple IDs). If more than one ID is present, the first one is the primary.
    
    Returns:
    
    The id(s)
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
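
    The "first ID is the primary" convention can be sketched as follows. This is a stand-alone illustration with hypothetical names (`IdConventionSketch`, `primaryOf`); it assumes getPrimaryId() amounts to taking the first element of the array returned by _getIds(), which the documentation implies but does not show.

```java
// Hypothetical stand-in, not the jOAI class: illustrates the documented ordering convention.
class IdConventionSketch {

    // Assumption: the primary ID is the first element, or null when no IDs exist.
    static String primaryOf(String[] ids) {
        return (ids == null || ids.length == 0) ? null : ids[0];
    }

    public static void main(String[] args) {
        // The multiple-ID example from the description above, one ID per cataloging collection.
        String[] ids = {"DLESE-000-000-000-036", "COMET-60"};
        System.out.println(primaryOf(ids)); // prints "DLESE-000-000-000-036"
    }
}
```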
  - getTitle
```
public abstract java.lang.String getTitle()
                                   throws java.lang.Exception
```
    Return a title for the document being indexed, or null if none applies. The String is tokenized, stored and indexed under the field key 'title' and is also indexed in the 'default' field.
    
    Returns:
    
    The title String
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
  - getDescription
```
public abstract java.lang.String getDescription()
                                         throws java.lang.Exception
```
    Return a description for the document being indexed, or null if none applies. The String is tokenized, stored and indexed under the field key 'description' and is also indexed in the 'default' field.
    
    Returns:
    
    The description String
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
  - getUrls
```
public abstract java.lang.String[] getUrls()
                                    throws java.lang.Exception
```
    Return the URL(s) to the resource being indexed, or null if none apply. If more than one URL references the resource, the first one is the primary. The URL Strings are tokenized and indexed under the field key 'uri' and are also indexed in the 'default' field. They are also stored in the index untokenized under the field key 'url'.
    
    Returns:
    
    The url String(s)
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
  - indexFullContentInDefaultAndStems
```
public abstract boolean indexFullContentInDefaultAndStems()
```
    Return true to have the full XML content indexed in the 'default' and 'stems' fields, false if handled by the sub-class. If true, the content is indexed using the #addToDefaultField method.
    
    Returns:
    
    True to have the full XML content indexed in the 'default' and 'stems' fields
  - getWhatsNewDate
```
protected abstract java.util.Date getWhatsNewDate()
                                           throws java.lang.Exception
```
    Returns the date used to determine "What's new" in the library, or null if none is available.
    
    Returns:
    
    The what's new date for the item or null if not available.
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
  - getWhatsNewType
```
protected abstract java.lang.String getWhatsNewType()
                                             throws java.lang.Exception
```
    Returns the type of category for "What's new" in the library, or null if none is available. Must be a simple lower case String with no spaces, for example 'itemnew', 'itemannocomplete', 'itemannoinprogress', 'annocomplete', 'annoinprogress', 'collection'.
    
    Returns:
    
    The what's new type.
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
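
    An implementing sub-class might check its return value against the format rule above before handing it back. The sketch below is hypothetical (the class and method names are not part of jOAI) and reads "simple lower case String with no spaces" narrowly as ASCII lower-case letters only, which is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of jOAI.
class WhatsNewTypeCheck {

    // Assumption: only the letters a-z are allowed, as in all of the documented examples.
    private static final Pattern SIMPLE_LOWER = Pattern.compile("[a-z]+");

    // null is accepted because the method may return null when no type is available.
    static boolean isValid(String type) {
        return type == null || SIMPLE_LOWER.matcher(type).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("itemannocomplete")); // true
        System.out.println(isValid("Item New"));         // false
    }
}
```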
  - addFields
```
protected abstract void addFields(org.apache.lucene.document.Document newDoc,
                                  org.apache.lucene.document.Document existingDoc,
                                  java.io.File sourceFile)
                           throws java.lang.Exception
```
    Adds additional fields that are unique to the document format being indexed. When implementing this method, use the add method of the Document class to add a Field.
    The following Lucene Field types are available for indexing with the Document:
    Field.Text(String name, String value) -- tokenized, indexed, stored
    Field.UnStored(String name, String value) -- tokenized, indexed, not stored
    Field.Keyword(String name, String value) -- not tokenized, indexed, stored
    Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
    Field(String name, String string, boolean store, boolean index, boolean tokenize) -- full control over storing, indexing and tokenizing
    Example code:
    protected void addFields(Document newDoc, Document existingDoc, File sourceFile) throws Exception {
        String customContent = "Some content";
        newDoc.add(Field.Text("mycustomfield", customContent));
    }
    
    Parameters:
    
    newDoc - The new Document that is being created for this resource
    
    existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
    
    sourceFile - The sourceFile that is being indexed
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
  - addCustomFields
```
protected void addCustomFields(org.apache.lucene.document.Document newDoc,
                               org.apache.lucene.document.Document existingDoc,
                               java.io.File sourceFile)
                        throws java.lang.Exception
```
    Adds the full content of the XML to the default search field. Strips the XML tags to extract the content. Will not work properly if the XML is not well-formed.
    
    Specified by:
    
    addCustomFields in class FileIndexingServiceWriter
    
    Parameters:
    
    newDoc - The new Document that is being created for this resource
    
    existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
    
    sourceFile - The sourceFile that is being indexed
    
    Throws:
    
    java.lang.Exception - This method should throw an Exception with an appropriate error message if an error occurs.
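
    The tag-stripping step, and why well-formed XML is required, can be sketched with the JDK's built-in DOM parser. This is not the jOAI implementation (the class exposes a dom4j Document through getDom4jDoc(), so it presumably relies on dom4j); the class name, method name and sample XML below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch using the JDK DOM parser in place of dom4j.
class XmlTextExtractionSketch {

    // Returns the text content of the XML with all tags removed and whitespace collapsed.
    static String extractText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent().trim().replaceAll("\\s+", " ");
        } catch (Exception e) {
            // XML that is not well-formed cannot be parsed, so no content can be extracted.
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<record><title>Plate tectonics</title> <desc>An overview</desc></record>";
        System.out.println(extractText(xml)); // prints "Plate tectonics An overview"
    }
}
```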
  - getDeletedDoc
```
public org.apache.lucene.document.Document getDeletedDoc(org.apache.lucene.document.Document existingDoc)
                                                  throws java.lang.Throwable
```
    Creates a Lucene Document for the XML that is equal to the existing Document.
    
    Overrides:
    
    getDeletedDoc in class FileIndexingServiceWriter
    
    Parameters:
    
    existingDoc - An existing FileIndexingService Document that currently resides in the index for the given file
    
    Returns:
    
    A Lucene FileIndexingService Document
    
    Throws:
    
    java.lang.Throwable - Thrown if error occurs
  - getMyAnnoResultDocs
```
protected ResultDocList getMyAnnoResultDocs()
                                     throws java.lang.Exception
```
    Gets the annotations for this record, null or zero length if none available.
    
    Returns:
    
    The myAnnoResultDocs value
    
    Throws:
    
    java.lang.Exception - If error
  - getXmlIndexerFieldsConfig
```
protected XMLIndexerFieldsConfig getXmlIndexerFieldsConfig()
```
    Gets the XMLIndexerFieldsConfig to use for XML indexing, or null if none available.
    
    Returns:
    
    The xmlIndexerFieldsConfig value
  - getFieldContent
```
protected java.lang.String getFieldContent(java.lang.String[] values,
                                           java.lang.String useVocabMapping,
                                           java.lang.String metadataFormat)
                                    throws java.lang.Exception
```
    Gets the vocab encoded keys for the given values, separated by the '+' symbol.
    
    Parameters:
    
    values - The values to encode.
    
    useVocabMapping - The mapping to use, for example "contentStandards".
    
    metadataFormat - The metadata format, for example 'adn'
    
    Returns:
    
    The encoded vocab keys.
    
    Throws:
    
    java.lang.Exception - If error.
  - getFieldContent
```
protected java.lang.String getFieldContent(java.lang.String value,
                                           java.lang.String useVocabMapping,
                                           java.lang.String metadataFormat)
                                    throws java.lang.Exception
```
    Gets the encoded vocab key for the given content.
    
    Parameters:
    
    value - The value to encode
    
    useVocabMapping - The vocab mapping to use, for example 'contentStandard'
    
    metadataFormat - The metadata format, for example 'adn'
    
    Returns:
    
    The encoded value, or unchanged if unable to encode
    
    Throws:
    
    java.lang.Exception - If error
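
    The behavior of the two getFieldContent overloads can be approximated with a plain lookup table. Everything in this sketch is hypothetical: the class name, the vocabulary entries and the encoded keys are invented, and leaving unmapped values unchanged in the array form is an assumption carried over from the single-value form.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a vocab mapping; not jOAI code.
class VocabEncodingSketch {

    // Invented value-to-key pairs, for illustration only.
    private static final Map<String, String> VOCAB = new HashMap<>();
    static {
        VOCAB.put("Middle school", "03");
        VOCAB.put("High school", "04");
    }

    // Single value: the encoded key, or the value unchanged if it cannot be encoded.
    static String encode(String value) {
        String key = VOCAB.get(value);
        return key != null ? key : value;
    }

    // Several values: each one encoded, then joined with the '+' symbol.
    static String encode(String[] values) {
        List<String> keys = new ArrayList<>();
        for (String value : values) {
            keys.add(encode(value));
        }
        return String.join("+", keys);
    }

    public static void main(String[] args) {
        System.out.println(encode(new String[] {"Middle school", "High school"})); // prints "03+04"
        System.out.println(encode("Unmapped value")); // prints "Unmapped value"
    }
}
```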
  - getFieldName
```
protected java.lang.String getFieldName(java.lang.String vocabFieldString,
                                        java.lang.String metadataFormat)
                                 throws java.lang.Exception
```
    Gets the field ID, for example 'gr', for a given vocab, for example 'gradeRange'. If unable to get the field ID, the vocab field String is returned unchanged.
    
    Parameters:
    
    vocabFieldString - The field, for example 'gradeRange'
    
    metadataFormat - The metadata format, for example 'adn'
    
    Returns:
    
    The field key, for example 'gr', or unchanged if unable to determine
    
    Throws:
    
    java.lang.Exception - If error
  - getTermStringFromStringArray
```
protected java.lang.String getTermStringFromStringArray(java.lang.String[] vals)
```
    Gets the appropriate terms from a string array of metadata fields. Uses all terms found after the last colon ":" found in the string.
    
    Parameters:
    
    vals - Metadata fields that must be delimited by colons.
    
    Returns:
    
    The individual terms used for indexing.
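
    The "everything after the last colon" rule can be sketched as below. The class name and sample values are hypothetical, and joining the extracted terms with a single space is an assumption; the documentation does not state how the returned String separates them.

```java
// Hypothetical sketch of the last-colon rule; not jOAI code.
class TermStringSketch {

    static String termsAfterLastColon(String[] vals) {
        StringBuilder terms = new StringBuilder();
        for (String val : vals) {
            // lastIndexOf returns -1 when no colon is present, so the whole value is kept.
            String term = val.substring(val.lastIndexOf(':') + 1).trim();
            if (term.isEmpty()) {
                continue;
            }
            if (terms.length() > 0) {
                terms.append(' '); // assumed separator
            }
            terms.append(term);
        }
        return terms.toString();
    }

    public static void main(String[] args) {
        // Invented colon-delimited metadata values.
        String[] vals = {"Vocab:Subject:Earth science", "Vocab:Physics"};
        System.out.println(termsAfterLastColon(vals)); // prints "Earth science Physics"
    }
}
```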
  - getXmlIndexer
```
protected XMLIndexer getXmlIndexer()
                            throws java.lang.Exception
```
    Gets the XMLIndexer for use by sub-classes
    
    Returns:
    
    The XMLIndexer
    
    Throws:
    
    java.lang.Exception - If error
  - getDom4jDoc
```
protected org.dom4j.Document getDom4jDoc()
                                  throws java.lang.Exception
```
    Gets the dom4j Document for use by sub-classes
    
    Returns:
    
    The Document
    
    Throws:
    
    java.lang.Exception - If error
  - getMyCollectionDoc
```
protected DleseCollectionDocReader getMyCollectionDoc()
```
    Gets the DleseCollectionDocReader for the collection of which this item is a part, or null if not available.
    
    Returns:
    
    The myCollectionDoc value
  - getOaiModtime
```
public static final java.lang.String getOaiModtime(java.io.File sourceFile,
                                                   org.apache.lucene.document.Document existingDoc)
```
    Gets the oaiModtime for the given File or Document, set to 3 minutes in the future to account for any delay in indexing updates.
    
    Parameters:
    
    sourceFile - The source file
    
    existingDoc - The existing Doc
    
    Returns:
    
    The oaiModtime value
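
    The three-minute offset can be shown with java.time. This sketch is hypothetical: it assumes the base time is a last-modified timestamp in epoch milliseconds and returns an Instant, whereas the real method returns a String whose format and time source are not documented here.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the documented offset; not jOAI code.
class ModtimeOffsetSketch {

    // The documented allowance for delays in indexing updates.
    static final Duration INDEXING_DELAY = Duration.ofMinutes(3);

    // Assumption: the base time is a file's last-modified time in epoch milliseconds.
    static Instant offsetModtime(long lastModifiedMillis) {
        return Instant.ofEpochMilli(lastModifiedMillis).plus(INDEXING_DELAY);
    }

    public static void main(String[] args) {
        System.out.println(offsetModtime(0L)); // prints "1970-01-01T00:03:00Z"
    }
}
```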
  - getRecordDataService
```
protected RecordDataService getRecordDataService()
```
    Gets the recordDataService used by this XML File Indexer
    
    Returns:
    
    The recordDataService, or null if not available.
  - getIndex
```
protected SimpleLuceneIndex getIndex()
```
    Gets the index used by this XML File Indexer
    
    Returns:
    
    The index, or null if not available.
