File: IndexReader.java
Doc: API Doc
Category: Apache Lucene 1.4.3
Size: 22305
Date: Wed Apr 21 18:46:30 BST 2004
Package: org.apache.lucene.index

IndexReader

java.lang.Object

public abstract class IndexReader extends Object

IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.

Concrete subclasses of IndexReader are usually constructed with a call to the static method {@link #open}.

For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral--they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
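To illustrate why document numbers should not be stored between sessions, here is a minimal sketch using a toy in-memory model. `DocNumberDemo` and all of its methods are hypothetical stand-ins, not Lucene classes. The model assumes a document's number is its slot position and that a merge drops deleted slots, which is one way the renumbering described above can come about.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index, NOT Lucene code. A document's number is simply
// its position in the list; compact() mimics a merge that drops deleted slots.
final class DocNumberDemo {
    private final List<String> ids = new ArrayList<>();       // unique-ID field per slot
    private final List<Boolean> deleted = new ArrayList<>();  // deletion mark per slot

    // Adds a document and returns the number assigned to it.
    int add(String id) {
        ids.add(id);
        deleted.add(Boolean.FALSE);
        return ids.size() - 1;
    }

    void delete(int docNum) {
        deleted.set(docNum, Boolean.TRUE);
    }

    // Drops deleted slots; surviving documents are renumbered.
    void compact() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            if (!deleted.get(i)) kept.add(ids.get(i));
        }
        ids.clear();
        ids.addAll(kept);
        deleted.clear();
        for (int i = 0; i < kept.size(); i++) deleted.add(Boolean.FALSE);
    }

    // Returns the current number of the live document with this ID, or -1.
    int docNumOf(String id) {
        for (int i = 0; i < ids.size(); i++) {
            if (!deleted.get(i) && ids.get(i).equals(id)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        DocNumberDemo index = new DocNumberDemo();
        index.add("doc-a");
        index.add("doc-b");
        int before = index.docNumOf("doc-b");
        index.delete(0);
        index.compact();
        int after = index.docNumOf("doc-b");
        System.out.println("doc-b was " + before + ", is now " + after);
    }
}
```

The stable handle in this sketch is the ID string, not the number, which is the same idea as keeping a unique ID field in each document.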

author: Doug Cutting
version: $Id: IndexReader.java,v 1.32 2004/04/21 16:46:30 goller Exp $

Fields Summary
private final Directory directory
private final boolean directoryOwner
private final SegmentInfos segmentInfos
private Lock writeLock
private boolean stale
private boolean hasChanges
private final boolean closeDirectory
Constructors Summary
protected IndexReader(Directory directory)
Constructor used if IndexReader is not owner of its directory. This is used for IndexReaders that are used within other IndexReaders that take care of locking directories.
param
directory Directory where IndexReader files reside.
this.directory = directory;
segmentInfos = null;
directoryOwner = false;
closeDirectory = false;
stale = false;
hasChanges = false;
writeLock = null;
IndexReader(Directory directory, SegmentInfos segmentInfos, boolean closeDirectory)
Constructor used if IndexReader is owner of its directory. If IndexReader is owner of its directory, it locks its directory in case of write operations.
param
directory Directory where IndexReader files reside.
param
segmentInfos Used for write-l
param
closeDirectory
this.directory = directory;
this.segmentInfos = segmentInfos;
directoryOwner = true;
this.closeDirectory = closeDirectory;
stale = false;
hasChanges = false;
writeLock = null;
Methods Summary
private void aquireWriteLock()
Tries to acquire the write lock on this directory. This method is only valid if this IndexReader is the directory owner.
throws
IOException If WriteLock cannot be acquired.
if (stale)
  throw new IOException("IndexReader out of date and no longer valid for delete, undelete, or setNorm operations");
if (writeLock == null) {
  Lock writeLock = directory.makeLock(IndexWriter.WRITE_LOCK_NAME);
  if (!writeLock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT)) // obtain write lock
    throw new IOException("Index locked for write: " + writeLock);
  this.writeLock = writeLock;
  // we have to check whether index has changed since this reader was opened.
  // if so, this reader is no longer valid for deletion
  if (SegmentInfos.readCurrentVersion(directory) > segmentInfos.getVersion()) {
    stale = true;
    this.writeLock.release();
    this.writeLock = null;
    throw new IOException("IndexReader out of date and no longer valid for delete, undelete, or setNorm operations");
  }
}
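The staleness check in aquireWriteLock() compares the version the reader was opened against with the index's current version. It can be pictured with the following toy model. `ToyIndex`, `ToyReader`, and `tryDelete` are hypothetical names introduced only for illustration. The sketch leaves out locking entirely and keeps only the version comparison.

```java
import java.io.IOException;

final class StaleReaderDemo {
    /** Hypothetical stand-in for the index; it only tracks a change counter. */
    static final class ToyIndex {
        private long version = 0;
        long currentVersion() { return version; }
        void change() { version++; } // e.g. another writer committed
    }

    /** Hypothetical reader that remembers the version it was opened against. */
    static final class ToyReader {
        private final ToyIndex index;
        private final long openedVersion;
        private boolean stale = false;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.openedVersion = index.currentVersion();
        }

        /** Refuses to modify the index once it has moved past openedVersion. */
        void delete(int docNum) throws IOException {
            if (stale || index.currentVersion() > openedVersion) {
                stale = true; // a stale reader never becomes valid again
                throw new IOException("reader out of date, cannot delete doc " + docNum);
            }
            // A real implementation would mark docNum as deleted here.
        }
    }

    /** Returns true if the delete was accepted, false if the reader was stale. */
    static boolean tryDelete(ToyReader reader, int docNum) {
        try {
            reader.delete(docNum);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        System.out.println(tryDelete(reader, 0)); // true: nothing changed yet
        index.change();                           // someone else modifies the index
        System.out.println(tryDelete(reader, 1)); // false: reader is now stale
        System.out.println(tryDelete(new ToyReader(index), 1)); // true: fresh reader
    }
}
```

In this model, as in the listing above, the way to recover from a stale reader is to open a new one.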
public final synchronized void close()
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.
commit();
doClose();
if (closeDirectory)
  directory.close();
protected final synchronized void commit()
Commits changes resulting from delete, undeleteAll, or setNorm operations.
throws
IOException
if (hasChanges) {
  if (directoryOwner) {
    synchronized (directory) { // in- & inter-process sync
      new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
                    IndexWriter.COMMIT_LOCK_TIMEOUT) {
        public Object doBody() throws IOException {
          doCommit();
          segmentInfos.write(directory);
          return null;
        }
      }.run();
    }
    if (writeLock != null) {
      writeLock.release(); // release write lock
      writeLock = null;
    }
  } else
    doCommit();
}
hasChanges = false;
public final synchronized void delete(int docNum)
Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its fields with the {@link #document} method will result in an error. The presence of this document may still be reflected in the {@link #docFreq} statistic, though this will be corrected eventually as the index is further modified.
if (directoryOwner)
  aquireWriteLock();
doDelete(docNum);
hasChanges = true;
public final int delete(org.apache.lucene.index.Term term)
Deletes all documents containing term. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. Returns the number of documents deleted.
TermDocs docs = termDocs(term);
if (docs == null) return 0;
int n = 0;
try {
  while (docs.next()) {
    delete(docs.doc());
    n++;
  }
} finally {
  docs.close();
}
return n;
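The unique-ID pattern described for delete(Term) can be sketched with a toy in-memory model. `DeleteByTermDemo` and its methods are hypothetical and do not use Lucene. The sketch assumes each document has one `id` field and that the index keeps, per term, the list of document numbers containing it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Lucene code: terms are keyed as "field:text".
final class DeleteByTermDemo {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final BitSet deleted = new BitSet();
    private int nextDoc = 0;

    // Indexes a document whose "id" field holds idValue; returns its number.
    int addDocument(String idValue) {
        int docNum = nextDoc++;
        postings.computeIfAbsent("id:" + idValue, k -> new ArrayList<>()).add(docNum);
        return docNum;
    }

    // Marks every live document containing the term as deleted and
    // returns how many documents were deleted by this call.
    int delete(String field, String text) {
        List<Integer> docs = postings.get(field + ":" + text);
        if (docs == null) return 0;
        int n = 0;
        for (int docNum : docs) {
            if (!deleted.get(docNum)) { // already-deleted documents are skipped
                deleted.set(docNum);
                n++;
            }
        }
        return n;
    }

    boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }

    public static void main(String[] args) {
        DeleteByTermDemo index = new DeleteByTermDemo();
        index.addDocument("42");
        index.addDocument("43");
        System.out.println(index.delete("id", "42")); // 1
        System.out.println(index.delete("id", "42")); // 0, nothing left to delete
    }
}
```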
public org.apache.lucene.store.Directory directory()
Returns the directory this index resides in.
return directory;
protected abstract void doClose()
Implements close.
protected abstract void doCommit()
Implements commit.
protected abstract void doDelete(int docNum)
Implements deletion of the document numbered docNum. Applications should call {@link #delete(int)} or {@link #delete(Term)}.
protected abstract void doSetNorm(int doc, java.lang.String field, byte value)
Implements setNorm in subclass.
protected abstract void doUndeleteAll()
Implements actual undeleteAll() in subclass.
public abstract int docFreq(org.apache.lucene.index.Term t)
Returns the number of documents containing the term t.
public abstract org.apache.lucene.document.Document document(int n)
Returns the stored fields of the nth Document in this index.
protected final void finalize()
Release the write lock, if needed.
if (writeLock != null) {
  writeLock.release(); // release write lock
  writeLock = null;
}
public static long getCurrentVersion(java.lang.String directory)
Reads version number from segments files. The version number counts the number of changes of the index.
param
directory where the index resides.
return
version number.
throws
IOException if segments file cannot be read
return getCurrentVersion(new File(directory));
public static long getCurrentVersion(java.io.File directory)
Reads version number from segments files. The version number counts the number of changes of the index.
param
directory where the index resides.
return
version number.
throws
IOException if segments file cannot be read
Directory dir = FSDirectory.getDirectory(directory, false);
long version = getCurrentVersion(dir);
dir.close();
return version;
public static long getCurrentVersion(org.apache.lucene.store.Directory directory)
Reads version number from segments files. The version number counts the number of changes of the index.
param
directory where the index resides.
return
version number.
throws
IOException if segments file cannot be read.
return SegmentInfos.readCurrentVersion(directory);
public abstract java.util.Collection getFieldNames()
Returns a list of all unique field names that exist in the index pointed to by this IndexReader.
return
Collection of Strings indicating the names of the fields
throws
IOException if there is a problem with accessing the index
public abstract java.util.Collection getFieldNames(boolean indexed)
Returns a list of all unique field names that exist in the index pointed to by this IndexReader. The boolean argument specifies whether the fields returned are indexed or not.
param
indexed true if only indexed fields should be returned; false if only unindexed fields should be returned.
return
Collection of Strings indicating the names of the fields
throws
IOException if there is a problem with accessing the index
public abstract java.util.Collection getIndexedFieldNames(boolean storedTermVector)
param
storedTermVector if true, returns only indexed fields that have term vector info; otherwise, only indexed fields without term vector info
return
Collection of Strings indicating the names of the fields
public abstract org.apache.lucene.index.TermFreqVector getTermFreqVector(int docNumber, java.lang.String field)
Returns a term frequency vector for the specified document and field. The vector returned contains terms and frequencies for those terms in the specified field of this document, if the field had the storeTermVector flag set. If the flag was not set, the method returns null.
see
Field#isTermVectorStored()
public abstract org.apache.lucene.index.TermFreqVector[] getTermFreqVectors(int docNumber)
Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null.
see
Field#isTermVectorStored()
public abstract boolean hasDeletions()
Returns true if any documents have been deleted
public static boolean indexExists(java.lang.String directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.
param
directory the directory to check for an index
return
true if an index exists; false otherwise
return (new File(directory, "segments")).exists();
public static boolean indexExists(java.io.File directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.
param
directory the directory to check for an index
return
true if an index exists; false otherwise
return (new File(directory, "segments")).exists();
public static boolean indexExists(org.apache.lucene.store.Directory directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.
param
directory the directory to check for an index
return
true if an index exists; false otherwise
throws
IOException if there is a problem with accessing the index
return directory.fileExists("segments");
public abstract boolean isDeleted(int n)
Returns true if document n has been deleted
public static boolean isLocked(org.apache.lucene.store.Directory directory)
Returns true iff the index in the named directory is currently locked.
param
directory the directory to check for a lock
throws
IOException if there is a problem with accessing the index
return directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked() || directory.makeLock(IndexWriter.COMMIT_LOCK_NAME).isLocked();
public static boolean isLocked(java.lang.String directory)
Returns true iff the index in the named directory is currently locked.
param
directory the directory to check for a lock
throws
IOException if there is a problem with accessing the index
Directory dir = FSDirectory.getDirectory(directory, false);
boolean result = isLocked(dir);
dir.close();
return result;
public static long lastModified(org.apache.lucene.store.Directory directory)
Returns the time the index in the named directory was last modified.
Synchronization of IndexReader and IndexWriter instances is no longer done via time stamps of the segments file, since the time resolution depends on the hardware platform. Instead, a version number is maintained within the segments file, which is incremented every time the index is changed.
deprecated
Replaced by {@link #getCurrentVersion(Directory)}
return directory.fileModified("segments");
public static long lastModified(java.lang.String directory)
Returns the time the index in the named directory was last modified.
Synchronization of IndexReader and IndexWriter instances is no longer done via time stamps of the segments file, since the time resolution depends on the hardware platform. Instead, a version number is maintained within the segments file, which is incremented every time the index is changed.
deprecated
Replaced by {@link #getCurrentVersion(String)}
return lastModified(new File(directory));
public static long lastModified(java.io.File directory)
Returns the time the index in the named directory was last modified.
Synchronization of IndexReader and IndexWriter instances is no longer done via time stamps of the segments file, since the time resolution depends on the hardware platform. Instead, a version number is maintained within the segments file, which is incremented every time the index is changed.
deprecated
Replaced by {@link #getCurrentVersion(File)}
return FSDirectory.fileModified(directory, "segments");
public abstract int maxDoc()
Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index.
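The array-sizing use of maxDoc() can be sketched as follows. `MaxDocDemo` is a hypothetical toy, not Lucene code. It assumes that deleted documents keep their slot, so the count of live documents is the number of slots minus the number of deletion marks. That relationship is an assumption of the sketch and is not stated in this listing.

```java
import java.util.BitSet;

final class MaxDocDemo {
    /** Live-document count in the toy model: slots minus deletion marks. */
    static int numDocs(int maxDoc, BitSet deleted) {
        return maxDoc - deleted.cardinality();
    }

    public static void main(String[] args) {
        int maxDoc = 5; // document numbers 0..4 are possible
        BitSet deleted = new BitSet(maxDoc);
        deleted.set(1);
        deleted.set(3);

        float[] perDoc = new float[maxDoc]; // one element per possible document number
        perDoc[maxDoc - 1] = 1.5f;          // the largest document number is a valid index

        System.out.println("array length = " + perDoc.length);              // 5
        System.out.println("live documents = " + numDocs(maxDoc, deleted)); // 3
    }
}
```

Sizing the array by the live-document count instead would make the highest document numbers fall outside the array whenever deletions are present.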
public abstract byte[] norms(java.lang.String field)
Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.
see
Field#setBoost(float)
public abstract void norms(java.lang.String field, byte[] bytes, int offset)
Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.
see
Field#setBoost(float)
public abstract int numDocs()
Returns the number of documents in this index.
public static org.apache.lucene.index.IndexReader open(java.lang.String path)
Returns an IndexReader reading the index in an FSDirectory in the named path.
return open(FSDirectory.getDirectory(path, false), true);
public static org.apache.lucene.index.IndexReader open(java.io.File path)
Returns an IndexReader reading the index in an FSDirectory in the named path.
return open(FSDirectory.getDirectory(path, false), true);
public static org.apache.lucene.index.IndexReader open(org.apache.lucene.store.Directory directory)
Returns an IndexReader reading the index in the given Directory.
return open(directory, false);
private static org.apache.lucene.index.IndexReader open(org.apache.lucene.store.Directory directory, boolean closeDirectory)
synchronized (directory) { // in- & inter-process sync
  return (IndexReader)new Lock.With(
      directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
      IndexWriter.COMMIT_LOCK_TIMEOUT) {
      public Object doBody() throws IOException {
        SegmentInfos infos = new SegmentInfos();
        infos.read(directory);
        if (infos.size() == 1) { // index is optimized
          return new SegmentReader(infos, infos.info(0), closeDirectory);
        } else {
          IndexReader[] readers = new IndexReader[infos.size()];
          for (int i = 0; i < infos.size(); i++)
            readers[i] = new SegmentReader(infos.info(i));
          return new MultiReader(directory, infos, closeDirectory, readers);
        }
      }
    }.run();
}
public final synchronized void setNorm(int doc, java.lang.String field, byte value)
Expert: Resets the normalization factor for the named field of the named document. The norm represents the product of the field's {@link Field#setBoost(float) boost} and its {@link Similarity#lengthNorm(String, int) length normalization}. Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old.
see
#norms(String)
see
Similarity#decodeNorm(byte)
if (directoryOwner)
  aquireWriteLock();
doSetNorm(doc, field, value);
hasChanges = true;
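Because the norm is described above as the product of the field's boost and its length normalization, changing only the boost can be sketched as dividing out the old boost and multiplying in the new one. `NormRescaleDemo.rescale` is a hypothetical helper, not part of IndexReader. It works on decoded float values and ignores any precision lost by the single-byte encoding.

```java
final class NormRescaleDemo {
    /**
     * Hypothetical helper: assumes norm = boost * lengthNorm
     * and that oldBoost is non-zero.
     */
    static float rescale(float oldNorm, float oldBoost, float newBoost) {
        float lengthNorm = oldNorm / oldBoost; // recover the length normalization
        return lengthNorm * newBoost;          // re-apply it with the new boost
    }

    public static void main(String[] args) {
        // A field indexed with boost 1.0 and norm 0.5, re-boosted to 2.0.
        System.out.println(rescale(0.5f, 1.0f, 2.0f)); // prints 1.0
    }
}
```

In terms of this API, the old value would presumably come from decoding an entry of norms(field) with Similarity.decodeNorm(byte), and the result would be passed to setNorm(doc, field, float).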
public void setNorm(int doc, java.lang.String field, float value)
Expert: Resets the normalization factor for the named field of the named document.
see
#norms(String)
see
Similarity#decodeNorm(byte)
setNorm(doc, field, Similarity.encodeNorm(value));
public org.apache.lucene.index.TermDocs termDocs(org.apache.lucene.index.Term term)
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number, the frequency of the term in that document is also provided, for use in search scoring. Thus, this method implements the mapping:
Term => <docNum, freq>*

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.
TermDocs termDocs = termDocs();
termDocs.seek(term);
return termDocs;
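The mapping from a term to a list of (docNum, freq) pairs ordered by document number can be sketched with a toy inverted index. `TermDocsDemo` is hypothetical and unrelated to Lucene's actual data structures. It assumes whitespace tokenization and uses a sorted map so that document numbers come out in ascending order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index, NOT Lucene code.
final class TermDocsDemo {
    // term text -> (docNum -> freq); TreeMap keeps document numbers ascending
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();
    private int nextDoc = 0;

    // Indexes whitespace-separated tokens and returns the new document number.
    int addDocument(String text) {
        int docNum = nextDoc++;
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docNum, 1, Integer::sum);
        }
        return docNum;
    }

    // Returns {docNum, freq} pairs for the term, ordered by docNum.
    List<int[]> termDocs(String term) {
        List<int[]> result = new ArrayList<>();
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs != null) {
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                result.add(new int[] { e.getKey(), e.getValue() });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TermDocsDemo index = new TermDocsDemo();
        index.addDocument("to be or not to be"); // doc 0
        index.addDocument("let it be");          // doc 1
        for (int[] pair : index.termDocs("be")) {
            System.out.println("doc=" + pair[0] + " freq=" + pair[1]);
        }
    }
}
```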
public abstract org.apache.lucene.index.TermDocs termDocs()
Returns an unpositioned {@link TermDocs} enumerator.
public org.apache.lucene.index.TermPositions termPositions(org.apache.lucene.index.Term term)
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and frequency of the term in that document, a list of all of the ordinal positions of the term in the document is available. Thus, this method implements the mapping:
Term => <docNum, freq, <pos_1, pos_2, ... pos_freq-1> >*

This positional information facilitates phrase and proximity searching.
The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.
TermPositions termPositions = termPositions();
termPositions.seek(term);
return termPositions;
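To show how positional information supports phrase searching, here is a minimal sketch of a toy positional index. `TermPositionsDemo` and `containsPhrase` are hypothetical names, not Lucene APIs. The sketch assumes whitespace tokenization with positions counted from 0, and it only handles two-word phrases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy positional index, NOT Lucene code.
final class TermPositionsDemo {
    // term text -> (docNum -> ordinal positions of the term in that document)
    private final Map<String, TreeMap<Integer, List<Integer>>> postings = new HashMap<>();
    private int nextDoc = 0;

    int addDocument(String text) {
        int docNum = nextDoc++;
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], k -> new TreeMap<>())
                    .computeIfAbsent(docNum, k -> new ArrayList<>())
                    .add(pos);
        }
        return docNum;
    }

    // Positions of the term in the document; the list size is the frequency.
    List<Integer> positions(String term, int docNum) {
        TreeMap<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null || !docs.containsKey(docNum)) return new ArrayList<>();
        return docs.get(docNum);
    }

    // True if `second` occurs immediately after `first` somewhere in the document.
    boolean containsPhrase(int docNum, String first, String second) {
        List<Integer> secondPositions = positions(second, docNum);
        for (int pos : positions(first, docNum)) {
            if (secondPositions.contains(pos + 1)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TermPositionsDemo index = new TermPositionsDemo();
        int d0 = index.addDocument("to be or not to be");
        int d1 = index.addDocument("be to");
        System.out.println(index.positions("be", d0));            // [1, 5]
        System.out.println(index.containsPhrase(d0, "to", "be")); // true
        System.out.println(index.containsPhrase(d1, "to", "be")); // false
    }
}
```

Both documents contain "to" and "be", so term frequencies alone could not tell them apart. Only the positions show that the words are adjacent and in order in the first document.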
public abstract org.apache.lucene.index.TermPositions termPositions()
Returns an unpositioned {@link TermPositions} enumerator.
public abstract org.apache.lucene.index.TermEnum terms()
Returns an enumeration of all the terms in the index. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.
public abstract org.apache.lucene.index.TermEnum terms(org.apache.lucene.index.Term t)
Returns an enumeration of all terms after a given term. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.
public final synchronized void undeleteAll()
Undeletes all documents currently marked as deleted in this index.
if (directoryOwner)
  aquireWriteLock();
doUndeleteAll();
hasChanges = true;
public static void unlock(org.apache.lucene.store.Directory directory)
Forcibly unlocks the index in the named directory.
Caution: this should only be used by failure recovery code, when it is known that no other process or thread is in fact currently accessing this index.
directory.makeLock(IndexWriter.WRITE_LOCK_NAME).release();
directory.makeLock(IndexWriter.COMMIT_LOCK_NAME).release();