File | Doc | Category | Size | Date | Package
IndexReader.java | API Doc | Apache Lucene 2.1.0 | 32280 | Wed Feb 14 10:46:40 GMT 2007 | org.apache.lucene.index

IndexReader

public abstract class IndexReader extends Object
IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.

Concrete subclasses of IndexReader are usually constructed with a call to one of the static open() methods, e.g. {@link #open(String)}.

For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral--they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
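
As a rough illustration of why document numbers should not be stored between sessions, the sketch below models an index as a plain list. It does not use Lucene: `TinyIndexModel` and all of its methods are hypothetical names, and `compact()` only stands in for whatever renumbering a real index performs when deleted documents are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene): a document number is just a position in a list.
final class TinyIndexModel {
    private final List<String> ids = new ArrayList<>();      // application-level unique IDs
    private final List<Boolean> deleted = new ArrayList<>(); // one flag per document number

    // Adds a document and returns the number it was assigned.
    int add(String id) {
        ids.add(id);
        deleted.add(Boolean.FALSE);
        return ids.size() - 1;
    }

    // Marks a document as deleted; its number stays reserved until compaction.
    void delete(int docNum) {
        deleted.set(docNum, Boolean.TRUE);
    }

    // Drops deleted documents, which renumbers every document that follows them.
    void compact() {
        for (int i = ids.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                ids.remove(i);
                deleted.remove(i);
            }
        }
    }

    // Returns the current number of the document with the given ID, or -1 if absent.
    int numberOf(String id) {
        return ids.indexOf(id);
    }

    public static void main(String[] args) {
        TinyIndexModel index = new TinyIndexModel();
        index.add("a");
        index.add("b");
        index.add("c");
        System.out.println("c before: " + index.numberOf("c"));
        index.delete(0);
        index.compact();
        System.out.println("c after:  " + index.numberOf("c"));
    }
}
```

In this model the document with ID "c" changes number once an earlier document is removed, which is why a stable, application-defined ID is the safer key to persist.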

An IndexReader can be opened on a directory for which an IndexWriter is already open, but in that case it cannot be used to delete documents from the index.

author
Doug Cutting
version
$Id: IndexReader.java 497612 2007-01-18 22:47:03Z mikemccand $

Fields Summary
private Directory
directory
private boolean
directoryOwner
private boolean
closeDirectory
protected IndexFileDeleter
deleter
private SegmentInfos
segmentInfos
private Lock
writeLock
private boolean
stale
private boolean
hasChanges
private boolean
rollbackHasChanges
Used by commit() to record pre-commit state in case rollback is necessary
private SegmentInfos
rollbackSegmentInfos
Constructors Summary
protected IndexReader(Directory directory)
Constructor used if IndexReader is not owner of its directory. This is used for IndexReaders that are used within other IndexReaders that take care of locking directories.

param
directory Directory where IndexReader files reside.

  

                                         
     
    this.directory = directory;
  
IndexReader(Directory directory, SegmentInfos segmentInfos, boolean closeDirectory)
Constructor used if IndexReader is owner of its directory. If IndexReader is owner of its directory, it locks its directory in case of write operations.

param
directory Directory where IndexReader files reside.
param
segmentInfos Used for write-l
param
closeDirectory

    init(directory, segmentInfos, closeDirectory, true);
  
Methods Summary
private void aquireWriteLock()
Tries to acquire the WriteLock on this directory. This method is only valid if this IndexReader is directory owner.

throws
IOException If WriteLock cannot be acquired.

    if (stale)
      throw new IOException("IndexReader out of date and no longer valid for delete, undelete, or setNorm operations");

    if (writeLock == null) {
      Lock writeLock = directory.makeLock(IndexWriter.WRITE_LOCK_NAME);
      if (!writeLock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT)) // obtain write lock
        throw new IOException("Index locked for write: " + writeLock);
      this.writeLock = writeLock;

      // we have to check whether index has changed since this reader was opened.
      // if so, this reader is no longer valid for deletion
      if (SegmentInfos.readCurrentVersion(directory) > segmentInfos.getVersion()) {
        stale = true;
        this.writeLock.release();
        this.writeLock = null;
        throw new IOException("IndexReader out of date and no longer valid for delete, undelete, or setNorm operations");
      }
    }
  
public final synchronized void close()
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.

    commit();
    doClose();
    if(closeDirectory)
      directory.close();
  
protected final synchronized void commit()
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics).

throws
IOException

    if(hasChanges){
      if (deleter == null) {
        // In the MultiReader case, we share this deleter
        // across all SegmentReaders:
        setDeleter(new IndexFileDeleter(segmentInfos, directory));
      }
      if(directoryOwner){

        // Should not be necessary: no prior commit should
        // have left pending files, so just defensive:
        deleter.clearPendingFiles();

        String oldInfoFileName = segmentInfos.getCurrentSegmentFileName();
        String nextSegmentsFileName = segmentInfos.getNextSegmentFileName();

        // Checkpoint the state we are about to change, in
        // case we have to roll back:
        startCommit();

        boolean success = false;
        try {
          doCommit();
          segmentInfos.write(directory);
          success = true;
        } finally {

          if (!success) {

            // Rollback changes that were made to
            // SegmentInfos but failed to get [fully]
            // committed.  This way this reader instance
            // remains consistent (matched to what's
            // actually in the index):
            rollbackCommit();

            // Erase any pending files that we were going to delete:
            deleter.clearPendingFiles();

            // Remove possibly partially written next
            // segments file:
            deleter.deleteFile(nextSegmentsFileName);

            // Recompute deletable files & remove them (so
            // partially written .del files, etc, are
            // removed):
            deleter.findDeletableFiles();
            deleter.deleteFiles();
          }
        }

        // Attempt to delete all files we just obsoleted:
        deleter.deleteFile(oldInfoFileName);
        deleter.commitPendingFiles();

        if (writeLock != null) {
          writeLock.release();  // release write lock
          writeLock = null;
        }
      }
      else
        doCommit();
    }
    hasChanges = false;
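
The all-or-nothing pattern used by commit() above (checkpoint, attempt the work, roll back in `finally` unless a success flag was set) can be sketched in isolation. This is a simplified model under stated assumptions, not Lucene code: `CommitSketch`, `stage`, and the simulated failure on an empty string are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene): all-or-nothing update guarded by a success flag.
final class CommitSketch {
    private List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    // Queues a change to be applied by the next commit().
    void stage(String change) {
        pending.add(change);
    }

    // Returns a copy of the committed state.
    List<String> committed() {
        return new ArrayList<>(committed);
    }

    int pendingCount() {
        return pending.size();
    }

    // Applies all staged changes, or none of them if any step throws.
    void commit() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> checkpoint = new ArrayList<>(committed); // state to roll back to
        boolean success = false;
        try {
            for (String change : pending) {
                if (change.isEmpty()) {
                    // Hypothetical failure condition, standing in for an I/O error.
                    throw new IllegalStateException("simulated write failure");
                }
                committed.add(change);
            }
            success = true;
        } finally {
            if (!success) {
                committed = checkpoint; // discard partially applied changes
            }
        }
        pending.clear(); // reached only when the commit succeeded
    }

    public static void main(String[] args) {
        CommitSketch index = new CommitSketch();
        index.stage("delete doc 3");
        index.commit();
        index.stage("delete doc 5");
        index.stage(""); // triggers the simulated failure
        try {
            index.commit();
        } catch (IllegalStateException e) {
            System.out.println("commit failed: " + e.getMessage());
        }
        System.out.println(index.committed());
    }
}
```

As in the listing above, the staged changes survive a failed commit, so the caller can retry once the underlying problem is resolved.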
  
public final synchronized void deleteDocument(int docNum)
Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its field with the {@link #document} method will result in an error. The presence of this document may still be reflected in the {@link #docFreq} statistic, though this will be corrected eventually as the index is further modified.

    if(directoryOwner)
      aquireWriteLock();
    hasChanges = true;
    doDelete(docNum);
  
public final int deleteDocuments(org.apache.lucene.index.Term term)
Deletes all documents that have a given term indexed. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See {@link #deleteDocument(int)} for information about when this deletion will become effective.

return
the number of documents deleted

    TermDocs docs = termDocs(term);
    if (docs == null) return 0;
    int n = 0;
    try {
      while (docs.next()) {
        deleteDocument(docs.doc());
        n++;
      }
    } finally {
      docs.close();
    }
    return n;
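
The unique-ID deletion idiom described above can be modelled without Lucene. In the sketch below, `DeleteByTermSketch` and its methods are hypothetical; a map from "field:text" keys to document numbers stands in for the term index, and a `BitSet` stands in for the deletion marks. Already-deleted documents are skipped, mirroring the statement that deleted documents no longer appear in TermDocs enumerations.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene): "field:text" keys map to the documents containing that term.
final class DeleteByTermSketch {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final BitSet deleted = new BitSet();

    // Records that document docNum contains the term field:text.
    void index(int docNum, String field, String text) {
        postings.computeIfAbsent(field + ":" + text, k -> new ArrayList<>()).add(docNum);
    }

    // Marks every live document containing the term as deleted; returns how many were marked.
    int deleteDocuments(String field, String text) {
        List<Integer> docs = postings.get(field + ":" + text);
        if (docs == null) {
            return 0;
        }
        int n = 0;
        for (int docNum : docs) {
            if (!deleted.get(docNum)) { // skip documents that are already deleted
                deleted.set(docNum);
                n++;
            }
        }
        return n;
    }

    boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }

    public static void main(String[] args) {
        DeleteByTermSketch index = new DeleteByTermSketch();
        index.index(0, "id", "A-17");
        index.index(1, "id", "B-42");
        System.out.println(index.deleteDocuments("id", "B-42"));
        System.out.println(index.isDeleted(1));
    }
}
```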
  
public org.apache.lucene.store.Directory directory()
Returns the directory this index resides in.

 return directory; 
protected abstract void doClose()
Implements close.

protected abstract void doCommit()
Implements commit.

protected abstract void doDelete(int docNum)
Implements deletion of the document numbered docNum. Applications should call {@link #deleteDocument(int)} or {@link #deleteDocuments(Term)}.

protected abstract void doSetNorm(int doc, java.lang.String field, byte value)
Implements setNorm in subclass.

protected abstract void doUndeleteAll()
Implements actual undeleteAll() in subclass.

public abstract int docFreq(org.apache.lucene.index.Term t)
Returns the number of documents containing the term t.

public org.apache.lucene.document.Document document(int n)
Returns the stored fields of the nth Document in this index.

    return document(n, null);
  
public abstract org.apache.lucene.document.Document document(int n, org.apache.lucene.document.FieldSelector fieldSelector)
Get the {@link org.apache.lucene.document.Document} at the nth position. The {@link org.apache.lucene.document.FieldSelector} may be used to determine what {@link org.apache.lucene.document.Field}s to load and how they should be loaded. NOTE: If this Reader (more specifically, the underlying {@link FieldsReader}) is closed before the lazy {@link org.apache.lucene.document.Field} is loaded, an exception may be thrown. If you want the value of a lazy {@link org.apache.lucene.document.Field} to be available after closing, you must explicitly load it or fetch the Document again with a new loader.

param
n Get the document at the nth position
param
fieldSelector The {@link org.apache.lucene.document.FieldSelector} to use to determine what Fields should be loaded on the Document. May be null, in which case all Fields will be loaded.
return
The stored fields of the {@link org.apache.lucene.document.Document} at the nth position
throws
IOException If there is a problem reading this document
see
org.apache.lucene.document.Fieldable
see
org.apache.lucene.document.FieldSelector
see
org.apache.lucene.document.SetBasedFieldSelector
see
org.apache.lucene.document.LoadFirstFieldSelector

protected void finalize()
Release the write lock, if needed.

    try {
      if (writeLock != null) {
        writeLock.release();                        // release write lock
        writeLock = null;
      }
    } finally {
      super.finalize();
    }
  
public static long getCurrentVersion(java.lang.String directory)
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

param
directory where the index resides.
return
version number.
throws
IOException if segments file cannot be read

    return getCurrentVersion(new File(directory));
  
public static long getCurrentVersion(java.io.File directory)
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

param
directory where the index resides.
return
version number.
throws
IOException if segments file cannot be read

    Directory dir = FSDirectory.getDirectory(directory);
    long version = getCurrentVersion(dir);
    dir.close();
    return version;
  
public static long getCurrentVersion(org.apache.lucene.store.Directory directory)
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.

param
directory where the index resides.
return
version number.
throws
IOException if segments file cannot be read.

    return SegmentInfos.readCurrentVersion(directory);
  
protected org.apache.lucene.index.IndexFileDeleter getDeleter()

    return deleter;
  
public abstract java.util.Collection getFieldNames(org.apache.lucene.index.IndexReader$FieldOption fldOption)
Get a list of unique field names that exist in this index and have the specified field option information.

param
fldOption specifies which field option should be available for the returned fields
return
Collection of Strings indicating the names of the fields.
see
IndexReader.FieldOption

public abstract org.apache.lucene.index.TermFreqVector getTermFreqVector(int docNumber, java.lang.String field)
Return a term frequency vector for the specified document and field. The returned vector contains terms and frequencies for the terms in the specified field of this document, if the field had the storeTermVector flag set. If termvectors had been stored with positions or offsets, a TermPositionsVector is returned.

param
docNumber document for which the term frequency vector is returned
param
field field for which the term frequency vector is returned.
return
term frequency vector. May be null if field does not exist in the specified document or term vector was not stored.
throws
IOException if index cannot be accessed
see
org.apache.lucene.document.Field.TermVector

public abstract org.apache.lucene.index.TermFreqVector[] getTermFreqVectors(int docNumber)
Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null. The term vectors that are returned may either be of type TermFreqVector or of type TermPositionsVector if positions or offsets have been stored.

param
docNumber document for which term frequency vectors are returned
return
array of term frequency vectors. May be null if no term vectors have been stored for the specified document.
throws
IOException if index cannot be accessed
see
org.apache.lucene.document.Field.TermVector

public long getVersion()
Version number when this IndexReader was opened.

    return segmentInfos.getVersion();
  
public abstract boolean hasDeletions()
Returns true if any documents have been deleted

public boolean hasNorms(java.lang.String field)
Returns true if there are norms stored for this field.

    // backward compatible implementation.
    // SegmentReader has an efficient implementation.
    return norms(field) != null;
  
public static boolean indexExists(java.lang.String directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

param
directory the directory to check for an index
return
true if an index exists; false otherwise

    return indexExists(new File(directory));
  
public static boolean indexExists(java.io.File directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

param
directory the directory to check for an index
return
true if an index exists; false otherwise

    return SegmentInfos.getCurrentSegmentGeneration(directory.list()) != -1;
  
public static boolean indexExists(org.apache.lucene.store.Directory directory)
Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned.

param
directory the directory to check for an index
return
true if an index exists; false otherwise
throws
IOException if there is a problem with accessing the index

    return SegmentInfos.getCurrentSegmentGeneration(directory) != -1;
  
void init(org.apache.lucene.store.Directory directory, org.apache.lucene.index.SegmentInfos segmentInfos, boolean closeDirectory, boolean directoryOwner)

    this.directory = directory;
    this.segmentInfos = segmentInfos;
    this.directoryOwner = directoryOwner;
    this.closeDirectory = closeDirectory;
  
public boolean isCurrent()
Check whether this IndexReader still works on a current version of the index. If this is not the case you will need to re-open the IndexReader to make sure you see the latest changes made to the index.

throws
IOException

    return SegmentInfos.readCurrentVersion(directory) == segmentInfos.getVersion();
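
The comparison above can be illustrated with a small stand-alone model: a reader remembers the version it was opened on and is current only while the index still reports that version. This is a sketch, not Lucene code; `IndexState`, `ReaderSnapshot`, and `recordChange` are hypothetical names, and the counter simply follows the description of a version that starts from a timestamp and increases by one per change.

```java
// Toy model (not Lucene): a shared version counter plus a reader-side snapshot.
final class VersionCheckSketch {

    // Stands in for the version number stored with the index.
    static final class IndexState {
        private long version;

        IndexState(long initialVersion) {
            this.version = initialVersion;
        }

        long currentVersion() {
            return version;
        }

        // Each change to the index increases the version by one.
        void recordChange() {
            version++;
        }
    }

    // Stands in for a reader that remembers the version it was opened on.
    static final class ReaderSnapshot {
        private final IndexState index;
        private final long openedVersion;

        ReaderSnapshot(IndexState index) {
            this.index = index;
            this.openedVersion = index.currentVersion();
        }

        // True while the index has not changed since this snapshot was taken.
        boolean isCurrent() {
            return index.currentVersion() == openedVersion;
        }
    }

    public static void main(String[] args) {
        IndexState index = new IndexState(System.currentTimeMillis());
        ReaderSnapshot reader = new ReaderSnapshot(index);
        System.out.println("fresh reader current?    " + reader.isCurrent());
        index.recordChange();
        System.out.println("after a change current?  " + reader.isCurrent());
        ReaderSnapshot reopened = new ReaderSnapshot(index);
        System.out.println("reopened reader current? " + reopened.isCurrent());
    }
}
```

A caller would typically poll the check and open a new reader when it reports a change, since the existing reader keeps seeing the index as it was when opened.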
  
public abstract boolean isDeleted(int n)
Returns true if document n has been deleted

public static boolean isLocked(org.apache.lucene.store.Directory directory)
Returns true iff the index in the named directory is currently locked.

param
directory the directory to check for a lock
throws
IOException if there is a problem with accessing the index

    return
      directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked();
  
public static boolean isLocked(java.lang.String directory)
Returns true iff the index in the named directory is currently locked.

param
directory the directory to check for a lock
throws
IOException if there is a problem with accessing the index

    Directory dir = FSDirectory.getDirectory(directory);
    boolean result = isLocked(dir);
    dir.close();
    return result;
  
public boolean isOptimized()
Checks if the index is optimized (if it has a single segment and no deletions).

return
true if the index is optimized; false otherwise

      return segmentInfos.size() == 1 && hasDeletions() == false;
  
public static long lastModified(java.io.File fileDirectory)
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use {@link #isCurrent()} instead.

    return ((Long) new SegmentInfos.FindSegmentsFile(fileDirectory) {
        public Object doBody(String segmentFileName) {
          return new Long(FSDirectory.fileModified(fileDirectory, segmentFileName));
        }
      }.run()).longValue();
  
public static long lastModified(org.apache.lucene.store.Directory directory2)
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use {@link #isCurrent()} instead.

    return ((Long) new SegmentInfos.FindSegmentsFile(directory2) {
        public Object doBody(String segmentFileName) throws IOException {
          return new Long(directory2.fileModified(segmentFileName));
        }
      }.run()).longValue();
  
public static long lastModified(java.lang.String directory)
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use {@link #isCurrent()} instead.

    return lastModified(new File(directory));
  
public static void main(java.lang.String[] args)
Prints the filename and size of each file within a given compound file. Add the -extract flag to extract files to the current working directory. In order to make the extracted version of the index work, you have to copy the segments file from the compound index into the directory where the extracted files are stored.

param
args Usage: org.apache.lucene.index.IndexReader [-extract] <cfsfile>

    String filename = null;
    boolean extract = false;

    for (int i = 0; i < args.length; ++i) {
      if (args[i].equals("-extract")) {
        extract = true;
      } else if (filename == null) {
        filename = args[i];
      }
    }

    if (filename == null) {
      System.out.println("Usage: org.apache.lucene.index.IndexReader [-extract] <cfsfile>");
      return;
    }

    Directory dir = null;
    CompoundFileReader cfr = null;

    try {
      File file = new File(filename);
      String dirname = file.getAbsoluteFile().getParent();
      filename = file.getName();
      dir = FSDirectory.getDirectory(dirname);
      cfr = new CompoundFileReader(dir, filename);

      String [] files = cfr.list();
      Arrays.sort(files);   // sort the array of filenames so that the output is more readable

      for (int i = 0; i < files.length; ++i) {
        long len = cfr.fileLength(files[i]);

        if (extract) {
          System.out.println("extract " + files[i] + " with " + len + " bytes to local directory...");
          IndexInput ii = cfr.openInput(files[i]);

          FileOutputStream f = new FileOutputStream(files[i]);

          // read and write with a small buffer, which is more effective than reading byte by byte
          byte[] buffer = new byte[1024];
          int chunk = buffer.length;
          while(len > 0) {
            final int bufLen = (int) Math.min(chunk, len);
            ii.readBytes(buffer, 0, bufLen);
            f.write(buffer, 0, bufLen);
            len -= bufLen;
          }

          f.close();
          ii.close();
        }
        else
          System.out.println(files[i] + ": " + len + " bytes");
      }
    } catch (IOException ioe) {
      ioe.printStackTrace();
    }
    finally {
      try {
        if (dir != null)
          dir.close();
        if (cfr != null)
          cfr.close();
      }
      catch (IOException ioe) {
        ioe.printStackTrace();
      }
    }
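
The extraction loop in main() above copies a file of known length through a fixed-size buffer. The same technique can be sketched with only the standard library. `ChunkedCopySketch` and its methods are hypothetical names, and `DataInputStream.readFully` stands in for the `readBytes` call in the listing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Stand-alone sketch of copying a known number of bytes in fixed-size chunks.
final class ChunkedCopySketch {

    // Copies exactly `length` bytes from in to out, at most bufferSize bytes at a time.
    static void copy(DataInputStream in, OutputStream out, long length, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long remaining = length;
        while (remaining > 0) {
            // The last chunk may be shorter than the buffer.
            int chunk = (int) Math.min(buffer.length, remaining);
            in.readFully(buffer, 0, chunk); // throws EOFException if the source is too short
            out.write(buffer, 0, chunk);
            remaining -= chunk;
        }
    }

    // Convenience wrapper for in-memory data; wraps I/O errors as unchecked.
    static byte[] copyAll(byte[] source, int bufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(new DataInputStream(new ByteArrayInputStream(source)), out,
                 source.length, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = new byte[2500];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        byte[] copy = copyAll(source, 1024);
        System.out.println("copied " + copy.length + " bytes");
    }
}
```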
  
public abstract int maxDoc()
Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index.
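
A minimal sketch of the array-sizing use mentioned above, written without Lucene: the array has one slot per possible document number, and slots belonging to deleted documents are skipped. `MaxDocSketch`, `sumLive`, and the `BitSet` of deletions are hypothetical stand-ins for a reader's maxDoc() and isDeleted(int).

```java
import java.util.BitSet;

// Toy model (not Lucene): a per-document array whose length plays the role of maxDoc.
final class MaxDocSketch {

    // Sums a per-document value over live documents only.
    static int sumLive(int[] values, BitSet deleted) {
        int sum = 0;
        for (int docNum = 0; docNum < values.length; docNum++) {
            if (!deleted.get(docNum)) { // deleted numbers still occupy a slot
                sum += values[docNum];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int maxDoc = 4;                    // document numbers 0..3 are possible
        int[] values = new int[maxDoc];    // one element per document number
        values[0] = 10;
        values[1] = 20;
        values[2] = 30;
        values[3] = 40;
        BitSet deleted = new BitSet(maxDoc);
        deleted.set(2);                    // document 2 has been deleted
        System.out.println(sumLive(values, deleted)); // prints 70
    }
}
```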

public abstract byte[] norms(java.lang.String field)
Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

see
org.apache.lucene.document.Field#setBoost(float)

public abstract void norms(java.lang.String field, byte[] bytes, int offset)
Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.

see
org.apache.lucene.document.Field#setBoost(float)

public abstract int numDocs()
Returns the number of documents in this index.

public static org.apache.lucene.index.IndexReader open(java.lang.String path)
Returns an IndexReader reading the index in an FSDirectory in the named path.

    return open(FSDirectory.getDirectory(path), true);
  
public static org.apache.lucene.index.IndexReader open(java.io.File path)
Returns an IndexReader reading the index in an FSDirectory in the named path.

    return open(FSDirectory.getDirectory(path), true);
  
public static org.apache.lucene.index.IndexReader open(org.apache.lucene.store.Directory directory)
Returns an IndexReader reading the index in the given Directory.

    return open(directory, false);
  
private static org.apache.lucene.index.IndexReader open(org.apache.lucene.store.Directory directory, boolean closeDirectory)


    return (IndexReader) new SegmentInfos.FindSegmentsFile(directory) {

      public Object doBody(String segmentFileName) throws IOException {

        SegmentInfos infos = new SegmentInfos();
        infos.read(directory, segmentFileName);

        if (infos.size() == 1) {		  // index is optimized
          return SegmentReader.get(infos, infos.info(0), closeDirectory);
        } else {

          // To reduce the chance of hitting FileNotFound
          // (and having to retry), we open segments in
          // reverse because IndexWriter merges & deletes
          // the newest segments first.

          IndexReader[] readers = new IndexReader[infos.size()];
          for (int i = infos.size()-1; i >= 0; i--) {
            try {
              readers[i] = SegmentReader.get(infos.info(i));
            } catch (IOException e) {
              // Close all readers we had opened:
              for(i++;i<infos.size();i++) {
                readers[i].close();
              }
              throw e;
            }
          }

          return new MultiReader(directory, infos, closeDirectory, readers);
        }
      }
    }.run();
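
The multi-segment branch above opens readers from last to first and, if one fails, closes the readers it had already opened before rethrowing. The sketch below isolates that cleanup pattern using only `java.io.Closeable`. `OpenAllSketch`, `Opener`, and `demo` are hypothetical names, and the resource called "bad" is an arbitrary trigger for a simulated failure.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch: open resources in reverse order, cleaning up on failure.
final class OpenAllSketch {

    // Hypothetical factory that opens one resource per name.
    interface Opener {
        Closeable open(String name) throws IOException;
    }

    // Opens names[last] down to names[0]; on failure closes what was opened and rethrows.
    static Closeable[] openAll(String[] names, Opener opener) throws IOException {
        Closeable[] opened = new Closeable[names.length];
        for (int i = names.length - 1; i >= 0; i--) {
            try {
                opened[i] = opener.open(names[i]);
            } catch (IOException e) {
                // Everything at an index greater than i was opened successfully.
                for (int j = i + 1; j < names.length; j++) {
                    opened[j].close();
                }
                throw e;
            }
        }
        return opened;
    }

    // Runs openAll against fake resources and returns a log of what happened.
    static List<String> demo(String... names) {
        List<String> log = new ArrayList<>();
        Opener opener = name -> {
            if (name.equals("bad")) {
                throw new IOException("cannot open " + name);
            }
            log.add("open " + name);
            return () -> log.add("close " + name);
        };
        try {
            openAll(names, opener);
        } catch (IOException e) {
            log.add("failed");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo("s0", "s1"));
        System.out.println(demo("s0", "bad", "s2"));
    }
}
```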
  
void rollbackCommit()
Rolls back state to just before the commit (this is called by commit() if there is some exception while committing).

    if (directoryOwner) {
      for(int i=0;i<segmentInfos.size();i++) {
        // Rollback each segmentInfo.  Because the
        // SegmentReader holds a reference to the
        // SegmentInfo we can't [easily] just replace
        // segmentInfos, so we reset it in place instead:
        segmentInfos.info(i).reset(rollbackSegmentInfos.info(i));
      }
      rollbackSegmentInfos = null;
    }

    hasChanges = rollbackHasChanges;
  
protected void setDeleter(org.apache.lucene.index.IndexFileDeleter deleter)

    this.deleter = deleter;
  
public final synchronized void setNorm(int doc, java.lang.String field, byte value)
Expert: Resets the normalization factor for the named field of the named document. The norm represents the product of the field's {@link Fieldable#setBoost(float) boost} and its {@link Similarity#lengthNorm(String, int) length normalization}. Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old.

see
#norms(String)
see
Similarity#decodeNorm(byte)

    if(directoryOwner)
      aquireWriteLock();
    hasChanges = true;
    doSetNorm(doc, field, value);
  
public void setNorm(int doc, java.lang.String field, float value)
Expert: Resets the normalization factor for the named field of the named document.

see
#norms(String)
see
Similarity#decodeNorm(byte)

    setNorm(doc, field, Similarity.encodeNorm(value));
  
void startCommit()
Should internally checkpoint state that will change during commit so that we can rollback if necessary.

    if (directoryOwner) {
      rollbackSegmentInfos = (SegmentInfos) segmentInfos.clone();
    }
    rollbackHasChanges = hasChanges;
  
public org.apache.lucene.index.TermDocs termDocs(org.apache.lucene.index.Term term)
Returns an enumeration of all the documents which contain term. For each document, the document number and the frequency of the term in that document are provided, for use in search scoring. Thus, this method implements the mapping:

    Term    =>    <docNum, freq>*

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.

    TermDocs termDocs = termDocs();
    termDocs.seek(term);
    return termDocs;
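
The mapping from a term to ordered (document number, frequency) pairs can be modelled with nested sorted maps. This is an illustrative sketch rather than Lucene's data structure: `PostingsSketch`, `addOccurrence`, and the string rendering are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (not Lucene): term -> (docNum -> freq), with document numbers kept sorted.
final class PostingsSketch {
    private final Map<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();

    // Records one occurrence of term in document docNum.
    void addOccurrence(String term, int docNum) {
        postings.computeIfAbsent(term, t -> new TreeMap<>())
                .merge(docNum, 1, Integer::sum);
    }

    // Renders the postings of a term as <docNum, freq> pairs in ascending document order.
    String termDocs(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
            sb.append('<').append(e.getKey()).append(", ").append(e.getValue()).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PostingsSketch index = new PostingsSketch();
        index.addOccurrence("lucene", 3);
        index.addOccurrence("lucene", 0);
        index.addOccurrence("lucene", 3);
        System.out.println(index.termDocs("lucene")); // prints <0, 1><3, 2>
    }
}
```

Because the inner map is sorted by key, each document number in the output is greater than all that precede it, matching the ordering guarantee stated above.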
  
public abstract org.apache.lucene.index.TermDocs termDocs()
Returns an unpositioned {@link TermDocs} enumerator.

public org.apache.lucene.index.TermPositions termPositions(org.apache.lucene.index.Term term)
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and frequency of the term in that document, a list of all of the ordinal positions of the term in the document is available. Thus, this method implements the mapping:

    Term    =>    <docNum, freq, <pos_1, pos_2, ... pos_(freq-1)> >*

This positional information facilitates phrase and proximity searching.

The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.

    TermPositions termPositions = termPositions();
    termPositions.seek(term);
    return termPositions;
  
public abstract org.apache.lucene.index.TermPositions termPositions()
Returns an unpositioned {@link TermPositions} enumerator.

public abstract org.apache.lucene.index.TermEnum terms()
Returns an enumeration of all the terms in the index. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.

public abstract org.apache.lucene.index.TermEnum terms(org.apache.lucene.index.Term t)
Returns an enumeration of all terms after a given term. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration.

public final synchronized void undeleteAll()
Undeletes all documents currently marked as deleted in this index.

    if(directoryOwner)
      aquireWriteLock();
    hasChanges = true;
    doUndeleteAll();
  
public static void unlock(org.apache.lucene.store.Directory directory)
Forcibly unlocks the index in the named directory.

Caution: this should only be used by failure recovery code, when it is known that no other process nor thread is in fact currently accessing this index.

    directory.makeLock(IndexWriter.WRITE_LOCK_NAME).release();