IndexWriter
---

public class IndexWriter extends Object

An IndexWriter creates and maintains an index.
The third argument to the constructor
determines whether a new index is created, or whether an existing index is
opened for the addition of new documents.
In either case, documents are added with the addDocument method. When finished adding
documents, close should be called.
If an index will not have more documents added for a while and optimal search
performance is desired, then the optimize
method should be called before the index is closed.
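A minimal end-to-end sketch of that lifecycle (the index path is hypothetical; Field.Text is the tokenized-field factory of this API generation):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexingExample {
        public static void main(String[] args) throws Exception {
            // create == true: build a new, empty index, replacing any index
            // already at this (hypothetical) path
            IndexWriter writer =
                new IndexWriter("/tmp/myindex", new StandardAnalyzer(), true);

            Document doc = new Document();
            doc.add(Field.Text("contents", "some text to index"));
            writer.addDocument(doc);    // analyzed with the writer's analyzer

            writer.optimize();          // merge everything into one segment
            writer.close();             // flush changes, release the write lock
        }
    }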
Fields Summary
---

public static long WRITE_LOCK_TIMEOUT
    Default value is 1000 (milliseconds). Use the
    org.apache.lucene.writeLockTimeout system property to override.

public static long COMMIT_LOCK_TIMEOUT
    Default value is 10000 (milliseconds). Use the
    org.apache.lucene.commitLockTimeout system property to override.

public static final String WRITE_LOCK_NAME

public static final String COMMIT_LOCK_NAME

public static final int DEFAULT_MERGE_FACTOR
    Default value is 10. Use the org.apache.lucene.mergeFactor
    system property to override.

public static final int DEFAULT_MIN_MERGE_DOCS
    Default value is 10. Use the org.apache.lucene.minMergeDocs
    system property to override.

public static final int DEFAULT_MAX_MERGE_DOCS
    Default value is {@link Integer#MAX_VALUE}. Use the
    org.apache.lucene.maxMergeDocs system property to override.

public static final int DEFAULT_MAX_FIELD_LENGTH
    Default value is 10000. Use the org.apache.lucene.maxFieldLength
    system property to override.

private Directory directory

private Analyzer analyzer

private Similarity similarity

private SegmentInfos segmentInfos

private final Directory ramDirectory

private Lock writeLock

private boolean useCompoundFile
    Use-compound-file setting. Defaults to true, minimizing the number of
    files used. Setting this to false may improve indexing performance, but
    may also cause file handle problems.

private boolean closeDir

public int maxFieldLength
    The maximum number of terms that will be indexed for a single field in a
    document. This limits the amount of memory required for indexing, so that
    collections with very large files will not crash the indexing process by
    running out of memory.
    Note that this effectively truncates large documents, excluding from the
    index terms that occur further in the document. If you know your source
    documents are large, be sure to set this value high enough to accommodate
    the expected size. If you set it to Integer.MAX_VALUE, the only limit is
    your memory, but you should anticipate an OutOfMemoryError.
    By default, no more than 10,000 terms will be indexed for a field.

public int mergeFactor
    Determines how often segment indices are merged by addDocument(). With
    smaller values, less RAM is used while indexing, and searches on
    unoptimized indices are faster, but indexing speed is slower. With larger
    values, more RAM is used during indexing, and while searches on
    unoptimized indices are slower, indexing is faster. Thus larger values
    (> 10) are best for batch index creation, and smaller values (< 10) for
    indices that are interactively maintained.
    This must never be less than 2. The default value is 10.

public int minMergeDocs
    Determines the minimal number of documents required before the buffered
    in-memory documents are merged and a new segment is created.
    Since documents are merged in a {@link org.apache.lucene.store.RAMDirectory},
    a larger value gives faster indexing. At the same time, mergeFactor limits
    the number of files open in an FSDirectory.
    The default value is 10.

public int maxMergeDocs
    Determines the largest number of documents ever merged by addDocument().
    Small values (e.g., less than 10,000) are best for interactive indexing,
    as this limits the length of pauses while indexing to a few seconds.
    Larger values are best for batched indexing and speedier searches.
    The default value is {@link Integer#MAX_VALUE}.

public PrintStream infoStream
    If non-null, information about merges will be printed to this.
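Because these tuning knobs are plain public fields in this API, configuration is direct assignment. A sketch of a batch-indexing setup, continuing the example above (the values are illustrative, not recommendations):

    writer.mergeFactor = 50;                 // batch indexing: merge less often
    writer.minMergeDocs = 100;               // buffer more documents in RAM
    writer.maxMergeDocs = Integer.MAX_VALUE; // no cap on merged segment size
    writer.maxFieldLength = 100000;          // keep more terms from large fields
    writer.infoStream = System.out;          // print merge activity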
Constructors Summary
---

public IndexWriter(String path, Analyzer a, boolean create)

Constructs an IndexWriter for the index in path. Text will be analyzed
with a. If create is true, then a new, empty index will be created in
path, replacing the index already there, if any.

    this(FSDirectory.getDirectory(path, create), a, create, true);

public IndexWriter(File path, Analyzer a, boolean create)

Constructs an IndexWriter for the index in path. Text will be analyzed
with a. If create is true, then a new, empty index will be created in
path, replacing the index already there, if any.

    this(FSDirectory.getDirectory(path, create), a, create, true);

public IndexWriter(Directory d, Analyzer a, boolean create)

Constructs an IndexWriter for the index in d. Text will be analyzed
with a. If create is true, then a new, empty index will be created in
d, replacing the index already there, if any.

    this(d, a, create, false);
private IndexWriter(Directory d, Analyzer a, boolean create, boolean closeDir)

    this.closeDir = closeDir;
    directory = d;
    analyzer = a;
    Lock writeLock = directory.makeLock(IndexWriter.WRITE_LOCK_NAME);
    if (!writeLock.obtain(WRITE_LOCK_TIMEOUT))      // obtain write lock
        throw new IOException("Index locked for write: " + writeLock);
    this.writeLock = writeLock;                     // save it
    synchronized (directory) {                      // in- & inter-process sync
        new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
                      COMMIT_LOCK_TIMEOUT) {
            public Object doBody() throws IOException {
                if (create)
                    segmentInfos.write(directory);  // start a new, empty segments file
                else
                    segmentInfos.read(directory);   // read the existing segments file
                return null;
            }
        }.run();
    }
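To append to an existing index rather than replace it, pass create == false; a brief sketch (path hypothetical):

    // open the index already at this path for appending
    IndexWriter writer =
        new IndexWriter("/tmp/myindex", new StandardAnalyzer(), false);
    writer.addDocument(anotherDoc);  // anotherDoc: a Document built as above
    writer.close();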
Methods Summary
---

public void addDocument(org.apache.lucene.document.Document doc)

Adds a document to this index. If the document contains more than
{@link #maxFieldLength} terms for a given field, the remainder are
discarded.

    addDocument(doc, analyzer);
public void addDocument(org.apache.lucene.document.Document doc, org.apache.lucene.analysis.Analyzer analyzer)

Adds a document to this index, using the provided analyzer instead of the
value of {@link #getAnalyzer()}. If the document contains more than
{@link #maxFieldLength} terms for a given field, the remainder are
discarded.

    DocumentWriter dw =
        new DocumentWriter(ramDirectory, analyzer, similarity, maxFieldLength);
    String segmentName = newSegmentName();
    dw.addDocument(segmentName, doc);
    synchronized (this) {
        segmentInfos.addElement(new SegmentInfo(segmentName, 1, ramDirectory));
        maybeMergeSegments();
    }
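This overload lets a single document be analyzed differently from the writer's default. A sketch (WhitespaceAnalyzer ships with Lucene; the two documents are hypothetical):

    import org.apache.lucene.analysis.WhitespaceAnalyzer;

    writer.addDocument(proseDoc);                           // default analyzer
    writer.addDocument(codeDoc, new WhitespaceAnalyzer());  // whitespace-only tokens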
public synchronized void addIndexes(org.apache.lucene.store.Directory[] dirs)

Merges all segments from an array of indexes into this index.
This may be used to parallelize batch indexing. A large document
collection can be broken into sub-collections. Each sub-collection can be
indexed in parallel, on a different thread, process or machine. The
complete index can then be created by merging sub-collection indexes
with this method.
After this completes, the index is optimized.

    optimize();                                   // start with zero or 1 seg
    for (int i = 0; i < dirs.length; i++) {
        SegmentInfos sis = new SegmentInfos();    // read infos from dir
        sis.read(dirs[i]);
        for (int j = 0; j < sis.size(); j++) {
            segmentInfos.addElement(sis.info(j)); // add each info
        }
    }
    optimize();                                   // final cleanup
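A sketch of the merge step described above, assuming two sub-indexes were built in parallel at hypothetical paths:

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    Directory[] parts = {
        FSDirectory.getDirectory("/tmp/index-part1", false),  // hypothetical
        FSDirectory.getDirectory("/tmp/index-part2", false)
    };
    writer.addIndexes(parts);  // merges both sub-indexes, then optimizes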
public synchronized void addIndexes(org.apache.lucene.index.IndexReader[] readers)

Merges the provided indexes into this index.
After this completes, the index is optimized.
The provided IndexReaders are not closed.

    optimize();                                   // start with zero or 1 seg
    String mergedName = newSegmentName();
    SegmentMerger merger = new SegmentMerger(directory, mergedName, false);
    if (segmentInfos.size() == 1)                 // add existing index, if any
        merger.add(new SegmentReader(segmentInfos.info(0)));
    for (int i = 0; i < readers.length; i++)      // add new indexes
        merger.add(readers[i]);
    int docCount = merger.merge();                // merge 'em
    segmentInfos.setSize(0);                      // pop old infos & add new
    segmentInfos.addElement(new SegmentInfo(mergedName, docCount, directory));
    synchronized (directory) {                    // in- & inter-process sync
        new Lock.With(directory.makeLock("commit.lock"), COMMIT_LOCK_TIMEOUT) {
            public Object doBody() throws IOException {
                segmentInfos.write(directory);    // commit changes
                return null;
            }
        }.run();
    }
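Since this variant takes already-open readers, the caller retains ownership and must close them. A sketch (paths hypothetical):

    import org.apache.lucene.index.IndexReader;

    IndexReader[] readers = {
        IndexReader.open("/tmp/index-part1"),
        IndexReader.open("/tmp/index-part2")
    };
    writer.addIndexes(readers);
    for (int i = 0; i < readers.length; i++)
        readers[i].close();  // the writer does not close them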
public synchronized void close()

Flushes all changes to an index and closes all associated files.

    flushRamSegments();
    ramDirectory.close();
    writeLock.release();  // release write lock
    writeLock = null;
    if (closeDir)
        directory.close();
private final void deleteFiles(java.util.Vector files, org.apache.lucene.store.Directory directory)

    for (int i = 0; i < files.size(); i++)
        directory.deleteFile((String)files.elementAt(i));

private final void deleteFiles(java.util.Vector files, java.util.Vector deletable)

    for (int i = 0; i < files.size(); i++) {
        String file = (String)files.elementAt(i);
        try {
            directory.deleteFile(file);           // try to delete each file
        } catch (IOException e) {                 // if delete fails
            if (directory.fileExists(file)) {
                if (infoStream != null)
                    infoStream.println(e.getMessage() + "; Will re-try later.");
                deletable.addElement(file);       // add to deletable
            }
        }
    }
private final void deleteSegments(java.util.Vector segments)

    Vector deletable = new Vector();
    deleteFiles(readDeleteableFiles(), deletable);       // try to delete deleteable
    for (int i = 0; i < segments.size(); i++) {
        SegmentReader reader = (SegmentReader)segments.elementAt(i);
        if (reader.directory() == this.directory)
            deleteFiles(reader.files(), deletable);      // try to delete our files
        else
            deleteFiles(reader.files(), reader.directory()); // delete other files
    }
    writeDeleteableFiles(deletable);                     // note files we can't delete
public synchronized int docCount()

Returns the number of documents currently in this index.

    int count = 0;
    for (int i = 0; i < segmentInfos.size(); i++) {
        SegmentInfo si = segmentInfos.info(i);
        count += si.docCount;
    }
    return count;
protected void finalize()

Release the write lock, if needed.

    if (writeLock != null) {
        writeLock.release();  // release write lock
        writeLock = null;
    }
private final void flushRamSegments()

Merges all RAM-resident segments.

    int minSegment = segmentInfos.size() - 1;
    int docCount = 0;
    while (minSegment >= 0 &&
           (segmentInfos.info(minSegment)).dir == ramDirectory) {
        docCount += segmentInfos.info(minSegment).docCount;
        minSegment--;
    }
    if (minSegment < 0 ||                         // add one FS segment?
        (docCount + segmentInfos.info(minSegment).docCount) > mergeFactor ||
        !(segmentInfos.info(segmentInfos.size()-1).dir == ramDirectory))
        minSegment++;
    if (minSegment >= segmentInfos.size())
        return;                                   // none to merge
    mergeSegments(minSegment);
public org.apache.lucene.analysis.Analyzer getAnalyzer()

Returns the analyzer used by this index.

    return analyzer;

final int getSegmentsCounter()

    return segmentInfos.counter;

public org.apache.lucene.search.Similarity getSimilarity()

Expert: Return the Similarity implementation used by this IndexWriter.
This defaults to the current value of {@link Similarity#getDefault()}.

    return this.similarity;

public boolean getUseCompoundFile()

Returns the current compound-file setting. When on, multiple files for each
segment are merged into a single file once the segment creation is finished.
This is done regardless of what directory is in use.

    return useCompoundFile;
private final void maybeMergeSegments()

Incremental segment merger.

    long targetMergeDocs = minMergeDocs;
    while (targetMergeDocs <= maxMergeDocs) {
        // find segments smaller than current target size
        int minSegment = segmentInfos.size();
        int mergeDocs = 0;
        while (--minSegment >= 0) {
            SegmentInfo si = segmentInfos.info(minSegment);
            if (si.docCount >= targetMergeDocs)
                break;
            mergeDocs += si.docCount;
        }
        if (mergeDocs >= targetMergeDocs)         // found a merge to do
            mergeSegments(minSegment + 1);
        else
            break;
        targetMergeDocs *= mergeFactor;           // increase target size
    }
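The net effect is a geometric merge cascade: minMergeDocs buffered documents merge into one segment, mergeFactor such segments merge into a larger one, and so on up to maxMergeDocs. A toy, self-contained simulation of that policy (not Lucene code; it tracks segment sizes only):

    import java.util.ArrayList;
    import java.util.List;

    public class MergeCascadeDemo {
        static final int MIN_MERGE_DOCS = 10;
        static final int MERGE_FACTOR = 10;

        public static void main(String[] args) {
            List<Integer> segs = new ArrayList<Integer>(); // doc count per segment
            for (int doc = 1; doc <= 1000; doc++) {
                segs.add(1);          // each new document starts as a 1-doc segment
                maybeMerge(segs);
            }
            System.out.println(segs); // prints [1000]: the cascade collapsed everything
        }

        static void maybeMerge(List<Integer> segs) {
            long target = MIN_MERGE_DOCS;
            while (target <= Integer.MAX_VALUE) {
                // walk backward over trailing segments smaller than the target
                int min = segs.size();
                int docs = 0;
                while (--min >= 0 && segs.get(min) < target)
                    docs += segs.get(min);
                if (docs < target)
                    break;                        // not enough small segments to merge
                while (segs.size() > min + 1)     // replace them with one merged segment
                    segs.remove(segs.size() - 1);
                segs.add(docs);
                target *= MERGE_FACTOR;           // look for the next, larger merge
            }
        }
    }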
private final void mergeSegments(int minSegment)

Pops segments off of the segmentInfos stack down to minSegment, merges them,
and pushes the merged index onto the top of the segmentInfos stack.

    String mergedName = newSegmentName();
    if (infoStream != null) infoStream.print("merging segments");
    SegmentMerger merger =
        new SegmentMerger(directory, mergedName, useCompoundFile);
    final Vector segmentsToDelete = new Vector();
    for (int i = minSegment; i < segmentInfos.size(); i++) {
        SegmentInfo si = segmentInfos.info(i);
        if (infoStream != null)
            infoStream.print(" " + si.name + " (" + si.docCount + " docs)");
        IndexReader reader = new SegmentReader(si);
        merger.add(reader);
        if ((reader.directory() == this.directory) ||  // if we own the directory
            (reader.directory() == this.ramDirectory))
            segmentsToDelete.addElement(reader);       // queue segment for deletion
    }
    int mergedDocCount = merger.merge();
    if (infoStream != null) {
        infoStream.println(" into " + mergedName + " (" + mergedDocCount + " docs)");
    }
    segmentInfos.setSize(minSegment);                  // pop old infos & add new
    segmentInfos.addElement(new SegmentInfo(mergedName, mergedDocCount,
                                            directory));
    // close readers before we attempt to delete now-obsolete segments
    merger.closeReaders();
    synchronized (directory) {                         // in- & inter-process sync
        new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
                      COMMIT_LOCK_TIMEOUT) {
            public Object doBody() throws IOException {
                segmentInfos.write(directory);         // commit before deleting
                deleteSegments(segmentsToDelete);      // delete now-unused segments
                return null;
            }
        }.run();
    }
private final synchronized java.lang.String newSegmentName()

    return "_" + Integer.toString(segmentInfos.counter++, Character.MAX_RADIX);
public synchronized void optimize()

Merges all segments together into a single segment, optimizing an index
for search.

    flushRamSegments();
    while (segmentInfos.size() > 1 ||
           (segmentInfos.size() == 1 &&
            (SegmentReader.hasDeletions(segmentInfos.info(0)) ||
             segmentInfos.info(0).dir != directory ||
             (useCompoundFile &&
              (!SegmentReader.usesCompoundFile(segmentInfos.info(0)) ||
               SegmentReader.hasSeparateNorms(segmentInfos.info(0))))))) {
        int minSegment = segmentInfos.size() - mergeFactor;
        mergeSegments(minSegment < 0 ? 0 : minSegment);
    }
private final java.util.Vector readDeleteableFiles()

    Vector result = new Vector();
    if (!directory.fileExists("deletable"))
        return result;
    InputStream input = directory.openFile("deletable");
    try {
        for (int i = input.readInt(); i > 0; i--)  // read file names
            result.addElement(input.readString());
    } finally {
        input.close();
    }
    return result;
public void setSimilarity(org.apache.lucene.search.Similarity similarity)

Expert: Set the Similarity implementation used by this IndexWriter.

    this.similarity = similarity;

public void setUseCompoundFile(boolean value)

Setting to turn on usage of a compound file. When on, multiple files
for each segment are merged into a single file once the segment creation
is finished. This is done regardless of what directory is in use.

    useCompoundFile = value;
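A brief usage sketch: turning compound files off can speed indexing at the cost of more open file handles, and restoring the setting before the final merge makes the optimized segment use the compound format:

    writer.setUseCompoundFile(false); // faster indexing, more file handles
    // ... add many documents ...
    writer.setUseCompoundFile(true);
    writer.optimize();                // merged segment uses the compound format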
private final void writeDeleteableFiles(java.util.Vector files)

    OutputStream output = directory.createFile("deleteable.new");
    try {
        output.writeInt(files.size());
        for (int i = 0; i < files.size(); i++)
            output.writeString((String)files.elementAt(i));
    } finally {
        output.close();
    }
    directory.renameFile("deleteable.new", "deletable");