FileDocCategorySizeDatePackage
SymbolTable.javaAPI DocApache Xerces 3.0.114476Fri Sep 14 20:33:54 BST 2007org.apache.xerces.util

SymbolTable

public class SymbolTable extends Object
This class is a symbol table implementation that guarantees that strings used as identifiers are unique references. Multiple calls to addSymbol will always return the same string reference.

The symbol table performs the same task as String.intern() with the following differences:

  • A new string object does not need to be created in order to retrieve a unique reference. Symbols can be added by using a series of characters in a character array.
  • Users of the symbol table can provide their own symbol hashing implementation. For example, a simple string hashing algorithm may fail to produce a balanced set of hashcodes for symbols that are mostly unique. Strings with similar leading characters are especially prone to this poor hashing behavior.
An instance of SymbolTable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the SymbolTable, and the initial capacity is simply the capacity at the time the SymbolTable is created. Note that the SymbolTable is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. The load factor is a measure of how full the SymbolTable is allowed to get before its capacity is automatically increased. When the number of entries in the SymbolTable exceeds the product of the load factor and the current capacity, the capacity is increased by calling the rehash method.

Generally, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the time cost to look up an entry (which is reflected in most SymbolTable operations, including addSymbol and containsSymbol).

The initial capacity controls a tradeoff between wasted space and the need for rehash operations, which are time-consuming. No rehash operations will ever occur if the initial capacity is greater than the maximum number of entries the Hashtable will contain divided by its load factor. However, setting the initial capacity too high can waste space.

If many entries are to be made into a SymbolTable, creating it with a sufficiently large capacity may allow the entries to be inserted more efficiently than letting it perform automatic rehashing as needed to grow the table.

see
SymbolHash
author
Andy Clark
author
John Kim, IBM
version
$Id: SymbolTable.java 496157 2007-01-14 21:24:28Z mrglavas $

Fields Summary
protected static final int
TABLE_SIZE
Default table size.
protected Entry[]
fBuckets
Buckets.
protected int
fTableSize
actual table size
protected transient int
fCount
The total number of entries in the hash table.
protected int
fThreshold
The table is rehashed when its size exceeds this threshold. (The value of this field is (int)(capacity * loadFactor).)
protected float
fLoadFactor
The load factor for the SymbolTable.
Constructors Summary
public SymbolTable(int initialCapacity, float loadFactor)
Constructs a new, empty SymbolTable with the specified initial capacity and the specified load factor.

param
initialCapacity the initial capacity of the SymbolTable.
param
loadFactor the load factor of the SymbolTable.
throws
IllegalArgumentException if the initial capacity is less than zero, or if the load factor is nonpositive.


    //
    // Constructors
    //
    
                                                                                              
         
        
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        }
        
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new IllegalArgumentException("Illegal Load: " + loadFactor);
        }
        
        if (initialCapacity == 0) {
            initialCapacity = 1;
        }
        
        fLoadFactor = loadFactor;
        fTableSize = initialCapacity;
        fBuckets = new Entry[fTableSize];
        fThreshold = (int)(fTableSize * loadFactor);
        fCount = 0;
    
public SymbolTable(int initialCapacity)
Constructs a new, empty SymbolTable with the specified initial capacity and default load factor, which is 0.75.

param
initialCapacity the initial capacity of the hashtable.
throws
IllegalArgumentException if the initial capacity is less than zero.

        this(initialCapacity, 0.75f);
    
public SymbolTable()
Constructs a new, empty SymbolTable with a default initial capacity (101) and load factor, which is 0.75.

        this(TABLE_SIZE, 0.75f);
    
Methods Summary
public java.lang.StringaddSymbol(java.lang.String symbol)
Adds the specified symbol to the symbol table and returns a reference to the unique symbol. If the symbol already exists, the previous symbol reference is returned instead, in order guarantee that symbol references remain unique.

param
symbol The new symbol.

        
        // search for identical symbol
        int bucket = hash(symbol) % fTableSize;
        for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
            if (entry.symbol.equals(symbol)) {
                return entry.symbol;
            }
        }
        
        if (fCount >= fThreshold) {
            // Rehash the table if the threshold is exceeded
            rehash();
            bucket = hash(symbol) % fTableSize;
        } 
        
        // create new entry
        Entry entry = new Entry(symbol, fBuckets[bucket]);
        fBuckets[bucket] = entry;
        ++fCount;
        return entry.symbol;
        
    
public java.lang.StringaddSymbol(char[] buffer, int offset, int length)
Adds the specified symbol to the symbol table and returns a reference to the unique symbol. If the symbol already exists, the previous symbol reference is returned instead, in order guarantee that symbol references remain unique.

param
buffer The buffer containing the new symbol.
param
offset The offset into the buffer of the new symbol.
param
length The length of the new symbol in the buffer.

        
        // search for identical symbol
        int bucket = hash(buffer, offset, length) % fTableSize;
        OUTER: for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
            if (length == entry.characters.length) {
                for (int i = 0; i < length; i++) {
                    if (buffer[offset + i] != entry.characters[i]) {
                        continue OUTER;
                    }
                }
                return entry.symbol;
            }
        }
        
        if (fCount >= fThreshold) {
            // Rehash the table if the threshold is exceeded
            rehash();
            bucket = hash(buffer, offset, length) % fTableSize;
        } 
        
        // add new entry
        Entry entry = new Entry(buffer, offset, length, fBuckets[bucket]);
        fBuckets[bucket] = entry;
        ++fCount;
        return entry.symbol;
        
    
public booleancontainsSymbol(char[] buffer, int offset, int length)
Returns true if the symbol table already contains the specified symbol.

param
buffer The buffer containing the symbol to look for.
param
offset The offset into the buffer.
param
length The length of the symbol in the buffer.


        // search for identical symbol
        int bucket = hash(buffer, offset, length) % fTableSize;
        OUTER: for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
            if (length == entry.characters.length) {
                for (int i = 0; i < length; i++) {
                    if (buffer[offset + i] != entry.characters[i]) {
                        continue OUTER;
                    }
                }
                return true;
            }
        }

        return false;

    
public booleancontainsSymbol(java.lang.String symbol)
Returns true if the symbol table already contains the specified symbol.

param
symbol The symbol to look for.


        // search for identical symbol
        int bucket = hash(symbol) % fTableSize;
        int length = symbol.length();
        OUTER: for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
            if (length == entry.characters.length) {
                for (int i = 0; i < length; i++) {
                    if (symbol.charAt(i) != entry.characters[i]) {
                        continue OUTER;
                    }
                }
                return true;
            }
        }

        return false;

    
public inthash(java.lang.String symbol)
Returns a hashcode value for the specified symbol. The value returned by this method must be identical to the value returned by the hash(char[],int,int) method when called with the character array that comprises the symbol string.

param
symbol The symbol to hash.

        return symbol.hashCode() & 0x7FFFFFF;
    
public inthash(char[] buffer, int offset, int length)
Returns a hashcode value for the specified symbol information. The value returned by this method must be identical to the value returned by the hash(String) method when called with the string object created from the symbol information.

param
buffer The character buffer containing the symbol.
param
offset The offset into the character buffer of the start of the symbol.
param
length The length of the symbol.


        int code = 0;
        for (int i = 0; i < length; ++i) {
            code = code * 31 + buffer[offset + i];
        }
        return code & 0x7FFFFFF;

    
protected voidrehash()
Increases the capacity of and internally reorganizes this SymbolTable, in order to accommodate and access its entries more efficiently. This method is called automatically when the number of keys in the SymbolTable exceeds this hashtable's capacity and load factor.


        int oldCapacity = fBuckets.length;
        Entry[] oldTable = fBuckets;

        int newCapacity = oldCapacity * 2 + 1;
        Entry[] newTable = new Entry[newCapacity];

        fThreshold = (int)(newCapacity * fLoadFactor);
        fBuckets = newTable;
        fTableSize = fBuckets.length;

        for (int i = oldCapacity ; i-- > 0 ;) {
            for (Entry old = oldTable[i] ; old != null ; ) {
                Entry e = old;
                old = old.next;

                int index = hash(e.characters, 0, e.characters.length) % newCapacity;
                e.next = newTable[index];
                newTable[index] = e;
            }
        }