SymbolTablepublic class SymbolTable extends Object This class is a symbol table implementation that guarantees that
strings used as identifiers are unique references. Multiple calls
to addSymbol will always return the same string
reference.
The symbol table performs the same task as String.intern()
with the following differences:
-
A new string object does not need to be created in order to
retrieve a unique reference. Symbols can be added by using
a series of characters in a character array.
-
Users of the symbol table can provide their own symbol hashing
implementation. For example, a simple string hashing algorithm
may fail to produce a balanced set of hashcodes for symbols
that are mostly unique. Strings with similar leading
characters are especially prone to this poor hashing behavior.
An instance of SymbolTable has two parameters that affect its
performance: initial capacity and load factor. The
capacity is the number of buckets in the SymbolTable, and the
initial capacity is simply the capacity at the time the SymbolTable
is created. Note that the SymbolTable is open: in the case of a "hash
collision", a single bucket stores multiple entries, which must be searched
sequentially. The load factor is a measure of how full the SymbolTable
is allowed to get before its capacity is automatically increased.
When the number of entries in the SymbolTable exceeds the product of the load
factor and the current capacity, the capacity is increased by calling the
rehash method.
Generally, the default load factor (.75) offers a good tradeoff between
time and space costs. Higher values decrease the space overhead but
increase the time cost to look up an entry (which is reflected in most
SymbolTable operations, including addSymbol and containsSymbol).
The initial capacity controls a tradeoff between wasted space and the
need for rehash operations, which are time-consuming.
No rehash operations will ever occur if the initial
capacity is greater than the maximum number of entries the
Hashtable will contain divided by its load factor. However,
setting the initial capacity too high can waste space.
If many entries are to be made into a SymbolTable ,
creating it with a sufficiently large capacity may allow the
entries to be inserted more efficiently than letting it perform
automatic rehashing as needed to grow the table. |
Fields Summary |
---|
protected static final int | TABLE_SIZEDefault table size. | protected Entry[] | fBucketsBuckets. | protected int | fTableSizeactual table size | protected transient int | fCountThe total number of entries in the hash table. | protected int | fThresholdThe table is rehashed when its size exceeds this threshold. (The
value of this field is (int)(capacity * loadFactor).) | protected float | fLoadFactorThe load factor for the SymbolTable. |
Constructors Summary |
---|
public SymbolTable(int initialCapacity, float loadFactor)Constructs a new, empty SymbolTable with the specified initial
capacity and the specified load factor.
//
// Constructors
//
if (initialCapacity < 0) {
throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
}
if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
throw new IllegalArgumentException("Illegal Load: " + loadFactor);
}
if (initialCapacity == 0) {
initialCapacity = 1;
}
fLoadFactor = loadFactor;
fTableSize = initialCapacity;
fBuckets = new Entry[fTableSize];
fThreshold = (int)(fTableSize * loadFactor);
fCount = 0;
| public SymbolTable(int initialCapacity)Constructs a new, empty SymbolTable with the specified initial capacity
and default load factor, which is 0.75.
this(initialCapacity, 0.75f);
| public SymbolTable()Constructs a new, empty SymbolTable with a default initial capacity (101)
and load factor, which is 0.75.
this(TABLE_SIZE, 0.75f);
|
Methods Summary |
---|
public java.lang.String | addSymbol(java.lang.String symbol)Adds the specified symbol to the symbol table and returns a
reference to the unique symbol. If the symbol already exists,
the previous symbol reference is returned instead, in order
guarantee that symbol references remain unique.
// search for identical symbol
int bucket = hash(symbol) % fTableSize;
for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
if (entry.symbol.equals(symbol)) {
return entry.symbol;
}
}
if (fCount >= fThreshold) {
// Rehash the table if the threshold is exceeded
rehash();
bucket = hash(symbol) % fTableSize;
}
// create new entry
Entry entry = new Entry(symbol, fBuckets[bucket]);
fBuckets[bucket] = entry;
++fCount;
return entry.symbol;
| public java.lang.String | addSymbol(char[] buffer, int offset, int length)Adds the specified symbol to the symbol table and returns a
reference to the unique symbol. If the symbol already exists,
the previous symbol reference is returned instead, in order
guarantee that symbol references remain unique.
// search for identical symbol
int bucket = hash(buffer, offset, length) % fTableSize;
OUTER: for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
if (length == entry.characters.length) {
for (int i = 0; i < length; i++) {
if (buffer[offset + i] != entry.characters[i]) {
continue OUTER;
}
}
return entry.symbol;
}
}
if (fCount >= fThreshold) {
// Rehash the table if the threshold is exceeded
rehash();
bucket = hash(buffer, offset, length) % fTableSize;
}
// add new entry
Entry entry = new Entry(buffer, offset, length, fBuckets[bucket]);
fBuckets[bucket] = entry;
++fCount;
return entry.symbol;
| public boolean | containsSymbol(char[] buffer, int offset, int length)Returns true if the symbol table already contains the specified
symbol.
// search for identical symbol
int bucket = hash(buffer, offset, length) % fTableSize;
OUTER: for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
if (length == entry.characters.length) {
for (int i = 0; i < length; i++) {
if (buffer[offset + i] != entry.characters[i]) {
continue OUTER;
}
}
return true;
}
}
return false;
| public boolean | containsSymbol(java.lang.String symbol)Returns true if the symbol table already contains the specified
symbol.
// search for identical symbol
int bucket = hash(symbol) % fTableSize;
int length = symbol.length();
OUTER: for (Entry entry = fBuckets[bucket]; entry != null; entry = entry.next) {
if (length == entry.characters.length) {
for (int i = 0; i < length; i++) {
if (symbol.charAt(i) != entry.characters[i]) {
continue OUTER;
}
}
return true;
}
}
return false;
| public int | hash(java.lang.String symbol)Returns a hashcode value for the specified symbol. The value
returned by this method must be identical to the value returned
by the hash(char[],int,int) method when called
with the character array that comprises the symbol string.
return symbol.hashCode() & 0x7FFFFFF;
| public int | hash(char[] buffer, int offset, int length)Returns a hashcode value for the specified symbol information.
The value returned by this method must be identical to the value
returned by the hash(String) method when called
with the string object created from the symbol information.
int code = 0;
for (int i = 0; i < length; ++i) {
code = code * 31 + buffer[offset + i];
}
return code & 0x7FFFFFF;
| protected void | rehash()Increases the capacity of and internally reorganizes this
SymbolTable, in order to accommodate and access its entries more
efficiently. This method is called automatically when the
number of keys in the SymbolTable exceeds this hashtable's capacity
and load factor.
int oldCapacity = fBuckets.length;
Entry[] oldTable = fBuckets;
int newCapacity = oldCapacity * 2 + 1;
Entry[] newTable = new Entry[newCapacity];
fThreshold = (int)(newCapacity * fLoadFactor);
fBuckets = newTable;
fTableSize = fBuckets.length;
for (int i = oldCapacity ; i-- > 0 ;) {
for (Entry old = oldTable[i] ; old != null ; ) {
Entry e = old;
old = old.next;
int index = hash(e.characters, 0, e.characters.length) % newCapacity;
e.next = newTable[index];
newTable[index] = e;
}
}
|
|