StringTokenizerpublic class StringTokenizer extends Object implements EnumerationThe string tokenizer class allows an application to break a
string into tokens. The tokenization method is much simpler than
the one used by the StreamTokenizer class. The
StringTokenizer methods do not distinguish among
identifiers, numbers, and quoted strings, nor do they recognize
and skip comments.
The set of delimiters (the characters that separate tokens) may
be specified either at creation time or on a per-token basis.
An instance of StringTokenizer behaves in one of two
ways, depending on whether it was created with the
returnDelims flag having the value true
or false :
- If the flag is
false , delimiter characters serve to
separate tokens. A token is a maximal sequence of consecutive
characters that are not delimiters.
- If the flag is
true , delimiter characters are themselves
considered to be tokens. A token is thus either one delimiter
character, or a maximal sequence of consecutive characters that are
not delimiters.
A StringTokenizer object internally maintains a current
position within the string to be tokenized. Some operations advance this
current position past the characters processed.
A token is returned by taking a substring of the string that was used to
create the StringTokenizer object.
The following is one example of the use of the tokenizer. The code:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
prints the following output:
this
is
a
test
StringTokenizer is a legacy class that is retained for
compatibility reasons although its use is discouraged in new code. It is
recommended that anyone seeking this functionality use the split
method of String or the java.util.regex package instead.
The following example illustrates how the String.split
method can be used to break up a string into its basic tokens:
String[] result = "this is a test".split("\\s");
for (int x=0; x<result.length; x++)
System.out.println(result[x]);
prints the following output:
this
is
a
test
|
Fields Summary |
---|
private int | currentPosition | private int | newPosition | private int | maxPosition | private String | str | private String | delimiters | private boolean | retDelims | private boolean | delimsChanged | private int | maxDelimCodePointmaxDelimCodePoint stores the value of the delimiter character with the
highest value. It is used to optimize the detection of delimiter
characters.
It is unlikely to provide any optimization benefit in the
hasSurrogates case because most string characters will be
smaller than the limit, but we keep it so that the two code
paths remain similar. | private boolean | hasSurrogatesIf delimiters include any surrogates (including surrogate
pairs), hasSurrogates is true and the tokenizer uses the
different code path. This is because String.indexOf(int)
doesn't handle unpaired surrogates as a single character. | private int[] | delimiterCodePointsWhen hasSurrogates is true, delimiters are converted to code
points and isDelimiter(int) is used to determine if the given
codepoint is a delimiter. |
Constructors Summary |
---|
public StringTokenizer(String str, String delim, boolean returnDelims)Constructs a string tokenizer for the specified string. All
characters in the delim argument are the delimiters
for separating tokens.
If the returnDelims flag is true , then
the delimiter characters are also returned as tokens. Each
delimiter is returned as a string of length one. If the flag is
false , the delimiter characters are skipped and only
serve as separators between tokens.
Note that if delim is null, this constructor does
not throw an exception. However, trying to invoke other methods on the
resulting StringTokenizer may result in a
NullPointerException.
currentPosition = 0;
newPosition = -1;
delimsChanged = false;
this.str = str;
maxPosition = str.length();
delimiters = delim;
retDelims = returnDelims;
setMaxDelimCodePoint();
| public StringTokenizer(String str, String delim)Constructs a string tokenizer for the specified string. The
characters in the delim argument are the delimiters
for separating tokens. Delimiter characters themselves will not
be treated as tokens.
Note that if delim is null, this constructor does
not throw an exception. However, trying to invoke other methods on the
resulting StringTokenizer may result in a
NullPointerException.
this(str, delim, false);
| public StringTokenizer(String str)Constructs a string tokenizer for the specified string. The
tokenizer uses the default delimiter set, which is
" \t\n\r\f" : the space character,
the tab character, the newline character, the carriage-return character,
and the form-feed character. Delimiter characters themselves will
not be treated as tokens.
this(str, " \t\n\r\f", false);
|
Methods Summary |
---|
public int | countTokens()Calculates the number of times that this tokenizer's
nextToken method can be called before it generates an
exception. The current position is not advanced.
int count = 0;
int currpos = currentPosition;
while (currpos < maxPosition) {
currpos = skipDelimiters(currpos);
if (currpos >= maxPosition)
break;
currpos = scanToken(currpos);
count++;
}
return count;
| public boolean | hasMoreElements()Returns the same value as the hasMoreTokens
method. It exists so that this class can implement the
Enumeration interface.
return hasMoreTokens();
| public boolean | hasMoreTokens()Tests if there are more tokens available from this tokenizer's string.
If this method returns true, then a subsequent call to
nextToken with no argument will successfully return a token.
/*
* Temporarily store this position and use it in the following
* nextToken() method only if the delimiters haven't been changed in
* that nextToken() invocation.
*/
newPosition = skipDelimiters(currentPosition);
return (newPosition < maxPosition);
| private boolean | isDelimiter(int codePoint)
for (int i = 0; i < delimiterCodePoints.length; i++) {
if (delimiterCodePoints[i] == codePoint) {
return true;
}
}
return false;
| public java.lang.Object | nextElement()Returns the same value as the nextToken method,
except that its declared return value is Object rather than
String . It exists so that this class can implement the
Enumeration interface.
return nextToken();
| public java.lang.String | nextToken(java.lang.String delim)Returns the next token in this string tokenizer's string. First,
the set of characters considered to be delimiters by this
StringTokenizer object is changed to be the characters in
the string delim. Then the next token in the string
after the current position is returned. The current position is
advanced beyond the recognized token. The new delimiter set
remains the default after this call.
delimiters = delim;
/* delimiter string specified, so set the appropriate flag. */
delimsChanged = true;
setMaxDelimCodePoint();
return nextToken();
| public java.lang.String | nextToken()Returns the next token from this string tokenizer.
/*
* If next position already computed in hasMoreElements() and
* delimiters have changed between the computation and this invocation,
* then use the computed value.
*/
currentPosition = (newPosition >= 0 && !delimsChanged) ?
newPosition : skipDelimiters(currentPosition);
/* Reset these anyway */
delimsChanged = false;
newPosition = -1;
if (currentPosition >= maxPosition)
throw new NoSuchElementException();
int start = currentPosition;
currentPosition = scanToken(currentPosition);
return str.substring(start, currentPosition);
| private int | scanToken(int startPos)Skips ahead from startPos and returns the index of the next delimiter
character encountered, or maxPosition if no such delimiter is found.
int position = startPos;
while (position < maxPosition) {
if (!hasSurrogates) {
char c = str.charAt(position);
if ((c <= maxDelimCodePoint) && (delimiters.indexOf(c) >= 0))
break;
position++;
} else {
int c = str.codePointAt(position);
if ((c <= maxDelimCodePoint) && isDelimiter(c))
break;
position += Character.charCount(c);
}
}
if (retDelims && (startPos == position)) {
if (!hasSurrogates) {
char c = str.charAt(position);
if ((c <= maxDelimCodePoint) && (delimiters.indexOf(c) >= 0))
position++;
} else {
int c = str.codePointAt(position);
if ((c <= maxDelimCodePoint) && isDelimiter(c))
position += Character.charCount(c);
}
}
return position;
| private void | setMaxDelimCodePoint()Set maxDelimCodePoint to the highest char in the delimiter set.
if (delimiters == null) {
maxDelimCodePoint = 0;
return;
}
int m = 0;
int c;
int count = 0;
for (int i = 0; i < delimiters.length(); i += Character.charCount(c)) {
c = delimiters.charAt(i);
if (c >= Character.MIN_HIGH_SURROGATE && c <= Character.MAX_LOW_SURROGATE) {
c = delimiters.codePointAt(i);
hasSurrogates = true;
}
if (m < c)
m = c;
count++;
}
maxDelimCodePoint = m;
if (hasSurrogates) {
delimiterCodePoints = new int[count];
for (int i = 0, j = 0; i < count; i++, j += Character.charCount(c)) {
c = delimiters.codePointAt(j);
delimiterCodePoints[i] = c;
}
}
| private int | skipDelimiters(int startPos)Skips delimiters starting from the specified position. If retDelims
is false, returns the index of the first non-delimiter character at or
after startPos. If retDelims is true, startPos is returned.
if (delimiters == null)
throw new NullPointerException();
int position = startPos;
while (!retDelims && position < maxPosition) {
if (!hasSurrogates) {
char c = str.charAt(position);
if ((c > maxDelimCodePoint) || (delimiters.indexOf(c) < 0))
break;
position++;
} else {
int c = str.codePointAt(position);
if ((c > maxDelimCodePoint) || !isDelimiter(c)) {
break;
}
position += Character.charCount(c);
}
}
return position;
|
|