File: RemoveHTMLReader.java
Doc: API Doc
Category: Example
Size: 3974
Date: Sat Jan 24 10:44:26 GMT 2004
Package: je3.io

RemoveHTMLReader

java.lang.Object
- java.io.Reader
  - java.io.FilterReader

public class RemoveHTMLReader extends FilterReader

A simple FilterReader that strips HTML tags (or anything between pairs of angle brackets) out of a stream of characters.

Fields Summary
boolean intag
Constructors Summary
public RemoveHTMLReader(Reader in)
A trivial constructor. Just initialize our superclass.
super(in);
Methods Summary
public int read(char[] buf, int from, int len)
This is the implementation of the no-op read() method of FilterReader. It calls in.read() to get a buffer full of characters, then strips out the HTML tags. (in is a protected field of the superclass.)
// The intag field is used to remember whether we are "inside" a tag
int numchars = 0;  // how many characters have been read
// Loop, because we might read a bunch of characters, then strip them
// all out, leaving us with zero characters to return.
while (numchars == 0) {
    numchars = in.read(buf, from, len);  // Read characters
    if (numchars == -1) return -1;       // Check for EOF and handle it.

    // Loop through the characters we read, stripping out HTML tags.
    // Characters not in tags are copied over previous tags
    int last = from;                     // Index of last non-HTML char
    for (int i = from; i < from + numchars; i++) {
        if (!intag) {                             // If not in an HTML tag
            if (buf[i] == '<') intag = true;      // check for tag start
            else buf[last++] = buf[i];            // and copy the character
        }
        else if (buf[i] == '>') intag = false;    // check for end of tag
    }
    numchars = last - from;  // Figure out how many characters remain
}                            // And if it is more than zero characters
return numchars;             // Then return that number.
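The in-place compaction above, and the reason intag has to outlive a single call, can be seen in isolation. The following is only a sketch: TagStripSketch and its strip() method are hypothetical names introduced for illustration, and the loop is lifted out of the Reader so it can be fed hand-made chunks.

```java
public class TagStripSketch {
    // Carries tag state between calls, like the intag field of RemoveHTMLReader.
    private boolean intag = false;

    /**
     * Compacts buf[from .. from+numchars) in place, dropping every character
     * from a '<' up to and including the next '>', and returns how many
     * characters remain. (Hypothetical helper, not part of the original class.)
     */
    int strip(char[] buf, int from, int numchars) {
        int last = from;                              // next slot for a kept character
        for (int i = from; i < from + numchars; i++) {
            if (!intag) {
                if (buf[i] == '<') intag = true;      // tag starts: stop copying
                else buf[last++] = buf[i];            // plain text: keep it
            } else if (buf[i] == '>') {
                intag = false;                        // tag ends: resume copying
            }
        }
        return last - from;
    }

    public static void main(String[] args) {
        TagStripSketch s = new TagStripSketch();

        // A tag that is split across two chunks: "<i" here, "tal>" in the next.
        char[] first = "ab<i".toCharArray();
        int n1 = s.strip(first, 0, first.length);
        System.out.println(new String(first, 0, n1));   // prints "ab"

        char[] second = "tal>cd".toCharArray();
        int n2 = s.strip(second, 0, second.length);
        System.out.println(new String(second, 0, n2));  // prints "cd"

        // A chunk made only of tags leaves nothing behind.
        char[] third = "<b></b>".toCharArray();
        System.out.println(s.strip(third, 0, third.length));  // prints 0
    }
}
```

The last case is why read() loops while numchars == 0: returning 0 from a chunk that happened to contain only tags would give the caller nothing to consume, so the method reads again until it has at least one character or reaches end of stream.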
public int read()
This is another no-op read() method we have to implement. We implement it in terms of the method above. Our superclass implements the remaining read() methods in terms of these two.
char[] buf = new char[1];
int result = read(buf, 0, 1);
if (result == -1) return -1;
else return (int)buf[0];
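As a usage sketch, the reader can be wrapped around any other Reader and then wrapped again, for example in a BufferedReader. The example below assumes a compact copy of the class nested inside a demo class so that it compiles on its own; RemoveHTMLDemo and stripTags() are hypothetical names, and the throws IOException clauses are added because in.read() can throw.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class RemoveHTMLDemo {
    // Compact copy of the class documented above, nested here only so that
    // this sketch is self-contained.
    static class RemoveHTMLReader extends FilterReader {
        boolean intag = false;  // are we currently inside a tag?

        RemoveHTMLReader(Reader in) { super(in); }

        @Override
        public int read(char[] buf, int from, int len) throws IOException {
            int numchars = 0;
            while (numchars == 0) {                   // retry if everything was stripped
                numchars = in.read(buf, from, len);
                if (numchars == -1) return -1;        // end of stream
                int last = from;
                for (int i = from; i < from + numchars; i++) {
                    if (!intag) {
                        if (buf[i] == '<') intag = true;
                        else buf[last++] = buf[i];
                    } else if (buf[i] == '>') intag = false;
                }
                numchars = last - from;
            }
            return numchars;
        }

        @Override
        public int read() throws IOException {
            char[] buf = new char[1];
            int result = read(buf, 0, 1);
            return (result == -1) ? -1 : buf[0];
        }
    }

    // Hypothetical helper: runs a string through the filter, line by line.
    static String stripTags(String html) throws IOException {
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new RemoveHTMLReader(new StringReader(html)))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Prints "Hello, world!" and "done" on separate lines.
        System.out.print(stripTags("<p>Hello, <b>world</b>!</p>\n<hr>done"));
    }
}
```

Because the filter only tracks angle brackets, a literal '<' or '>' in ordinary text (or inside a script or comment) is treated as a tag delimiter as well, which matches the class description: anything between pairs of angle brackets is removed.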