File: RemoveHTMLReader.java
Doc: API Doc
Category: Example
Size: 3974
Date: Sat Jan 24 10:44:26 GMT 2004
Package: je3.io

RemoveHTMLReader

public class RemoveHTMLReader extends FilterReader
A simple FilterReader that strips HTML tags (or anything between pairs of angle brackets) out of a stream of characters.
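
The filtering rule described above can be sketched on a plain string. This is only an illustration of the rule, not the class itself: the class name AngleBracketDemo and the helper stripAngleBrackets are hypothetical. It also shows the consequence of the parenthetical: angle brackets used as comparison operators are treated like tags.

```java
public class AngleBracketDemo {
    // Hypothetical string-based equivalent of the reader's rule:
    // drop everything from '<' up to and including the next '>'.
    static String stripAngleBrackets(String text) {
        StringBuilder out = new StringBuilder();
        boolean intag = false;                  // are we between '<' and '>'?
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!intag) {
                if (c == '<') intag = true;     // tag starts, drop the '<'
                else out.append(c);             // ordinary character, keep it
            } else if (c == '>') {
                intag = false;                  // tag ends, drop the '>'
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "bold text"
        System.out.println(stripAngleBrackets("<b>bold</b> text"));
        // prints "if a  d": the text between '<' and '>' is lost too
        System.out.println(stripAngleBrackets("if a < b and c > d"));
    }
}
```

Because the rule knows nothing about HTML syntax, it is best suited to input that is known to be markup.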

Fields Summary
boolean intag
Constructors Summary
public RemoveHTMLReader(Reader in)
A trivial constructor that just initializes the superclass.

 super(in); 
Methods Summary
public int read(char[] buf, int from, int len)
This overrides the pass-through read() method of FilterReader. It calls in.read() to fill the buffer with characters, then strips out the HTML tags in place. (in is a protected field of the superclass.)

        // The intag field remembers, across calls, whether we are "inside" a tag
        int numchars = 0;        // how many characters have been read
        // Loop, because we might read a bunch of characters, then strip them
        // all out, leaving us with zero characters to return.
        while (numchars == 0) {
            numchars = in.read(buf, from, len); // Read characters
            if (numchars == -1) return -1;      // Check for EOF and handle it.

            // Loop through the characters we read, stripping out HTML tags.
            // Characters not in tags are copied over previous tags 
            int last = from;                    // Index of last non-HTML char
            for(int i = from; i < from + numchars; i++) { 
                if (!intag) {                      // If not in an HTML tag
                    if (buf[i] == '<') intag = true; // check for tag start
                    else buf[last++] = buf[i];       // and copy the character
                }
                else if (buf[i] == '>') intag = false;  // check for end of tag
            }
            numchars = last - from; // Figure out how many characters remain
        }                           // And if it is more than zero characters
        return numchars;            // Then return that number.
    
public int read()
This is another no-op read() method we have to implement. We implement it in terms of the method above. Our superclass implements the remaining read() methods in terms of these two.

 
        char[] buf = new char[1];
        int result = read(buf, 0, 1);
        if (result == -1) return -1;
        else return (int)buf[0];
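
A minimal usage sketch follows, assuming the class behaves as listed above. To keep it self-contained, a compact copy of the reader is nested inside a hypothetical demo class, ReaderDemo; the helpers stripBulk and stripCharByChar are also hypothetical. The second helper reads one character at a time, so each tag spans several read() calls and the intag field has to carry the state between them.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReaderDemo {
    // Compact copy of the reader documented above, nested so the sketch compiles alone.
    static class RemoveHTMLReader extends FilterReader {
        boolean intag = false;          // true while we are between '<' and '>'

        RemoveHTMLReader(Reader in) { super(in); }

        @Override
        public int read(char[] buf, int from, int len) throws IOException {
            int numchars = 0;
            while (numchars == 0) {     // retry if a whole chunk was stripped away
                numchars = in.read(buf, from, len);
                if (numchars == -1) return -1;
                int last = from;        // next write position for kept characters
                for (int i = from; i < from + numchars; i++) {
                    if (!intag) {
                        if (buf[i] == '<') intag = true;
                        else buf[last++] = buf[i];
                    } else if (buf[i] == '>') intag = false;
                }
                numchars = last - from;
            }
            return numchars;
        }

        @Override
        public int read() throws IOException {
            char[] buf = new char[1];
            int result = read(buf, 0, 1);
            return (result == -1) ? -1 : buf[0];
        }
    }

    // Hypothetical helper: drain the reader through the block read() method.
    static String stripBulk(String html) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new RemoveHTMLReader(new StringReader(html))) {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) out.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Hypothetical helper: drain the reader through the single-character read().
    static String stripCharByChar(String html) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new RemoveHTMLReader(new StringReader(html))) {
            int c;
            while ((c = r.read()) != -1) out.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hello, <b>world</b>!</p>";
        System.out.println(stripBulk(html));        // prints "Hello, world!"
        System.out.println(stripCharByChar(html));  // prints "Hello, world!"
    }
}
```

Both helpers pass a positive length to read(). With a length of zero the underlying reader would return 0 characters each time, and the loop in the block read() would never exit.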