File: RemoveHTMLReader.java
Doc: API Doc
Category: Example
Size: 3864
Date: Sat Jun 02 02:43:00 BST 2001
Package: None

RemoveHTMLReader

java.lang.Object
- java.io.Reader
  - java.io.FilterReader

public class RemoveHTMLReader extends FilterReader

A simple FilterReader that strips HTML tags out of a stream of characters. It isn't perfect: for example, it doesn't know about tags within which '<' and '>' aren't to be interpreted as tags. It will also strip '<' and '>' characters (and anything in between) out of plain text files. For this reason, it should only be used with properly formatted HTML input.

Fields Summary

boolean intag
    Used to remember whether we are "inside" a tag.

Constructors Summary

public RemoveHTMLReader(Reader in)
    A trivial constructor. Just initialize our superclass.

        super(in);

Methods Summary

public int read(char[] buf, int from, int len) throws IOException
    This overrides the pass-through read() method of FilterReader. It calls in.read() to get a buffer full of characters, then strips out the HTML tags. (in is a protected field of the superclass.)

        int numchars = 0; // How many characters have been read
        // Loop, because we might read a bunch of characters, then strip them
        // all out, leaving us with zero characters to return.
        while (numchars == 0) {
            numchars = in.read(buf, from, len); // Read characters
            if (numchars == -1) return -1;      // Check for EOF and handle it.

            // Loop through the characters we read, stripping out HTML tags.
            // Characters not in tags are copied over any previous tags in the buffer
            int last = from; // Index where the next non-HTML char is stored
            for (int i = from; i < from + numchars; i++) {
                if (!intag) {                          // If not in an HTML tag
                    if (buf[i] == '<') intag = true;   // check for start of a tag
                    else buf[last++] = buf[i];         // and copy the character
                }
                else if (buf[i] == '>') intag = false; // Else, check for end of tag
            }
            numchars = last - from; // Figure out how many characters remain
        }                           // And if it is more than zero characters
        return numchars;            // Then return that number.

public int read() throws IOException
    This is the other pass-through read() method we have to override. We implement it in terms of the method above. Our superclass implements the remaining read() methods in terms of these two.

        char[] buf = new char[1];
        int result = read(buf, 0, 1);
        if (result == -1) return -1;
        else return (int) buf[0];
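
The listing above shows only the field, the constructor, and the two method bodies. As a minimal sketch of how those pieces might fit together and be used, the following reassembles them into a compilable class and runs it over two strings. The RemoveHTMLDemo class, its strip helper, and the deliberately tiny 4-character buffer are assumptions made for illustration; they are not part of the original example.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Reassembled from the fragments in the listing above.
class RemoveHTMLReader extends FilterReader {
    boolean intag = false; // Remembers whether we are "inside" a tag across calls

    public RemoveHTMLReader(Reader in) { super(in); }

    @Override
    public int read(char[] buf, int from, int len) throws IOException {
        int numchars = 0;
        while (numchars == 0) {                 // Retry if a whole chunk was stripped
            numchars = in.read(buf, from, len);
            if (numchars == -1) return -1;      // End of stream
            int last = from;                    // Where the next kept char is stored
            for (int i = from; i < from + numchars; i++) {
                if (!intag) {
                    if (buf[i] == '<') intag = true;
                    else buf[last++] = buf[i];
                } else if (buf[i] == '>') intag = false;
            }
            numchars = last - from;
        }
        return numchars;
    }

    @Override
    public int read() throws IOException {
        char[] buf = new char[1];
        int result = read(buf, 0, 1);
        return (result == -1) ? -1 : (int) buf[0];
    }
}

// Hypothetical driver class, not part of the original example.
class RemoveHTMLDemo {
    // Hypothetical helper: runs a string through RemoveHTMLReader.
    static String strip(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new RemoveHTMLReader(new StringReader(text))) {
            char[] buf = new char[4]; // Tiny buffer so tags straddle several reads
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) out.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Well-formed HTML: the tags are removed.
        System.out.println(strip("<p>Hello, <b>world</b>!</p>")); // prints "Hello, world!"
        // Plain text: everything between '<' and '>' is lost as well.
        System.out.println(strip("if a < b and c > d"));          // prints "if a  d"
    }
}
```

Because intag is a field rather than a local variable, a tag that starts in one read() call and ends in a later one is still removed, which is what the small buffer exercises. The second call illustrates the limitation stated in the class description: on plain text, the comparison operators and everything between them disappear.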