File: HTMLdtd.java
Doc: API Doc
Category: Apache Xerces 3.0.1
Size: 19621
Date: Fri Sep 14 20:33:52 BST 2007
Package: org.apache.xml.serialize

HTMLdtd

public final class HTMLdtd extends Object
Utility class for accessing information specific to HTML documents. The HTML DTD is expressed as three groups of utility functions. Two methods check whether an element requires a closing tag on printing ({@link #isEmptyTag}) or on parsing ({@link #isOptionalClosing}).

Two other methods translate character references from name to value and from value to name. A small entities resource is loaded into memory the first time any of these methods is called for fast and efficient access.

deprecated
This class was deprecated in Xerces 2.9.0. It is recommended that new applications use JAXP's Transformation API for XML (TrAX) for serializing HTML. See the Xerces documentation for more information.
version
$Revision: 476047 $ $Date: 2006-11-16 23:27:45 -0500 (Thu, 16 Nov 2006) $
author
Assaf Arkin
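
The deprecation note above points to JAXP's TrAX API for serializing HTML. A minimal sketch of that route, using only standard JDK classes, might look like the following. The class name TraxHtmlExample and the sample document are illustrative assumptions and are not part of Xerces.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class; not part of Xerces.
public class TraxHtmlExample {

    /** Serializes a DOM document with the identity transformer and the "html" output method. */
    static String toHtml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    /** Builds a tiny in-memory document: html > body > p. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("Hello");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

The exact whitespace and indentation of the output depend on the TrAX implementation in use.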

Fields Summary
public static final String
HTMLPublicId
Public identifier for HTML 4.01 (Strict) document type.
public static final String
HTMLSystemId
System identifier for HTML 4.01 (Strict) document type.
public static final String
XHTMLPublicId
Public identifier for XHTML 1.0 (Strict) document type.
public static final String
XHTMLSystemId
System identifier for XHTML 1.0 (Strict) document type.
private static Hashtable
_byChar
Table of reverse character reference mapping. Character codes are held as Integer objects, mapped to their reference name.
private static Hashtable
_byName
Table of entity name to value mapping. Entities are held as strings, character references as Integer objects.
private static Hashtable
_boolAttrs
private static Hashtable
_elemDefs
Holds element definitions.
private static final String
ENTITIES_RESOURCE
Locates the HTML entities file that is loaded upon initialization. This file is a resource loaded with the default class loader.
private static final int
ONLY_OPENING
Only opening tag should be printed.
private static final int
ELEM_CONTENT
Element contains element content only.
private static final int
PRESERVE
Element preserves spaces.
private static final int
OPT_CLOSING
Optional closing tag.
private static final int
EMPTY
Element is empty (also means only opening tag)
private static final int
ALLOWED_HEAD
Allowed to appear in head.
private static final int
CLOSE_P
When opened, closes P.
private static final int
CLOSE_DD_DT
When opened, closes DD or DT.
private static final int
CLOSE_SELF
When opened, closes itself.
private static final int
CLOSE_TABLE
When opened, closes another table section.
private static final int
CLOSE_TH_TD
When opened, closes TH or TD.
Constructors Summary
Methods Summary
public static int charFromName(java.lang.String name)
Returns the value of an HTML character reference by its name. If the reference is not found or was not defined as a character reference, returns EOF (-1).

param
name Name of character reference
return
Character code or EOF (-1)

        Object    value;

        initialize();
        value = _byName.get( name );
        if ( value != null && value instanceof Integer ) {
            return ( (Integer) value ).intValue();
        }
        return -1;
    
private static void defineBoolean(java.lang.String tagName, java.lang.String attrName)

        defineBoolean( tagName, new String[] { attrName } );
    
private static void defineBoolean(java.lang.String tagName, java.lang.String[] attrNames)

        _boolAttrs.put( tagName, attrNames );
    
private static void defineElement(java.lang.String name, int flags)

        _elemDefs.put( name, new Integer( flags ) );
    
private static void defineEntity(java.lang.String name, char value)
Defines a new character reference. The reference's name and value are supplied. Nothing happens if the character reference is already defined.

Unlike internal entities, character references are a string to single character mapping. They are used to map non-ASCII characters both on parsing and printing, primarily for HTML documents. '&amp;' is an example of a character reference.

param
name The entity's name
param
value The entity's value

        if ( _byName.get( name ) == null ) {
            _byName.put( name, new Integer( value ) );
            _byChar.put( new Integer( value ), name );
        }
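
The name-to-value and value-to-name lookups described for charFromName, fromChar and defineEntity can be illustrated with a small stand-alone sketch. It mirrors the logic shown in this listing but uses generic maps; the class name EntityTableSketch and the three sample definitions are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; it does not use the Xerces class itself.
public class EntityTableSketch {
    private static final Map<String, Integer> BY_NAME = new HashMap<>();
    private static final Map<Integer, String> BY_CHAR = new HashMap<>();

    /** Registers a character reference; an already defined name is left unchanged. */
    static void defineEntity(String name, char value) {
        if (!BY_NAME.containsKey(name)) {
            BY_NAME.put(name, (int) value);
            BY_CHAR.put((int) value, name);
        }
    }

    /** Returns the character code for a reference name, or -1 if the name is unknown. */
    static int charFromName(String name) {
        Integer code = BY_NAME.get(name);
        return code != null ? code : -1;
    }

    /** Returns the reference name for a character code, or null if none is defined. */
    static String fromChar(int value) {
        return value > 0xffff ? null : BY_CHAR.get(value);
    }

    static {
        // Sample definitions chosen for illustration.
        defineEntity("amp", '&');
        defineEntity("nbsp", '\u00a0');
        defineEntity("amp", '!'); // ignored: "amp" is already defined
    }

    public static void main(String[] args) {
        System.out.println(charFromName("amp"));
        System.out.println(fromChar(0x00a0));
        System.out.println(charFromName("bogus"));
    }
}
```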
    
public static java.lang.String fromChar(int value)
Returns the name of an HTML character reference based on its character value. Only valid for entities defined from character references. If no such character value was defined, returns null.

param
value Character value of entity
return
Entity's name or null

       if (value > 0xffff)
            return null;

        String name;

        initialize();
        name = (String) _byChar.get( new Integer( value ) );
        return name;
    
private static void initialize()
Initialize upon first access. Will load all the HTML character references into a list that is accessible by name or character value and is optimized for character substitution. This method may be called any number of times but will execute only once.

        InputStream     is = null;
        BufferedReader  reader = null;
        int             index;
        String          name;
        String          value;
        int             code;
        String          line;

        // Make sure not to initialize twice.
        if ( _byName != null )
            return;
        try {
            _byName = new Hashtable();
            _byChar = new Hashtable();
            is = HTMLdtd.class.getResourceAsStream( ENTITIES_RESOURCE );
            if ( is == null ) {
                throw new RuntimeException(
                    DOMMessageFormatter.formatMessage(
                        DOMMessageFormatter.SERIALIZER_DOMAIN,
                        "ResourceNotFound", new Object[] {ENTITIES_RESOURCE}));
            }    
            reader = new BufferedReader( new InputStreamReader( is, "ASCII" ) );
            line = reader.readLine();
            while ( line != null ) {
                if ( line.length() == 0 || line.charAt( 0 ) == '#' ) {
                    line = reader.readLine();
                    continue;
                }
                index = line.indexOf( ' ' );
                if ( index > 1 ) {
                    name = line.substring( 0, index );
                    ++index;
                    if ( index < line.length() ) {
                        value = line.substring( index );
                        index = value.indexOf( ' ' );
                        if ( index > 0 )
                            value = value.substring( 0, index );
                        code = Integer.parseInt( value );
                        defineEntity( name, (char) code );
                    }
                }
                line = reader.readLine();
            }
            is.close();
        }  catch ( Exception except ) {
            throw new RuntimeException(
                DOMMessageFormatter.formatMessage(
                    DOMMessageFormatter.SERIALIZER_DOMAIN,
                    "ResourceNotLoaded", new Object[] {ENTITIES_RESOURCE, except.toString()}));
        } finally {
            if ( is != null ) {
                try {
                    is.close();
                } catch ( Exception except ) { }
            }
        }
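
The line format that initialize() appears to expect (an entity name, a space, a decimal character code, optional trailing text, with blank lines and lines starting with '#' skipped) can be exercised in isolation. The sketch below parses such text from a string instead of a class-path resource. The class name EntityFileSketch and the sample input are assumptions; the contents of the real entities resource are not shown in this listing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the parsing loop; not the Xerces implementation.
public class EntityFileSketch {

    /** Parses "name code" lines, skipping blank lines and '#' comments. */
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> byName = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty() || line.charAt(0) == '#') {
                    continue;
                }
                int index = line.indexOf(' ');
                // As in the listing, the name must be at least two characters
                // long and something must follow the space.
                if (index <= 1 || index + 1 >= line.length()) {
                    continue;
                }
                String name = line.substring(0, index);
                String value = line.substring(index + 1);
                int end = value.indexOf(' ');
                if (end > 0) {
                    value = value.substring(0, end); // drop trailing text
                }
                byName.putIfAbsent(name, Integer.parseInt(value));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return byName;
    }

    public static void main(String[] args) {
        // Made-up sample input.
        String sample = "# comment\n\namp 38\nnbsp 160 trailing text\nx 1\n";
        System.out.println(parse(sample));
    }
}
```

Unlike the original, this sketch closes the reader with try-with-resources rather than an explicit finally block.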
    
public static boolean isBoolean(java.lang.String tagName, java.lang.String attrName)
Returns true if the specified attribute is a boolean and should be printed without the value. This applies to attributes that are true if they exist, such as selected (OPTION/INPUT).

param
tagName The element's tag name
param
attrName The attribute's name

        String[] attrNames;

        attrNames = (String[]) _boolAttrs.get( tagName.toUpperCase(Locale.ENGLISH) );
        if ( attrNames == null )
            return false;
        for ( int i = 0 ; i < attrNames.length ; ++i )
            if ( attrNames[ i ].equalsIgnoreCase( attrName ) )
                return true;
        return false;
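
To show how a serializer might act on this check, the following sketch prints a boolean attribute by name only and any other attribute as name="value". The table entries, the class name BooleanAttrSketch and the printAttr helper are hypothetical; the listing does not show which attributes Xerces actually registers.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; table contents are illustrative only.
public class BooleanAttrSketch {
    private static final Map<String, String[]> BOOL_ATTRS = new HashMap<>();

    static {
        BOOL_ATTRS.put("OPTION", new String[] { "selected" });
        BOOL_ATTRS.put("INPUT", new String[] { "checked", "disabled" });
    }

    /** Case-insensitive lookup, following the logic of isBoolean in the listing. */
    static boolean isBoolean(String tagName, String attrName) {
        String[] attrNames = BOOL_ATTRS.get(tagName.toUpperCase(Locale.ENGLISH));
        if (attrNames == null) {
            return false;
        }
        for (String candidate : attrNames) {
            if (candidate.equalsIgnoreCase(attrName)) {
                return true;
            }
        }
        return false;
    }

    /** Prints a boolean attribute without a value, anything else as name="value". */
    static String printAttr(String tagName, String attrName, String value) {
        return isBoolean(tagName, attrName) ? attrName : attrName + "=\"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(printAttr("option", "selected", "selected"));
        System.out.println(printAttr("a", "href", "index.html"));
    }
}
```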
    
public static boolean isClosing(java.lang.String tagName, java.lang.String openTag)
Returns true if the opening of one element (tagName) implies the closing of another open element (openTag). For example, every opening LI will close the previously open LI, and every opening BODY will close the previously open HEAD.

param
tagName The newly opened element
param
openTag The already opened element
return
True if closing tag closes opening tag

        // Several elements are defined as closing the HEAD
        if ( openTag.equalsIgnoreCase( "HEAD" ) )
            return ! isElement( tagName, ALLOWED_HEAD );
        // P closes itself
        if ( openTag.equalsIgnoreCase( "P" ) )
            return isElement( tagName, CLOSE_P );
        // DT closes DD, DD closes DT
        if ( openTag.equalsIgnoreCase( "DT" ) || openTag.equalsIgnoreCase( "DD" ) )
            return isElement( tagName, CLOSE_DD_DT );
        // LI and OPTION close themselves
        if ( openTag.equalsIgnoreCase( "LI" ) || openTag.equalsIgnoreCase( "OPTION" ) )
            return isElement( tagName, CLOSE_SELF );
        // Each of these table sections closes all the others
        if ( openTag.equalsIgnoreCase( "THEAD" ) || openTag.equalsIgnoreCase( "TFOOT" ) ||
             openTag.equalsIgnoreCase( "TBODY" ) || openTag.equalsIgnoreCase( "TR" ) ||
             openTag.equalsIgnoreCase( "COLGROUP" ) )
            return isElement( tagName, CLOSE_TABLE );
        // TD closes TH and TH closes TD
        if ( openTag.equalsIgnoreCase( "TH" ) || openTag.equalsIgnoreCase( "TD" ) )
            return isElement( tagName, CLOSE_TH_TD );
        return false;
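
One way a parser could use a predicate like isClosing is to pop the innermost open element before pushing a new one. The sketch below is a deliberately simplified stand-in: its isClosing covers only the LI, DT and DD cases, and the class name ImpliedCloseSketch and its helper methods are assumptions, not Xerces code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of implied closing on an open-element stack.
public class ImpliedCloseSketch {

    /** Simplified stand-in for isClosing: handles only LI, DT and DD. */
    static boolean isClosing(String tagName, String openTag) {
        if (openTag.equalsIgnoreCase("LI")) {
            return tagName.equalsIgnoreCase("LI");
        }
        if (openTag.equalsIgnoreCase("DT") || openTag.equalsIgnoreCase("DD")) {
            return tagName.equalsIgnoreCase("DT") || tagName.equalsIgnoreCase("DD");
        }
        return false;
    }

    /** Opens tagName, first popping the innermost element if the new tag implies its closing. */
    static void open(Deque<String> stack, String tagName) {
        if (!stack.isEmpty() && isClosing(tagName, stack.peek())) {
            stack.pop();
        }
        stack.push(tagName);
    }

    /** Returns the open-element depth after opening the given tags in order. */
    static int depthAfter(String... tags) {
        Deque<String> stack = new ArrayDeque<>();
        for (String tag : tags) {
            open(stack, tag);
        }
        return stack.size();
    }

    public static void main(String[] args) {
        // The second LI implicitly closes the first, so only UL and LI remain open.
        System.out.println(depthAfter("UL", "LI", "LI"));
    }
}
```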
    
private static boolean isElement(java.lang.String name, int flag)

        Integer flags;

        flags = (Integer) _elemDefs.get( name.toUpperCase(Locale.ENGLISH) );
        if ( flags == null ) {
            return false;
        }
        return ( ( flags.intValue() & flag ) == flag );
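
The bit-flag test in isElement can be tried out with a stand-alone sketch. The numeric flag values and the three element definitions below are assumptions chosen for illustration, since the listing shows neither the constants' values nor the defineElement calls.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; flag values and table entries are illustrative.
public class ElementFlagsSketch {
    static final int EMPTY = 0x01;
    static final int OPT_CLOSING = 0x02;
    static final int PRESERVE = 0x04;
    static final int ONLY_OPENING = 0x08;

    private static final Map<String, Integer> ELEM_DEFS = new HashMap<>();

    static {
        ELEM_DEFS.put("BR", EMPTY);
        ELEM_DEFS.put("LI", OPT_CLOSING | ONLY_OPENING);
        ELEM_DEFS.put("PRE", PRESERVE);
    }

    /** True only if every bit in flag is set for the element. */
    static boolean isElement(String name, int flag) {
        Integer flags = ELEM_DEFS.get(name.toUpperCase(Locale.ENGLISH));
        return flags != null && (flags & flag) == flag;
    }

    public static void main(String[] args) {
        System.out.println(isElement("li", OPT_CLOSING));
        System.out.println(isElement("br", PRESERVE));
        System.out.println(isElement("blink", EMPTY));
    }
}
```

Comparing the masked value against flag, rather than against zero, means a combined mask such as OPT_CLOSING | ONLY_OPENING matches only when all of its bits are set.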
    
public static boolean isElementContent(java.lang.String tagName)
Returns true if element is declared to have element content. Whitespace appearing inside element content is ignored; any other text is reported as an error.

param
tagName The element tag name (upper case)
return
True if element content

        return isElement( tagName, ELEM_CONTENT );
    
public static boolean isEmptyTag(java.lang.String tagName)
Returns true if element is declared to be empty. HTML elements are defined as empty in the DTD, not by the document syntax.

param
tagName The element tag name (upper case)
return
True if element is empty

        return isElement( tagName, EMPTY );
    
public static boolean isOnlyOpening(java.lang.String tagName)
Returns true if element's closing tag is generally not printed. For example, LI should not print the closing tag.

param
tagName The element tag name (upper case)
return
True if only opening tag should be printed

        return isElement( tagName, ONLY_OPENING );
    
public static boolean isOptionalClosing(java.lang.String tagName)
Returns true if element's closing tag is optional and need not exist. An error will not be reported for such elements if they are not closed. For example, LI is most often not closed.

param
tagName The element tag name (upper case)
return
True if closing tag implied

        return isElement( tagName, OPT_CLOSING );
    
public static boolean isPreserveSpace(java.lang.String tagName)
Returns true if the element's textual content preserves spaces. This applies only to PRE and TEXTAREA; all other HTML elements do not preserve space.

param
tagName The element tag name (upper case)
return
True if element's text content preserves spaces

        return isElement( tagName, PRESERVE );
    
public static boolean isURI(java.lang.String tagName, java.lang.String attrName)
Returns true if the specified attribute is a URI and should be escaped appropriately. In HTML, URIs are escaped differently than normal attributes.

param
tagName The element's tag name
param
attrName The attribute's name

        // Simplistic check: only href and src are treated as URI attributes; tagName is ignored.
        return ( attrName.equalsIgnoreCase( "href" ) || attrName.equalsIgnoreCase( "src" ) );