Hypertext Markup Language - 2.0

(1)

The document character set is somewhat independent of the character encoding scheme used to represent a document. For example, the ISO-2022-JP character encoding scheme can be used for HTML documents, since its repertoire is a subset of the ISO10646 repertoire. The critical distinction is that numeric character references agree with ISO10646 regardless of how the document is encoded.
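
As an illustration only, the Python sketch below decodes the same markup from two different encodings and then resolves decimal numeric character references. The result is identical because a reference names an ISO10646 code point, not a byte value in the transport encoding. The helper names `resolve_numeric_refs` and `decode_document` are hypothetical. The codec names (`iso2022_jp`, `utf-8`) are Python's, and only the fully delimited form `&#NNN;` is handled.

```python
import re

# Only the fully delimited decimal form "&#NNN;" is handled in this sketch.
_NUMREF = re.compile(r"&#([0-9]+);")

def resolve_numeric_refs(text: str) -> str:
    """Replace each decimal numeric character reference with the character
    whose ISO10646 (Unicode) code point has that number."""
    return _NUMREF.sub(lambda m: chr(int(m.group(1))), text)

def decode_document(raw: bytes, encoding: str) -> str:
    """Decode bytes with the document's encoding, then resolve references.
    The encoding is involved only in the first step."""
    return resolve_numeric_refs(raw.decode(encoding))

# "&#26085;" refers to U+65E5; the second character, U+672C, is written directly.
markup = "&#26085;\u672c"
via_jis = decode_document(markup.encode("iso2022_jp"), "iso2022_jp")
via_utf8 = decode_document(markup.encode("utf-8"), "utf-8")
print(via_jis == via_utf8)  # True
```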

(2)

There are a number of syntactic idioms that are not supported or are supported inconsistently in some historical user agent implementations. These idioms are called out in notes like this throughout this specification.

HTML documents should not contain these idioms, at least until such time as support for them is widely deployed.

(3)

To support non-western writing systems, HTML user agents should support the Unicode-1-1-UTF-8 and Unicode-1-1-UCS-2 encodings, and as much of the ISO10646 character repertoire as possible.
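
For comparison, the sketch below shows the bytes produced for the same three characters by UTF-8 and by UCS-2. Two assumptions apply. Python has no codec named "UCS-2", so UCS-2 is approximated here as big-endian UTF-16 restricted to the Basic Multilingual Plane. The byte order is also a choice made for this illustration. The function name `encode_ucs2` is hypothetical.

```python
def encode_ucs2(text: str) -> bytes:
    """Encode text as UCS-2, approximated as big-endian UTF-16 limited to
    the Basic Multilingual Plane (so no surrogate pairs are ever produced)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot be encoded in UCS-2")
    return text.encode("utf-16-be")

sample = "A\u00e9\u65e5"  # LATIN CAPITAL A, e-acute, CJK ideograph U+65E5
print(sample.encode("utf-8").hex())  # 41c3a9e697a5 (1, 2 and 3 bytes)
print(encode_ucs2(sample).hex())     # 004100e965e5 (2 bytes each)
```

UTF-8 keeps ASCII characters as single bytes. UCS-2 uses a fixed two bytes per character, which is why it cannot represent code points beyond U+FFFF.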

(4)

In the interest of robustness and extensibility, there are a number of widely deployed conventions for handling non-conforming documents. See section Undeclared Markup Error Handling for details.

(5)

There are SGML mechanisms, CDATA and RCDATA, to allow most `<', `>', and `&' characters to be entered without the use of entity references. Because these features tend to be used and implemented inconsistently, and because they conflict with techniques for reducing HTML to 7-bit ASCII for transport, they are not used in this version of the HTML DTD.
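
Without CDATA or RCDATA, every markup-significant character in text is written as a reference, and references are themselves plain ASCII. The sketch below shows one way to reduce text to 7-bit ASCII under that rule. It uses the `&lt;`, `&gt;` and `&amp;` entity references and decimal numeric character references for everything else. The function name `to_7bit_html` is hypothetical, and Python's `xmlcharrefreplace` error handler is used to produce the `&#NNN;` references.

```python
def to_7bit_html(text: str) -> str:
    """Escape markup-significant characters with entity references, then
    replace each non-ASCII character with a decimal numeric character
    reference, so the result contains only 7-bit ASCII."""
    escaped = (text.replace("&", "&amp;")   # must run first
                   .replace("<", "&lt;")
                   .replace(">", "&gt;"))
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_7bit_html("x < y & caf\u00e9"))  # x &lt; y &amp; caf&#233;
```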

(6)

The SGML declaration for HTML specifies SHORTTAG YES, which means that there are other valid syntaxes for tags, such as NET tags, `<EM/.../'; empty start tags, `<>'; and empty end-tags, `</>'. Until support for these idioms is widely deployed, their use is strongly discouraged.
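
An authoring tool might flag these SHORTTAG forms before publishing. The following is a rough sketch of such a check, not a conforming SGML parser. The regular expressions are simplifications that ignore comments and attribute values. They assume a tag name starts with a letter followed by letters, digits, periods or hyphens. The function name `find_shorttag_idioms` is hypothetical.

```python
import re

# Simplified patterns for the discouraged SHORTTAG idioms named above.
_DISCOURAGED = {
    "empty start tag": re.compile(r"<>"),
    "empty end tag": re.compile(r"</>"),
    # A start tag whose name is immediately followed by "/", e.g. <EM/
    "NET-enabling start tag": re.compile(r"<[A-Za-z][A-Za-z0-9.-]*/"),
}

def find_shorttag_idioms(html: str) -> list[tuple[str, int]]:
    """Return (idiom name, character offset) pairs, ordered by offset."""
    hits = []
    for name, pattern in _DISCOURAGED.items():
        hits.extend((name, m.start()) for m in pattern.finditer(html))
    return sorted(hits, key=lambda hit: hit[1])

print(find_shorttag_idioms("<EM/short/ and <>x</>"))
```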

(7)

Some historical implementations consider any occurrence of the `>' character to signal the end of a tag. For compatibility with such implementations, when `>' appears in an attribute value, it should be represented with a numeric character reference, such as in: `<IMG SRC="eq1.jpg" alt="a&#62;b">'.
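
To show why the numeric character reference helps, the sketch below compares a historical-style scanner that stops at the first `>` with one that skips quoted attribute values. Both function names are hypothetical. The quote-aware version is simplified: it treats any quote character as the start of a literal, and in SGML that happens only after `=`.

```python
def naive_tag_end(html: str, start: int) -> int:
    """Historical behaviour: the first '>' after the '<' ends the tag."""
    return html.index(">", start)

def quote_aware_tag_end(html: str, start: int) -> int:
    """Skip '>' characters that occur inside quoted attribute values."""
    quote = None
    for i in range(start, len(html)):
        ch = html[i]
        if quote:
            if ch == quote:
                quote = None       # closing delimiter of the literal
        elif ch in "\"'":
            quote = ch             # opening delimiter of a literal
        elif ch == ">":
            return i
    raise ValueError("unterminated tag")

raw = '<IMG SRC="eq1.jpg" alt="a>b">'
safe = '<IMG SRC="eq1.jpg" alt="a&#62;b">'
for doc in (raw, safe):
    # Does the historical scanner see the whole tag?
    print(naive_tag_end(doc, 0) == quote_aware_tag_end(doc, 0))
```

With the literal `>` the naive scanner cuts the tag short at `alt="a>`. With `&#62;` both scanners agree.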

(8)

Some historical implementations allow any character except space or `>' in a name token. Attribute values must be quoted only if they don't satisfy the syntax for a name token.
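
The quoting rule can be applied mechanically. The sketch below assumes that a name token is a sequence of letters, digits, periods and hyphens. It leaves such values unquoted and wraps all others in double quotes. In quoted values it escapes `&`, `"` and `>` with numeric character references, which also covers the historical `>` problem noted earlier. The function name `format_attribute` is hypothetical.

```python
import re

# Assumed name token syntax: one or more letters, digits, periods, or hyphens.
_NAME_TOKEN = re.compile(r"[A-Za-z0-9.-]+")

def format_attribute(name: str, value: str) -> str:
    """Render NAME=value, quoting the value only if it is not a name token."""
    if _NAME_TOKEN.fullmatch(value):
        return f"{name}={value}"
    escaped = (value.replace("&", "&#38;")   # must run first
                    .replace('"', "&#34;")   # the delimiting quote
                    .replace(">", "&#62;"))  # for historical parsers
    return f'{name}="{escaped}"'

print(format_attribute("SRC", "eq1.jpg"))   # SRC=eq1.jpg
print(format_attribute("ALT", "a>b"))       # ALT="a&#62;b"
```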

(9)

Some historical implementations only understand the minimized syntax.

(10)

Some historical HTML implementations incorrectly consider any `>' character to be the termination of a comment.
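
A generator of HTML can avoid this problem by refusing comment text that contains `>`. The sketch below also rejects `--`, because in SGML that sequence closes the comment itself. The function name `make_comment` is hypothetical. The surrounding spaces keep text that begins or ends with a single hyphen away from the `--` delimiters.

```python
def make_comment(text: str) -> str:
    """Wrap text in a comment declaration that historical user agents
    should also parse correctly."""
    if "--" in text:
        raise ValueError("'--' would end the SGML comment early")
    if ">" in text:
        raise ValueError("'>' may end the comment in historical user agents")
    return f"<!-- {text} -->"

print(make_comment("navigation starts here"))  # <!-- navigation starts here -->
```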

(11)

If the body of a text/html body part does not begin with a document type declaration, an HTML user agent should infer the above document type declaration.
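
A user agent might implement this inference as a check on the start of the body. The sketch below assumes that the inferred declaration is the HTML 2.0 one with public identifier `-//IETF//DTD HTML 2.0//EN`. It also tolerates leading whitespace before `<!DOCTYPE`, which is a simplification. The name `with_inferred_doctype` is hypothetical.

```python
import re

# Assumed default: the HTML 2.0 document type declaration.
DEFAULT_DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">'

_DOCTYPE_START = re.compile(r"\s*<!DOCTYPE\b", re.IGNORECASE)

def with_inferred_doctype(body: str) -> str:
    """Return the body unchanged if it begins with a document type
    declaration; otherwise prepend the default declaration."""
    if _DOCTYPE_START.match(body):
        return body
    return DEFAULT_DOCTYPE + "\n" + body

print(with_inferred_doctype("<TITLE>Example</TITLE>"))
```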

(12)

The start and end tags for HTML, Head, and Body elements are omissible; however, this is not recommended since the head/body structure allows an implementation to determine certain properties of a document, such as the title, without parsing the entire document.
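
The benefit of explicit head/body structure can be shown with a title extractor that stops at the end of the head rather than parsing the whole document. The sketch below uses regular expressions as a stand-in for a real SGML parser, so it handles only the simple case of a bare `<TITLE>` tag. The function name `quick_title` is hypothetical.

```python
import re
from typing import Optional

_TITLE = re.compile(r"<TITLE>(.*?)</TITLE>", re.IGNORECASE | re.DOTALL)
_HEAD_END = re.compile(r"</HEAD\s*>|<BODY\b", re.IGNORECASE)

def quick_title(html: str) -> Optional[str]:
    """Look for the title only in the part of the document before the
    end of the head (or the start of the body, if no </HEAD> is found)."""
    end = _HEAD_END.search(html)
    head = html[: end.start()] if end else html
    match = _TITLE.search(head)
    return match.group(1).strip() if match else None

doc = "<HTML><HEAD><TITLE>Introduction to HTML Elements</TITLE></HEAD><BODY>...</BODY></HTML>"
print(quick_title(doc))  # Introduction to HTML Elements
```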

(13)

The length of a title is not limited; however, long titles may be truncated in some applications. To minimize this possibility, titles should be fewer than 64 characters. Also keep in mind that a short title, such as "Introduction", may be meaningless out of context. An example of a meaningful title might be "Introduction to HTML Elements."

(14)

Use of the non-breaking space and soft hyphen indicator characters is discouraged because support for them is not widely deployed.

(15)

Some historical documents contain P tags in PRE elements. User agents are encouraged to treat this as a line break. A P tag followed by a newline character should produce only one line break, not a line break plus a blank line.
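
The recommended treatment amounts to a single substitution over the PRE content. In the sketch below, each `<P>` tag, together with at most one newline that immediately follows it, becomes exactly one line break. The function name `normalize_pre` is hypothetical. The sketch assumes newlines have already been normalized to `\n`.

```python
import re

# A <P> start tag in any case, plus at most one newline right after it.
_P_IN_PRE = re.compile(r"<P>\n?", re.IGNORECASE)

def normalize_pre(pre_text: str) -> str:
    """Treat each <P> tag inside PRE content as a single line break, so a
    <P> followed by a newline does not also produce a blank line."""
    return _P_IN_PRE.sub("\n", pre_text)

print(repr(normalize_pre("line one<P>\nline two")))  # 'line one\nline two'
```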

(16)

References to the "beginning of a new line" do not imply that the renderer is forbidden from using a constant left indent for rendering preformatted text. The left indent may be constrained by the width required.

(17)

Within a Preformatted Text element, the constraint that the rendering must be on a fixed horizontal character pitch may limit or prevent the ability of the HTML interpreter to faithfully render character formatting elements.

(18)

The names are not guaranteed to be unique keys, nor are the names of form elements required to be distinct. The values encode the user's input to the corresponding interactive elements. Fields with null values may be omitted from the returned list of name/value pairs, whereas those with non-null values should be included (even if the value was not altered by the user). In particular, unselected radio buttons and checkboxes should be excluded from the contents list.
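
The sketch below applies these rules to a list of form controls and serializes the resulting name/value pairs as `application/x-www-form-urlencoded` data. The `Control` class and `form_data_set` function are hypothetical illustrations, and `None` stands in for a "null" value. Python's `urllib.parse.urlencode` keeps duplicate names and encodes spaces as `+`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urlencode

@dataclass
class Control:
    # Hypothetical model of a form control, for illustration only.
    name: str
    value: Optional[str]   # None models a "null" value
    kind: str = "text"     # e.g. "text", "checkbox", "radio"
    checked: bool = False

def form_data_set(controls: List[Control]) -> List[Tuple[str, str]]:
    pairs = []
    for c in controls:
        if c.kind in ("checkbox", "radio") and not c.checked:
            continue  # unselected radio buttons and checkboxes are excluded
        if c.value is None:
            continue  # fields with null values may be omitted
        pairs.append((c.name, c.value))  # names may repeat; order is kept
    return pairs

controls = [
    Control("topping", "cheese", "checkbox", checked=True),
    Control("topping", "olives", "checkbox", checked=False),
    Control("topping", "ham", "checkbox", checked=True),
    Control("note", "extra crispy"),
    Control("coupon", None),
]
print(urlencode(form_data_set(controls)))
# topping=cheese&topping=ham&note=extra+crispy
```

A list of pairs is used instead of a dictionary because names are not unique keys.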

(19)

In a future version of the HTML specification, the IMAGE functionality may be folded into an enhanced SUBMIT field.

(20)

In the initial design for forms, multi-line text fields were supported by the Input element with TYPE=TEXT. Unfortunately, this causes problems for fields with long text values. SGML's default (Reference Quantity Set) limits the length of attribute literals to only 240 characters. The HTML 2.0 SGML declaration increases the limit to 1024 characters.
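
The arithmetic behind this limit can be shown with a small check. The sketch below treats the literal length limit as a plain character count, which is a simplification of how SGML measures attribute value literals. The names `fits_attribute_literal`, `SGML_DEFAULT_LITLEN` and `HTML2_LITLEN` are hypothetical.

```python
SGML_DEFAULT_LITLEN = 240  # limit in SGML's Reference Quantity Set
HTML2_LITLEN = 1024        # limit set by the HTML 2.0 SGML declaration

def fits_attribute_literal(value: str, litlen: int = HTML2_LITLEN) -> bool:
    """Rough check: does the value fit within an attribute literal limit?"""
    return len(value) <= litlen

long_text = "x" * 500
print(fits_attribute_literal(long_text, SGML_DEFAULT_LITLEN))  # False
print(fits_attribute_literal(long_text))                       # True
```

A 500-character value exceeds the default limit but fits under the raised one, and any value longer than 1024 characters still does not fit.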