OMITTAG: Excerpts from ISO 8879

...everything I have been able to find in the SGML Handbook relevant to tag omission.

(%%% Missing: OMITTAG parameter in SGML declaration)

The omitted tag minimization parameter

Note that the omitted tag minimization parameter of an <!ELEMENT ...> declaration -- the pair of dashes or Os following the element type name -- has no effect on tag inference. It only determines whether a parser must report a markup error once an omitted tag has been inferred. The actual rules for inferring tags are specified in 7.3.1.

The grammar for <!ELEMENT ...> declarations is:

[116] element declaration =
        ( mdo ("<!"),
          "ELEMENT",
          ps+,
          element type,
          ( ps+,
             omitted tag minimization )?,
          ps+,
          ( declared content
          | content model),
          ps*,
          mdc (">") )

[11.2, p. 405]

and the definition of the omitted tag minimization parameter is:

[122] omitted tag minimization =
	start-tag minimization,
	ps+,
	end-tag minimization
[123] start-tag minimization =
	"O" |
	minus
[124] end-tag minimization =
	"O" |
	minus
where

O
means that omission of the tag under the conditions specified in 7.3.1 is not a markup error.
minus
means that omission of the tag under the conditions specified in 7.3.1 is a markup error.
[11.2.2, p. 408]
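As an illustration of productions [116] and [122]-[124], here is a minimal Python sketch that pulls the minimization parameter out of a declaration. It is not a conforming parser. The function name parse_minimization is hypothetical, ps is simplified to plain whitespace (real parameter separators may also contain comments and parameter entity references), only a single generic identifier is handled rather than a name group, and accepting a lowercase "o" is an assumption about the concrete syntax in use.

```python
import re

# Simplified shape of production [116]; see the assumptions stated above.
_ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(?P<gi>[A-Za-z][A-Za-z0-9.-]*)"
    r"(?:\s+(?P<start>[Oo-])\s+(?P<end>[Oo-]))?"   # [122], optional
    r"\s+(?P<content>.+?)\s*>\s*$",
    re.DOTALL,
)


def parse_minimization(decl):
    """Return (gi, (start_omissible, end_omissible)), or (gi, None)
    when the declaration carries no minimization parameter."""
    m = _ELEMENT_DECL.match(decl.strip())
    if m is None:
        raise ValueError("not a recognizable element declaration: %r" % decl)
    if m.group("start") is None:
        return m.group("gi"), None
    # "O" means omission is not a markup error; minus means it is.
    start_ok = m.group("start").upper() == "O"
    end_ok = m.group("end").upper() == "O"
    return m.group("gi"), (start_ok, end_ok)


gi, flags = parse_minimization("<!ELEMENT LI - O (#PCDATA)>")
```

The returned flags say only whether an omission is reportable as an error; they play no part in deciding which tag is inferred.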

General

The rules governing start- and end-tag omissibility are stated in section 7.3.1, ``Omitted Tag Minimization'':

A tag can be omitted only as provided in this sub-sub-clause, and only if the omission would not create an ambiguity [1], and if ``OMITTAG YES'' is specified on the SGML declaration.
NOTE -- A document type definition may consider a technically valid omission to be a markup error (see 11.2.2). [7.3.1, p. 308]

End-tag omission

The rules for end-tag omission are fairly straightforward:

The end-tag can be omitted for an element that is followed either

by the end of an SGML document entity or SGML subdocument entity;
by the end-tag of another open element; or
by an element or SGML character that is not allowed in its content
NOTE -- An element that is not allowed because it is an exclusion exception has the same effect as one that is not allowed because no token appears for it in the model group.
[7.3.1.2, pp. 310-311]
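The three cases in 7.3.1.2 can be sketched as a small Python predicate. This is an illustration under stated assumptions, not a parser: the function name, the (kind, name) representation of whatever follows the element, and the allowed_in_content callback are all hypothetical. The callback is supplied by the caller, so the sketch takes no position on how ``not allowed in its content'' should be read, and it does not model the ``OMITTAG YES'' or ambiguity conditions of 7.3.1.

```python
def end_tag_omissible(current, other_open, follower, allowed_in_content):
    """Sketch of 7.3.1.2.

    current            -- generic identifier of the element being closed
    other_open         -- generic identifiers of the other open elements
    follower           -- (kind, name) pair describing what comes next:
                          ("entity-end", ""), ("end-tag", gi),
                          ("start-tag", gi) or ("char", "")
    allowed_in_content -- callback(current, follower) -> bool
    """
    kind, name = follower
    if kind == "entity-end":
        # End of the SGML document entity or subdocument entity.
        return True
    if kind == "end-tag":
        # An end-tag naming the current element type simply closes it,
        # so only the end-tag of another open element counts here.
        return name != current and name in other_open
    if kind in ("start-tag", "char"):
        # An element or SGML character not allowed in the content.
        return not allowed_in_content(current, follower)
    return False
```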

Q: Does this mean not allowed anywhere in its content, or just not allowed at that point?

Start-tag omission

The rules for start-tag omission depend on the notion of contextually required elements, and are considerably more complicated:

The start-tag can be omitted if the element is a contextually required element and any other elements that could occur are contextually optional elements, except if:

the element type has a required attribute or declared content; or
the content of the instance of the element is empty.

It is ambiguous to omit the start-tag of an element whose content begins with a short reference string whose mapping is changed by the element's associated short reference map.
An omitted start-tag is treated as having an empty attribute specification list. [7.3.1.1, p. 310]

The definition of ``contextually required element'' and the subsidiary definitions on which it depends are:

4.61 contextually required element:
An element that is not a contextually optional element and

whose generic identifier is the document type name; or
whose currently applicable content token is a contextually required token.

4.62 contextually required token:
A content token that

is the only one in its model group; or
is in a seq group
	that
		is itself a contextually required token; or
		contains a token which has been satisfied;
	and
	all preceding tokens of which
		have been satisfied; or
		are contextually optional.

4.59 contextually optional element:
An element

that can occur only because it is an inclusion; or
whose content token in the currently applicable model group is a contextually optional token.

4.60 contextually optional token:
A content token that

is an inherently optional token; or
has a plus occurrence indicator and has been satisfied; or
is in a model group that is itself a contextually optional token, no tokens of which have been satisfied.

4.159 inherently optional token:
A content token that:

has an opt or rep occurrence indicator; or
is an or group, one of whose tokens is inherently optional; or
is an and or seq group, all of whose tokens are inherently optional.

4.274 satisfied token:
A content token whose corresponding content has occurred.
[pp. 163-164]

%%% Missing: #PCDATA is inherently optional

Found it: 11.2.4.2, p. 414: ``The "#PCDATA" content token is regarded as having an occurrence indicator of rep.''
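Definition 4.159, combined with the #PCDATA rule from 11.2.4.2, amounts to a short recursive predicate. The Python sketch below is only an illustration: the Token class and its field names are hypothetical, and content models are assumed to have been parsed into this tree form already.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    kind: str                  # "elem", "pcdata", "seq", "or" or "and"
    name: str = ""             # generic identifier, for "elem" tokens
    occ: str = ""              # "", "?" (opt), "*" (rep) or "+" (plus)
    children: List["Token"] = field(default_factory=list)


def inherently_optional(tok):
    """Sketch of 4.159."""
    if tok.kind == "pcdata":
        return True            # 11.2.4.2: regarded as having rep
    if tok.occ in ("?", "*"):
        return True            # opt or rep occurrence indicator
    if tok.kind == "or":
        return any(inherently_optional(c) for c in tok.children)
    if tok.kind in ("and", "seq"):
        return all(inherently_optional(c) for c in tok.children)
    return False               # a bare element token, or one with plus


a = Token("elem", "A")
b_opt = Token("elem", "B", "?")
or_group = Token("or", children=[a, b_opt])     # (A | B?)
seq_group = Token("seq", children=[a, b_opt])   # (A, B?)
```

With these definitions inherently_optional(or_group) holds and inherently_optional(seq_group) does not, since A is not optional in the seq group.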

%%% 4.61 clause 1: should be "is the document element". There does not seem to be a prohibition against elements with the same GI as the document type name appearing as children of other elements. Defn. 4.99 "Document element" has the same mistake.

%%% See also 11.2.4, p. 412.8: ``An equivalent for a non-recommended content model'' -- pernicious mixed content -- ``can normally be obtained by replacing "#PCDATA" with the GI of an element whose content is "#PCDATA" and both of whose tags can be omitted''. Note that in most cases this does not, in fact, work.

Also relevant is the following note:

NOTE -- An element could be neither contextually required nor contextually optional; for example, an element whose currently applicable content token is in an or group that has no inherently optional tokens. [4.61, p. 164]

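It is interesting to note that a content token can be both contextually required and contextually optional: consider the primitive token A in ((A)|B?). By 4.62 clause 1, it is contextually required, since it is the only token in the group (A). By 4.159 clause 2 the enclosing or group is inherently optional, hence contextually optional by 4.60 clause 1; applying 4.60 clause 3 first to (A) and then to A, no tokens having been satisfied, makes A contextually optional as well. (A is not a contextually required element, however, so its start-tag may not be omitted.)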
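The classification of A in ((A)|B?) can be checked mechanically. The following Python sketch transcribes definitions 4.159, 4.60 and 4.62 as literally as I can manage; it is one reading of those definitions, not an authoritative one. The Token class and its field names are hypothetical, the satisfied flags are set by hand rather than by a parser, a group counts as containing a satisfied token only if one of its direct members is flagged, and the outermost model group (which is not in any model group) is simply treated as not contextually required. The sketch reaches the contextually optional result by way of 4.60 clause 3 applied to the nested groups.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Token:
    kind: str                  # "elem", "pcdata", "seq", "or" or "and"
    name: str = ""
    occ: str = ""              # "", "?", "*" or "+"
    children: List["Token"] = field(default_factory=list)
    satisfied: bool = False
    parent: Optional["Token"] = field(default=None, repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def inherently_optional(t):                      # 4.159
    if t.kind == "pcdata" or t.occ in ("?", "*"):
        return True
    if t.kind == "or":
        return any(inherently_optional(c) for c in t.children)
    if t.kind in ("and", "seq"):
        return all(inherently_optional(c) for c in t.children)
    return False


def contextually_optional(t):                    # 4.60
    if inherently_optional(t):                   # clause 1
        return True
    if t.occ == "+" and t.satisfied:             # clause 2
        return True
    g = t.parent                                 # clause 3
    return (g is not None and contextually_optional(g)
            and not any(c.satisfied for c in g.children))


def contextually_required(t):                    # 4.62
    g = t.parent
    if g is None:
        return False       # assumption: outermost group not classified
    if len(g.children) == 1:                     # clause 1
        return True
    if g.kind != "seq":                          # clause 2 needs a seq group
        return False
    before = g.children[:g.children.index(t)]
    group_ok = (contextually_required(g)
                or any(c.satisfied for c in g.children))
    return group_ok and all(
        p.satisfied or contextually_optional(p) for p in before)


a = Token("elem", "A")
inner = Token("seq", children=[a])               # (A), a seq group per 11.2.4.1
b = Token("elem", "B", "?")
model = Token("or", children=[inner, b])         # ((A)|B?)

print(contextually_required(a), contextually_optional(a))  # True True
```

Once A has occurred (a.satisfied = True), the same functions no longer report A as contextually optional, because its group then contains a satisfied token.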

A few other relevant excerpts

A model group with a single content token is regarded as a seq group. [11.2.4.1 ``Connector'', p. 413]

The rationale states that this rule ``is necessary in order to apply the definitions of optional and required elements and tokens that are used to determine whether start-tag omission is possible'' [p. 413]. However, 4.62 ``contextually required token'' already includes a special case for a content token that is the only one in its model group, so I'm not sure I understand the rationale.

A validating parser is not required to report the occurrence of an omitted tag that causes an ambiguity:

validating SGML parser: A conforming SGML parser that can find and report a reportable markup error if (and only if) one exists.
reportable markup error: A failure of a document to conform with this International Standard, other than [...]

...
e) an otherwise allowable omission of a tag that creates an ambiguity;
...
[9.3, ``Conforming System'', pp. 215-216]

I have absolutely no idea what this means or why this clause was included in the Standard.

Content tokens

Paraphrasing section [11.2.4 ``Content Model'', pp. 409-414] (and ignoring DATATAG), recall that a content token is one of:

#PCDATA
an element token (i.e., a generic identifier)
a seq group
an or group
an and group

and may have one of the following occurrence indicators:

opt (?)
rep (*)
plus (+)
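To make the shape of a content token concrete, here is a small recursive-descent sketch in Python that turns a content model into a nested tuple (kind, payload, occurrence). It is a simplification resting on several assumptions: the function names are hypothetical, DATATAG groups and exceptions are ignored, ps is treated as plain whitespace, names are limited to ASCII letters, digits, "." and "-", and a single-token group is labelled "seq" in line with 11.2.4.1.

```python
import re

_LEX = re.compile(r"\s*(#PCDATA|[A-Za-z][A-Za-z0-9.-]*|[(),|&?*+])")
_CONNECTORS = {",": "seq", "|": "or", "&": "and"}


def _tokenize(text):
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        m = _LEX.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        out.append(m.group(1))
        pos = m.end()
    return out + [""]          # "" marks the end of input


def _parse_token(toks, i):
    tok = toks[i]
    if tok == "#PCDATA":
        return ("pcdata", "#PCDATA", ""), i + 1
    if tok == "(":
        members, connector = [], None
        i += 1
        while True:
            member, i = _parse_token(toks, i)
            members.append(member)
            sep = toks[i]
            i += 1
            if sep == ")":
                break
            if sep not in _CONNECTORS or connector not in (None, sep):
                raise ValueError("bad connector %r" % sep)
            connector = sep
        kind, payload = _CONNECTORS.get(connector, "seq"), members
    elif tok[:1].isalpha():
        kind, payload = "elem", tok
        i += 1
    else:
        raise ValueError("unexpected token %r" % tok)
    occ = toks[i] if toks[i] in ("?", "*", "+") else ""
    return (kind, payload, occ), i + (1 if occ else 0)


def parse_model(text):
    toks = _tokenize(text)
    tree, i = _parse_token(toks, 0)
    if toks[i] != "":
        raise ValueError("trailing input %r" % toks[i])
    return tree


tree = parse_model("((A)|B?)")
```

Mixing connectors within one group, as in (A, B | C), is rejected with a ValueError, since each model group uses a single kind of connector.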

Notes

[1] Regarding the rule ``only if the omission would not create an ambiguity'': the only description I can find in the Standard of how such an ambiguity could arise is in 7.3.1.1 regarding short-reference maps.

Other possible causes of ambiguity are:

The start-tag of an element is omitted, and the element begins with a subelement (or character data) that is also legal at the point where the element itself appears.
The end-tag of an element is omitted, and the element is followed by a sibling element that would also be legal at the end of the earlier element's content.

The Standard does not explicitly mention these cases, as far as I can determine.

Joe English / joe@art.com
