Message-id: <8o99eg$f7m$1@dragon.flightlab.com>
References:
From: Joe English
Date: 26 Aug 2000 13:31:12 -0700
Subject: Re: XSLT question: 'unflattening' an XHMTL document
Newsgroups: comp.text.xml
Organization: Advanced Rotorcraft Technology, Inc.
Lines: 124

Julien Quint wrote:
>
>I am trying to transform an XHTML document in order to give it more structure.
>The source document uses <h1>, <h2> and <h3> for sections, subsections and
>subsubsections, but the document is flat: all headings are on the same axis,
>as are paragraphs, lists, etc. The target document should group sections
>and sub sections under <div> tags, so that
>
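> <h1>Section</h1>
> <h2>Sub-section</h2>
> <p>A paragraph</p>
> <h1>Another Section</h1>
> <p>With a paragraph</p>
>
>should become something like
>
> <div>
> <h1>Section</h1>
> <div>
> <h2>Sub-section</h2>
> <p>A paragraph</p>
> </div>
> </div>
> <div>
> <h1>Another Section</h1>
> <p>With a paragraph</p>
> </div>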

Here's how I would do it:

The source document type looks something like:

    [DTD declarations not preserved]

(where "P" stands in for all the block-level elements).

Now imagine an intermediate document type that looks like:

    [DTD declarations not preserved]

Now imagine trying to validate the input document against
the *target* DTD, but include a simple error-recovery strategy:

When you see a start-tag <X>, check the following:

1) If <X> is legal at the current position in the document,
   then keep going;

2) If <X> is not legal at the current position, but there is
   an element 'Y' such that <Y> would be legal at this point
   and <X> can appear at the beginning of a <Y>, then insert
   a <Y> start-tag;

3) Otherwise, insert an end-tag for the current element
   and go back to step (1).

Also,

4) when you see an end-tag </X> and the current element is 'Y',
   try inserting a </Y> end-tag.

So for example:

> <h1>Section</h1>

Here <h1> isn't legal, but we can insert a start-tag by rule (2):

    <...>
      <h1>Section</h1>

> <h2>Sub-section</h2>

Again we use rule (2) to infer a start-tag.

    <...>
      <h1>Section</h1>
      <...>
        <h2>Sub-section</h2>
        <p>A paragraph</p>

> <h1>Another Section</h1>

Here rule (3) applies, so we insert an end-tag.  Then rule (3)
applies again, so we insert another.  Finally rule (2) applies,
and we can insert a start-tag and continue with:

    <...>
      <h1>Section</h1>
      <...>
        <h2>Sub-section</h2>
        <p>A paragraph</p>
      </...>
    </...>
    <...>
      <h1>Another Section</h1>
      <p>With a paragraph</p>

When the end-tag of the enclosing element arrives, we infer an
end-tag by rule (4), and then that end-tag is legal.

As a last step, transform the intermediate document to your
final DTD by changing ... into just plain <div> elements.

The whole process can be done with two SAX filters (the first
to infer the extra structure, the second to rename the elements).
Note that you don't need to write a full-blown validator either;
you can get away with a much simpler set of contextual conditions.

--Joe English
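
The two SAX filters described above might be sketched in Python
roughly as follows.  This is only an illustration under assumptions
that are not in the post: the intermediate element names div1, div2
and div3, the class names InferSections and RenameSections, the
helper unflatten and the <body> wrapper in the sample input are all
hypothetical.  In place of a validator it uses a simple depth check
(the "much simpler set of contextual conditions"), and it does not
treat skipped heading levels specially: an h3 directly after an h1
just opens a div3 inside the div1.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical intermediate element names, one per heading level.
SECTION_FOR = {"h1": "div1", "h2": "div2", "h3": "div3"}
LEVEL = {"div1": 1, "div2": 2, "div3": 3}


class InferSections(XMLFilterBase):
    """First filter: infer section start- and end-tags around headings."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._depth = 0  # number of real (non-inferred) elements open
        self._open = []  # inferred sections: (name, depth of their container)

    def _end_sections(self, min_level):
        # Close inferred sections of this container, innermost first,
        # while their level is at least min_level.
        while (self._open and self._open[-1][1] == self._depth
               and LEVEL[self._open[-1][0]] >= min_level):
            super().endElement(self._open.pop()[0])

    def startElement(self, name, attrs):
        section = SECTION_FOR.get(name)
        if section is not None:
            # Rule (3): end sections that cannot contain the new one.
            self._end_sections(LEVEL[section])
            # Rule (2): a heading may only begin a section, so open one.
            super().startElement(section, AttributesImpl({}))
            self._open.append((section, self._depth))
        self._depth += 1
        super().startElement(name, attrs)

    def endElement(self, name):
        # Rule (4): this only closes anything when the element that is
        # ending is the container in which the sections were opened.
        self._end_sections(1)
        self._depth -= 1
        super().endElement(name)


class RenameSections(XMLFilterBase):
    """Second filter: rename the intermediate elements to plain div."""

    def startElement(self, name, attrs):
        super().startElement("div" if name in LEVEL else name, attrs)

    def endElement(self, name):
        super().endElement("div" if name in LEVEL else name)


def unflatten(xml_bytes):
    """Run both filters over xml_bytes and return the serialized result."""
    out = io.StringIO()
    chain = RenameSections(InferSections(xml.sax.make_parser()))
    chain.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    chain.parse(io.BytesIO(xml_bytes))
    return out.getvalue()


# The example from the thread, wrapped in an assumed <body> element.
SOURCE = (b"<body><h1>Section</h1><h2>Sub-section</h2><p>A paragraph</p>"
          b"<h1>Another Section</h1><p>With a paragraph</p></body>")

if __name__ == "__main__":
    print(unflatten(SOURCE))
```

On the sample input this should nest the sub-section's div inside
the first section's div and give "Another Section" a div of its own.
Keeping the renaming in a separate filter mirrors the two-step
description above; folding it into the first filter would also work.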