Data structures

In HXML, XML documents are represented as a Tree of XMLNodes:


type Name = String
type AttList = [(Name,String)]

data XMLNode =
      RTNode
    | ELNode Name AttList
    | TXNode String
    | PINode Name String
    | CXNode String
    | ENNode Name

data Tree a = Tree a [Tree a]
type XML = Tree XMLNode

parseXML :: String -> XML
showXML  :: XML -> String

Name is a type synonym for String; it is used for element and attribute names. The AttList type denotes an attribute value list, represented in HXML as a list of (name, value) pairs.
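
As an illustration, the sketch below builds a small document tree by hand and looks up an attribute. It repeats the declarations above locally so that it compiles without the HXML library. The deriving clauses, the exampleDoc value and the attribute helper are assumptions made for this example and are not part of HXML.

```haskell
-- Local copies of the declarations shown above, so the sketch compiles
-- without the HXML library. The deriving clauses are an assumption.
type Name    = String
type AttList = [(Name, String)]

data XMLNode
    = RTNode
    | ELNode Name AttList
    | TXNode String
    | PINode Name String
    | CXNode String
    | ENNode Name
    deriving (Eq, Show)

data Tree a = Tree a [Tree a]
    deriving (Eq, Show)

type XML = Tree XMLNode

-- Hand-built tree for: <greeting lang="en">Hello, <b>world</b></greeting>
-- (hypothetical example value, wrapped in an RTNode container)
exampleDoc :: XML
exampleDoc =
    Tree RTNode
        [ Tree (ELNode "greeting" [("lang", "en")])
            [ Tree (TXNode "Hello, ") []
            , Tree (ELNode "b" []) [Tree (TXNode "world") []]
            ]
        ]

-- Hypothetical helper: look up an attribute value on an element node.
-- Returns Nothing for a missing attribute or a non-element node.
attribute :: Name -> XMLNode -> Maybe String
attribute name (ELNode _ atts) = lookup name atts
attribute _    _               = Nothing
```

Because AttList is an ordinary association list, the Prelude function lookup is enough to read attribute values; no special accessor is needed.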

XML Information Set

XMLNode is an algebraic data type roughly corresponding to XML Information Set items (<URL:http://www.w3.org/TR/xml-infoset>), as follows:

RTNode
The document information item. This node type is used as a container node to hold the result of parseDocument. In addition to the root element, it may contain processing instructions and other miscellaneous items that appear outside the root element.
ELNode GI AttList
An element information item. GI is the generic identifier or element type name. AttList is the list of attribute information items belonging to this element, represented as a list of name-value pairs [(String, String)].
TXNode String
A consecutive sequence of character information items. Note that a TXNode does not necessarily represent a maximal contiguous sequence of characters -- the parser may split text up into multiple TXNodes.
PINode Name String
A processing instruction information item. The first field is the processing instruction's target property; the second is its content property. The notation property of processing instructions is not included; I believe this is an error in the InfoSet spec.
CXNode String
A comment information item.
ENNode Name
An unparsed entity information item. It is generated by a named entity reference (&foo;).
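
The node kinds listed above can be handled with ordinary pattern matching. The following sketch, which again repeats the type declarations locally, gives a one-line description of each node kind and collects the text of a subtree. The describe and textContent functions are hypothetical names introduced for this example; they are not HXML functions.

```haskell
-- Local copies of the HXML declarations, so the sketch is self-contained.
type Name    = String
type AttList = [(Name, String)]

data XMLNode
    = RTNode
    | ELNode Name AttList
    | TXNode String
    | PINode Name String
    | CXNode String
    | ENNode Name

data Tree a = Tree a [Tree a]

type XML = Tree XMLNode

-- Hypothetical helper: a short description of each node kind.
describe :: XMLNode -> String
describe RTNode            = "document"
describe (ELNode gi atts)  = "element <" ++ gi ++ "> with "
                             ++ show (length atts) ++ " attribute(s)"
describe (TXNode s)        = "text " ++ show s
describe (PINode target _) = "processing instruction for " ++ target
describe (CXNode _)        = "comment"
describe (ENNode name)     = "entity reference &" ++ name ++ ";"

-- Hypothetical helper: concatenate every TXNode in a subtree, in
-- document order. Comments, processing instructions and entity
-- references contribute nothing.
textContent :: XML -> String
textContent (Tree (TXNode s) _) = s
textContent (Tree _ kids)       = concatMap textContent kids
```

Since the parser may split text into several TXNodes, code that needs the full character data of an element should concatenate adjacent text nodes, as textContent does, rather than assume a single TXNode child.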

Unsupported InfoSet properties

The following properties defined in the InfoSet are not directly supported by HXML:

Other infoset properties such as parent and children are derived from the Tree context. The prefix and local name properties of named nodes (elements, attributes, etc.) may be extracted from the name with the splitName function. Most of the additional unparsed entity information item properties are available only with great effort.
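
A rough sketch of how such properties can be derived from the tree is given below. The children and withParents functions are hypothetical helpers written for this example. The signature of HXML's splitName is not shown in this section, so splitQName is only an assumed stand-in that splits a name at its first colon; the real function may behave differently.

```haskell
-- Local copy of the Tree declaration, so the sketch is self-contained.
data Tree a = Tree a [Tree a]

-- The [children] property is simply the list of subtrees.
children :: Tree a -> [Tree a]
children (Tree _ kids) = kids

-- The [parent] property is not stored in a node; it is recovered by
-- carrying the parent's label down during a traversal. The result pairs
-- each label with its parent's label (Nothing for the root), in
-- document (pre-order) order.
withParents :: Tree a -> [(Maybe a, a)]
withParents = go Nothing
  where
    go parent (Tree x kids) = (parent, x) : concatMap (go (Just x)) kids

-- Assumed stand-in for splitName (not the HXML implementation):
-- split a name into (prefix, local name) at the first colon.
-- A name without a colon gets an empty prefix.
splitQName :: String -> (String, String)
splitQName name =
    case break (== ':') name of
        (prefix, ':' : local) -> (prefix, local)
        _                     -> ("", name)
```

Keeping parent links out of the node type keeps the Tree definition simple and immutable; the cost is that a function needing the parent must receive it from the traversal, as withParents does.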