Data structures
In HXML, XML documents are represented as a Tree
of XMLNodes:
    type Name = String
    type AttList = [(Name,String)]

    data XMLNode =
          RTNode
        | ELNode Name AttList
        | TXNode String
        | PINode Name String
        | CXNode String
        | ENNode Name

    data Tree a = Tree a [Tree a]
    type XML = Tree XMLNode

    parseXML :: String -> XML
    showXML :: XML -> String
Name is a type synonym for String; it is used
for element and attribute names.
The AttList type denotes an attribute value list,
represented in HXML as a list of (name, value) pairs.
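
As an illustration of these types, the following sketch builds a small document tree by hand and looks up an attribute value. The type declarations are copied from the listing above; the deriving clauses, the exampleDoc value and the attribute helper are assumptions made for this example and are not part of HXML.

```haskell
-- Types copied from the declarations above. The deriving clauses are an
-- assumption, added so that values can be printed and compared.
type Name = String
type AttList = [(Name, String)]

data XMLNode
    = RTNode
    | ELNode Name AttList
    | TXNode String
    | PINode Name String
    | CXNode String
    | ENNode Name
    deriving (Eq, Show)

data Tree a = Tree a [Tree a]
    deriving (Eq, Show)

type XML = Tree XMLNode

-- Hand-built tree for:
--   <?xml-stylesheet href="s.css"?><p class="intro">Hello</p>
exampleDoc :: XML
exampleDoc =
    Tree RTNode
        [ Tree (PINode "xml-stylesheet" "href=\"s.css\"") []
        , Tree (ELNode "p" [("class", "intro")])
            [ Tree (TXNode "Hello") [] ]
        ]

-- Hypothetical helper: look up an attribute value on an element node.
-- Since AttList is an association list, the Prelude's lookup suffices.
attribute :: Name -> XMLNode -> Maybe String
attribute name (ELNode _ atts) = lookup name atts
attribute _ _ = Nothing
```

Because AttList is an ordinary association list, no special accessor is needed; non-element nodes simply yield Nothing.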
XML Information Set
XMLNode is an algebraic data type roughly corresponding
to XML Information Set items (<URL:http://www.w3.org/TR/xml-infoset>),
as follows:
- RTNode
    The document information item.
    This node type is used as a container node to hold the
    result of parseDocument. In addition to the
    root element, it may contain processing instructions and other
    miscellaneous junk.
- ELNode GI AttList
    An element information item.
    GI is the generic identifier or element type name.
    AttList is the list of attribute information items
    belonging to this element, represented as a list of name-value pairs
    [(String, String)].
- TXNode String
    A consecutive sequence of character information items.
    Note that a TXNode does not necessarily represent
    a maximal contiguous sequence of characters --
    the parser may split text up into multiple TXNodes.
- PINode Name String
    A processing instruction information item.
    The first field is the processing instruction target property,
    the second is its content property.
    The notation processing instruction property is not
    included; I believe this is an error in the InfoSet spec.
- CXNode String
    A comment information item.
- ENNode Name
    An unparsed entity information item.
    Generated by a named entity reference (&foo;).
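
Since text may arrive split across several TXNodes, code that inspects character data usually has to cope with adjacent text siblings. The sketch below shows one way to do so; mergeText and textContent are hypothetical helpers written for this example, not HXML functions, and the deriving clauses are likewise an assumption.

```haskell
-- Types as declared in the "Data structures" listing; deriving added here
-- (an assumption) so results can be compared and printed.
type Name = String
type AttList = [(Name, String)]

data XMLNode
    = RTNode
    | ELNode Name AttList
    | TXNode String
    | PINode Name String
    | CXNode String
    | ENNode Name
    deriving (Eq, Show)

data Tree a = Tree a [Tree a]
    deriving (Eq, Show)

type XML = Tree XMLNode

-- Hypothetical helper: merge runs of adjacent TXNode siblings into a single
-- TXNode, at every level of the tree.
mergeText :: XML -> XML
mergeText (Tree node kids) = Tree node (joinRuns (map mergeText kids))
  where
    joinRuns (Tree (TXNode a) [] : Tree (TXNode b) [] : rest) =
        joinRuns (Tree (TXNode (a ++ b)) [] : rest)
    joinRuns (t : rest) = t : joinRuns rest
    joinRuns [] = []

-- Hypothetical helper: concatenate all character data below a node,
-- regardless of how the text was split up.
textContent :: XML -> String
textContent (Tree (TXNode s) _) = s
textContent (Tree _ kids) = concatMap textContent kids
```

If only the character data matters, textContent avoids rebuilding the tree; mergeText is the better fit when later code pattern-matches on individual children.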
Unsupported InfoSet properties
The following properties defined in the InfoSet are not
directly supported by HXML:
- base URI
- character encoding scheme
- standalone
- namespace name
- namespace attributes
- in-scope namespaces
- attribute type
- references
- ... probably others
Other infoset properties such as parent and children
are derived from the Tree context.
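
To make "derived from the Tree context" concrete: children can be read directly off a Tree value, while the parent has to be tracked during traversal, since nodes hold no back-pointers. The following is a minimal sketch; children and withParents are hypothetical names chosen for this example, not HXML functions.

```haskell
-- Tree as declared in the "Data structures" listing.
data Tree a = Tree a [Tree a]

-- The children property is simply the second field of the Tree constructor.
children :: Tree a -> [Tree a]
children (Tree _ kids) = kids

-- The parent property is not stored anywhere, so it is recovered by carrying
-- the enclosing node down during a traversal. The result pairs each node
-- label with its parent's label (Nothing for the root), in document order.
withParents :: Tree a -> [(Maybe a, a)]
withParents = go Nothing
  where
    go parent (Tree x kids) = (parent, x) : concatMap (go (Just x)) kids
```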
The prefix and local name properties of named nodes
(elements, attributes, etc.) may be extracted from the name
with the splitName function.
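
The signature of splitName is not shown in this section, so the sketch below does not use it. Instead, a hypothetical stand-in named splitQName illustrates the kind of split involved, assuming names of the form prefix:local and an empty prefix when no colon is present.

```haskell
-- Hypothetical stand-in for illustration only; this is NOT HXML's splitName,
-- whose actual type and behaviour may differ.
-- Splits a name at its first colon into (prefix, local name).
splitQName :: String -> (String, String)
splitQName name =
    case break (== ':') name of
        (prefix, ':' : local) -> (prefix, local)
        (local, _)            -> ("", local)
```

For example, splitQName "xsl:template" evaluates to ("xsl", "template"), and splitQName "p" to ("", "p").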
Most of the additional unparsed entity information item properties
are available only with great effort.