
2 Nov 2001

Announcing HXML, an(other) XML parser for Haskell.

This implementation should have better space behaviour than HaXml's parser,
and may be used as a drop-in replacement in existing HaXml programs.

HXML is available at:

    <URL: http://www.flightlab.com/~joe/hxml >

The current version is 0.1, and is slightly post-alpha quality.

Please contact Joe English <jenglish@flightlab.com> with
any questions, comments, or bug reports.

* * * Known bugs -- READ THIS FIRST

	+ The XML declaration is ignored.
	+ Unicode support is only as good as that provided
	  by the Haskell system (i.e., not very, except for HBC).
	+ Marked sections are not implemented yet.
	+ Cannot parse <!DOCTYPE ...> declarations yet.
	+ Does not support XML Namespaces.
	+ Does not do any well-formedness or validity checks.


* * * USAGE

The main entry point to the parser is:

    parseInstance :: String -> [XMLEvent]

where XMLEvent is defined in XMLParse.hs as:

    type Name = String
    data XMLEvent =
	  StartEvent Name [(Name,String)]	-- start-tag (gi, attributes)
	| EndEvent   Name			-- end-tag (gi)
	| TextEvent  String			-- character data (text)
	... other event types ...

This provides a "SAX-like" interface (or the FP equivalent thereof;
instead of invoking callback methods on a handler object, the
parser returns a (lazy) list of events).
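
As a rough illustration of working with the event list, here is a
minimal sketch.  It does not import HXML: the XMLEvent type is a local
stand-in with only the three constructors shown above, sampleEvents is
a hand-written list standing in for the result of parseInstance, and
elementNames and textContent are hypothetical helpers, not part of HXML.

```haskell
module Main where

-- Local stand-ins for HXML's types; only the three constructors shown
-- in this README are reproduced (the real type has more).
type Name = String

data XMLEvent
    = StartEvent Name [(Name,String)]   -- start-tag (gi, attributes)
    | EndEvent   Name                   -- end-tag (gi)
    | TextEvent  String                 -- character data (text)
    deriving (Eq, Show)

-- Hand-written stream, assumed to resemble what parseInstance returns
-- for  <doc><p class="x">Hello</p><p>world</p></doc>
sampleEvents :: [XMLEvent]
sampleEvents =
    [ StartEvent "doc" []
    , StartEvent "p" [("class","x")]
    , TextEvent "Hello"
    , EndEvent "p"
    , StartEvent "p" []
    , TextEvent "world"
    , EndEvent "p"
    , EndEvent "doc"
    ]

-- Names of all elements, in document order (hypothetical helper).
elementNames :: [XMLEvent] -> [Name]
elementNames evs = [ gi | StartEvent gi _ <- evs ]

-- All character data, concatenated (hypothetical helper).
textContent :: [XMLEvent] -> String
textContent evs = concat [ s | TextEvent s <- evs ]

main :: IO ()
main = do
    print (elementNames sampleEvents)
    putStrLn (textContent sampleEvents)
    -- Only a prefix of the (here infinite) stream is ever demanded.
    print (take 3 (elementNames (cycle sampleEvents)))
```

Because the consumers are ordinary list functions, they only demand as
much of the stream as they need, which is what makes the lazy list a
workable substitute for SAX callbacks.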

parseInstance is most usefully composed with

	buildTree :: [XMLEvent] -> Tree XMLNode

where 'Tree a' is the data type of Rose trees

	data Tree a = Tree a [Tree a]

and XMLNode is defined in XML.hs as:

    data XMLNode =
	  RTNode			-- root node
	| ELNode Name [(Name,String)]	-- element node: GI, attributes
	| TXNode String			-- text node
	... other node types, see XML.hs ...

This provides a "grove-like" interface, and is probably the most
useful form to work with in Haskell programs.

(There are advantages and disadvantages to the 'Tree XMLNode'
representation.  The main advantage is uniform access to all nodes;
the main disadvantage is that it doesn't capture many XML constraints
that we'd like to enforce in a strongly typed language.  For example,
with this representation it's possible for a TXNode to have children,
which is nonsensical in XML.)
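
The trade-off can be illustrated with a small sketch.  The types below
are local copies restricted to the constructors listed above;
sampleTree is hand-built rather than produced by buildTree, and
flatten, treeText, elementsNamed and textNodesAreLeaves are
hypothetical helpers written for this example (they are not claimed to
exist in Tree.hs or XML.hs).

```haskell
module Main where

data Tree a = Tree a [Tree a]
    deriving (Eq, Show)

type Name = String

-- Local stand-in with only the node types shown in this README.
data XMLNode
    = RTNode                        -- root node
    | ELNode Name [(Name,String)]   -- element node: GI, attributes
    | TXNode String                 -- text node
    deriving (Eq, Show)

-- Hand-built tree for <doc><p class="x">Hello</p><p>world</p></doc>
sampleTree :: Tree XMLNode
sampleTree =
    Tree RTNode
      [ Tree (ELNode "doc" [])
          [ Tree (ELNode "p" [("class","x")]) [Tree (TXNode "Hello") []]
          , Tree (ELNode "p" [])              [Tree (TXNode "world") []]
          ]
      ]

-- Uniform access: one preorder traversal reaches every kind of node.
flatten :: Tree a -> [a]
flatten (Tree x ts) = x : concatMap flatten ts

treeText :: Tree XMLNode -> String
treeText t = concat [ s | TXNode s <- flatten t ]

-- All subtrees rooted at an element with the given GI.
elementsNamed :: Name -> Tree XMLNode -> [Tree XMLNode]
elementsNamed gi t@(Tree n ts) =
    [ t | isMatch n ] ++ concatMap (elementsNamed gi) ts
  where
    isMatch (ELNode g _) = g == gi
    isMatch _            = False

-- A constraint the type does not enforce, checked at run time instead:
-- text nodes must not have children.
textNodesAreLeaves :: Tree XMLNode -> Bool
textNodesAreLeaves (Tree (TXNode _) ts) = null ts
textNodesAreLeaves (Tree _ ts)          = all textNodesAreLeaves ts

main :: IO ()
main = do
    putStrLn (treeText sampleTree)
    print (length (elementsNamed "p" sampleTree))
    print (textNodesAreLeaves sampleTree)
    -- Type-correct but nonsensical: a text node with a child.
    print (textNodesAreLeaves (Tree (TXNode "x") [Tree (TXNode "y") []]))
```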

There's also a HaXml adapter:

	toContent :: Tree XMLNode -> Content

where 'Content' is HaXml's representation of an XML instance.

Another interface, under construction, is a full "XPath-like" view,
which allows navigation up and down the tree.  See Tree.hs for details.
Also in progress is a high-level combinator library, inspired
by HaXml and XSLT.  (These last two are not quite ready yet.)


* * * PUBLIC MODULES

module HXML:
    Package module which simply includes and reexports the
    HXML public modules listed below.

module XML:
    Defines data types for XML document instances.

module DTD:
    Defines data types for XML (and SGML) DTDs.

module XMLParse:
    The parser:
	parseInstance	:: String -> [XMLEvent]
	parseDTD 	:: String -> DTD

    parseDTD recognizes a superset of XML DTD syntax;
    it can (almost) be used to parse SGML DTDs as well.
    (@@@ except that parameter entities don't work yet)

module Tree:
    Definition of Rose trees,

    	data Tree a = Tree a [Tree a],

    plus a few miscellaneous utilities.

module TreeBuild:
    Converters between the stream and tree representations:

	buildTree     :: [XMLEvent] -> Tree XMLNode
	serializeTree :: Tree XMLNode -> [XMLEvent]

    These are not inverses -- serializeTree . buildTree is not the
    identity -- but they should satisfy

	buildTree . serializeTree . buildTree == buildTree
	serializeTree . buildTree . serializeTree == serializeTree
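
    The two laws above can be tried out on toy converters.  The sketch
    below is not HXML's TreeBuild: build and serialize are simplified
    stand-ins written for this example, over local copies of the types
    with only the constructors shown in this README.  In this toy
    version the round trip fails to be the identity because build
    silently closes unbalanced elements; the reasons in the real
    library may differ.

```haskell
module Main where

type Name = String

data XMLEvent
    = StartEvent Name [(Name,String)]
    | EndEvent   Name
    | TextEvent  String
    deriving (Eq, Show)

data Tree a = Tree a [Tree a]
    deriving (Eq, Show)

data XMLNode
    = RTNode
    | ELNode Name [(Name,String)]
    | TXNode String
    deriving (Eq, Show)

-- Read sibling nodes up to the next end-tag (or end of input) and
-- return them together with the unconsumed events.
forest :: [XMLEvent] -> ([Tree XMLNode], [XMLEvent])
forest []                 = ([], [])
forest (EndEvent _ : evs) = ([], evs)
forest (StartEvent gi atts : evs) =
    let (kids, evs')  = forest evs
        (sibs, evs'') = forest evs'
    in  (Tree (ELNode gi atts) kids : sibs, evs'')
forest (TextEvent s : evs) =
    let (sibs, evs') = forest evs
    in  (Tree (TXNode s) [] : sibs, evs')

-- Toy stand-in for buildTree.
build :: [XMLEvent] -> Tree XMLNode
build evs = Tree RTNode (fst (forest evs))

-- Toy stand-in for serializeTree.
serialize :: Tree XMLNode -> [XMLEvent]
serialize (Tree RTNode ts) = concatMap serialize ts
serialize (Tree (ELNode gi atts) ts) =
    StartEvent gi atts : concatMap serialize ts ++ [EndEvent gi]
serialize (Tree (TXNode s) _) = [TextEvent s]   -- children are dropped

-- A stream with a missing end-tag.
unbalanced :: [XMLEvent]
unbalanced = [StartEvent "a" [], TextEvent "x"]

-- A tree with no root node and a text node that has a child.
oddTree :: Tree XMLNode
oddTree = Tree (ELNode "a" []) [Tree (TXNode "x") [Tree (TXNode "y") []]]

main :: IO ()
main = do
    -- Not inverses: the round trip adds the missing end-tag.
    print (serialize (build unbalanced) == unbalanced)
    -- The two laws, checked on the samples.
    print (build (serialize (build unbalanced)) == build unbalanced)
    print (serialize (build (serialize oddTree)) == serialize oddTree)
```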

module PrintXML:
    Converts XML (tree view or stream view) back into a
    sequence of Characters:

	printTree :: Tree XMLNode -> String
	printXML  :: [XMLEvent] -> String
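
    For orientation, a serializer over the event stream might look
    roughly like the sketch below.  This is an assumption-laden toy, not
    the PrintXML source: printEvents and escape are hypothetical names,
    only the three event types shown in this README are handled, and
    the escaping policy is a guess at a reasonable minimum.

```haskell
module Main where

type Name = String

-- Local stand-in with only the constructors shown in this README.
data XMLEvent
    = StartEvent Name [(Name,String)]
    | EndEvent   Name
    | TextEvent  String
    deriving (Eq, Show)

-- Replace markup-significant characters by predefined entity references.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Toy counterpart of printXML: one string fragment per event.
printEvents :: [XMLEvent] -> String
printEvents = concatMap pr
  where
    pr (StartEvent gi atts) = "<" ++ gi ++ concatMap att atts ++ ">"
    pr (EndEvent gi)        = "</" ++ gi ++ ">"
    pr (TextEvent s)        = escape s
    att (n, v)              = " " ++ n ++ "=\"" ++ escape v ++ "\""

main :: IO ()
main = putStrLn (printEvents
    [ StartEvent "p" [("class","a&b")]
    , TextEvent "1 < 2"
    , EndEvent "p"
    ])
```

    Since the output is built with concatMap over the input list, it can
    be produced incrementally as events arrive.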


* * * HaXml interface:

module HaXmlAdapter:
    Contains adapters for converting to and from HaXml's
    internal representation:

	    toContent 		:: Tree XMLNode -> Content
	    fromContent 	:: Content -> Tree XMLNode

	    buildContent	:: [XMLEvent] -> Content
	    serializeContent	:: Content -> [XMLEvent]

    Also a few utilities:

	    parseContent 	:: String -> Content
	    parseContent	=  buildContent . parseInstance
	    printContent	:: Content -> String
	    printContent	=  printXML . serializeContent

    and modified versions of HaXml's processXmlWith driver:

	    processXmlWith' 	:: (Content -> [Content]) -> IO ()

    which uses the HXML parser instead of the HaXml one, and

	    processXmlWith'' 	:: (Content -> [Content]) -> IO ()

    which in addition uses HXML's serializer.  These versions
    should have better space performance than processXmlWith
    (@@@ though there are still a few problems).

* * * INTERNAL MODULES:

module LLParsing:
    CPS-based parser combinator library used by XMLParse.
    Uses function application for sequencing a la Swierstra
    and Duponcheel; the user interface is similar to
    Swierstra & Duponcheel's UU_Parsing, but the implementation
    is much simpler (and less powerful).

    This is the only part of the code that requires
    an extension to Haskell 98 (rank-2 polymorphism),
    and that can be avoided simply by replacing
    'newtype Parser sym res  =  P ...'
    with 'newtype P p = P p' and omitting all other
    type declarations involving 'Parser'.  Haskell's type inference
    algorithm computes workable (though incomprehensible)
    types for all the combinators.
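
    To give a flavour of the technique, here is a minimal sketch of a
    CPS parser with applicative-style sequencing.  It is not the
    LLParsing source: the representation chosen for Parser, the Maybe
    answer type, and the names runParser, pSucceed, pSat, pMany, pMany1
    and the operators (.*.) and (.|.) are all assumptions made for this
    example.  It does show where the rank-2 type arises: the answer
    type r of the continuation is quantified inside the newtype.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

import Data.Char (isDigit)

-- Hypothetical representation: a parser takes a success continuation
-- and the remaining input; failure is Nothing.
newtype Parser sym res =
    P (forall r. (res -> [sym] -> Maybe r) -> [sym] -> Maybe r)

runParser :: Parser sym res -> [sym] -> Maybe (res, [sym])
runParser (P p) = p (\x rest -> Just (x, rest))

-- Succeed without consuming input.
pSucceed :: res -> Parser sym res
pSucceed x = P (\k -> k x)

-- Accept one symbol satisfying the predicate.
pSat :: (sym -> Bool) -> Parser sym sym
pSat ok = P (\k inp -> case inp of
                         (c:cs) | ok c -> k c cs
                         _             -> Nothing)

infixl 4 .*.
infixl 3 .|.

-- Sequencing by function application: parse a function, then its argument.
(.*.) :: Parser sym (a -> b) -> Parser sym a -> Parser sym b
P pf .*. P pa = P (\k -> pf (\f -> pa (\a -> k (f a))))

-- Choice: try the left parser (with the whole continuation) first.
(.|.) :: Parser sym a -> Parser sym a -> Parser sym a
P p .|. P q = P (\k inp -> case p k inp of
                             Nothing -> q k inp
                             answer  -> answer)

-- Zero or more / one or more repetitions (p must consume input).
pMany, pMany1 :: Parser sym a -> Parser sym [a]
pMany  p = (pSucceed (:) .*. p .*. pMany p) .|. pSucceed []
pMany1 p = pSucceed (:) .*. p .*. pMany p

-- Example grammar: an unsigned number, and a sum of two numbers.
number :: Parser Char Int
number = pSucceed read .*. pMany1 (pSat isDigit)

sumP :: Parser Char Int
sumP = pSucceed (\a _ b -> a + b) .*. number .*. pSat (== '+') .*. number

main :: IO ()
main = do
    print (runParser sumP "12+30")
    print (runParser number "x")
```

    This sketch backtracks freely in (.|.), so it makes no attempt at
    the efficiency or error reporting of a real combinator library.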

module XMLScanner:
    The lexical analyzer.  Parses a sequence of characters into a
    sequence of Delimiters.  In addition to the 'Delimiter' data
    type (q.v.), exports two functions:
	pcdataMode :: String -> [Delimiter]
    which is used to parse the document instance, and
	markupMode :: String -> [Delimiter]
    which is used to parse DTDs.

module AssocList:
    A quick and dirty finite map implementation; used by DTD.hs.

module Misc:
    Miscellaneous handy utilities which didn't fit anywhere else.

-- *EOF*
