Simple translations with Cost

Joe English
Last updated: Sun Jun 16 15:54:05 PDT 1996



1 Introduction

Cost is a powerful but somewhat complex system. The Simple module provides a simplified, high-level interface for developing translation specifications.

2 Getting started

A large number of SGML translation tasks involve nothing more than

The Simple module is designed to handle these types of translations. It makes a single pass through the document, inserting text and optionally calling a user-specified script at the beginning and end of each element. The translated document is written to standard output.

To load this module, put the command

    require Simple.tcl

at the beginning of the specification script. Next, define a translation specification as follows:

    specification translate {
        specification-rules...
    }

The specification-rules argument is a paired list matching queries with parameter lists. The queries select elements, and are typically of the form

    {element GI}
or
    {elements "GI GI..."}
where each GI is the generic identifier or element type name of the elements to select.

Any Cost query may be used, including complex rules like

    {element TITLE in SECTION withattval SECURITY RESTRICTED}
or simple ones like
    {el}
The latter query -- el -- matches all element nodes; it can be used to specify default parameters for elements which don't match any earlier query.
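As a minimal sketch of how a default rule fits into a specification, the fragment below pairs two element queries with an {el} rule at the end. The NOTE and SECTION element names and the inserted text are hypothetical; only the query forms and parameter names come from this document.

```tcl
require Simple.tcl

specification translate {
    {element NOTE in SECTION} {
        prefix "\nNote: "
        suffix "\n"
    }
    {element NOTE} {
        prefix "\n(note) "
        suffix "\n"
    }
    {el} {
        startAction {
            # Runs for any element whose earlier matching rules
            # did not supply a startAction of their own.
            puts stderr "start of [query gi]"
        }
    }
}
```

Because {el} matches every element node, it is placed last so that it only fills in parameters the earlier rules left unspecified.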

The parameter lists are also paired lists, matching parameters to values. The Simple module translation process uses the following parameters:

startAction
Tcl statements to execute at the beginning of the element
endAction
Tcl statements to execute at the end of the element
before
Text to insert before the element (before evaluating startAction)
prefix
Text to insert at the beginning of the element (after evaluating startAction)
suffix
Text to insert at the end of the element (before evaluating endAction)
after
Text to insert after the element (after evaluating endAction)
cdataFilter
A filter procedure for character data
sdataFilter
A filter procedure for system data (SDATA entity references).

Tcl variable, backslash, and command substitution are performed on the before, after, prefix, and suffix parameters. This takes place when the element is processed, not when the specification is defined. The values of these parameters are not passed through the cdataFilter command before being output.

NOTE -- Remember to "protect" all Tcl special characters by prefixing them with a backslash if they are to appear in the output. The special characters are: dollar signs $, square brackets [], and backslashes \. See the Tcl documentation on the subst command for more details.
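The effect of deferred substitution and of protecting special characters can be tried in plain Tcl, outside Cost. The sketch below assumes the module treats these parameters much as the standard subst command would; the variable names template and label are hypothetical.

```tcl
# Brace-quoted text: nothing is substituted when the value is defined.
set template {\n.SH [string toupper $label] \$1 \[x\] \\}

# The variable is set later, as it might be while a document is processed.
set label "name"

# subst performs backslash, command, and variable substitution now.
# The protected characters come through as a literal $, [, ] and \.
puts [subst $template]
```

This prints a newline followed by the line ".SH NAME $1 [x] \". Without the protecting backslashes, $1 and [x] would be treated as a variable reference and a command call.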

The cdataFilter parameter is the name of a filter procedure. This is a one-argument Tcl command. Cost passes each chunk of character data to this procedure, and outputs whatever the procedure returns. The default value of cdataFilter is the identity command, which simply returns its input:

proc identity {text} {return $text}

The sdataFilter parameter works just like cdataFilter, except that it is used for system data (the replacement text of SDATA entity references). The default sdataFilter is also identity.
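Since a filter is an ordinary one-argument Tcl procedure, it can be written and tried in tclsh before it is named in a specification. The following is an illustrative sketch only; the procedure name htmlEscape is hypothetical and not part of Cost.

```tcl
proc htmlEscape {text} {
    # Escape ampersands first so the entities added below are not re-escaped.
    # In a regsub replacement "&" stands for the matched text, so a literal
    # ampersand is written as "\&".
    regsub -all {&} $text {\&amp;} text
    regsub -all {<} $text {\&lt;} text
    regsub -all {>} $text {\&gt;} text
    return $text
}

puts [htmlEscape "if a < b && b > c"]
```

This prints "if a &lt; b &amp;&amp; b &gt; c". In a specification, such a procedure would be selected with a parameter pair like cdataFilter htmlEscape.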

The Simple module saves and restores the current cdataFilter and sdataFilter at each element node.

3 Example

The following specification translates a subset of HTML to nroff -man macros. (Well, actually it doesn't do anything useful; it's just to give an idea of the syntax.)

require Simple.tcl

specification translate {
	{element H1} {
		prefix 	"\n.SH "
		suffix 	"\n"
		cdataFilter	uppercase
	}
	{element H2} {
		prefix 	"\n.SS "
		suffix 	"\n"
	}
	{elements "H3 H4 H5 H6"} {
		prefix "\n.SS "
		suffix "\n"
		startAction {
		    # nroff -man only has two heading levels
		    puts stderr "Mapping [query gi] to second-level heading"
		}
	}
	{element DT} {
		prefix	"\n.IP \""
		suffix	"\"\n"
	}
	{element PRE} {
		prefix "\n.nf\n"
		suffix "\n.fi\n"
	}
	{elements "EM I"} {
		prefix "\\fI"
		suffix "\\fP"
	}
	{elements "STRONG B"} {
		prefix "\\fB"
		suffix "\\fP"
	}

	{element HEAD} {
		cdataFilter nullFilter
	}
	{element BODY} {
		cdataFilter nroffEscape
	}
}

proc nullFilter {text} {
    return ""
}

proc nroffEscape {text} {
    # change backslashes to '\e'
    regsub -all {\\} $text {\\e} output
    return $output
}

proc uppercase {text} {
    return [nroffEscape [string toupper $text]]
}

4 Notes

The specification order is important: queries are tested in the order specified, so more specific queries must appear before more general ones.

Parameters are evaluated independently of one another. For example,

specification translate {
    {element "TITLE"} {
	cdataFilter uppercase
    }
    {element TITLE in SECT in SECT in SECT} {
	prefix "<H3>"
	suffix "</H3>\n"
    }
    {element TITLE in SECT in SECT} {
	prefix "<H2>"
	suffix "</H2>\n"
    }
    {element TITLE in SECT} {
	prefix "<H1>"
	suffix "</H1>\n"
	startAction {
	    puts $tocfile [content]
	}
    }
}

The parameter cdataFilter uppercase applies to all TITLE elements, regardless of where they occur, and the startAction parameter applies to any TITLEs which are children of a SECT, even if an earlier matching rule specified a prefix or suffix.
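The lookup described above can be modelled in a few lines of plain Tcl. This is a hypothetical sketch, not the module's actual code: the procedure lookupParam is invented for illustration, and each query is replaced by a flag (1 or 0) saying whether it matched the current element.

```tcl
# Return the value of param from the first matching rule that defines it.
proc lookupParam {rules param default} {
    foreach {matched params} $rules {
        if {!$matched} { continue }
        foreach {name value} $params {
            if {[string compare $name $param] == 0} {
                return $value
            }
        }
    }
    return $default
}

# Flags for a TITLE whose parent SECT is itself inside a SECT:
# rules 1, 3 and 4 of the specification match, rule 2 does not.
set rules {
    1 {cdataFilter uppercase}
    0 {prefix "<H3>" suffix "</H3>\n"}
    1 {prefix "<H2>" suffix "</H2>\n"}
    1 {prefix "<H1>" suffix "</H1>\n" startAction {puts $tocfile [content]}}
}

puts [lookupParam $rules cdataFilter identity]
puts [lookupParam $rules prefix ""]
puts [lookupParam $rules startAction ""]
```

In this model the prefix comes from the third rule (<H2>), while cdataFilter and startAction are still picked up from the first and last rules, because each parameter is searched for separately.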

As its name implies, the Simple module is not very sophisticated, but it should be enough to get you started.