Newsgroups: comp.text.sgml
Subject: NOT the comp.text.sgml FAQ
Followup-To: poster
From: Joe English
Archive-Name: sgml/not-the-faq
Posting-Date: 1 April 2002
Posting-Frequency: sporadic

============================================================
Not the comp.text.sgml Frequently Asked Questions List
============================================================

Copyright (C) 1997, 1999, 2001, 2002 Joe English. All rights reserved.
There are lots of wrongs in this document, and those are all reserved
too. Author bears no responsibility for any other reservations you
might have.

Standard disclaimers apply. For external use only. If irritation, rash,
or swelling occurs, discontinue use immediately. Void where prohibited.

If you are acquiring this document on behalf of the U.S. Government,
the Government shall have only "Restricted Rights Regulated Regulations
Rights" (RRRRRs) as defined in Clause 31.415.269 (c) (2) (CCXXXVII) of
that really long document that nobody has ever seen or read and good
luck finding it (US-DOD-MILSPEC-217-RLDTNHESOR-AGLFI-YOG-SOTHOTH-1968),
pursuant to the following:

(1) Who are you?
(2) What do you want?
(3) Who do you serve, and who do you trust?
(4) Where do you want to go today?

Notwithstanding the foregoing, the author grants permission to anyone
who has actually read this far into the disclaimer and copyright notice
to do whatever they want with the damn thing.

============================================================
Part 1. Administrivia.
============================================================

Q. Is there a FAQ for this newsgroup?

A. Yes. This is not it.

Q. In the real FAQ, how come all the answers just point to ?

A. If you've ever tried to get more than four SGML experts to agree on
the answer to _any_ question, you'd understand. The only thing that
every expert agrees on is that Robin Cover's web site is the best
source of SGML information available anywhere.

Q. I saw this last year.
Can you tell me what's new in this year's edition so I don't have to
slog through the whole thing again?

A. No. Save a copy of this article so next year you can run 'diff' on
it.

============================================================
Part 2. SGML, HTML, and XML.
============================================================

Q: In what way is XML simpler than SGML?

A: Nearly everyone who has ever adopted an SGML-based system will agree
that the hardest part of the project is the initial set-up. This
process often requires highly-paid consultants, who spend weeks or even
months analyzing requirements. The principal end result of this process
is: a DTD. Since XML does not require DTDs, it is hoped that companies
who adopt XML instead of full SGML can skip this expensive and
time-consuming step.

Q. Why does XML require SYSTEM identifiers after all PUBLIC
identifiers?

A. Since there is as of yet no standardized resolution mechanism for
PUBLIC identifiers, XML requires authors to supply a URI for every
external entity so that it will always be possible for clients to
retrieve the entity. More information can be found on the WWW
consortium's web site at
www.cern.ch^H^H^H^H^H^Hmit.edu^H^H^H^H^H^H^Hw3.org, or at Robin Cover's
excellent XML/SGML web page at
www.sil.org^H^H^H^H^H^H^Hoasis-open.org/cover/.

Q. Even in declarations? I thought XML was designed to be usable
without DTDs.

A. A PUBLIC identifier on its own doesn't do anybody any good unless
they can reliably resolve the entity. You must provide a URI, so that
document consumers can retrieve the DTD if they so choose.

Q. I have a document that has a whole bunch of
"xmlns:foo='http://...'" attributes in it. What are all these URLs
supposed to point to?

A. Erm...

============================================================
Part 3. SGML, XML, and the Web.
============================================================

Q. I'm looking for the HTML DTDs used by the current versions of
Netscape Navigator and MSIE.
Where can I find them?

A. This has the editor stumped. I can't think of an answer that's
funnier than the original question.

Q. How do I include JavaScript inside an XML document?

A. Easy! Just write: "... ]]>

Note that this solution also works for Perl, Python, Tcl, REXX, Icon,
Ada, Basic, Beta, C, C++, Eiffel, Forth, Fortran, Haskell, Scheme, SML,
Pascal, Modula, PL/I, Prolog, REXX, Sather, Smalltalk, SNOBOL, RPG/III,
and COBOL. But not APL. Sorry.

Q. But that doesn't work!

A. What do you mean it doesn't work? There's your XML document, there's
your JavaScript, there's your JavaScript inside your XML document, just
like you asked.

Q. I'm trying to debug my CSS stylesheet with MSIE 5.0, but it won't
reload properly.

A. That's more of an HTML/Web question than an SGML question. Try
comp.infosystems.www.authoring.*.

Q. Does anyone know of a web page hosting service that will let me
upload .xml and .xsl files?

A. Try asking in comp.infosystems.www.authoring.*, down the hall,
second newsgroup on the right.

Q. How do I make a borderless centered blinking table in a frame?

A. Ask in the *Web* newsgroups, fercryinoutloud!

Q. Is there a list of all the new tags in Netsca--

A. NO! SHUT UP! GO AWAY! Sheesh!

Q. Does anyone know of a Java SAX/DOM interface for IE5 that can query
an ASP with a W3C SQL-OODBMS HTTP/DHTML backend via XSL-T or P3P using
a CSS URI? Or should I use SMIL over RDF/PICS and WAI instead? I'm
using the UML DTD 1.0 and want to save IDL as XMI or MOF. SGBD, XML-QL,
XML-NS 1.1 (namespace), and should I use DTD or RDF or DC/DC schema?
Can I parse BLOBs with IBM's XML4J? Does WIDL work with SSL? XML to
RTF, Servlet, applet, Cthulhu fthagn.

Sent by deja.com: Share what you know, learn what you don't!

A: Well! Let's move on to more SGML questions, shall we?

============================================================
Part 4. Miscellany.
============================================================

Q. How do I get the current element name in XSLT?

A.
That's element *type* name, dammit!

Q. I'm designing my first DTD. Should I use elements or attributes to
store data?

A. Of course. What else would you use?

Q. What is the usual way to choose between using attributes and
children?

[Update from David Carlisle:] In England, the normal method is to flip
a coin in the air and call heads it's attributes, tails it's children.
Unfortunately I'm not sure what to suggest in other countries where you
can't guarantee having the Queen's head on one side of every coin.

Q. How do I convert SGML to PDF?

A. It's easy! Just download XML2PDF.EXE from Omniscient Heuristics
Benevolent Software Inc.'s web site (http://www.ohbs.com/xml2pdf/),
unpack and install it, run it, and Presto! Beautifully typeset PDF
output from any file you throw at it!

...

Hoo-hah! Oh, that was a good one! The correct answer, as anyone will
tell you, is that you must use DSSSL. XML users must use XSL instead.

Q. Isn't "DSSSL" a really awful acronym?

A. Not if you are used to programming languages with mnemonic function
names like "caddr" and "cdddr".

Q. What is the philosophy of SGML?

A. "The philosophy of SGML" is a rhetorical device, most often invoked
when one participant in a discussion disagrees with something another
has proposed. Typical usage: "That goes totally against the philosophy
of SGML, which dictates that [... fill in the blank ...]".

Q. OK, so then what's the spirit of SGML?

A. This is not formally defined at present. A proposed amendment to ISO
8879 (below) seeks to correct this deficiency:

[Text of proposed changes:]

| 4.333 spirit of SGML: An alcoholic concoction produced by fermenting
| a conforming SGML document.
|
| 4.334 spirit of SGML: The name of a boat which will win the Whitbread
| Round-the-World yacht race in the year of the 100th anniversary of the
| adoption of SGML.
|
| 4.335 spirit of SGML: The ghost of the inventor of SGML who comes
| back to haunt you every time you write "fully-tagged" without a hyphen.
(Thanks to Steve Pepper for this one.)

Q. What's so great about ISO standardization?

A. It is often said that one of the advantages of SGML over some other,
proprietary, generic markup scheme is that "nobody owns the standard".
While this is not strictly true, the ISO's pricing policy certainly has
helped to keep the number of people who do own a copy of the Standard
at an absolute minimum.

[ Ed. note: I'm not exactly sure why this is seen as an advantage, it's
just something people say. ]

============================================================
Part 5. Terminology.
============================================================

Q. I've tried reading the (XML | SGML | XSL | XPATH | DSSSL | ...)
specification, but it doesn't make any sense! There's too much jargon!

A. Specification authors deliberately obfuscate the text of ISO and W3C
standards to ensure that normal people (e.g., Perl programmers) can't
use the technology without assistance from the so-called "experts" who
designed the specs. Fortunately, there is a handy translation table you
can use:

--------------------------------------------------
ISO/W3C terminology                Common name
--------------------------------------------------
attribute                          tag
attribute value                    tag
attribute value literal            tag
attribute value specification      tag
character reference                tag
comment                            tag
comment declaration                tag
declaration                        tag
document type declaration          tag
document type definition           tag
element                            tag
element type                       tag
element type name                  tag
entity                             tag
entity reference                   tag
general entity                     tag
generic identifier                 tag
literal                            tag
numeric character reference        tag
parameter entity                   tag
parameter literal                  tag
processing instruction             tag
tag                                command
--------------------------------------------------

With the help of this table, even Visual Basic programmers should have
no trouble deciphering ISO prose.

Q. What's a DTD?

A.
DTD is an acronym for Document Type Definition, which consists of a
formal part (specified in SGML) and an informal part (specified in
natural language). Many people forget about the informal part and use
"DTD" to refer only to the formal part (i.e., the stuff that usually
goes in a file named "something.dtd"). This is incorrect: the proper
term for this object is "the entity containing the external subset of
the formal part of the document type definition". (SGML has a tradition
of using the longest possible phrases to describe the most frequently
talked-about concepts; see also "declared-content-or-content-model".)

The term "DTD" is also often confused with the document type
declaration, mostly because they have the same initials.

Q. I don't want to have to learn a new syntax. Is it true that Schemas
are easier to understand than DTDs?

A. Absolutely! Just compare the following W3C XML Schema fragment:

|
|
|
|
|
|
|
|
|
|
|

with the corresponding DTD fragment:

|
|
|

All those weird exclamation points and pound signs in the DTD are
enough to make your head explode! The first version is obviously much
simpler.

Q. "Open" is to "close" as "start" is to:

(a) "stop"
(b) "end"
(c) "finish"

A. Most normal people will answer (a) or (c). If you answered (b),
chances are you've been doing too much SGML.

============================================================
Part 6. Esoterica.
============================================================

Q. Explain how the XML encoding declaration works and why it's needed.

A. Every XML document uses the Unicode character repertoire, but there
are a number of different ways in which Unicode may be physically
stored and transmitted. This is known as the "encoding", and the XML
declaration specifies which one is used by the document. For example,
if the first thing in the file is:
an XML parser will recognize that the document is stored in the
traditional ROT13 encoding.

Q. Why can't you specify character numbers in hexadecimal?

A.
SGML is designed to be readable by human beings as well as by computers, and humans tend to find decimal numbers easier to interpret than hex, as the following demonstrates:

we're zany to the max

JDAHA'I >=C;F;E;AO EE; F;KH IC;=?B;I

The first example is clearly much more readable than the second.

UPDATE: This situation has changed since the Standard was first
published. Thanks to the widespread availability of Unicode tables,
most people nowadays find the hexadecimal form to be quite legible. A
recent amendment adds this capability to SGML. (XML has always allowed
it, since it wasn't designed to be read by humans.)

Q. Why does nsgmls complain that '((a|b)*,b)' is ambiguous?

A. Because it's not clear whether 'a' stands for Anchor, Author, or
Address.

Q. OK, so how do I make it unambiguous?

A. This depends of course on the content model in question. Your best
bet is to post to comp.text.sgml, where you are likely to receive
several answers. Many will be wrong, so don't take any advice from the
newsgroup unless three or more respondents say the same thing.

Q. What's an RE?

A. RE is an acronym for Record End, which is sort of like a newline,
only different. Goldfarb's First Law of Text Processing states that:

"... if a text processing system has bugs, at least one of them will
have to do with the handling of input line endings."
[The Handbook, footnote p. 321]

The Record End concept was introduced to make sure that SGML parsers
don't violate Goldfarb's First Law.

Q. So what's an RS?

A. An RS is a fictitious character inserted by the entity manager and
later removed by the parser. Applications shouldn't ever have to worry
about RSs; their primary function is to make REs disappear in
mysterious places.

Q. What's the difference between a QUANTITY and a CAPACITY?

A. In the SGML declaration, a "quantity" is an arbitrary limit placed
on the size of individual parts of a document which must be increased
in order to use reasonable DTDs. A "capacity" on the other hand is an
arbitrary limit placed on the size of the document as a whole which
must be increased in order to process reasonably-sized documents.
Quantities and capacities are used to make sure that every document
includes an SGML declaration. Were it not for the ridiculously small
limits in the Reference Concrete Syntax, most SGML users would be able
to simply use the default SGML declaration instead of providing their
own, slightly modified version, in violation of section 6.2.

Q. What are inclusion exceptions and how do they work?

A. There are two primary schools of thought regarding inclusions. Some
feel that inclusion exceptions are a dangerous and badly designed
feature, and should never be used (exclusion exceptions, however, are
another matter). Others feel that inclusion exceptions are a useful and
necessary enhancement to SGML's formal model.

Q. What are exclusion exceptions and how do they work?

A. There are two primary schools of thought regarding exclusions. Some
feel that exclusion exceptions are a dangerous and badly designed
feature, and should never be used (inclusion exceptions, however, are
another matter). Others feel that exclusion exceptions are a useful and
necessary enhancement to SGML's formal model.

============================================================
Part 7. HyTime.
============================================================

Q. What's a grove?

A. I've heard that it's supposed to be an acronym for some thing or
another, but I don't buy it.

Q. What's the difference between a 'wand' and a 'baton' in the HyTime
scheduling and rendition module?

A. Sorry, it took me three months just to figure out 'pathloc'. I'm not
even going to try to answer that one.

Q. How many HyTime consultants does it take to screw in a lightbulb?

A. Just one. HyTime's powerful linking and location facilities make it
trivial to create a link expressing the abstract semantic of "screwing"
with the lightbulb and the socket as link-ends, and even if the
lightbulb and socket are not expressed as SGML objects you're free to
use whatever application-specific query notation you desire to locate
them.
The actual installation of the light bulb is, of course, left up to the
application.

============================================================
Part 8. DSSSL, CSS, XSL, and DHTML.
============================================================

Q. What are DSSSL, CSS, XSL and DHTML?

A. They are acronyms. See Robin Cover's extensive SGML/XML web site at
for more information.

============================================================
Part 9. Acknowledgments
============================================================

Thanks to Christian Wetzel, Steve Pepper, and David Carlisle for some
of the funnier answers.