Extremism, 2006-08-09
Extremism: Day 2, Wednesday, 2006-08-09
David J. Birnbaum, djbpitt+xml@pitt.edu
The second day of Extreme Markup 2006 was dominated, even if largely because of the serendipitous hard work of the late-breaking fairy, by two themes: alternative syntax and overlap.
That XSLT is written in XML means that we can author XSLT with intelligent XML editors, that we can validate XSLT with XML validating parsers, and that we can even use XSLT to generate more XSLT. Sounds pretty good, right? In fact, it sounds so good that it has become an article of faith: It is a Good Thing that XSLT is XML, and many of us repeat this to our colleagues or students or clients without ever reconsidering our underlying assumptions, and without thinking about the price tag attached to this Good Thing. Extreme attendees this year were fortunate to have those assumptions questioned in (very) late-breaking papers by two of our most distinguished industry veterans, Sam Wilmott and Lynne Price.
Sam is a professional designer of programming languages who spent many of his formative years as the primary architect of OmniMark, the SGML industry leader in transformation languages. Many of us are XML geeks who learn and use programming languages (including XSLT) as we need them, but who have little occasion to think of programming language design as itself a proper object of study. Sam’s expertise both in the design of programming languages in general and in transformation languages for structured text as practical tools allowed him to draw our attention to facets of XSLT design that many of us have long accepted without reflection. Those of us who use XSLT regularly have surely always understood, at least vaguely, that notation is important and that white space is both important and persnickety, but after listening to Sam we are better able to understand why these issues play the role that they do in XSLT.
Sam began his “Rethinking XSLT” with a thought experiment: What would a C program look like if it were coded in XML? C programs don’t identify themselves explicitly as C programs, the components (e.g., “if”) don’t have to proclaim themselves parts of the language (rather than data), and there is a minimum of markup overhead (e.g., the argument to “if” is necessarily a test and doesn’t need to announce the fact, as the test attribute of xsl:if must). As Mr. Rosen, my ninth-grade math teacher, used to say when he had reached the obvious point in a geometric proof and wasn’t about to waste his time with any more formal argumentation, “any cluck can see” that the C program is easier to read, minimally redundant, and involves less typing. C notation is advantageous to human beings; XSLT notation is advantageous to computer programs, but a lot of XSLT is written by people. (Sam didn’t mention the overhead of reserved words involved in C-like notation, but some might argue that this is unlikely to lead to serious problems in real applications.)
If we consider the implications of this experiment and think of XSLT first and foremost as a programming language, removing the XML artifacts from the non-XML part of the program (the only part that absolutely has to be XML is the literal result portion of template rules), we can derive a notation that feels natural and comfortable and that may offer advantages in legibility (development, maintenance, learning) over the current all-XML-all-the-time model. Where XSLT instructions must be intermixed with result elements, Sam’s RXSLT uses curly braces, a convention familiar from attribute value templates in XSLT (and, although Sam didn’t mention this, from XQuery). Sam further asserted that every markup language gets white space wrong: “It’s the emperor’s clothes of markup languages. If you come up with a new markup language, you’re going to get it wrong again.” The advantage of RXSLT over standard XSLT in this respect is less clear, but the more concise notation of RXSLT could make white space easier to see.
So is RXSLT really an improvement on XSLT? One XSLT guru argued that it isn’t difficult to learn to read standard XSLT notation, that the editor manages the end tags for you, and that the overhead of reserved words is too high a price. These arguments have merit, of course, but although Sam didn’t raise this analogy, most of us who write RelaxNG prefer the compact syntax for precisely the notational reasons that Sam advanced in favor of RXSLT. And this comparison raises an issue that did emerge in the discussion: if RXSLT is more convenient when humans (or, at least, some humans) need to write stylesheets by hand, while conventional XSLT notation has other advantages (e.g., it can be generated and processed by XSLT just like other XML), perhaps what is needed is the ability to transform one’s stylesheets from one syntactic representation to the other, much as one can do with the RelaxNG compact and verbose (XML) syntaxes.
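To make the notational contrast concrete, here is a minimal Python sketch of one direction of such a conversion. It reads a tiny subset of ordinary XSLT (match templates, xsl:if, xsl:value-of, xsl:apply-templates) with the standard library’s ElementTree and prints it in a brace-delimited, C-like notation. The compact notation is invented for this illustration and is not Sam’s RXSLT; the function name `to_compact` is hypothetical, and text nodes, attributes on literal result elements, and all other XSLT instructions are deliberately left out.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"


def to_compact(elem, depth=0):
    """Render a tiny XSLT subset in an invented brace notation (not RXSLT).

    Only match templates are expected; text nodes and attributes on literal
    result elements are ignored, and other XSLT instructions raise ValueError.
    """
    pad = "  " * depth
    if elem.tag == XSL + "value-of":
        return [f"{pad}value-of {elem.get('select')};"]
    if elem.tag == XSL + "apply-templates":
        select = elem.get("select")
        return [f"{pad}apply-templates {select};" if select else f"{pad}apply-templates;"]
    if elem.tag == XSL + "template":
        head, close = f"template match {elem.get('match')} {{", "}"
    elif elem.tag == XSL + "if":
        # No attribute name is needed: an "if" always takes a test.
        head, close = f"if ({elem.get('test')}) {{", "}"
    elif elem.tag.startswith(XSL):
        raise ValueError(f"unsupported instruction: {elem.tag}")
    else:
        # Literal result elements (assumed to be in no namespace) stay as markup.
        head, close = f"<{elem.tag}>", f"</{elem.tag}>"
    lines = [pad + head]
    for child in elem:
        lines.extend(to_compact(child, depth + 1))
    lines.append(pad + close)
    return lines


source = """<xsl:template xmlns:xsl="http://www.w3.org/1999/XSL/Transform" match="item">
  <xsl:if test="@price &gt; 10">
    <p><xsl:value-of select="@name"/></p>
  </xsl:if>
</xsl:template>"""

print("\n".join(to_compact(ET.fromstring(source))))
```

For the sample template this prints `template match item {`, then an `if (@price > 10) {` block with the `<p>` result element and `value-of @name;` nested inside it. Going back the other way would require a real parser for the compact notation, which is exactly the kind of round-trip tooling the discussion called for.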
A related topic arose in the late-breaking presentation by Lynne Price (one of the original developers of FrameMaker+SGML), entitled “A Non-XSLT DTD for Editing XSLT.” Lynne demonstrated the use of the FrameMaker environment to create a representation of XSLT that supported cut-and-paste editing with guaranteed control over validity, indentation, consistent use of names (with automatic global updating), and generation of reports (such as indices of templates and template calls) that facilitate review of coding logic. Lynne’s motivating insight was that we don’t need to employ the same tool both to edit XSLT and to transform documents with it: her system edits an alternative syntax that is optimized for editing, after which she uses XSLT to convert that alternative representation to real XSLT. Lynne’s system keeps track of and prints nesting levels (handy for nested chooses), color-codes parentheses (to keep track of nesting), and permits comments within comments, which standard XSLT cannot support because XML does not allow them. To be sure, Lynne’s system, unlike Sam’s, is always valid XML, but what unites the two approaches is an awareness that an optimal notation for authoring XSLT may differ from standard XSLT syntax, and that transformations between the two are possible and permit us to choose the appropriate syntax for each phase of our processing. (For an earlier examination of the use of different schemas at different stages in the life-cycle of an XML document [not XSLT, in this case], interested persons might wish to consult a paper from the first [2000] Extreme conference that discussed the advantages of developing a special schema for authoring and then converting to an alternative representation once authoring was complete.)
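Lynne’s reports were produced inside FrameMaker from her own authoring DTD. As a rough analogue that works on ordinary XSLT instead, the following Python sketch uses the standard library’s ElementTree to list each template together with the named templates it calls, and to flag calls to templates that are never defined. It is an illustration under those assumptions, not a reconstruction of Lynne’s system; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"


def template_index(stylesheet):
    """List each template with the sorted names of the templates it calls."""
    root = ET.fromstring(stylesheet)
    index = []
    for template in root.iter(XSL + "template"):
        # Named templates are listed by name, match templates by pattern.
        key = template.get("name") or "match=" + template.get("match", "?")
        calls = {c.get("name") for c in template.iter(XSL + "call-template")}
        index.append((key, sorted(calls)))
    return index


def undefined_calls(stylesheet):
    """Names used in xsl:call-template that no xsl:template defines."""
    root = ET.fromstring(stylesheet)
    defined = {t.get("name") for t in root.iter(XSL + "template") if t.get("name")}
    called = {c.get("name") for c in root.iter(XSL + "call-template")}
    return sorted(called - defined)


sheet = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:call-template name="header"/>
    <xsl:call-template name="footer"/>
  </xsl:template>
  <xsl:template name="header">
    <xsl:call-template name="logo"/>
  </xsl:template>
  <xsl:template name="logo"/>
</xsl:stylesheet>"""

for key, calls in template_index(sheet):
    print(key, "->", ", ".join(calls) or "(none)")
print("undefined:", undefined_calls(sheet))
```

Here the root template calls `footer`, which is never defined, so the second report lists it. Because the input is plain XSLT, the same checks could just as well run over stylesheets that were generated from an authoring representation.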
Two papers today addressed overlap, a theme that returns to every Extreme conference because nobody is yet satisfied with the available solutions. C. M. Sperberg-McQueen’s “Rabbit/Duck Grammars: A Validation Method for Overlapping Structures” reviewed some of the fundamental issues involved in encoding and processing multiple non-synchronous hierarchies expressed over one set of data. Michael’s model divides elements and tags into four classes (normal, milestone, transparent, and invisible) within a particular grammar. In the ensuing discussion, Wendell Piez pointed out that some of the problems Michael sought to resolve might alternatively be handled with Schematron, an option whose costs and benefits might merit further examination. Michael’s paper provided useful background for John Cowan’s late-breaking report on the current state of LMNL, which is beginning to move forward after four years of stasis; the developers have rethought layers, somewhat relaxed the rules for names, and are continuing to work on selecting appropriate notations. (The image to the right is taken from http://ist-socrates.berkeley.edu/~kihlstrm/JastrowDuck.htm.)
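The core difficulty can be shown with a small Python sketch, offered as a general illustration rather than as Michael’s Rabbit/Duck grammars or as LMNL. It assumes, hypothetically, that each hierarchy is stored as standoff character ranges over a shared text. Two ranges that share characters without one containing the other cannot both become nested XML elements, so one hierarchy is serialized as elements and the other falls back to empty milestone tags, one common workaround. The `Range` type, the function names, and the `-start`/`-end` milestone convention are invented for the example.

```python
from itertools import combinations
from typing import NamedTuple
from xml.sax.saxutils import escape


class Range(NamedTuple):
    hierarchy: str  # which hierarchy the range belongs to, e.g. "verse"
    label: str      # element name to use when serializing
    start: int      # inclusive character offset
    end: int        # exclusive character offset (ranges assumed non-empty)


def overlaps(a, b):
    """True if a and b share characters but neither contains the other."""
    share = a.start < b.end and b.start < a.end
    nested = (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end)
    return share and not nested


def conflicts(ranges):
    """All pairs that cannot both be expressed as nested XML elements."""
    return [(a, b) for a, b in combinations(ranges, 2) if overlaps(a, b)]


def with_milestones(text, containers, milestones):
    """Serialize `containers` as elements and `milestones` as empty tags.

    Assumes the container ranges nest properly among themselves.
    """
    # Sort key: offset, then closes (0) before opens (1), then outer
    # elements open first and inner elements close first.
    events = []
    for r in containers:
        events.append((r.start, 1, r.start - r.end, f"<{r.label}>"))
        events.append((r.end, 0, r.end - r.start, f"</{r.label}>"))
    for r in milestones:
        events.append((r.start, 1, r.start - r.end, f"<{r.label}-start/>"))
        events.append((r.end, 0, r.end - r.start, f"<{r.label}-end/>"))
    out, pos = [], 0
    for offset, _, _, markup in sorted(events):
        out.append(escape(text[pos:offset]))
        out.append(markup)
        pos = offset
    out.append(escape(text[pos:]))
    return "".join(out)


text = "one two three four"
verse = [Range("verse", "l", 0, 7), Range("verse", "l", 8, 18)]
sentence = [Range("syntax", "s", 4, 13)]  # "two three" crosses the line break

for a, b in conflicts(verse + sentence):
    print(a.label, (a.start, a.end), "overlaps", b.label, (b.start, b.end))
print(with_milestones(text, verse, sentence))
# prints: <l>one <s-start/>two</l> <l>three<s-end/> four</l>
```

The choice of which hierarchy gets to be "real" elements is arbitrary here, and that asymmetry is part of why nobody is yet satisfied with milestone-based solutions.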
Also worthy of attention:
- Felix Sasaki’s late-breaking “Architectural Forms, CSS, and RDF: What Do They Have in Common and Why Should You Care?” Felix’s approach uses XSLT to generate XSLT, which highlights the benefits of an XML processing language that is itself written in XML (or, rather, in light of Sam’s suggestions, that can be expressed as XML). Felix’s task was to combine features of the three technologies named in his title to add information to documents without compromising the integrity of the source, enabling him, for example, to mark selected portions of his source for automated translation while leaving other portions untouched.
- “Metadata Enrichment for Digital Preservation,” by David Dubin et al., explored the problem of automating the processing of bad data, which is pretty much what we all face with legacy projects. David’s sample involved a real project that included aerial photographs allegedly produced by an author named “Aerial Photographs,” with conflicting values for the same features, and with crucial semantic information that we would like to find in markup encoded instead in character data. As David reminded us, this is the data one finds on Planet Earth, and while Michael correctly identified bad data as the enemy, David’s report suggested that it was an enemy we were not going to vanquish through indignation, and that a more constructive approach might involve deducing mapping rules that build a network of object instances, property-value pairs, and relation occurrences.
- Today’s award for the serious paper with the best sense of humor goes to Martin Bryan’s polemic, a call to the barricades entitled “DSRL—Bringing Revolution to XML Workers.” Martin’s discussion of the ISO/IEC JTC1/SC34 Document Schema Renaming Language (ISO/IEC 19757-8, “DiSRuLe”), punctuated by raised fists and revolutionary chants, proposed a formal mechanism for mapping not only element names (as would be possible, after all, with architectural forms) but also almost any other component of an XML document. See http://www.dsdl.org for details.
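DSRL expresses its mappings declaratively. The Python sketch below shows only the underlying idea for the two simplest cases, element names and attribute names, driven by lookup tables; it is not DSRL syntax or a DSRL processor, and the function name, table shapes, and French-to-English sample names are hypothetical.

```python
import xml.etree.ElementTree as ET


def rename(root, element_map, attribute_map):
    """Rename elements and attributes in place from lookup tables.

    element_map:   {old_element_name: new_element_name}
    attribute_map: {(new_element_name, old_attr_name): new_attr_name}
    Namespaces, content, and default values are not handled here.
    """
    for elem in list(root.iter()):
        elem.tag = element_map.get(elem.tag, elem.tag)
        for name, value in list(elem.attrib.items()):
            new_name = attribute_map.get((elem.tag, name))
            if new_name is not None:
                del elem.attrib[name]
                elem.set(new_name, value)
    return root


doc = ET.fromstring('<livre titre="Extreme"><chapitre n="1"/></livre>')
rename(doc,
       {"livre": "book", "chapitre": "chapter"},
       {("book", "titre"): "title", ("chapter", "n"): "number"})
print(ET.tostring(doc, encoding="unicode"))
```

Keying attribute renames on the already-renamed element name keeps the attribute table in the target vocabulary; a fuller mapping language would also need to cover namespaces, attribute values, and element content, which is where DSRL’s ambitions go well beyond this sketch.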
Overheard:
- A pure XML conference is more pragmatic, “how you achieve things,” more oriented toward business. (In praise of Extreme Markup)
- The philosophical presentations are interesting, but I don’t think I’d want to pay money to hear them. (Not in praise of Extreme Markup)
- I’ve never actually run any of this stuff through an XSLT processor. I’m interested in the syntactic issues, not the functional issues. (Sam Wilmott on RXSLT)
What they’re drinking in the Mulberry suite this year:
- Bowmore 17
- Lagavulin 16
- Macallan 12
- Auchentoshan 10

