Extremism, 2006-08-10
Extremism: Day 3, Thursday, 2006-08-10
David J. Birnbaum, djbpitt+xml@pitt.edu
Those of us who attended the Digital Humanities (formerly ACH/ALLC) conference at the Sorbonne last month witnessed an ugly scene. Picture a room filled with hundreds of geeks, all with power-hungry laptops and anemic batteries, shoving aside women and children in a mad rush for the meager three electrical outlets in the room. The passengers on the Titanic showed more grace and generosity about the lifeboats. So congratulations to Extreme Markup for getting it right: every second table in the meeting rooms bears a power strip.
Today’s presentations continued two of yesterday’s hot topics: 1) manipulating XML data in new ways and 2) overlap. Mario Blažević’s “Streaming Component Combinators” discussed two types of combinators, or pipe-lineable components: filters and splitters. Filters are already very well known to Extremists, so Mario’s presentation concentrated on splitters, pipeline stages that operate at the framework level, taking a single input and producing two outputs. A splitter cannot add, delete, or rearrange data, and if its two outputs are recombined, the result (before subsequent modification) must be identical to the original input. Mario’s OmniMark-based implementation can, for example, uppercase a specific generic identifier wherever it occurs by isolating the element (first splitter), then isolating the markup for that element (by piping to the second splitter), then applying the uppercase modification to the generic identifier, and, finally, recombining the pieces in the original order. Used this way, splitters provide functionality not readily available with other technologies, and add a potentially valuable tool to our resources for manipulating XML.
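For readers who want to see the splitter contract in miniature, here is a rough Python sketch. It is not Mario’s OmniMark implementation: the regular expressions stand in for real markup parsing, nested elements of the same name are not handled, and the names `split`, `recombine`, `through`, and `uppercase_gi` are invented for this illustration.

```python
import re
from typing import Callable, List, Tuple

Tagged = Tuple[bool, str]  # (routed to the "selected" output?, chunk of text)

def split(pattern: str, text: str) -> List[Tagged]:
    """Splitter: route every character of `text` to one of two outputs.

    Chunks matching `pattern` go to the selected output, the rest to the
    other. Nothing is added, deleted, or reordered.
    """
    chunks: List[Tagged] = []
    pos = 0
    for m in re.finditer(pattern, text, flags=re.DOTALL):
        if m.start() > pos:
            chunks.append((False, text[pos:m.start()]))
        if m.group(0):
            chunks.append((True, m.group(0)))
        pos = m.end()
    if pos < len(text):
        chunks.append((False, text[pos:]))
    return chunks

def recombine(chunks: List[Tagged]) -> str:
    """Merge both outputs back together in their original order."""
    return "".join(chunk for _, chunk in chunks)

def through(pattern: str, fn: Callable[[str], str], text: str) -> str:
    """Split, apply `fn` to the selected output only, then recombine."""
    return recombine([(sel, fn(c) if sel else c) for sel, c in split(pattern, text)])

def uppercase_gi(gi: str, doc: str) -> str:
    g = re.escape(gi)
    element = rf"<{g}\b[^>]*>.*?</{g}>"  # first splitter: the whole element
    tags = rf"</?{g}\b[^>]*>"            # second splitter: that element's markup
    name = rf"(?<=[</]){g}\b"            # the generic identifier inside a tag
    fix_tag = lambda tag: through(name, str.upper, tag)
    fix_element = lambda el: through(tags, fix_tag, el)
    return through(element, fix_element, doc)

doc = '<doc><para id="p1">One <em>para</em> here.</para><note>para</note></doc>'

# The splitter requirement: recombining unmodified outputs restores the input.
assert recombine(split(r"<para\b[^>]*>.*?</para>", doc)) == doc

print(uppercase_gi("para", doc))
# <doc><PARA id="p1">One <em>para</em> here.</PARA><note>para</note></doc>
```

Note that the word “para” in character data is left alone: each stage only ever sees what the previous splitter routed to it.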
On the overlap front, Claus Huitfeldt and C. M. Sperberg-McQueen’s late-breaking “Representation and Processing of Goddag Structures: Implementation Strategies and Progress Report” expanded on information introduced in Michael’s presentation Wednesday of Rabbit/Duck grammars. Although there are many existing ways of representing overlap, the system we really want aspires to a tight integration of notation, data structure, and validation, which Claus and Michael’s Markup Languages and Complex Documents (MLCD) project seeks to provide, along with support for discontinuity and structural variation. The MLCD notation is TexMECS (Trivially Extended Multi-Element Code System), the data structure is Goddag (Generalized Ordered-descendant Directed Acyclic Graph, but also Norwegian for “hello”), and Rabbit/Duck grammars provide a mechanism for supporting validation. Claus’s presentation concentrated specifically on the data structure component of this system, which is a modification of tree structures that admits multiple parentage. What was new and late-breaking about this report was the discussion of such specific problems as containment vs. dominance (what does it mean to be a parent?) and discontinuity (aided by colored arcs). Some components have already been implemented, including tools, APIs, and protocols, one of which, XET (XML-encoded TexMECS), might permit the use of such XML technologies as XSLT. At the conclusion of the presentation, Jonathan Robie raised the question we all (including the presenters) wanted to ask: does MLCD provide a different solution to the problem addressed by LMNL, or does it identify a different problem? Expect an answer (or, at least, progress toward an answer) at next year’s Extreme Markup conference.
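The core idea of a tree that admits multiple parentage can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the MLCD data structure or its API: the `Node` class and the `leaves` and `content` helpers are made up here, and the sketch omits the root node, ordering constraints, and discontinuity.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)
class Node:
    label: str
    text: str = ""  # only leaf nodes carry character data
    children: List["Node"] = field(default_factory=list, repr=False)
    parents: List["Node"] = field(default_factory=list, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` as the next ordered child; it may already have parents."""
        self.children.append(child)
        child.parents.append(self)
        return child

def leaves(node: Node) -> List[Node]:
    """Leaf nodes below `node`, in document order."""
    if not node.children:
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(leaves(child))
    return found

def content(node: Node) -> str:
    return "".join(leaf.text for leaf in leaves(node))

# Two verse lines and two sentences that overlap: the second sentence
# starts in the middle of the first line and runs into the second line.
t1, t2, t3 = Node("#text", "One two. "), Node("#text", "Three "), Node("#text", "four.")
line1, line2 = Node("line1"), Node("line2")
s1, s2 = Node("s1"), Node("s2")

line1.add(t1); line1.add(t2); line2.add(t3)
s1.add(t1); s2.add(t2); s2.add(t3)

print([p.label for p in t2.parents])  # ['line1', 's2']
print(repr(content(line1)))           # 'One two. Three '
print(repr(content(s2)))              # 'Three four.'
```

The leaf `t2` has two parents, which is exactly what a single XML tree cannot express without workarounds such as milestones or fragmentation.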
Oliver Schonefeld and Andreas Witt’s late-breaking “Towards Validation of Concurrent Markup” addressed both the theory and practice of dealing with overlap, describing the implementation of a Prototype Mascarpone XMC (XML Concur) editor. Overlap is real and ubiquitous in texts on what Dave Dubin would call Planet Earth, and this paper and the preceding one demonstrated clearly why “there is nothing so practical as a good theory,” since it is only the many years of theorizing about the data models and markup involved in overlap (much of it at Extreme Markup conferences) that are now, at last, allowing developers to converge on working systems.
After the lunch break Walter Perry’s polemic, entitled “The SALT Transaction Protocol—An Appropriate Mechanism for Markup,” addressed business computer transactions in the area of marked-up documents. One example of this situation is the way that ATM transactions and your bank’s general ledger are able to work with the same data, so that the latter knows about the former. Each process thus dictates its own data structure, but in an environment that also supports and maintains shared features (shared data, of course, and also, for example, shared syntax but separately elaborated semantics). Walter seemed a bit surprised (or even disappointed) that nobody walked out.
The “Tag Set Promulgation” panel, featuring several long-time leaders in the markup community, addressed the problem of identifying or developing a target audience for a proposed tag set and facilitating its adoption within that constituency. At a broader level this problem is not unique to tag sets; it can be seen in one form or another in all aspects of standardization. Authority, publicity, utility, technical adequacy, implementability, and interoperability with existing tools and protocols are all important, as is meeting the perceived needs of users, instead of requiring them to change their procedures as the price of adopting the tag set. If you’re lucky, your tag set can be deployed with a top-down network effect; when the Danish government decided to pay invoices only if they were submitted in UBL, UBL became ubiquitous, and once the vendors had all adopted the standard for government transactions, they were then able to use it with one another at minimal additional cost. For most of us, though, promulgation of tag sets means building a community by making users aware that you can meet their needs, rather than by expecting them to meet yours (unless, of course, you happen to be the Queen of Denmark). The ensuing discussion raised the question of whether multiple constituencies are best served by large tag sets that users whittle down to meet their local needs or small tag sets that users extend, and two perspectives emerged, which may not lead to the same answer: which approach provides greater ease of use and functionality and which is less likely to seduce unsophisticated users into error?
The day concluded with Liam Quin’s “Microformats: Contaminants or Ingredients? Introducing MDL.” Liam surveyed the history of browser rendering, proceeding from fixed HTML styles to CSS and the “class” attribute. Web culture often standardizes on rough consensus, i.e., throwing out quick hacks and seeing what catches on, but the SGML community, with its roots in expensive engineering contracts, had to design for precision and reliability. In the twenty-first century, the semantic web requires the unambiguous representation of machine-discoverable semantics, i.e., web pages have to provide access to meaning. A microformat, the subject of MDL (Microformat Definition Language), is a named markup idiom, such as the use of the HTML “class” to add semantic information. This situation facilitates, and perhaps even encourages, tag abuse, where, for example, all elements could be divs and spans, and could use the class attribute to identify their real semantics. On the other hand, as Michael reminds us, many markup languages explicitly and deliberately provide attributes as a way of supporting semantic extension that does not contaminate the DTD, and such a mechanism may be a relatively benign way of meeting the specialized needs of certain constituencies.
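To make the idiom concrete, here is a small Python sketch of what a consumer of class-based semantics has to do. It is an illustration only and has nothing to do with MDL itself: the `ClassTextCollector` class is invented for this example, the class names `vcard`, `fn`, and `org` are borrowed from the hCard microformat, the sample data is fictitious, and the sketch assumes every start tag has a matching end tag.

```python
from html.parser import HTMLParser
from typing import Dict, List

class ClassTextCollector(HTMLParser):
    """Collect the text that appears directly inside elements, keyed by class name."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[List[str]] = []      # class names of currently open elements
        self.found: Dict[str, List[str]] = {}  # class name -> text chunks

    def handle_starttag(self, tag, attrs):
        # The element name (div, span, ...) carries no meaning here;
        # only the class attribute does.
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            for name in self.stack[-1]:
                self.found.setdefault(name, []).append(text)

html = (
    '<div class="vcard">'
    '<span class="fn">Jane Example</span> '
    '<span class="org">Example Org</span>'
    '</div>'
)

collector = ClassTextCollector()
collector.feed(html)
collector.close()
print(collector.found)
# {'fn': ['Jane Example'], 'org': ['Example Org']}
```

Everything in the sample is a div or a span, which is the tag abuse scenario described above: the semantics live entirely in attribute values that no DTD or schema for HTML knows anything about.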
Code that validates microformats is difficult to write, maintain, and extend, and the people using microformats often are not sufficiently expert in XSLT to manage their materials. The most serious disadvantage to MDL, though, is that it doesn’t exist, and with this in mind, Liam invited input and guidance from the audience. Liam sees microformats not as a solution, but as a step toward the use of generic markup on the web—unless they turn out to be close enough to adequate that nobody bothers to go further. In the ensuing discussion, Murray Maloney suggested that microformats are a train headed toward a mountain that doesn’t have a tunnel in it. Should we, an expert audience that tends to be ahead of the curve, stand by to rescue the survivors after the inevitable crash? Or should we build a tunnel? Or paint a tunnel?
If you’re in the habit of reading conference blogs, you’ve probably noticed that nobody ever seems to blog the posters. In a modest attempt to remedy that situation, here are a few thoughts about what the poster hall had to offer this year:
- Several posters presented overviews of topics also taken up in conference presentations, including MLCD, LMNL, and DSRL.
- You want your students and clients to read the intelligent and lucid two pages of Marie Bilde Rasmussen’s “XML Schema Design for Document Authoring,” which provide clear, concise, and practical guidelines to useful, learnable, teachable, and maintainable design. If there were an award for the best overall poster of the conference, this would be the winner.
- You’d Think We’d Know Better Department: At one of the ancestors of Extreme Markup back in the last century, Debbie Lapeyre, speaking on behalf of the program committee, thanked the audience for bringing their (then) SGML expertise to bear when they submitted the marked-up and validated text of their papers. In particular, Debbie expressed her profound gratitude to those who were generous enough to improve the conference DTD, each in his or her own way, before encoding their papers. Sigh. At the present conference, Kim Tryka’s “Creating a Taxonomy for the Proceedings of Extreme Markup Languages” poster, surveying the use of author-supplied keywords over five years of Extreme Markup conferences, found that we contributed approximately 740 different keywords, of which only some 25 (3%) occurred more than once. Sigh.
- Thoughts for poster authors at future conferences:
  - Posters are most successful when they are written to be posters. Dumps of code or uncommented slides may not be accessible to readers. Perhaps some topics need too much explanatory prose to be addressed usefully through poster presentation.
  - In a community that probably understands the importance of metadata as well as anyone anywhere, why do so many posters lack an identifiable author?
Overheard:
- Walter Perry can be relied upon to be polemical no matter where you put him in the program.
News:
- As of today, SVG Tiny 1.2 is a Candidate Recommendation.
Extreme sartorial quiz:

Identify the Extreme Markup regular who is recognizable by his or her frequent appearance in:
- Bowtie
- Necktie, always with a jacket
- Necktie, always without a jacket
- Necktie, possibly without a jacket, but often with a sweater vest
- White shirt
- See photos on either side.