Chapter 30 - Practical: An HTML Generation Library, the Interpreter
Practical Common Lisp
by Peter Seibel
Apress © 2005

Designing a Domain-Specific Language

Designing an embedded language requires two steps: first, design the language that’ll allow you to express the things you want to express, and second, implement a processor, or processors, that accepts a “program” in that language and either performs the actions indicated by the program or translates the program into Common Lisp code that’ll perform equivalent behaviors.
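The two kinds of processor described above can be sketched in Python for a toy language where a form is either a string (text) or a tuple `(tag, child, ...)`. This representation and the names `interpret`, `compile_form`, and `run_compiled` are hypothetical stand-ins: the chapter's processors operate on Lisp forms and translate them into Common Lisp code, whereas this sketch translates into Python source to show the same idea.

```python
import io

# Hypothetical toy language: a form is a string (text) or a tuple (tag, child, ...).

def interpret(form, out):
    """Interpreter: walk the form and perform the output actions directly."""
    if isinstance(form, str):
        out.write(form)
    else:
        tag, *children = form
        out.write(f"<{tag}>")
        for child in children:
            interpret(child, out)
        out.write(f"</{tag}>")


def compile_form(form):
    """Compiler: translate the form into Python source that does the same work."""
    lines = ["def render(out):"]

    def emit(f):
        if isinstance(f, str):
            lines.append(f"    out.write({f!r})")
        else:
            tag, *children = f
            open_tag, close_tag = f"<{tag}>", f"</{tag}>"
            lines.append(f"    out.write({open_tag!r})")
            for child in children:
                emit(child)
            lines.append(f"    out.write({close_tag!r})")

    emit(form)
    return "\n".join(lines)


def run_compiled(source):
    """Load the generated source and run the render function it defines."""
    namespace = {}
    exec(source, namespace)
    out = io.StringIO()
    namespace["render"](out)
    return out.getvalue()


page = ("p", "Hello, ", ("b", "world"))
buf = io.StringIO()
interpret(page, buf)
print(buf.getvalue())                    # <p>Hello, <b>world</b></p>
print(run_compiled(compile_form(page)))  # same output, via generated code
print(compile_form(page))                # the generated straight-line program
```

The compiled version does all the tree walking once, up front, leaving only a flat sequence of writes to run; that trade-off is the usual reason to prefer a compiler over an interpreter for an embedded language.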

So, step one is to design the HTML-generating language. The key to designing a good domain-specific language is to strike the right balance between expressiveness and concision. For instance, a highly expressive but not very concise “language” for generating HTML is the language of literal HTML strings. The legal “forms” of this language are strings containing literal HTML. Language processors for this “language” could process such forms by simply emitting them as-is.

This “language” is highly expressive since it can express any HTML you could possibly want to generate.[1] On the other hand, this language doesn’t win a lot of points for its concision because it gives you zero compression—its input is its output.

To design a language that gives you some useful compression without sacrificing too much expressiveness, you need to identify the details of the output that are either redundant or uninteresting. You can then make those aspects of the output implicit in the semantics of the language.

For instance, because of the structure of HTML, every opening tag is paired with a matching closing tag.[2] When you write HTML by hand, you have to write those closing tags, but you can improve the concision of your HTML-generating language by making the closing tags implicit.
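A minimal sketch of a processor that supplies closing tags itself, again assuming hypothetical tuple forms `(tag, child, ...)`. The `render` function and the `EMPTY_TAGS` set (standing in for the IMG and BR exception from footnote [2]) are illustrative assumptions, not FOO's implementation.

```python
# Hypothetical forms: a string is text; a tuple is (tag, child, ...).
# The author of a form never writes a closing tag; render adds it.
EMPTY_TAGS = {"br", "img"}  # assumed set of tags that take no closing tag


def render(form):
    """Return the HTML for a form, emitting closing tags automatically."""
    if isinstance(form, str):
        return form
    tag, *children = form
    if tag in EMPTY_TAGS:
        return f"<{tag}>"
    inner = "".join(render(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


print(render(("ul", ("li", "one"), ("li", "two", ("br",)))))
# <ul><li>one</li><li>two<br></li></ul>
```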

Another way you can gain concision at a slight cost in expressiveness is to make the language processors responsible for adding appropriate whitespace between elements—blank lines and indentation. When you’re generating HTML programmatically, you typically don’t care much about which elements have line breaks before or after them or about whether different elements are indented relative to their parent elements. Letting the language processor insert whitespace according to some rule means you don’t have to worry about it. As it turns out, FOO actually supports two modes—one that uses the minimum amount of whitespace, which allows it to generate extremely efficient code and compact HTML, and another that generates nicely formatted HTML with different elements indented and separated from other elements according to their role.
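The two whitespace modes can be sketched with a single hypothetical `render` function that emits compact output by default and indented output when `pretty=True`. The layout rule used here (an element whose children are all text stays on one line; otherwise each child goes on its own line, indented two spaces) is a simplification assumed for illustration. The text says FOO's pretty mode formats elements according to their role, which this sketch does not attempt.

```python
def render(form, pretty=False, depth=0):
    """Render a tuple form; pretty=True adds line breaks and indentation."""
    pad = "  " * depth if pretty else ""
    if isinstance(form, str):
        return pad + form
    tag, *children = form
    # Assumed rule: an element containing only text stays on a single line.
    if all(isinstance(child, str) for child in children):
        return f"{pad}<{tag}>{''.join(children)}</{tag}>"
    sep = "\n" if pretty else ""
    inner = sep.join(render(child, pretty, depth + 1) for child in children)
    return f"{pad}<{tag}>{sep}{inner}{sep}{pad}</{tag}>"


page = ("html", ("body", ("p", "Hello"), ("p", "Bye")))
print(render(page))               # compact: one line, no added whitespace
print(render(page, pretty=True))  # one element per line, children indented
```

Because the whitespace decision lives entirely inside the processor, switching modes is a single flag; the forms themselves never change.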

Another detail that’s best moved into the language processor is the escaping of certain characters that have a special meaning in HTML such as <, >, and &. Obviously, if you generate HTML by just printing strings to a stream, then it’s up to you to replace any occurrences of those characters in the string with the appropriate escape sequences, &lt;, &gt;, and &amp;. But if the language processor can know which strings are to be emitted as element data, then it can take care of automatically escaping those characters for you.
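A minimal sketch of automatic escaping, assuming the same hypothetical tuple forms: since the processor knows every string in a form is element data, it can escape `<`, `>`, and `&` before emitting it, so the author of a form never has to. The names `escape` and `render` are illustrative.

```python
def escape(text):
    """Replace HTML-special characters with their escape sequences."""
    # Replace & first, so the & in the &lt; and &gt; added below is not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def render(form):
    """Render a tuple form, escaping every string as element data."""
    if isinstance(form, str):
        return escape(form)
    tag, *children = form
    return f"<{tag}>" + "".join(render(child) for child in children) + f"</{tag}>"


print(render(("p", "if a < b && b > c")))
# <p>if a &lt; b &amp;&amp; b &gt; c</p>
```

The order of replacements matters: escaping `&` last would turn a freshly produced `&lt;` into `&amp;lt;`. Python's standard library provides the same transformation as `html.escape(text, quote=False)`.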

[1]In fact, it’s probably too expressive since it can also generate all sorts of output that’s not even vaguely legal HTML. Of course, that might be a feature if you need to generate HTML that’s not strictly correct to compensate for buggy Web browsers. Also, it’s common for language processors to accept programs that are syntactically correct and otherwise well formed that’ll nonetheless provoke undefined behavior when run.

[2]Well, almost every tag. Certain tags such as IMG and BR don’t. You’ll deal with those in the section “The Basic Evaluation Rule.”