Chapter 31: Practical: An HTML Generation Library, the Compiler
Now you’re ready to look at how the FOO compiler works. The main difference between a compiler and an interpreter is that an interpreter processes a program and directly generates some behavior (generating HTML in the case of a FOO interpreter), but a compiler processes the same program and generates code in some other language that will exhibit the same behavior. In FOO, the compiler is a Common Lisp macro that translates FOO into Common Lisp so it can be embedded in a Common Lisp program. Compilers, in general, have the advantage over interpreters that, because compilation happens in advance, they can spend a bit of time optimizing the code they generate to make it more efficient. The FOO compiler does that, merging literal text as much as possible in order to emit the same HTML with a smaller number of writes than the interpreter uses. When the compiler is a Common Lisp macro, you also have the advantage that it’s easy for the language understood by the compiler to contain embedded Common Lisp: the compiler just has to recognize it and embed it in the right place in the generated code. The FOO compiler will take advantage of this capability.

The Compiler

The basic architecture of the compiler consists of three layers. First you’ll implement a class html-compiler with one slot holding an adjustable vector that accumulates ops representing the calls made to the generic functions in the backend interface during the execution of process.

You’ll then implement methods on the generic functions in the backend interface that will store the sequence of actions in the vector. Each op is represented by a list consisting of a keyword naming the operation and the arguments passed to the function that generated the op. The function sexp->ops implements the first phase of the compiler, compiling a list of FOO forms by calling process on each form with an instance of html-compiler.

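Before looking at the real code, the overall shape (collect the pieces, merge literal text at macroexpansion time, then generate Lisp code with embedded forms left in place) can be seen in miniature. The following is only a toy sketch, not FOO's implementation: the names emit and merge-literals are hypothetical, and it does no escaping or pretty printing.

```lisp
;; Toy illustration only: EMIT and MERGE-LITERALS are hypothetical names,
;; not part of FOO.

;; The helper must exist at macroexpansion time, hence the EVAL-WHEN.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun merge-literals (items)
    "Collapse each run of adjacent literal strings in ITEMS into one string."
    (let ((result '()))
      (dolist (item items (nreverse result))
        (if (and (stringp item) (stringp (first result)))
            ;; Extend the string most recently pushed onto RESULT.
            (setf (first result) (concatenate 'string (first result) item))
            (push item result))))))

(defmacro emit (stream &rest items)
  "Expand into code that writes ITEMS to STREAM. Strings are literal text;
any other item is embedded Lisp whose value is printed with PRINC."
  (let ((s (gensym "STREAM")))
    `(let ((,s ,stream))
       ,@(loop for item in (merge-literals items)
               collect (if (stringp item)
                           `(write-string ,item ,s)
                           `(princ ,item ,s)))
       nil)))

;; Three literal pieces become one write; X is evaluated at run time.
(let ((x 10))
  (with-output-to-string (out)
    (emit out "<" "p" ">" x "</p>")))
;; => "<p>10</p>"
```

The expansion contains a single write-string of "<p>" rather than three separate writes, which is the same kind of saving the FOO compiler gets from merging raw-string ops.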
This vector of ops stored by the compiler is then passed to a function that optimizes it, merging consecutive raw-string ops into a single op that emits the combined string in one go. The optimization function can also, optionally, strip out ops that are needed only for pretty printing, which is mostly important because it allows you to merge more raw-string ops.

Finally, the optimized ops vector is passed to a third function, generate-code, that returns a list of Common Lisp expressions that will actually output the HTML. When *pretty* is true, generate-code generates code that uses the methods specialized on html-pretty-printer to output pretty HTML. When *pretty* is NIL, it generates code that writes directly to the stream *html-output*.

The macro html actually generates a body that contains two expansions, one generated with *pretty* bound to T and one with *pretty* bound to NIL. Which expansion is used is determined by the runtime value of *pretty*. Thus, every function that contains a call to html will contain code to generate both pretty and compact output.

The other significant difference between the compiler and the interpreter is that the compiler can embed Lisp forms in the code it generates. To take advantage of that, you need to modify the process function so it calls the embed-code and embed-value functions when asked to process an expression that’s not a FOO form. Since all self-evaluating objects are valid FOO forms, the only forms that won’t be passed to process-sexp-html are lists that don’t match the syntax for FOO cons forms and non-keyword symbols, the only atoms that aren’t self-evaluating. You can assume that any non-FOO cons is code to be run inline and all symbols are variables whose value you should embed.

    (defun process (processor form)
      (cond
        ((sexp-html-p form) (process-sexp-html processor form))
        ((consp form)       (embed-code processor form))
        (t                  (embed-value processor form))))

Now let’s look at the compiler code. First you should define two functions that slightly abstract the vector you’ll use to save ops in the first two phases of compilation.

    (defun make-op-buffer () (make-array 10 :adjustable t :fill-pointer 0))

    (defun push-op (op ops-buffer)
      (vector-push-extend op ops-buffer))

Next you can define the html-compiler class and the methods specialized on it to implement the backend interface.

    (defclass html-compiler ()
      ((ops :accessor ops :initform (make-op-buffer))))

    (defmethod raw-string ((compiler html-compiler) string &optional newlines-p)
      (push-op `(:raw-string ,string ,newlines-p) (ops compiler)))

    (defmethod newline ((compiler html-compiler))
      (push-op '(:newline) (ops compiler)))

    (defmethod freshline ((compiler html-compiler))
      (push-op '(:freshline) (ops compiler)))

    (defmethod indent ((compiler html-compiler))
      (push-op `(:indent) (ops compiler)))

    (defmethod unindent ((compiler html-compiler))
      (push-op `(:unindent) (ops compiler)))

    (defmethod toggle-indenting ((compiler html-compiler))
      (push-op `(:toggle-indenting) (ops compiler)))

    (defmethod embed-value ((compiler html-compiler) value)
      (push-op `(:embed-value ,value ,*escapes*) (ops compiler)))

    (defmethod embed-code ((compiler html-compiler) code)
      (push-op `(:embed-code ,code) (ops compiler)))

With those methods defined, you can implement the first phase of the compiler, sexp->ops.

    (defun sexp->ops (body)
      (loop with compiler = (make-instance 'html-compiler)
         for form in body do (process compiler form)
         finally (return (ops compiler))))

During this phase you don’t need to worry about the value of *pretty*: just record all the functions called by process. Here’s what sexp->ops makes of a simple FOO form:

    HTML> (sexp->ops '((:p "Foo")))
    #((:FRESHLINE) (:RAW-STRING "<p" NIL) (:RAW-STRING ">" NIL)
      (:RAW-STRING "Foo" T) (:RAW-STRING "</p>" NIL) (:FRESHLINE))

The next phase, optimize-static-output, takes a vector of ops and returns a new vector containing the optimized version. The algorithm is simple: for each :raw-string op, it writes the string to a temporary string buffer.
Thus, consecutive :raw-string ops will build up a single string containing the concatenation of the strings that need to be emitted. Whenever you encounter an op other than a :raw-string op, you convert the built-up string into a sequence of alternating :raw-string and :newline ops with the helper function compile-buffer and then add the next op. This function is also where you strip out the pretty printing ops if *pretty* is NIL.

    (defun optimize-static-output (ops)
      (let ((new-ops (make-op-buffer)))
        (with-output-to-string (buf)
          (flet ((add-op (op)
                   ;; Flush any pending literal text before adding OP.
                   (compile-buffer buf new-ops)
                   (push-op op new-ops)))
            (loop for op across ops do
                 (ecase (first op)
                   (:raw-string (write-sequence (second op) buf))
                   ((:newline :embed-value :embed-code) (add-op op))
                   ((:indent :unindent :freshline :toggle-indenting)
                    (when *pretty* (add-op op)))))
            ;; Flush whatever text remains after the last op.
            (compile-buffer buf new-ops)))
        new-ops))

    (defun compile-buffer (buf ops)
      ;; Empty BUF into OPS, splitting the text at each newline.
      (loop with str = (get-output-stream-string buf)
         for start = 0 then (1+ pos)
         for pos = (position #\Newline str :start start)
         when (< start (length str))
         do (push-op `(:raw-string ,(subseq str start pos) nil) ops)
         when pos do (push-op '(:newline) ops)
         while pos))

The last step is to translate the ops into the corresponding Common Lisp code. This phase also pays attention to the value of *pretty*. When *pretty* is true, it generates code that invokes the backend generic functions on *html-pretty-printer*, which will be bound to an instance of html-pretty-printer. When *pretty* is NIL, it generates code that writes directly to *html-output*, the stream to which the pretty printer would send its output.

The actual function, generate-code, is trivial.

    (defun generate-code (ops)
      (loop for op across ops collect (apply #'op->code op)))

All the work is done by methods on the generic function op->code specializing the op argument with an EQL specializer on the name of the op.

    (defgeneric op->code (op &rest operands))

    (defmethod op->code ((op (eql :raw-string)) &rest operands)
      (destructuring-bind (string check-for-newlines) operands
        (if *pretty*
          `(raw-string *html-pretty-printer* ,string ,check-for-newlines)
          `(write-sequence ,string *html-output*))))

    (defmethod op->code ((op (eql :newline)) &rest operands)
      (if *pretty*
        `(newline *html-pretty-printer*)
        `(write-char #\Newline *html-output*)))

    (defmethod op->code ((op (eql :freshline)) &rest operands)
      (if *pretty*
        `(freshline *html-pretty-printer*)
        (error "Bad op when not pretty-printing: ~a" op)))

    (defmethod op->code ((op (eql :indent)) &rest operands)
      (if *pretty*
        `(indent *html-pretty-printer*)
        (error "Bad op when not pretty-printing: ~a" op)))

    (defmethod op->code ((op (eql :unindent)) &rest operands)
      (if *pretty*
        `(unindent *html-pretty-printer*)
        (error "Bad op when not pretty-printing: ~a" op)))

    (defmethod op->code ((op (eql :toggle-indenting)) &rest operands)
      (if *pretty*
        `(toggle-indenting *html-pretty-printer*)
        (error "Bad op when not pretty-printing: ~a" op)))

The two most interesting op->code methods are the ones that generate code for the :embed-value and :embed-code ops. In the :embed-value method, you can generate slightly different code depending on the value of the escapes operand since if escapes is NIL, you don’t need to generate a call to escape. And when both *pretty* and escapes are NIL, you can generate code that uses PRINC to emit the value directly to the stream.

    (defmethod op->code ((op (eql :embed-value)) &rest operands)
      (destructuring-bind (value escapes) operands
        (if *pretty*
          (if escapes
            `(raw-string *html-pretty-printer* (escape (princ-to-string ,value) ,escapes) t)
            `(raw-string *html-pretty-printer* (princ-to-string ,value) t))
          (if escapes
            `(write-sequence (escape (princ-to-string ,value) ,escapes) *html-output*)
            `(princ ,value *html-output*)))))

Thus, something like this:

    HTML> (let ((x 10)) (html (:p x)))
    <p>10</p>
    NIL

works because html translates (:p x) into something like this:

    (progn
      (write-sequence "<p>" *html-output*)
      (write-sequence (escape (princ-to-string x) "<>&") *html-output*)
      (write-sequence "</p>" *html-output*))

When that code replaces the call to html in the context of the LET, you get the following:

    (let ((x 10))
      (progn
        (write-sequence "<p>" *html-output*)
        (write-sequence (escape (princ-to-string x) "<>&") *html-output*)
        (write-sequence "</p>" *html-output*)))

and the reference to x in the generated code turns into a reference to the lexical variable from the LET surrounding the html form.

The :embed-code method, on the other hand, is interesting because it’s so trivial. Because process passed the form to embed-code, which stashed it in the :embed-code op, all you have to do is pull it out and return it.

    (defmethod op->code ((op (eql :embed-code)) &rest operands)
      (first operands))

This allows code like this to work:

    HTML> (html (:ul (dolist (x '(foo bar baz)) (html (:li x)))))
    <ul>
      <li>FOO</li>
      <li>BAR</li>
      <li>BAZ</li>
    </ul>
    NIL

The outer call to html expands into code that does something like this:

    (progn
      (write-sequence "<ul>" *html-output*)
      (dolist (x '(foo bar baz)) (html (:li x)))
      (write-sequence "</ul>" *html-output*))

Then if you expand the call to html in the body of the DOLIST, you’ll get something like this:

    (progn
      (write-sequence "<ul>" *html-output*)
      (dolist (x '(foo bar baz))
        (progn
          (write-sequence "<li>" *html-output*)
          (write-sequence (escape (princ-to-string x) "<>&") *html-output*)
          (write-sequence "</li>" *html-output*)))
      (write-sequence "</ul>" *html-output*))

This code will, in fact, generate the output you saw.
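
The way a nested macro call survives the outer expansion untouched, and is only expanded later by the normal macroexpansion process, can be reproduced with a toy macro. This is a sketch under simplifying assumptions, not FOO's html macro: with-tag is a hypothetical name, and it does no escaping, pretty printing, or merging of literal text.

```lisp
;; Toy illustration only: WITH-TAG is a hypothetical name, not part of FOO.
(defmacro with-tag ((name stream) &body body)
  "Write <name>, run BODY unchanged, then write </name>.
NAME is not evaluated. STREAM should be a variable, since it appears
twice in the expansion."
  `(progn
     ;; The tag strings are built once, at macroexpansion time.
     (write-string ,(format nil "<~(~a~)>" name) ,stream)
     ;; BODY is spliced in as is, like the form stashed in an :embed-code op.
     ,@body
     (write-string ,(format nil "</~(~a~)>" name) ,stream)
     nil))

;; The DOLIST is passed through as ordinary Lisp; the inner WITH-TAG is
;; expanded only when the DOLIST body itself is macroexpanded.
(with-output-to-string (out)
  (with-tag (ul out)
    (dolist (x '(foo bar baz))
      (with-tag (li out)
        (princ x out)))))
;; => "<ul><li>FOO</li><li>BAR</li><li>BAZ</li></ul>" with the default *print-case*
```

Calling macroexpand-1 on the outer form shows the DOLIST, including its inner with-tag call, sitting unexpanded between the two write-string forms, just as the inner html call does in the expansion shown earlier.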