
Chapter 32 - Conclusion—What’s Next?

Practical Common Lisp

by Peter Seibel

Apress © 2005




Make It Work, Make It Right, Make It Fast

As has been said many times, and variously attributed to Donald Knuth, C.A.R. Hoare, and Edsger Dijkstra, premature optimization is the root of all evil. [4] Common Lisp is an excellent language to program in if you want to heed this wisdom yet still need high performance. This may come as a surprise if you’ve heard the conventional wisdom that Lisp is slow. In Lisp’s earliest days, when computers were programmed with punch cards, Lisp’s high-level features may have doomed it to be slower than the competition, namely, assembly and FORTRAN. But that was a long time ago. In the meantime, Lisp has been used for everything from creating complex AI systems to writing operating systems, and a lot of work has gone into figuring out how to compile Lisp into efficient code. In this section I’ll talk about some of the reasons why Common Lisp is an excellent language for writing high-performance code and some of the techniques for doing so.

The first reason that Lisp is an excellent language for writing high-performance code is, ironically enough, the dynamic nature of Lisp programming—the very thing that originally made it hard to bring Lisp’s performance up to the levels achieved by FORTRAN compilers. The reason Common Lisp’s dynamic features make it easier to write high-performance code is that the first step to writing efficient code is to find the right algorithms and data structures.

Common Lisp’s dynamic features keep code flexible, which makes it easier to try different approaches. Given a finite amount of time to write a program, you’re much more likely to end up with a high-performance version if you don’t spend a lot of time getting into and out of dead ends. In Common Lisp, you can try an idea, see it’s going nowhere, and move on without having spent a ton of time convincing the compiler your code is worthy of being run and then waiting for it to finish compiling. You can write a straightforward but inefficient version of a function—a code sketch—to determine whether your basic approach is sound and then replace that function with a more complex but more efficient implementation if you determine that it is. And if the overall approach turns out to be flawed, then you haven’t wasted a bunch of time tuning a function that’s no longer needed, which means you have more time to find a better approach.
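
For example (a hypothetical illustration, not an example from the book), a code sketch of a Fibonacci function might be the obvious recursive definition, later replaced, behind the same interface, by an iterative version:

;; Code sketch: obviously correct, obviously slow; good enough to test the approach.

(defun fib (n)

  (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))

;; The drop-in replacement: same interface, linear-time implementation.

(defun fib (n)

  (do ((a 0 b)

       (b 1 (+ a b))

       (i 0 (1+ i)))

      ((= i n) a)))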

The next reason Common Lisp is a good language for developing high-performance software is that most Common Lisp implementations come with mature compilers that generate quite efficient machine code. I’ll talk in a moment about how to help these compilers generate code that will be competitive with code generated by C compilers, but these implementations already are quite a bit faster than those of languages whose implementations are less mature and use simpler compilers or interpreters. Also, since the Lisp compiler is available at runtime, the Lisp programmer has some possibilities that would be hard to emulate in other languages—your programs can generate Lisp code at runtime that’s then compiled into machine code and run. If the generated code is going to run enough times, this can be a big win. Or, even without using the compiler at runtime, closures give you another way to meld machine code with runtime data. For instance, the CL-PPCRE regular expression library, running in CMUCL, is faster than Perl’s regular expression engine on some benchmarks, even though Perl’s engine is written in highly tuned C. This is presumably because in Perl a regular expression is translated into what are essentially bytecodes that are then interpreted by the regex engine, while CL-PPCRE translates a regular expression into a tree of compiled closures that invoke each other via the normal function-calling machinery. [5]
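
To make the runtime-compilation point concrete, here’s a minimal sketch of my own (it uses only the standard COMPILE function and is not how CL-PPCRE is actually structured): it builds a lambda expression as data and compiles it into a callable function on the fly.

;; Build a piece of code at runtime, then compile and use it.

(defun make-adder (n)

  (compile nil `(lambda (x) (+ x ,n))))

(funcall (make-adder 10) 32) ; ==> 42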

However, even with the right algorithm and a high-quality compiler, you may not get the raw speed you need. Then it’s time to think about profiling and tuning. The key, in Lisp as in any language, is to profile first to find the spots where your program is actually spending its time and then worry about speeding up those parts. [6]

You have a number of different ways to approach profiling. The language standard provides a few rudimentary tools for measuring how long certain forms take to execute. In particular, the TIME macro can be wrapped around any form and will return whatever values the form returns after printing a message to *TRACE-OUTPUT* about how long it took to run and how much memory it used. The exact form of the message is implementation defined.

You can use TIME for a bit of quick-and-dirty profiling to narrow your search for bottlenecks. For instance, suppose you have a function that’s taking a long time to run and that calls two other functions—something like this:

(defun foo ()

  (bar)

  (baz))

If you want to see whether bar or baz is taking more time, you can change the definition of foo to this:

(defun foo ()

  (time (bar))

  (time (baz)))

Now you can call foo, and Lisp will print two reports, one for bar and one for baz. The form of the report is implementation dependent; here’s what it looks like in Allegro Common Lisp:

CL-USER> (foo)

; cpu time (non-gc) 60 msec user, 0 msec system

; cpu time (gc)     0 msec user, 0 msec system

; cpu time (total)  60 msec user, 0 msec system

; real time  105 msec

; space allocation:

;  24,172 cons cells, 1,696 other bytes, 0 static bytes

; cpu time (non-gc) 540 msec user, 10 msec system

; cpu time (gc)     170 msec user, 0 msec system

; cpu time (total)  710 msec user, 10 msec system

; real time  1,046 msec

; space allocation:

;  270,172 cons cells, 1,696 other bytes, 0 static bytes

Of course, that’d be a bit easier to read if the output included a label. If you use this technique a lot, it might be worth defining your own macro like this:

(defmacro labeled-time (form)

  `(progn

    (format *trace-output* "~2&~a" ',form)

    (time ,form)))

If you replace TIME with labeled-time in foo, you’ll get this output:

CL-USER> (foo)

(BAR)

; cpu time (non-gc) 60 msec user, 0 msec system

; cpu time (gc)     0 msec user, 0 msec system

; cpu time (total)  60 msec user, 0 msec system

; real time  131 msec

; space allocation:

;  24,172 cons cells, 1,696 other bytes, 0 static bytes

(BAZ)

; cpu time (non-gc) 490 msec user, 0 msec system

; cpu time (gc)     190 msec user, 10 msec system

; cpu time (total)  680 msec user, 10 msec system

; real time  1,088 msec

; space allocation:

;  270,172 cons cells, 1,696 other bytes, 0 static bytes

From this output, it’s clear that most of the time in foo is spent in baz.

Of course, the output from TIME gets a bit unwieldy if the form you want to profile is called repeatedly. You can build your own measurement tools using the functions GET-INTERNAL-REAL-TIME and GET-INTERNAL-RUN-TIME, which return a number that increases by the value of the constant INTERNAL-TIME-UNITS-PER-SECOND each second. GET-INTERNAL-REAL-TIME measures wall time, the actual amount of time elapsed, while GET-INTERNAL-RUN-TIME measures some implementation-defined value such as the amount of time Lisp was actually executing or the time Lisp was executing user code and not internal bookkeeping such as the garbage collector. Here’s a trivial but useful profiling tool built with a few macros and GET-INTERNAL-RUN-TIME:

(defparameter *timing-data* ())

(defmacro with-timing (label &body body)

  (with-gensyms (start)

    `(let ((,start (get-internal-run-time)))

      (unwind-protect (progn ,@body)

        (push (list ',label ,start (get-internal-run-time)) *timing-data*)))))

(defun clear-timing-data ()

  (setf *timing-data* ()))

(defun show-timing-data ()

  (loop for (label time count time-per %-of-total) in (compile-timing-data) do

       (format t "~3d% ~a: ~d ticks over ~d calls for ~d per.~%"

               %-of-total label time count time-per)))

(defun compile-timing-data ()

  (loop with timing-table = (make-hash-table)

     with count-table = (make-hash-table)

     for (label start end) in *timing-data*

     for time = (- end start)

     summing time into total

     do

       (incf (gethash label timing-table 0) time)

       (incf (gethash label count-table 0))

     finally

       (return

         (sort

          (loop for label being the hash-keys in timing-table collect

               (let ((time (gethash label timing-table))

                     (count (gethash label count-table)))

                 (list label time count

                       (round (/ time count)) (round (* 100 (/ time total))))))

          #'> :key #'fifth))))

This profiler lets you wrap a with-timing around any form; each time the form is executed, the time it starts and the time it ends are recorded, associated with the label you provide. The function show-timing-data dumps out a table showing how much time was spent in different labeled sections of code like this:

CL-USER> (show-timing-data)

 84% BAR: 650 ticks over 2 calls for 325 per.

 16% FOO: 120 ticks over 5 calls for 24 per.

NIL
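
For instance, a report like the one above could have been produced by instrumenting hypothetical functions like this:

(defun run-job ()

  (with-timing bar (bar))  ; each execution records one BAR entry

  (with-timing foo (foo))) ; and one FOO entry

The tick counts are in internal time units; dividing by INTERNAL-TIME-UNITS-PER-SECOND converts them to seconds. After a run, show-timing-data prints the table, and clear-timing-data resets *timing-data* for the next experiment.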

You could obviously make this profiling code more sophisticated in many ways. Alternatively, your Lisp implementation most likely provides its own profiling tools, which, since they have access to the internals of the implementation, can get at information not necessarily available to user-level code.
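
For example, SBCL bundles a deterministic profiler (shown here purely as an illustration of one implementation’s tooling; names and capabilities vary from Lisp to Lisp):

(sb-profile:profile bar baz)    ; instrument these functions

(foo)                           ; exercise the code under test

(sb-profile:report)             ; print per-function times and call counts

(sb-profile:unprofile bar baz)  ; remove the instrumentation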

Once you’ve found the bottleneck in your code, you can start tuning. The first thing you should try, of course, is to find a more efficient basic algorithm—that’s where the big gains are to be had. But assuming you’re already using an appropriate algorithm, then it’s down to code bumming—locally optimizing the code so it does absolutely no more work than necessary.

The main tools for code bumming in Common Lisp are its optional declarations. The basic idea behind declarations in Common Lisp is that they’re used to give the compiler information it can use in a variety of ways to generate better code.

For a simple example, consider this Common Lisp function:

(defun add (x y) (+ x y))

I mentioned in Chapter 10 that if you compare the performance of this Lisp function to the seemingly equivalent C function:

int add (int x, int y) { return x + y; }

you’ll likely find the Common Lisp version to be quite a bit slower, even if your Common Lisp implementation features a high-quality native compiler.

That’s because the Common Lisp version is doing a lot more—the Common Lisp compiler doesn’t even know that the values of x and y are numbers and so has to generate code to check at runtime. And once it determines they are numbers, it has to determine what types of numbers—integers, rationals, floating point, or complex—and dispatch to the appropriate addition routine for the actual types. And even if x and y are integers—the case you care about—then the addition routine has to account for the possibility that the result may be too large to represent as a fixnum, a number that can be represented in a single machine word, and thus it may have to allocate a bignum object.
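
You can watch the fixnum-to-bignum transition at the REPL. The value of MOST-POSITIVE-FIXNUM is implementation dependent, but the behavior below is the same in any conforming implementation:

CL-USER> (typep most-positive-fixnum 'fixnum)

T

CL-USER> (typep (+ most-positive-fixnum 1) 'fixnum)

NIL

CL-USER> (typep (+ most-positive-fixnum 1) 'bignum)

T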

In C, on the other hand, because the types of all variables are declared, the compiler knows exactly what kind of values x and y will hold. And because C’s arithmetic simply overflows when the result of an addition is too large to represent in whatever type is being returned, there’s no checking for overflow and no allocation of a bignum object to represent the result when the mathematical sum is too large to fit in a machine word.

Thus, while the behavior of the Common Lisp code is much more likely to be mathematically correct, the C version can probably be compiled down to one or two machine instructions. But if you’re willing to give the Common Lisp compiler the same information the C compiler has about the types of arguments and return values and to accept certain C-like compromises in terms of generality and error checking, the Common Lisp function can also be compiled down to an instruction or two.

That’s what declarations are for. The main use of declarations is to tell the compiler about the types of variables and other expressions. For instance, you could tell the compiler that the arguments to add are both fixnums by writing the function like this:

(defun add (x y)

  (declare (fixnum x y))

  (+ x y))

The DECLARE expression isn’t a Lisp form; rather, it’s part of the syntax of the DEFUN and must appear before any other code in the function body. [7] This declaration declares that the arguments passed for the parameters x and y will always be fixnums. In other words, it’s a promise to the compiler, and the compiler is allowed to generate code on the assumption that whatever you tell it is true.

To declare the type of the value returned, you can wrap the form (+ x y) in the THE special operator. This operator takes a type specifier, such as FIXNUM, and a form and tells the compiler the form will evaluate to the given type. Thus, to give the Common Lisp compiler all the information about add that the C compiler gets, you can write it like this:

(defun add (x y)

  (declare (fixnum x y))

  (the fixnum (+ x y)))

However, even this version needs one more declaration to give the Common Lisp compiler the same license as the C compiler to generate fast but dangerous code. The OPTIMIZE declaration is used to tell the compiler how to balance five qualities: the speed of the code generated; the amount of runtime error checking; the memory usage of the code, both in terms of code size and runtime memory usage; the amount of debugging information kept with the code; and the speed of the compilation process. An OPTIMIZE declaration consists of one or more lists, each containing one of the symbols SPEED, SAFETY, SPACE, DEBUG, and COMPILATION-SPEED, and a number from zero to three, inclusive. The number specifies the relative weighting the compiler should give to the corresponding quality, with 3 being the most important and 0 meaning not important at all. Thus, to make Common Lisp compile add more or less like a C compiler would, you can write it like this:

(defun add (x y)

  (declare (optimize (speed 3) (safety 0)))

  (declare (fixnum x y))

  (the fixnum (+ x y)))

Of course, now the Lisp version suffers from many of the same liabilities as the C version—if the arguments passed aren’t fixnums or if the addition overflows, the result will be mathematically incorrect or worse. Also, if someone calls add with the wrong number of arguments, it may not be pretty. Thus, you should use these kinds of declarations only after your program is working correctly. And you should add them only where profiling shows they’ll make a difference. If you’re getting reasonable performance without them, leave them out. But when profiling shows you a real hot spot in your code and you need to tune it up, go ahead. Because you can use declarations this way, it’s rarely necessary to rewrite code in C just for performance reasons; FFIs are used to access existing C code, but declarations are used when C-like performance is needed. Of course, how close you can get the performance of a given piece of Common Lisp code to C and C++ depends mostly on how much like C you’re willing to make it.
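
As an illustration of what such tuned code tends to look like in practice (my sketch, not an example from the book), here’s a function that sums a vector of double-floats with enough declarations that a good native compiler can keep the arithmetic unboxed:

(defun sum-doubles (vector)

  (declare (optimize (speed 3) (safety 0))

           (type (simple-array double-float (*)) vector))

  (let ((sum 0d0))

    (declare (double-float sum))

    (dotimes (i (length vector) sum)

      (incf sum (aref vector i)))))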

Another code-tuning tool built into Lisp is the function DISASSEMBLE. The exact behavior of this function is implementation dependent because it depends on how the implementation compiles code—whether to machine code, bytecodes, or some other form. But the basic idea is that it shows you the code generated by the compiler when it compiled a specific function.

Thus, you can use DISASSEMBLE to see whether your declarations are having any effect on the code generated. And if your Lisp implementation uses a native compiler and you know your platform’s assembly language, you can get a pretty good sense of what’s actually going on when you call one of your functions. For instance, you could use DISASSEMBLE to get a sense of the difference between the first version of add, with no declarations, and the final version. First, define and compile the original version.

(defun add (x y) (+ x y))

Then, at the REPL, call DISASSEMBLE with the name of the function. In Allegro, it shows the following assembly-language-like dump of the code generated by the compiler:

CL-USER> (disassemble 'add)

;; disassembly of #<Function ADD>

;; formals: X Y

;; code start: #x737496f4:

   0: 55         pushl  ebp

   1: 8b ec    movl     ebp,esp

   3: 56         pushl  esi

   4: 83 ec 24 subl     esp,$36

   7: 83 f9 02 cmpl     ecx,$2

  10: 74 02    jz       14

  12: cd 61    int      $97   ; SYS::TRAP-ARGERR

  14: 80 7f cb 00 cmpb  [edi-53],$0        ; SYS::C_INTERRUPT-PENDING

  18: 74 02    jz       22

  20: cd 64    int      $100  ; SYS::TRAP-SIGNAL-HIT

  22: 8b d8    movl     ebx,eax

  24: 0b da    or       ebx,edx

  26: f6 c3 03 testb    bl,$3

  29: 75 0e    jnz      45

  31: 8b d8    movl     ebx,eax

  33: 03 da    addl     ebx,edx

  35: 70 08    jo       45

  37: 8b c3    movl     eax,ebx

  39: f8         clc

  40: c9         leave

  41: 8b 75 fc movl     esi,[ebp-4]

  44: c3         ret

  45: 8b 5f 8f movl     ebx,[edi-113]    ; EXCL::+_2OP

  48: ff 57 27 call     *[edi+39]   ; SYS::TRAMP-TWO

  51: eb f3    jmp      40

  53: 90         nop

; No value

Clearly, there’s a bunch of stuff going on here. If you’re familiar with x86 assembly language, you can probably tell what. Now compile this version of add with all the declarations.

(defun add (x y)

  (declare (optimize (speed 3) (safety 0)))

  (declare (fixnum x y))

  (the fixnum (+ x y)))

Now disassemble add again, and see if the declarations had any effect.

CL-USER> (disassemble 'add)

;; disassembly of #<Function ADD>

;; formals: X Y

;; code start: #x7374dc34:

   0: 03 c2    addl     eax,edx

   2: f8         clc

   3: 8b 75 fc movl     esi,[ebp-4]

   6: c3         ret

   7: 90         nop

; No value

Looks like they did.

[4]Knuth has used the saying several times in publications, including in his 1974 ACM Turing Award paper, “Computer Programming as an Art,” and in his paper “Structured Programs with goto Statements.” In his paper “The Errors of TeX,” he attributes the saying to C.A.R. Hoare. And Hoare, in a 2004 e-mail to Hans Gerwitz of phobia.com, said he didn’t remember the origin of the saying but that he might have attributed it to Dijkstra.

[5]CL-PPCRE also takes advantage of another Common Lisp feature I haven’t discussed, compiler macros. A compiler macro is a special kind of macro that’s given a chance to optimize calls to a specific function by transforming calls to that function into more efficient code. CL-PPCRE defines compiler macros for its functions that take regular expression arguments. The compiler macros optimize calls to those functions in which the regular expression is a constant value by parsing the regular expression at compile time rather than leaving it to be done at runtime. Look up DEFINE-COMPILER-MACRO in your favorite Common Lisp reference for more information about compiler macros.
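
Here’s a minimal sketch of the technique with hypothetical names (this is not CL-PPCRE’s actual API): when a call’s argument is a compile-time constant, the compiler macro rewrites the call so the expensive setup runs once, at load time.

;; MAKE-REPEATER stands in for expensive setup such as parsing a regex.

(defun make-repeater (n)

  (lambda (string)

    (with-output-to-string (out)

      (dotimes (i n) (write-string string out)))))

(defun repeat-string (n string)

  (funcall (make-repeater n) string))

(define-compiler-macro repeat-string (&whole form n string)

  ;; Hoist the setup to load time when N is constant; otherwise decline.
  (if (constantp n)

      `(funcall (load-time-value (make-repeater ,n)) ,string)

      form))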

[6]The word premature in “premature optimization” can pretty much be defined as “before profiling.” Remember that even if you can speed up a piece of code to the point where it takes literally no time to run, you’ll still speed up your program only by whatever percentage of time it spent in that piece of code.

[7]Declarations can appear in most forms that introduce new variables, such as LET, LET*, and the DO family of looping macros. LOOP has its own syntax for declaring the types of loop variables. The special operator LOCALLY, mentioned in Chapter 20, does nothing but create a scope in which you can make declarations.
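
For instance (small sketches of my own, not examples from the book):

(let ((i 0))

  (declare (fixnum i))  ; declaration attached to the LET binding

  (incf i))

(loop for i of-type fixnum below 10 sum i)  ; LOOP's own type syntax

(locally (declare (optimize (speed 3) (safety 0)))

  (+ 1 2))  ; forms in this scope compile under these OPTIMIZE settings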
