Chapter 24: Practical—Parsing Binary Files

Practical Common Lisp

by Peter Seibel

Apress © 2005



In this chapter I’ll show you how to build a library that you can use to write code for reading and writing binary files. You’ll use this library in Chapter 25 to write a parser for ID3 tags, the mechanism used to store metadata such as artist and album names in MP3 files. This library is also an example of how to use macros to extend the language with new constructs, turning it into a special-purpose language for solving a particular problem, in this case reading and writing binary data. Because you’ll develop the library a bit at a time, including several partial versions, it may seem you’re writing a lot of code. But when all is said and done, the whole library is fewer than 150 lines of code, and the longest macro is only 20 lines long.

Binary Files

At a sufficiently low level of abstraction, all files are “binary” in the sense that they just contain a bunch of numbers encoded in binary form. However, it’s customary to distinguish between text files, where all the numbers can be interpreted as characters representing human-readable text, and binary files, which contain data that, if interpreted as characters, yields nonprintable characters.[1]
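
To make the distinction concrete, here is a minimal sketch that classifies a sequence of bytes as text-like or not. It assumes plain ASCII, and the names TEXT-BYTE-P and LOOKS-LIKE-TEXT-P are hypothetical helpers invented for this illustration, not part of the library this chapter develops.

```lisp
;; Sketch only: assumes ASCII. TEXT-BYTE-P and LOOKS-LIKE-TEXT-P are
;; hypothetical names used for illustration.

(defun text-byte-p (byte)
  "True if BYTE is a printable ASCII code or a tab, newline, or carriage return."
  (or (<= 32 byte 126)                ; space through tilde
      (member byte '(9 10 13))))      ; tab, newline, carriage return

(defun looks-like-text-p (bytes)
  "True if every element of the sequence BYTES passes TEXT-BYTE-P."
  (every #'text-byte-p bytes))
```

For example, `(looks-like-text-p '(72 105 10))` is true, while a sequence containing a zero byte is not text-like by this test. Real text files in other encodings would need a more careful check.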

Binary file formats are usually designed to be both compact and efficient to parse; that’s their main advantage over text-based formats. To meet both those criteria, they’re usually composed of on-disk structures that are easily mapped to data structures that a program might use to represent the same data in memory.[2]

The library will give you an easy way to define the mapping between the on-disk structures defined by a binary file format and in-memory Lisp objects. Using the library, it should be easy to write a program that can read a binary file, translating it into Lisp objects that you can manipulate, and then write back out to another properly formatted binary file.
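
As a rough idea of what reading and writing binary data looks like at the lowest level, the following sketch writes a 16-bit unsigned integer to a file and reads it back using only the standard READ-BYTE and WRITE-BYTE functions. The helper names READ-U16-BE, WRITE-U16-BE, and ROUND-TRIP-U16, the big-endian byte order, and the file name u16-demo.bin are all assumptions made for this example; they aren't taken from the library itself.

```lisp
;; Sketch only: the function names, the big-endian byte order, and the
;; default file name are assumptions made for illustration.

(defun read-u16-be (in)
  "Read two bytes from the binary stream IN as a big-endian unsigned integer."
  (let ((high (read-byte in))
        (low  (read-byte in)))
    (+ (* high 256) low)))

(defun write-u16-be (out value)
  "Write VALUE to the binary stream OUT as two bytes, most significant first."
  (write-byte (ldb (byte 8 8) value) out)   ; bits 8 through 15
  (write-byte (ldb (byte 8 0) value) out)   ; bits 0 through 7
  value)

(defun round-trip-u16 (value &optional (path "u16-demo.bin"))
  "Write VALUE to the file PATH, then read it back and return what was read."
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
    (write-u16-be out value))
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (read-u16-be in)))
```

Both streams are opened with an element type of (unsigned-byte 8), so each READ-BYTE and WRITE-BYTE call moves exactly one octet. Calling `(round-trip-u16 #xCAFE)` should return the same number it was given.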

[1]In ASCII, the first 32 characters are nonprinting control characters originally used to control the behavior of a Teletype machine, causing it to do such things as sound the bell, back up one character, move to a new line, and move the carriage to the beginning of the line. Of these 32 control characters, only three, the newline, carriage return, and horizontal tab, are typically found in text files.

[2]Some binary file formats are in-memory data structures—on many operating systems it’s possible to map a file into memory, and low-level languages such as C can then treat the region of memory containing the contents of the file just like any other memory; data written to that area of memory is saved to the underlying file when it’s unmapped. However, these formats are platform-dependent since the in-memory representation of even such simple data types as integers depends on the hardware on which the program is running. Thus, any file format that’s intended to be portable must define a canonical representation for all the data types it uses that can be mapped to the actual in-memory data representation on a particular kind of machine or in a particular language.
