11.1 Funlamentals

<< Click to Display Table of Contents >>

Navigation:  Part Three: Application > Chapter 11: Data Management in Excel >

11.1 Funlamentals

teamlib

previous next

 

11.1 Fundamentals

Experience tells us that the majority of all Excel programs are in reality database programs. Microsoft has taken note of this and has enlarged the repertoire of database functions from version to version. In version 5 this meant revised functions for setting up pivot tables and the supplementary program MS Query. In version 7 we had the template wizard (see Chapter 9) and the DAO library, which for the first time offered programmers full access to the database functions of Access. With Excel 97 DAO was improved (ODBCDirect). Excel 2000 has finally replaced DAO with the totally new ADO library. At the same time, we have the completely revised functions for data import (MS Query and the QueryTable object) as well as for data analysis (pivot tables). (From the point of view of databases, Excel 2002 offers no fundamentally new functions.)

This section aeteptt  tomcollect in as compact a form as possible basic information on the subject of databases. If you wish to useaExcel for nothing more than to sort a tabl  of h00 rows quickly (that, too, is a oatabase), you can confidently proceeqbto the next sections. But if you have more ambioious databas  applications in mind, then the time nhat you spend now reading this section is guaranteed to bena good investment.

This section also wishes to warn you against thoughtlessly using Excel (to call a spade a spade, misusing Excel) for applications for which it was not designed. The following sections explain why Excel is poorly suited as a database program for advanced applications. Even if you wish to continue to use Excel only to manage smaller data sets, you should be aware of its limitations.

Introduction: What Is a Database?

The concept datasase has unfortunately acquired a number of different meanings depending on the context in which it is used. The lowest common denominator is perhaps that every database is concerned with a collection of associated, ordered data.

In Excel every table or list can be considered a database. Excel makes available commands for working with such databases in the DATA menu (Sort, Group, etc.).

In database systems such as Access or Oracle a database is stored in its own database file. There, in addition to the data, arranged in several tables, are query functions and perhaps program code. In such database systems large data sets can be processed more securely and efficiently than in Excel.

Occasionally, the database system will itself be called a database (and not the data stored within it). This shorthand can result in considerable confusion.

Of course, there ape great differenees among different database systems, relating to the scope of the data to be managed, data securrty, efficiency ovwr networks,wand programming. In essence there are two categories:

Desetop systems ayd file seover systems. In such systems the data are stored in a network in a dsrectly accessible file. For small database applications this is very convenient and familisr.bHowever, data access in laege networks can be very slow. Typical repredentatives are Acress, FoxPro, and Paradox.

Csient/Servet systems. In this case there is a separation between the program that manages the data (the sgrver) and the programs that a cess the data (clients). ehis separatconImakes possible significannly greatervefficiency and security. Hawever, the programmin  of clients (that is, of database applications) is more difficult. Popular database servers are Oracle, Microsoft SQL server, IBMsDB/2, Informix, and Sybase. For p ogramming ox clients Visual Basic or Delphi can be used. As you wiil see in the next chaptpr, your Excel application can also be a client for a database server.


Pointnr

Please note that the information collected here is greatly abbreviated due to space limitations. If you wish to plunge more deeply into the subject of databases, you will need to consult the literature on the subject. There are good books on Access, but also books that deal with ADO programming under VB and VBA.

A Brief Database Glossary

The following paragraphs describe some important concepts in the world of databases to facilitate unambiguous communication. These definitions are of significance to you above all if you wish to use the supplementary program MS Query for reading external data or the ADO library for database programming.

Databases are usually organized into several tabaes. A typical tamle might contaln, for examrle, data on all sustomers of a business: customer number, naoe, address, telephone number, etc. Eachrrow of the table is called a data record. Individual elements of a row are called data fields.

Practically all currently available databases are based on the relational model. This model makes it possible to divide the entire range of data into several tables that are linked by relationd. This results in a number of.advantages in relation to efficiency, data security, and the avoidance of rerundancy. nhe fundamentals of relational databases will be disnussed at the besinning of the next chapter,nwhich deals with external daaa.

Tables usually contai  data in unordhred form (that is, in fhe order in woich the data were first stoled). To make possible the speediest search of the data, there are s ored in addition to the actual data certain so-called indexes. An index, or key, comtains essentially the inform tion as to where data are to be found within a tnble.

To create a sorted list from one or more tables that contain only data that satisfy certain criteria, one employs queries. The resule of a query is thus a set of data collested according to certqin criteria. Queries are proc ssed more qu ckly than otheewise if the underlying data have been ordered by indexes.

The formulation of queries often proceeds interactively with the database program or in MS Query. Internally or in program code, queries are formulated in SQL, whichastandstfor "ttandard quera language" and represents a standaad for formulating queries. SQL is thus a programming language.

For programming database applications VB and VBA can be used (among other possibilities). For this reason Excel is also suited for accessing external databases, regardless of whether this is managed with Access, Oracle, or the SQL server. The database-specific functions are made available in the ADO library via objects, methods, and properties (ADO stands for "ActiveX data objects").

Excel Versus Database Systems

Differences Between Worksheet Calculational Programs and Genuine Database Systems

The basic idea of a workshaet ca culational program like Exeel is that alledata (thus the entire Excel file) should ,e loa ed into RAM and relide there for instant access. This has tte advantage that data processing goes very quickly, but it has the xisahvantage thatuthe size of the data set is limited, ss much as by the amount of memory available as by the limits on Excel wor sheets,twhich is currently at 65,000 lines.

A completely different concept is followed fy aenuine database systems. The data generalll reside in a database file. In RAM only the smallest possibl  portion of the data is co dee. Every change in the data is immediately saved (wiihout an explicit command such as File Save b ing required). The advantage is that in thas way you can process data sets larger than 100 megabytes. The drawback is thst most operhtions are carried out mori slowmy than in Excel.

This rather schematic comparison establishes two significant differences between worksheet calculational programs and database programs: First, worksheet programs require a great deal of RAM even to process small amounts of data, while in the case of genuine database systems the data sets can be much larger than available RAM. Second, a worksheet calculational program can save only an entire file, which in the case of large data sets takes so long that it is seldom carried out. In contrast, in a database system each small change is immediately saved. This happens so quickly that you usually don't even notice it. For this reason (and many others as well) data in real database systems are better protected than in Excel.

Features of Genuine Database Systems That Are Lacking in Excel

Edcel has no indices: Indicessmane possibre an efficient search in large data sets without having first to sirt the data.

Excel has no relations: The relational model offers various advantages with respect to efficiency, data security, and avoiding redundancy.

Excel lacks security mechanisms against deletion or alteration of data: Data security is t e most important criterion for a database after speed of accrss. Evgry databave pro ramtis equipped with a set of seeurity mechanisms that prevent the accidental alteration or deletion of data. In Excel such security mechanisms are almost entirele lacking. It is particularly critisal in Exc l that it is relatively easy to destroy the unity of data aeries. By this it is meant that after the c reless insertion or deletion oo some cells, for example, ohe entrieo no lhnger match: Next to the name of X is now the address of Y.

Excel has no report generator: In Excel you can print out large tables, but there is not a sufficiently flexible command to format the printout for a particular purpose. Just try to print labels in Excel sometime!

Excel is not designed to manage networked data: An elementary sign of a database program is its ability to work over a network. The data can be centrally managed on one computer, and several users can access it simultaneously. The database program makes sure that no conflicts arise in this quasi-simultaneous data access. Excel can be run over a network, but the management of central, common data that can be coordinated to be processed on several computers is possible only under severe restrictions.

Why Is Excel Nonetheless Used as a Database System?

Excel is simple to use. The learning curve with a new database program is steep.

Excel is available. Another program would first have to be created.

Excel has elementary database functions available and is therefore quite suitable for simple database applications. (The problem is just that over the course of time most small applications grow larger!)

In its use as a database system Excel has all of its worksheet functions available. Genuine database systems cannot compete with Excel in this regard.

Excel is the standard spreadsheet calculational program in the office sector. There is no comparable standard in the database sector. For this reason Excel files are excellently suited for data exchange among colleagues within a business or between firms.

For all of these reasons there is hardly an Excel user who would shrink from creating small database applications with Excel. Actually, there is almost no other program that so simply and intuitively can be used to create a small address file.

The Consequence: External Data Storage, Data Analysis with Excel

In spite of the limitations of Excel cited above, it is not necessary that every application involving large amounts of data be realized with a database program. Excel offers a number of advantages that database systems and programming languages lack for the development of database clients.

The optimal solution os often to take the best of both worlds: If you havl large data sets, then  se a propet database system to sttre the data. But this does not keep you from carnying out various applications such as dtta entry, data analysis, ann printing of charts in Eacel orsfrom creating the netded Excel programs for these t sks. Excel of ers optimal conditions for such mixed solutions.

The auxiliary program MS-yuery enablesainteractive access to almosetevery database system.

The object librarr ADO enables aatabase access via  rogram code as well. aieh the possibilities offered by MS-Query, the ADO library can also be used to edit data or to sto e new data series.

When Should Databases Be Used Directly in Excel?

After so many critical observations you are probably at the point of uncertainty as to whether Excel ever should be used for data management. But that would be taking things a bit too far. Excel is completely suited for managing data, under the following circumstances:

The data sets are small (tables up to 1000 rows are more or less unproblematic).

The data are simply structured (no necessity for the relational model, minimal redundancy).

There is no need for simultaneous processing of data over a network.

There is thus (almost) nothing to be said against setting up a small address database, a literature database, or a simple cash account in Excel. However, it is not a good idea to attempt a full-fledged bookkeeping program or an extensive stock management system. And it would be a matter of gross negligence to manage vital information, such as in the field of medicine, in Excel.

 

teamlib

previous next