In this chapter we discuss how to access data stored in an external database system from within Excel. There are essentially two ways to accomplish this: One is to use the wizard provided for this purpose, which in turn calls the program MS Query. (This form of data importation can be controlled via the QueryTable object.) The alternative is to use the new ADO library, which provides a host of objects for reading and editing data.
We have already mentioned that Excel does not itself recognize relations. Nevertheless, it is important for you as an Excel programmer to understand the concept of a relational database: Both the supplementary program MS Query and the ADO library enable access to external databases, which almost always are built on the relational database model.
Relational databases are employed when the data are to be used in several tables and when the tables refer to one another. The essential motivation for using relational databases is to avoid redundancy.
Tip |
As a starting point for the explanations that follow we shall use the example database Northwind (file Nwind.mdb), which Microsoft includes with several of its products, for example with the Office suite, with Visual Basic, and with SQL Server. There are quite a few versions of this database. The file is included with the book's sample files. |
Note |
This section provides a brief introduction to the relational database model, without going into too much detail. The information provided here should be sufficient for you to understand how to extract data from a relational database. If you wish to construct your own databases, you will need to consult further literature on the subject. |
Let us suppose that you have a small business in which orders are entered manually into an order form and kept in this format. The form includes the following fields:
▪order data
▪name and address of the customer
▪name of the seller
▪list of ordered articles, consisting of article name, number of items, unit price, total price
▪total price of the order
▪additional information
Though this way of proceeding is easy to understand, it has several disadvantages:
▪If a customer places several orders, his name and address must be written anew each time. If his address changes, all current order forms for this customer must be retrieved and updated. Customer-specific data (such as special arrangements for regular customers) must be stored separately.
▪If an article occurs in several orders, its name and price must be written each time, although this is information that in any case is stored centrally (in a price list, say). There is a great danger of typographical errors.
▪If a large number of different articles are ordered, then there will be insufficient space on the order form, and several forms will have to be stapled together.
In changing over to a digital system the order form could, of course, be used with little alteration. However, this is not such a good idea, due to the drawbacks mentioned above. Much better would be to partition the data among several tables:
Table Customers: |
customer number, name, address |
Table Employees: |
employee number, name, etc. |
Table Products: |
article number, name, unit price, possible discounts |
Table Orders: |
order number, date, customer number, salesperson number |
Table Order Details: |
order number, article number, number of units |
The definition of individual tables for articles, customers, employees (sellers), and orders is probably immediately clear. It allows us to avoid the redundancy problem described above.
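The five tables listed above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module and an in-memory database, not the actual Northwind schema; all table and column names are simplified assumptions derived from the informal field lists in the text.

```python
import sqlite3

# Hypothetical, simplified column names based on the field lists in the text.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT
);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    unit_price  REAL NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    employee_id INTEGER NOT NULL REFERENCES employees(employee_id)
);
CREATE TABLE order_details (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
"""


def create_schema():
    """Create the five tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of all user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Note that customer and employee data appear only once each; an order merely stores the two numbers that point to them.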
Conceptually perhaps most confusing is the table Order Details: Here all the individual order items are stored. An immediate integration of these individual items (each order consists of several items: 3 pieces of X, 2 pieces of Y, 10 pieces of Z, say) in the orders table is not possible, since the number of items varies: If there were ten places in the orders table, then for most orders seven or eight items would be left empty (a waste of storage space). With other orders ten lines would be too few, and the order would have to be split (redundancy).
The chosen solution of individual tables therefore seems strange, since it is completely unsuitable for doing things "by hand." It is unthinkable to select from the endless list of items those items that correspond to order number 1234 placed on 5 June 1997. A solution optimized for human capacities would be to store within the order itself at least one reference to the order entries to minimize the task of searching. In a database program this is unnecessary, since the data in Order Details can be found rapidly. (Naturally, it is assumed that all access to linked tables is by way of indices. For Order Details the combination of order number and article number serves as the primary index.)
Remarks |
Quite often, when a database is being created, one attempts to give the same name to fields of different tables that later will be linked by a relation. This contributes to clarity, but is not a requirement. There are different types of relations, and these differ in their effect and use fundamentally from one another (inner join, outer join, with and without referential integrity). It is beyond the scope of this book to describe these types in any detail. |
There exist three basic relations between two tables:
1:1 |
One-to-one relation between two tables: Each data item in one table corresponds to exactly one data item in the other table. Such relations are rare, because the information of the two tables could as easily be stored in a single table. |
1:n |
A data item in the first table can occur in several data items in the second table (for example, one seller appears in several orders). There cannot exist multiplicity in the other direction, since an order cannot be executed by more than one salesperson (at least not in this example). Occasionally, one speaks of an n:1 relation that is actually the same as a 1:n relation (the point of view has merely been shifted). |
n:m |
A data item in one table can appear in several data items in the other table, and conversely (for example, several different articles can occur in one order, while one article can occur in several different orders; another example is that of books and their authors). |
In a database the 1:n relations between tables are created with identification numbers. Each salesperson possesses a unique employee ID number in the employees table. (This number is usually called a primary key.) In an order the salesperson is referred to by this number. The field in the orders table is called a foreign key, because it refers to an ID in a different table.
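The interplay of primary key and foreign key in a 1:n relation can be tried out with a small sketch. It uses Python's sqlite3 module; the table layout, the names, and the sample rows are invented for illustration. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is a peculiarity of that engine and not of the relational model.

```python
import sqlite3


def build_db():
    """One salesperson (primary key 1) with two orders that refer to it."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this setting is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY,   -- primary key
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            employee_id INTEGER NOT NULL
                        REFERENCES employees(employee_id)  -- foreign key
        );
        INSERT INTO employees VALUES (1, 'Salesperson A');
        INSERT INTO orders VALUES (100, 1), (101, 1);
    """)
    return conn


def orders_of(conn, employee_id):
    """Return all order numbers handled by one salesperson (the 'n' side)."""
    rows = conn.execute(
        "SELECT order_id FROM orders WHERE employee_id = ? ORDER BY order_id",
        (employee_id,),
    )
    return [row[0] for row in rows]


def try_insert_order(conn, order_id, employee_id):
    """Insert an order; return False if the foreign key points nowhere."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, employee_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

An order that names a nonexistent employee number is rejected, which is what is meant by referential integrity.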
For n:m relations a separate, additional table is necessary, with which the n:m relation is reduced to two 1:n relations. In the following example there exists a single n:m relation between orders and products. The order details table serves as the additional table. The primary key of this table is composed of the order and article number (this combination is unique; in a given order a product cannot occur twice). Figure 12-1 clarifies the relations among the tables.
Figure 12-1: Relations for managing the order data
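The reduction of an n:m relation to two 1:n relations can be sketched as follows, again with Python's sqlite3 module. The sample orders and products (X and Y) are made up; only the structure, an intermediate table with a composite primary key, follows the description above.

```python
import sqlite3


def build_db():
    """Two orders, two products, linked through an intermediate table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders   (order_id   INTEGER PRIMARY KEY);
        CREATE TABLE products (product_id INTEGER PRIMARY KEY,
                               name       TEXT NOT NULL);
        CREATE TABLE order_details (
            order_id   INTEGER NOT NULL REFERENCES orders(order_id),
            product_id INTEGER NOT NULL REFERENCES products(product_id),
            quantity   INTEGER NOT NULL,
            PRIMARY KEY (order_id, product_id)  -- composite primary key
        );
        INSERT INTO orders VALUES (1), (2);
        INSERT INTO products VALUES (10, 'X'), (20, 'Y');
        INSERT INTO order_details VALUES (1, 10, 3), (1, 20, 2), (2, 10, 5);
    """)
    return conn


def products_in_order(conn, order_id):
    """One order contains several products."""
    rows = conn.execute(
        "SELECT product_id FROM order_details "
        "WHERE order_id = ? ORDER BY product_id",
        (order_id,),
    )
    return [row[0] for row in rows]


def orders_with_product(conn, product_id):
    """One product occurs in several orders."""
    rows = conn.execute(
        "SELECT order_id FROM order_details "
        "WHERE product_id = ? ORDER BY order_id",
        (product_id,),
    )
    return [row[0] for row in rows]


def add_item(conn, order_id, product_id, quantity):
    """Add an item; return False if the product is already in the order."""
    try:
        conn.execute(
            "INSERT INTO order_details VALUES (?, ?, ?)",
            (order_id, product_id, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the pair of order number and product number forms the primary key, a second entry for the same product in the same order is refused.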
Figure 12-2 shows, by means of an example, how the data of an order are related: On 7 August 1996, order 10251 was executed. The customer is stored in the Orders table with the ID VICTE. The Customers table reveals that the customer is, in fact, Victuailles en stock.
Figure 12-2: Data for an order are divided among four tables
What products (and how many of each) has this firm ordered? For this we must search in the table Order Details for the ordered items with order number 10251. There we find three items: six of product 22, fifteen of product 57, and twenty of product 65. And what might these products be? This information is contained in the table Products: Product 22 is Gustaf's Knäckebröd.
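The manual lookup just described corresponds to a join across the four tables. The sketch below uses Python's sqlite3 module and reproduces only the values mentioned in the text (order 10251, customer VICTE, the three quantities, and the name of product 22). The column names are simplified assumptions, and the names of products 57 and 65 are placeholders because the text does not give them.

```python
import sqlite3


def build_db():
    """Load the few rows of order 10251 that are quoted in the text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id  TEXT PRIMARY KEY,
                                company_name TEXT NOT NULL);
        CREATE TABLE products  (product_id   INTEGER PRIMARY KEY,
                                product_name TEXT NOT NULL);
        CREATE TABLE orders    (order_id     INTEGER PRIMARY KEY,
                                customer_id  TEXT NOT NULL
                                             REFERENCES customers(customer_id));
        CREATE TABLE order_details (
            order_id   INTEGER NOT NULL REFERENCES orders(order_id),
            product_id INTEGER NOT NULL REFERENCES products(product_id),
            quantity   INTEGER NOT NULL,
            PRIMARY KEY (order_id, product_id)
        );
    """)
    conn.execute("INSERT INTO customers VALUES (?, ?)",
                 ("VICTE", "Victuailles en stock"))
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [
            (22, "Gustaf's Knäckebröd"),
            (57, "(name of product 57)"),  # placeholder, not given in the text
            (65, "(name of product 65)"),  # placeholder, not given in the text
        ],
    )
    conn.execute("INSERT INTO orders VALUES (?, ?)", (10251, "VICTE"))
    conn.executemany(
        "INSERT INTO order_details VALUES (?, ?, ?)",
        [(10251, 22, 6), (10251, 57, 15), (10251, 65, 20)],
    )
    return conn


def order_lines(conn, order_id):
    """Resolve customer ID and product numbers into readable names."""
    return conn.execute(
        """
        SELECT c.company_name, p.product_id, p.product_name, d.quantity
        FROM orders AS o
        JOIN customers     AS c ON c.customer_id = o.customer_id
        JOIN order_details AS d ON d.order_id    = o.order_id
        JOIN products      AS p ON p.product_id  = d.product_id
        WHERE o.order_id = ?
        ORDER BY p.product_id
        """,
        (order_id,),
    ).fetchall()
```

Calling `order_lines(build_db(), 10251)` returns one row per ordered item, each carrying the customer name, the product number and name, and the quantity.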
It may be that this division of data among several tables appears overly elaborate. But in fact, it offers considerable advantages:
▪The most obvious advantage is the saving of space: In a real-life application the table Order Details would be by far the largest, which for a medium-size business would have about one hundred thousand entries. But for each line only four numerical quantities need to be stored: order ID, product number, number of items, and unit price. Without the relational model you would have to store for each order the product name, name of the salesperson, customer name, and so on. The storage requirement would multiply, without yielding any advantage. A large portion of the data would be merely redundant.
▪The relational model helps to avoid errors: If the product name has to be written out for each item each time it is ordered, then it is only a matter of time before typos begin to infiltrate the database.
▪The relational model makes possible central editing of data: When the address of a customer changes, only the corresponding entry in the Customers table needs to be updated. Without this relational linkage of data you would have to do a global search, which, experience tells us, is fraught with error. (You have certainly experienced this problem yourself: You inform a firm of your new address, and nonetheless many shipments are wrongly addressed. The reason? Your address is stored by the firm in several places. One department has received your notification of change of address, but two other departments continue to use the old address.)
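The last point, central editing, can be illustrated with a minimal sketch in Python's sqlite3 module. The customer, the addresses, and the column names are invented for the example. The address is stored once in the customers table, so a single update is reflected in every order that refers to this customer.

```python
import sqlite3


def build_db():
    """One customer with two orders; the address exists in one place only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            address     TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
        );
        INSERT INTO customers VALUES (1, 'Customer A', 'Old Street 1');
        INSERT INTO orders VALUES (1, 1), (2, 1);
    """)
    return conn


def change_address(conn, customer_id, new_address):
    """Update the single stored address; return the number of rows changed."""
    cursor = conn.execute(
        "UPDATE customers SET address = ? WHERE customer_id = ?",
        (new_address, customer_id),
    )
    return cursor.rowcount


def shipping_addresses(conn):
    """Look up the current address for every order via the customer ID."""
    return conn.execute(
        """
        SELECT o.order_id, c.address
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()
```

Exactly one row is changed, yet both orders subsequently show the new address.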
You need to be concerned with the organization of information in the various tables as described above only when you create queries with SQL commands. (SQL stands for Structured Query Language and is a type of programming language for manipulation of databases.) Often, instead of having to formulate queries in SQL code you can use one of a number of convenient tools, such as MS Query, described in the next section. The database program Access possesses a so-called query generator, with which you can easily define queries.
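To give a rough impression of what a query in SQL code looks like, the following sketch computes the total price of each order from its individual items, a value that the paper order form had to record by hand. It is run from Python's sqlite3 module; the table layout and all numbers are invented, and prices are kept in cents as integers to avoid rounding issues.

```python
import sqlite3


def build_db():
    """A small order details table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE order_details (
            order_id         INTEGER NOT NULL,
            product_id       INTEGER NOT NULL,
            quantity         INTEGER NOT NULL,
            unit_price_cents INTEGER NOT NULL,
            PRIMARY KEY (order_id, product_id)
        );
        INSERT INTO order_details VALUES
            (1, 10, 3, 250),
            (1, 20, 2, 400),
            (2, 10, 5, 250);
    """)
    return conn


def order_totals(conn):
    """Sum quantity times unit price over all items of each order."""
    return conn.execute(
        """
        SELECT order_id, SUM(quantity * unit_price_cents) AS total_cents
        FROM order_details
        GROUP BY order_id
        ORDER BY order_id
        """
    ).fetchall()
```

The total is not stored anywhere; it is derived from the order items whenever it is needed.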
The imaginary firm Northwind provides gastronomical specialties to customers all over the world. Figure 12-1 shows only a portion of the tables in the Northwind database. The complete database schema is somewhat more complex and can be seen in Figure 12-3. First we give some information about the construction of the database:
Figure 12-3: Tables and relations of the Northwind example database
In Products is stored information about the origin of each product. Category and supplier data are stored in two additional tables, in order to avoid redundancy. The table Orders contains data on each order. In three 1:n relations reference is made to the Customers table, the Shippers table, and the Employees table. So that arbitrarily many articles can be included in an order, an n:m relation between Orders and Products is established via the intermediate table Order Details.
The database contains about eighty products in eight categories from thirty suppliers. About eight hundred orders from ninety customers are stored. There are three shipping firms and nine employees.
The table Order Details contains, among other things, the data field unitprice. This field appears to contradict the rule for the construction of a relational database in that it is redundant (the unit price can be obtained from the product ID in the associated table Products). A possible reason for the unit price being stored a second time is to make it easier to deal with price changes: When the price of a product is changed, this change does not affect the record of orders already placed in Order Details.
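The effect of storing the unit price a second time can be sketched as follows, using Python's sqlite3 module. The product, the prices (in cents), and the helper names are invented for illustration. When an order is placed, the current price is copied into the order details, and a later price change leaves that copy untouched.

```python
import sqlite3


def build_db():
    """One hypothetical product priced at 1000 cents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            unit_price INTEGER NOT NULL      -- current price in cents
        );
        CREATE TABLE order_details (
            order_id   INTEGER NOT NULL,
            product_id INTEGER NOT NULL REFERENCES products(product_id),
            quantity   INTEGER NOT NULL,
            unit_price INTEGER NOT NULL,     -- price at the time of the order
            PRIMARY KEY (order_id, product_id)
        );
        INSERT INTO products VALUES (1, 'Product A', 1000);
    """)
    return conn


def place_order(conn, order_id, product_id, quantity):
    """Record an order item, copying the price that is valid right now."""
    conn.execute(
        """
        INSERT INTO order_details (order_id, product_id, quantity, unit_price)
        SELECT ?, product_id, ?, unit_price
        FROM products
        WHERE product_id = ?
        """,
        (order_id, quantity, product_id),
    )


def change_price(conn, product_id, new_price):
    """Change the current price in the products table only."""
    conn.execute(
        "UPDATE products SET unit_price = ? WHERE product_id = ?",
        (new_price, product_id),
    )


def order_price(conn, order_id, product_id):
    """Return the unit price that was recorded with the order item."""
    row = conn.execute(
        "SELECT unit_price FROM order_details "
        "WHERE order_id = ? AND product_id = ?",
        (order_id, product_id),
    ).fetchone()
    return row[0]
```

An order placed before the price change keeps the old price, while an order placed afterward records the new one.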