Excel's Data Structures
Excel's data-handling features fall into two distinct groups. Most worksheet functions are designed to operate on individual items of data (usually stored in single cells), whereas features such as pivot tables, filtering and so on operate on large sets of data, usually arranged in tables. There are comparatively few worksheet functions, such as VLOOKUP, MATCH and the Dxxx functions, that fill the gap between the two paradigms, operating on tables of data but returning single-value results. The way in which we arrange our data on the sheet can have a significant impact on the ease with which Excel's features can be used.
Most workbooks that we see are organized in what can only be described as a haphazard manner. They often try to combine data entry, analysis and reporting within the same area of the worksheet and are therefore a compromise between format and function. To design the best user interfaces, we have to organize the sheet to appeal to the user (such as including blank rows and/or columns around the data), ignoring the arrangement required by Excel's features (such as having to be in a single table). Conversely, to make the most efficient use of many of Excel's features, we have to organize our data in specific ways, which will probably not be the nicest to look at (such as having to leave lots of white space around pivot tables to allow for their changing shape, or include artificial column and row labels).
Unstructured Ranges
Unstructured ranges are usually encountered in the parts of the workbook designed for data entry. The spatial arrangement of the data will probably have some meaning to the user, with labels and formatting used to identify the data to be typed into each cell. When data is arranged in this unstructured manner, we can only use worksheet functions for our analysis. We cannot directly create pivot tables or charts from this data, nor consolidate, filter or sort the items. In practice, we probably wouldn't want to operate on this data as a whole anyway. They're likely to be single unrelated items of data, where the lack of a structure is not a problem. Ideally, each data-entry cell should be given an unambiguous name, so we can tell at a glance where it's used by other functions.
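As a minimal sketch of naming data-entry cells, the following assumes a hypothetical layout in which cells C4 and C5 of Sheet1 are data-entry cells. The cell addresses and the names InterestRate and LoanAmount are illustrative only and do not come from the example workbook.

```vba
Sub NameDataEntryCells()
    ' Hypothetical layout: C4 and C5 on Sheet1 are data-entry cells.
    ' Assigning to Range.Name creates a defined name for each cell.
    Sheet1.Range("C4").Name = "InterestRate"
    Sheet1.Range("C5").Name = "LoanAmount"

    ' Code can now refer to the cells by name instead of by address.
    Sheet1.Range("InterestRate").Value = 0.05
    Sheet1.Range("LoanAmount").Value = 10000
End Sub
```

Worksheet formulas can then use the same names, for example =LoanAmount*InterestRate, which makes it easier to see where each input is used.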
The main problem with an unstructured arrangement of data is that every cell has to be treated individually, both by the user and through code, making it hard to copy and paste or import and export the data. The inability to import/export unstructured ranges can be overcome in Excel 2003 Professional by using XML to apply some structure to the cells, as we demonstrate in Chapter 23 Excel, XML and Web Services.
Structured Ranges
Most of the features in Excel that are designed to operate on or with large sets of data require the data to be organized in a tabular arrangement, usually with a header row containing unique labels which Excel can use to identify each column. The most notable exceptions to this are the LOOKUP() function and array formulas (see later), which both work better without including a header row. The Data > Consolidate feature works best with an even stricter structure, where the contents of the first column in the data range can be used to identify each row, as you'll see later.
The easiest way for us to set up our data to be most useful to Excel, then, is to put it in a worksheet as a single table, with a header row and consistent data in each column, such as the list of customers shown in Figure 14-1. This data is from the sample NorthWind Access database supplied with Office, usually found at C:\Program Files\Microsoft Office\Office\Samples\Northwind.mdb.
Figure 14-1. A Structured Range of Data

Using the techniques shown in Chapter 13 Programming with Databases to retrieve data from a database, we can easily create a structured range by populating the sheet from an ADO recordset. Typical code to do that is shown in Listing 14-1, where rsData is an object variable which refers to an ADO recordset.
Listing 14-1. Creating a Structured Range from an ADO Recordset
If Not rsData.EOF Then
    ' Clear the destination worksheet.
    Sheet1.UsedRange.Clear
    ' Add the column headers.
    For lField = 0 To rsData.Fields.Count - 1
        Sheet1.Cells(1, lField + 1).Value = _
            rsData.Fields(lField).Name
    Next lField
    ' Make the column headers bold, for clarity
    Sheet1.Rows(1).Font.Bold = True
    ' Copy the data from the recordset
    Sheet1.Range("A2").CopyFromRecordset rsData
    ' Give the retrieved data range a name for later use
    Sheet1.Range("A1").CurrentRegion.Name = "Sheet1!MyData"
Else
    MsgBox "No data located.", vbCritical, "Error!"
End If
Excel 2003's Lists
Working with a list of data is such a common use of Excel that Microsoft added the List feature in Excel 2003 to ease many of the tasks associated with lists, such as sorting, filtering and adding and removing rows. A range can be converted to a List using the Data > List > Create List menu item. Figure 14-2 shows the same table of customers converted to a List (with rows 8 to 90 hidden to save space). Notice the thick (blue) border, the automatic appearance of the autofilter drop downs in the top row and the New Data row in row 93. The List can also be set to automatically show a total row, using the same totaling options that are provided by the SUBTOTAL() function. Showing the total row only makes sense if the list contains numeric data, as the only option for textual data is to count the rows. It would have been more helpful to have a "count distinct" option, but perhaps that will be added in a future version of Excel.
Figure 14-2. An Excel 2003 List Range

The biggest benefit of using Lists is that any references to an entire column of the list are automatically updated as data is added or deleted, so we no longer need to worry about whether functions, charts or defined names are referring to the full set of data.
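The conversion to a List can also be done from code. The following is a sketch only, assuming the customer table starts at cell A1 of Sheet1, has a header row and has not already been converted to a List. The list name Customers is hypothetical.

```vba
Sub ConvertRangeToList()
    Dim loCustomers As ListObject

    ' Assumption: the table starts at A1 on Sheet1 and has a header row.
    Set loCustomers = Sheet1.ListObjects.Add( _
        SourceType:=xlSrcRange, _
        Source:=Sheet1.Range("A1").CurrentRegion, _
        XlListObjectHasHeaders:=xlYes)

    ' Hypothetical name, used here only for illustration.
    loCustomers.Name = "Customers"

    ' Show the total row described in the text.
    loCustomers.ShowTotals = True

    ' DataBodyRange reflects the current extent of the list's data,
    ' so code does not need to track the last row itself.
    Debug.Print loCustomers.DataBodyRange.Rows.Count
End Sub
```

Reading the extent from the List object each time, rather than storing a fixed address, is what keeps code in step with rows that are added or deleted later.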
The List object also provides some rudimentary consistency checking, such as ensuring the data in a row stays in sync, but is mainly used by Excel under the covers to handle interaction with SharePoint and to enable the import and export of XML. Unfortunately, SharePoint interaction is beyond the scope of this book, but using Lists for XML import/export is covered in Chapter 23 Excel, XML and Web Services.
Query Tables
Whenever we use one of the Data > Import External Data menu items to import a text file, a table from a Web page or a database query, the result is a query table. This is just a defined area of the worksheet that encompasses the retrieved data and (optionally) stores the connection information used to obtain the data. If the connection information is stored, the query table can be configured to refresh the data when the file is opened or at regular intervals. We can also tell the query table how to handle different amounts of data, and whether to copy/delete any formulas in adjacent columns.
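The refresh options described above are also exposed as properties of the QueryTable object. The following sketch assumes the active sheet is a worksheet that already contains at least one query table. The 30-minute interval is an arbitrary illustrative value.

```vba
Sub ConfigureQueryTableRefresh()
    Dim qtData As QueryTable

    ' Assumption: the active sheet is a worksheet holding a query table.
    If ActiveSheet.QueryTables.Count = 0 Then Exit Sub
    Set qtData = ActiveSheet.QueryTables(1)

    With qtData
        ' Refresh the data whenever the workbook is opened.
        .RefreshOnFileOpen = True
        ' Also refresh every 30 minutes (0 disables timed refreshes).
        .RefreshPeriod = 30
        ' Insert or delete cells to fit the number of rows returned.
        .RefreshStyle = xlInsertDeleteCells
        ' Copy formulas in adjacent columns down alongside new rows.
        .FillAdjacentFormulas = True
    End With
End Sub
```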
For anything other than the most basic of database queries, Excel uses the MSQuery application to provide an interface for creating the SQL SELECT statement. If you've used a UI for creating SQL statements before (such as MS Access), the MSQuery interface is easy to understand. Figure 14-3 shows the MSQuery screen, with a query that retrieves some example data from the NorthWind OrderDetails and associated tables.
Figure 14-3. The MSQuery UI for Creating SQL Select Statements

The biggest problem with creating query tables is that the SQL produced by MSQuery is of poor quality and includes the full path to the database file being queried. This makes it almost impossible to create a worksheet using a query table to retrieve data from an Access database and expect it to work when installed at a client site. To create a robust solution, we always have to include some VBA code to set the query table's Connection and SQL properties. For example, we would rarely be able to use the built-in ability to refresh the query when the file was opened, because it would fail if the database was moved. Instead, we can use code similar to that shown in Listing 14-2, which sets the database location to the same directory as the workbook and updates the query table's properties before doing the refresh. Note that for this example to work correctly, you will need to copy the NorthWind database to the folder containing your workbook. In practice, we would prompt the user to select the database location the first time the workbook was opened and store that choice in the registry for subsequent use.
Listing 14-2. Refreshing a Query Table When Opening a Workbook
Private Sub Workbook_Open()
    Dim sDatabase As String
    Dim sConnect As String
    Dim sSQL As String
    'Where is the database to connect to?
    'This is the usual location of the Northwind database.
    'In practice, this should be a user-configurable option,
    'probably read from the registry.
    sDatabase = Application.Path & "\Samples\Northwind.mdb"
    If Len(Dir(sDatabase)) > 0 Then
        'Create the connection string using ADO
        sConnect = "OLEDB;Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=" & sDatabase & ";"
        'Create a tidy SQL statement, without the file paths
        sSQL = "SELECT O.OrderID, O.OrderDate, CUS.CustomerID, " & _
            " CUS.CompanyName, CUS.Country, CUS.City, " & _
            " CAT.CategoryName, P.ProductName, " & _
            " OD.Quantity, OD.UnitPrice, OD.Discount " & _
            " FROM Categories CAT, Customers CUS, " & _
            " `Order Details` OD, Orders O, Products P " & _
            " WHERE CUS.CustomerID = O.CustomerID And " & _
            " OD.OrderID = O.OrderID And " & _
            " P.ProductID = OD.ProductID And " & _
            " CAT.CategoryID = P.CategoryID"
        'Update and refresh the query table
        With wksData.QueryTables(1)
            .Connection = sConnect
            .CommandText = sSQL
            .Refresh
        End With
    End If
End Sub
As well as removing the hard-coded paths to the database file, handling the refresh through VBA also provides the ability to include parameters in the query, such as only retrieving the data for a specific country where the country name could be obtained from worksheet cells.
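A parameterized refresh of this kind might look like the following sketch. It assumes the query table on wksData already has a valid connection, and that the country name is typed into a cell with the hypothetical defined name Country on that sheet. The shortened column list is for illustration only.

```vba
Sub RefreshForCountry()
    Dim sCountry As String
    Dim sSQL As String

    ' Assumption: a cell named "Country" on wksData holds the country
    ' to filter by (the name is hypothetical).
    sCountry = CStr(wksData.Range("Country").Value)

    ' Double up any apostrophes so the value cannot break the SQL string.
    sCountry = Replace(sCountry, "'", "''")

    ' Build the SELECT statement with the country as a criterion.
    sSQL = "SELECT CUS.CustomerID, CUS.CompanyName, CUS.City " & _
           " FROM Customers CUS " & _
           " WHERE CUS.Country = '" & sCountry & "'"

    ' Update the query table and refresh it synchronously.
    With wksData.QueryTables(1)
        .CommandText = sSQL
        .Refresh BackgroundQuery:=False
    End With
End Sub
```

Calling this routine from the Worksheet_Change event of the sheet containing the Country cell would be one way to refresh the data as soon as the user picks a different country.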
When creating a query table using Excel's UI, the result is a table that uses ODBC to connect to the database, rather than the ADO connections that we covered in Chapter 13 Programming with Databases. We can easily switch to using an ADO connection if we prefer, by adding the OLEDB; prefix to the ADO connection string.
Even though we end up with very similar code to connect to the database and run the query, using query tables is preferable to populating the worksheet from an ADO recordset, as the query table automatically handles whether to insert new rows for extra data and whether to copy any formulas from adjacent columns.
Query tables are very useful features, but are limited in the amount of data they can efficiently handle. As the end result of a query table is a worksheet containing the data, we are limited by Excel's ability to display the data in a worksheet (that is, a maximum of 65,535 rows) and have to devote a significant amount of resources (both display resource and drawing time) to show the data on the worksheet.
