Data Processing Features
After we've arranged our data and/or calculations in a tabular form, we can use Excel's data processing features to manipulate it. Obviously, the actual manipulation that needs to be done will depend totally on the problem being solved, so this section describes some of the techniques available and how they can be linked together. It's up to the reader to decide which techniques are most appropriate for their situation!
It Doesn't Have to Be Data
Even though we've only considered data so far, Excel is not a database; it's designed to manipulate numbers. We are not forced to only include raw data in our structured ranges. Some of the most powerful number crunching comes from organizing our formulas in a structured, tabular form, then using Excel's data processing features on the results of those calculations. The only caveat with using formulas in our data tables is that the data processing (consolidation, pivot table, filter and so on) will not be updated or refreshed when the source formulas are recalculated. We can easily work around this by including some VBA code to trigger the processing from within the Worksheet_Calculate event (which conveniently occurs after the sheet has been calculated).
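As a minimal sketch of that workaround, the following handler could sit in the code module of the worksheet that contains the formulas. The sheet name "PivotTable" is an assumption made for illustration. Events are switched off during the refresh in case the refresh itself triggers another recalculation.

```vba
'Assumed to live in the code module of the worksheet holding the formulas.
'The sheet name "PivotTable" is a placeholder for this sketch.
Private Sub Worksheet_Calculate()
    Dim ptTable As PivotTable

    On Error GoTo ErrHandler

    'Avoid re-entering this handler if the refresh triggers a recalc
    Application.EnableEvents = False

    'Refresh every pivot table on the (assumed) results sheet
    For Each ptTable In ThisWorkbook.Worksheets("PivotTable").PivotTables
        ptTable.RefreshTable
    Next ptTable

ErrHandler:
    'Always switch events back on, even if the refresh failed
    Application.EnableEvents = True
End Sub
```

The same pattern should apply to the other features in this section: replace the loop with whatever call re-runs the consolidation or filter.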
Pivot Caches
When Microsoft introduced pivot tables in Excel 95, they realized that it would be much more efficient to store the source data for the pivot tables in a hidden form within the workbook file than within worksheet cells. The hidden data stores are called pivot caches and are just like query tables, but without the visual representation and can only be used by pivot tables. They suffer from the same problem of including hard-coded paths to the database within their connection and SQL query information, and the same solution of using VBA to define the connection string and query text applies.
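A minimal sketch of that approach follows. It assumes the first pivot cache in the workbook was created from an ODBC query, that the Access database is named NorthWind.mdb and sits in the same folder as the workbook, and that a DSN called "MS Access Database" exists on the machine. All three are illustrative assumptions, not details taken from the example workbook.

```vba
'Sketch only: the database name, its location and the DSN are assumptions.
Public Sub UpdatePivotCacheConnection()
    Dim pcCache As PivotCache
    Dim sDatabase As String

    'Work out where the database is at run time instead of hard-coding it
    sDatabase = ThisWorkbook.Path & "\NorthWind.mdb"

    Set pcCache = ThisWorkbook.PivotCaches(1)

    'Rebuild the connection string using the current location
    pcCache.Connection = _
        "ODBC;DSN=MS Access Database;DBQ=" & sDatabase & ";"

    'Re-run the query against the new location
    pcCache.Refresh
End Sub
```

If the SQL text itself contains a path, it can be rewritten in the same way through the cache's CommandText property before the refresh.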
Pivot Tables
Pivot tables are Excel's premier data processing feature. Using either a table in a worksheet as the data source or a pivot cache for direct connection to a database, pivot tables enable us to filter, group, sort, total, count and drill down into our data. Most books about using Excel include a section explaining how to set up and manipulate pivot tables, so we're assuming you already know the basics. With pivot tables, there is only one level of difficulty, so once the basics are understood, the skill is in knowing how to most efficiently use pivot tables to analyze our data and integrate them with the rest of our data processing.
The best way to use pivot tables for data processing in most applications is to create all the pivot tables beforehand, on individual worksheets. If the pivot tables are set up to connect directly to a database, it is usually more efficient to have a single, large pivot cache that feeds multiple pivot tables, than having separate queries for each pivot table. The easiest way to do this is to create the first pivot table normally, then base subsequent pivot tables on the first (in step one of the Pivot Table Wizard). By using the same pivot cache, Excel will only need to store one copy of the data within the workbook and all the pivot tables will be updated when the cache is refreshed. Although it is possible to use the Excel object model to create and modify pivot tables, this should be kept to a minimum, because Excel refreshes and redraws the table with every modification. With anything other than a trivial amount of data, this quickly becomes extremely slow.
In the NWindOrders.xls example workbook, found on the CD in the \Concepts\Ch14Data Manipulation Techniques folder, the OrderData worksheet contains a query table which retrieves information about each order from the NorthWind sample Access database, shown in Figure 14-4.
Figure 14-4. The Query Table for NorthWind Order Details

As well as retrieving the specific data for the order information, we've included extra information such as the company name, country and product category. Adding this extra information will usually have negligible impact on the query execution time or extra data storage requirements, but enables us to perform more diverse analysis using the same raw data. For example, Figure 14-5 shows the PivotTable worksheet from the example workbook, which includes both a breakdown of order quantities by country and product category and a list of our UK customers.
Figure 14-5. Two Diverse PivotTables, Derived from the Same Pivot Cache

Unfortunately, this technique is limited by the lack of a "distinct count" function to total the data. The "count" function gives the number of records, which in our case is the total number of order detail line items. If we had a "distinct count" function, we would be able to identify the number of orders placed by each customer (by counting the number of distinct order IDs) or the number of customers in each country (by counting the number of distinct customer IDs).
Calculated Pivot Fields
Excel enables us to add extra fields and data to our pivot caches, in the form of calculated fields and calculated items. A calculated field is an extra column, derived from one or more other fields, such as defining a calculated Profit field as Revenue - Cost, where Revenue and Cost are fields in the data set. These are of very limited use, because Excel always does the Sum of the individual fields before performing the calculation, so we get the following:
Sum of Profit = Sum of Revenue - Sum of Cost
This is okay and marginally useful for the simple cases, but is useless and dangerous if a more complex formula is required. Looking at the NorthWind data in Figure 14-4, we have fields for the Quantity, UnitPrice and Discount, so we might be tempted to add a calculated Revenue field as Quantity x (UnitPrice - Discount). Unfortunately, as Excel sums the individual fields before doing the calculation, we end up multiplying the total quantity sold by the sum of all the prices minus the sum of all the discounts! Unless that is what you really require, it is far better and much safer to add the additional fields at the raw data level, either by including calculated fields in the SQL query, or by adding extra columns alongside the query table, as shown in Figure 14-6.
Figure 14-6. Adding a Calculated Field Alongside a Query Table
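To see how far apart the two results can be, the following sketch compares revenue calculated per record with revenue calculated the way a calculated pivot field evaluates it. The function names and the two sample records are invented for illustration and are not taken from the NorthWind data.

```vba
'Revenue calculated per record and then totalled (the correct result)
Public Function RowLevelRevenue(vQty As Variant, vPrice As Variant, _
                                vDiscount As Variant) As Double
    Dim i As Long
    For i = LBound(vQty) To UBound(vQty)
        RowLevelRevenue = RowLevelRevenue + _
                          vQty(i) * (vPrice(i) - vDiscount(i))
    Next i
End Function

'Revenue as a calculated pivot field evaluates it: each field is
'summed first and the formula is then applied to the totals
Public Function PivotFieldRevenue(vQty As Variant, vPrice As Variant, _
                                  vDiscount As Variant) As Double
    Dim dQty As Double, dPrice As Double, dDiscount As Double
    Dim i As Long
    For i = LBound(vQty) To UBound(vQty)
        dQty = dQty + vQty(i)
        dPrice = dPrice + vPrice(i)
        dDiscount = dDiscount + vDiscount(i)
    Next i
    PivotFieldRevenue = dQty * (dPrice - dDiscount)
End Function

'Illustrative figures only: two order lines
Public Sub CompareRevenueCalculations()
    Dim vQty As Variant, vPrice As Variant, vDiscount As Variant
    vQty = Array(10, 5)
    vPrice = Array(20, 8)
    vDiscount = Array(2, 1)
    Debug.Print RowLevelRevenue(vQty, vPrice, vDiscount)    'prints 215
    Debug.Print PivotFieldRevenue(vQty, vPrice, vDiscount)  'prints 375
End Sub
```

With just two records the calculated field overstates revenue by 160 (375 against 215), and the gap grows with every additional record.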

When using columns alongside the query table, be sure to tick the Fill down formulas in columns adjacent to data check box in the query table Properties dialog to make sure the formulas are copied to new rows. We should also use a defined name to link the pivot table to the query table, which can be adjusted to ensure the pivot tables always refer to the correct data range, including the additional formulas. We create the name to refer to the full range of data and formulas and use that name instead of a direct range reference in Step 2 of the Pivot Table Wizard. The defined name can be updated using the QueryTable_AfterRefresh event shown in Listing 14-3, which also refreshes any pivot caches that use it.
Listing 14-3. Updating Defined Names and Refreshing Pivot Caches When a Query Table Is Refreshed
'Code contained within the OrderData worksheet code module

'Variable to hook the Query Table events
Private WithEvents mqtData As QueryTable

'Called from the start of Workbook_Open()
Public Sub Initialise()
    'Set up the event hook for the query table
    Set mqtData = Me.QueryTables(1)
End Sub

'Update dependent data when the QueryTable is refreshed
Private Sub mqtData_AfterRefresh(ByVal Success As Boolean)
    Dim sRangeName As String
    Dim pcCache As PivotCache

    If Success Then
        'Update the defined name
        sRangeName = Me.Name & "!pdPivotDataRange"
        mqtData.ResultRange.CurrentRegion.Name = sRangeName

        'Refresh any dependent pivot caches
        For Each pcCache In ThisWorkbook.PivotCaches
            If pcCache.SourceData = sRangeName Then
                pcCache.Refresh
            End If
        Next
    End If
End Sub
Data Consolidation
Probably the least well known of Excel's data processing features is its ability to consolidate numeric data from multiple ranges into a single table, matching the data using the labels in both the first row and first column of each range. If a single cell is selected, Excel first creates a unique list of all the column headers and a unique list of all the row headers (that is, the labels in the first column) to create the result table. If a range is already selected, Excel uses the row and column headers that are already there. It then adds (or counts, averages, takes the maximum or minimum of, and so on) all the items of data that share the same row and column header.
This proves extremely useful when consolidating data and calculations that occur over a time series. For example, imagine a project to analyze whether to build and run a new theme park. You might have a workbook for the construction planning, another for the ongoing operations, another for concessions and retail planning and so forth. Each workbook contains a summary table showing the costs, revenue and cash flow for each year. A greatly simplified version is shown in Figure 14-7, but imagine the Construction Planning and Operations tables actually exist in different workbooks and each has been given a defined name.
Figure 14-7. Simplified Project Planning

To consolidate these ranges into a single table, select the top-left cell in the target consolidation area, A15 in this example, and click the Data > Consolidate menu to access the Consolidate dialog shown in Figure 14-8.
Figure 14-8. The Data Consolidation Dialog

The All references list shows all the source data ranges that will be consolidated. The ranges can be from the same worksheet, a different worksheet, different workbook or even a closed workbook! Yes, this is one of the few Excel features that works as well with closed workbooks as with open ones. To add a source data range, type the reference in the Reference: refedit and click the Add button. Make sure the two Use labels in check boxes are both ticked, to ensure Excel matches both row and column headers. If they're not ticked, Excel matches by position, which is rarely what is required.
When we click the OK button, Excel matches all the labels, adds up all the similar data and gives us the table shown in Figure 14-9.
Figure 14-9. The Consolidated Results
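The same consolidation can be run from VBA with the Consolidate method of the Range object. The sketch below is illustrative only: the sheet names "Construction", "Operations" and "Summary" and the source range addresses are assumptions, and the sheet references are assumed to be in the active workbook. The method expects its sources as text references in R1C1 notation.

```vba
'Sketch only: sheet names and range addresses are placeholders.
Public Sub ConsolidateProjectPlans()
    Dim vSources As Variant

    'Consolidate expects R1C1-style text references, including the sheet
    vSources = Array( _
        "Construction!R1C1:R6C5", _
        "Operations!R1C1:R6C5")

    'TopRow and LeftColumn correspond to the two "Use labels in" check
    'boxes, so the data is matched by label instead of by position
    ThisWorkbook.Worksheets("Summary").Range("A1").Consolidate _
        Sources:=vSources, _
        Function:=xlSum, _
        TopRow:=True, _
        LeftColumn:=True, _
        CreateLinks:=False
End Sub
```

Because only the top-left cell of the target is supplied, Excel should build the lists of row and column labels itself, as it does for a single selected cell in the dialog.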

Advanced Filtering
The ability to extract specific records from a large data set is often the key to successful and efficient data processing. Pivot tables provide some rudimentary filtering capability, but only by hiding individual items of data. Excel's Advanced Filter feature enables us to filter the data using much more complex expressions, either just by hiding records in the original table, or more commonly by copying the resulting records to a new location for further processing. The Advanced Filter dialog is accessed by clicking the Data > Filter > Advanced Filter menu and is shown in Figure 14-10.
Figure 14-10. The Advanced Filter Dialog

As can be seen from Figure 14-10, when copying the filtered data to a new location, an advanced filter requires three ranges:
• List range is the range containing the original data to be filtered.
• Criteria range is a worksheet range used to define the criteria to use when filtering the data. Understanding how to get the most from the criteria range is the key to using advanced filtering and is the focus of the rest of this section.
• Copy to is the destination range for the filtered data to be copied to.
When the OK button is clicked, Excel scans through the source data range, checks each record against the criteria specified in the criteria range and copies the matching records to the next row in the Copy to range. The result is a subset of the original data, arranged as a simple structured data area, that is, not as a list or query table.
Unfortunately, every time the Advanced Filter dialog is shown, the List range is either blanked out or guessed and the Action defaults to Filter in place. It would be much more helpful if Excel remembered the source range and action, which would be possible if only Excel created the filter as a query table. If that were the case, we would also have a one-click Refresh option and be able to tell Excel to automatically copy down adjacent formulas. The best we can do in current versions is to give our ranges some specific names. If the workbook contains the defined names Database, Criteria and/or Extract, Excel will populate the dialog using the ranges pointed to by those names.
To save you some frustration if you're working through these examples, we've included the routine shown in Listing 14-4 in the example workbook to refresh the filter without showing the Advanced Filter dialog. Note that in VBA, we use the AdvancedFilter method on the source data range and specify the range to copy the filtered data to.
Listing 14-4. Advanced Filtering with VBA
Private Sub cmdRefresh_Click()
    Static rngCriteria As Range
    Dim rngNewCriteria As Range

    'Provide a default initial selection
    If rngCriteria Is Nothing Then
        Set rngCriteria = Me.Range("A1")
    End If

    'Use error trapping to handle a cancel
    On Error GoTo ErrNoRangeSelected

    'Allow the user to select the criteria range to use
    'Type:=8 allows for selection of ranges.
    Set rngNewCriteria = Application.InputBox( _
        "Select the criteria range to use and click OK.", _
        "Refresh Advanced Filter Extract", _
        rngCriteria.Address, Type:=8)

    'Remember the criteria range for next time
    Set rngCriteria = rngNewCriteria

    'Perform the advanced filter
    wksData.Range("pdPivotDataRange").AdvancedFilter _
        xlFilterCopy, rngCriteria, _
        Me.Range("rngAFExtract"), False

ErrNoRangeSelected:
    Exit Sub
End Sub
Criteria Ranges
The criteria range is used to specify the equivalent of a SQL WHERE clause, telling Excel which records to return. Figure 14-11 shows an example of a criteria range, in A1:B3.
Figure 14-11. An Advanced Filter Criteria Range

The first row of the criteria range contains field names that must match the field names used in the source data table, but can be in any order. Subsequent rows contain the data to match for each field. All the items in a row are joined with an AND operation, while separate rows are joined with an OR operation. Blank cells in the criteria range match to anything. The criteria range shown in Figure 14-11 should be read as "(Country="UK") OR (Country="USA" AND CategoryName="Beverages")," so that will return all orders from the UK and all orders from the USA for Beverages. If we only want the Beverage orders from the UK or USA, we have to include the Beverages filter in both lines, as shown in Figure 14-12, which reads as "(Country="UK" AND CategoryName="Beverages") OR (Country="USA" AND CategoryName="Beverages")."
Figure 14-12. Beverages from the UK or USA

By combining the AND and OR logic in this way, Excel enables us to create extremely complex criteria.
We're not limited to filtering using "equals" relationships. In fact, the default filter for text items is "starts with" so the criteria range shown in Figure 14-12 will also return Beverages orders from the Ukraine! To specify an exact (case insensitive) match, we use an = sign, as shown in Figure 14-13. When typing these in, it's a good idea to start with a quote mark, '=UK, to tell Excel this is text and not a formula, or to format the criteria cells as text before typing the values.
Figure 14-13. Beverages from Only the UK or USA, and Not Ukraine
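The matching rules described so far can be modelled in a few lines of VBA. This is an illustrative model only, not how Excel implements the filter, and the function names are invented. A record and each criteria row are passed as zero-based arrays with one element per field, in the same field order. Wildcards and comparison operators are deliberately left out.

```vba
'True if one field value satisfies one criteria cell
Private Function CellMatches(ByVal sValue As String, _
                             ByVal sCriterion As String) As Boolean
    If Len(sCriterion) = 0 Then
        'Blank criteria cells match anything
        CellMatches = True
    ElseIf Left$(sCriterion, 1) = "=" Then
        'Leading = sign: exact, case-insensitive match
        CellMatches = _
            (StrComp(sValue, Mid$(sCriterion, 2), vbTextCompare) = 0)
    Else
        'Default for text: case-insensitive "starts with"
        CellMatches = (StrComp(Left$(sValue, Len(sCriterion)), _
                               sCriterion, vbTextCompare) = 0)
    End If
End Function

'True if the record matches any criteria row (OR), where a row
'matches only if every cell in it matches (AND)
Public Function RecordMatches(vRecord As Variant, _
                              vCriteriaRows As Variant) As Boolean
    Dim vRow As Variant
    Dim bRowMatches As Boolean
    Dim i As Long

    For Each vRow In vCriteriaRows
        bRowMatches = True
        For i = LBound(vRow) To UBound(vRow)
            If Not CellMatches(CStr(vRecord(i)), CStr(vRow(i))) Then
                bRowMatches = False
                Exit For
            End If
        Next i
        If bRowMatches Then
            RecordMatches = True
            Exit Function
        End If
    Next vRow
End Function
```

With the fields ordered as Country then CategoryName, the criteria rows Array(Array("UK", ""), Array("USA", "Beverages")) match the record Array("Ukraine", "Seafood") because "Ukraine" starts with "UK". Changing the first criterion to "=UK" excludes it.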

As well as using the = sign to specify an exact match, we can include the ? and * wildcard characters to match any one character or any range of characters respectively and use the > and < symbols to match ranges. To specify both a lower and upper limit, we can include the field name multiple times in the criteria range, such as the criteria shown in Figure 14-14 to select the orders from UK customers whose names start with G to N.
Figure 14-14. Specifying a Range of Matches by Repeating the Field Name

We can also, of course, filter on numeric and date fields in exactly the same way, although we have to be careful if our workbooks will be used in multiple countries with different date orders. For example, if you're American, you might expect the criteria range in Figure 14-15 to return all the records for 2004.
Figure 14-15. Filtering Between Two Dates in the USA

When the filter is applied in the UK, it doesn't return any records, as 12/31/2004 is not recognized as a date; it should be 31/12/2004 instead. To avoid these issues, it is a very good idea to use formulas to construct the date criteria. In this example we should replace the hard-coded date in B2 with the formula ="<="&DATE(2004,12,31), which displays as the less-readable date number <=38352, but works in all locations. Similarly when filtering for a range of numbers, it is safest to create the criteria entry as a formula such as =">="&1.23, thereby allowing Excel to use the correct decimal separators for the location.
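If the criteria range is built from code, the same formulas can be written through the Formula property, which always takes US English function names and comma separators whatever the user's regional settings. In the sketch below the sheet name "Criteria", the cell addresses and the OrderDate field name are assumptions about the layout.

```vba
'Sketch only: sheet name, cell addresses and field name are placeholders.
Public Sub WriteDateCriteria()
    With ThisWorkbook.Worksheets("Criteria")
        'Repeating the field name gives a lower and an upper limit
        .Range("A1:B1").Value = Array("OrderDate", "OrderDate")

        'Each cell holds a formula that builds the comparison text,
        'for example ="<="&DATE(2004,12,31)
        .Range("A2").Formula = "="">=""&DATE(2004,1,1)"
        .Range("B2").Formula = "=""<=""&DATE(2004,12,31)"
    End With
End Sub
```

The doubled quote marks are how VBA embeds a quote character inside a string literal, so the cells receive the formulas exactly as they would be typed into the worksheet.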
As well as specifying that individual fields must have certain values, we can also filter on relationships between the data in multiple fields. To do this, we use a dummy field name that doesn't exist in the source data, such as Calc1, Calc2 and so forth and create a formula using the cells from the first data row of the table (that is, not the header row). The formula must evaluate to TRUE or FALSE and must use relative referencing when referring to the data in the table. As Excel scans through the source table, it increments all the relative row references in the formula, evaluates the formula for that row and matches on a TRUE result. For example, the formula shown in Figure 14-16 will return any orders where the discount is more than 5 percent of the unit price. Note that this is entered as an Excel formula, not as a text string, so you should see the result of the formula (TRUE or FALSE) displayed in the cell.
Figure 14-16. Filtering Using a Formula

Instead of using cell references, which can be hard to read when the referenced range is on a separate sheet (as in this case), Excel enables us to use the field names in the formula, such as =Discount/UnitPrice>=0.05 in this case. Doing so usually results in the cell displaying a #NAME? error, but that can be safely ignored.