Advanced Functions

The Database Functions

We often see advanced filtering used to select a subset of data, with the result of the filter being used by a few simple worksheet functions, such as SUM, AVERAGE and so on. Depending on the complexity and number of the worksheet functions that refer to the filtered data, it can often be quicker and easier to use Excel's database functions. These are equivalent to the normal SUM, AVERAGE, MIN, MAX, COUNT, COUNTA and so forth, but instead of providing a simple range to operate over, we provide a source database, a criteria range to filter the database by and the field in the database to operate on. For example, while AVERAGE(K2:K2156) would give us the overall average discount in our sample workbook, we could use the DAVERAGE() function to calculate the average discount of our UK and USA Beverage sales, ignoring those with zero discount. The criteria range for the database functions follows exactly the same structure and rules as for advanced filtering, so we could use the range shown in Figure 14-17 in this case.

Figure 14-17. Criteria Range for Discounted UK and USA Beverage Sales

The average nonzero discount for our UK and USA Beverage sales could then be calculated using the worksheet formula:

=DAVERAGE(OrderData!pdPivotDataRange,"Discount",$A$1:$C$3)
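To make the mechanics concrete, here is a minimal Python sketch of what a DAVERAGE-style calculation does. It is an illustration rather than Excel's implementation: the sample rows, the field names and the daverage helper are all hypothetical, and the criteria are assumed to be one row each for UK and USA, both restricted to Beverages with a nonzero discount. Conditions within one criteria row are AND'ed together, and separate criteria rows are OR'ed.

```python
# Hypothetical sample rows standing in for the order database.
rows = [
    {"Country": "UK", "Category": "Beverages", "Discount": 0.10},
    {"Country": "UK", "Category": "Beverages", "Discount": 0.00},
    {"Country": "USA", "Category": "Beverages", "Discount": 0.20},
    {"Country": "USA", "Category": "Dairy", "Discount": 0.25},
    {"Country": "France", "Category": "Beverages", "Discount": 0.15},
]

# Assumed criteria range: each dict is one criteria row (its tests are AND'ed),
# and the rows in the list are OR'ed.
criteria = [
    {"Country": lambda v: v == "UK",
     "Category": lambda v: v == "Beverages",
     "Discount": lambda v: v > 0},
    {"Country": lambda v: v == "USA",
     "Category": lambda v: v == "Beverages",
     "Discount": lambda v: v > 0},
]


def matches(row, criteria):
    """True if the row satisfies every test in at least one criteria row."""
    return any(
        all(test(row[field]) for field, test in criteria_row.items())
        for criteria_row in criteria
    )


def daverage(rows, field, criteria):
    """Average of `field` over the rows that pass the criteria."""
    values = [row[field] for row in rows if matches(row, criteria)]
    if not values:
        raise ValueError("no rows match the criteria")
    return sum(values) / len(values)


average_discount = daverage(rows, "Discount", criteria)
print(round(average_discount, 4))  # average of 0.10 and 0.20
```

Only the first and third rows pass the criteria, so the zero-discount UK row, the USA Dairy row and the French row are all excluded from the average.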

If we use these functions within the advanced filter criteria range, we can perform some extremely powerful filtering. For example, the criteria range shown in formula view in Figure 14-18 will extract all the UK or USA Beverages sales that have a discount greater than the average discount for UK or USA Beverage sales, ignoring those sales where no discount was applied.

Figure 14-18. Using a Database Function Within an Advanced Filter Criteria Range

Here, the first three columns of the criteria range are being used by the DAVERAGE() function to calculate the average discount. The average discount figure is then used to populate the fourth column in the criteria range, which is used by the Advanced Filter.

Array Formulas

The standard worksheet functions that we use every day typically accept one or more parameters and return a result. A few, such as SUM() and AVERAGE(), accept ranges or arrays in their parameters and will return the sum, average and so on of all the data they're given. Most worksheet functions and mathematical operators, however, are given a single number for each of their parameters and return a single number as the result.

Even though a function normally accepts single-figure parameters, we can usually give it a multicell range reference and enter the function using Ctrl+Shift+Enter instead of just pressing the Enter key. Doing this tells Excel to calculate the function as an array formula, whereby the function performs its calculation multiple times, iterating over each cell in the range. The result is an array of numbers, with each element in the array corresponding to one of the cells in the original reference. These results can in turn be fed into the parameters of other functions and so on until they are eventually aggregated (usually summed) to give a final answer. All of the array calculation is done inside Excel and does not usually appear on the worksheet.

The most common use of array formulas is to count and sum lists, using multiple criteria. Excel provides the COUNTIF() and SUMIF() functions that accept a single filter criterion, so we could sum our UK orders using the following function:

=SUMIF($E$1:$E$2156,"=UK",$I$1:$I$2156)

where column E contains the country names and column I contains the order quantities. If we want the total of our UK Beverages sales, we can no longer use SUMIF, because we need two criteria. We could create a pivot table for it, or use DSUM with a criteria range, but both of those can be overkill if we only have a relatively small list and simple criteria, and cannot be used if we don't have column headers.

Array formulas occupy the middle ground between the simplicity of a worksheet function and the complexity of criteria ranges. If we only have a series of conditions AND'ed together, we can use an array formula of the form:

=SUM(ValueRange*(Criteria1)*(Criteria2)*(Criteria...))

To get the total orders of UK Beverages from our example data, we could use the following formula:

=SUM($I$2:$I$2156*N($E$2:$E$2156="UK")
*N($G$2:$G$2156="Beverages"))

Remember to enter it using Ctrl+Shift+Enter. Let's look at the sample data shown in Figure 14-19 (on the following page) to see how it works.

Figure 14-19. Sample Data for an Array Formula

To explain how the array formula works, we need to break it up and explain each part of the formula, starting from the middle and working outward:

•$E$2:$E$2156="UK" Excel scans through each of the cells in the range E2:E2156 in turn, checking whether each one is equal to UK. The result is an array of True or False values. In our case, it is the array { F, F, T, T, T, T, T, T, F, F}.

•N($E$2:$E$2156="UK") The N() function converts its parameter to a number. When given an array of True and False values, it converts each True to 1 and each False to 0. In our case, this is the array { 0, 0, 1, 1, 1, 1, 1, 1, 0, 0}. You might see a double-minus being used instead of the N() function, such as --($E$2:$E$2156="UK"), which has the same effect and is preferred by some people. You might also see the N() function omitted from complex array formulas, as Excel will often (but not always) do the conversion without being told.

•$G$2:$G$2156="Beverages" Like the test for UK, Excel scans each cell in the range G2:G2156, checking whether each one is equal to Beverages. In our case, the result is the array { T, F, T, F, F, F, F, T, F, F}.

•N($G$2:$G$2156="Beverages") Converts the Beverages True/False array to 1s and 0s, giving the array { 1, 0, 1, 0, 0, 0, 0, 1, 0, 0}.

•$I$2:$I$2156 A standard range reference, which is directly translated into the array { 21, 15, 25, 25, 15, 20, 20, 25, 2, 20}.

•SUM($I$2:$I$2156*N($E$2:$E$2156="UK")*N($G$2:$G$2156="Beverages")) Multiplies the matching elements from each of the intermediate arrays and totals the result, as shown in Figure 14-20.

Figure 14-20. The Inner Workings of an Array Formula
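The same element-by-element arithmetic can be reproduced outside Excel. The following Python sketch uses the ten sample values quoted in the breakdown above; the variable names are ours, and the code only imitates the calculation rather than showing how Excel evaluates it internally.

```python
# Sample arrays quoted in the breakdown of the formula.
quantities = [21, 15, 25, 25, 15, 20, 20, 25, 2, 20]
is_uk = [False, False, True, True, True, True, True, True, False, False]
is_beverages = [True, False, True, False, False, False, False, True, False, False]

# N(...) step: convert each True/False to 1/0.
uk_flags = [int(flag) for flag in is_uk]
beverage_flags = [int(flag) for flag in is_beverages]

# Multiply the matching elements of the three arrays...
products = [q * u * b for q, u, b in zip(quantities, uk_flags, beverage_flags)]

# ...and total the result, as SUM() does.
total = sum(products)

print(products)  # [0, 0, 25, 0, 0, 0, 0, 25, 0, 0]
print(total)     # 50
```

Only the third and eighth rows are both UK and Beverages, so they are the only ones whose quantities survive the multiplication.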

For these situations, the decision to use an array formula instead of a pivot table, advanced filter or database function is largely dependent on the size of the data set and the number of such formulas required. For one or two totals, array formulas are often the most efficient, but as the number of totals increases, it becomes more efficient to perform the filtering before calculating them.

After you've grasped the concept of array formulas, you will probably identify more and more situations where they can be used. A common requirement for many array formulas is to be able to generate a number sequence such as the array { 1, 2, 3, 4, 5}. This can be achieved using the awkward-looking formula =ROW(INDIRECT("A1:A5")). The INDIRECT("A1:A5") part returns the range reference A1:A5, and is insensitive to rows being moved, added or deleted. The ROW() part returns an array of the row number of each row in the range, being the array of rows 1 to 5, { 1, 2, 3, 4, 5}.

The classic use of such a sequence is in the "sum of digits" calculation often used in credit card checksum formulas. Given an arbitrary number, such as 672435, what is the sum of the digits in the number? In this case it's 6+7+2+4+3+5=27. To calculate it using a formula, we start off with a sequence from 1 to the number of digits, use the sequence in the MID() function to extract each digit in turn (as text), convert it to a number and then sum the resultant array. The complete function is as follows:

=SUM(VALUE(MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1)))

where B7 contains the number for which we want to calculate the sum of the digits. To understand how it works, let's break it down again:

•LEN(B7) gives the length of the number (that is, the count of its digits; 6 in our case).

•INDIRECT("A1:A"&LEN(B7)) returns the range A1:A6 in our case.

•ROW(INDIRECT("A1:A"&LEN(B7))) returns the row of each cell in the range, giving the array { 1, 2, 3, 4, 5, 6}.

•MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1) applies the sequence to the startnum parameter of the MID() function, which returns the nth digit from the number as text. In this case, it's the array { "6", "7", "2", "4", "3", "5"}.

•VALUE(MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1)) converts the array of text items to numbers, giving the array { 6, 7, 2, 4, 3, 5}.

•SUM(VALUE(MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1))) sums the numbers in the array, giving 6+7+2+4+3+5=27.
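The steps above map directly onto a few lines of Python. This is a sketch for following the logic, not a replacement for the worksheet formula: the sum_of_digits name is ours, and it assumes a non-negative whole number. Each line is commented with the part of the formula it imitates.

```python
def sum_of_digits(number):
    """Sum the decimal digits of a non-negative whole number."""
    text = str(number)
    length = len(text)                                 # LEN(B7)
    positions = list(range(1, length + 1))             # ROW(INDIRECT("A1:A"&LEN(B7)))
    digits_as_text = [text[p - 1] for p in positions]  # MID(B7, n, 1) for each n
    digits = [int(d) for d in digits_as_text]          # VALUE(...)
    return sum(digits)                                 # SUM(...)


result = sum_of_digits(672435)
print(result)  # 27
```

The positions list plays the role of the { 1, 2, 3, 4, 5, 6} sequence; Python indexes strings from 0, so each position is reduced by one when it is used to pick a digit.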

Despite their definite power, array formulas have three main problems: They're relatively slow to calculate, particularly when operating on large data sets; they're relatively difficult to understand, when compared to normal worksheet functions; and they're difficult to test, debug and maintain. If you're using Excel 2002 or later, the Tools > Auditing > Evaluate Formula feature can be very useful for analyzing and debugging array formulas.

Our advice is to use array formulas when absolutely necessary, but don't use them just to save a few cells. It is often quicker to create and much easier to understand if intermediate cells are used for extra calculations, instead of trying to perform everything in a single array formula.

Circular References

Excel's online help file and most books mention circular references in terms of "circular reference errors," where you've accidentally created a circular reference by mistyping a range reference. This is, indeed, one of many potential symptoms of spreadsheet errors, which can be quite difficult to track down. If you find yourself in that situation, the findcirc.xla add-in, available for download from www.oaltd.co.uk/Excel, might come in useful. This add-in scans a workbook, trying to locate a circular reference chain and provides the full list of cells involved in the circle. With any luck, you should be able to identify the erroneous references and break the chain.

Much more interesting, though, is the intentional use of circular references to tidily solve business problems. A great many problems in the world of finance are circular in nature. A typical example is to determine the repayments of a long-term loan. A company may have decided to devote 40 percent of their after-tax profits to repay a loan. The problem is that both the loan repayment and the interest charge can usually be offset against the tax liability, thereby increasing the after-tax profits and allowing the company to repay more of the loan. The problem can be expressed using the following equation:

$$R = \bigl(P - R - (B - R) \times I\bigr) \times (1 - T) \times 0.4$$

where R is the amount of the loan to repay, P is the profit before financing and tax, B is the balance of the loan, I is the interest rate and T is the tax rate. In this extremely simple example, it is possible to solve for R algebraically, giving the following:

$$R = \frac{P - I \times B}{1 - I + \dfrac{1}{0.4\,(1 - T)}}$$
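For readers who want to check the rearrangement, the algebra can be sketched as follows, using the same symbols as defined above. Dividing both sides by 0.4(1 - T), expanding the bracket and collecting the terms in R gives the closed form:

```latex
\begin{align*}
R &= \bigl(P - R - (B - R)\,I\bigr)\,(1 - T) \times 0.4 \\
\frac{R}{0.4\,(1 - T)} &= P - R - B\,I + R\,I \\
R\left(1 - I + \frac{1}{0.4\,(1 - T)}\right) &= P - I\,B \\
R &= \frac{P - I\,B}{1 - I + \dfrac{1}{0.4\,(1 - T)}}
\end{align*}
```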

In most real-life examples, however, the interest rate may be stepped depending on the outstanding balance, and the tax calculation is unlikely to be as simple as just multiplying by the tax rate. In these situations, we can intentionally use circular references to iterate to a solution. Figure 14-21 shows a worksheet to solve this simple problem using circular references.

Figure 14-21. Using Circular References to Calculate Loan Repayments

When we created the sheet, we initially put a guessed value in cell B12. After entering the remaining formulas, we added the forward reference in B12, to refer to B17.

By default, Excel disables the calculation of circular references. To enable them, put a tick in the Tools > Options > Calculation > Iteration box. The Max Iterations and Max Change settings can be left as their defaults; they have little impact on most circular-reference problems. They come into play if the calculations are particularly slow at converging to a result. The Max Change determines when Excel considers a circular reference to have converged correctly (the new result must be within the given value of the previous iteration), whereas the Max Iterations provides a cut-off point to tell Excel to stop trying. In slowly converging calculations, the Max Iterations may need to be increased to allow the iterations to run until completion. Such situations should be examined to see whether the calculations can be reworked to give a solution that converges within fewer iterations.
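The roles of the two settings can be illustrated with a small Python sketch of the loan repayment problem, where the repayment is 40 percent of the after-tax profit left once the repayment itself and the interest on the reduced balance have been deducted. This is a simplified model rather than a description of Excel's calculation engine. The solve_by_iteration name and the input figures are hypothetical (they are not taken from Figure 14-21), and the limits of 100 iterations and a change of 0.001 are chosen to mirror Excel's usual defaults.

```python
def solve_by_iteration(profit, balance, rate, tax, share=0.4,
                       guess=0.0, max_iterations=100, max_change=0.001):
    """Feed the repayment back into its own calculation until it settles."""
    repayment = guess
    for iteration in range(1, max_iterations + 1):
        interest = (balance - repayment) * rate
        new_repayment = (profit - repayment - interest) * (1 - tax) * share
        change = abs(new_repayment - repayment)
        repayment = new_repayment
        if change < max_change:
            # Converged: the result moved by less than max_change.
            return repayment, iteration, True
    # Cut-off reached without converging.
    return repayment, max_iterations, False


# Hypothetical inputs, chosen only for illustration.
profit, balance, rate, tax = 1000.0, 5000.0, 0.05, 0.30

repayment, iterations, converged = solve_by_iteration(profit, balance, rate, tax)

# Algebraic solution of the same simple problem, for comparison.
closed_form = (profit - rate * balance) / (1 - rate + 1 / 0.4 / (1 - tax))

print(converged, iterations)
print(round(repayment, 2), round(closed_form, 2))
```

With these inputs each pass shrinks the remaining error to roughly a quarter of its previous size, so the loop stops well short of the iteration limit. A calculation that shrinks the error more slowly would need a higher limit, which is the situation the paragraph above suggests reworking.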

The worksheet shown in Figure 14-21 adopts a number of best practices when designing worksheets in general, and specifically when using circular references:

•The title of the worksheet makes it clear that intentional circular references exist on the sheet.

•The input ranges are clearly identified, with a light-colored background.

•Each formula is clearly identified with a label stating what is being calculated.

•All the formulas except the circular reference refer to cells above them; the cell containing the circular reference is the only one with a reference to a cell below it.

•The circular reference is clearly identified by including (circ) in the cell label.

•Both ends of the circular reference have the same label.

•The circular reference in cell B12 refers to the single cell holding the value to be fed back into the circular calculation, and only that cell.

After you've used circular references for a while, you'll notice two common issues. First, if any of the functions within the circle results in an error value, it will propagate to every function in the circle. Second, the ability of the formulas to iterate to a solution can be quite sensitive to the initial guess for the feedback value. Both of these issues can be resolved by including a kill switch to control whether the circular reference is calculated and an extra cell to provide a seed value for the initial guess. When the kill switch is FALSE, the feedback cell(s) take on the seed value, which should also clear out any residual error values. When the kill switch is TRUE, the feedback cell(s) complete the circle. Figure 14-22 shows the same loan repayment problem with the addition of a kill switch in cell B4 and all other changes highlighted in bold.

Figure 14-22. Using a Kill Switch to Control the Circular Reference Feedback
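The behavior of the kill switch can be imitated in a short Python sketch. It is a model of the idea only and does not reproduce the formulas in the figure: the function names and input figures are hypothetical, the feedback cell is assumed to work like a simple IF on the switch, and a NaN value stands in for an Excel error value.

```python
import math


def next_repayment(feedback, profit, balance, rate, tax, share=0.4):
    """One pass round the circle: feedback value in, recalculated repayment out."""
    interest = (balance - feedback) * rate
    return (profit - feedback - interest) * (1 - tax) * share


def feedback_cell(kill_switch, seed, calculated):
    """Seed value while the switch is FALSE, calculated result once it is TRUE."""
    return calculated if kill_switch else seed


def run(kill_switch, seed, calculated, passes=50, **inputs):
    """Recalculate the circle a fixed number of times and return the result."""
    for _ in range(passes):
        feedback = feedback_cell(kill_switch, seed, calculated)
        calculated = next_repayment(feedback, **inputs)
    return calculated


# Hypothetical inputs, chosen only for illustration.
inputs = dict(profit=1000.0, balance=5000.0, rate=0.05, tax=0.30)

# An error that has entered the circle never clears by itself.
stuck = run(True, seed=0.0, calculated=math.nan, **inputs)

# Switching off feeds the seed back in, which flushes the error out.
reset = run(False, seed=0.0, calculated=stuck, **inputs)

# Switching on again lets the circle iterate from a clean starting point.
solved = run(True, seed=0.0, calculated=reset, **inputs)

print(math.isnan(stuck), round(reset, 2), round(solved, 2))
```

While the switch is off, the calculated value depends only on the seed, so it stays fixed however many times the sheet recalculates. That is what makes the seed a convenient way to control the initial guess.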

Unfortunately, including circular references in our worksheets prevents us from using some of Excel's features. Specifically, the Goal Seek, Data Table and Solver features will only calculate a single iteration of the sheet for each step in their processing, so will never return correct results.