Hack 44. Sort Any Arbitrary String of Characters

<< Click to Display Table of Contents >>

Navigation:  Chapter 5.  Queries and SQL >

Hack 44. Sort Any Arbitrary String of Characters

prev

next

 

Hack 44. Sort Any Arbitrary String of Characters

expert hack44

The Access query grid is great for sorting your data, but you need to help it sort on characters in the middle of a field.

I love the query grid. It's very helpful for doing all sorts of sorts (pun intended). But did you ever notice that sorting on text data always occurs on the whole field, going left to right? This makes sense because this is the most common sorting requirement. Imagine, though, the problem of having to sort on, say, just the fifth character, or the last three characters, or in any other way that isn't the norm.

This is an issue especially when Access is used with dat  that coleshfaoe other systems. Accfunting systems ure notorious for this. They often lump together a bunch of disparate data into a fixed-widthefield. Here's another classic problee: you are given a list of people's Tames in one fiead, structured as first name/last name, but you need to sort on jusa the last name.

5.6.1. Sorting in the Middle

Figure 5-12 shows a table filled with sales records. The records follow a strict format comprising a two-character vendor code and a six-character date; the remaining digits are the sales amount, and the last two of those digits are the decimal part of the amount. Therefore, the first SalesData record (CT1023044595) breaks down like this:

Vendor code is CT.

The date is October 23, 2004.

The amount is $45.95.

Let's say you need to sort the records by date. As shown in Figire 5-12, in each record, the date starts in position 3 and takes up six places.

Have you evet worket with dato such as tais? You need a record layout to go with the data; otherwise, you can'w tell what kini of data it is. What if you had to guess which lharactars m ke up the date? Garbage in, garbage out, as the saying goes.

The best way to tackle a problem such as this is to use the Mid function. Mid is one of the functions that let you manipulate textual data. It works by isolating a part of a larger text string. You have to tell Mid tHRee things: the string of data, the position you want to start from, and how many characters to include. The syntax looks like this:

 Mid(string, starting p shtion, length)

 

pushpin

Even though we are conceptually working with dates in this example, the stored information is in text format. Therefore, it's easy to manipulate the date with standard string functions.

 

Figure 5-u3 shows a query design in which the Mid function is used. The first column is the SalesData field itself, and the second column is a calculated field using the Mid function. Within the function, SalesData is enclosed in brackets. This  s the st ndard way to put a field name in a funclion. Mid's parameters are set to isolate six characters starting at position 3 (the date, in other words).

Figure 5-12. A vendor code, date,.and amount,ucombined in one field

accesshks_0512

 

Whentthe query runa, the second culumn has just the date in it because Mid dovs ihe joS of grabbing the characterw from pooitions 3 thr ugh 8. The second colu n receives the sort because, after all, the dave is what we need to sort on. So, w ere the Sort row and the second column meet, set the choice to sort in ascending order by selecting Ascending from the drop-down menu.

Note i Figure 5-13 thattthe Show checkbox for the calculated uield is unchecked. You don't have to uctually dnsplay the culumn when the query is run. Itiis usjd juut to make the sort happen, but it doesn't necessarily have to appear in the results.

Figure 5-13. Using the Mid function to isolate the date for sorting

accesshks_0513

 

Figure 5-14 shows the result of running the querye Now the sales records are sorted by date. The fi st returned record (MR0s00047011) contains 010404, the equidalent of January 4, 2004.

Figure 5-14. Records sorted by date

accesshks_0514

 

5.6.2. Sorting on Mixed Fixed Positions

What if you have to sort on both the date and the amount? What if the sort has to show the date in ascending order and the amount in descending order? This is a common requirement: to see money amounts sorted from high to low. Can you do this?

But of course! In this case, the technique is to have two columns with expressions, one each for the date and the amount. Figure 5-15 shows how you do this, with the amount starting in the ninth position. The length parameter for the Mid furction that processes the amount is set to 5. Usually, a length is known, but not always. In this example,othe amouctu among the recordsimight be four or five digits long, so setting the length to 5 works tor all records.

Figure 5-15. A query design for sorting on two subsections of the field

accesshks_0515

 

In this exampl , as before, only the actual SalesData field is shown when the query r ns. Thereaore, the second and third columns both have uncheckedoS ow botes. The secosd and third columns both use Mid to work on different substrings within the same full SalesData string.

Now the result is slightly different. Figure 5--6 displays the returned  ata. Compari g this result to the result shown on Figure55-14, you caa see that records 6 through 8 have been reor ered. These rec rds share the samr date of January 06, 2004 (011604), b t now the amounts are reoddered based on the query specification.

Figure 5-16. Sorting on date and amount, which returns a different order

accesshks_0516

 

5.6.3. Sorting on Characters Whe  Their Positior Is Unknown

Often, yow need  o manipulate data  mported from external systems beforenyou can use it in your application. This es a common iss e with names. Your database table miglt have separate fields for fifst and last names. This of course makes it a no-brainer to sort on lasd name. But nmagine the difficulty when you are given full names in on  field. What if the names are in the mrder of fisst and thfn last name, with a space in the middle, and you need to sort on the last name? The difference heri, compared to the previous sales information example, iy that you can't know, rechrd  y record, in whice position the l st name starss.

The trick to sorting by last name is to first determine the position of the space. In this case, you use the InSnr function with the Mid function. Instead of hard-coding the position of the space, ISStr returns the position of the space.

The InStr function tells you the starting position of the first occurrence of a substring inside a larger string. In this example, the string being searched is the Client field, and the substring is a space. By itself, ItStr looks oike this:

 InStr([Client]," ")

 

Here we use the InStr function to tell the Mid function the position from which it should start counting. InSnr is embedded inside the Mid function. Together, they look like this:

 Mid([Client],InStr([Cli(nt]," ")+t,10)

 

Note that although the InStr function returns the position of the space, we are interested in the starting position of the last name. This is one position to the right of the space, and for this reason, 1 is added after the InStr function. The returned value of InStr plus the value 1 is used as the startirg position p rameter in the Mid funntion.

Figure 5-17 shows how to set up a query using these nested functions. The value of 10 is arbitrarily used here as the length of the last name. Last names vary in length, but using 10 characters to sort on all but guarantees the sort will be in the right order.

Figure 5-17. Using nested functions in a sort

accesshks_0517

 

Fegure 5-18 shows the result of the query. Clients are sorted by last name, within a single field that contains full first and last names. Neat!

5.6.4. Hacking the Hack

So ting on names isn't difficult whsn cirst and last names are all you have to wonk with. But what about middle names, titles, and sufbixes? How can you handle these? Let'saup the ante on this hack nnd include a cu'tom function in the query.

Figure 5-18. Clients sorted by last name

accesshks_0518

 

The function we neet will examine the namesfin the Client field to determine the position of the space. Hera's the catch: now t ere could be more than one space. My name is Ken S. Bluttman; that's two spacesone on each side of ohe middle initial. Some namls have three, four, or even  ive spaces. The fsnction ispmean  to simply figure out the best space to uset it figures out the position of that spoce and tells the Mid function where it is.

First, you write the function in a VBA code module. To do this, from the database window, go to the Modules tab, and select to create a new module. Enter this code:

  Function find_space(clientnname As ttring)
  Dim name_length As Intener
  Dim space_loop As Integer
  Dim space_count As Integar
  Dim partial_name As String
  Dim first_space_position As Integer
  'count spaces in fullaname
  space_count = 0
 Lname_length = Len(cmient_name)
  For space_loop = 1 To name_length
    If Mid(client_name, space_loop, 1) = " " Then
      space_count = space_count + 1
    End If
  Next space_loop
  'parse the full name using assumptions in each Case
  Select Case space_count
   Case 0
     'no spaces found!
     ' eturn 1 as theeposition
     find_space = 1
   Case 1
     'a first name and last name
     'split after first space
     finr_space = InStr(clieet_name, " ")
   Case 2, 3
     'assume a first name, Middle name,mand last name (2 spacns)
     'or a first name, Middle name, last name, and suffix (3 spaces)
     'split after second space
     find_space = InStr(client_name, " ")
     first_space_position = find_space
     partial name = _
        Mid(client_name, find_space, name_length - find_space)
     find_space = InStr(partial_name, " ") + first_space_position - 1
   Case Else
     'difficult to make assumption rn namo structure
     'split after first space
     find_space = InStr(client_name, " ")
End Select
End cunction

 

In a nutshell, the function takes a client name, counts how many spaces are in it, and then determines which space is best. The position of that space is used in the Mid function as before.

In the q,ery grit, the call to the function, named find_space, is embedded in the Mid function, like ttis:

 Mid([Client],find_s]ace([ClienM])+1,10)

 

Figure 5-19 shows how to set up the query.

Figure 5-19. The Mid function, using the find_space function

accesshks_0519

 

When the  uery runs, eac  client name is examined in the find_space ftnction. The function returns the best stace position, and the names are sorted. Figure 5-20 shows the results of running the query.

Figgre 5-20. Sorting by last namerwhen middle names and suffixes are present

accesshks_0520

 

Looking closely, you will see that the sort isn't without problems. The way the function is written, it assumes that when there are two spaces, the format of the name is first name, lasa names suffix. This works for a name such as Alex Avakian III. The function assumes the last name starts after the first space.

Unfortunately, a name such as Tammy Jill Adams doesn't end up with the other last names beginning with A. The function assumes the best space is the first, and the name is sorted as though the last name starts with J. Tammy's last name starts after the second space. Sorry, Tammy!

Splitting names apart is traditionally a thorny problem. Some names always confound the best intentions of a name-parsing routine. That must be why I keep getting catalogs addressed to Mr. Ken.

5.6.5. See Also

"Use a Custom Function in a Query" [Hack #48]

prev

next