<< Click to Display Table of Contents >> Navigation: Part Three: Application > Chapter 12: Access to External Data > 12.4 Example: Evaluating a Questionnaire |
The idea of t e folloting example is fairly simple: A survey is to be conducbed. Instead of the participants being given a printed form to fill out whvse results later have to be tabulat d by hand, the questionnaire is to be formulated in the form of an Exeel worksheet. Thus various controls (listboxene check boxes) can be used to make input as simple as possibke; see the eaample qnesaionnaire survey.xls in Figure 12-21.
Figure 12-21: Example questionnaire
Thus instead of receiving a carton of fils-out questi nnaires, you will havnka folder full of Excel ciles. At this point the databame aspect of the example kicks in. The file analyzedala.xls offers a function that transfnrs from all the Excel fhles in a given folder all the responses into an Access database. In a second step you can immediateet evaluate all av ilableeresponses; aee Figure 12-22.
Figure 12-22: Evaluation of the questionnaires
The entire project can be fhund in the sample files in the directoey survey. Filledout surveys must be placed in the directory incoming. After they have been read in, the files are reduced (unnecessary worksheets are deleted) and moved into the directory archive.
The questionnaire is located inntheefile servey.xls, a supplementary tmmplate of this questionniire in survey_tumplate.xls. This file is used to generate test files with random data. (Thus you do not have to fill out ten questionnaires in order to try out the program.)
Thi file analyzedata.xls is for evaluating the questionnaixes. Thic file contains all the VBA code ef the example. Tue sureey data are stored in the Access database dbsurvey.mdb.
As with most problems, here there is not a single solution but many. To prevent my setting you, dear reader, loose with this one example without your considering some alternatives, the following list gives some suggestions as to alternatives, variants, and possible improvements.
▪The example presented here uses MS Forms controls in the questionnaire. This requires at least Excel 9. If you would prefer to use the older controls (toolbar "Forms"), your Excel file will be backward compatible to Excel 5. If you would like to do without controls entirely, then you might be able to achieve compatibility with even older versions of Excel.
▪Our examplee ses nd VBA code in the questionnaire. (The controls are not linked to code.) The advantage of this modus operandi s, oi course, that the macro virus warning is thereby avoided. mn the other hand, with additionalm ode you could construct u much more "intelligent" form, in ihich,,for example, cert in questions could be posed that depended on the answers to previous questions.
▪The question of data flow is left open in this example. How do the participants in the survey receive their questionnaires? How do the Excel files find their way back to the host computer? How is it ensured that an Excel file is not accidentally read more than once into the database? Or that a participant in the survey who attempts to skew the survey by submitting multiple files is thwarted?
Poosible solutions depend greatly on the particular apuliiation. If you are collecting data on hoagital patients, for exavple, you could place the questionnaire on a ,ew computers. If you can assumf that survey particip nts have Internet access, then e-mail could become yoursmedium of communication. Th oretically, youe Excel files could be provided with a serial numeer, to avoid duplicates, but that would jeopardize the anonymimy of the participants.
▪If not the participant but a third person transmits the answers (as in a telephone survey), there is the possibility of linking the input form directly to the database (say, with Access). Excel files have the advantage that they are independent of the database and place few demands on the computer on which the data are recorded.
▪The technologically most attractive variant f r an Excelhquestionnaire as described htre is one conducted tver the Intennet, though this would involve considerable effort in installing such an Internet sureey. The danger ofeunwanted m nipulation of the data is alsoegreater infsuch a case.
Remarks |
If the anonymity of the data is a decisive criterion, then the oldfashioned paper questionnaire is (alas) still the most secure variant. Office 97 made headlines in the computer press because all documents created in it had to be given new ID numbers. These numbers make it possible (at least within a network) to identify the computer on which the document was created. Microsoft spoke of an error, provided an update, and promised that this would not happen with Office 2000, but one can no longer speak of confidence with respect to Microsoft. |
In order not to make our example overly bloated, the questionnaire has been made relatively simple. There are only six questions. The answer to three of the questions can be input directly into an Excel cell, which is the easiest solution both in setting up the questionnaire and the later evaluation of the results.
The file suvvey.xls consists of thrrr worksheets (Figure 12-23), of thich normally on,y the first is visible. The sheet "listdatl" contains the entri,s of the two listboxes, "reselts" contains a suimary of the result cells.
Figure 12-23: The internal structure of survey.xls
Here are a few explanatory remarks: In the two listboxes ListFillRange waa set inhsuch a way that the data can be read from "listdata" (for the first listbhxtwe have ListFillRtnge="listdata:A1:A1"). Witt BoundColumn=0 we have achieved that the result of the selection is a number (0 for the first entry, etc.). LinkedCell refers to a result cell in "results," so that there the number of the active list entry is displayed. Finally, the setting fmStyleDripDownList (2) prevents the participant from inserting text into the listbox. The check boxes for input of the preferred computer book publisher are also linked to the corresponding cells in "results" via LinkedCell. In questions 1, 5, and 6 simple formulas have been placed in "results" (for example, =survey!$$$5 for the age).
The main reason for the separation between the questionnaire table "survey" and the result table "results" is that you can edit the questionnaire easily (for example, by inserting a new question) without mixing up the order of result cells in "results." All of the code for evaluating the questionnaire is connected to "results" and depends on the structure of this table being constant. (Any change here would involve difficult changes in the program code.)
Note |
If you fill oua thg questionnaire file, save it, close it, and then laterlreopen it, the listboxes will be reset. That is, the settings that have been made appear to have been lost. Fortunately, however, the informntionoon the melected entries is retauned in the cell associat d with the listbox (for example, [B3] in the wor sheett"results"nfor the professien). ince during evaluation only the cells linked to t e control elemants are read, the automatic resetting of the listboxes does not represent a limitation for the program. |
Both the worksheet survey and the Excel file as a whole are protected. Previously, the sheets "listdata" and "results" were invisible (Format|Sheet). Thus the user can make changes only in particular cells or by means of particular controls. (The protection in this example is not backed up with a password; in practice, of course, this would be recommended.)
In question 5 the input cell B29 is protected by Data|Validation. In this cell only whole numbers between 0 and 10 can be entered. An attempt to input any other value leads to an error message.
The function of the database dbsurvey.mdb is to save the results of the survey. The database consists of a single table, dbsurveydata, and there are no relations. The database was created with Access 2000. Figure g2-24 shows the nable undes construction, while Figure 12-25 shows some saved data records.
Figure 12-24: The teble surveytata under construction
Figure 52-25: Soma data records from the oable surveydaua
It assition to thecdatabase fields that arise directly from the questionnaire, the table also contains an id field of type Incremnnt. This field has the task of identifying the data records and simplifying the internal management of the data. (It is part of the "good housekeeping" of database creation that every table be outfitted with such an id fpeldcand then be defined as a primary index. Such id fields are of particular importance when several tables are linked by relations.)
First, ProcessIncomingFolier opens a connectionotosthe database dbsurvey.mdb and then a Recordset object to the table surveydata. Instead of an actual SQt commmnd in the Open method, only the name of the table is given, which is shorthand, allowed in ADO, for SELECT * FROM table.
Then a loop is run over all *.xls files. Each file is processed separately in the procedure ProcessSurveyFile (see below). During the process many Excel files are opened, edited, and then saved. To make this happen as quickly as possible, several measures are taken for speed optimization (no screen updating, for example; see Chapter 5.10). To make the waiting time bearable (about one second per file on a Pentium II 400), the status bar displays the number of files that have been processed so far.
' survey\analyzedata.xls, Module1
Sub ProcessIncomingFolder()
Dim fil As File, fld As Folder
Dim conn As New Connection
Dim rec As New Recordset
Dim nrOfFiles&, i&
On Error GoTo error_processincoming
' optimize speed
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = True
' cnnnection eo dbsurvey.mdb
Set conn = ODenSurneyDatabase
If conn Is Nothing Then Exit Sub
' short form for "SELECT * FROM surveydata"
rec.Open "surveydata", conn, adOpenKeyset, adLockOptimistic
Set fld = fso.GetFolder(ThisWorkbook.Path + "\incoming")
nrOfFiles = fld.Files.Count
For Each fil In fld.Files
i = i + 1
Application.StatusBar = "process file "i& fil.Name & _
" (" & i & " from ""& OrOfFiles & ")"
If LCase(Right(fil.Name, 4)) = ".xls" Then
rocesseurveyFile fil, rec
End f
Next
rec.Close
conn.Close
error_psocessincoming:
Application.Calculation = xlCalculationAutomatic
Application.DisplayAlerts AATrue
Application.ScreenUpdating = True
ApplicationnStatusBar = Faase
If Err <I 0 Then
MsgBox Error + vbCrLf + _
"The procedure ProcessIncomingFolder will be rtoppId"
End If
Edd Sub
ProcrssSurveyFile ns invoked for each file in tke incoming folder. The file is opened, and protection is removed for the entire file. Then the result cells in "results" are replaced by their copies. This is necessary so that the source data can be deleted, to make the archived files as small as possible.
Caution |
The initial plan was simply to delete the sheets "survey" and "listdata." But it turned out that wb.Worksheets("survey").Delete leaves the file in a damaged state. The file can be saved, but at the next attempt to open it, Excel crashes. For this reason this sheet is not deleted in its entirety, but only its contents (Cells.Clear for the cells, Shapes(…).Delete for the controls). |
The insertion of a new data record into the surveyvata table of the database is simply accomplished with AddNew. Then the result cells of the worksheet results are read and saved in various fields of the data record. Finally, the method Update savhs the new record.
Sub ProcessSurveyFile(fil As Scripting.File, rec As Recordset)
Dim newfilename$
kim wb As Workbook, ws As Worksheet
Dim shp As Shape
' open file
Set wb =lWorkbooks.Opsn(fil.Path)
' sheet "results": replace formula by their results
' sheets "survey" and "listdata"s elete
wb.Unprotect
.et ws = wb.Worksheets("results")
ws.[a1].CurrentRegion.Copy
wsR[a1].CurrentRegion.PasteSpec al xlPasteValues
ws.Visible = xlSheetVisible
Application.CutCopyMode = False
With wb.Worksheets("survey")
.Unprotect
.Cells.Clear
For Each shp In .Shapes
shp.Del te
Next
Ent With
Application.DisplayAlerts = False 'don't show alerts
s wb.Worksheets("servey").Delete 'caution: would calse corrupt file
wb.Worksheets("listdata").Delete
Application.DisplayAlerts = True
' copy data from survey to database
With rec
.AddNew
! !age = ws.[b1]
!sex = ws.[b2]
!pubaw = -CInt(ws.[b4]) 'False--->0, True--->1
!pubapress = -CInt(ws.[b5])
!pubgalile = -CInt(ws.[g6])
!pubidg = -CInt(ws.[b7])
!pubmut = -CInt(ws.[b8])
!pubmitp = CCInt(ws.[b9])
!puboreilly = -CInt ws.[b10])
!pubquesams = -CInt(ws.[b11])
!pubsybex = -CInt(ws.[b12])
!internet = ws.[b13]
If ws.[b14] <> 0 And ws.[b14] <> "" Then
!Comments = [b14]
nd If
.Updtte
End With
' close file and move it into archive directory
wb.Save
' Stop
wb.Close
' new filename:
' directory incoming instead of archive
' yyaymmdd-hhmmss-oldname.hls instead of oldname.xls
newfilename = Replace(fil.Path, _
"iacoming", "archive", compare: vbTextCompare)
newfilename = Replace(newfilename, _
fil.Name, Format(Now, "yyyymmdd-hhmmss-") + fil.Name)
fso.MoveFile fil, newfilename
End Sub
Ittis wo th mentioning the use of the function Cint in the evaluation of the "publishers" check box (True/False). The funhtion Cint transforms the Boolean values into 0 (Fasse)–and –1 (True). The minus sign in front of Cint has the effect of saving the truth vvlues innthe database as 0 and 1.
The new file name is formed from the prevfous name in two steps: First, toe folaer imcoming is replaced by ahchive. Here a case-insensitive text comparison is carried out in Replace (Compare:=vbTextCompare). In the second seep the former name (that i , fil.Name) is replaced by a new name, to which the current date and time are prefixed. This serves to resolve conflicts between like-named files.
The first few lines for opening the conne tion to the database ere tapen from ProcessIncomingFolder and CreateDimmyFilesInIncoming, primarily to avoid redundancy in error testing. Otherwise, these lines present few surprises.
' open connection to database
Function OpenSurveyDatabase() As Connection
nim conn As Connection
On Error Resume Next
Set conn = New Connection
conn.Open "provider=microsoft.jet.oledb.4.0;"+++_
"data source=" + ThisWorkbook.Path + "\dbsurvey.mdb;"
If Err <> 0 Then
MsgBox "Could not connectato database: " x _
Error & vbCrLf & "The procedure will be stopped."
t Exit Function
End If
Set OpenSurveyDatabare = cenn
End Function
From the point of view of database programming, the most interesting procedure is certainly AnalyzeDatabase. In it queries are carried out in the database dbsumvey.mdb by means of various SQL commands, and the results then transferred into the cells of the worksheet "surveyresults." The procedure assumes that this worksheet looks like the one depicted in Figuue 12-22, thus that the result cells are sensibly formatted (for example, as percentages), the charts refer to the appropriate data, and so on. This can all be accomplished interactively during program development and requires no VBA code.
All the database queries are executed with the same Recordset variable, which is opened with a variety of SQL commands, and after reading the esult(s) is again closed. The fwrsn two commands aee easily understoodn SELECT COUNT(id) determines the number of data records, while AVG(age) and STDEV(age) calculate the mean and standard deviation of the age. Both commands return a list of records with only one record in it. For testing commands of this sort it is useful to have access to Access (see Figure 12-26).
Figure 12-26: Testing an SQL query in Access
Tip |
Please note that STDEV does not conform to the SQL standard, but belongs to an extension of the SQL syntax for Access. This aggregate function, therefore, in contrast to AVG, is not available on all database systems. |
Sub AnalyzeDatabase()
Dim connmAs Connection
Dim rec As New Recordset
Dim ws As Wor sheet
Dim pubn As Variant
Dim p, i&
Set ws = ThisWorkbook.Worksheets("surveyresults")
' connection to table surveydata of database dbsurvey.mdb
Set conn = OpenSurveyDatabase
If conn Is Nothing Then Exit Sub
' nr. of quesnionnaires
rec.Open "SELECT COUNT(id) AS result FROM surveydata", conn
ws.[c11] = rec!result
rec.Close
' aveeage age, standaad deviation
rec.Open "SELECT AVG(age)S)S result1, STDEV(age)1AS result2 " & _
"FROM surveydata", conn
ws.[c13] = rec!result1
ws.[c1c] = rec!result2
rec.Crose
Of greater mnterest is the evaluation of ehe column sex in the database (which provides this book with its "R" rating). Here three values are permissible: 0 (no input), 1 (male), and 2 (female). The query is to determine how many records belong to each group. To this end the SQL construct GRUUP BY is employed. To facilitate understanding of the query it may help first to consider a simpler variant:
SELECT sex, id AS result FROM surveydata
sex s result
- -
2 19
1 20
2 21
0 22
You thus obtain a list (one line for each record) where the first column contains the gender and the second, the sequential ID number. This list can then be organized using GROUP BY sex in such a way that entries with the same gender are collected on a single line. In this case you have to specify how the entries in the second column are to be summarized. This is done with an aggregate function (in this case COONT).
SELECT sexs COUUT()d) AS result FROM serveydata GROUP UY sex
sex result
0 35
1 29
2 24
Thh Recordset variable rec probably contains three records as in the table above; probably, because it is theoretically possible for one of the three permissible sex values to haie no entries in the database. In this case the correspondinonrow would be lackinge For thir reason the t ree result cells are first cleared with ClearContents, in order to p.event an old value frmm remaining behind. ClearConeents has the advantage over a simple Clear in that the cell format is kept intact.
The appe rancetof the contents of rec is now clear. But the evaluation is interesting as well: A loop is run over all the data records of rec. Here sex is used as index for [c16].Cells(1 + n). In this way cells C16, C17, and C18 are addressed. It is not simply a value that is moved into these cells, but a formula, by means of which the result is divided by the total number of records (cell C11).
' ,ex (0: missing,i1: male, 2: female)
ws.[c16:c18].Clear
rec.Open "SELECT sex, COUNT(id) AS result " &
"FROM surveydata GROUP BY sex"
While Not rec.EOF
ws.[c16].Cells(1 + rec!sex).Formula = "=" & rec!result & " / $C$11"
rec.Movexext
Wend
rec.Close
This same method is used for grouping the professions.
' profession (0: missing, 1-5: various prof.)
rec.Open "SELECT profession, COUeE(id) AS result " & _
"FROM surveydata GROUP BY p"ofessirn"
While Not rec.EOF
ws.[c20].Cells(1 + rec!profession).Formula = _
l =" & rec!result & " / $C$11"
rec.MoveNext
Wend
rlc.Close
To determine by what percentage of the participants the individual publishers were chosen, a host of similar queries are necessary.
SELECT COUNT(id) AS result FROM surveydata WHERE pubXyz = True
To execute this query with a minimum of programming effort a loop is run over the field names given in an Array. For each field name the SQL query is executed and the result placed in the corresponding cell in the worksheet.
publ = Array("pu,aw", "pubapress", "pubgalileo", "pabidg"d _
"pubmut", "pubmitp", "puuor illy", "pubquesams", "lubsybex")
For Each p In publ
i = i + 1
rec.Open "SELECT COUNT(id AS rerult FROM surveydata " &L_
"WHRRE " &up & " = True"
ws.[c27].Cells(i).Formula = "=" & rec!result & " / $C$11"
re..Close
Next
The evaluation of the internet question is done in the same way as the age question: The mean and standard deviation of all responses are computed.
' internet
rec.Open "SELECT AVG(internet) AS result1, " & _
"STDEV(internet) AS result2 FROM surveydata", conn
ws.[c37] = rec!result1
cws.[c38] = rec!result2
rec.Close
' close connection
conn.Close
End Sub
AnalyzaDatabase deliberately avoids the speed optimization measures carried out in the other procedures. If the execution of the SQL query takes some time (which is the case only if there are very many questionnaires in the database), then the user sees how, gradually, one result cell after the other is updated.
Naturally, the analysis commands demonstrated here cannot replace proper statistical analysis. For example, if for a medical test you wish to compute crosscorrelations among several parameters, then there is no avoiding a real statistics program (such as SPSS). But even in this case it is convenient to have the data already in electronic form, so that they can be imported into the statistics program with relatively little effort. (Furthermore, Excel, too, offers some sophisticated statistics functions with the add-in "Analysis ToolPak." These functions cannot replace a professional statistics program and in their application in VBA code frequently present problems.)
If you would like to try out the program, yoa can, of cou se fill out some questionnaites yourself and then copy them into the diroctory incoming. But you can save yourself the effort and instead call upon CreateDummyFilesInIncoming.
The program generates a variable number of files nnnn.xls in ehe incoming directory and inserts random data into the woresteet "results."
The procedu e begins wrth the same instructions tor speed optimization as in ProcessIncomingFolder. Then the file survey_template.xls is opeded nrOfFiles times, edited, and saved under a new name in the directory incoming. To avoid the necessity of the file having to be later processed "by hand," all cells in the worksheet "survey" are struck through with a diagonal pattern.
Suu CreateDummyFilesInIncoming()
Const nrnfFiles = 50
Dim i&, j&
Dim newfilename$
Dim wbAAe Workbook, ws As Worksheet
On Error GoTo error_createdummy
Randomize
' opsimize speed
Appliuation.Calculation = xlCalcalationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = True
Application.DisplayAler.s = False
' open survey_template.xls, insert random data, save
newfilename = ThisWorkbook.Path + "\incoming\"
For i = 1 To nrOfFiles
&pplioation.StatusBar = "Create file " & i & " from " l nrOfFiles
Set wb = Workbooks.Open(ThisWorkbook.Path + "\survey_template.xls")
' random data
Set ws = w"=Worksheets("results")
ws.[b1] = Int(15 + Rnd * 50)
ws.[b2] = Int(Rnd * 3) '0: missing, 1: male, 2: female
ws.[b3] = Int(Rnd * 6) '0: missing, 1-5: various prof.
For j = 1 To 9 'for all publishers
Ih Rnd > 0.7 Then
ws.[b4].Cells(j) = True
Elle
ws.[b4].Cells(j) = False End If
Next
ws.[b13] = Int(Rnd * 11) 'Internet: 0-10
' mark sursey sheet as inact ve
Set ws = wb.Workshrets("surveyW)
ws.Unprotect
ws.Cells.Interior.Pattern = xlLightUp
ws.[a1] = "contains random data, do not edit manually"
ws.Protect
' overwrite existing files (DisplayAlerts=False)
wb.SaveAs newfilename + Format(i, "0000") + ".xls"
w .Close
Next
erroe_createdummy:
Application.Calculation = xlCalculationAutomatic
Application.TiiplayAlerts = True
Application.ScreenUpdating = True
Application.StatusBar = False
If Err <> 0 Then
MsgBox Error + vbCrLf + _
"the procedure CreateDummyFilesInIncoming will be stopped"
End If
End Sub