Monitoring database creation guidance
You can apply these guidelines and the following examples to SPSS, Excel or Access.
1. Create a variable for each question or response
The structure of the question will determine if each question requires one variable or
multiple variables in the database. Create a single variable for questions that allow
for only one answer. "Yes" or "no" questions require only a single variable in the
database. Similarly, you would create one variable for question A5 below.
A5. How far is the nearest drinking water source from your home (in minutes)?
You would enter the number of minutes in this variable. You could name this
variable "A5" or "watmin." If you are using SPSS, you may choose to use "A5" in
the name column and "watmin" in the label column.
Use a standard approach in naming your variables that can be easily understood by
team members who work with the database.
If there are multiple responses allowed for a question, as in question B16 below,
create one variable for each possible response. For question B16, create a total of five
variables: one for each possible option ("canal," "spring," "well" and "other") and a
variable to enter the "other" information specified.
B16. Where do people in your community collect water (circle all that apply)?
1. Canal
2. Spring
3. Well
4. Other (specify) __________
Each of the first four variables essentially becomes a "yes/no" variable, with "yes"
recorded for each option selected by the respondent. "Yes" should be recorded as "2"
and "no" should be recorded as "1." Each of these is a numeric variable.
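As a minimal sketch of this layout, the Python snippet below expands one respondent's answer to question B16 into five variables. The variable names (b16_canal, b16_spring, b16_well, b16_other, b16_other_text), the helper encode_b16 and the sample answer are hypothetical and chosen only for illustration. The 1 = no, 2 = yes coding follows the guidance above.

```python
# Hypothetical variable names for question B16; adapt them to your own naming standard.
B16_OPTIONS = {1: "b16_canal", 2: "b16_spring", 3: "b16_well", 4: "b16_other"}

NO, YES = 1, 2  # coding used in this guidance: 1 = no, 2 = yes


def encode_b16(circled, other_text=""):
    """Turn the circled option numbers into one yes/no variable per option."""
    row = {
        name: (YES if code in circled else NO)
        for code, name in B16_OPTIONS.items()
    }
    # Fifth variable: a string variable holding whatever was written after "other".
    row["b16_other_text"] = other_text if 4 in circled else ""
    return row


# Made-up respondent who circled "spring" and "other" and wrote in a source.
record = encode_b16({2, 4}, other_text="rainwater tank")
print(record)
```

Keeping one numeric variable per option means each option can be tabulated on its own, regardless of how many options a respondent circled.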
The variable used to record the specific "other" information will be a "string
variable," that is, a variable that contains letters instead of numbers. String
variables are not coded and can house any information provided by the respondent.
Always include an additional "other" variable (a string variable in SPSS) in the
database to capture these responses. The specific responses entered after "other" can
be used in analysis and for designing future coded responses for quantitative
2. Record the coded responses in the database
Many of the questions in the questionnaire are likely to have coded responses (e.g., 1
= canal, 2 = spring, 3 = well, 4 = other or 1 = no, 2 = yes). The data enterers will enter
the number corresponding to each response (e.g., "2" for "spring" or "2" for "yes").
For data analysis, it will be useful to have the description for each code included in
the database. In SPSS, enter each code and its description in the Values column. In Excel,
include a list of coded responses on a separate sheet to use in data analysis.
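For teams that analyze the data outside SPSS or Excel, the same idea can be sketched in Python as a small code list kept alongside the data. The names CODEBOOK, label and entered are hypothetical, and the entered values are made up. The code lists themselves are the examples given above.

```python
from collections import Counter

# Code lists taken from the examples in this guidance.
CODEBOOK = {
    "water_source": {1: "canal", 2: "spring", 3: "well", 4: "other"},
    "yes_no": {1: "no", 2: "yes"},
}


def label(code_list, code):
    """Return the description for a code, or flag a code that is not in the list."""
    return CODEBOOK[code_list].get(code, f"unlisted code {code}")


# Made-up codes as a data enterer might have typed them.
entered = [2, 1, 2, 3]

labels = [label("water_source", code) for code in entered]
counts = Counter(labels)  # frequency of each description

print(labels)
print(counts)
```

Flagging unlisted codes instead of silently dropping them makes data entry errors visible during analysis.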
3. Account for nonresponse or missing data
It is important to differentiate between a nil response and missing data. Nil is a zero
(0) value. Missing data are data that were not recorded in the questionnaire. Missing
data may occur if a question does not apply to every respondent (due to skip
rules), if respondents chose not to answer a question, or due to human error during
It is standard practice to designate "999" to represent missing data. Data enterers
can input "999" to indicate that a response was not recorded in the
questionnaire. If you are using SPSS, enter "999" in the Missing column so that SPSS
will not include these values in calculating valid responses. With an appropriate
database design, the person(s) analyzing the data will be able to identify which
respondents reported that it took "0 minutes" to reach the nearest drinking water
source and which respondents did not answer this question.
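The difference between a nil response and missing data can be illustrated with a short Python sketch. The list watmin holds made-up entries for question A5, and the names MISSING, valid and naive_mean are hypothetical. The only thing taken from the guidance is the use of 999 as the missing-data code.

```python
MISSING = 999  # missing-data code designated in this guidance

# Made-up entries for question A5 (minutes to the nearest drinking water source).
watmin = [0, 15, 999, 30, 0, 999]

# Valid responses exclude the missing-data code but keep genuine zeros.
valid = [v for v in watmin if v != MISSING]

zero_minutes = sum(1 for v in valid if v == 0)   # respondents who answered "0 minutes"
missing_count = len(watmin) - len(valid)         # respondents with no recorded answer
mean_minutes = sum(valid) / len(valid)           # average over valid responses only

# What happens if 999 is mistakenly treated as a real number of minutes.
naive_mean = sum(watmin) / len(watmin)

print(zero_minutes, missing_count, mean_minutes, naive_mean)
```

With these made-up values the average over valid responses is 11.25 minutes, while the average that wrongly includes the 999 codes is 340.5, which is why the missing-data code must be declared before analysis.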