# Steps for analyzing quantitative data

1.  Run descriptive statistics. Descriptive statistics include frequencies, percentages,
median and mean values.
Frequencies and percentages
For noncoded responses (e.g., value in local currency for monthly income or
minutes to nearest water source):
o  What were the maximum and minimum values? Any values that do not
seem feasible should be cross-checked with the data included in the
questionnaires (refer to Data Entry and Cleaning).
o  What is the spread of these responses? Are the responses clustered in any
way? What does this tell us about the target population?
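
The checks on noncoded responses can be sketched in a few lines of Python. This is only an illustration: the data, the variable names and the feasibility limits are hypothetical assumptions, and a statistical package would serve equally well.

```python
import statistics

# Hypothetical data: minutes to the nearest water source, one value per household.
minutes_to_water = [5, 10, 10, 12, 15, 20, 25, 30, 45, 480]

# Assumed plausibility limits; set these from local knowledge of the survey area.
FEASIBLE_MIN, FEASIBLE_MAX = 0, 240

lowest = min(minutes_to_water)
highest = max(minutes_to_water)

# Quartiles describe the spread without being dominated by extreme values.
q1, q2, q3 = statistics.quantiles(minutes_to_water, n=4)

# Values outside the assumed limits are candidates for cross-checking
# against the original questionnaires.
to_cross_check = [v for v in minutes_to_water if not FEASIBLE_MIN <= v <= FEASIBLE_MAX]

print(f"min={lowest}, max={highest}")
print(f"quartiles: {q1}, {q2}, {q3}")
print("cross-check:", to_cross_check)
```

A value flagged here is not necessarily wrong; it is only a prompt to return to the questionnaire.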
For coded responses (e.g., 1 = less than 15 minutes, 2 = 15 to 30 minutes, 3 = 30
minutes or more):
o  What were the most common responses to questions with coded
responses? What were the least common responses?
o  Was the frequency of any of these responses unexpected?
o  What proportion of respondents cited "other" for these questions? What
were the other responses they provided in addition to the coded list?
  If many of the responses included in the "other" data have the same meaning (aside
from slight variations in wording), create additional responses or categories with
these data and include them in your results.
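
As a rough sketch, frequencies and percentages for a coded question, and a first pass at grouping write-in answers, might look like the following. The question, codes, answers and the "three or more" rule are all hypothetical assumptions made for illustration.

```python
from collections import Counter

# Hypothetical coded question: main source of drinking water.
CODES = {1: "piped water", 2: "well", 3: "other"}
responses = [1, 2, 2, 3, 2, 3, 1, 2, 3, 3]

counts = Counter(responses)
total = len(responses)

# Frequency and percentage for each code, most common first.
frequency_table = [
    (CODES[code], n, round(100 * n / total, 1)) for code, n in counts.most_common()
]
for label, n, pct in frequency_table:
    print(f"{label}: {n} ({pct}%)")

# Hypothetical free-text answers written in by respondents who chose the residual code.
other_text = ["Rain water", "rainwater", "rain-water", "river"]


def normalise(text):
    """Collapse slight variations in wording (case, spaces, hyphens)."""
    return text.lower().replace(" ", "").replace("-", "")


other_counts = Counter(normalise(t) for t in other_text)

# Assumed rule of thumb: an answer given 3 or more times becomes its own category.
new_categories = [answer for answer, n in other_counts.items() if n >= 3]
print("candidate new categories:", new_categories)
```

Automatic matching only catches trivial variations; answers that mean the same thing but are worded differently still need to be grouped by hand.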
Missing data
If 30 percent or more of the questionnaires do not have a response for one of the
questions, then the information for that question may give you a false understanding
of the situation.
If there is a high proportion of missing data, do you still have enough data to
accurately represent the situation? Consider not including results for indicators
with a high proportion of missing data.
What could explain this high percentage of missing data? Consider any problems
encountered during fieldwork as well.
In future surveys, could questions be asked in a different way to reduce missing
data?
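
The 30 percent guideline can be checked question by question. The sketch below assumes each questionnaire is stored as a dictionary and that an unanswered question is recorded as `None`; both are assumptions for illustration, as are the field names.

```python
# Hypothetical records; None marks a question left unanswered.
questionnaires = [
    {"income": 120, "minutes_to_water": 10},
    {"income": None, "minutes_to_water": 25},
    {"income": None, "minutes_to_water": 15},
    {"income": 95, "minutes_to_water": None},
    {"income": None, "minutes_to_water": 30},
]

MISSING_THRESHOLD = 0.30  # the 30 percent guideline described above


def missing_share(records, question):
    """Return the proportion of records with no response to the question."""
    missing = sum(1 for record in records if record.get(question) is None)
    return missing / len(records)


# Collect the questions whose share of missing data reaches the threshold.
flagged = {}
for question in ("income", "minutes_to_water"):
    share = missing_share(questionnaires, question)
    if share >= MISSING_THRESHOLD:
        flagged[question] = share

for question, share in flagged.items():
    print(f"{question}: {share:.0%} missing, interpret with caution")
```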
Mean and median values
o  What were the mean values, or average values, for the survey population?
o  Also determine the median value (i.e., the middle value when all
responses are ordered from lowest to highest).

o  Are the mean and median values quite different? If so, this suggests that
the data are skewed, for example by clusters of values or a few very high
or very low values within the spread of data. What are possible reasons
for this?
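
To see how the mean and median can diverge, consider this small sketch with hypothetical monthly incomes in which one household earns far more than the rest.

```python
import statistics

# Hypothetical monthly household incomes in local currency.
monthly_income = [40, 45, 50, 55, 60, 65, 70, 900]

mean_income = statistics.mean(monthly_income)
median_income = statistics.median(monthly_income)

print(f"mean={mean_income}, median={median_income}")

# A large gap between the two signals that a few values are pulling the mean.
gap = mean_income - median_income
print(f"mean exceeds median by {gap}")
```

Here the single high income pulls the mean well above what most households earn, while the median stays close to the typical value.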
2.  Run inferential statistics. Inferential statistics include comparisons between
subgroups and tests for statistical significance in results.
Compare key subgroups. Common subgroups include wealth groups, as well as
Create the variables required by your analysis plan. For example, you may
need to sum the amounts received from different sources of income to
calculate the total monthly household income in order to create wealth
groups. Indicate that these variables are "created" in their names (e.g.,
including "c" for "created" in "c_income").
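
A created variable such as c_income could be built as sketched below. The income source names and household values are hypothetical, and splitting households at the median into two wealth groups is only an assumption for illustration; your analysis plan should define the actual groups.

```python
import statistics

# Hypothetical household records with income by source (local currency per month).
households = [
    {"id": 1, "income_farming": 50, "income_wages": 0, "income_remittances": 20},
    {"id": 2, "income_farming": 100, "income_wages": 50, "income_remittances": 0},
    {"id": 3, "income_farming": 40, "income_wages": 0, "income_remittances": 0},
    {"id": 4, "income_farming": 0, "income_wages": 250, "income_remittances": 50},
    {"id": 5, "income_farming": 60, "income_wages": 30, "income_remittances": 0},
    {"id": 6, "income_farming": 10, "income_wages": 200, "income_remittances": 0},
]

INCOME_SOURCES = ("income_farming", "income_wages", "income_remittances")

# Created variable: total monthly household income, prefixed with "c_".
for household in households:
    household["c_income"] = sum(household[source] for source in INCOME_SOURCES)

# Assumed grouping rule: at or below the median is "low", above it is "high".
median_income = statistics.median(h["c_income"] for h in households)
for household in households:
    is_low = household["c_income"] <= median_income
    household["c_wealth_group"] = "low" if is_low else "high"

for household in households:
    print(household["id"], household["c_income"], household["c_wealth_group"])
```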
Run frequencies and percentages for each subgroup. What could account for
differences in minimum and maximum values or percentages between
groups? What could account for similarities? Are the differences in
percentages statistically significant?
Identify mean values for each subgroup. Again, look for significant
differences between groups.
When calculating means, the characteristics used to identify your subgroups (e.g.,
low-wealth group, female-headed households) are considered independent variables
and the variables you would like to compare (e.g., monthly income, minutes to
nearest drinking water source) are considered dependent variables.
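
The relationship between independent and dependent variables can be illustrated with a short sketch that computes the mean of a dependent variable within each subgroup. The records and field names are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical records: wealth group is the independent variable,
# minutes to the nearest drinking water source is the dependent variable.
records = [
    {"c_wealth_group": "low", "minutes_to_water": 40},
    {"c_wealth_group": "low", "minutes_to_water": 35},
    {"c_wealth_group": "low", "minutes_to_water": 45},
    {"c_wealth_group": "high", "minutes_to_water": 10},
    {"c_wealth_group": "high", "minutes_to_water": 20},
    {"c_wealth_group": "high", "minutes_to_water": 15},
]


def group_means(rows, independent, dependent):
    """Return the mean of the dependent variable for each subgroup."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[independent]].append(row[dependent])
    return {group: statistics.mean(values) for group, values in groups.items()}


means = group_means(records, "c_wealth_group", "minutes_to_water")
for group, mean_value in means.items():
    print(f"{group}: mean minutes to water = {mean_value}")
```

A difference in subgroup means does not show on its own that the difference is statistically significant; that requires a separate test.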
  If comparisons between subgroups were not statistically significant (e.g., chi-squared
tests had p-values of more than 0.05), state that the results were not statistically
significant. That way you will not receive requests for significance values.
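
For readers who want to see what a chi-squared test involves, here is a minimal sketch for a 2x2 table of counts. It computes Pearson's statistic without a continuity correction, and the p-value formula used is valid only for one degree of freedom (a 2x2 table). The counts are hypothetical; in practice a statistical package would normally run this test.

```python
import math

# Hypothetical 2x2 table of counts.
# Rows: low-wealth and high-wealth households.
# Columns: water source within 30 minutes (yes, no).
observed = [[30, 20],
            [15, 35]]


def chi_squared_2x2(table):
    """Return Pearson's chi-squared statistic and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    # With 1 degree of freedom, the p-value equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi_squared_2x2(observed)
significant = p_value <= 0.05
print(f"chi-squared={statistic:.2f}, p={p_value:.4f}, significant={significant}")
```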

3.  Revisit your analysis plan. Have these initial results raised additional questions?
Can you answer these questions with your existing quantitative data? If so, run
additional frequencies, comparisons or tests to answer these questions. It is likely
that the initial quantitative results also have raised questions that cannot be