Common pitfalls in data collection
Understand why it’s important to select the right data
In the realm of measurement, various factors can unintentionally impact data selection. Some of these factors are faults in our logic, known as fallacies, while others are faults in how we consider data and evidence, known as biases.1 2 All of them are common, human pitfalls that we can work against if we’re aware of them.
This section explores some common fallacies and biases we might engage with when we work with data. To guard against any of these pitfalls, be aware of why you’re selecting a particular dataset, document your choices and the logic behind them, and use a variety of data types and sources to create a check and balance against one type or source dominating your measurement instrument. Common fallacies and biases include:
- The McNamara fallacy
- Confirmation bias
- Findability bias
- The magic survey number
The McNamara fallacy
The pitfall: Measuring only what is easy to quantify, while ignoring harder-to-measure but equally important factors.3
Also known as the quantitative fallacy, the McNamara fallacy is named for Robert McNamara, U.S. Secretary of Defense from 1961 to 1968, who believed that quantitative, “hard” data was the only concrete basis for decision-making. For this reason, he established the number of enemy combatants killed as the primary metric by which the U.S. evaluated success in Vietnam. The metric did not produce success in the conflict; instead, blind pursuit of body counts led to unnecessary, arbitrary, and cruel actions.
How to avoid this pitfall: Ensure that you’re not making decisions based solely on quantitative metrics. Evaluate each metric’s accuracy in helping you understand the core problem or goal. If data is easy to count, that’s just what it is: easy to count. Ease of counting doesn’t mean the data is significant, or even marginally useful, to your purposes.
Confirmation bias
The pitfall: The tendency to believe or give more weight to data that confirms existing beliefs and methodologies, over data that challenges those existing beliefs and methodologies. Confirmation bias “…connotes the seeking or interpreting of evidence in ways that are partial to existing beliefs, expectations, or a hypothesis in hand.”4
It is endemic to human decision-making; we all make decisions based on confirmation bias. We do this because finding data that supports our beliefs means we don’t necessarily need to change anything about how we think or what we’re doing. Resisting change, however, inhibits your understanding of complex spaces. To truly understand the effects of your intervention, you can’t just look at the data you agree with.
How to avoid this pitfall: Use a wide variety of source data, preferably across qualitative, quantitative, and historical dataset types, to help build an accurate picture of your impact. Once you’ve gathered and interpreted your data and published your measurements, you can continue to combat confirmation bias by making your methodology and datasets transparent and available to others, and by inviting others to question them.
Findability bias 5
The pitfall: Favoring and selecting data that is easy to find, while ignoring or underestimating information that is harder to access or retrieve. This pitfall specifically refers to research on the internet. One of the biggest drawbacks to data use is that you can only find what’s there. A famous but unattributable quote holds that “history belongs to the victors.” This is somewhat true, but could more accurately be framed as “history belongs to the people who record things, and whose records remain findable and readable by later humans.” We only find what’s there, and what we can actually use, given our skill sets.
How to avoid this pitfall: Recognize when you’re limiting your search for data due to its ease of availability. Like the pitfalls above, the key to recognizing and working against findability bias is awareness of your actions. If you only have the resources to perform simple, general searches for data on your intervention, it’s important to note those limitations in your project documentation. It doesn’t necessarily mean that your measurement data sets are all wrong; it just means that your measurement work will be limited by the scope of your data research. By documenting that limitation, however, you set future researchers up to evaluate your work and build from it.
The magic survey number 6
The pitfall: Fixating on achieving a certain threshold of, or answers to, survey responses, without context about what the number is, why it matters, or what data it’s based on.
Surveys are the federal government’s default customer research tool. When you start collecting data for your measurement project, you may want to administer a survey; you’ll also likely unearth relevant surveys previously conducted by others. These can be valuable resources, if you know how to interpret the results.
Survey data is frequently misused. The veneer of mathematical precision obscures a lot of human choices and biases. Surveys can measure quantities (like the U.S. population as reported by the Census Bureau) and/or qualitative concepts (which agencies are the best places to work, according to the Federal Employee Viewpoint Survey). In either case, the outcomes depend heavily on which questions the survey designers ask, how respondents interpret those questions, the sampling plans researchers devise, and the statistical methods analysts select to tally up the results. Surveys look simple to the untrained eye, but a world of human complexity lurks just beneath the surface.
So what are surveys for?
The bottom line is that surveys collect a thin slice of information from a large number of people. Each respondent answers the same set of questions, so the response data is structured and well-suited to statistical analysis. That’s one reason they’re popular in the federal government: graphed survey results resemble the quantitative key performance indicators (KPIs) that leaders are accustomed to consuming.
In other words, surveys turn human concepts like satisfaction and trust into columns of numbers. This carries benefits and risks. Numerical sentiment data is a straightforward way to track changes in sentiment over time. Running a regression analysis on this numerical data can identify patterns and relationships to investigate in other, unstructured data sets. It works the other way around, too; if you sense a trend in your interview transcripts, you can use regression analysis on related survey results to validate it.
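To make the regression idea concrete, here is a minimal sketch in Python using entirely hypothetical data: imagine a survey that recorded each respondent’s hours of training alongside a 1–5 satisfaction rating. An ordinary least-squares fit estimates how satisfaction moves with training. The variable names and numbers below are illustrative, not drawn from any real survey.

```python
# Hypothetical survey data: each respondent's hours of training received,
# paired with their satisfaction score on a 1-5 Likert scale.
training_hours = [1, 2, 2, 3, 4, 4, 5, 6]
satisfaction = [2, 2, 3, 3, 3, 4, 4, 5]

# Ordinary least-squares fit: satisfaction ~ slope * hours + intercept.
n = len(training_hours)
mean_x = sum(training_hours) / n
mean_y = sum(satisfaction) / n
slope = (
    sum((x - mean_x) * (y - mean_y)
        for x, y in zip(training_hours, satisfaction))
    / sum((x - mean_x) ** 2 for x in training_hours)
)
intercept = mean_y - slope * mean_x

print(f"satisfaction = {slope:.2f} * hours + {intercept:.2f}")
```

A positive slope here would suggest that more training is associated with higher satisfaction; whether that relationship is meaningful, rather than an artifact of question wording or sampling, is exactly the kind of judgment the surrounding text warns about.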
How to avoid this pitfall: Recognize the possibility of this pitfall if you feel pressure from yourself, peers, or leaders to gather a certain number or type of responses. Work with survey experts to design your survey, particularly if you need to gather information from a statistically significant number of people, since those experts can run the required mathematical analysis for you.
If, on the other hand, you can survey a small group and then follow up with questions about their answers, survey experts may not be necessary. As with the other pitfalls, avoiding the magic survey number depends on rigorously evaluating your need for, and use of, the data you’re gathering.
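If you do need a statistically significant sample, the standard textbook formula for estimating a proportion from a simple random sample gives a rough sense of what those experts will calculate. This sketch assumes a large population and a 95% confidence level (z ≈ 1.96); the function name is ours for illustration, not from any particular library.

```python
import math

def required_sample_size(margin_of_error: float, confidence_z: float = 1.96,
                         proportion: float = 0.5) -> int:
    """Minimum responses needed to estimate a population proportion within
    the given margin of error (simple random sample, large population).
    proportion=0.5 is the most conservative (largest-sample) assumption."""
    n = (confidence_z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    return math.ceil(n)

# A ±5% margin of error at 95% confidence:
print(required_sample_size(0.05))  # prints 385
```

Note that this formula says nothing about whether 385 responses are *meaningful*; it only bounds the statistical uncertainty. Fixating on hitting that number without asking what the survey measures is precisely the magic-survey-number pitfall.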
Conclusion
This is by no means a comprehensive list of pitfalls in data selection, but is merely a primer to some of the more common types. The takeaway here is that having completely perfect data is impossible, but it is possible to be self-aware in how and why you’ve selected the data you have, to document your choices, and to use a variety of data types and sources to check and balance your selections.
Footnotes
1. Simundić AM. Bias in research. Biochem Med (Zagreb). 2013;23(1):12-15. doi: 10.11613/bm.2013.003. PMID: 23457761; PMCID: PMC3900086.
2. Bias is colloquially defined as any tendency that limits impartial consideration of a question or issue. In academic research, bias refers to a type of systematic error that can distort measurements and/or affect investigations and their results. It is important to distinguish a systematic error, such as bias, from random error. Random error occurs due to the natural fluctuation in the accuracy of any measurement device, the innate differences between humans (both investigators and subjects), and pure chance. Random errors can occur at any point and are more difficult to control. Systematic errors occur at one or multiple points during the research process, including the study design, data collection, statistical analysis, interpretation of results, and publication process. Popovic A, Huecker MR. Study Bias. [Updated 2023 Jun 20]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan.
3. Carmody JB. On Residency Selection and the Quantitative Fallacy. J Grad Med Educ. 2019 Aug;11(4):420-421. doi: 10.4300/JGME-D-19-00453.1. PMID: 31440336; PMCID: PMC6699544.
4. Nickerson R. Confirmation Bias: A Ubiquitous Phenomenon in Many Guises. Review of General Psychology. 1998;2(2):175-220. doi: 10.1037/1089-2680.2.2.175.
5. Jacob EK, Loehrlein A. Information architecture. Ann. Rev. Info. Sci. Tech. 2009;43:1-64. https://doi.org/10.1002/aris.2009.1440430110
6. Petway K. OCE Liaison to the Federal Acquisition Service (FAS). Program Analyst, Voice of the Customer team, Office of Customer Experience, General Services Administration. 2023.