Weekly Development Links #7

1. 11 years later: Experimental evidence on scaling up education reforms in Kenya (TL;DR: the gov’t didn’t adopt it well)

(This paper was published in Journal of Public Econ 11 years after the project started and 5 years after the first submission!) “New teachers offered a fixed-term contract by an international NGO significantly raised student test scores, while teachers offered identical contracts by the Kenyan government produced zero impact. Observable differences in teacher characteristics explain little of this gap. Instead, data suggests that bureaucratic and political opposition to the contract reform led to implementation delays and a differential interpretation of identical contract terms. Additionally, contract features that produced larger learning gains in both the NGO and government treatment arms were not adopted by the government outside of the experimental sample.”

2. Argument for reporting the “total causal effect”

  • Total causal effect (TCE) = weighted average of the intent-to-treat effect (ITT) and the spillover effect on the non-treated (SNT); a toy calculation follows this list
  • Importance: “RCTs that fail to account for spillovers can produce biased estimates of intention-to-treat effects, while finding meaningful treatment effects but failing to observe deleterious spillovers can lead to misconstrued policy conclusions. Therefore, reporting the TCE is as important as the ITT, if not more important in many cases: if the program caused a bunch of people to escape poverty while others to fall into it, leaving the overall poverty rate unchanged (TCE=0), you’d have to argue much harder to convince your audience that your program is a success because the ITT is large and positive.”
  • Context: a recent paper by Zeitlin and McIntosh comparing cash and a USAID health + nutrition program in Rwanda. From their blog post: “In our own work the point estimates on village-level impacts are consistent with negative spillovers of the large transfer on some outcomes (they are also consistent with Gikuriro’s village-level health and nutrition trainings having improved health knowledge in the overall population). Cash may look less good as one thinks of welfare impacts on a more broadly defined population. Donors weighing cash-vs-kind decisions will need to decide how much weight to put on non-targeted populations, and to consider the accumulated evidence on external consequences.”
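To make the definition concrete, here is a toy calculation in Python. It assumes the weights in the weighted average are simply the treated and non-treated shares of the study population, and all the numbers are made up to mirror the poverty example in the quote above.

```python
# Toy TCE calculation: share-weighted average of the ITT and the
# spillover effect on the non-treated (SNT). All numbers are invented.

p_treated = 0.5   # hypothetical share of the population assigned to treatment
itt = 0.10        # hypothetical ITT: treated poverty-exit rate up 10 pp
snt = -0.10      # hypothetical SNT: non-treated rate down 10 pp (spillover)

tce = p_treated * itt + (1 - p_treated) * snt
print(f"TCE = {tce:+.2f}")  # +0.00: a large, positive ITT but no overall change
```

With these made-up numbers the ITT alone looks like a clear success, while the TCE is zero: exactly the scenario the quote warns against.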

3. Why don’t people work less when you give them cash?

Excellent VoxDev post by the authors of a new paper, listing many different mechanisms and looking at how the answer changes by type of transfer (e.g., gov’t conditional and unconditional transfers, remittances, etc.)

BONUS: More gender equality = greater differences in preferences on values like altruism, patience, or trust (ft. an interesting map)

Falk & Hermle 2018

Causal Inference: The Mixtape

“Identifying causal effects involves assumptions, but it also requires a particular kind of belief about the work of scientists. Credible and valuable research requires that we believe that it is more important to do our work correctly than to try and achieve a certain outcome (e.g., confirmation bias, statistical significance, stars). The foundations of scientific knowledge are scientific methodologies. Science does not collect evidence in order to prove what we want to be true or what people want others to believe. That is a form of propaganda, not science. Rather, scientific methodologies are devices for forming a particular kind of belief. Scientific methodologies allow us to accept unexpected, and sometimes, undesirable answers. They are process oriented, not outcome oriented. And without these values, causal methodologies are also not credible.”

Causal Inference: The Mixtape by Scott Cunningham, associate professor of economics at Baylor University (oh and there’s an accompanying Spotify playlist)

Weekly Development Links #2

This is part 2 of me taking over IDinsight’s internal development link round-up.

1. This week in gender & econ

2. Two papers on p-hacking or bad reporting in econ papers

3. Mapping trade routes

Tilman Graff shared some really cool visualizations of trade routes, aid, and infrastructure in several African countries. They were created as part of his MPhil thesis.

Grounded Theory, Part 1: What is it?


I recently read Brené Brown’s Daring Greatly. The book presents Brown’s research, but it can feel more like a personal guidebook to tackling issues of vulnerability and shame.

Because the book has a conversational feel, it’s hard to tell how much of it is based in research and how much in Brown’s individual experiences. She weaves in personal stories frequently, often to demonstrate a prickly emotional experience that was common across her interviews. But when I reached the end of the book, I wanted to know how she drew these theories from the data. I’ve only worked sparingly with qualitative data: how does one “code” qualitative data? How do you analyze it without bringing in all sorts of personal biases? How do you determine its replicability, internal and external validity, and generalizability?

Ingeniously, Brown grounds the book in her research methods with a final chapter on grounded theory methodology. Her summary (also found online here) was a good introduction to how grounded theory works and feels. But I still didn’t “get” it.

So I did some research.

Grounded Theory

Brown quotes 20th century Spanish poet Antonio Machado at the top of her research methods page:

“Traveler, there is no path. / The path must be forged as you walk.”

This sentiment imbued the rest of the grounded theory (GT) research I did, which seemed bizarre to a quant-trained hopeful economist. I’m used to pre-analysis plans, testing carefully theorized models, and starting with a narrow question.

Grounded theory is about big questions and a spirit of letting the data talk to you.

Introduced by Barney Glaser and Anselm Strauss in 1967, GT is a general research methodology for approaching any kind of research, whether qual- or quant-focused. When using GT, everything is data – your personal experiences, interviews, mainstream media, etc. Anything you consume can count, as long as you take field notes.

Writing field notes is one of the key steps of GT: coding those notes (or the data themselves – I’m still a little blurry on this) line-by-line is another. The “codes” are recurring themes or ideas that you see emerging from the data. It is a very iterative methodology: you collect initial data, take field notes, code the notes/data, compile them into memos summarizing your thoughts, collect more data based on your first learnings, code those, compile more memos, collect more data…

Throughout the whole process, you are theorizing, trying to find emergent themes, ideas, and patterns, and you should actively seek new data based on your working theories. You take a LOT of written notes – and it sounds like in the Glaserian tradition, you’re supposed to do everything by hand. (Or is it just not using any algorithms?)
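To make the mechanics a little more concrete, here is a loose, purely illustrative sketch in Python of what the line-by-line coding step could look like if you did it with software (which, as noted above, Glaserian GT discourages). The field note, the codes, and the keyword matching are all invented for this sketch; in real GT the codes emerge from repeated reading, not from a hard-coded keyword list.

```python
# Purely illustrative sketch of line-by-line coding: each line of a
# field note gets tagged with zero or more "codes" (recurring themes).
# The note, codes, and keyword rules below are all invented.

from collections import defaultdict

field_note = """\
She said asking for help felt like admitting failure.
He avoids talking about money with his family.
Several interviewees described hiding struggles from coworkers."""

# In real GT the codes *emerge* from the data; they are hard-coded
# here only so the loop has something to match against.
code_keywords = {
    "vulnerability": ["asking for help", "admitting"],
    "concealment": ["avoids", "hiding"],
}

codes_to_incidents = defaultdict(list)
for line in field_note.splitlines():
    for code, keywords in code_keywords.items():
        if any(kw in line for kw in keywords):
            codes_to_incidents[code].append(line)  # one coded "incident" per line

# A memo might then summarize which codes recur (and co-occur).
for code, incidents in codes_to_incidents.items():
    print(f"{code}: {len(incidents)} incident(s)")
```

The point of the sketch is just the shape of the loop: read each line, ask which emerging themes it touches, and accumulate coded incidents that later get compared and summarized in memos.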

Brown describes the data she collected and her coding methodology:

“In addition to the 1,280 participant interviews, I analyzed field notes that I had taken on sensitizing literature, conversations with content experts, and field notes from my meetings with graduate students who conducted participant interviews and assisted with the literature analysis. Additionally, I recorded and coded field notes on the experience of taking approximately 400 master and doctoral social-worker students through my graduate course on shame, vulnerability, and empathy, and training an estimated 15,000 mental health and addiction professionals.

I also coded over 3,500 pieces of secondary data. These include clinical case studies and case notes, letters, and journal pages. In total, I coded approximately 11,000 incidents (phrases and sentences from the original field notes) using the constant comparative method (line-by-line analysis). I did all of this coding manually, as software is not recommended in Glaserian-grounded theory.” [emphasis mine]

The ultimate goal is to have main concepts and categories emerge from the data, “grounded” in the data, that explain what main problem your subjects are experiencing and how they are trying to solve that problem. For example, Brown’s work centers on how people seek connection through vulnerability and try to deal with shame in various healthy and unhealthy ways. She started with this big idea of connection and began asking people what that meant, what issues there were around it, etc., until a theory started to arise from those conversations.

You’re not supposed to have preexisting hypotheses, or even do a literature review to frame specific questions, because that will bias how you approach the data. You’re supposed to remain open and let the data “speak to you.” My first instinct is that it’s impossible to be totally unbiased in how you collect data: invariably, your personal experience and background determine how you read it. This makes me question – how can this research be replicable? How can a “finding” be legitimate as research?

My training thus far has focused on quantitative data, so I’m primed to prefer research that follows the traditional scientific method. Hypothesize, collect data, analyze, rehypothesize, repeat. This kind of research is judged on:

  • Replicability: If someone else followed your protocol, would they get the same result?
  • Internal validity: How consistent, thorough, and rigorous is the research design?
  • External validity: Does the learning apply in other similar populations?
  • Generalizability: Do the results from a sample of the population also apply to the population as a whole?

GT, on the other hand, is judged by:

  • Fit: How closely do concepts fit the incidents (data points)? (aka how “grounded” is the research in the data?)
  • Relevance: Does the research deal with the real concerns of participants and is it of non-academic interest?
  • Workability: Does the developed theory explain how the problem is being solved, accounting for variation?
  • Modifiability: Can the theory be altered as new relevant data are compared to existing data?

I also read (on Wikipedia, admittedly) that Glaser & Strauss see GT as never “right” or “wrong.” A theory only has more or less fit, relevance, workability, or modifiability. And from the way Brown describes it, I got the impression that GT should be grounded in one specific researcher’s approach:

“I collected all of the data with the exception of 215 participant interviews that were conducted by graduate social-work students working under my direction. In order to ensure inter-rater reliability, I trained all research assistants and I coded and analyzed all of their field notes.”

I’m still a bit confused by Brown’s description here. I didn’t know what inter-rater reliability was, so I had assumed it meant that the study needed internal consistency in who was doing the coding. But when I looked it up online, it appears to mean the consistency with which different researchers code the same data. So I’m not sure how having one person do all of the research enables this kind of reliability. Maybe if your GT research is re-done (replicated) by an independent party?

My initial thought is that GT research sounds like it should have two authors who work in parallel but independently, with the same data. Each would develop separate theories, and at the end the study could compare the two parallel work streams to identify what both researchers found in common and where they differed. I still have a lot of questions about how this works, though.

Lingering Questions

A lot of my questions are functional. How do you actually DO grounded theory?

  • How does GT coding really work? What does “line-by-line” coding mean? Does it mean you code each sentence or literally each line of written text?
  • Do these codes ever get compiled in a database? How do you weight data sources by their expertise and quality (if you’re combining studies and interviews with average Joes, do you actively weight the studies)? Can you do essentially quantitative analysis on a dataset based on binary coding of concepts and categories? (A rough sketch of this idea follows this list.)
  • How do you “code” quantitative data? If you had a dataset of 2000 household surveys, would you code each variable for each household as part of your data? How does this functionally work?
  • If you don’t do a literature review ahead of time, couldn’t you end up replicating previous work and not actually end up contributing much to the literature?
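On the binary-coding question above, here is a rough sketch (with invented data, and assuming pandas is available) of what quantitative analysis on binary-coded qualitative data could look like: turn coded interviews into a 0/1 matrix with one row per interview and one column per code, then compute simple frequencies.

```python
# Rough sketch of the binary-coding idea: rows = interviews,
# columns = codes, cells = 1 if the code appeared. Data invented.

import pandas as pd

coded_interviews = [
    {"id": 1, "codes": {"shame", "connection"}},
    {"id": 2, "codes": {"connection"}},
    {"id": 3, "codes": {"shame", "vulnerability"}},
]

all_codes = sorted(set().union(*(ci["codes"] for ci in coded_interviews)))
matrix = pd.DataFrame(
    [[int(code in ci["codes"]) for code in all_codes] for ci in coded_interviews],
    index=[ci["id"] for ci in coded_interviews],
    columns=all_codes,
)

print(matrix)         # the binary concept-by-interview matrix
print(matrix.mean())  # share of interviews in which each code appears
```

From a matrix like this you could tabulate code frequencies or co-occurrences, though whether that still counts as grounded theory is exactly the kind of question I’m left with.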

And then I also wondered: how is it applicable in my life?

  • Is GT a respected methodology in economics? (I’d guess not.)
  • How could GT enhance quant methods in econ?
  • Has GT been used in economic studies?
  • What kinds of economic questions can GT help us answer?
  • Should I learn more about GT or learn to use it in my own research?

Coming up: Part 2, Grounded Theory & Economics

To answer some of my questions, I want to do an in-depth read of a paper from the 2005 Grounded Theory Review by Frederick S. Lee: “Grounded Theory and Heterodox Economics.” (The journal has another article from 2017 entitled “Rethinking Applied Economics by Classical Grounded Theory: An invitation to collaborate” by Olavur Christiansen that I hope to read, too.)