Grounded Theory, Part 1: What is it?

Photo by Calum MacAulay on Unsplash

I recently read Brené Brown’s Daring Greatly. The book presents Brown’s research, but it can feel more like a personal guidebook to tackling issues of vulnerability and shame.

Because the book has a conversational feel, it’s hard to tell how much is based on research and how much on Brown’s individual experiences. She weaves in personal stories frequently, often to demonstrate a prickly emotional experience that was common across her interviews. But when I reached the end of the book, I wanted to know how she drew these theories from the data. I’ve only worked sparingly with qualitative data: how does one “code” qualitative data? How do you analyze it without bringing in all sorts of personal biases? How do you determine its replicability, internal and external validity, and generalizability?

Ingeniously, Brown grounds the book in her research methods with a final chapter on grounded theory methodology. Her summary (also found online here) was a good introduction to how grounded theory works and feels to use. But I still didn’t “get” it.

So I did some research.

Grounded Theory

Brown quotes 20th century Spanish poet Antonio Machado at the top of her research methods page:

“Traveler, there is no path. / The path must be forged as you walk.”

This sentiment imbued the rest of the grounded theory (GT) research I did, which seemed bizarre to a quant-trained hopeful economist. I’m used to pre-analysis plans, testing carefully theorized models, and starting with a narrow question.

Grounded theory is about big questions and a spirit of letting the data talk to you.

Founded by Barney Glaser and Anselm Strauss in 1967, GT is a general research methodology for approaching any kind of research, whether qual- or quant-focused. When using GT, everything is data – your personal experiences, interviews, mainstream media, etc. Anything you consume can count, as long as you take field notes.

Writing field notes is one of the key steps of GT: coding those notes (or the data themselves – I’m still a little blurry on this) line-by-line is another. The “codes” are recurring themes or ideas that you see emerging from the data. It is a very iterative methodology: you collect initial data, take field notes, code the notes/data, compile them into memos summarizing your thoughts, collect more data based on your first learnings, code those, compile more memos, collect more data…

Throughout the whole process, you are theorizing and trying to find emergent themes and ideas and patterns, and you should actively seek new data based on what your theories are. You take a LOT of written notes – and it sounds like in the Glaserian tradition, you’re supposed to do everything by hand. (Or is it just not using any algorithms?)
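To make that loop concrete for myself, I sketched what line-by-line coding might look like if you did allow yourself a computer. Everything here is invented for illustration (the notes, the codes, the whole codebook), and real Glaserian GT is done by hand, so treat this purely as a mental model:

```python
from collections import Counter

# Invented field-note lines, standing in for notes taken on interviews.
field_notes = [
    "She said asking for help felt like admitting failure.",
    "He avoids talking about money with his family.",
    "Asking for help was described as 'weakness' twice.",
]

# In GT you code line by line; here each note carries the codes a
# researcher might have attached after reading it (all hypothetical).
coded = {
    field_notes[0]: ["help-seeking", "fear-of-failure"],
    field_notes[1]: ["avoidance"],
    field_notes[2]: ["help-seeking", "weakness"],
}

# Constant comparison, crudely: tally how often each code recurs across
# incidents to see which concepts are starting to emerge.
code_counts = Counter(code for codes in coded.values() for code in codes)
print(code_counts.most_common())
```

A code that keeps recurring (“help-seeking” here) becomes a candidate category, and the next round of data collection would target it; that’s the iterative part.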

Brown describes the data she collected and her coding methodology:

“In addition to the 1,280 participant interviews, I analyzed field notes that I had taken on sensitizing literature, conversations with content experts, and field notes from my meetings with graduate students who conducted participant interviews and assisted with the literature analysis. Additionally, I recorded and coded field notes on the experience of taking approximately 400 master and doctoral social-worker students through my graduate course on shame, vulnerability, and empathy, and training an estimated 15,000 mental health and addiction professionals.

I also coded over 3,500 pieces of secondary data. These include clinical case studies and case notes, letters, and journal pages. In total, I coded approximately 11,000 incidents (phrases and sentences from the original field notes) using the constant comparative method (line-by-line analysis). I did all of this coding manually, as software is not recommended in Glaserian-grounded theory.” [emphasis mine]

The ultimate goal is to have main concepts and categories emerge from the data, “grounded” in the data, that explain what main problem your subjects are experiencing and how they are trying to solve it. For example, Brown’s work centers on how people seek connection through vulnerability and try to deal with shame in various healthy and unhealthy ways. She began with this big idea of connection and just started asking people what that meant, what issues there were around it, etc., until a theory started to arise from those conversations.

You’re not supposed to have preexisting hypotheses, or even do a literature review to frame specific questions, because that will bias how you approach the data. You’re supposed to remain open and let the data “speak to you.” My first instinct on this front is that it’s impossible to be totally unbiased in how you collect data. Invariably, your personal experience and background determine how you read the data. Which makes me question – how can this research be replicable? How can a “finding” be legitimate as research?

My training thus far has focused on quantitative data, so I’m primed to preference research that follows the traditional scientific method. Hypothesize, collect data, analyze, rehypothesize, repeat. This kind of research is judged on:

  • Replicability: If someone else followed your protocol, would they get the same result?
  • Internal validity: How consistent, thorough, and rigorous is the research design?
  • External validity: Does the learning apply in other similar populations?
  • Generalizability: Do the results from a sample of the population also apply to the population as a whole?

GT, on the other hand, is judged by:

  • Fit: How closely do concepts fit the incidents (data points)? (aka how “grounded” is the research in the data?)
  • Relevance: Does the research deal with the real concerns of participants and is it of non-academic interest?
  • Workability: Does the developed theory explain how the problem is being solved, accounting for variation?
  • Modifiability: Can the theory be altered as new relevant data are compared to existing data?

I also read (on Wikipedia, admittedly) that Glaser & Strauss see GT as never “right” or “wrong.” A theory only has more or less fit, relevance, workability, or modifiability. And from the way Brown describes it, I got the impression that GT should be grounded in one specific researcher’s approach:

“I collected all of the data with the exception of 215 participant interviews that were conducted by graduate social-work students working under my direction. In order to ensure inter-rater reliability, I trained all research assistants and I coded and analyzed all of their field notes.”

I’m still a bit confused by Brown’s description here. I didn’t know what inter-rater reliability was, so I had assumed it meant that the study needed to have internal consistency in who was doing the coding. But when I looked it up online, it appears to be the consistency of different researchers to code the same data in the same way. So I’m not sure how having one person do all of the research enables this kind of reliability. Maybe if your GT research is re-done (replicated) by an independent party?
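For context, inter-rater reliability is typically quantified with a statistic like Cohen’s kappa, which measures how often two coders assign the same code beyond what chance agreement would predict. Here is a toy version with two coders and invented labels:

```python
from collections import Counter

# Two coders' labels for the same ten incidents (invented data).
coder_a = ["shame", "shame", "fear", "shame", "fear",
           "shame", "fear", "shame", "shame", "fear"]
coder_b = ["shame", "shame", "shame", "shame", "fear",
           "shame", "fear", "fear", "shame", "fear"]

n = len(coder_a)
# Observed agreement: fraction of incidents coded identically.
p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Expected chance agreement, from each coder's marginal code frequencies.
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2

# Cohen's kappa: agreement above chance, rescaled to [~0, 1].
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # prints 0.58
```

A kappa near 1 means the coders apply the codebook consistently; a kappa near 0 means they agree no more often than chance, which in Brown’s setup would suggest the training of assistants didn’t transfer.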

My initial thought is that GT research sounds like it should have two authors who work in parallel but independently, with the same data. Each would develop separate theories, and at the end, the study could compare the two parallel work streams to identify what both researchers found in common and where they differed. I still have a lot of questions about how this works, though.

Lingering Questions

A lot of my questions are functional. How do you actually DO grounded theory?

  • How does GT coding really work? What does “line-by-line” coding mean? Does it mean you code each sentence or literally each line of written text?
  • Do these ever get compiled in a database? How do you weight data sources by their expertise and quality (if you’re combining studies and interviews with average Joes, do you actively weight the studies)? -> Can you do essentially quantitative analysis on a dataset based on binary coding of concepts and categories?
  • How do you “code” quantitative data? If you had a dataset of 2000 household surveys, would you code each variable for each household as part of your data? How does this functionally work?
  • If you don’t do a literature review ahead of time, couldn’t you end up replicating previous work and not actually end up contributing much to the literature?

And then I also wondered: how is it applicable in my life?

  • Is GT a respected methodology in economics? (I’d guess not.)
  • How could GT enhance quant methods in econ?
  • Has GT been used in economic studies?
  • What kinds of economic questions can GT help us answer?
  • Should I learn more about GT or learn to use it in my own research?

Coming up: Part 2, Grounded Theory & Economics

To answer some of my questions, I want to do an in-depth read of a paper from the 2005 Grounded Theory Review by Frederic S. Lee: “Grounded Theory and Heterodox Economics.” (The journal has another article from 2017 entitled “Rethinking Applied Economics by Classical Grounded Theory: An invitation to collaborate” by Olavur Christiansen that I hope to read, too.)

Are we murderers for not donating our organs? [repost]

Zell Kravinsky risked his life to donate his healthy kidney to a complete stranger. Would you do the same?

Kravinsky is a radical altruist. He believes in giving away as much as possible to others, including his nearly $45 million fortune and his own body parts. Most people would consider donating a kidney as going above and beyond, but Kravinsky told the New Yorker in 2004 that he considers anyone who doesn’t donate their extra kidney a murderer.

We probably don’t, as individuals, have a moral responsibility to donate our organs, but maybe we do have a societal responsibility to find a system by which we can match kidney donors and recipients so that no one has to die just because there isn’t a transplant available. In 2012, there were 95,000 Americans on the wait list for a life-saving kidney, according to economists Gary Becker and Julio Elias. The average wait time for a kidney in 2012 was over four years.

Becker and Elias are proponents of creating a formal, legal market for organs to eliminate long wait times and better match recipients with donors. Right now, it is illegal to sell your organs in most of the world, including in the U.S.

The main risks of monetary compensation for organ donations are the coercion of unwilling donors, the potentially unequal distribution of donors (poor people would be more likely to become donors), and the moral question of whether it is okay to sell body parts, even if they are our own.

Purely moral arguments aside for a moment, there are ways to alleviate the risks of a market for organs. Waiting periods between registration and donation, psychiatric evaluation ahead of registration as an organ donor, and strict identification requirements or even background checks can all combat coercion in the market for organs, while saving the lives of the many Americans who die on an organ waitlist. Becker and Elias also point to the fact that people in lower income brackets are disproportionately affected by long waitlists: the wealthy can fly abroad to obtain a healthy organ or manipulate the current waitlist system in their favor, while poorer Americans face longer wait times. While donors may be disproportionately poor, which raises concerns of implicit economic coercion, the lower income brackets also benefit disproportionately from the policy.

Even more powerful than a legal market alone would be a combination of a legal market for organs and an implied consent law, which would mean people would have to opt out of being an organ donor, rather than the U.S. standard of opting into being a donor. A 2006 study by economists Alberto Abadie and Sebastien Gay found that implied consent laws have a positive impact on organ donations. Under a combination of these two initiatives, essentially all organ donor needs might be met, and a person’s will might come to include provisions for their organs to be harvested and family members to be compensated.

While Kravinsky donated his kidney for free, he once offered a journalist $10,000 to donate a kidney to a stranger, according to Philadelphia Magazine. But the journalist backed out of the deal he struck with Kravinsky after his wife and friends convinced him not to go through with it. They convinced him that the risk of surgery, though relatively minor, was not worth saving a life. But if a safe, legal market for organ sales is established, perhaps the establishment of a market price for organ donation and a normalization of the procedure will allow Americans to save lives and make money, without requiring Kravinsky’s extreme, and perhaps aggressive, sort of altruism.

Originally written for my Economics of Sin senior seminar, spring 2017; previously published at the Unofficial Economist on Medium.

Is my job moral? [repost]

If I continue on my current career path, I may end up arbitrating who lives and who dies. (And maybe I’ll tell their story in an economics journal and make a living doing so.)

I am planning on pursuing a career in development work, specifically in the evaluation of development programs. The “gold standard” for evaluating programs is a Randomized Control Trial (RCT).

Consider a non-profit distributing books to children with the goal of improving literacy. The non-profit wants to know whether their books really have any impact on children’s literacy. Ideally, they could look at what happens when they give a group of children the books and also what happens when they don’t give the same children books.

However, due to thus far unchangeable time-space continuum properties, this isn’t possible. So, in order to confidently say that their books had an impact, the non-profit needs to compare the literacy scores of children who received the books with other very similar children who didn’t get books. Let’s say they hire me to run an RCT for this very purpose.

To determine which children will get the books (the treatment group) and which children will serve as the comparison group (the control group), I take a list of 100 schools and randomly assign half of them to receive the extra books program. After the books are distributed and some time has passed, I go back to the schools and I have all the children take literacy tests. I compare the test scores of children in each group, and find that, on average, children who received books did much better on the literacy tests.
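The randomization-and-comparison mechanics are simple enough to sketch. The school names and literacy scores below are simulated (I’ve baked in a 5-point true effect), so this only illustrates the procedure, not any real result:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

schools = [f"school_{i}" for i in range(100)]

# Randomly assign half the schools to receive the books program.
random.shuffle(schools)
treatment, control = set(schools[:50]), set(schools[50:])

# Simulated post-program literacy scores: treated schools get a
# 5-point bump, standing in for the program's true effect.
scores = {s: random.gauss(60, 10) + (5 if s in treatment else 0)
          for s in schools}

# The RCT estimate: difference in mean scores between the two groups.
effect = (statistics.mean(scores[s] for s in treatment)
          - statistics.mean(scores[s] for s in control))
print(f"Estimated program effect: {effect:.1f} points")
```

With 50 schools per arm, the noise partially averages out around the true 5-point effect; a real analysis would also cluster standard errors at the school level and test whether the difference is statistically distinguishable from zero.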

The non-profit is very happy and uses the results to convince more people to donate to their program. Now they can give books to many more children, and presumably those children’s literacy scores will also increase.

This is all well and good. Even if some children in the study were chosen not to receive books, there are several commonly accepted justifications for why we studied them without providing a service:

  • The non-profit did not have enough money to give books to all the schools anyway. Randomly determining which schools received the books makes it as fair as possible.
  • While the books program was unlikely to have negative effects on children, we didn’t know if it would have no effect or a positive effect at the start. So we didn’t know if we were really depriving children of a chance to improve their literacy.
  • Being able to conduct the evaluation could inform policy and global knowledge on effective ways to improve literacy, and could improve decision-making at the non-profit.
  • In this case, maybe the control group children were the first to receive books when the non-profit’s funding increased.

These are common justifications for development evaluations. They seem quite reasonable — randomly giving out benefits might be the fairest option, we don’t know what the effect really is, and the study will contribute to our shared knowledge and lead to better decisions and even better outcomes in the future.

What if, instead of working on literacy, the non-profit wanted to reduce deaths from childbirth by improving access to and use of health facilities by pregnant women?

Suddenly, so much more is at stake.

If I randomly assign half a county to have access to a special taxi service that drives pregnant women to hospitals for safer deliveries, and one of the women who was assigned NOT to receive the taxi service dies because she gave birth at home, is the evaluation immoral? Am I morally culpable for her death?

Because I work with numbers and data, it is easy to separate myself from the potential negative consequences of the work. I didn’t choose her to die — the random number generator made me do it. 

Photo by Markus Spiske on Unsplash

So what if we’re in a situation where a randomized control trial seems immoral? How can we still learn about what works and what doesn’t?

There are other evaluation methods that can give us an idea of what programs work and which don’t. For example, quasi-experimental methods look at situations where comparable control and treatment groups are incidentally defined by the implementation of a policy. Then we can compare two groups without having to be responsible for directly assigning some people to receive a program while others go without.
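Difference-in-differences is a common quasi-experimental example: if a policy rolls out in one region but not in a comparable neighbor, you compare each region’s before-to-after change rather than assigning anyone to a control group yourself. A toy calculation with invented numbers:

```python
# Average outcome (say, a literacy score) before and after a policy
# that only Region A adopted; all numbers are invented.
region_a_before, region_a_after = 55.0, 63.0   # policy region
region_b_before, region_b_after = 54.0, 58.0   # comparison region

# Each region's change over time.
change_a = region_a_after - region_a_before    # 8.0
change_b = region_b_after - region_b_before    # 4.0

# Difference-in-differences: the policy region's extra change,
# netting out the trend both regions shared.
did_estimate = change_a - change_b
print(did_estimate)  # prints 4.0
```

The shared trend (both regions improved by 4 points anyway) is netted out, and the policy region’s extra 4 points is the estimated effect, under the assumption that the two regions would have trended in parallel without the policy.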

Qualitative or other non-experimental methods involve gathering data by talking to people, doing research, and meeting with different groups to get various opinions on what’s happening. These methods can also help paint a picture of whether a program is having a positive effect.

But the RCT is the gold standard for a reason. A well-designed RCT can tell us what the effect of a program is with much higher confidence and precision than other methods.

UNICEF Social Policy Specialist Tia Palermo recently wrote a post titled “Are Randomized Control Trials Bad for Children?” for UNICEF’s Evidence for Action blog. She makes a powerful point to consider: What are the alternatives to running RCTs? Are they better or worse?

Palermo sees the alternative as worse: “Is it ethical to pour donor money into projects when we don’t know if they work? Is it ethical not to learn from the experience of beneficiaries about the impacts of a program?” she asks.

Her most convincing argument is that there are ethical implications to every research method we might choose:

“A non-credible or non-rigorous evaluation is a problem because underestimating program impacts might mean that we conclude a program or policy doesn’t work when it really does (with ethical implications). Funding might be withdrawn and an effective program is cut off. Or we might overestimate program impacts and conclude that a program is more successful than it really is (also with ethical implications). Resources might be allocated to this program over another program that actually works, or works better.”

And there are ethical implications to not evaluating programs at all. If non-profits aren’t held to any standard and don’t measure the effect of their program at all, there’s no way to tell which interventions and which non-profits are helping, having no effect on, or even harming the program recipients.

In the case of the woman who died because she didn’t get to a health facility, if the study had never taken place, would she have gotten to a health facility or not? It is impossible to know what would have happened, but it’s not impossible to minimize the risk of harm and maximize the benefits to all study participants. 

Photo by Anes Sabitovic on Unsplash

Ultimately, RCTs generate important evidence when they are well executed. The findings from such studies can be used to make better decisions at non-profits, at big donor foundations like the Gates Foundation or GiveWell, and at government agencies. All of which can lead to more lives saved, which is the ultimate goal.

So what to do about the ethical implications of randomly determining who gets access to a potentially life-saving program? Or any program that could have a positive impact on people’s lives?

There are a variety of measures in place to ensure ethical conduct in research and many more ~official~ economists are thinking about these ideas.

The 1979 Belmont Report helped establish criteria for ethics in human research, focusing on respect for people’s right to make decisions freely, maximizing benefits and doing no harm, and fairness in who bears any risks or benefits. Institutional Review Boards (IRBs) are governing bodies that ensure these principles are upheld for all research.

Economists Rachel Glennerster and Shawn Powers wrote a highly recommended piece on these ethical considerations, “Balancing Risk and Benefit: Ethical Tradeoffs in Running Randomized Evaluations,” which I’m currently reading.

Yet persistent concerns about how to run ethical evaluations suggest that there is more work to do.

Taking the time to consider the ethical implications of each project is key. And I think there is more room for evaluators to read deeply on the subject and really dig into how to make evaluations more just and more beneficial to even those in the control group who don’t receive the program.

A driving principle, especially for researchers running RCTs in the development field, could be that an evaluation must have a direct positive impact on all study participants, either during the study or immediately following its completion. There are a variety of ways, some more commonly used than others, that researchers can apply this principle:

  • If we truly don’t know whether the effect of the program is positive or negative, we can make plans to provide the program to control households if it is found to have a positive effect.
  • If we suspect the program has a positive effect, the control group can be offered the program immediately after the study period has ended.
  • We can offer everyone in the study a base service, while the study tests the effectiveness of an additional service provided only to the treatment group. This way, everyone who is contributing time and information to the study receives some benefit in return.
  • Extensive piloting (testing different ideas and aspects of the evaluation before the start of the study) can also reveal potential moral dilemmas to evaluating any particular program.
  • Community interest meetings can be held before the study is implemented to gain community-level consent to participate in the study. These meetings could also be held quite early on to inform research designs and improve the quality of the study results. For example, in some cultures, it is not appropriate for a man to be alone with a woman he is not related to. If this is the case in a study area, then hiring male staff to conduct surveys would lead to a less successful study.
  • Local staff can be hired to conduct any surveys or data collection to ensure that the surveys are culturally appropriate.
  • We always obtain full and knowledgeable consent from participants, which may require translating surveys into participants’ native language.
  • If study participation requires much time or effort from control group individuals, they can be appropriately compensated.
  • All reports on evaluations (RCTs and other designs) can be fully transparent about research decisions and how ethical concerns were addressed. This will contribute to the international research community’s combined knowledge of how to ensure the rights of participants are provided for in RCTs and other research.
  • The learnings from the study can also be shared with the participating community and should add to their knowledge about their own lives; contributing to the abstract “international research community” is not enough.

Enacting these measures requires more of researchers: some have the potential to affect the legitimacy of the evaluation results if they are not properly accounted for in analysis. But a strong sense of ethics and a dedication to the population being served (often low-income individuals from the Global South, contrasted with well-off researchers from the West) demand that we take the extra time in our research to consider all ethical implications.

Originally published on my Unofficial Economist Medium publication, November 4, 2017.