A new paper by Jakiela and Ozier sounds like it took an insane amount of data work: the authors classified 4,336 languages by whether they gender nouns. For example, in French, a chair is feminine – la chaise.
They find, across countries:
Gendered language = greater gaps in labor force participation between men and women (11.89 percentage point decline in female labor force participation)
Gendered language = “significantly more regressive gender norms … on the magnitude of one standard deviation”
Within-country findings from Kenya, Niger, Nigeria, and Uganda – countries with sufficient and distinct in-country variation in language type – further show statistically significantly lower educational attainment for women who speak a gendered language.
(Disclaimer: The results aren’t causal, as there are too many unobserved variables that could be at play here.)
As the authors say: “individuals should reflect upon the social consequences of their linguistic choices, as the nature of the language we speak shapes the ways we think, and the ways our children will think in the future.”
3ie wrote on June 11 about why you may need a pilot study to improve power calculations:
Low uptake: “Pilot studies help to validate the expected uptake of interventions, and thus enable correct calculation of sample size while demonstrating the viability of the proposed intervention.”
Overly optimistic MDEs: “By groundtruthing the expected effectiveness of an intervention, researchers can both recalculate their sample size requirements and confirm with policymakers the intervention’s potential impact.” It’s also important to know if the MDE is practically meaningful in context.
Underestimated ICCs: “Underestimating one’s ICC may lead to underpowered research, as high ICCs require larger sample sizes to account for the similarity of the research sample clusters.”
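To make the ICC point concrete, here is a minimal sketch using the standard design-effect formula for equal-sized clusters, 1 + (m - 1) × ICC. The formula is a textbook approximation rather than anything taken from the 3ie piece, and the function names and planning numbers (400 individuals, clusters of 20, ICCs of 0.05 and 0.15) are made up for illustration.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def clustered_sample_size(n_individual: int, cluster_size: float, icc: float) -> int:
    """Inflate a sample size computed for individual randomization by the design effect."""
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(inflated, 9))


# Hypothetical planning numbers, not drawn from any real study.
n_individual = 400   # sample needed if individuals were randomized
cluster_size = 20    # respondents per cluster
for icc in (0.05, 0.15):
    total = clustered_sample_size(n_individual, cluster_size, icc)
    print(f"ICC={icc}: design effect={design_effect(cluster_size, icc):.2f}, total sample={total}")
```

With these made-up inputs, assuming an ICC of 0.05 when the true value is 0.15 means planning for a design effect of 1.95 instead of 3.85, so the study would have roughly half the sample it needs.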
The piece has many strengths, including that 3ie calls out one of their own failures on each point. They also share the practical and cost implications of these mistakes.
At work, I might be helping develop an ICC database, so I got a kick out of the authors’ own call for such a tool…
“Of all of the evaluation design problems, an incomplete understanding of ICCs may be the most frustrating. This is a problem that does not have to persist. Instead of relying on assumed ICCs or ICCs for effects that are only tangentially related to the outcomes of interest for the proposed study, current impact evaluation researchers could simply report the ICCs from their research. The more documented ICCs in the literature, the less researchers would need to rely on assumptions or mismatched estimates, and the less likelihood of discovering a study is underpowered because of insufficient sample size.”
…although, if ICCs are rarely reported, I may have my work cut out for me!
I was reading about the new African journal – Scientific African – that will cater specifically to the needs of African scientists. Awesome!
Among the advantages of the new journal is the fact that “publication in Scientific African will cost $200, around half of what it costs in most recognised journals.”
You have to pay to be published in an academic journal? Dang.
I guess that cost is probably built into whatever research grant you’re working on, but in most other publications, I thought writers got paid to contribute content. I guess it’s so that there’s not a direct incentive to publish as much as possible, which could lead to more falsified results? Although it seems like the current model has a lot of messed up incentives, too.
Andrew Gelman’s recent blog post responding to a Berk Özler hypothetical about data collection costs and survey design raised a good point about counterfactuals. I knew the point in theory, but he phrased it in a way that brought new insight:
“A related point is that interventions are compared to alternative courses of action. What are people currently doing? Maybe whatever they are currently doing is actually more effective than this 5 minute patience training?”
It was the question “What are people currently doing?” that caught my attention. It reminded me that one key input for interpreting results of an RCT is what’s actually going on in your counterfactual. Are they already using some equivalent alternative to your intervention? Are they using a complementary or incompatible alternative? How will the proposed intervention interact with what’s already on the ground – not just how will it interact in a hypothetical model of what’s happening on the ground?
This blogpost called me to critically investigate what quant and qual methods I could use to understand the context more fully in my future research. It also called me to invest in my ability to do comprehensive and thorough literature reviews and look at historical data – both of which could further inform my understanding of the context. And, even better, to always get on the ground and talk to people myself. Ideally, I would always do this in-depth research before signing onto the kind of expensive, large-scale research project Özler and Gelman are considering in the hypothetical.
Academic writing is full of bad habits. For example, using words like “obviously,” “clearly,” or “of course.” If the author’s claim or reasoning really is obvious to you, these words make you feel like you’re in on the secret; you’re part of the club; you’ve been made a part of the “in” group.
But when you don’t know what they’re talking about, the author has alienated you from their work. They offer no explanation of the concept because it seems so simple to them that they simply won’t deign to explain themselves clearly to those not already “in the know.”
Part of an academic’s job is to clearly explain every argument in their papers. It is lazy and exclusionary to imply readers should already understand a concept or a path of reasoning.
At worst, it just makes you sound rude and superior:
He really doubled-down on how evident this fact is, which only tells the reader how smart he thinks he is. The sentence could have read, “Advertising is the preferred modern method of identifying buyers and sellers,” and could have included a citation.
On the other hand, a non-exclusionary use of “obviously”:
“Obviously, rural Ecuador and the United States are likely to differ in a large number of ways, but the results in this (and other recent) papers that show a shifting food Engel curve point to the risks inherent in assuming that the Engel curve is stable.” – Schady & Rosero paper on cash transfers to women
The authors had previously compared two papers from two very different contexts; they use “obviously” to acknowledge the potential issues with comparing these two settings. This is an acceptable use case because the statement that follows actually is obvious, and it brings every reader on board by acknowledging a possible critique of the argument. It is an acknowledgment of a possible weakness on the authors’ part, rather than a test of the reader’s intelligence or prior knowledge.
I recently read Brené Brown’s Daring Greatly. The book presents Brown’s research, but it can feel more like a personal guidebook to tackling issues of vulnerability and shame.
Because the research has a conversational feel, it’s hard to understand how much of the book is based in research and how much in Brown’s individual experiences. She weaves in personal stories frequently, often to demonstrate a prickly emotional experience that was common across her interviews. But when I reached the end of the book, I wanted to know how she drew these theories from the data. I’ve only worked sparingly with qualitative data: how does one “code” qualitative data? How do you analyze it without bringing in all sorts of personal biases? How do you determine its replicability, internal and external validity, and generalizability?
Ingeniously, Brown grounds the book in her research methods with a final chapter on grounded theory methodology. Her summary (also found online here) was a good introduction to how using grounded theory works and feels. But I still didn’t “get” it.
So I did some research.
Brown quotes 20th-century Spanish poet Antonio Machado at the top of her research methods page:
“Traveler, there is no path. / The path must be forged as you walk.”
This sentiment imbued the rest of the grounded theory (GT) research I did, which seemed bizarre to a quant-trained hopeful economist. I’m used to pre-analysis plans, testing carefully theorized models, and starting with a narrow question.
Grounded theory is about big questions and a spirit of letting the data talk to you.
Founded by Barney Glaser and Anselm Strauss in 1967, GT is a general research methodology for approaching any kind of research, whether qual- or quant-focused. When using GT, everything is data – your personal experiences, interviews, mainstream media, etc. Anything you consume can count, as long as you take field notes.
Writing field notes is one of the key steps of GT: coding those notes (or the data themselves – I’m still a little blurry on this) line-by-line is another. The “codes” are recurring themes or ideas that you see emerging from the data. It is a very iterative methodology: you collect initial data, take field notes, code the notes/data, compile them into memos summarizing your thoughts, collect more data based on your first learnings, code those, compile more memos, collect more data…
Throughout the whole process, you are theorizing and trying to find emergent themes and ideas and patterns, and you should actively seek new data based on what your theories are. You take a LOT of written notes – and it sounds like in the Glaserian tradition, you’re supposed to do everything by hand. (Or is it just not using any algorithms?)
Brown describes the data she collected and her coding methodology:
“In addition to the 1,280 participant interviews, I analyzed field notes that I had taken on sensitizing literature, conversations with content experts, and field notes from my meetings with graduate students who conducted participant interviews and assisted with the literature analysis. Additionally, I recorded and coded field notes on the experience of taking approximately 400 master and doctoral social-worker students through my graduate course on shame, vulnerability, and empathy, and training an estimated 15,000 mental health and addiction professionals.
I also coded over 3,500 pieces of secondary data. These include clinical case studies and case notes, letters, and journal pages. In total, I coded approximately 11,000 incidents (phrases and sentences from the original field notes) using the constant comparative method (line-by-line analysis). I did all of this coding manually, as software is not recommended in Glaserian-grounded theory.” [emphasis mine]
The ultimate goal is to have main concepts and categories emerge from the data, “grounded” in the data, that explain what main problem your subjects are experiencing and how they are trying to solve that problem. For example, Brown’s work centers on how people seek connection through vulnerability and try to deal with shame in various healthy and unhealthy ways. She started with this big idea of connection and just started asking people about what that meant, what issues there were around it, etc. until a theory started to arise from those conversations.
You’re not supposed to have preexisting hypotheses, or even do a literature review to frame specific questions, because that will bias how you approach the data. You’re supposed to remain open and let the data “speak to you.” My first instinct on this front is that it’s impossible to be totally unbiased in how you collect data. Invariably, your personal experience and background determine how you read the data. Which makes me question – how can this research be replicable? How can a “finding” be legitimate as research?
My training thus far has focused on quantitative data, so I’m primed to preference research that follows the traditional scientific method. Hypothesize, collect data, analyze, rehypothesize, repeat. This kind of research is judged on:
Replicability: If someone else followed your protocol, would they get the same result?
Internal validity: How consistent, thorough, and rigorous is the research design?
External validity: Does the learning apply in other similar populations?
Generalizability: Do the results from a sample of the population also apply to the population as a whole?
GT, on the other hand, is judged by:
Fit: How closely do concepts fit the incidents (data points)? (aka how “grounded” is the research in the data?)
Relevance: Does the research deal with the real concerns of participants and is it of non-academic interest?
Workability: Does the developed theory explain how the problem is being solved, accounting for variation?
Modifiability: Can the theory be altered as new relevant data are compared to existing data?
I also read (on Wikipedia, admittedly) that Glaser & Strauss see GT as never “right” or “wrong.” A theory only has more or less fit, relevance, workability, or modifiability. And the way Brown describes it, I had the impression that GT should be grounded in one specific researcher’s approach:
“I collected all of the data with the exception of 215 participant interviews that were conducted by graduate social-work students working under my direction. In order to ensure inter-rater reliability, I trained all research assistants and I coded and analyzed all of their field notes.”
I’m still a bit confused by Brown’s description here. I didn’t know what inter-rater reliability was, so I had assumed it meant that the study needed to be internally consistent in who was doing the coding. But when I looked it up online, it appears to mean the degree to which different researchers code the same data in the same way. So I’m not sure how having one person do all of the research enables this kind of reliability. Maybe it applies if your GT research is re-done (replicated) by an independent party?
My initial thought is that all GT research sounds like it should have two authors who work in parallel but independently, with the same data. Each develops separate theories, and then at the end the study can compare the two parallel work streams to identify what both researchers found in common and where they differed. I still have a lot of questions about how this works, though.
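One common way to put a number on whether two coders label the same data the same way is Cohen's kappa, which compares the observed agreement with the agreement expected by chance. The sketch below is only an illustration of that statistic, not a description of Brown's method or of Glaserian practice; the function name, the excerpt labels ("shame", "connection"), and the two coders' codings are all hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who each assign one code to the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: probability both pick the same code if each coder
    # assigned codes independently at their own observed rates.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use one identical code throughout")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of the same four interview excerpts by two coders.
coder_a = ["shame", "shame", "connection", "connection"]
coder_b = ["shame", "connection", "connection", "connection"]
print(cohens_kappa(coder_a, coder_b))
```

In this toy example the coders agree on three of four excerpts (75% raw agreement), but half of that agreement would be expected by chance, so kappa comes out to 0.5. A value of 1 means perfect agreement and 0 means no better than chance.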
A lot of my questions are functional. How do you actually DO grounded theory?
How does GT coding really work? What does “line-by-line” coding mean? Does it mean you code each sentence or literally each line of written text?
Do these ever get compiled in a database? How do you weight data sources by their expertise and quality (if you’re combining studies and interviews with average Joes, do you actively weight the studies)? -> Can you do essentially quantitative analysis on a dataset based on binary coding of concepts and categories?
How do you “code” quantitative data? If you had a dataset of 2000 household surveys, would you code each variable for each household as part of your data? How does this functionally work?
If you don’t do a literature review ahead of time, couldn’t you end up replicating previous work and not actually end up contributing much to the literature?
And then I also wondered: how is it applicable in my life?
Is GT a respected methodology in economics? (I’d guess not.)
How could GT enhance quant methods in econ?
Has GT been used in economic studies?
What kinds of economic questions can GT help us answer?
Should I learn more about GT or learn to use it in my own research?