Do I have to wait for tenure?

Yesterday, I attended the ASSA panel discussion, “How Can Economics Solve Its Race Problem?” with my colleague Soala Ekine.

I was deeply impressed with the leadership by example of the panel members in being open, vulnerable, and deeply conscientious in their discussion.

Throughout the event, the panel called on tenured professors and leaders in the profession (journal editors, leaders in the various professional organizations) to take responsibility for actively questioning and changing the racist, colonialist, and elitist culture in economics.

During audience questions, I asked the panel:

I know someone who is renouncing the label of economist and calling themselves a social scientist because of the culture in the profession. Despite warning signs, many of us still want to enter the profession – in addition to the advice for how leaders can drive change, what advice do you have for those of us coming up in the profession to also be drivers of change throughout our careers?

I think a more precise version of what I was trying to ask is: How and to what extent can non-tenured professors, grad students, and even RAs contribute to cultural change without sacrificing long-term success in the profession?

Underlying even that question is: Assuming there would be backlash to being outspoken on these issues, would it still be worth it to be very outspoken? Is it better to establish what you believe is right and wrong, unequivocally and unapologetically, and try to make those changes directly? Or is your overall impact on the profession greater if you use more subtle, incremental techniques to make changes over a longer period of time?

There's an empirical question that needs to be answered in all of this: To what extent does activism within the profession detract from your lifetime effectiveness as a researcher? And is it worth it anyway? (I lean toward yes, do it anyway, but maybe some techniques work better than others?)

I'm imagining a situation where the leadership is NOT making changes. The expectation for young researchers is deference to seniors and adaptation to the toxic culture. It can be risky to speak against norms set by the people whom you will rely on for recommendations to take the next step in this career, especially for non-white, non-hetero, non-male researchers.

If all goes perfectly according to plan, I will have tenure at a fancy research university in about 13-15 years. That is a long time to wait to be “allowed” to contribute to culture changes, or to wait to act in order to avoid backlash that could injure my career.

For those of us who are sticking with economics despite the many warning signs (bullying, racism, sexism, colonialism, mental health challenges), who love the work, and who want to be part of changing these problems with the culture of the profession, there are non-trivial tactical questions about how to be change agents without decreasing our overall impact during our careers.

Unfortunately, we ran out of time for the panel to fully address any specific advice on that front.

I guess my hypothesis is that – assuming it would indeed be really bad to be too “controversial” or outspoken too early in your career (I'd like to think the warnings against causing too much trouble before tenure are overblown, but there does seem to be a strong consensus that speaking up too early damages your career prospects, and thus your overall career effectiveness) – the best way for young researchers to effect change is through peer relationships and how they interact with those coming up behind them.

Advocating for each other (especially for peers of color and peers who are not men), social support and encouragement, and a more collaborative mindset overall are some ways that younger researchers can reinforce better social norms in the profession. We can also resist the pervasive idea that the best researchers are the ones who minimize all service work to spend the maximum time possible on research. Service work is important, and those willing to do it can be the ones who set departmental norms that combat discrimination in the profession.

I think another path can be taking the role of student seriously by asking a lot of questions about how things are done, uncovering unspoken rules, asking for greater transparency in how the status quo operates, and trying to find the data to answer these questions. A lot of the panelists are doing amazing work on this front already, and we can all continue this work.

Every generation that improves the culture creates more room for the next generation to make progress, too.

This post is missing something else: How do these paths to making change look different for young economists of color? I’m a woman, so I’m seeing through that lens, but my small amount of Indian heritage doesn’t mean much in terms of understanding how the rules (and the urgency of these challenges) are different for economists of color in the U.S. In particular, grappling with these issues is not as much of a choice for economists of color as it is for white economists. Plus, as my colleague Soala pointed out during the Q&A, the rules for success also change for international students of color.

I also feel a bit yucky about the whole question because I don’t think the question of how and whether to speak up against discrimination should depend on the long-term career effects of speaking up. That is likely a case of over-optimization (a classic economist problem), at the expense of living up to my values.

My overall feeling after the session is in line with how many of the panelists wrapped up the discussion: I am cautiously optimistic. About the profession overall and about my own ability to effectively change the culture as I learn more and try to practice what I’m preaching here.

The panel included Randall Akee (UCLA), Cecilia Conrad (Pomona), Trevon Logan (The Ohio State University), Edward Miguel (UC-Berkeley), Marie T. Mora (U Missouri-St. Louis), and Ebonya Washington (Yale). It was chaired/introduced by Janet Yellen. 

Easterly: Academic publishing standards constrain work on important policy questions

“Academic standards are leading us to concentrate on the less important policies. The worst case scenario is development economists risk becoming irrelevant because [they] concentrate on small issues that policy makers don’t think are important.” – Bill Easterly

 

Summary of interview w/ VoxDev

Trends:

  • Growth response to globalization policies (esp. in Africa)
  • BUT Intellectual backlash against globalization

Doubts are legitimate – it’s hard to measure causality of these macro dynamics.

But do have persuasive correlations (high rates of inflation are strongly negatively correlated with growth rates). Linked to worse welfare. But rigorous causality determination is difficult – can’t rule out third factor or reverse causality.

Academics are reluctant to study inflation and growth and globalization; we need to present non-causal correlations if that's the best we can do. It's the economist's responsibility to look at these as honestly as possible, even if it isn't the most rigorous type of evidence. It's what we've got!

Not enough research on big-picture policies.

Zimbabwe is relapsing into high inflation – will be very destructive and needs to be studied. Venezuela another example of poor policies around inflation. Not many policy makers / academic economists will think of these as good policies, but they used to be very common in S. America and Africa in 1970s-90s.

Incentives in economic publishing prioritize rigorous causal identification. Young economists: by all means, stick with this! Tenured professors: stylized facts are also useful pieces of evidence. Need to work on these big, non-causal issues, too.

Evidence on small-scale programs is less relevant for policy-makers.

Good model of what we should do more of: Acemoglu’s work. Easterly’s own research.

Paradox: development economists really want to talk about these big-picture questions, but it's very challenging to publish any research on them b/c of the huge prioritization of rigor.

Laments that the “brightest young minds in our field” have to do RCTs instead of looking at the big questions.

IMF / World Bank are more interested in policy practicalities so they’re not as biased as journals.

(Small) RCTs useful for NGOs / specific aid agency programs.

Governments want institutional reforms or macro policy changes.

Extensive vs. Intensive Margins

Finally looked up intensive vs. extensive margins, which I’ve been able to decipher based on context, but for which I haven’t known the formal definitions.

Extensive margin: how many units/resources are used at the margin

Intensive margin: how “hard” (or “intensely”) resources are used at the margin (“how much per unit”)

For example, a policy change to increase teacher salary and set a portion of that increase as pay-for-performance might work a) at the extensive margin, by increasing the number of teachers (likely in the long run, as teaching comes to be viewed as a more attractive profession), and b) at the intensive margin, by increasing the effort put in by existing teachers to achieve high performance (in this case, more of a short-term effect).
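
A shorthand that helps me keep these straight (my own gloss, not from a textbook): if total resource use is R = N × x, where N is the number of units and x is how intensely each unit is used, then a small change decomposes roughly as

ΔR ≈ x·ΔN + N·Δx

The first term (x·ΔN) is the extensive margin – more teachers at the existing effort level – and the second (N·Δx) is the intensive margin – the same teachers putting in more effort.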

ILR CH1

India’s Long Road by Vijay Joshi is the start of my long road to understanding the Indian economy.

To follow are books on Indian history, the jobs crisis, and public institutions, as well as podcasts from Prof. Muralidharan’s course The Indian Economy. In his course, Prof. Muralidharan recommends noting three things from each class or reading:

  1. Something you learned
  2. Something you have a question about
  3. Something you are curious about

Presumably, I then follow up on the question and follow the thread of my curiosity.

So, India’s Long Road, Chapter 1 – India at the Cusp:

  1. Joshi defines high-quality growth as a) inclusive and b) environmentally friendly. Hooray on both counts! Also:
    • Adumbrate – to report or represent in an outline
    • venality – openness to bribery or overly motivated by money
  2. What IS the mean world per capita growth rate? Answer: Around 1-3%
  3. Curious to see the practical sides of Joshi’s argument – what are the “radical reform model” components he recommends?

Dev links: Migration & Replication

Migration

No short-term effect of foreign aid on refugee flows

Overview: “We estimate the causal effects of a country’s aid receipts on both total refugee flows to the world and flows to donor countries.”

Data: “Refugee data on 141 origin countries over the 1976–2013 period [combined] with bilateral Official Development Assistance data”

Identification strategy: “The interaction of donor-government fractionalization and a recipient country’s probability of receiving aid provides a powerful and excludable instrumental variable (IV) when we control for country- and time-fixed effects that capture the levels of the interacted variables.”

Findings: “We find no evidence that aid reduces worldwide refugee outflows or flows to donor countries in the short term. However, we observe long-run effects after four three-year periods, which appear to be driven by lagged positive effects of aid on growth.”

Authors: Dreher, Fuchs, & Langlotz

Living abroad doesn’t change individual “commitment to development”

Overview: “Temporary migration to developing countries might play a role in generating individual commitment to development”

Data: “unique survey [of Mormon missionaries] gathered on Facebook”

Identification strategy: “A natural experiment – the assignment of Mormon missionaries to two-year missions in different world regions”

Findings: “Those assigned to the treatment region (Africa, Asia, Latin America) report greater interest in global development and poverty, but no difference in support for government aid or higher immigration, and no difference in personal international donations, volunteering, or other involvement.” (controlling for relevant vars)

Author: Crawfurd

Replication

Lessons from 3ie replications of development impact evaluations

Overview: “focus is internal replication, which uses the original data from a study to address the same question as that study”

Findings: “In all cases the pure replication components of these studies are generally able to reproduce the results published in the original article. Most of the measurement and estimation analyses confirm the robustness of the original articles or call into question just a subset of the original findings.” Plus some advice on how to better translate study findings into policy.

Authors: Brown & Wood

Practical advice for conducting quality replications 

Overview: The same authors share practical advice addressing the challenge “to design a replication plan open to both supporting the original findings and uncovering potential problems.”

Contribution:

1. Tips for diagnostic replication exercises in four groups: validity of assumptions, data transformations, estimation methods, and heterogeneous impacts, plus examples and other resources

2. List of don’ts for how to conduct and report replication research

Building State Capacity: Evidence from Biometric Smartcards in India

Preface: I always say I want to read more papers & summarize them. That can seem like an overwhelmingly massive undertaking. But I am forging ahead! This is the first step of what I hope to be a regular habit of reading and summarizing papers. “Building State Capacity” raised a lot of interesting points – it’s the first paper I’ve read in a while. As I refamiliarize myself with academic writing and various development econ concepts, I hope to become increasingly concise.


Summary

Program: Use of biometric identification system to administer benefits from two large welfare programs

Where: Andhra Pradesh, India

When: 2010 (baseline) – 2012 (endline)

Sample: 157 sub-districts, 19 million people

Identification strategy: RCT

Findings

  1. Payment collection became faster and more predictable
  2. Large reductions in leakage (fraud/corruption)
  3. Increase in program access: Reduction in gov’t officials claiming benefits in others’ names
  4. Little heterogeneity of results: No differences based on village or poverty/vulnerability of HH
  5. Strength of results: “Treatment distributions first-order stochastically dominate control distributions,” which means that “no treatment household was worse off relative to the control household at the same percentile of the outcome distribution”
  6. Drivers of impact? (non-experimental decomposition)
    • For payment process improvement: changed organization responsible for managing fund flow and payments
    • For decrease in fraud: biometric authentication
  7. Cost effective, for state and beneficiaries

Methodology details

Surveys: Baseline and endline household surveys (2 years between)

Randomization: Graduated rollout over 2 years. Treatment subdistricts were first wave, then buffer subdistricts (during survey time), then finally the control subdistricts (note: subdistricts = “mandals” in India)

Stratification: By district and a principal component of socioeconomic characteristics

Analysis: Intent-to-treat (ITT): “estimates the average return to as-is implementation following the ‘intent’ to implement the new system”

Uptake: 50% of payments were transferred to the electronic system within 2 years

Main controls: district FEs, “the first principal component of a vector of mandal characteristics used to stratify,” baseline outcome levels where possible

Standard errors: clustered at the mandal level (the lowest level of stratification)
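
Putting those pieces together, here is a minimal sketch of what I understand the ITT specification to look like in Stata (variable names are hypothetical, not taken from the paper's replication files):

* ITT regression: endline outcome on treatment assignment, district fixed effects,
* the stratification principal component, and the baseline outcome, with standard
* errors clustered at the mandal level (the unit of randomization)
reg outcome_endline treatment i.district pc_strata outcome_baseline, vce(cluster mandal)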

Robustness checks:

  1. No differential misreporting: results are not driven by collusion between officials and respondents or by inadvertent recall problems
  2. No spillovers: no evidence of either strategic spillovers (officials diverting funds to control mandals they can more easily steal from) or spatial spillovers (from neighboring gram panchayats – village councils)
  3. No effects of survey timing relative to payment time
  4. No Hawthorne effects

Thoughts & Questions

  1. “Evaluated at full scale by government”: This minimizes risks around external validity that are often an issue for studies on NGO-operated programs at a smaller scale. Vivalt (2019) found that programs implemented by governments had smaller effect sizes than NGO/academic implemented programs, controlling for sample size; Muralidharan and Niehaus (2017) and others have discussed how results of small pilot RCTs often do not scale to larger populations.
  2. Love that they remind you of ITT definition in the text – makes it more readable. Also that they justify why ITT is the policy-relevant parameter (“are net of all the logistical and political economy challenges that accompany such a project in practice”)
  3. Again, authors define “first-order stochastically dominate” in the text, which I was wondering about from the abstract. Generally, well-written and easy to understand after a while not reading academic papers all the time!
  4. What does “non-experimental decomposition” mean? (This is describing how the authors identified drivers of treatment effects)
  5. Is it particularly strong evidence that the treatment distribution was first-order stochastically dominant over the control distribution? How do we interpret this statistically? Logically, if treatment was better for all HHs, relative to the closest comparison HH, that's a good sign. But what if your results were not stat sig but WERE first-order stochastically dominant? What would that mean for interpreting the results? (A rough quantile-comparison sketch follows this list.)
  6. What is the difference between uptake and compliance? Uptake = whether treatment HHs take up the intervention/treatment. Compliance = whether the HH complies with its assigned status in the experimental design (applies to both treat and control households). Is that right?
  7. What does “first stage” mean? In this paper, it seems to be asking, How did treat and control units comply with the evaluation design, and what is the % uptake? (Basically, did randomization meaningfully work?) Is this always what first stage means for RCTs? How does its meaning differ for other identification strategies?
  8. Reminder: Hawthorne effects = when awareness of being observed alters study participant behavior
  9. What does “principal component” mean? Is it like an index?
  10. Authors note that the political case for investment in capacity depends on a) magnitude and b) immediacy of returns -> Does that mean policy makers are consistently biased toward policies w/ short-term pay-offs? (If yes, I would expect a drop-off for policies whose pay-offs arrive on a longer timeline than the election cycle… or maybe policy just isn't data-driven enough to see that effect?) Also, would this lead to fewer studies on the long-term effects of programs/interventions with strong short-term payoffs, because there is little policy appetite for long-term results?
  11. Challenges of working in policy space: program was almost ended b/c of negative feedback from local leaders (whose rents were being decreased!), but evidence from study, including positive beneficiary feedback, helped state gov’t stay the course! Crazy!
  12. Reference to “classic political economy problem of how concentrated costs and diffuse benefits may prevent the adoption of social-welfare improving reforms” In future, look up reference: Olson 1965
  13. Type I and II errors being referenced in a new (to me) way: Type I as exclusion, Type II as inclusion errors … I know these errors in statistical terms as Type I = false positive (reject true null) and Type II = false negative (fail to reject false null). In the line following the initial reference, the authors seem to refer to exclusion errors as exclusion of intended recipients, so not sure if these are different types of errors or I’m not understanding yet. To be explored further in future.
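
On question 5, my rough understanding: the treatment distribution first-order stochastically dominates the control distribution if, at every percentile, the treatment outcome is at least as high as the control outcome. A quick, purely illustrative way to eyeball that in Stata (hypothetical variable names):

* Compare treatment and control outcome distributions percentile by percentile;
* sample FOSD means the treatment value is >= the control value at every percentile
forvalues p = 10(10)90 {
    _pctile outcome if treatment == 1, p(`p')
    local q_treat = r(r1)
    _pctile outcome if treatment == 0, p(`p')
    local q_control = r(r1)
    display "p`p': treatment = " %8.2f `q_treat' "   control = " %8.2f `q_control'
}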

Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2016). Building state capacity: Evidence from biometric smartcards in India. American Economic Review, 106(10), 2895-2929.

Weekly Development Links #8

Brought to you by #NEUDC2018! Check out mini summaries of the many awesome papers featured at this conference here,  and download papers here. These are three that really struck me.

1. Psychological trainings increase chlorination rates
Haushofer, John, and Orkin 2018: (RCT in Kenya) “One group received a two-session executive function intervention that aimed to improve planning and execution of plans; a second received a two-session time preference intervention aimed at reducing present bias and impatience. A third group receives only information about the benefits of chlorination, and a pure control group received no intervention.” Executive function and time preference trainings led to stat sig increases in chlorination and stat sig decreases in diarrhea rates.

2. Conditional cash transfers reduce suicides!
Christian, Hensel, and Roth 2018: (RCT in Indonesia) This paper is so cool! One mechanism is by mitigating the negative impact of bad agricultural shocks and decreasing depression. “We examine how income shocks affect the suicide rate in Indonesia. We use both a randomized conditional cash transfer experiment, and a difference-in-differences approach exploiting the cash transfer’s nation-wide roll-out. We find that the cash transfer reduced yearly suicides by 0.36 per 100,000 people, corresponding to an 18 percent decrease. Agricultural productivity shocks also causally affect suicide rates. Moreover, the cash transfer program reduces the causal impact of the agricultural productivity shocks, suggesting an important role for policy interventions. Finally, we provide evidence for a psychological mechanism by showing that agricultural productivity shocks affect depression.”

3. Women police stations increased reporting of crimes against women
Amaral, Bhalotra, and Prakash 2018: (in India) “Using an identification strategy that exploits the staggered implementation of women police stations across cities and nationally representative data on various measures of crime and deterrence, we find that the opening of police stations increased reported crime against women by 22 percent. This is due to increases in reports of female kidnappings and domestic violence. In contrast, reports of gender specific mortality, self-reported intimate-partner violence and other non-gender specific crimes remain unchanged.”

BONUS: Amazing 3-D map of world populations
(The Pudding has so many other really interesting and informative graphics, too!)

Weekly Development Links #7

1. 11 years later: Experimental evidence on scaling up education reforms in Kenya (TL;DR gov’t didn’t adopt well)

(This paper was published in Journal of Public Econ 11 years after the project started and 5 years after the first submission!) “New teachers offered a fixed-term contract by an international NGO significantly raised student test scores, while teachers offered identical contracts by the Kenyan government produced zero impact. Observable differences in teacher characteristics explain little of this gap. Instead, data suggests that bureaucratic and political opposition to the contract reform led to implementation delays and a differential interpretation of identical contract terms. Additionally, contract features that produced larger learning gains in both the NGO and government treatment arms were not adopted by the government outside of the experimental sample.”

2. Argument for reporting the “total causal effect”

  • Total causal effect (TCE) = weighted average of the intent-to-treat effect (ITT) and the spillover effect on the non-treated (SNT) – see the rough formula sketch after these bullets
  • Importance: “RCTs that fail to account for spillovers can produce biased estimates of intention-to-treat effects, while finding meaningful treatment effects but failing to observe deleterious spillovers can lead to misconstrued policy conclusions. Therefore, reporting the TCE is as important as the ITT, if not more important in many cases: if the program caused a bunch of people to escape poverty while others to fall into it, leaving the overall poverty rate unchanged (TCE=0), you’d have to argue much harder to convince your audience that your program is a success because the ITT is large and positive.”
  • Context: Zeitlin and McIntosh recent paper comparing cash and a USAID health + nutrition program in Rwanda. From their blog post: “In our own work the point estimates on village-level impacts are consistent with negative spillovers of the large transfer on some outcomes (they are also consistent with Gikuriro’s village-level health and nutrition trainings having improved health knowledge in the overall population). Cash may look less good as one thinks of welfare impacts on a more broadly defined population. Donors weighing cash-vs-kind decisions will need to decide how much weight to put on non-targeted populations, and to consider the accumulated evidence on external consequences.”
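
My own gloss on that definition (the weights are my assumption – presumably the population shares assigned to each status): if a share s of the population is assigned to treatment, then

TCE = s·ITT + (1 − s)·SNT

So a large, positive ITT can coexist with TCE ≈ 0 if spillovers on the (1 − s) untreated are negative – exactly the poverty example in the quote above.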

3. Why don’t people work less when you give them cash?

Excellent post by the authors of a new paper on VoxDev, listing many different mechanisms and also looking at how this changes by type of transfer (e.g. gov't conditional and unconditional, remittances, etc.)

BONUS: More gender equality = greater differences in preferences on values like altruism, patience or trust (ft. interesting map)

Falk & Hermle 2018

Causal Inference: The Mixtape

“Identifying causal effects involves assumptions, but it also requires a particular kind of belief about the work of scientists. Credible and valuable research requires that we believe that it is more important to do our work correctly than to try and achieve a certain outcome (e.g., confirmation bias, statistical significance, stars). The foundations of scientific knowledge are scientific methodologies. Science does not collect evidence in order to prove what we want to be true or what people want others to believe. That is a form of propaganda, not science. Rather, scientific methodologies are devices for forming a particular kind of belief. Scientific methodologies allow us to accept unexpected, and sometimes, undesirable answers. They are process oriented, not outcome oriented. And without these values, causal methodologies are also not credible.”

Causal Inference: The Mixtape by Scott Cunningham, associate professor of economics at Baylor University (oh and there’s an accompanying Spotify playlist)

Weekly Development Links #4 – #6

Dev links coming to you weekly from now on!

Week #6: Oct 17

1. Cash transfers increase trust in local gov’t

“How does a locally-managed conditional cash transfer program impact trust in government?”

  • Cash transfers increased trust in leaders and perceptions of leaders’ responsiveness and honesty
  • Beneficiaries reported higher trust in elected leaders but not in appointed bureaucrats
  • Government record-keeping on health and education improved in treatment communities

2. Kinda random: sand dams

Read a WB blogpost on sand dams as a method for increasing water sustainability in arid regions … but it did not explain how the heck you store water in sand, so I watched this cool video from Excellent Development, a non-profit that works on sand dam projects.

3. USAID increasingly using “geospatial impact evaluations” ft. MAPS!

Outlines example of a GIE on USAID West Bank/Gaza’s recent $900 million investment in rural infrastructure

Ariel BenYishay, Rachel Trichler, Dan Runfola, and Seth Goodman at Brookings

BONUS: In other geospatial news
LSE blog post on the work of ground-truthing spatial data in Kenya

Week #5: Oct 10

Health Round-Up Edition

1. Dashboards for decisions: Immunization in Nigeria

A new dashboard is being used to improve data on routine immunizations … but doesn’t look like the underlying data quality has been improved. Is this just better access to bad data?

2. Norway vs. Thailand vs. US

A comparative study of health services for undocumented migrants

3. Traditional Midwives in Guatemala

Aljazeera on the complicated relationship between traditional midwives providing missing services and the gov’t trying to provide those services in health centers

BONUS: Visualizing fires + “good”

  • Satellite imagery of crop burning in India in 2017 vs 2018
  • How good is good? 6.92/10. The YouGov visualization on how people rate different descriptors on a 0-10 scale is really interesting if you look at the distributions – lots of agreement on appalling, average (you’d hope there would be clustering around 5!), and perfect. Then, pretty wide variance for quite bad, pretty bad, somewhat bad, great, really good, and very good. Shows how you should cut out generic good/bad descriptions in your writing and use words like appalling or abysmal that are more universally evocative.

Week #4: Oct 3

1. Tanzania outlaws critiques of their data!?

“Consider a simple policy rule: if a government’s statistics cannot be questioned, they shouldn’t be trusted. By that rule, the Bank and Fund would not report Tanzania’s numbers or accept them in determining creditworthiness—and they would immediately withdraw the offer of foreign aid to help Tanzania produce statistics its citizens cannot criticize.”

2. 12 Things We Can Agree On About Global Poverty?

In August, a CGDev post proposed 12 universally agreed-upon truths about global poverty. Do you agree? Are there other truths we should all agree on?

3. Food for thought on two relevant method issues

  • Peter Hull released a two-page brief on controlling for propensity scores instead of using them to match or weight observations
  • Spillover and estimands: “The key issue is that the assumption of no spillovers runs so deep that it is often invoked even prior to the definition of estimands. If you write the “average treatment effect” estimand using potential outcomes notation, as E(Y(1)−Y(0)), you are already assuming that a unit’s outcomes depend only on its own assignment to treatment and not on how other units are assigned to treatment. The definition of the estimand leaves no space to even describe spillovers.”

BONUS: New IMF Chief Economist
Dr. Gita Gopinath takes over.

Weekly Development Links #3

My final week of taking over IDinsight’s internal development links.

1. Development myths: debunked

Rachel Glennerster asked for examples of development myths, resulting in a list of development myths along with debunking sources / evidence against them. Some of the myths shared, with accompanying evidence:

2. Traditional local governance systems (autocratic) underutilize local human capital

A new paper by Katherine Casey, Rachel Glennerster, Ted Miguel, and Maarten Voors. “We experimentally evaluate two solutions to these problems [autocratic local rule by old, uneducated men] in rural Sierra Leone: an expensive long-term intervention to make local institutions more inclusive; and a low-cost test to rapidly identify skilled technocrats and delegate project management to them. In a real-world competition for local infrastructure grants, we find that technocratic selection dominates both the status quo of chiefly control and the institutional reform intervention, leading to an average gain of one standard deviation unit in competition outcomes. The results uncover a broader failure of traditional autocratic institutions to fully exploit the human capital present in their communities.“

3. Aggressive U.S. recruitment of nurses from Philippines did not result in brain drain / negative health impacts

A new paper by Paolo Abarcar and Caroline Theoharides. “For each new nurse that moved abroad, approximately two more individuals with nursing degrees graduated. The supply of nursing programs increased to accommodate this. New nurses appear to have switched from other degree types. Nurse migration had no impact on either infant or maternal mortality.”

BONUS. Data viz: Poverty persists in Africa, falls in other regions

Justin Sandefur shared how the Economist much improved a World Bank graphic to more clearly visualize how the number of people living in poverty has risen slightly in Africa while other regions have seen sharp decreases in the number of people in poverty over time. (I wonder how the graphic would look if it stacked Africa, South Asia, then East Asia & Pacific? Less dramatic contrast between Africa and the other regions? The number of poor in South Asia hasn't decreased as dramatically as in East Asia, so it would look more similar to the Africa trend than the East Asia trend until about 2010, I think.)

Weekly Development Links #2

This is part 2 of me taking over IDinsight’s internal development link round-up.

1. This week in gender & econ

2. Two papers on p-hacking or bad reporting in econ papers

3. Mapping trade routes: Tilman Graff shared some really cool visualizations of trade routes, aid, and infrastructure in several African countries. They were created as part of his MPhil thesis.

Footbridges for higher wages

Lant Pritchett and other researchers often argue that development economists are too focused on one-off, micro interventions and fail to see the big picture. They are highly critical of the hype that develops around specific interventions following the release of studies using RCTs or other quasi-experimental methods to measure the impact of a specific program – microfinance, for example, had a big moment and, more recently, cash transfers have dominated many discussions of economic development.

Pritchett’s scorecard comparing first generation RCT practice to the approach of the non-RCT crowd is an especially brutal assessment of the micro development literature (second table in the link). He writes, “National Development leads to better well being. National development is ontologically a social process (markets, politics, organizations, institutions). RCTs have focused on topics that account for roughly zero of the observed variation in human development outcomes.”

There’s a lot that’s valid about this line of critique, although I think it’s more a call to be sure to contextualize learnings, ideally with qualitative research to investigate the how and why of a quantitative claim, rather than motivation to throw out the micro development approach altogether.

Besides, there is something so satisfying about how a small intervention can have a big impact.

Small bridges, big deal

Brooks and Donovan’s recent paper (full PDF here) found that building footbridges in Northern Nicaragua protected local workers from the typical wage loss seen during flooding, when travel routes are cut off, and even led to increased profits of local farmers.

Their primary finding is best seen through two graphics from the paper. The first shows the distribution of wage earnings before footbridge construction, and you can clearly see a massive disadvantage to those experiencing flooding. In the second, the gap has disappeared.

Figure 1: Distribution of wage earnings BEFORE footbridge construction

Figure 2: Distribution of wage earnings AFTER footbridge construction

They also find positive spillover effects. First, rural villagers were able to take higher paying jobs in nearby towns, increasing their wages and increasing the wages of those left behind, who faced less competition in the local labor market. (A similar mechanism to that found in the No Lean Season research, which offered select villagers incentives to migrate to cities for work and found positive income effects for those households and neighboring non-study households.)

Second, farmer profits increased. Not because of lower trade costs that allowed farmers to buy cheaper inputs, but because they were able to access new purchasing markets for their goods and diversify their income sources.

This paper is amazing because the data viz communicates clearly, the findings are meaningful and positive, and the idea for the research design had to have come from an intimate knowledge of the challenges facing rural citizens of Northern Nicaragua.

A national and local development tool

Infrastructure studies connect easily to those big questions about national development that anti-randomistas would prefer to focus on. While it won't be footbridges in every location, there are lots of countries where road and transport infrastructure solutions are needed to promote both local and national development.

Papers like this one show how connectivity and access can be an important determinant of economic welfare via multiple mechanisms. Besides income effects like those measured in the Brooks and Donovan paper, there are possible effects for access to credit, healthcare, or other public services that isolated communities would otherwise miss out on.

Gaining entitlements with infrastructure and cash

There's a seriously inspiring narrative in there – a simple change that leads to more options, more opportunities, more connectivity. As my colleague Sindy was discussing today, there is a pattern: interventions that increase options and expand opportunity, such as infrastructure improvements or cash transfers, seem more powerful for effecting broad change than interventions targeting very narrow and specific goals.

That said, there is probably a gain from using both types of interventions at different times, or concurrently.

McIntosh and Zeitlin's new paper compares a cash transfer program directly with a child nutrition program. The final line of their abstract made me think about paternalism and beneficiary preferences: “The results indicate that programs targeted towards driving specific outcomes can do so at lower cost than cash, but large cash transfers drive substantial benefits across a wide range of impacts, including many of those targeted by the more tailored program.”

People spend their money with different priorities than programs dictate and seem to get more out of it. That suggests to me that cash transfers (or infrastructure improvements) are a way to improve this baseline ability to provide for your household (“entitlements” à la Amartya Sen), while specific health or education interventions are more useful as public service-style campaigns to promote undervalued goods, such as immunizations.

A final thought

I’m generally curious how often Sen’s entitlements approach is explicitly applied to non-famine topics in development research. I’m guessing often. (A two-minute google led me to a PhD thesis called “Poverty as entitlement failures” that sounds interesting.)

Weekly Development Links #1

Each Wednesday at IDinsight, one of our tech team members, Akib Khan, posts a few links (mostly from Twitter!) to what he’s been reading in development that week. For the next three weeks, he’s on leave and I am taking over! Thought I should cross-post my selections (also mostly curated from #EconTwitter):

Cash Transfer Bonanza: The details matter
Blattman et al. just released a paper following up on previous 4-year results from a one-time cash transfer of $400, now reporting 9-year results (see first 3 links). To liven up the internal discussion, I’m adding critiques by Ashu Handa (UNC Transfer Project / UNICEF-Innocenti economist and old family friend), who has cautioned against lack of nuance in interpretation of CT study results, esp. around program implementation details like who is distributing grants, the size of the grants, and how frequently they are given – he studies social protection programs giving repeat cash transfers.

Diff-in-diff treatment timing paper… with GIFs!
Andrew Goodman-Bacon (what a name!) has a new paper that all of #EconTwitter is going crazy over. It deals with some methodological issues with using diff-in-diff when treatment turns on at different times for different groups, and other scenarios where timing becomes important. The real paper is not for the faint-hearted, but the Twitter thread has some great GIFs!

African debt to China: reality doesn’t match the hype

Bonus link: Eritrea & Ethiopia border opening party

Thesis revamp: All hail Ted Miguel, PhD, god of economic writing!

      Ted Miguel, god of economic writing

In order to have a high-quality writing sample for the RA jobs I’m applying to this fall, I am revamping my thesis! Joy of joys!

I thought about doing this earlier in the year and even created a whole plan to do it, but ended up deciding to work on this blog, learning to code, and other, less horrifying professional development activities.

I say horrifying because the thesis I submitted was HORRIBLY WRITTEN. So so so bad. I cringe every time I look back over it. I had tackled a 6-year project (the length of time it took to write the paper I was basing my thesis on, I later found out) in four months' time. Too little of the critical thinking I had done on how to handle the piles and piles of data I needed to answer my research question actually ended up in writing.

I thought it would be a drag to fix up the paper. I didn’t expect to still be as intrigued by my research topic (democracy and health in sub-Saharan Africa!) or to be as enthusiastic about practicing my economic writing. I’m taking the unexpected enjoyment as a positive sign that life as a researcher will be awesome.

I’ve been thinking critically about the question of democracy and health and how they’re interrelated and how economic development ties into each. I’ve read (skimmed) a few additional sources that I didn’t even think to look for last time and I already have some good ideas for a new framing of why this research is interesting and important. The first time around, I focused a lot on the cool methodology (spatial regression discontinuity design) because that’s what I spent most of my time working on.

My perspective on the research question has been massively refreshed by time apart from my thesis, new on-the-ground development experience, and the papers I’ve read in the interim.

My first tasks have been to re-read the thesis (yuck), and then gather the resources I need to re-write at least the introduction. I am focusing on the abstract and introduction as the first order of business because some of the writing samples I will need to submit will be or can be shorter and the introduction is as far as most people would get anyways.

To improve my writing and the structure of my introduction, my thesis advisor – who I can now call Erick instead of Professor Gong – recommended reading some of Ted Miguel’s introductions. I printed three and all were well-written and informative in terms of structure; one of them (with Pascaline Dupas) even helped me rethink the context around my research question and link it more solidly to the development economics literature.

The next move is to outline the introduction by writing the topic sentence of each paragraph (a tip taken from my current manager at IDinsight, Ignacio, who is very into policy memo-style writing) using a Miguel-type structure. I’ll edit that structure a bit, then add the text of the paragraphs.

Noble work: Anand Giridharadas on the EKS

There was a recent discussion on the IDinsight #philosophy Slack channel about a recent Ezra Klein Show (EKS on this blog from now on, since I talk about it all the time) podcast with Anand Giridharadas. My contribution built off someone else's notes – that Giridharadas is spot on about how companies (IDinsight too, in some ways) sell working for them as an extension of the camaraderie and culture of a college campus, and that he doesn't offer concrete solutions, which is very annoying – plus some reflections on transitioning from private sector consulting to IDinsight's social sector, non-profit consulting model. I related more to the moral arguments in the podcast, and this is what I shared:

I connected most with his argument about how the overall negative impact of many big for-profit companies on worldwide well-being vastly outweighs any individual good you can do with the money you earn. One of EA’s recommended pathways to change is making a ton of money and giving it to effective charities, but if you do that by working for an exploitative company, then you’re really contributing to the maintenance of inequality and of the status quo racist, sexist, oppressive system.

My dad was always talking about having a “noble” profession when I was growing up (he’s a teacher and my mom’s a geriatric physical therapist) and even though “noble” is a strange way to put it, I think it is really important to (as much as possible) only be party to organizations and companies that are doing good or at least not doing active harm.

That being said, there are more reasons for going into the private sector and aiming to make money than are really dealt with in the podcast. For example, a few people we’ve talked to in South Africa have mentioned that many highly skilled South Africans are responsible for the education costs for all siblings/cousins and that is a strong motivator to take a higher paying salary.

It becomes very related to the debate about how much development or social sector workers should get paid, relative to competitive private sector jobs. I think IDinsight does a pretty good job of being in the middle for US associates anyways – paying enough that you can even save some, which is more than a lot of non-profits provide, but not necessarily trying to compete with private sector jobs because our model relies a lot on hiring people who are in it to serve, not for the money. Something for us to continue thinking about is how this might exclude candidates who have other financial responsibilities and how we should respond to this issue in how we hire and set salaries.

It’s so frustrating when people identify a problem without offering solutions. The closest he comes to offering solutions is to have organizations stop lobbying for massive tax breaks or in other ways deprioritize the bottom line of profitability. Sounded to me like his vision involves a lot more socialist ideas: the full solutions to these issues would involve massive-scale reorganizing of the existing economic system… although maybe we are heading in that direction with more co-op style companies and triple bottom line for-profit social enterprises? (Don’t know a ton about this co-op stuff – mostly from another Ezra Klein show episode probably, but it sounds cool!) …Maybe his next book will try to map out solutions, though?

Look out for: Market impacts of cash transfers

Forthcoming – “General Equilibrium Effects of Cash Transfers,” from Paul Niehaus, Johannes Haushofer, Ted Miguel, and Michael Walker, answering the question: What are the market impacts of an inflow of ~15% of local GDP?

I have to say I barely understand what’s going on in a “market” … my economic background is very individual- and household-focused.

But understanding the effect of an intervention on a community as a whole, not just on those treated, seems really important. Partially, this is why we look at and consider spillover effects – the effect of an intervention on the neighbors of the treated, who didn’t receive the program themselves.

General equilibrium or market effects investigate a level up from spillover effects and treatment effects – they look at the cumulative impact of the program on the way the economy operates.

I couldn’t explain how one studies a specific market, or what counts as being part of the market, in any clear terms, but I’m still excited for this upcoming paper on the market-level effects of cash transfers, a point that has been debated recently, after evidence of potential negative spillover effects came out.

 

I’m pretty sure I just solved life

Disclaimer: I was a little drunk on power (calculations) when I wrote this, but it’s me figuring out that econometrics is something I might want to specialize in!

I think I just figured out what I want to do with the rest of my career.

I want to contribute to how people actually practice data analysis in the development sector from the technical side.

I want to write about study design and the technical issues that go into running a really good evaluation, and I want to produce open source resources to help people understand and implement the best technical practices.

This is always something that makes me really excited. I don’t think I have a natural/intuitive understanding of some of the technical work, but I really enjoy figuring it out.

And I love writing about/explaining technical topics when I feel like I really “get” a concept.

This is the part of my current job that I’m most in love with. Right now, for example, I’m working on a technical resource to help IDinsight do power calculations better. And I can’t wait to go to work tomorrow and get back into it.

I’ve also been into meta-analysis papers that bring multiple studies together. In general, the meta-practices, including ethical considerations, of development economics are what I want to spend my time working on.

I’ve had this thought before, but I haven’t really had a concept of making that my actual career until now. But I guess I’ve gotten enough context now that it seems plausible.

I definitely geek out the most about these technical questions, and I really admire people who are putting out resources so that other people can geek out and actually run better studies.

I can explore the topics I’m interested in, talk to people who are doing cool work, create practical tools, and link these things that excite me intellectually to having a positive impact in people’s lives.

My mind is already racing with cool things to do in this field. Ultimately, a website that is essentially an encyclopedia of development economics best practices would be so cool. A way to link all open source tools and datasets and papers, etc.

But top of my list for now is doing a good job with (and enjoying) this power calculations project at work. If it's as much fun as it was today, I will be in job heaven.


Recommendations of the Week: June 18-24

Blog

Goddess-Economist Seema Jayachandran wrote about economists’ gendered view of their own discipline back in March. Dr. Jayachandran and PhD student co-author Jamie Daubenspeck investigate:

  1. Percent of woman authors on different development topics: Drawing on all empirical development papers from 2007-2017, they find, out of all papers, “51% were written by all men, and 15% by all women. The average female share of authors was 28% (weighting each paper equally).” Gender, health, trade, migration, education, poverty and conflict are the development topics with a greater than average number of woman authors.
  2. Economists' perspectives on under-researched topics: They show that there is a negative correlation between a topic's % of woman authors and perceptions that the topic is under-researched, a finding they call “a bit depressing.” Same. (They also write that “whether a topic is under-researched are not significantly correlated with the actual number of articles on the topic published in the JDE over our sample period.” So what do these economists even know?)

I love their thoughtful outline of the methodology they used for this little investigation. Describing the world with data is awesome.

Awesome Humans

I ended up hearing about/reading about several amazing humans this week:

Dr. Nneka Jones Tapia – the clinical psychologist running Cook County Jail – had amazing things to say on the Ezra Klein Show last year in July. She is powerful and thoughtful and doing amazing things to improve prisons in the US.

New Zealand PM Jacinda Ardern gave birth on the 21st. She’s only the second world leader to give birth in office, after Pakistan’s Benazir Bhutto. The best part is that she is 100% unapologetic about being a mother in office, even while she acknowledges the challenges she will personally face in balancing a new baby and work.

These two leaders are just out there in the world leading noble, thoughtful, innovative lives. In love.

And then there’s MJ Hegar, who’s running for Congress against a tea partier in Texas. Her amazingly directed ad shows how enduring her dedication to service has been throughout her life:

Life Skill

My best friend Riley and I made a pact to meditate daily for ten days, starting on Monday. I have done it each day this week and my week has felt fuller and more focused than ever. Not willing to attribute full causality to the meditation, but it definitely has been a tool to start my day well and a reminder throughout the day that I can and want to stay focused and in the moment.

Podcast

The Ezra Klein Show interviews are always on point, and “The Green Pill” episode featuring Dr. Melanie Joy was no exception. The June 11 show discussed “carnism” – the unspoken ideology that tells us eating animals, wearing animals, and otherwise instrumentalizing them is good.

I've been mulling it over for a while now, but the episode's frank conversation about why veganism is so hard to talk about pleasantly – and why it's so hard for people to shift away from a carnist mindset – motivated me to head back down the vegetarian path.

I was vegetarian for a year or so in college, but now I’m aiming for veganism, or something close. I’m not eating meat and am not actively purchasing or eating eggs or milk. At this point, I’ll eat eggs or milk or other animal products that are already baked into something – a slice of cake, for example. Eventually, I want to phase out pretty much all animal products. But I’m giving myself some space to adjust and dial back the carnism bit by bit. The incremental approach should let me stick to it better.

Cheese will probably be my “barrier food” – apparently this is so common, there’s a webpage that specifically teaches how to overcome the cheese block. (hehe)

They recommend slowly replacing cheese with guac or hummus, and taking a large break from any cheese before trying vegan cheese. (Which won’t be a problem since I doubt there’s any vegan cheese in Kenya to begin with!)

Cover Image

Fruit

It is not mango season in Kenya, but I had the best mango this week. Maybe because I cut it myself for the first time, making an absolute mess. Or maybe because it was the key ingredient to the first lettuce-containing salad I’ve ever made myself at home. But there’s a lot to be said for a fruit that encourages you to embrace your messy nature.

Why you should convert categorical variables into multiple binary variables

Take the example of a variable reporting if someone is judged to be very poor, poor, moderately rich, or rich. This could be the outcome of a participatory wealth ranking (PWR) exercise like that used by Village Enterprise.

In a PWR exercise, local community leaders can identify households that are most vulnerable. These rankings can then be used to target a development program (like VE’s graduation-out-of-poverty program that combines cash transfers with business training) to the community members that are most in need.

Let’s say that you want to include the PWR results in a regression analysis as a covariate. You have a dataset of all the relevant variables for each household, including a variable that records whether the household was ranked in the PWR exercise as very poor, poor, moderately rich, or rich.

You need to convert this string variable (text) into a numeric value. You could assign each option a value from 1 to 4, with 1 being “very poor” and 4 meaning “rich” … but you shouldn’t use this directly in your regression.

If you have a variable that moves from 1 to 2 to 3 to 4, you’re implying that there is a linear pattern between each of those values. You’re saying that the effect on your outcome variable of going from being very poor (1) to poor (2) is the same as the effect of going from poor (2) to moderately rich (3). But you don’t know what the real relationship is between the different PWR levels, since the data isn’t that granular. You can’t make the linear assumption.

So instead, you should convert the ranking into four binary variables: Ranked “very poor” or not? “Poor” or not? “Moderately rich” or not? “Rich” or not?

This Stata support page does a great job of summarizing how to apply this in your regression code or create binary variables from a categorical variable using easy shortcuts. I like:

reg y x i.pwr

But how do you interpret the results?

When you create dummies (binary variables) out of a categorical variable, you use one of the group dummies as the reference group and don’t actually include it in the regression.

By default, the reference group is the category with the lowest value. In this case, that means “very poor.” So in the regression, you'll have three dummies, not four. Being “very poor” is the base condition against which to compare the other rankings.

Let’s say there is a statistically significant, positive coefficient on the “moderately rich” dummy in your regression results. That means that, compared to the base condition of being very poor, being moderately rich has a positive effect on your outcome variable.
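
To make the mechanics concrete, here is a minimal sketch (hypothetical variable names, assuming pwr is already coded 1 = “very poor” through 4 = “rich”) that does by hand what i.pwr does automatically:

* Create one dummy per category, then leave out the "very poor" dummy as the reference group
tabulate pwr, generate(pwr_)     // creates pwr_1 ... pwr_4
reg y x pwr_2 pwr_3 pwr_4        // pwr_1 ("very poor") is the omitted base category
* The coefficient on pwr_3 ("moderately rich") is the difference in y relative to
* "very poor" households, holding x constant - the same comparison i.pwr reports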

Özler: Decrease power to detect only a meaningful effect

Photo by Val Vesa on Unsplash

Reading about power, I found an old World Bank Impact Evaluations blog post by Berk Özler on the perils of basing your power calcs on standard deviations without relating those SDs back to the real-life context.

Özler summarizes his main points quite succinctly himself:

“Takeaways:

  • Think about the meaningful effect size in your context and given program costs and aims.
  • Power your study for large effects, which are less likely to disappear in the longer run.
  • Try to use all the tricks in the book to improve power and squeeze more out of every dollar you’re spending.”

He gives a nice, clear example to demonstrate: a 0.3 SD detectable effect size sounds impressive, but for some datasets it would really only mean a 5% improvement, which might not be meaningful in context:

“If, in the absence of the program, you would have made $1,000 per month, now you’re making $1,050. Is that a large increase? I guess, we could debate this, but I don’t think so: many safety net cash transfer programs in developing countries are much more generous than that. So, we could have just given that money away in a palliative program – but I’d want much more from my productive inclusion program with all its bells and whistles.”

Usually (in an academic setting), your goal is to have the power to detect a really small effect size so you can get a significant result. But Özler makes the opposite point: it can be advantageous to power your study only for an effect size that would actually be meaningful, which lowers the required sample size and cost (at the price of less power to detect smaller effects).
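
To see the cost side of that tradeoff, here is a rough Stata sketch using the built-in power command. The setup (individual-level randomization, SD normalized to 1, a 0.5 SD comparison) is my own illustrative assumption, not Özler’s numbers:

* Sample size needed to detect a 0.3 SD effect at 80% power, 5% significance
power twomeans 0 0.3, sd(1) power(0.8)

* Sample size needed to detect a 0.5 SD effect instead
power twomeans 0 0.5, sd(1) power(0.8)

Because the required sample scales roughly with 1/(effect size)², powering for a 0.5 SD effect instead of 0.3 SD cuts the required sample to about a third – that is the cost saving, in exchange for giving up the ability to detect smaller effects.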

He also advises, like the article I posted about yesterday, that piloting could help improve power calculations via better ICC estimates: “Furthermore, try to get a good estimate of the ICC – perhaps during the pilot phase by using a few clusters rather than just one: it may cost a little more at that time, but could save a lot more during the regular survey phase.”

My only issue with Özler’s post is his chart, which shows the tradeoffs between effect size and the number of clusters. His horizontal axis is labeled “Total number of clusters” – per arm or in total, Berk?!? It’s per arm, not total across all arms. There should be more standardized and intuitive language for describing sample size in power calcs.

Gendered language -> gendered economic outcomes

A new paper by Jakiela and Ozier involves what sounds like an insane amount of data work: classifying 4,336 languages by whether they gender nouns. For example, in French, a chair is feminine – la chaise.

They find, across countries:

  • Gendered language = greater gaps in labor force participation between men and women (female labor force participation is 11.89 percentage points lower)
  • Gendered language = “significantly more regressive gender norms … on the magnitude of one standard deviation”

Within-country findings from Kenya, Niger, Nigeria, and Uganda – countries with sufficient and distinct in-country variation in language type – further show statistically significant lower educational attainment for women who speak a gendered language.

(Disclaimer: The results aren’t causal, as there are too many unobserved variables that could be at play here.)

As the authors say: “individuals should reflect upon the social consequences of their linguistic choices, as the nature of the language we speak shapes the ways we think, and the ways our children will think in the future.”

3ie: Improve power calculations with a pilot

3ie wrote on June 11 about why you may need a pilot study to improve power calculations:

  1. Low uptake: “Pilot studies help to validate the expected uptake of interventions, and thus enable correct calculation of sample size while demonstrating the viability of the proposed intervention.”
  2. Overly optimistic MDEs: “By groundtruthing the expected effectiveness of an intervention, researchers can both recalculate their sample size requirements and confirm with policymakers the intervention’s potential impact.” It’s also important to know if the MDE is practically meaningful in context.
  3. Underestimated ICCs: “Underestimating one’s ICC may lead to underpowered research, as high ICCs require larger sample sizes to account for the similarity of the research sample clusters.”

The piece has many strengths, including that 3ie calls out one of their own failures on each point. They also share the practical and cost implications of these mistakes.

At work, I might be helping develop an ICC database, so I got a kick out of the authors’ own call for such a tool…

“Of all of the evaluation design problems, an incomplete understanding of ICCs may be the most frustrating. This is a problem that does not have to persist. Instead of relying on assumed ICCs or ICCs for effects that are only tangentially related to the outcomes of interest for the proposed study, current impact evaluation researchers could simply report the ICCs from their research. The more documented ICCs in the literature, the less researchers would need to rely on assumptions or mismatched estimates, and the less likelihood of discovering a study is underpowered because of insufficient sample size.”

…although, if ICCs are rarely reported, I may have my work cut out for me!
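
For what it’s worth, estimating (and reporting) an ICC is cheap once you have a clustered outcome from a pilot or survey in hand. A minimal Stata sketch, with hypothetical variable names (outcome_var measured on households, cluster_id identifying villages):

* One-way ANOVA estimate of the intraclass correlation across clusters
loneway outcome_var cluster_id
local rho = r(rho)

* Implied design effect for, say, 20 households interviewed per cluster
display "Design effect = " 1 + (20 - 1) * `rho'

The design effect tells you how much larger your sample needs to be, relative to simple random sampling, to reach the same precision – which is exactly where an underestimated ICC bites.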

You have to pay to be published??

Clockwise from top left: Dr. Francisca Oboh-Ikuenobe, Dr. Nii Quaynor, Mohamed Baloola, Dr. Florence Muringi Wambugu.

I was reading about the new African journal – Scientific African – that will cater specifically to the needs of African scientists. Awesome!

Among the advantages of the new journal is the fact that “publication in Scientific African will cost $200, around half of what it costs in most recognised journals.”

Wait.

You have to pay to be published in an academic journal? Dang.

I guess that cost is probably built into whatever research grant you’re working on, but in most other publications, I thought writers got paid to contribute content. I guess it’s so that there’s not a direct incentive to publish as much as possible, which could lead to more falsified results? Although it seems like the current model has a lot of messed up incentives, too.

“What are people currently doing?”

Andrew Gelman’s recent blog post responding to a Berk Özler hypothetical about data collection costs and survey design raised a good point about counterfactuals – one I theoretically knew, but phrased in a way that brought new insight:

“A related point is that interventions are compared to alternative courses of action. What are people currently doing? Maybe whatever they are currently doing is actually more effective than this 5 minute patience training?”

It was the question “What are people currently doing?” that caught my attention. It reminded me that one key input for interpreting results of an RCT is what’s actually going on in your counterfactual. Are they already using some equivalent alternative to your intervention? Are they using a complementary or incompatible alternative? How will the proposed intervention interact with what’s already on the ground – not just how will it interact in a hypothetical model of what’s happening on the ground?

This blogpost called me to critically investigate what quant and qual methods I could use to understand the context more fully in my future research. It also called me to invest in my ability to do comprehensive and thorough literature reviews and look at historical data – both of which could further inform my understanding of the context. And, even better, to always get on the ground and talk to people myself. Ideally, I would always do this in-depth research before signing onto the kind of expensive, large-scale research project Özler and Gelman are considering in the hypothetical.

“Obviously” in academic writing

Academic writing is full of bad habits. For example, using words like “obviously,” “clearly,” or “of course.” If the author’s claim or reasoning really is obvious to you, these words make you feel like you’re in on the secret; you’re part of the club; you’ve been made a part of the “in” group.

But when you don’t know what they’re talking about, the author has alienated you from their work. They offer no explanation of the concept because it seems so simple to them that they simply won’t deign to explain themselves clearly to those not already “in the know.”

Part of an academic’s job is to clearly explain every argument in their papers. It is lazy and exclusionary to imply readers should already understand a concept or a path of reasoning.

At worst, it just makes you sound rude and superior:

“Advertising is, of course, the obvious modern method of identifying buyers and sellers.” – Stigler, “The Economics of Information”

He really doubled down on how evident this fact is, which only tells the reader how smart he thinks he is. The sentence could have read, “Advertising is the preferred modern method of identifying buyers and sellers,” and could have included a citation.

On the other hand, a non-exclusionary use of “obviously”:

“Obviously, rural Ecuador and the United States are likely to differ in a large number of ways, but the results in this (and other recent) papers that show a shifting food Engel curve point to the risks inherent in assuming that the Engel curve is stable.” – Schady & Rosero paper on cash transfers to women

The authors had previously compared two papers from two very different contexts; they use “obviously” to acknowledge the potential issues with comparing these two settings. This is an acceptable use because the statement that follows actually is obvious, and it brings every reader on board by acknowledging a possible critique of the argument. It is an acknowledgement of a possible shortcoming on the authors’ part, rather than a test of the reader’s intelligence or prior knowledge.

Grounded Theory, Part 1: What is it?

Photo by Calum MacAulay on Unsplash

I recently read Brené Brown’s Daring Greatly. The book presents Brown’s research, but it can feel more like a personal guidebook to tackling issues of vulnerability and shame.

Because the book has a conversational feel, it’s hard to tell how much of it is based in research and how much in Brown’s individual experiences. She weaves in personal stories frequently, often to demonstrate a prickly emotional experience that was common across her interviews. But when I reached the end of the book, I wanted to know how she drew these theories from the data. I’ve only worked sparingly with qualitative data: how does one “code” qualitative data? How do you analyze it without bringing in all sorts of personal biases? How do you determine its replicability, internal and external validity, and generalizability?

Ingeniously, Brown grounds the book in her research methods with a final chapter on grounded theory methodology. Her summary (also found online here) was a good introduction to how using grounded theory works and feels. But I still didn’t “get” it.

So I did some research.

Grounded Theory

Brown quotes 20th century Spanish poet Antonio Machado at the top of her research methods page:

“Traveler, there is no path. / The path must be forged as you walk.”

This sentiment imbued the rest of the grounded theory (GT) research I did – which seemed bizarre to a quant-trained hopeful economist. I’m used to pre-analysis plans, testing carefully theorized models, and starting with a narrow question.

Grounded theory is about big questions and a spirit of letting the data talk to you.

Developed by Barney Glaser and Anselm Strauss in 1967, GT is a general research methodology for approaching any kind of research, whether qual- or quant-focused. When using GT, everything is data – your personal experiences, interviews, mainstream media, etc. Anything you consume can count, as long as you take field notes.

Writing field notes is one of the key steps of GT: coding those notes (or the data themselves – I’m still a little blurry on this) line-by-line is another. The “codes” are recurring themes or ideas that you see emerging from the data. It is a very iterative methodology: you collect initial data, take field notes, code the notes/data, compile them into memos summarizing your thoughts, collect more data based on your first learnings, code those, compile more memos, collect more data…

Throughout the whole process, you are theorizing and trying to find emergent themes and ideas and patterns, and you should actively seek new data based on what your theories are. You take a LOT of written notes – and it sounds like in the Glaserian tradition, you’re supposed to do everything by hand. (Or is it just not using any algorithms?)

Brown describes the data she collected and her coding methodology:

“In addition to the 1,280 participant interviews, I analyzed field notes that I had taken on sensitizing literature, conversations with content experts, and field notes from my meetings with graduate students who conducted participant interviews and assisted with the literature analysis. Additionally, I recorded and coded field notes on the experience of taking approximately 400 master and doctoral social-worker students through my graduate course on shame, vulnerability, and empathy, and training an estimated 15,000 mental health and addiction professionals.

I also coded over 3,500 pieces of secondary data. These include clinical case studies and case notes, letters, and journal pages. In total, I coded approximately 11,000 incidents (phrases and sentences from the original field notes) using the constant comparative method (line-by-line analysis). I did all of this coding manually, as software is not recommended in Glaserian-grounded theory.” [emphasis mine]

The ultimate goal is to have main concepts and categories emerge from the data, “grounded” in the data, that explain what main problem your subjects are experiencing and how they are trying to solve that problem. For example, Brown’s work centers on how people seek connection through vulnerability and try to deal with shame in various healthy and unhealthy ways. She started with the big idea of connection and just kept asking people what it meant, what issues there were around it, etc., until a theory started to arise from those conversations.

You’re not supposed to have preexisting hypotheses, or even do a literature review to frame specific questions, because that will bias how you approach the data. You’re supposed to remain open and let the data “speak to you.” My first instinct on this front is that it’s impossible to be totally unbiased in how you collect data. Invariably, your personal experience and background determine how you read the data. Which makes me question – how can this research be replicable? How can a “finding” be legitimate as research?

My training thus far has focused on quantitative data, so I’m primed to preference research that follows the traditional scientific method. Hypothesize, collect data, analyze, rehypothesize, repeat. This kind of research is judged on:

  • Replicability: If someone else followed your protocol, would they get the same result?
  • Internal validity: How consistent, thorough, and rigorous is the research design?
  • External validity: Does the learning apply in other similar populations?
  • Generalizability: Do the results from a sample of the population also apply to the population as a whole?

GT, on the other hand, is judged by:

  • Fit: How closely do concepts fit the incidents (data points)? (aka how “grounded” is the research in the data?)
  • Relevance: Does the research deal with the real concerns of participants and is it of non-academic interest?
  • Workability: Does the developed theory explain how the problem is being solved, accounting for variation?
  • Modifiability: Can the theory be altered as new relevant data are compared to existing data?

I also read (on Wikipedia, admittedly) that Glaser & Strauss see GT as never “right” or “wrong.” A theory only has more or less fit, relevance, workability, or modifiability. And from the way Brown describes it, I got the impression that GT should be grounded in one specific researcher’s approach:

“I collected all of the data with the exception of 215 participant interviews that were conducted by graduate social-work students working under my direction. In order to ensure inter-rater reliability, I trained all research assistants and I coded and analyzed all of their field notes.”

I’m still a bit confused by Brown’s description here. I didn’t know what inter-rater reliability was, so I had assumed it meant that the study needed internal consistency in who was doing the coding. But when I looked it up online, it appears to mean the consistency with which different researchers code the same data in the same way. So I’m not sure how having one person do all of the coding enables this kind of reliability. Maybe if your GT research is re-done (replicated) by an independent party?

My initial thought is that GT research sounds like it should have two authors who work in parallel but independently, with the same data. Each would develop separate theories, and at the end the study could compare the two parallel work streams to identify what both researchers found in common and where they differed. I still have a lot of questions about how this works, though.

Lingering Questions

A lot of my questions are functional. How do you actually DO grounded theory?

  • How does GT coding really work? What does “line-by-line” coding mean? Does it mean you code each sentence or literally each line of written text?
  • Do these ever get compiled in a database? How do you weight data sources by their expertise and quality (if you’re combining studies and interviews with average Joes, do you actively weight the studies)? -> Can you do essentially quantitative analysis on a dataset based on binary coding of concepts and categories? (See the sketch after this list.)
  • How do you “code” quantitative data? If you had a dataset of 2000 household surveys, would you code each variable for each household as part of your data? How does this functionally work?
  • If you don’t do a literature review ahead of time, couldn’t you end up replicating previous work and not actually end up contributing much to the literature?
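
As a thought experiment for the “quantitative analysis on binary-coded concepts” question above, here is what such a dataset might look like in Stata. The themes and values are entirely hypothetical, just to make the idea concrete:

* Each row is an "incident" (a coded phrase or sentence); each code is a 0/1 theme indicator
clear
input long incident_id byte code_shame byte code_connection byte code_scarcity
1 1 0 1
2 0 1 0
3 1 1 0
end

* How frequently does each theme appear, and how often do two themes co-occur?
summarize code_shame code_connection code_scarcity
tabulate code_shame code_connection

Whether that kind of frequency and co-occurrence summary counts as legitimate GT analysis is exactly the part I still don’t understand.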

And then I also wondered: how is it applicable in my life?

  • Is GT a respected methodology in economics? (I’d guess not.)
  • How could GT enhance quant methods in econ?
  • Has GT been used in economic studies?
  • What kinds of economic questions can GT help us answer?
  • Should I learn more about GT or learn to use it in my own research?

Coming up: Part 2, Grounded Theory & Economics

To answer some of my questions, I want to do an in-depth read of a paper from the 2005 Grounded Theory Review by Frederick S. Lee: “Grounded Theory and Heterodox Economics.” (The journal has another article from 2017 entitled “Rethinking Applied Economics by Classical Grounded Theory: An invitation to collaborate” by Olavur Christiansen that I hope to read, too.)

Are we murderers for not donating our organs? [repost]

Zell Kravinsky risked his life to donate his healthy kidney to a complete stranger. Would you do the same?

Kravinsky is a radical altruist. He believes in giving away as much as possible to others, including his nearly $45 million fortune and his own body parts. Most people would consider donating a kidney as going above and beyond, but Kravinsky told the New Yorker in 2004 that he considers anyone who doesn’t donate their extra kidney a murderer.

We probably don’t, as individuals, have a moral responsibility to donate our organs, but maybe we do have a societal responsibility to find a system by which we can match kidney donors and recipients so that no one has to die just because there isn’t a transplant available. In 2012, there were 95,000 Americans on the wait list for a life-saving kidney, according to economists Gary Becker and Julio Elias. The average wait time for a kidney in 2012 was over four years.

Becker and Elias are proponents of creating a formal, legal market for organs to eliminate long wait times and better match recipients with donors. Right now, it is illegal to sell your organs in most of the world, including in the U.S.

The main risks of monetary compensation for organ donations are the coercion of unwilling donors, the potentially unequal distribution of donors (poor people would be more likely to become donors), and the moral question of whether or not it is okay to sell body parts, even if they are our own.

Purely moral arguments aside for a moment, there are ways to alleviate the risks of a market for organs. Waiting periods between registration and donation, psychiatric evaluation ahead of registration as an organ donor, and strict identification requirements or even background checks can all combat coercion in the market for organs, while saving the lives of the many Americans who die on an organ waitlist. Becker and Elias also point to the fact that people in lower income brackets are disproportionately affected by long waitlists: the wealthy can fly abroad to obtain a healthy organ or manipulate the current waitlist system in their favor, while poorer Americans face longer wait times. While donors may be disproportionately poor, which raises concerns of implicit economic coercion, the lower income brackets also benefit disproportionately from the policy.

Even more powerful than a legal market alone would be a combination of a legal market for organs and an implied consent law, which would mean people would have to opt out of being an organ donor, rather than the U.S. standard of opting into being a donor. A 2006 study by economists Alberto Abadie and Sebastien Gay found that implied consent laws have a positive impact on organ donations. Under a combination of these two initiatives, essentially all organ donor needs might be met, and a person’s will might come to include provisions for their organs to be harvested and family members to be compensated.

While Kravinsky donated his kidney for free, he once offered a journalist $10,000 to donate a kidney to a stranger, according to Philadelphia Magazine. But the journalist backed out of the deal after his wife and friends persuaded him that the risk of surgery, though relatively minor, was not worth it to save a life. If a safe, legal market for organ sales were established, perhaps a market price for organ donation and a normalization of the procedure would allow Americans to save lives and make money, without requiring Kravinsky’s extreme, and perhaps aggressive, sort of altruism.

Originally written for my Economics of Sin senior seminar, spring 2017; previously published at the Unofficial Economist on Medium.

Is my job moral? [repost]

If I continue on my current career path, I may end up arbitrating who lives and who dies. (And maybe I’ll tell their story in an economics journal and make a living doing so.)

I am planning on pursuing a career in development work, specifically in the evaluation of development programs. The “gold standard” for evaluating programs is a Randomized Control Trial (RCT).

Consider a non-profit distributing books to children with the goal of improving literacy. The non-profit wants to know whether their books really have any impact on children’s literacy. Ideally, they could look at what happens when they give a group of children the books and also what happens when they don’t give the same children books.

However, due to thus far unchangeable time-space continuum properties, this isn’t possible. So, in order to confidently say that their books had an impact, the non-profit needs to compare the literacy scores of children who received the books with other very similar children who didn’t get books. Let’s say they hire me to run an RCT for this very purpose.

To determine which children will get the books (the treatment group) and which children will serve as the comparison group (the control group), I take a list of 100 schools and randomly assign half of them to receive the extra books program. After the books are distributed and some time has passed, I go back to the schools and I have all the children take literacy tests. I compare the test scores of children in each group, and find that, on average, children who received books did much better on the literacy tests.
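
The random assignment step itself is mechanically simple. A minimal Stata sketch (school_id, books, and literacy_score are hypothetical names, and the seed is arbitrary):

* Randomly assign 50 of 100 schools to receive the books program
clear
set obs 100
gen school_id = _n
set seed 20180615            // fixing the seed makes the assignment reproducible
gen u = runiform()
sort u
gen books = (_n <= 50)       // first 50 schools in the random ordering get books

* After endline data collection, the comparison is a difference in mean test scores:
* regress literacy_score books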

The non-profit is very happy and uses the results to convince more people to donate to their program. Now they can give books to many more children, and presumably those children’s literacy scores will also increase.

This is all well and good. Even if some children in the study were chosen not to receive books, there are several commonly accepted justifications for why we studied them without providing a service:

  • The non-profit did not have enough money to give books to all the schools anyway. Randomly determining which schools received the books makes it as fair as possible.
  • While the books program was unlikely to have negative effects on children, we didn’t know if it would have no effect or a positive effect at the start. So we didn’t know if we were really depriving children of a chance to improve their literacy.
  • Being able to conduct the evaluation could inform policy and global knowledge on effective ways to improve literacy, and could improve decision-making at the non-profit.
  • In this case, maybe the control group children were the first to receive books when the non-profit’s funding increased.

These are common justifications for development evaluations. They seem quite reasonable — randomly giving out benefits might be the fairest option, we don’t know what the effect really is, and the study will contribute to our shared knowledge and lead to better decisions and even better outcomes in the future.

What if, instead of working on literacy, the non-profit wanted to reduce deaths from childbirth by improving access to and use of health facilities by pregnant women?

Suddenly, so much more is at stake.

If I randomly assign half a county to have access to a special taxi service that drives pregnant women to hospitals for safer deliveries, and one of the women who was assigned NOT to receive the taxi service dies because she gave birth at home, is the evaluation immoral? Am I morally culpable for her death?

Because I work with numbers and data, it is easy to separate myself from the potential negative consequences of the work. I didn’t choose her to die — the random number generator made me do it. 

Photo by Markus Spiske on Unsplash

So what if we’re in a situation where a randomized control trial seems immoral? How can we still learn about what works and what doesn’t?

There are other evaluation methods that can give us an idea of what programs work and which don’t. For example, quasi-experimental methods look at situations where comparable control and treatment groups are incidentally defined by the implementation of a policy. Then we can compare two groups without having to be responsible for directly assigning some people to receive a program while others go without.

Qualitative or other non-experimental methods involve gathering data by talking to people, doing research, and meeting with different groups to get various opinions on what’s happening. These methods can also help paint a picture of whether a program is having a positive effect.

But the RCT is the gold standard for a reason. A well-designed RCT can tell us what the effect of a program is with much higher confidence and precision than other methods.

UNICEF Social Policy Specialist Tia Palermo recently wrote a post titled “Are Randomized Control Trials Bad for Children?” for UNICEF’s Evidence for Action blog. She makes a powerful point to consider: What are the alternatives to running RCTs? Are they better or worse?

Palermo sees the alternative as worse: “Is it ethical to pour donor money into projects when we don’t know if they work? Is it ethical not to learn from the experience of beneficiaries about the impacts of a program?” she asks.

Her most convincing argument is that there are ethical implications to every research method we might choose:

“A non-credible or non-rigorous evaluation is a problem because underestimating program impacts might mean that we conclude a program or policy doesn’t work when it really does (with ethical implications). Funding might be withdrawn and an effective program is cut off. Or we might overestimate program impacts and conclude that a program is more successful than it really is (also with ethical implications). Resources might be allocated to this program over another program that actually works, or works better.”

And there are ethical implications to not evaluating programs at all. If non-profits aren’t held to any standard and don’t measure the effect of their program at all, there’s no way to tell which interventions and which non-profits are helping, having no effect on, or even harming the program recipients.

In the case of the woman who died because she didn’t get to a health facility, if the study had never taken place, would she have gotten to a health facility or not? It is impossible to know what would have happened, but it’s not impossible to minimize the risk of harm and maximize the benefits to all study participants. 

Photo by Anes Sabitovic on Unsplash

Ultimately, RCTs generate important evidence when they are well executed. The findings from such studies can be used to make better decisions at non-profits, at big donor foundations like the Gates Foundation or GiveWell, and at government agencies. All of which can lead to more lives saved, which is the ultimate goal.

So what to do about the ethical implications of randomly determining who gets access to a potentially life-saving program? Or any program that could have a positive impact on people’s lives?

There are a variety of measures in place to ensure ethical conduct in research and many more ~official~ economists are thinking about these ideas.

The 1979 Belmont Report helped establish criteria for ethics in human research, focusing on respect for people’s right to make decisions freely, maximizing benefits and doing no harm, and fairness in who bears any risks or benefits. Institutional Review Boards (IRBs) are governing bodies that ensure these principles are being upheld for all research.

Economists Rachel Glennerster and Shawn Powers wrote a highly recommended piece on these ethical considerations, “Balancing Risk and Benefit: Ethical Tradeoffs in Running Randomized Evaluations,” which I’m currently reading.

Yet persistent concerns about how to run ethical evaluations suggest that there is more work to do.

Taking the time to consider the ethical implications of each project is key. And I think there is more room for evaluators to read deeply on the subject and really dig into how to make evaluations more just and more beneficial to even those in the control group who don’t receive the program.

A driving principle, especially for researchers running RCTs in the development field, could be that an evaluation must have a direct positive impact on all study participants, either during the study or immediately following its completion. There are a variety of ways, some more commonly used than others, that researchers can apply this principle:

  • If we truly don’t know whether the effect of the program is positive or negative, we can make plans to provide the program to control households if it is found to have a positive effect.
  • If we suspect the program has a positive effect, the control group can be offered the program immediately after the study period has ended.
  • We can offer everyone in the study a base service, while the study tests the effectiveness of an additional service provided only to the treatment group. This way, everyone who is contributing time and information to the study receives some benefit in return.
  • Extensive piloting (testing different ideas and aspects of the evaluation before the start of the study) can also reveal potential moral dilemmas to evaluating any particular program.
  • Community interest meetings can be held before the study is implemented to gain community-level consent to participate in the study. These meetings could also be held quite early on to inform research designs and improve the quality of the study results. For example, in some cultures, it is not appropriate for a man to be alone with a woman he is not related to. If this is the case in a study area, then hiring male staff to conduct surveys would lead to a less successful study.
  • Local staff can be hired to conduct any surveys or data collection to ensure that the surveys are culturally appropriate.
  • We always obtain full and knowledgeable consent from participants, which may require translating surveys into participants’ native language.
  • If study participation requires much time or effort from control group individuals, they can be appropriately compensated.
  • All reports on evaluations (RCTs and other designs) can be fully transparent about research decisions and how ethical concerns were addressed. This will contribute to the international research community’s combined knowledge of how to ensure the rights of participants are provided for in RCTs and other research.
  • The learnings from the study can also be shared with the participating community and should add to their knowledge about their own lives; contributing to the abstract “international research community” is not enough.

Enacting these measures requires more of researchers: some have the potential to affect the legitimacy of the evaluation results if they are not properly accounted for in analysis. But a strong sense of ethics and a dedication to the population being served (often low-income individuals from the Global South, contrasted with well-off researchers from the West) demand that we take the extra time in our research to consider all ethical implications.

Originally published on my Unofficial Economist Medium publication, November 4, 2017.