Overview: “We estimate the causal effects of a country’s aid receipts on both total refugee flows to the world and flows to donor countries.”
Data: “Refugee data on 141 origin countries over the 1976–2013 period [combined] with bilateral Official Development Assistance data”
Identification strategy: “The interaction of donor-government fractionalization and a recipient country’s probability of receiving aid provides a powerful and excludable instrumental variable (IV) when we control for country- and time-fixed effects that capture the levels of the interacted variables.”
Findings: “We find no evidence that aid reduces worldwide refugee outflows or flows to donor countries in the short term. However, we observe long-run effects after four three-year periods, which appear to be driven by lagged positive effects of aid on growth.”
Overview: “Temporary migration to developing countries might play a role in generating individual commitment to development”
Data: “unique survey [of Mormon missionaries] gathered on Facebook”
Identification strategy: “A natural experiment – the assignment of Mormon missionaries to two-year missions in different world regions”
Findings: “Those assigned to the treatment region (Africa, Asia, Latin America) report greater interest in global development and poverty, but no difference in support for government aid or higher immigration, and no difference in personal international donations, volunteering, or other involvement.” (controlling for relevant vars)
Overview: “focus is internal replication, which uses the original data from a study to address the same question as that study”
Findings: “In all cases the pure replication components of these studies are generally able to reproduce the results published in the original article. Most of the measurement and estimation analyses confirm the robustness of the original articles or call into question just a subset of the original findings.” Plus some advice on how to better translate study findings into policy
Preface: I always say I want to read more papers & summarize them. That can seem like an overwhelmingly massive undertaking. But I am forging ahead! This is the first step of what I hope to be a regular habit of reading and summarizing papers. “Building State Capacity” raised a lot of interesting points – it’s the first paper I’ve read in a while. As I refamiliarize myself with academic writing and various development econ concepts, I hope to become increasingly concise.
Program: Use of biometric identification system to administer benefits from two large welfare programs
Where: Andhra Pradesh, India
When: 2010 (baseline) – 2012 (endline)
Sample: 157 sub-districts, 19 million people
Identification strategy: RCT
Payment collection became faster and more predictable
Large reductions in leakage (fraud/corruption)
Increase in program access: Reduction in gov’t officials claiming benefits in others’ names
Little heterogeneity of results: No differences based on village or poverty/vulnerability of HH
Strength of results: “Treatment distributions first-order stochastically dominate control distributions,” which means that “no treatment household was worse off relative to the control household at the same percentile of the outcome distribution”
Drivers of impact? (non-experimental decomposition)
For payment process improvement: changed organization responsible for managing fund flow and payments
For decrease in fraud: biometric authentication
Cost effective, for state and beneficiaries
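To make the stochastic-dominance finding above concrete, here is a minimal sketch of how one could check it on a sample. It is not the authors' code: the function names and the toy numbers are made up for illustration, and it assumes higher outcome values are better. Treatment first-order dominates control when the treatment CDF is never above the control CDF, which is the same as saying every treatment percentile is at least as high as the matching control percentile.

```python
def ecdf(sample, x):
    """Share of observations in `sample` that are <= x (empirical CDF)."""
    return sum(v <= x for v in sample) / len(sample)


def first_order_dominates(treatment, control):
    """True if the treatment ECDF lies at or below the control ECDF everywhere.

    Assumes higher outcomes are better. This is a descriptive check on the
    sample, not a statistical test.
    """
    grid = sorted(set(treatment) | set(control))
    return all(ecdf(treatment, x) <= ecdf(control, x) for x in grid)


# Hypothetical outcome values, for illustration only.
control = [10, 20, 30, 40, 50]
treatment = [12, 22, 35, 41, 60]   # better at every percentile
mixed = [5, 25, 35, 45, 70]        # better on average, worse at the bottom

print(first_order_dominates(treatment, control))  # True
print(first_order_dominates(mixed, control))      # False
```

The `mixed` case has a higher mean than control but still fails the check, which is why dominance is a stronger statement than a positive average effect. A sample can also show dominance by chance, so it complements rather than replaces a significance test.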
Surveys: Baseline and endline household surveys (2 years between)
Randomization: Graduated rollout over 2 years. Treatment subdistricts were first wave, then buffer subdistricts (during survey time), then finally the control subdistricts (note: subdistricts = “mandals” in India)
Stratification: By district and a principal component of socioeconomic characteristics
Analysis: Intent-to-treat (ITT): “estimates the average return to as-is implementation following the ‘intent’ to implement the new system”
“Up-take”: 50% of payments transferred to electronic in 2 years
Main controls: district FEs, “the first principal component of a vector of mandal characteristics used to stratify,” baseline outcome levels where possible
Standard errors: clustered at mandal level (Lowest level of stratification)
No differential misreporting: not driving results either due to collusion btwn officials and respondents or to inadvertent recall problems
No spillovers: no evidence of either strategic spillovers (officials diverting funds to control mandals they can more easily steal from) or spatial spillovers (from neighbor gram panchayats – village councils)
No effects of survey timing relative to payment time
No Hawthorne effects
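A toy sketch of what ITT means alongside partial uptake, to go with the analysis notes above. Everything here is hypothetical (the data, the function names), and it leaves out what the paper actually uses: district fixed effects, the stratification control, and mandal-clustered standard errors. The last step, scaling the ITT by the difference in uptake, is the standard Wald/IV calculation; it only has a causal reading under extra assumptions (assignment affects outcomes only through uptake), and these notes do not attribute it to the paper.

```python
from statistics import mean


def difference_by_assignment(values, assigned):
    """Mean of `values` among assigned-to-treatment units minus mean among controls."""
    treated = [v for v, z in zip(values, assigned) if z == 1]
    control = [v for v, z in zip(values, assigned) if z == 0]
    return mean(treated) - mean(control)


# Hypothetical households: first four assigned to treatment, last four to control.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
took_up = [1, 1, 0, 0, 0, 0, 0, 0]            # 50% uptake among the assigned
outcome = [14, 16, 10, 10, 10, 10, 10, 10]    # only households that took up gain

# ITT: compare by assignment, ignoring who actually took up (12.5 - 10 = 2.5).
itt = difference_by_assignment(outcome, assigned)

# "First stage": how much assignment moved uptake (0.5 - 0 = 0.5).
first_stage = difference_by_assignment(took_up, assigned)

# Wald estimate: ITT scaled by the first stage (2.5 / 0.5 = 5.0).
wald = itt / first_stage

print(itt, first_stage, wald)
```

With 50% uptake, the ITT is half the effect on households that actually switched, which is why the ITT reads as the return to as-is implementation rather than the effect of the technology itself.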
Thoughts & Questions
“Evaluated at full scale by government”: This minimizes risks around external validity that are often an issue for studies on NGO-operated programs at a smaller scale. Vivalt (2019) found that programs implemented by governments had smaller effect sizes than NGO/academic implemented programs, controlling for sample size; Muralidharan and Niehaus (2017) and others have discussed how results of small pilot RCTs often do not scale to larger populations.
Love that they remind you of ITT definition in the text – makes it more readable. Also that they justify why ITT is the policy-relevant parameter (“are net of all the logistical and political economy challenges that accompany such a project in practice”)
Again, authors define “first-order stochastically dominate” in the text, which I was wondering about from the abstract. Generally, well-written and easy to understand after a while not reading academic papers all the time!
What does “non-experimental decomposition” mean? (This is describing how the authors identified drivers of treatment effects)
Is it particularly strong evidence that treatment distribution was first-order stochastically dominant over control distribution? How do we interpret this statistically? Logically, if treatment was better for all HHs, relative to the closest comparison HH, that’s a good sign. But what if your results were not stat sig but WERE first-order stochastically dominant? What would that mean for interpreting the results?
What is the difference between uptake and compliance? Uptake = whether treatment HHs take up the intervention/treatment. Compliance = whether the HH complies with its assigned status in the experimental design (applies to both treat and control households). Is that right?
What does “first stage” mean? In this paper, it seems to be asking, How did treat and control units comply with the evaluation design, and what is the % uptake? (Basically, did randomization meaningfully work?) Is this always what first stage means for RCTs? How does its meaning differ for other identification strategies?
Reminder: Hawthorne effects = when awareness of being observed alters study participant behavior
What does “principal component” mean? Is it like an index?
Authors note that the political case for investment in capacity depends on a) magnitude and b) immediacy of returns -> Does that mean policy makers are consistently biased toward policies w/ short-term pay-offs? (If yes, would expect there to be a drop-off for policies that have pay-offs on longer timeline than election cycle… or maybe policy just not data driven enough to see that effect?) Also, would this lead to fewer studies on long-term effects of programs/interventions with strong short-term payoffs because little policy appetite for longterm results?
Challenges of working in policy space: program was almost ended b/c of negative feedback from local leaders (whose rents were being decreased!), but evidence from study, including positive beneficiary feedback, helped state gov’t stay the course! Crazy!
Reference to “classic political economy problem of how concentrated costs and diffuse benefits may prevent the adoption of social-welfare improving reforms” In future, look up reference: Olson 1965
Type I and II errors being referenced in a new (to me) way: Type I as exclusion, Type II as inclusion errors … I know these errors in statistical terms as Type I = false positive (reject true null) and Type II = false negative (fail to reject false null). In the line following the initial reference, the authors seem to refer to exclusion errors as exclusion of intended recipients, so not sure if these are different types of errors or I’m not understanding yet. To be explored further in future.
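On the principal component question above: yes, it works like an index. The first principal component is a weighted sum of the standardized variables, with weights chosen so that the sum captures as much of the variation across units as possible. Below is a minimal stdlib-only sketch using power iteration. The variable choices and numbers are invented for illustration and are not the mandal characteristics from the paper.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a variable to mean 0 and standard deviation 1."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd for v in column]


def first_principal_component(rows, n_iter=200):
    """Return (weights, scores) for the first principal component.

    `rows` is a list of observations, each a list of variables.
    Uses power iteration on the correlation matrix.
    """
    cols = [standardize(list(c)) for c in zip(*rows)]
    k, n = len(cols), len(rows)
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n
             for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(n_iter):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    # Each unit's index value is the weighted sum of its standardized variables.
    scores = [sum(w[j] * cols[j][i] for j in range(k)) for i in range(n)]
    return w, scores


# Hypothetical units with three socioeconomic shares (made-up values).
rows = [
    [0.40, 0.30, 0.20],
    [0.50, 0.45, 0.35],
    [0.60, 0.50, 0.40],
    [0.70, 0.65, 0.60],
    [0.80, 0.75, 0.70],
]
weights, scores = first_principal_component(rows)
print(weights)
print(scores)
```

In this toy data all three variables rise together, so every weight comes out positive and the score ranks the units from least to most advantaged, which is what makes it usable as a single stratification or control variable.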
Muralidharan, K., Niehaus, P., & Sukhtankar, S. (2016). Building state capacity: Evidence from biometric smartcards in India. American Economic Review, 106(10), 2895-2929.
Brought to you by #NEUDC2018! Check out mini summaries of the many awesome papers featured at this conference here, and download papers here. These are three that really struck me.
1. Psychological trainings increase chlorination rates Haushofer, John, and Orkin 2018: (RCT in Kenya) “One group received a two-session executive function intervention that aimed to improve planning and execution of plans; a second received a two-session time preference intervention aimed at reducing present bias and impatience. A third group receives only information about the benefits of chlorination, and a pure control group received no intervention.” Executive function and time preference trainings led to stat sig increases in chlorination and stat sig decreases in diarrhea rates.
2. Conditional cash transfers reduce suicides! Christian, Hensel, and Roth 2018: (RCT in Indonesia) This paper is so cool! One mechanism is by mitigating the negative impact of bad agricultural shocks and decreasing depression. “We examine how income shocks affect the suicide rate in Indonesia. We use both a randomized conditional cash transfer experiment, and a difference-in-differences approach exploiting the cash transfer’s nation-wide roll-out. We find that the cash transfer reduced yearly suicides by 0.36 per 100,000 people, corresponding to an 18 percent decrease. Agricultural productivity shocks also causally affect suicide rates. Moreover, the cash transfer program reduces the causal impact of the agricultural productivity shocks, suggesting an important role for policy interventions. Finally, we provide evidence for a psychological mechanism by showing that agricultural productivity shocks affect depression.”
3. Women police stations increased reporting of crimes against women Amaral, Bhalotra, and Prakash 2018: (in India) “Using an identification strategy that exploits the staggered implementation of women police stations across cities and nationally representative data on various measures of crime and deterrence, we find that the opening of police stations increased reported crime against women by 22 percent. This is due to increases in reports of female kidnappings and domestic violence. In contrast, reports of gender specific mortality, self-reported intimate-partner violence and other non-gender specific crimes remain unchanged.”
1. 11 years later: Experimental evidence on scaling up education reforms in Kenya (TL;DR gov’t didn’t adopt well)
(This paper was published in Journal of Public Econ 11 years after the project started and 5 years after the first submission!) “New teachers offered a fixed-term contract by an international NGO significantly raised student test scores, while teachers offered identical contracts by the Kenyan government produced zero impact. Observable differences in teacher characteristics explain little of this gap. Instead, data suggests that bureaucratic and political opposition to the contract reform led to implementation delays and a differential interpretation of identical contract terms. Additionally, contract features that produced larger learning gains in both the NGO and government treatment arms were not adopted by the government outside of the experimental sample.”
Total causal effect (TCE) = weighted average of the intent to treat effect (ITT) and the spillover effect on the non-treated (SNT)
Importance: “RCTs that fail to account for spillovers can produce biased estimates of intention-to-treat effects, while finding meaningful treatment effects but failing to observe deleterious spillovers can lead to misconstrued policy conclusions. Therefore, reporting the TCE is as important as the ITT, if not more important in many cases: if the program caused a bunch of people to escape poverty while others to fall into it, leaving the overall poverty rate unchanged (TCE=0), you’d have to argue much harder to convince your audience that your program is a success because the ITT is large and positive.”
Context: Zeitlin and McIntosh’s recent paper comparing cash and a USAID health + nutrition program in Rwanda. From their blog post: “In our own work the point estimates on village-level impacts are consistent with negative spillovers of the large transfer on some outcomes (they are also consistent with Gikuriro’s village-level health and nutrition trainings having improved health knowledge in the overall population). Cash may look less good as one thinks of welfare impacts on a more broadly defined population. Donors weighing cash-vs-kind decisions will need to decide how much weight to put on non-targeted populations, and to consider the accumulated evidence on external consequences.”
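A small worked example of the TCE definition above. One assumption to flag: the summary only says "weighted average", so the sketch takes the weights to be the share of the population that is treated and the share that is not. The numbers are made up to mirror the poverty example in the quote.

```python
def total_causal_effect(itt, snt, share_treated):
    """Weighted average of the ITT and the spillover on the non-treated (SNT).

    Assumes the weights are the treated and non-treated population shares.
    """
    return share_treated * itt + (1 - share_treated) * snt


# Hypothetical effects on the probability of being out of poverty.
# Half the village is treated and gains 10 points; the other half loses 10.
print(total_causal_effect(itt=0.10, snt=-0.10, share_treated=0.5))  # 0.0

# A quarter treated with a large gain, the rest with a small negative spillover.
print(total_causal_effect(itt=0.20, snt=-0.04, share_treated=0.25))
```

In the first case the ITT is large and positive but the village-level effect is zero, which is exactly the situation the quote warns about.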
3. Why don’t people work less when you give them cash?
Excellent post on VoxDev by the authors of a new paper, listing many different mechanisms and looking at how this changes by type of transfer (e.g. gov’t conditional and unconditional, remittances, etc.)
BONUS: More gender equality = greater differences in preferences on values like altruism, patience or trust (ft. interesting map)
“Identifying causal effects involves assumptions, but it also requires a particular kind of belief about the work of scientists. Credible and valuable research requires that we believe that it is more important to do our work correctly than to try and achieve a certain outcome (e.g., confirmation bias, statistical significance, stars). The foundations of scientific knowledge are scientific methodologies. Science does not collect evidence in order to prove what we want to be true or what people want others to believe. That is a form of propaganda, not science. Rather, scientific methodologies are devices for forming a particular kind of belief. Scientific methodologies allow us to accept unexpected, and sometimes, undesirable answers. They are process oriented, not outcome oriented. And without these values, causal methodologies are also not credible.”
How good is good? 6.92/10. The YouGov visualization on how people rate different descriptors on a 0-10 scale is really interesting if you look at the distributions – lots of agreement on appalling, average (you’d hope there would be clustering around 5!), and perfect. Then, pretty wide variance for quite bad, pretty bad, somewhat bad, great, really good, and very good. Shows how you should cut out generic good/bad descriptions in your writing and use words like appalling or abysmal that are more universally evocative.
“Consider a simple policy rule: if a government’s statistics cannot be questioned, they shouldn’t be trusted. By that rule, the Bank and Fund would not report Tanzania’s numbers or accept them in determining creditworthiness—and they would immediately withdraw the offer of foreign aid to help Tanzania produce statistics its citizens cannot criticize.”
2. 12 Things We Can Agree On About Global Poverty?
In August, a CGDev post proposed 12 universally agreed-upon truths about global poverty. Do you agree? Are there other truths we should all agree on?
3. Food for thought on two relevant method issues
Peter Hull released a two-page brief on controlling for propensity scores instead of using them to match or weight observations
Spillover and estimands: “The key issue is that the assumption of no spillovers runs so deep that it is often invoked even prior to the definition of estimands. If you write the “average treatment effect” estimand using potential outcomes notation, as E(Y(1)−Y(0)), you are already assuming that a unit’s outcomes depend only on its own assignment to treatment and not on how other units are assigned to treatment. The definition of the estimand leaves no space to even describe spillovers.”
Service delivery will result in state legitimacy in war-affected contexts: Study in Afghanistan refutes this
Surgery is too expensive and not cost-effective in LMICs: Evidence that it is comparably cost effective to other public health interventions
2. Traditional local governance systems (autocratic) underutilize local human capital
A new paper by Katherine Casey, Rachel Glennerster, Ted Miguel, and Maarten Voors. “We experimentally evaluate two solutions to these problems [autocratic local rule by old, uneducated men] in rural Sierra Leone: an expensive long-term intervention to make local institutions more inclusive; and a low-cost test to rapidly identify skilled technocrats and delegate project management to them. In a real-world competition for local infrastructure grants, we find that technocratic selection dominates both the status quo of chiefly control and the institutional reform intervention, leading to an average gain of one standard deviation unit in competition outcomes. The results uncover a broader failure of traditional autocratic institutions to fully exploit the human capital present in their communities.”
3. Aggressive U.S. recruitment of nurses from Philippines did not result in brain drain / negative health impacts
A new paper by Paolo Abarcar and Caroline Theoharides. “For each new nurse that moved abroad, approximately two more individuals with nursing degrees graduated. The supply of nursing programs increased to accommodate this. New nurses appear to have switched from other degree types. Nurse migration had no impact on either infant or maternal mortality.”
BONUS. Data viz: Poverty persists in Africa, falls in other regions
Justin Sandefur shared that the Economist much improved a World Bank graphic to more clearly visualize how the number of people living in poverty has risen slightly in Africa while other regions have seen sharp decreases in # of people in poverty over time. (Wonder how the graphic would look if stacked Africa, South Asia, then East Asia & Pacific? Less dramatic contrast between Africa and the other regions? Number of poor in South Asia hasn’t decreased as dramatically as East Asia, would look more similar to Africa trend than East Asia trend until about 2010 I think.)