I’m pretty sure I just solved life

Disclaimer: I was a little drunk on power (calculations) when I wrote this, but it’s me figuring out that econometrics is something I might want to specialize in!

I think I just figured out what I want to do with the rest of my career.

I want to contribute to how people actually practice data analysis in the development sector from the technical side.

I want to write about study design and the technical issues that go into running a really good evaluation, and I want to produce open source resources to help people understand and implement the best technical practices.

This is always something that makes me really excited. I don’t think I have a natural/intuitive understanding of some of the technical work, but I really enjoy figuring it out.

And I love writing about/explaining technical topics when I feel like I really “get” a concept.

This is the part of my current job that I’m most in love with. Right now, for example, I’m working on a technical resource to help IDinsight do power calculations better. And I can’t wait to go to work tomorrow and get back into it.

I’ve also been into meta-analysis papers that bring multiple studies together. In general, the meta-practices, including ethical considerations, of development economics are what I want to spend my time working on.

I’ve had this thought before, but I haven’t really had a concept of making that my actual career until now. But I guess I’ve gotten enough context now that it seems plausible.

I definitely geek out the most about these technical questions, and I really admire people who are putting out resources so that other people can geek out and actually run better studies.

I can explore the topics I’m interested in, talk to people who are doing cool work, create practical tools, and link these things that excite me intellectually to having a positive impact in people’s lives.

My mind is already racing with cool things to do in this field. Ultimately, a website that is essentially an encyclopedia of development economics best practices would be so cool. A way to link all open source tools and datasets and papers, etc.

But top of my list for now is doing a good job on, and enjoying, this power calculations project at work. If it’s as much fun as it was today, I will be in job heaven.

Why you should convert categorical variables into multiple binary variables

Take the example of a variable reporting if someone is judged to be very poor, poor, moderately rich, or rich. This could be the outcome of a participatory wealth ranking (PWR) exercise like that used by Village Enterprise.

In a PWR exercise, local community leaders can identify households that are most vulnerable. These rankings can then be used to target a development program (like VE’s graduation-out-of-poverty program that combines cash transfers with business training) to the community members that are most in need.

Let’s say that you want to include the PWR results in a regression analysis as a covariate. You have a dataset of all the relevant variables for each household, including a variable that records whether the household was ranked in the PWR exercise as very poor, poor, moderately rich, or rich.

You need to convert this string variable (text) into a numeric value. You could assign each option a value from 1 to 4, with 1 being “very poor” and 4 meaning “rich” … but you shouldn’t use this directly in your regression.

If you have a variable that moves from 1 to 2 to 3 to 4, you’re implying that there is a linear pattern between each of those values. You’re saying that the effect on your outcome variable of going from being very poor (1) to poor (2) is the same as the effect of going from poor (2) to moderately rich (3). But you don’t know what the real relationship is between the different PWR levels, since the data isn’t that granular. You can’t make the linear assumption.

So instead, you should split the ranking into four different binary variables: Ranked “very poor” or not? “Poor” or not? “Moderately rich” or not? “Rich” or not?
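To make the idea concrete, here is a minimal sketch in Python (not Stata) of turning a text ranking into one 0/1 variable per level. The six household rankings and the helper name make_dummies are made up for illustration.

```python
# Hypothetical PWR rankings for six households (made-up data for illustration).
LEVELS = ["very poor", "poor", "moderately rich", "rich"]
pwr = ["poor", "very poor", "rich", "moderately rich", "poor", "very poor"]


def make_dummies(values, levels):
    """Return one 0/1 list per level: 1 if the household has that ranking."""
    return {level: [1 if v == level else 0 for v in values] for level in levels}


dummies = make_dummies(pwr, LEVELS)

# Print one row per level, one column per household.
for level in LEVELS:
    print(f"{level:>16}: {dummies[level]}")
```

Each household gets a 1 in exactly one of the four variables, so nothing in the coding says how far apart the levels are.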

This Stata support page does a great job of summarizing how to apply this in your regression code, or how to create binary variables from a categorical variable using easy shortcuts. I like:

reg y x i.pwr

But how do you interpret the results?

When you create dummies (binary variables) out of a categorical variable, you use one of the group dummies as the reference group and don’t actually include it in the regression.

By default, the reference group is usually the group with the lowest value. In this case, that means “very poor.” So in the regression, you’ll have three dummies, not four. Being “very poor” is the base condition against which to compare the other rankings.

Let’s say there is a statistically significant, positive coefficient on the “moderately rich” dummy in your regression results. That means that, compared to the base condition of being very poor, being moderately rich is associated with a higher value of your outcome variable, holding the other covariates constant.
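Here is a small Python sketch (not Stata) of that interpretation, under some loud assumptions: the eight outcome values are invented, there is no other covariate x, and the ols and solve helpers are hand-rolled stand-ins for a real regression command. With only an intercept and the three non-base dummies, the intercept should equal the mean outcome of the base group, and each coefficient should equal that group's mean minus the base group's mean.

```python
LEVELS = ["very poor", "poor", "moderately rich", "rich"]
BASE = LEVELS[0]  # reference group: the lowest level, left out of the regression

# Made-up (ranking, outcome) pairs for eight households.
data = [
    ("very poor", 10.0), ("very poor", 12.0),
    ("poor", 13.0), ("poor", 15.0),
    ("moderately rich", 20.0), ("moderately rich", 22.0),
    ("rich", 24.0), ("rich", 26.0),
]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Design matrix: an intercept plus one dummy for each non-base level.
non_base = [lvl for lvl in LEVELS if lvl != BASE]
X = [[1.0] + [1.0 if rank == lvl else 0.0 for lvl in non_base] for rank, _ in data]
y = [outcome for _, outcome in data]

coefs = ols(X, y)
for name, value in zip(["intercept (" + BASE + ")"] + non_base, coefs):
    print(f"{name:>24}: {value:.2f}")
```

In this invented data the group means are 11, 14, 21, and 25, so the steps between neighboring levels are 3, 7, and 4. A single 1-to-4 variable would have forced those steps to be equal.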

Meditation Pact

My best friend Riley and I agreed to meditate every day for the next 10 days.

I got up early this morning for the first day – it’s starting to be winter in Nairobi so I was snug in warm clothes when I did my 10 minutes on the balcony.

The part I love most about the Headspace meditation is when you let go of all thoughts and let your brain wander, then center back into your body and physical sensations. Always makes me feel light but grounded.

3ie: Improve power calculations with a pilot

3ie wrote on June 11 about why you may need a pilot study to improve power calculations:

  1. Low uptake: “Pilot studies help to validate the expected uptake of interventions, and thus enable correct calculation of sample size while demonstrating the viability of the proposed intervention.”
  2. Overly optimistic MDEs: “By groundtruthing the expected effectiveness of an intervention, researchers can both recalculate their sample size requirements and confirm with policymakers the intervention’s potential impact.” It’s also important to know if the MDE is practically meaningful in context.
  3. Underestimated ICCs: “Underestimating one’s ICC may lead to underpowered research, as high ICCs require larger sample sizes to account for the similarity of the research sample clusters.”
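To see why each of the three points above matters for sample size, here is a rough Python sketch using textbook approximations. It is my own illustration, not 3ie's method. The assumptions: two equal-sized arms, a normal approximation for comparing means, equal cluster sizes, the standard design effect 1 + (m - 1) * ICC, and low uptake simply scaling the detectable effect down (no effect on households that don't take up). All function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(mde, sd=1.0, alpha=0.05, power=0.80):
    """Individuals per arm for a two-arm comparison of means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mde ** 2


def design_effect(cluster_size, icc):
    """Inflation factor from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def clustered_n_per_arm(mde, cluster_size, icc, uptake=1.0, **kwargs):
    """Individuals per arm after diluting the MDE by uptake and inflating for ICC."""
    diluted_mde = mde * uptake  # assumption: only those who take up are affected
    return math.ceil(n_per_arm(diluted_mde, **kwargs) * design_effect(cluster_size, icc))


# Hypothetical planning values: MDE of 0.2 SD, 20 households per cluster.
planned = clustered_n_per_arm(mde=0.2, cluster_size=20, icc=0.05)
higher_icc = clustered_n_per_arm(mde=0.2, cluster_size=20, icc=0.10)
half_uptake = clustered_n_per_arm(mde=0.2, cluster_size=20, icc=0.05, uptake=0.5)

print("planned:", planned)
print("ICC turns out to be 0.10, not 0.05:", higher_icc)
print("uptake turns out to be 50%:", half_uptake)
```

Under these assumptions, halving uptake roughly quadruples the required sample, because the sample size scales with one over the squared effect size. Doubling the ICC raises it by the ratio of the two design effects.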

The piece has many strengths, including that 3ie calls out one of their own failures on each point. They also share the practical and cost implications of these mistakes.

At work, I might be helping develop an ICC database, so I got a kick out of the authors’ own call for such a tool…

“Of all of the evaluation design problems, an incomplete understanding of ICCs may be the most frustrating. This is a problem that does not have to persist. Instead of relying on assumed ICCs or ICCs for effects that are only tangentially related to the outcomes of interest for the proposed study, current impact evaluation researchers could simply report the ICCs from their research. The more documented ICCs in the literature, the less researchers would need to rely on assumptions or mismatched estimates, and the less likelihood of discovering a study is underpowered because of insufficient sample size.”

…although, if ICCs are rarely reported, I may have my work cut out for me!