Data Analytics

### Designing Experiments Homework #4

Individual Questions

Thornton estimates the CACE (LATE) using two-stage least squares regression, which allows her to take advantage of the fact that she randomized (i) whether people received an incentive, (ii) the incentive amount, and (iii) the distance to the voluntary counseling and testing (VCT) centers where they could obtain their test results. These randomized variables are forms of encouragement (instrumental variables) that affect exposure to the treatment, i.e., an individual obtaining a test result. So when Thornton says she’s estimating a CACE (LATE), she is correct, but what she does not make clear is that her CACE is a weighted mix of different Complier groups. For example, the people who are Compliers at high incentive values may be Never Takers at low incentive values. Or, even more complicated, Compliers at high incentive values and close clinic distances may be Never Takers at high incentive values and far clinic distances.

The advantages of randomizing multiple forms of encouragement are (1) you can estimate the ATEs of each form of encouragement (often, getting people to take up the treatment is an important objective in itself; we want to learn, for example, if paying people to get their test results is more cost-effective than bringing the VCT center closer to them); (2) you make it more likely that you will generate an increase in the probability of exposure (i.e., avoid the “weak” instrumental variable problem that we talked about in class, which can lead to bias in your CACE estimator); and, related to (2), (3) you make it more likely that your Complier subgroup will be a larger fraction of your overall sample. The disadvantage of this approach is that it’s harder to interpret the causal effect: we know it’s an ATE for “the Complier” subgroup, but it’s hard to communicate who comprises this subgroup (i.e., how they differ from the rest of the population in the experiment).
In this homework, we’ll just use the binary versions of the incentives (any = 1 if the individual received any positive incentive) and the distance of the VCT (under = 1 if distance of the VCT is under 1.5 km).

1.a. Estimate the CACE of getting the HIV test result (D=got) on condom purchases (Y=numcond) using the randomization of incentives (Z=any). Just estimate the CACE; you do not have to estimate its standard error. [Hints: Section 6.2 of your book explains how to estimate the CACE, as did the lecture. You do not need regression analysis to answer this question. You need to calculate 4 numbers from the data and use those numbers to calculate the ratio that yields the CACE estimate.]

[Note: If you are using Excel to do your calculations, I recommend you save the file under a new name and drop all the individuals who are missing numcond data (missing values in that cell). Just sort the data by numcond and delete all rows that have missing numcond values.]
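The ratio described in the hint for Q1.a can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `wald_cace` and the tiny data set are hypothetical (they are not Thornton's data), and rows with missing outcomes are assumed to have been dropped already, as the note above recommends.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def wald_cace(y, d, z):
    """CACE estimate as ITT_Y / ITT_D.

    y: outcomes (e.g. numcond), d: treatment received (e.g. got),
    z: random assignment (e.g. any or under); all parallel lists.
    """
    # The four numbers: mean Y and mean D in each assignment group.
    y_z1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y_z0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    d_z1 = mean([di for di, zi in zip(d, z) if zi == 1])
    d_z0 = mean([di for di, zi in zip(d, z) if zi == 0])

    itt_y = y_z1 - y_z0  # effect of assignment on the outcome
    itt_d = d_z1 - d_z0  # effect of assignment on treatment receipt
    return itt_y / itt_d


# Hypothetical toy data: eight individuals, four per assignment group.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [2, 4, 0, 2, 2, 0, 0, 2]

print(wald_cace(y, d, z))  # ITT_Y = 1.0, ITT_D = 0.5, so the ratio is 2.0
```

The same function applies unchanged to Q2.a by passing the distance indicator as `z` instead of the incentive indicator.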

1.b. Describe in words what this CACE represents. Pretend you are explaining to your boss, who knows nothing about two-sided noncompliance, what the number means and for what population it’s relevant.

2.a. Estimate the CACE of getting the HIV test result (D=got) on condom purchases (Y=numcond) using the randomization of VCT center distance (Z=under). Just estimate the CACE; you do not have to estimate its standard error. [Hint: Read the hint for Q1]

2.b. Describe in words what this CACE represents. Pretend you are explaining to your boss, who knows nothing about two-sided noncompliance, what the number means and for what population it’s relevant.

2.c. The CACE estimated using under as Z is different from the CACE estimated using any as Z. What are potential reasons for this difference?

3.a. (i) In words, state what the monotonicity assumption means in the context of randomizing the location of VCT centers where the experimental participants could get their HIV test results. (ii) What do you think about the plausibility of this assumption? There is no single correct answer. Good answers show some thought and an understanding of monotonicity.

3.b. (i) In words, state what the exclusion restriction (the assumption of excludability) means in the context of randomizing the location of VCT centers in order to estimate the effect of getting test results on risky behaviors. (ii) What do you think about the plausibility of this assumption? There is no single correct answer. Good answers show some thought and an understanding of excludability.

4. Make sure you’ve read Section 7.4 in your textbook, which discusses bounding average treatment effects when we have attrition (the topic of the chapter assigned for this week). In Thornton’s study, only 1524 of the 2812 individuals were located at follow-up and asked how many condoms they wished to purchase. Thus, we have attrition in this experiment. Thornton argues that the attrition is random (or random conditional on covariates) and thus is not a threat to the internal validity of the experimental design (i.e., not a source of bias in her estimator of the CACE). Let’s imagine we don’t believe her argument and want to put bounds on the estimated CACE in Q1a. Using the minimum number of condoms purchased (0) and the maximum (18) as the extreme values for the range of plausible potential outcome values, calculate lower and upper bounds on the CACE. To make this question easier to answer, you should make the assumption your textbook makes and assume everyone responds the same to the treatment (homogeneous treatment effects). In other words, Compliers, Never Takers and Always Takers all respond the same to the treatment (got). With that assumption, we can then assign best- and worst-case scenarios for those people with missing Y values (“attriters”).*

Hint: First, assume the “best-case” scenario for the attriters in terms of the largest possible positive treatment effect. In other words, using the max (18) and min (0) numcond values, fill in the missing numcond values for treated (got=1) and untreated (got=0) attriters such that you will get the largest positive CACE possible. Then assume the “worst-case” scenario for the attriters in terms of the largest negative CACE.

* Why are we making this assumption? Remember, we cannot identify the Never Takers in the no-incentive group: they are mixed in with Compliers to form the group of people who do not obtain their test results. And we cannot identify the Always Takers in the incentive group: they are mixed in with Compliers to form the group of people who obtain their test results. So, to do the bounding analysis, we must assume all participants respond the same to the treatment.
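The fill-in procedure from the hint can be sketched in Python as follows. This is a rough illustration, not a worked answer: the function names and the toy data are hypothetical, missing outcomes are represented as `None`, and the sketch assumes that treatment receipt (got) is recorded for attriters even though their outcome is not. Following the hint, missing values are imputed by treatment status and the CACE ratio is then recomputed.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def wald_cace(y, d, z):
    """CACE estimate as ITT_Y / ITT_D from parallel lists."""
    itt_y = (mean([yi for yi, zi in zip(y, z) if zi == 1])
             - mean([yi for yi, zi in zip(y, z) if zi == 0]))
    itt_d = (mean([di for di, zi in zip(d, z) if zi == 1])
             - mean([di for di, zi in zip(d, z) if zi == 0]))
    return itt_y / itt_d


def impute(y, d, treated_fill, untreated_fill):
    """Replace missing outcomes (None) with a value chosen by treatment status."""
    return [
        (treated_fill if di == 1 else untreated_fill) if yi is None else yi
        for yi, di in zip(y, d)
    ]


def cace_bounds(y, d, z, y_min=0, y_max=18):
    """Lower and upper CACE under the worst- and best-case fills from the hint."""
    # Best case: treated attriters get the maximum, untreated get the minimum.
    upper = wald_cace(impute(y, d, y_max, y_min), d, z)
    # Worst case: treated attriters get the minimum, untreated get the maximum.
    lower = wald_cace(impute(y, d, y_min, y_max), d, z)
    return lower, upper


# Hypothetical toy data with two attriters (outcome recorded as None).
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [2, None, 0, 2, 2, 0, None, 2]

lower, upper = cace_bounds(y, d, z)
print(lower, upper)  # -9.0 9.0 for this toy data
```

With only two of eight outcomes missing, the toy bounds already span a wide range, which illustrates how quickly extreme-value bounds widen as attrition grows.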