Despite advances in our understanding of quasi-experimental methods, there will likely remain demand to evaluate programs using observational methods like regression and matching. To evaluate the observational bias in these methods, we collected data from a large number of RCTs with imperfect compliance (ICRCTs) conducted over the last 20 years. We construct comparable observational and experimental estimates of treatment effects and use these to estimate the bias in each study. We then use meta-analysis to quantify the average direction of bias and the uncertainty about its size. We find little evidence of average bias but large uncertainty. We suggest adjusting standard confidence intervals to take this uncertainty into account. Our preferred estimates imply that a hypothetical infinite-N observational study has an effective standard error of over 0.16 standard deviations and hence a minimum detectable effect of more than 0.3 standard deviations. We conclude that, given current evidence, observational studies cannot provide information about the impact of many programs that in truth have important, policy-relevant effects, but that collecting data from more ICRCTs may help to reduce uncertainty and increase the effective power of observational program evaluation.
Written with David Rhys Bernard (Paris School of Economics), Sylvain Chabé-Ferret (Toulouse School of Economics), Jon de Quidt (Stockholm University), Jasmin Claire Fliegner (University of Manchester), and Roland Rathelot (Institut Polytechnique de Paris, ENSAE).
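To make the headline numbers concrete, here is a minimal sketch of how a bias-uncertainty adjustment of this kind could work. The paper's exact adjustment is not spelled out in the abstract; the sketch assumes, hypothetically, that the bias uncertainty enters as an independent variance component added in quadrature to the sampling standard error, with the 0.16 standard deviation figure taken from the abstract's preferred estimates.

```python
import math

def adjusted_ci(estimate, sampling_se, bias_sd, z=1.96):
    """95% confidence interval inflated for bias uncertainty.

    Assumption (not from the paper): bias uncertainty is an independent
    variance component, so the effective standard error is
    sqrt(sampling_se**2 + bias_sd**2).
    """
    se_eff = math.sqrt(sampling_se**2 + bias_sd**2)
    return estimate - z * se_eff, estimate + z * se_eff

# Infinite-N limit: sampling_se -> 0, so the effective standard error
# collapses to bias_sd = 0.16 SD. The smallest effect distinguishable
# from zero at the 5% level is then z * 0.16 ≈ 0.31 SD, consistent with
# the "minimum detectable effect of more than 0.3" in the abstract.
lo, hi = adjusted_ci(estimate=0.0, sampling_se=0.0, bias_sd=0.16)
print(f"95% CI around a zero estimate: ({lo:.2f}, {hi:.2f})")  # (-0.31, 0.31)
```

Under this assumed form, no amount of additional observational data shrinks the interval below roughly ±0.31 standard deviations; only reducing the bias uncertainty itself, for example via more ICRCTs, tightens it.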