Do counties that release more have higher crime rates (or more new filings)? (correlative)
- Get crime rate and release rate at the county level
- Regress! Use a method that accounts for the measurement error in the rate data (see notes below).
- Assess result for causal version (is the correlation big enough that it's worth refining?)
Bullet 2 is tricky. First, you need to estimate the errors in the rate calculation. This page is a great resource for estimators: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval. The normal approximation is fine for N >= 30. Below that, you need to worry a little about non-Gaussianity, but don't go too crazy: use the easiest estimator to implement that you can justify.
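As a rough sketch of the error-estimation step, the function below returns a rate and an estimate of its variance. It uses the normal-approximation (Wald) variance p(1-p)/N when N >= 30 and falls back to the Agresti-Coull adjusted variance for smaller N. The name `rate_variance`, its arguments, and the choice of Agresti-Coull as the "easiest justifiable" small-N estimator are assumptions for illustration, not something this ticket prescribes.

```python
import math


def rate_variance(successes, trials, z=1.96, small_n=30):
    """Return (rate, variance estimate) for a binomial proportion.

    Hypothetical helper: N >= small_n uses the Wald variance p(1-p)/N;
    smaller N uses the Agresti-Coull adjusted proportion and sample size.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    rate = successes / trials
    if trials >= small_n:
        # Normal approximation. Note this is exactly 0 when rate is 0 or 1,
        # which would give an infinite 1/variance weight downstream.
        variance = rate * (1.0 - rate) / trials
    else:
        # Agresti-Coull: add z^2 pseudo-observations, half of them successes.
        n_adj = trials + z ** 2
        p_adj = (successes + z ** 2 / 2.0) / n_adj
        variance = p_adj * (1.0 - p_adj) / n_adj
    return rate, variance


def standard_error(successes, trials):
    """Convenience wrapper: square root of the variance estimate."""
    return math.sqrt(rate_variance(successes, trials)[1])
```

The small-N branch keeps the raw rate as the point estimate and only adjusts the variance, so a county with 0 events out of 10 still gets a finite weight. If that mix is hard to justify for your data, using the adjusted proportion as the point estimate too is the other obvious choice.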
Try WLS in statsmodels (https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.WLS.html) as a first pass, using 1/variance as your weights. See the next ticket for refinement!