Regress to Impress

Fear, loathing, Data Science


Data Science

Time Keeps on Slipping: Exploiting Time for Causal Inference with Difference-in-Differences and Panel Methods

Note: This post assumes a passing familiarity with linear regression. Aside from that, it's a highly applied intro to D-in-D regression and panel data techniques. In Due Time: In one of my favorite episodes of Futurama, the universe experiences "time skips."...

A Dramatic Tour through Python’s Data Visualization Landscape (including ggpy and Altair)

Why Even Try, Man? I recently came upon Brian Granger and Jake VanderPlas's Altair, a promising young visualization library. Altair seems well-suited to addressing Python's ggplot envy, and its tie-in with JavaScript's Vega-Lite grammar means that as the latter develops new...

Analyze Your Experiment with a Multilevel Logistic Regression using PyMC3

Note: In this post, I assume some familiarity with PyMC. If you need to get up to speed in a hurry and you're familiar with linear regression, go here for a tutorial. Alternatively, you can read for the methodological intuition,...

Using Data to Hold Crappy Businesses Accountable (Airline Edition)

Contextualizing My Vendetta: I've been on a streak of bad flights lately. The last two, in particular, were horrible -- and not horrible in the standard "cramped seats/rubbery food/my-God-that-smell" way. Horrible due to (A) an unexplained cancellation, which turned my 12-hour...

Clustering the 25 Best Songs I’ve Heard in 2014 (So Far)

And now for something completely different. This is (nominally) a Data Science blog, but I do have other interests. One of those other interests is music, and now that I have a platform for forcing my opinion onto others, I...

Bayesian Regression with PyMC: A Brief Tutorial

Warning: This is a love story between a man and his Python module. As I mentioned previously, one of the most powerful concepts I've really learned at Zipfian has been Bayesian inference using PyMC. PyMC is currently my favorite library...

Matplotlib, SciPy, NumPy, and pandas: Coming Together in Perfect Harmony

The Point of this Post: To Document an Example. In this update, we'll cover reading data into a pandas DataFrame, Seaborn, creating multi-plot figures with matplotlib.pyplot.subplots(), LaTeX labeling, and parameterizing Gamma distributions using SciPy. I've been sitting on this example...
