Using Data to Hold Crappy Businesses Accountable (Airline Edition)

Contextualizing My Vendetta

I’ve been on a streak of bad flights lately. The last two, in particular, were horrible — and not horrible in the standard “cramped seats/rubbery food/my-God-that-smell” way. Horrible due to (A) an unexplained cancellation, which turned my 12-hour nonstop flight into a 20-hour two-stop ordeal; and (B) an “airplane reconfiguration,” which I learned is an Airline-ism for “Oh crap. I guess we crappily overestimated the demand for this flight. Let’s swap our underbooked big plane for an overbooked small plane and pray nobody realizes.”

And they almost got away with it! Except I did realize — not because of any unique insight, but because the seat I paid for disappeared.

Both times, I imagined how the world might look if companies in other industries could comparably punk you. If, say, American car companies could sell you inferior, unsafe products that we subsidize through tax breaks. If film studios could arbitrarily split your movie-going experience into two unnecessary halves. If concert venues could replace the show you’d been looking forward to for weeks with Fredpyhus!, which is just an aging, world-weary Fred Durst doing a four-hour spoken word version of “Rollin (Air Raid Vehicle)” as he crouches naked in front of a Britney Spears mannequin.

Oh hey! I guess we’re ~75% of the way to our first-world service dystopia. In the airline context, at least, the incompetence may have been cute back when their economics made them a kind of airborne Sears. But they’re Sears no more. Like the rest of the American economy, they’ve consolidated, automated, self-regulated, and told labor to seek returns elsewhere, thankyouverymuch. They’re profitable. Really profitable! And yet the experience remains a train wreck (to use the contrarian metaphor).*

And look: I’ll grant you… the above has fallen squarely in the Grand Internet Tradition of “outrage sans solution,” but I’m going to propose one: let’s hold crappy airlines accountable, and let’s do that through data… Indeed, I was all set to rant about the specific screwball airline in my intro paragraph, but then I thought: hey, I’ve convinced people I know stuff about data (which is to say, I’m a Data Scientist), so I should see if I caught that airline on a couple of off days, or if they really are screwballs; in either case, I’ll hopefully do some good.

After all, as George Akerlof and Robert Shiller argue in their book Phishing for Phools, it’s actually the natural state of a competitive market for consumers to get punked by unsavory business practices (sorry, Econ 101 Professor). And since it doesn’t seem like we’re going to get governmental help here anytime soon, we need to protect ourselves.

(I should pause to note: if you’re already sick of me and want to get straight to the code, please have at it!)

The (Preliminary) Analysis: Screw You, JetBlue

We consider all flights that occurred between January 2015 and February 2016 (the last month of data available from the Department of Transportation). To ensure we’re making meaningful comparisons, we limit the flights in our dataset to (A) “major” airlines (defined by yours truly); and (B) “busy” routes, which I’ve defined as routes serviced by three or more airlines. (This also makes the analysis more actionable; if your route doesn’t have many options, then you’re out of luck anyway.) Naively, we can then ask, “What proportion of each airline’s flights were delayed by 15 or more minutes?”** (This is a stat the DoT tracks.)
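For anyone who wants to see the mechanics of that filter-then-count step, here is a minimal sketch in plain Python. It is not the code behind the plots: the record layout (the keys carrier, origin, dest, delay_min) and the function names are assumptions made up for illustration, the rows are invented, and the “major airline” filter is skipped for brevity.

```python
from collections import defaultdict

DELAY_THRESHOLD_MIN = 15      # the "15 or more minutes" cutoff from the text
MIN_CARRIERS_PER_ROUTE = 3    # "busy" route = served by three or more airlines


def busy_routes(flights, min_carriers=MIN_CARRIERS_PER_ROUTE):
    """Return the set of (origin, dest) pairs flown by at least min_carriers airlines."""
    carriers_on_route = defaultdict(set)
    for f in flights:
        carriers_on_route[(f["origin"], f["dest"])].add(f["carrier"])
    return {route for route, c in carriers_on_route.items() if len(c) >= min_carriers}


def delay_rates(flights, threshold=DELAY_THRESHOLD_MIN):
    """Share of each carrier's busy-route flights delayed by at least threshold minutes."""
    keep = busy_routes(flights)
    total = defaultdict(int)
    late = defaultdict(int)
    for f in flights:
        if (f["origin"], f["dest"]) not in keep:
            continue  # route has too few carriers to compare
        total[f["carrier"]] += 1
        if f["delay_min"] >= threshold:
            late[f["carrier"]] += 1
    return {carrier: late[carrier] / total[carrier] for carrier in total}


# Invented rows, purely to exercise the functions (hypothetical, not DoT data).
sample = [
    {"carrier": "AA", "origin": "SFO", "dest": "LAX", "delay_min": 20},
    {"carrier": "AA", "origin": "SFO", "dest": "LAX", "delay_min": 0},
    {"carrier": "DL", "origin": "SFO", "dest": "LAX", "delay_min": 5},
    {"carrier": "B6", "origin": "SFO", "dest": "LAX", "delay_min": 45},
    {"carrier": "B6", "origin": "JFK", "dest": "BOS", "delay_min": 90},  # one-carrier route, dropped
]
rates = delay_rates(sample)
print(rates)
```

Note that the route filter runs before the counting, so a carrier’s flights on thinly served routes never enter its rate at all.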

We learn that along this dimension, airlines fall into three tiers:

There’s American and Delta, with delay rates around 17% (which I guess we’ll have to consider “good”)
There’s Virgin, United, and Southwest, who make up the ambiguous middle
There’s JetBlue, boasting a delay rate of 25% (!)

Visualizing delay rates by month allows us to see that these tiers don’t seem totally attributable to momentary rough patches, but are persistent features of how these airlines do business (note that Delta consistently outperforms, while JetBlue consistently underperforms):

The (Final) Analysis: Screw You, Southwest

And although you can tell a rather convincing tale with the last two graphs, you should not be convinced… yet! Indeed, I’ll admit: I had something of an ulterior motive for writing this post. Namely, data-driven journalism is v. trendy, and has all the trappings of objectivity; however, data can tell very different tales depending on the level of rigor you apply. In particular, in the graphs above, I pooled all the data — which is to say, I completely ignored the difficulty of the routes each airline usually flies. As anyone from San Francisco knows, not all airports are created equal: some can fog you in for hours; others are conducive to quick getaways. To show you how important this can be, consider the delay rate for the most heavily-trafficked route in our dataset, SFO to LAX:

So if you’re a Baydestrian trying to get to LA, Delta is the JetBlue, and the other airlines are basically the same.
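If the pooled-versus-route distinction feels abstract, here is a toy illustration in Python. The counts are invented so the arithmetic is easy to follow (they are not from the DoT data), and the carrier and route names are placeholders. The sketch shows how a carrier can look worse overall while being better on every individual route, simply because it flies the hard route more often.

```python
# Invented counts: (flights, delayed) per carrier per route. Hypothetical, not real data.
counts = {
    "Carrier A": {"easy route": (900, 90), "foggy route": (100, 40)},
    "Carrier B": {"easy route": (100, 5), "foggy route": (900, 270)},
}


def pooled_rate(by_route):
    """Delay rate with all of a carrier's routes lumped together."""
    flights = sum(n for n, _ in by_route.values())
    delayed = sum(d for _, d in by_route.values())
    return delayed / flights


def route_rates(by_route):
    """Delay rate computed separately for each route."""
    return {route: d / n for route, (n, d) in by_route.items()}


for carrier, by_route in counts.items():
    print(carrier, "pooled:", round(pooled_rate(by_route), 3))
    for route, rate in route_rates(by_route).items():
        print("   ", route, round(rate, 3))
```

Pooled, Carrier A comes out ahead; route by route, Carrier B wins both times. Which carrier is “better” depends entirely on whether you are asking about the airline in general or about the flight you are actually about to book.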

The reason I’m harping on this point — other than to be a statistical pedant, of course — is to point out that when you’re reading data-oriented journalism online, you should be wary. Because of the way the Internet works (“click, baby, click!” – Sarah Palin, let’s say), people need to pump out content, but pumping out real analysis is hard. It takes work, which means many analyses end up being first-order. This can be fine, but sometimes, you have to dig deeper. Here’s a case where you do***.

(After all, even if I hate the whole airline industry right now, it deserves a fair shake.)

Alright, then, smart guy, how do we proceed?

Well, let’s do some second-order thinking. We have a ton of data, so we can control for all the other important variables****: the distance of the flight, the day of the week of the flight, the origin, and the destination. By controlling for all of this, we can estimate the “true” delay rate for each of these airlines (i.e., the delay rate you’d get if they all flew to and from the same airports on the same days at the same distances). So what happens when we do? This…
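To make “controlling for” a little more concrete, here is a sketch of one simple way to get an apples-to-apples rate. To be loud about it: this is direct standardization, a simpler stand-in I am swapping in for illustration, and not the logistic regression described in the footnote. The idea is to compute each carrier’s delay rate within each stratum (here a made-up origin, destination, and weekday combination), then average those rates using the same stratum weights for every carrier. All names and counts below are hypothetical, and a continuous variable like distance would need to be bucketed before it could serve as a stratum.

```python
# Invented counts: (flights, delayed) per carrier per stratum. Hypothetical, not real data.
# A stratum here is (origin, destination, day of week).
counts = {
    "Carrier A": {("SFO", "LAX", "Mon"): (400, 100), ("JFK", "BOS", "Mon"): (100, 10)},
    "Carrier B": {("SFO", "LAX", "Mon"): (100, 20), ("JFK", "BOS", "Mon"): (400, 20)},
}


def raw_rates(counts):
    """Pooled delay rate per carrier, ignoring which strata it flies."""
    out = {}
    for carrier, strata in counts.items():
        flights = sum(n for n, _ in strata.values())
        delayed = sum(d for _, d in strata.values())
        out[carrier] = delayed / flights
    return out


def standardized_rates(counts):
    """Direct standardization: weight each carrier's per-stratum delay rate
    by that stratum's share of ALL flights, every carrier combined."""
    stratum_totals = {}
    for strata in counts.values():
        for key, (n, _) in strata.items():
            stratum_totals[key] = stratum_totals.get(key, 0) + n
    all_flights = sum(stratum_totals.values())
    weights = {key: n / all_flights for key, n in stratum_totals.items()}

    adjusted = {}
    for carrier, strata in counts.items():
        # Assumes every carrier has flights in every stratum; a gap raises KeyError.
        adjusted[carrier] = sum(
            weights[key] * strata[key][1] / strata[key][0] for key in weights
        )
    return adjusted


print("raw:     ", raw_rates(counts))
print("adjusted:", standardized_rates(counts))
```

In this toy case the gap between the two carriers narrows once both are scored on the same mix of flights. A regression does the same job more gracefully when there are too many strata, or too few flights per stratum, to tabulate by hand.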

There’s been some movement! JetBlue is still crap, but you have another good option, as Virgin moves from being a middle-tier carrier to the best carrier, with American, Delta, and United all decent-to-middling options. These improvements, of course, make Southwest look relatively worse.

Having successfully isolated the crappy airlines from the rest of the pack (along this single dimension), let’s factor in cancellation rates as another dimension (again controlling for all of the above) so that we can give a more nuanced recommendation:

And now, some genuinely concrete tiers have emerged: Southwest is even crappier than we previously realized, with cancellation rates two times higher than the crappy cancellation middle tier of United and American. We also learn that perhaps one reason JetBlue delays so many flights is so that it can avoid canceling them, so JetBlue should probably get some credit. Putting it all together, then, the takeaways become:

Take Virgin
Then Take Delta
Then Pray…
…Especially On Southwest

In Closing

So what was the airline that royally screwed me twice in a row? Well, let’s just say I’ll be booking with them on my next trip… just not from SFO to LAX.

*And I didn’t even touch on the security that emphasizes showmanship over results!

**All plots made in seaborn

***Note: I am not talking about FiveThirtyEight, which I don’t read religiously but which seems to do awesome/rigorous election/sports work (with some clickbait thrown in to make ends meet)

****Concretely, I fit a logistic regression of the form “Delay ~ Airline + Origin Airport + Destination Airport + Distance Flown + Day of Week”, and use the “effects” package in R to translate the “Airline” coefficients into something more grok-able*****

*****I should note: I am always and forever about the reproducibility of analysis; you can take a look here for my hopefully human-readable code.

Dan Saber