Death by Numbers: The Perils of Big Data

Cathy O’Neil’s Weapons of Math Destruction gives a provocative account of all the ways the big data push is harming us. O’Neil, a mathematician and blogger, describes how some of today’s most popular and widely used data metrics, including college rankings, e-scores, FICO scores, teacher accountability rankings, investment ratings, and insurance metrics, are making life considerably more difficult for numerous groups of people, particularly racial and ethnic minorities and the underprivileged. She calls these tools “weapons of math destruction”, or WMDs.

O’Neil describes WMDs as mathematical models that have three characteristics. WMDs are opaque: the factors that go into computing them and how they are combined are known only by the authors. They scale phenomenally well, as they depend on data that is readily available for practically everyone, and they combine the data together to yield an easy-to-communicate, dashboard-friendly number that is quick to interpret. Finally, they cause immense damage, because no mathematical model is complete or perfect, and the people harmed by the holes in the algorithm don’t get factored back into it to make it more complete. So, the holes remain, harming even more people in the same way.

The author aptly describes models as opinions expressed in math. A model combines pieces of data together in ways that reflect the biases and concerns of the person who developed it. The model will include only the data items its creator feels are relevant. Sometimes, the data a model employs doesn’t directly assess what the model needs to capture, so the model then uses proxies, data items that indirectly speak to the issue the model is trying to capture. For example, it is obviously illegal for an employer or a landlord to use race to screen candidates. If he uses a computer algorithm to screen by zip code or last name, however, he has achieved practically the same effect. Proxies can insidiously reinforce racial prejudices and discrimination, and they can worsen the socioeconomic conditions of those who are already challenged by them.

And yet, WMDs tend to go unquestioned. When exceptions to the models arise and people harmed by them are identified, they tend to be treated as collateral damage, and the models themselves aren’t revised. There is no feedback loop to ensure that the model learns from the teacher who was fired because his instructional effectiveness rating, calculated from his students’ test scores, failed to account for the fact that his students’ previous school gamed the numbers, or that all of his students come to school hungry because their families are poor. The models are, literally and figuratively, heartless. They encode only the data their creators included and ignore everything else.

O’Neil gives a particularly vivid account of how the US News and World Report’s annual college ranking has helped balloon higher education costs and limited educational options for students. Started in 1983 as a way for a second-tier competitor of Time and Newsweek to boost readership, the college ranking US News created was originally based entirely on academic reputation as rated by college presidents. Academic reputation is rather self-fulfilling, and the results were what you’d expect: the Ivy Leagues came out on top. Then, in 1988, US News created a data model for ranking the universities. Academic reputation would account for only a fourth of the ranking. The rest of a school’s ranking would be determined by factors such as selectivity and alumni giving rate, which are proxies for student success and student satisfaction, respectively. They needed a ranking people would trust and expect, so they adjusted the factors the model used to combine these pieces of data so that they would still end up with a believable ranking, one that showed Stanford, Harvard, Yale, Princeton, and MIT at the top. There was nothing scientific about the model: it was an opinion guided by a goal expressed in math.
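
To make the “opinion expressed in math” point concrete, here is a minimal sketch of a weighted ranking model. It is not the US News formula. The school names, metric values, and weights are all invented for illustration, and the only number taken from the account above is the one-fourth weight on academic reputation. The sketch simply shows that the same data can yield a different ranking when the modeler chooses different weights.

```python
# Hypothetical illustration only: schools, metric values, and weights are invented.
# This is NOT the actual US News model.

def rank(schools, weights):
    """Return school names ordered by weighted composite score, best first."""
    def score(metrics):
        # The composite score is a weighted sum of the chosen metrics.
        return sum(weights[name] * metrics[name] for name in weights)
    return sorted(schools, key=lambda s: score(schools[s]), reverse=True)

# Each metric is assumed to be normalized to the range 0..1.
schools = {
    "School A": {"reputation": 0.9, "selectivity": 0.5, "alumni_giving": 0.4},
    "School B": {"reputation": 0.6, "selectivity": 0.9, "alumni_giving": 0.8},
}

# A reputation-only ranking, similar in spirit to the original 1983 approach.
reputation_only = {"reputation": 1.0, "selectivity": 0.0, "alumni_giving": 0.0}

# Reputation counts for one fourth; the split of the remainder is an assumption.
blended = {"reputation": 0.25, "selectivity": 0.5, "alumni_giving": 0.25}

print(rank(schools, reputation_only))
print(rank(schools, blended))
```

With these made-up numbers, the two sets of weights put the schools in opposite order. Nothing about the schools changed; only the modeler’s opinion about what matters did.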

Thirty years later, colleges make many decisions aimed at boosting their ranking. Some even go so far as to ask already admitted students to retake the SAT so that their selectivity ranking will increase. And many schools have invested extravagantly in making their campuses more resemble country clubs than places where learning takes place, all to lure the best and brightest to campus so that their US News ranking will increase. The students pay for that. Average tuition has increased 500% since the 1980s. While there isn’t a direct cause and effect between US News and the student loan debt problem, colleges’ efforts to score favorably on a non-expert’s opinion of what constitutes a prestigious academic institution certainly stretch universities’ balance sheets. The burden to pay for that falls on the students who chose their schools, in part, on that ranking.

If you are interested in the effects of big data on everyday life, and if you feel compelled to start coming up with solutions, I encourage you to read Cathy O’Neil’s Weapons of Math Destruction. It has certainly sobered my zeal for algorithms more than a bit. It has caused me to consider implications I had not recognized before but should have.

About Ray Klump

Professor and Chair of Mathematics and Computer Science; Director, Master of Science in Information Security, Lewis University. You can find him on Google+.
