Good Things Happen When Data Science Meets Cybersecurity

These days, when your credit card is compromised, your bank knows about it almost instantly. It uses sophisticated algorithms to detect anomalous behavior. If it notices that your card was used in Denver, but it also has a record of you using it just a few minutes earlier in Chicago, it has a pretty good indication that something is awry. The bank’s anomaly detection algorithms can also detect other patterns, such as multiple small purchases at stores you don’t frequent, which raise alarm bells that a thief is testing the waters before attempting a larger purchase. As the issuer of your card, the bank has a detailed record of your purchasing profile, from which it can develop a mathematical model of your habits. Anything that deviates from that model is flagged as an anomaly, and the bank acts on it by shutting down your current card and issuing you a new one.
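To make the bank’s two checks concrete, here is a minimal Python sketch, not any bank’s actual system. It assumes each transaction carries a timestamp, a latitude/longitude, and an amount. It flags a pair of transactions that would require travelling faster than an assumed 900 km/h (roughly airliner cruising speed), and it flags an amount more than three standard deviations from the cardholder’s history, a simple stand-in for the "mathematical model of your habits." The `Transaction` class, the function names, and both thresholds are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, stdev

@dataclass
class Transaction:
    # Hypothetical record: when and where the card was used, and for how much.
    when: datetime
    lat: float
    lon: float
    amount: float

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def impossible_travel(prev, curr, max_kmh=900.0):
    """True if moving from prev to curr would require a speed above max_kmh (assumed threshold)."""
    hours = (curr.when - prev.when).total_seconds() / 3600
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    if hours <= 0:
        # Simultaneous (or out-of-order) uses in two different places.
        return distance > 0
    return distance / hours > max_kmh

def unusual_amount(history, curr, threshold=3.0):
    """True if curr.amount lies more than `threshold` standard deviations from past amounts."""
    amounts = [t.amount for t in history]
    if len(amounts) < 2:
        return False  # not enough history to build a profile
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return curr.amount != mu
    return abs(curr.amount - mu) / sigma > threshold
```

With the Chicago and Denver coordinates, two purchases five minutes apart imply a speed of well over 10,000 km/h and are flagged, while the same pair four hours apart is not. Real systems use far richer models; the z-score here only illustrates the idea of measuring deviation from a learned profile.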

Why not do the same thing with attempted cyber attacks? For the most part, the tools needed to accomplish this have been around for a long time. Most corporate cyber defense systems include an intrusion detection system (IDS) and an intrusion prevention system (IPS). Their names explain what they do. An IDS continuously monitors the traffic at all of a system’s connections, detecting attempted attacks by identifying unexpected traffic and issuing alerts. An IDS is passive in that it only detects; it does not respond. An IPS, on the other hand, tries to counteract the attacks the IDS flags. It works with firewalls and other network appliances to close the doors on an attack and stop it before it makes off with or compromises your organization’s data.
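The division of labor between the two can be sketched in a few lines of Python. This is a toy that assumes a made-up `Connection` record, two hypothetical payload signatures, and an allowed-port list; the `Firewall` class is only a set of blocked addresses standing in for a real appliance. The point is the split: `ids_inspect` only reports, and `ips_respond` is what actually acts on the report.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Connection:
    # Hypothetical, simplified view of one inbound connection.
    src_ip: str
    dst_port: int
    payload: str

# Hypothetical signatures: substrings that mark traffic as suspicious.
SIGNATURES = {"' OR 1=1": "SQL injection attempt", "../../": "path traversal attempt"}
ALLOWED_PORTS = {80, 443}

def ids_inspect(conn):
    """IDS role: detect and report only. Returns alert strings; never blocks anything."""
    alerts = []
    if conn.dst_port not in ALLOWED_PORTS:
        alerts.append(f"unexpected port {conn.dst_port} from {conn.src_ip}")
    for pattern, label in SIGNATURES.items():
        if pattern in conn.payload:
            alerts.append(f"{label} from {conn.src_ip}")
    return alerts

@dataclass
class Firewall:
    """Stand-in for a real firewall: just a set of blocked source addresses."""
    blocked: set = field(default_factory=set)

    def allows(self, conn):
        return conn.src_ip not in self.blocked

def ips_respond(conn, alerts, firewall):
    """IPS role: act on the IDS alerts by blocking the offending source address."""
    if alerts:
        firewall.blocked.add(conn.src_ip)
```

Keeping detection and response in separate functions mirrors the IDS/IPS distinction: you could log the IDS alerts without ever calling `ips_respond`, which is what a purely passive IDS deployment amounts to.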

Take these tools a step further, and you arrive at Unified Threat Management (UTM) and Security Information and Event Management (SIEM) tools. A UTM combines IDS and IPS functions with firewalls and distributed antivirus tools in a single appliance, creating an easier-to-configure wall around your infrastructure. The downside of a single appliance, of course, is that it is a single point of failure: a lone, precious target for anyone trying to break your defenses. Still, their ease of setup makes UTMs an attractive option for cybersecurity shops willing to risk putting all their digital eggs in one virtual basket.
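A UTM’s single-appliance design can be sketched as one object that runs every check in sequence. In this hypothetical Python sketch, the firewall, IDS/IPS, and antivirus stages are trivial stand-ins (the antivirus stage just looks for a placeholder keyword), and the appliance fails closed when any stage crashes. That choice makes the single point of failure visible: one broken stage stops all traffic.

```python
from typing import Callable, Optional

# Each check returns None to pass the traffic on, or a reason string to drop it.
Check = Callable[[dict], Optional[str]]

def firewall_check(pkt):
    return None if pkt["port"] in {80, 443} else f"port {pkt['port']} closed"

def ids_ips_check(pkt):
    # Hypothetical signature: a path traversal attempt.
    return "signature match" if "../../" in pkt["payload"] else None

def antivirus_check(pkt):
    # Placeholder keyword; a real engine would consult a signature database.
    return "malware signature" if "EICAR" in pkt["payload"] else None

class UnifiedThreatManager:
    """All defenses in one appliance: convenient to configure, but a single point of failure."""

    def __init__(self, checks: list[Check]):
        self.checks = list(checks)

    def inspect(self, pkt):
        """Run every check in order and return (allowed, reason). Fails closed if a check crashes."""
        for check in self.checks:
            try:
                reason = check(pkt)
            except Exception as exc:
                return False, f"{check.__name__} failed: {exc}"
            if reason is not None:
                return False, reason
        return True, "ok"
```

Failing closed is itself a judgment call; a UTM configured to fail open would keep traffic flowing but would stop inspecting it, which is the other face of the same single point of failure.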

SIEM takes a different approach. Rather than consolidating IDS, IPS, firewalls, and other defense applications in one appliance, SIEM provides a platform that hosts all these services and lets them exchange data in real time. By hosting the data exchange, collecting the communications, and analyzing and presenting dashboard-level summaries of what it has collected, a SIEM performs important data collection and organization tasks that can inform both automated and human-led cyber defenses. SIEM doesn’t present a single point of failure. Rather, it allows all the network’s systems to communicate and work together, creating an unprecedentedly detailed picture of the current health of the overall infrastructure and informing a response plan should one be required.
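The core SIEM tasks described here (collecting events from different appliances, normalizing them into one format, correlating them, and summarizing them for a dashboard) can be sketched in Python. The two log formats, the ten-minute window, and the rule "flag an IP reported by two or more different appliances" are assumptions chosen for illustration; commercial SIEMs use their own schemas and far more elaborate correlation rules.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Event:
    time: datetime
    source: str   # which appliance reported it, e.g. "firewall" or "ids"
    src_ip: str
    kind: str

def normalize_firewall(line):
    """Hypothetical firewall log format: 'ISO-timestamp ACTION ip port'."""
    ts, action, ip, _port = line.split()
    return Event(datetime.fromisoformat(ts), "firewall", ip, action.lower())

def normalize_ids(record):
    """Hypothetical IDS output: a dict with 'ts', 'attacker', and 'rule' keys."""
    return Event(datetime.fromisoformat(record["ts"]), "ids", record["attacker"], record["rule"])

def correlate(events, window=timedelta(minutes=10), min_sources=2):
    """Flag IPs reported by at least `min_sources` different appliances within `window`."""
    by_ip = defaultdict(list)
    for e in sorted(events, key=lambda e: e.time):
        by_ip[e.src_ip].append(e)
    incidents = []
    for ip, evs in by_ip.items():
        for i, first in enumerate(evs):
            in_window = [e for e in evs[i:] if e.time - first.time <= window]
            if len({e.source for e in in_window}) >= min_sources:
                incidents.append(ip)
                break
    return incidents

def dashboard(events):
    """Dashboard-level summary: number of events reported by each appliance."""
    return Counter(e.source for e in events)
```

Neither appliance alone sees the whole story here: the firewall sees a denied connection and the IDS sees a rule match, and only the combined, normalized stream shows the same address doing both.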

Now, imagine you could create a SIEM that gathered data not only from your local network’s appliances, but from the world’s. Rather than constraining your view to what your local system detects, suppose you could supplement that parochial perspective with what is happening right now beyond your own infrastructure, perhaps at organizations in the same geographic region or industry sector, or of similar size or purpose, as you decide which defenses to implement. In other words, suppose you could view and use the data not only from your own SIEM but from other organizations’ SIEMs, giving you a much wider view of hacker activity. Chances are good that you’d see parallels between your own experience and your neighbors’ that you could not see by looking at your own system alone.
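One simple way to use pooled data is to ask which of your own indicators (here, attacking IP addresses) relevant peer organizations have also reported. The Python sketch below assumes a hypothetical shared feed that maps each organization to the indicators it has seen, plus a profile of each organization’s sector and region for choosing peers; none of these names correspond to a real product’s API.

```python
from collections import Counter

def select_peers(profiles, sector=None, region=None):
    """Names of organizations matching the requested sector and/or region (None means 'any')."""
    return {
        org for org, p in profiles.items()
        if (sector is None or p["sector"] == sector)
        and (region is None or p["region"] == region)
    }

def shared_sightings(peer_reports, peers):
    """Count how many distinct peer organizations reported each indicator."""
    counts = Counter()
    for org in peers:
        counts.update(set(peer_reports.get(org, [])))
    return counts

def corroborated(local_indicators, peer_reports, peers, min_orgs=2):
    """Local indicators that at least `min_orgs` of the chosen peers have also seen."""
    counts = shared_sightings(peer_reports, peers)
    return sorted(i for i in set(local_indicators) if counts[i] >= min_orgs)
```

Narrowing the peer group (same sector, same region) trades breadth for relevance, which is the kind of choice the paragraph above describes.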

What you’d end up with is a volume of relevant data akin to what credit card companies use to detect anomalous charges, and you’d be similarly well equipped to detect peculiar uses of your infrastructure with surprising speed and effectiveness. Of course, the volume of data you’d have to sift through to draw conclusions would be immense, but that challenge is nothing new. Data scientists do just that in virtually every industry today, so there is plenty of expertise to draw from.

We don’t have to talk about this promising state of affairs only hypothetically. A new breed of cyber intrusion detection and data collection tools has emerged, including a new product from Microsoft called Azure Sentinel. In fact, one of the students in our Master of Science in Data Science program serves as its Senior Program Manager. Azure is Microsoft’s cloud computing platform and, as such, it has access to immense quantities of data. The Azure Sentinel platform collects and organizes data from SIEMs across multiple industries and makes it available to the cyber defense operations that have purchased the product, helping them identify emerging threats before they arrive in full force. The platform can do this because it has access to all the data needed to develop models of emerging threats.

As your local SIEM collects data from all your appliances, Sentinel compares that data with its models of current and recently detected threats. It can thus flag threatening trends in much the same way your credit card bank does: by comparing observed behavior against a model. This helps protect you from attempted cyber attacks before they penetrate your system, sparing you the damage they might otherwise inflict.
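The idea of comparing local behavior with threat models can be sketched as ordered-pattern matching. In this Python sketch, each threat model is a hypothetical sequence of behaviors, and a host is flagged once it has completed at least two thirds of a pattern in order, so a campaign can be noticed before its final step. This is an illustrative assumption, not a description of how Azure Sentinel builds or applies its models.

```python
# Hypothetical threat models: ordered behavior patterns observed at other organizations.
THREAT_MODELS = {
    "credential-stuffing campaign": ["port_scan", "many_failed_logins", "successful_login"],
    "ransomware staging": ["successful_login", "privilege_escalation", "mass_file_rename"],
}

def progress(observed, pattern):
    """How many leading steps of `pattern` occur, in order, in `observed` (other events may interleave)."""
    done = 0
    for event in observed:
        if done < len(pattern) and event == pattern[done]:
            done += 1
    return done

def match_threats(events_by_host, models=THREAT_MODELS, min_fraction=2 / 3):
    """Return {host: [(threat name, fraction of pattern completed)]} for hosts at or above min_fraction."""
    hits = {}
    for host, observed in events_by_host.items():
        for name, pattern in models.items():
            frac = progress(observed, pattern) / len(pattern)
            if frac >= min_fraction:
                hits.setdefault(host, []).append((name, round(frac, 2)))
    return hits
```

In the usage below, `web02` is flagged after the scan and the failed logins, before any successful login; lowering `min_fraction` trades earlier warnings for more false alarms.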

Data Science, the fast-growing niche of Computer Science that specializes in collecting, analyzing, visualizing, communicating, and securing large quantities of data, is being used to streamline processes and improve outcomes in virtually every industry. Cybersecurity is no exception, and Microsoft’s new Azure Sentinel is just one example of what’s possible when you combine Data Science and Cybersecurity.

About Ray Klump

Professor and Chair of Mathematics and Computer Science; Director, Master of Science in Information Security, Lewis University. http://online.lewisu.edu/ms-information-security.asp, http://online.lewisu.edu/resource/engineering-technology/articles.asp, http://cs.lewisu.edu. You can find him on Google+.
