Unravel Data Systems CTO on Big Data management for the security team

Author: Shivnath Babu, co-founder/CTO at Unravel Data Systems and Adjunct Professor of Computer Science at Duke University on the Big Data - cybersecurity conversation

Shivnath Babu, co-founder/CTO at Unravel Data Systems and Adjunct Professor of Computer Science at Duke University, discusses how security teams can best manage Big Data to reap its benefits.

One of the technology industry’s most persistent myths is that Big Data management is not a concern for the security team. Recently, Unravel compiled survey findings on Big Data applications, and what really struck us was how those within cybersecurity are using these applications.

The need for Big Data in security

First and foremost, respondents to the survey revealed that the most value derived from Big Data actually comes when leveraging it for security applications. Detecting fraud was the single most effective use case, with cybersecurity intelligence coming in third. 

This was hardly surprising, as security is at the top of everyone’s minds today given the very public threats, hacks and outages that have become a semi-regular news story. Modern security apps like fraud detection rely heavily on Artificial Intelligence and Machine Learning to work properly.

However, despite the value it brings, respondents also indicated that security analytics was the modern data application they struggled most to get right. This also didn’t surprise us too much, as it reflects the complexity of managing the volume and variety of real-time streaming data common in modern security apps.

The lofty data needs of cybersecurity

Cybersecurity is a difficult challenge from a Big Data point of view and many organisations are struggling with it. The hardest part is managing all of the streaming data that comes pouring in from the Internet, IoT devices, sensors, Edge platforms and other endpoints.

Streaming data arrives frequently, piles up quickly and is complex. To manage this data properly and deliver working security apps, the business needs a solution that provides trustworthy workload management for cybersecurity analytics and can track, diagnose and troubleshoot end-to-end across all of the data pipelines.

Real-time processing is not a new concept, but the ability to run real-time apps reliably and at scale is. The development of open-source technologies such as Kafka, Spark Streaming, Flink and HBase has enabled developers to create real-time apps that scale, further accelerating their proliferation and business value.

Cybersecurity is critical for the well-being of enterprises and large organisations, but many don’t have the right data operations platform to do it correctly.

Example: Metron

Apache Metron is an example of a complex security data pipeline used within organisations.

To analyse streaming traffic data, generate statistical features and train Machine Learning models to help detect cybersecurity threats on large-scale networks – like malicious hosts in botnets – Big Data systems require complex and resource-consuming monitoring methods.

Security analysts may apply multiple detection methods simultaneously to the same massive incoming data, for pre-processing, selective sampling and feature generation, adding to the existing complexity and performance challenges.

Keep in mind that these applications often span multiple systems (e.g. interacting with Spark for computation, with YARN for resource allocation and scheduling, with HDFS or S3 for data access, and with Kafka or Flink for streaming) and may contain independent, user-defined programs. This makes it inefficient to repeat the data pre-processing and feature generation steps that multiple applications have in common, especially on large-scale traffic data.

These inefficiencies create bottlenecks in application execution, hog the underlying systems, cause suboptimal resource utilisation, increase failures (e.g. due to out-of-memory errors) and, more importantly, may decrease the chances of detecting a threat or a malicious attempt in time.
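The cost of repeating common pre-processing can be illustrated with a minimal Python sketch. It assumes in-memory flow records and two toy detectors; the record fields, function names and thresholds are all hypothetical and are not taken from Metron or any Unravel product. The idea is simply that the expensive feature-generation pass runs once and every detector reads the same result.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical flow record: {"src": str, "dst_port": int, "bytes": int}
Flow = Dict[str, object]
Features = Dict[str, Dict[str, float]]

# Counts how often the expensive feature-generation step actually runs.
calls = {"extract": 0}


def extract_features(flows: List[Flow]) -> Features:
    """Aggregate per-source statistics in a single pass over the traffic."""
    calls["extract"] += 1
    ports: Dict[str, Set[int]] = {}
    volume: Counter = Counter()
    for flow in flows:
        src = str(flow["src"])
        ports.setdefault(src, set()).add(int(flow["dst_port"]))
        volume[src] += int(flow["bytes"])
    return {
        src: {"distinct_ports": float(len(seen)), "bytes": float(volume[src])}
        for src, seen in ports.items()
    }


def port_scan_detector(feats: Features, threshold: int = 3) -> List[str]:
    """Flag sources that contact many distinct ports (illustrative threshold)."""
    return sorted(s for s, v in feats.items() if v["distinct_ports"] >= threshold)


def heavy_sender_detector(feats: Features, threshold: int = 10_000) -> List[str]:
    """Flag sources that send a large byte volume (illustrative threshold)."""
    return sorted(s for s, v in feats.items() if v["bytes"] >= threshold)


def run_shared(
    flows: List[Flow], detectors: Dict[str, Callable[[Features], List[str]]]
) -> Dict[str, List[str]]:
    """Generate features once, then hand the same result to every detector."""
    feats = extract_features(flows)
    return {name: detect(feats) for name, detect in detectors.items()}


FLOWS: List[Flow] = [
    {"src": "10.0.0.1", "dst_port": 22, "bytes": 500},
    {"src": "10.0.0.1", "dst_port": 23, "bytes": 500},
    {"src": "10.0.0.1", "dst_port": 80, "bytes": 500},
    {"src": "10.0.0.2", "dst_port": 443, "bytes": 20_000},
    {"src": "10.0.0.3", "dst_port": 443, "bytes": 100},
]
DETECTORS = {"port_scan": port_scan_detector, "heavy_sender": heavy_sender_detector}
RESULTS = run_shared(FLOWS, DETECTORS)
```

In a real pipeline the shared step would be a cached or materialised dataset rather than a Python dictionary, but the design choice is the same: detectors depend on the features, not on their own private copy of the pre-processing.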

Approaches to manage these issues

A full-stack Application Performance Management platform addresses these challenges and provides a compelling solution for operationalising security apps. Modern solutions leverage Artificial Intelligence and a variety of capabilities to enable better workload management for cybersecurity analytics.

Key requirements to seek include:

  • Automatically identifying applications that share common characteristics and requirements and grouping them based on relevant data colocation (e.g., a combination of port usage entropy, IP region or geolocation, time or flow duration)
  • Recommendations on how to segregate applications with different requirements (e.g. disk I/O-heavy preprocessing tasks vs. computationally heavy feature selection) submitted by different users (e.g. SOC level 1 vs. level 3 analysts)
  • Recommendations on how to allocate applications with increased sharing opportunities and computational similarities to appropriate execution pools/queues
  • Automatic fixes for failed applications drawing on rich historic data of successful and failed runs of the application
  • Recommendations for alternative configurations to get failed applications quickly to a running state, followed by getting the application to a resource-efficient running state
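
To make the first requirement more concrete, the sketch below computes one of the characteristics mentioned above, port usage entropy, and uses it as a grouping key. It is only an illustration under stated assumptions: the application names, the port samples and the 1.5-bit threshold are hypothetical, and it does not describe how any particular platform groups workloads.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def port_entropy(ports: List[int]) -> float:
    """Shannon entropy, in bits, of the observed destination-port distribution."""
    counts = Counter(ports)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def group_by_entropy(
    apps: Dict[str, List[int]], threshold: float = 1.5
) -> Dict[str, List[str]]:
    """Split applications into two groups by the entropy of the ports they see.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, ports in apps.items():
        label = "high_entropy" if port_entropy(ports) >= threshold else "low_entropy"
        groups[label].append(name)
    return dict(groups)


# Hypothetical applications mapped to a sample of the ports in their input data.
APPS = {
    "dns_monitor": [53, 53, 53, 53],
    "web_monitor": [80, 80, 443, 443],
    "scan_hunter": [22, 23, 80, 443],
}
GROUPS = group_by_entropy(APPS)
```

A real grouping would combine several such characteristics (for example IP region and flow duration, as in the list above) rather than rely on a single threshold.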

Did we mention complexity?

Security applications that run on highly distributed modern data stacks are evidently too complex to manage and monitor manually. Furthermore, these are not the kind of applications that can fail without consequence. We aren’t just talking about minor inconveniences and lost revenue: the entire business is at risk if a security app fails.

Performance and reliability are non-negotiable for all security apps. The best way to ensure them is to use an AI-enabled platform that monitors applications in real time for potential failures.

Speed is of the essence when it comes to these applications, and only an AI-driven solution can deliver the speed necessary to keep security tight and any potential threats managed.
