The Problem: Virtual security appliances offer numerous advantages, and hypervisors and orchestration are now commoditized to the point where most organizations can take advantage of them. Scaling these services, however, poses significant challenges. Optimizing the hardware and software stack and using offload accelerators means working through driver issues and DPDK dependencies, which takes perseverance. Our client needed a solution to identify and mitigate DDoS attacks originating from within their network, protecting both their customers and their infrastructure. The solution had to scale from hundreds of gigabits per second to terabits per second using off-the-shelf merchant components to meet cost objectives.
The Insight: A few Bluedot Insight members, drawing from their experience at a previous company, provided crucial insights that revolutionized the project. One team member introduced the concept of congestion control and rotating buckets, significantly enhancing the system’s performance. The breakthrough involved creating an alert-based heartbeat mechanism that would send packets with specific IP addresses and embedded sequence numbers into the IDS.
With a heartbeat rule in place, Suricata raises an alert for each heartbeat it receives and sends that alert back to the controller. The controller measures the latency of each alert and counts how many alerts came back, and combines the two into a health score.
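As an illustration of how such a health score could be derived, here is a minimal Python sketch. It is not the production formula: the `HeartbeatWindow` type, the `latency_budget_ms` parameter, and the choice to multiply a delivery ratio by a latency factor are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeartbeatWindow:
    """Heartbeats for one IDS server over one measurement window (hypothetical type)."""
    sent: int                  # heartbeats injected toward the IDS
    latencies_ms: List[float]  # one entry per heartbeat alert that came back


def health_score(window: HeartbeatWindow, latency_budget_ms: float = 50.0) -> float:
    """Return a score in [0, 1]; 1.0 means every heartbeat came back within budget.

    Assumed formula: (fraction of heartbeats alerted on) * (latency factor).
    """
    if window.sent == 0:
        return 1.0  # nothing was sent, so there is no evidence of trouble
    received = len(window.latencies_ms)
    if received == 0:
        return 0.0  # the IDS alerted on none of the heartbeats
    delivery = min(received / window.sent, 1.0)
    mean_latency = sum(window.latencies_ms) / received
    # Latency at or under budget counts as 1.0; slower responses scale the score down.
    latency_factor = 1.0 if mean_latency <= latency_budget_ms else latency_budget_ms / mean_latency
    return delivery * latency_factor


healthy = HeartbeatWindow(sent=10, latencies_ms=[10.0] * 10)
overloaded = HeartbeatWindow(sent=10, latencies_ms=[100.0] * 5)
print(health_score(healthy), health_score(overloaded))
```

In this sketch a server that drops half of its heartbeats and answers the rest at twice the latency budget scores 0.25, which gives the controller a single number to act on.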
This health score informed the congestion control algorithm, enabling it to manage traffic distribution based on the IDS servers’ processing capabilities.
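One way a health score could drive traffic distribution is sketched below in Python. The original algorithm is not described in detail here, so this example swaps in a generic additive-increase, multiplicative-decrease (AIMD) rule; the function name, the thresholds, and the step sizes are all hypothetical.

```python
def next_forward_fraction(
    current: float,
    score: float,
    target: float = 0.9,    # assumed health score below which a server counts as congested
    increase: float = 0.05, # assumed additive step when the server is healthy
    decrease: float = 0.5,  # assumed multiplicative back-off when it is congested
    floor: float = 0.05,    # never starve a server completely, so heartbeats keep flowing
) -> float:
    """Return the fraction of its traffic share an IDS server should receive next interval."""
    if score < target:
        return max(floor, current * decrease)
    return min(1.0, current + increase)


# Per-server state kept by a hypothetical controller.
fractions = {"ids-1": 1.0, "ids-2": 1.0}
scores = {"ids-1": 0.95, "ids-2": 0.4}
fractions = {name: next_forward_fraction(fractions[name], scores[name]) for name in fractions}
print(fractions)  # {'ids-1': 1.0, 'ids-2': 0.5}
```

The back-off is deliberately faster than the recovery in this sketch, so an overloaded server sheds load within one interval and regains it gradually once its heartbeats return on time.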
The Implementation: For the initial implementation, we used Suricata as the IDS detection engine and a Lanner HTCA-6600 with a P4-based switch fabric to replicate all incoming traffic and load-balance it across a cluster of six Suricata servers.
The congestion control mechanism involved an alert-based heartbeat, which monitored the health of the Suricata servers and adjusted traffic distribution accordingly.
Additionally, we modified the enumerated-hash-based load balancer to rotate through buckets, which statistically downsamples the traffic sent to each IDS. This modification was crucial for handling DDoS attacks and other scenarios where traffic exceeded inspection capacity.
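The rotating-bucket idea can be sketched in a few lines of Python. This is an interpretation under stated assumptions, not the P4 implementation: flows are hashed into a fixed number of buckets, only a contiguous window of buckets is forwarded to the IDS, and the window shifts each interval so every flow is inspected some of the time. `NUM_BUCKETS`, the CRC32 hash, and the function names are hypothetical.

```python
import zlib

NUM_BUCKETS = 100  # assumed size of the enumerated hash space


def bucket_of(flow_key: str) -> int:
    """Map a flow identifier (for example a 5-tuple string) to a bucket."""
    return zlib.crc32(flow_key.encode()) % NUM_BUCKETS


def bucket_is_active(bucket: int, forward_fraction: float, rotation: int) -> bool:
    """True if this bucket falls inside the window currently forwarded to the IDS."""
    active = round(forward_fraction * NUM_BUCKETS)
    # Position of the bucket relative to the start of the rotating window.
    offset = (bucket - rotation) % NUM_BUCKETS
    return offset < active


def is_forwarded(flow_key: str, forward_fraction: float, rotation: int) -> bool:
    """Decide whether packets of this flow reach the IDS during this interval."""
    return bucket_is_active(bucket_of(flow_key), forward_fraction, rotation)


# A forward fraction of 0.3 corresponds to a 70% downsample.
for rotation in (0, 1, 2):
    forwarded = sum(bucket_is_active(b, 0.3, rotation) for b in range(NUM_BUCKETS))
    print(rotation, forwarded)  # 30 buckets are forwarded at every rotation
```

Because the window advances by one bucket per interval in this sketch, the share of traffic inspected stays constant while the set of inspected flows keeps changing, which is what makes the downsampling statistical and not a fixed blind spot.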
We tested the system using a commercial NGFW virtual appliance, generating 20 Gbps of SYN flood traffic alongside 20 Gbps of non-attack traffic. With congestion control enabled, the system mitigated the attack within three minutes, processing 25,000 alerts per second at a downsample rate of roughly 70%.
In contrast, without congestion control, the NGFW was overwhelmed by the attack, became inactive, and could not mitigate the threat.
This innovative approach not only resolved the client’s immediate problem but also provided a scalable, cost-effective solution that could benefit other system integrators facing similar challenges.