March 18, 2025

How We Built a LangGraph Agent To Prioritize GitOps Vulns


In today’s complex Kubernetes environments, managing and prioritizing vulnerabilities can quickly become overwhelming. With dozens or even hundreds of containers running across multiple services, how do you decide which vulnerabilities to address first?

This is where AI can help. In this article, we'll share our experience building HAIstings, an AI-powered vulnerability prioritizer, using LangGraph and LangChain, with security provided by CodeGate, an open source AI gateway developed by Stacklok.

Too Many Vulnerabilities, Too Little Time

If you’ve ever run a vulnerability scanner like Trivy against your Kubernetes cluster, you know the feeling: hundreds or thousands of common vulnerabilities and exposures (CVEs) across dozens of images, with limited time and resources to address them. Which ones should you tackle first?

The traditional approach relies on severity scores (i.e., critical, high, medium, low), but these scores don’t account for your specific infrastructure context. For example, a high-severity vulnerability in an internal, non-critical service might be less urgent than a medium-severity one in an internet-facing component.

We wanted to see if we could use AI to help solve this prioritization problem. Inspired by Arthur Hastings, the meticulous assistant to Agatha Christie’s detective Hercule Poirot, we built HAIstings to help infrastructure teams prioritize vulnerabilities based on:

  1. Severity (critical/high/medium/low).
  2. Infrastructure context (from GitOps repositories).
  3. User-provided insights about component criticality.
  4. Evolving understanding through conversation.

Building HAIstings With LangGraph and LangChain

LangGraph, built on top of LangChain, provides an excellent framework for creating conversational AI agents with memory. Here’s how we structured HAIstings:

1. Core Components

The main components of HAIstings include:

  • k8sreport: Connects to Kubernetes to gather vulnerability reports from trivy-operator (see the sketch after this list).
  • repo_ingest: Ingests infrastructure repository files to provide context.
  • vector_db: Stores and retrieves relevant files using vector embeddings.
  • memory: Maintains conversation history across sessions.
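
Under the hood, the vulnerability data comes from trivy-operator’s VulnerabilityReport custom resources. Here’s a minimal sketch of what a k8sreport-style component does, using the official Kubernetes Python client; the CRD coordinates match trivy-operator’s documentation, but this is an illustration, not HAIstings’ actual code:

```python
# Sketch: list trivy-operator VulnerabilityReports cluster-wide.
# Not HAIstings' actual code; CRD group/version follow trivy-operator's docs.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod
api = client.CustomObjectsApi()

reports = api.list_cluster_custom_object(
    group="aquasecurity.github.io",
    version="v1alpha1",
    plural="vulnerabilityreports",
)
for item in reports["items"]:
    summary = item["report"]["summary"]  # criticalCount, highCount, ...
    print(item["metadata"]["name"], summary)
```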

2. Conversation Flow

HAIstings uses a LangGraph state machine with the following flow:
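
A simplified sketch of that graph is below; the node names are illustrative, not HAIstings’ actual ones:

```python
# Minimal sketch of the HAIstings conversation graph (hypothetical node names).
from typing import Annotated

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from typing_extensions import TypedDict


class State(TypedDict):
    # Full conversation history; add_messages appends rather than replaces.
    messages: Annotated[list, add_messages]


def fetch_vulnerabilities(state: State) -> dict:
    # Stub: in HAIstings this calls the k8sreport component.
    return {"messages": [("system", "Vulnerability report: ...")]}


def generate_report(state: State) -> dict:
    # Stub: in HAIstings this prompts the LLM for a prioritized report.
    return {"messages": [("assistant", "Prioritized report: ...")]}


def gather_context(state: State) -> dict:
    # Ask the human operator for extra context about component criticality.
    return {"messages": [("user", input("Additional context (empty to finish)? "))]}


def should_refine(state: State):
    # Keep refining while the user still has context to add.
    return "generate_report" if state["messages"][-1].content.strip() else END


builder = StateGraph(State)
builder.add_node("fetch_vulnerabilities", fetch_vulnerabilities)
builder.add_node("generate_report", generate_report)
builder.add_node("gather_context", gather_context)

builder.add_edge(START, "fetch_vulnerabilities")
builder.add_edge("fetch_vulnerabilities", "generate_report")
builder.add_edge("generate_report", "gather_context")
builder.add_conditional_edges("gather_context", should_refine)

graph = builder.compile()
```

The conditional edge at the end is what makes this a loop: each round of user-supplied context sends the graph back through generate_report for a refined assessment.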

This creates a loop where HAIstings:

  1. Retrieves vulnerability data.
  2. Generates an initial report.
  3. Asks for additional context.
  4. Refines its assessment based on new information.

3. RAG for Relevant Context

One of the challenges was efficiently retrieving only the relevant files from potentially huge GitOps repositories. We implemented a retrieval-augmented generation (RAG) approach:
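
In outline: embed the repository’s files once, then query per vulnerable component. Here’s a minimal sketch, assuming OpenAI embeddings and LangChain’s in-memory vector store; HAIstings’ real implementation and helper names may differ:

```python
# Sketch of the RAG step: embed GitOps repo files, retrieve per component.
from pathlib import Path

from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector_store = InMemoryVectorStore(embeddings)

# Ingest every manifest in the GitOps repo as a document.
repo = Path("./infra-repo")  # hypothetical local checkout
docs = [
    Document(page_content=p.read_text(), metadata={"path": str(p)})
    for p in repo.rglob("*.yaml")
]
vector_store.add_documents(docs)


def relevant_files(image: str, k: int = 4) -> list[Document]:
    # Pull only the handful of files most likely to describe
    # how this vulnerable component is deployed.
    return vector_store.similarity_search(
        f"Deployment manifests for container image {image}", k=k
    )
```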

This ensures that only the most relevant files for each vulnerable component are included in the context, keeping the prompt size manageable.

Security Considerations

When working with LLMs and infrastructure data, security is paramount. The vulnerability reports and infrastructure files we’re analyzing could contain sensitive information like:

  • Configuration details.
  • Authentication mechanisms.
  • Potentially leaked credentials in infrastructure files.

This is where the open source project CodeGate becomes essential. CodeGate acts as a protective layer between HAIstings and the LLM provider, offering crucial protections:

1. Secrets Redaction

CodeGate automatically identifies and redacts secrets like API keys, tokens and credentials from your prompts before they reach the large language model (LLM) provider. This prevents accidental leakage of sensitive data to third-party cloud services.

For example, your Kubernetes manifest or GitOps repo might contain something like this (an illustrative example; every value below is made up):
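
```yaml
# Illustrative manifest; the token is fake.
apiVersion: v1
kind: Secret
metadata:
  name: api-credentials
type: Opaque
stringData:
  # A hardcoded token like this would be redacted by CodeGate
  # before the prompt ever reaches the LLM provider.
  github-token: ghp_exampleexampleexampleexample1234
```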

CodeGate redacts these values before the prompt reaches the LLM, then seamlessly unredacts them in responses.

You may be saying, “Hang on a second. We rely on things like ExternalSecretsOperator to keep Kubernetes secrets out of our repos, so we’re safe… right?”

Well, you might be experimenting with a cluster and have a token stored in a file in your local repository or in your current working directory. An agent might be a little too ambitious and accidentally add it to your context, as we’ve often seen with code editors. This is where CodeGate jumps in and redacts sensitive info before it is unintentionally shared.

2. PII Redaction

Beyond secrets, CodeGate also detects and redacts personally identifiable information (PII) that might be present in your infrastructure files or deployment manifests.

3. Controlled Model Access

CodeGate includes model multiplexing (muxing) capabilities that help ensure infrastructure vulnerability information goes only to approved, trusted models with appropriate security measures.

Model muxing allows you to create rules that route specific file types, projects or code patterns to different AI models. For example, you might want infrastructure code to be handled by a private, locally hosted model, while general application code can be processed by cloud-based models.

Model muxing enables:

  • Data sensitivity control: Route sensitive code (like infrastructure, security or authentication modules) to models with stricter privacy guarantees.
  • Compliance requirements: Meet regulatory needs by ensuring certain code types never leave your environment.
  • Cost optimization: Use expensive, high-powered models only for critical code sections.
  • Performance tuning: Match code complexity with the most appropriate model capabilities.

Here’s an example model muxing strategy with an infrastructure repository:

  • Rule: *.tf, *.yaml or *-infra.* can be muxed to a locally hosted Ollama model.
  • Benefit: Terraform files and infrastructure YAML never leave your environment, preventing potential leak of secrets, IP addresses or infrastructure design.

4. Traceable History

CodeGate maintains a central record of all interactions with AI models, creating an audit trail of all vulnerability assessments and recommendations.

Configuring HAIstings With CodeGate

Setting up HAIstings to work with CodeGate is straightforward. Update the LangChain configuration in HAIstings:
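
A minimal sketch is below, assuming CodeGate’s documented local defaults (an OpenAI-compatible endpoint on port 8989, with the real provider credentials configured in CodeGate itself); adjust the URL and model for your deployment:

```python
# Sketch: point LangChain at CodeGate instead of the provider directly.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",  # any model your CodeGate mux rules allow
    base_url="http://localhost:8989/v1/mux",  # CodeGate's local muxing endpoint
    api_key="placeholder",  # real provider keys live in CodeGate, not here
)
```

Because CodeGate sits between HAIstings and the provider, every prompt now passes through its secrets redaction, PII redaction and muxing rules with no other code changes.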

The Results

With HAIstings and CodeGate working together, the resulting system provides intelligent, context-aware vulnerability prioritization while maintaining strict security controls.

A sample report from HAIstings might look like the following illustrative excerpt (component names and CVE IDs are placeholders):
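
```
Good day! Hastings here, reporting for duty.

Most pressing matters:

1. CRITICAL: ingress-nginx (CVE-2025-XXXX)
   - Internet-facing and handling all inbound traffic. This one
     simply cannot wait.
2. HIGH: payments-api (CVE-2025-XXXX)
   - Processes sensitive transactions; you noted it as business critical.
3. MEDIUM: metrics-collector (CVE-2025-XXXX)
   - Internal-only and isolated by network policy; may wait for the
     next maintenance window.
```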

Performance Considerations

LLM interactions are slow on their own, so you shouldn’t rely on them for real-time or critical alerting, and proxying LLM traffic through CodeGate adds some latency on top of already computationally expensive inference. That said, we believe the security benefits are worth it: you’re trading a few extra seconds of processing time for dramatically better vulnerability prioritization, tailored to your specific infrastructure.

Secure AI for Infrastructure

Building HAIstings with LangGraph and LangChain demonstrated how AI can help solve the vulnerability prioritization problem in modern infrastructure. Pairing it with CodeGate ensures that this assistance doesn’t come at the cost of security: you get intelligent, context-aware guidance without leaking secrets or sensitive data, freeing your team to focus on fixing what matters most.

As infrastructure grows more complex and vulnerabilities more numerous, tools like HAIstings represent the future of infrastructure security management.

You can try HAIstings by using the code in our GitHub repository.

Would you like to see how AI can help prioritize vulnerabilities in your infrastructure? Or do you have other ideas for combining AI with infrastructure management? Jump into Stacklok’s Discord community and continue the conversation.

