In today’s complex Kubernetes environments, managing and prioritizing vulnerabilities can quickly become overwhelming. With dozens or even hundreds of containers running across multiple services, how do you decide which vulnerabilities to address first?
AI can help here. In this article, we'll share our experience building HAIstings, an AI-powered vulnerability prioritizer, using LangGraph and LangChain, with security enhanced by CodeGate, an open source AI gateway developed by Stacklok.
Too Many Vulnerabilities, Too Little Time
If you’ve ever run a vulnerability scanner like Trivy against your Kubernetes cluster, you know the feeling: hundreds or thousands of common vulnerabilities and exposures (CVEs) across dozens of images, with limited time and resources to address them. Which ones should you tackle first?
The traditional approach relies on severity scores (critical, high, medium, low), but these scores don't account for your specific infrastructure context. For example, a high-severity vulnerability in an internal, non-critical service might be less urgent than a medium-severity vulnerability in an internet-facing component.
We wanted to see if we could use AI to help solve this prioritization problem. Inspired by Arthur Hastings, the meticulous assistant to Agatha Christie’s detective Hercule Poirot, we built HAIstings to help infrastructure teams prioritize vulnerabilities based on:
- Severity (critical/high/medium/low).
- Infrastructure context (from GitOps repositories).
- User-provided insights about component criticality.
- Evolving understanding through conversation.
Building HAIstings With LangGraph and LangChain
LangGraph, built on top of LangChain, provides an excellent framework for creating conversational AI agents with memory. Here’s how we structured HAIstings:
1. Core Components
The main components of HAIstings include:
- k8sreport: Connects to Kubernetes to gather vulnerability reports from trivy-operator (sketched after this list).
- repo_ingest: Ingests infrastructure repository files to provide context.
- vector_db: Stores and retrieves relevant files using vector embeddings.
- memory: Maintains conversation history across sessions.
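To make the first of these concrete, here's a minimal sketch of how a k8sreport-style collector could pull trivy-operator's VulnerabilityReport custom resources using the official Kubernetes Python client. The function name and the summary fields we extract are our own illustration, not HAIstings' exact code:

```python
# Minimal sketch (not HAIstings' exact code): list trivy-operator's
# VulnerabilityReport custom resources and summarize severities per workload.
from kubernetes import client, config

def gather_vulnerability_reports() -> list[dict]:
    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    api = client.CustomObjectsApi()
    reports = api.list_cluster_custom_object(
        group="aquasecurity.github.io",
        version="v1alpha1",
        plural="vulnerabilityreports",
    )
    summaries = []
    for item in reports.get("items", []):
        summary = item.get("report", {}).get("summary", {})
        summaries.append({
            "name": item["metadata"]["name"],
            "namespace": item["metadata"]["namespace"],
            "critical": summary.get("criticalCount", 0),
            "high": summary.get("highCount", 0),
        })
    return summaries
```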
2. Conversation Flow
HAIstings uses a LangGraph state machine with the following flow:
```python
from langgraph.graph import StateGraph, START, END

graph_builder = StateGraph(State)

# Nodes
graph_builder.add_node("retrieve", retrieve)                  # Get vulnerability data
graph_builder.add_node("generate_initial", generate_initial)  # Create initial report
graph_builder.add_node("extra_userinput", extra_userinput)    # Get more context

# Edges
graph_builder.add_edge(START, "retrieve")
graph_builder.add_edge("retrieve", "generate_initial")
graph_builder.add_edge("generate_initial", "extra_userinput")
graph_builder.add_conditional_edges("extra_userinput", needs_more_info, ["extra_userinput", END])
```
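The graph above references a State type and a needs_more_info router that decides whether to keep looping. As a rough sketch of what these can look like in LangGraph (our illustration; HAIstings' actual logic may differ):

```python
# Rough sketch of the state and the conditional router (illustrative only).
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import END
from langgraph.graph.message import add_messages

class State(TypedDict):
    # Conversation history; add_messages appends new messages instead of
    # overwriting the list on each state update.
    messages: Annotated[list, add_messages]

def needs_more_info(state: State):
    # Loop back for more user context until the user signals they're done.
    # A real router would likely inspect the conversation more carefully.
    last = state["messages"][-1]
    if getattr(last, "content", "").strip().lower() in ("done", "no"):
        return END
    return "extra_userinput"
```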
This creates a loop where HAIstings:
- Retrieves vulnerability data.
- Generates an initial report.
- Asks for additional context.
- Refines its assessment based on new information.
3. RAG for Relevant Context
One of the challenges was efficiently retrieving only the relevant files from potentially huge GitOps repositories. We implemented a retrieval-augmented generation (RAG) approach:
```python
from typing import Dict, List

def retrieve_relevant_files(repo_url: str, query: str, k: int = 5) -> List[Dict]:
    """Retrieve relevant files from the vector database based on a query."""
    vector_db = VectorDatabase()
    documents = vector_db.similarity_search(query, k=k)

    results = []
    for doc in documents:
        results.append({
            "path": doc.metadata["path"],
            "content": doc.page_content,
            "is_kubernetes": doc.metadata.get("is_kubernetes", False),
        })

    return results
```
This ensures that only the most relevant files for each vulnerable component are included in the context, keeping the prompt size manageable.
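For the ingestion side, here's a hedged sketch of what repo_ingest and vector_db could do under the hood: split repository files into chunks, embed them and persist them for similarity search. We use LangChain's Chroma integration and OpenAI embeddings purely for illustration; HAIstings' own VectorDatabase may be implemented differently:

```python
# Illustrative ingestion sketch; ingest_repo and the metadata fields mirror
# the retrieval snippet above but are our own assumptions.
from pathlib import Path

from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def ingest_repo(repo_path: str) -> Chroma:
    docs = []
    for path in Path(repo_path).rglob("*.yaml"):
        docs.append(Document(
            page_content=path.read_text(),
            # Naively tag every YAML file as Kubernetes for this sketch.
            metadata={"path": str(path), "is_kubernetes": True},
        ))
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return Chroma.from_documents(splitter.split_documents(docs), OpenAIEmbeddings())
```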
Security Considerations
When working with LLMs and infrastructure data, security is paramount. The vulnerability reports and infrastructure files we’re analyzing could contain sensitive information like:
- Configuration details.
- Authentication mechanisms.
- Potentially leaked credentials in infrastructure files.
This is where the open source project CodeGate becomes essential. CodeGate acts as a protective layer between HAIstings and the LLM provider, offering crucial protections:
1. Secrets Redaction
CodeGate automatically identifies and redacts secrets like API keys, tokens and credentials from your prompts before they reach the large language model (LLM) provider. This prevents accidental leakage of sensitive data to third-party cloud services.
For example, if your Kubernetes manifest or GitOps repo contains:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
type: Opaque
data:
  username: YWRtaW4=         # "admin" in base64
  password: c3VwZXJzZWNyZXQ= # "supersecret" in base64
```
CodeGate redacts these values from prompts before reaching the LLM; then it seamlessly unredacts them in responses.
You may be saying, "Hang on a second. We rely on things like ExternalSecretsOperator to inject Kubernetes secrets, so we're safe… right?"
Well, you might be experimenting with a cluster and have a token stored in a file in your local repository or in your current working directory. An agent might be a little too ambitious and accidentally add it to your context, as we’ve often seen with code editors. This is where CodeGate jumps in and redacts sensitive info before it is unintentionally shared.
2. PII Redaction
Beyond secrets, CodeGate also detects and redacts personally identifiable information (PII) that might be present in your infrastructure files or deployment manifests.
3. Controlled Model Access
CodeGate includes model multiplexing (muxing) capabilities that help ensure infrastructure vulnerability information goes only to approved, trusted models with appropriate security measures.
Model muxing allows you to create rules that route specific file types, projects or code patterns to different AI models. For example, you might want infrastructure code to be handled by a private, locally hosted model, while general application code can be processed by cloud-based models.
Model muxing enables:
- Data sensitivity control: Route sensitive code (like infrastructure, security or authentication modules) to models with stricter privacy guarantees.
- Compliance requirements: Meet regulatory needs by ensuring certain code types never leave your environment.
- Cost optimization: Use expensive, high-powered models only for critical code sections.
- Performance tuning: Match code complexity with the most appropriate model capabilities.
Here's an example model muxing strategy for an infrastructure repository (illustrated in the sketch after this list):
- Rule: `*.tf`, `*.yaml` or `*-infra.*` files can be muxed to a locally hosted Ollama model.
- Benefit: Terraform files and infrastructure YAML never leave your environment, preventing potential leaks of secrets, IP addresses or infrastructure design.
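CodeGate manages these routing rules itself; the toy Python sketch below only illustrates the pattern-to-model matching idea and is not CodeGate's implementation or configuration format:

```python
# Toy illustration of mux-style routing (not CodeGate's implementation):
# the first rule whose pattern matches a file decides which model serves it.
import fnmatch

ROUTING_RULES = [
    (("*.tf", "*.yaml", "*-infra.*"), "ollama/llama3"),  # infra files stay local
]
DEFAULT_MODEL = "gpt-4o"  # everything else may go to a cloud-hosted model

def route_model(filename: str) -> str:
    for patterns, model in ROUTING_RULES:
        if any(fnmatch.fnmatch(filename, p) for p in patterns):
            return model
    return DEFAULT_MODEL

assert route_model("main.tf") == "ollama/llama3"
assert route_model("app.py") == "gpt-4o"
```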
4. Traceable History
CodeGate maintains a central record of all interactions with AI models, creating an audit trail of all vulnerability assessments and recommendations.
Configuring HAIstings With CodeGate
Setting up HAIstings to work with CodeGate is straightforward. Update the LangChain configuration in HAIstings:
```python
from langchain.chat_models import init_chat_model

# HAIstings configuration for using CodeGate
self.llm = init_chat_model(
    # Using CodeGate's muxing feature
    model="gpt-4o",  # This will be routed appropriately by CodeGate
    model_provider="openai",
    # API key not needed, as it's handled by CodeGate
    api_key="fake-api-key",
    # CodeGate muxing API URL
    base_url="http://127.0.0.1:8989/v1/mux",
)
```
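Because CodeGate exposes an OpenAI-compatible endpoint, nothing else in the calling code has to change. For instance (a hypothetical prompt, shown only to illustrate the unchanged calling convention):

```python
# The call site is unchanged; CodeGate proxies, redacts and routes the request.
response = self.llm.invoke("Summarize the most urgent CVEs for example-service.")
print(response.content)
```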
The Results
With HAIstings and CodeGate working together, the resulting system provides intelligent, context-aware vulnerability prioritization while maintaining strict security controls.
A sample report from HAIstings might look like:
```markdown
# HAIsting's Security Report

## Introduction

Good day! Arthur Hastings at your service. I've meticulously examined the
vulnerability reports from your Kubernetes infrastructure and prepared a
prioritized assessment of the security concerns that require your immediate
attention.

## Summary

After careful analysis, I've identified several critical vulnerabilities
that demand prompt remediation:

1. **example-service (internet-facing service)**
   - Critical vulnerabilities: 3
   - High vulnerabilities: 7
   - Most concerning: CVE-2023-1234 (Remote code execution)

   This service is particularly concerning due to its internet-facing
   nature, as mentioned in your notes. I recommend addressing these
   vulnerabilities with the utmost urgency.

2. **Flux (GitOps controller)**
   - Critical vulnerabilities: 2
   - High vulnerabilities: 5
   - Most concerning: CVE-2023-5678 (Git request processing vulnerability)

   As you've noted, Flux is critical to your infrastructure, and this Git
   request processing vulnerability aligns with your specific concerns.

## Conclusion

I say, these vulnerabilities require prompt attention, particularly the
ones affecting your internet-facing services and deployment controllers. I
recommend addressing the critical vulnerabilities in example-service and
Flux as your top priorities.
```
Performance Considerations
LLM interactions are slow on their own, so you shouldn't rely on them for real-time, critical alerting, and proxying LLM traffic through CodeGate adds some latency on top. That's expected: these are computationally expensive operations. That said, we believe the security benefits are worth it. You're trading a few extra seconds of processing time for dramatically better vulnerability prioritization, tailored to your specific infrastructure.
Secure AI for Infrastructure
Building HAIstings with LangGraph and LangChain demonstrates how AI can help solve the vulnerability-prioritization problem in modern infrastructure. Combining it with CodeGate ensures that this AI assistance doesn't come at the cost of security: you get intelligent, context-aware guidance without exposing sensitive data, freeing your team to focus on fixing what matters most.
As infrastructure grows more complex and vulnerabilities more numerous, tools like HAIstings represent the future of infrastructure security management.
You can try HAIstings by using the code in our GitHub repository.
Would you like to see how AI can help prioritize vulnerabilities in your infrastructure? Or do you have other ideas for combining AI with infrastructure management? Jump into Stacklok’s Discord community and continue the conversation.