kotaicode/self_healing_kagent_infrastructure

Nginx Self-Healing Infrastructure with KAgent

This guide provides comprehensive documentation for the autonomous nginx self-healing infrastructure built on KAgent, a Kubernetes-native AI agent framework.

Overview

The Nginx Self-Healing Infrastructure is an autonomous system that monitors, validates, and automatically fixes nginx configurations in Kubernetes environments. It leverages AI-powered agents to detect configuration issues, security vulnerabilities, and performance problems, then applies fixes without human intervention.

What It Does

  • Monitors nginx deployments for configuration issues and pod failures
  • Validates nginx configurations for syntax errors, security vulnerabilities, and best practices
  • Analyzes configuration structure and performance patterns in depth
  • Fixes detected issues automatically by updating ConfigMaps and restarting deployments
  • Logs all actions for audit and compliance tracking

Why It Matters

Traditional nginx management requires manual intervention when configurations fail or need updates. This self-healing system:

  • Reduces MTTR (Mean Time To Recovery) from hours to 30 minutes
  • Prevents configuration-related outages
  • Ensures security best practices are automatically enforced
  • Provides detailed audit trails for compliance
  • Remediates common failures without waiting for an operator

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Kubernetes Cluster                           │
│                                                                 │
│  ┌──────────────┐         ┌─────────────────────────────┐       │
│  │  Nginx Pods  │         │   KAgent Hook System        │       │
│  │              │────────▶│  - Pod Restart Events       │       │
│  │  - nginx-    │         │  - Pod Pending Events       │       │
│  │    test      │         │  - Probe Failures           │       │
│  │  - Config    │         │  - OOM Kill Events          │       │
│  │    Maps      │         └─────────────┬───────────────┘       │
│  └──────────────┘                       │                       │
│                                         ▼                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │              Nginx Config Agent                           │  │
│  │  - nginx-config-agent (declarative AI agent)              │  │
│  │  - System Message: Nginx expert + troubleshooting         │  │
│  │  - Tools: validate, analyze, update, restart              │  │
│  └─────────────────────────┬─────────────────────────────────┘  │
│                            │                                    │
│                            ▼                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │          RemoteMCPServer                                  │  │
│  │  - file-reader-remote-mcp                                 │  │
│  │  - Connection bridge to MCP Server                        │  │
│  └─────────────────────────┬─────────────────────────────────┘  │
│                            │                                    │
│                            ▼                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │          MCPServer (file-reader-mcpserver)                │  │
│  │  - HTTP-based tool provider                               │  │
│  │  - Python script: file_reader.py                          │  │
│  │  - Tools implementation with security restrictions        │  │
│  └─────────────────────────┬─────────────────────────────────┘  │
│                            │                                    │
│                            ▼                                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │     Kubernetes API / ConfigMaps / Deployments             │  │
│  │  - Read nginx configurations                              │  │
│  │  - Update ConfigMaps (nginx-config)                       │  │
│  │  - Restart deployments (nginx-test)                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Flow:
1. Event Detection → 2. Agent Triggered → 3. Config Analysis →
4. Issue Identification → 5. Automatic Fix → 6. Apply & Restart → 7. Verify

Key Features

🤖 Autonomous Self-Healing

  • Automatic detection and remediation of nginx configuration issues
  • No human intervention required for common problems
  • Intelligent analysis using AI-powered decision making
  • Realistic 30-minute recovery timeframe

🔒 Security-First Design

  • Namespace restrictions (default, nginx, kagent only)
  • ConfigMap and deployment whitelisting
  • Audit logging for all operations
  • Content validation and dangerous command detection

📊 Comprehensive Analysis

  • Syntax validation (missing semicolons, braces, etc.)
  • Security checks (SSL/TLS, headers, server tokens)
  • Performance analysis (caching, compression, rate limiting)
  • Best practices validation

🔄 Kubernetes-Native

  • Operates entirely within Kubernetes
  • Uses ConfigMaps for configuration management
  • Integrates with standard Kubernetes events
  • RBAC-compliant operations

📝 Full Audit Trail

  • Every action is logged with timestamps
  • Security validations recorded
  • Command execution tracking
  • Compliance-ready audit logs

Prerequisites

Ensure you have the following installed and configured:

  • Kubernetes Cluster (Kind, Minikube, or any K8s cluster)
  • kubectl (configured to access your cluster)
  • Docker (for building images)
  • kagent CLI (KAgent command-line tool)
  • KAgent CRDs (installed in your cluster)

Installing KAgent

# Install KAgent CRDs and controllers
kubectl apply -f https://github.com/kagent-dev/kagent/releases/latest/download/install.yaml

# Verify KAgent is running
kubectl get pods -n kagent

Quick Start

Get the nginx self-healing infrastructure running in 5 minutes:

# 1. Clone the repository
git clone <your-repo-url>
cd self_healing_kagent_infrastructure  # directory created by the clone; adjust if yours differs

# 2. Build and load the Docker image
# The tag must match the image referenced by the MCP server (local-file-reader:v1)
docker build -t local-file-reader:v1 .
kind load docker-image local-file-reader:v1

# 3. Deploy the MCP Server
kubectl apply -f mcpserver.yaml

# 4. Deploy the Remote MCP Server
kubectl apply -f remotemcpserver.yaml

# 5. Deploy the Nginx Agent
kubectl apply -f nginx-agent.yaml

# 6. Deploy the test nginx deployment
kubectl apply -f nginx-test-deployment.yaml

# 7. Deploy monitoring hooks
kubectl apply -f nginx-config-monitoring.yaml

# 8. Verify all components are running
kubectl get mcpservers,agents,remotemcpservers,hooks -n kagent
kubectl get pods -n kagent

# 9. Test the agent
kagent invoke --agent "nginx-config-agent" --task "Check the nginx configuration and fix any issues"

Components

1. Nginx Config Agent (nginx-agent.yaml)

The AI agent responsible for nginx configuration management and self-healing.

Key Specifications:

  • Type: Declarative Agent
  • System Message: Nginx expert + Kubernetes troubleshooting assistant
  • Tools: 12 specialized tools for nginx management (4 additional tools available but not configured)
  • Model Config: Uses default LLM model

Capabilities:

  • Read nginx configurations from multiple locations
  • Validate configurations for common issues
  • Analyze configuration structure in depth
  • Update ConfigMaps with fixed configurations
  • Restart deployments to apply changes
  • Basic troubleshooting with pod and deployment discovery

Currently Configured Tools (12):

  • read_file - Read nginx config files
  • validate_nginx_config - Check for issues
  • analyze_nginx_config - Comprehensive analysis
  • list_nginx_configs - List available configs
  • write_file - Write content to files
  • update_configmap - Update Kubernetes ConfigMaps
  • restart_deployment - Restart nginx deployments
  • apply_manifest - Apply Kubernetes manifests
  • get_deployment_from_pod - Get deployment from pod name
  • get_configmaps_from_deployment - Get ConfigMaps from deployment
  • read_configmap - Read ConfigMap content
  • get_pods_by_label - Find nginx pods by label

Available but Not Configured (4):

  • get_container_logs - Retrieve pod logs
  • check_pod_events - Get Kubernetes events
  • check_pod_resources - Check resource usage
  • test_network_connectivity - Test backend connections

2. MCP Server (mcpserver.yaml)

The tool server that implements all nginx management functions.

Key Specifications:

  • Image: local-file-reader:v1
  • Port: 3000
  • Protocol: HTTP-based MCP
  • Script: file_reader.py

Implemented Tools:

  • read_file - Read nginx config files
  • validate_nginx_config - Check for issues
  • analyze_nginx_config - Comprehensive analysis
  • list_nginx_configs - List available configs
  • write_file - Write content to files
  • update_configmap - Update Kubernetes ConfigMaps
  • restart_deployment - Restart nginx deployments
  • apply_manifest - Apply Kubernetes manifests
  • get_deployment_from_pod - Get deployment from pod name
  • get_configmaps_from_deployment - Get ConfigMaps from deployment
  • read_configmap - Read ConfigMap content
  • get_pods_by_label - Find nginx pods by label
  • get_container_logs - Retrieve pod logs (available but not in agent)
  • check_pod_events - Get Kubernetes events (available but not in agent)
  • check_pod_resources - Check resource usage (available but not in agent)
  • test_network_connectivity - Test backend connections (available but not in agent)

3. Nginx Test Deployment (nginx-test-deployment.yaml)

A sample nginx deployment for testing the self-healing infrastructure.

Components:

  • Deployment: nginx-test (nginx:1.25)
  • ConfigMap: nginx-config (contains nginx.conf)
  • ConfigMap: nginx-html (contains index.html)
  • Service: nginx-test-service (ClusterIP)

4. Monitoring Hooks (nginx-config-monitoring.yaml)

Kubernetes event hooks that trigger the agent when issues occur.

Monitored Events:

  • pod-restart - Detects nginx pod restarts
  • pod-pending - Detects scheduling issues
  • probe-failed - Detects health check failures
  • oom-kill - Detects memory issues

Deployment

Standard Deployment

Deploy all components in the correct order:

# Step 1: Build and load Docker image
docker build -t local-file-reader:v1 .
kind load docker-image local-file-reader:v1

# Step 2: Deploy core infrastructure
kubectl apply -f mcpserver.yaml
kubectl apply -f remotemcpserver.yaml

# Step 3: Deploy nginx agent
kubectl apply -f nginx-agent.yaml

# Step 4: Deploy test nginx
kubectl apply -f nginx-test-deployment.yaml

# Step 5: Enable monitoring
kubectl apply -f nginx-config-monitoring.yaml

# Step 6: Verify deployment
kubectl get all -n kagent | grep -E "(nginx|mcp|agent)"

Secure Deployment (Production)

For production environments, use the secure configuration:

# Deploy with security restrictions
kubectl apply -f secure_yamls/nginx-agent-secure.yaml

# Apply RBAC restrictions
kubectl apply -f rbac-secure.yaml

# Verify security settings
kubectl describe agent nginx-config-agent-secure -n kagent

Verification Commands

# Check component status
kubectl get mcpservers,agents,remotemcpservers -n kagent

# Check pods
kubectl get pods -n kagent -l app=nginx-test

# Check logs
kubectl logs -n kagent deployment/file-reader-mcpserver --tail=20
kubectl logs -n kagent deployment/nginx-config-agent --tail=20

# Test connectivity
kubectl exec -n kagent deployment/file-reader-mcpserver -- curl -s http://localhost:3000

Self-Healing Workflows

Workflow 1: Configuration Issue Detection

1. Nginx pod crashes/restarts
   ↓
2. Hook detects pod-restart event
   ↓
3. Agent triggered with pod details
   ↓
4. Agent uses 'list_nginx_configs' to find config files
   ↓
5. Agent uses 'read_file' to read nginx.conf
   ↓
6. Agent uses 'validate_nginx_config' to check for issues
   ↓
7. Agent uses 'analyze_nginx_config' for comprehensive analysis
   ↓
8. Issues detected: syntax errors, missing headers, etc.
   ↓
9. Agent uses 'update_configmap' to fix configuration
   ↓
10. Agent uses 'restart_deployment' to apply changes
    ↓
11. Agent verifies fix by re-reading configuration
    ↓
12. Report success and log audit trail

Workflow 2: Advanced Troubleshooting

1. Nginx pod continues to crash after config fix
   ↓
2. Agent uses 'get_pods_by_label' to find related pods
   ↓
3. Agent uses 'get_deployment_from_pod' to get deployment info
   ↓
4. Agent uses 'get_configmaps_from_deployment' to find ConfigMaps
   ↓
5. Agent uses 'read_configmap' to analyze configuration
   ↓
6. Agent uses 'validate_nginx_config' and 'analyze_nginx_config'
   ↓
7. Agent reports detailed diagnostic information
   ↓
8. Escalates with actionable recommendations
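Step 3 of this workflow maps a pod back to its Deployment. The sketch below shows one way to do that from the pod name alone, relying on the Kubernetes naming convention `<deployment>-<pod-template-hash>-<suffix>`. The function name is hypothetical, and this README does not say how `get_deployment_from_pod` is implemented; reading the pod's owner references through the API is the more robust approach.

```python
def deployment_from_pod_name(pod_name: str) -> str:
    """Guess the owning Deployment from a Deployment-style pod name.

    Assumes the name looks like <deployment>-<pod-template-hash>-<suffix>.
    Pods owned by StatefulSets, DaemonSets, or Jobs follow other patterns.
    """
    parts = pod_name.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a Deployment-style pod name: {pod_name!r}")
    return parts[0]


print(deployment_from_pod_name("nginx-test-7d9c5b8f6d-x2k4q"))  # nginx-test
```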

Workflow 3: Security Remediation

1. Security scan detects vulnerable nginx config
   ↓
2. Agent analyzes configuration for security issues
   ↓
3. Issues found: TLSv1 enabled, missing security headers
   ↓
4. Agent updates config with:
   - ssl_protocols TLSv1.2 TLSv1.3;
   - add_header X-Frame-Options "SAMEORIGIN";
   - add_header X-Content-Type-Options "nosniff";
   - server_tokens off;
   ↓
5. Apply changes and restart deployment
   ↓
6. Verify security improvements
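The checks in steps 2 and 3 can be approximated with a few regular expressions. This is a minimal sketch, not the agent's actual logic: the function name and the finding strings are hypothetical, and the set of protocols treated as weak (SSLv3, TLSv1, TLSv1.1) is an assumption that goes slightly beyond the TLSv1 example above.

```python
import re

WEAK_PROTOCOLS = ("SSLv3", "TLSv1", "TLSv1.1")  # assumed list of weak protocols
REQUIRED_HEADERS = ("X-Frame-Options", "X-Content-Type-Options")


def security_findings(config: str) -> list[str]:
    """Return human-readable findings for a few common hardening gaps."""
    findings = []
    # Inspect every ssl_protocols directive for weak protocol names.
    for match in re.finditer(r"ssl_protocols\s+([^;]+);", config):
        weak = [p for p in match.group(1).split() if p in WEAK_PROTOCOLS]
        if weak:
            findings.append("weak protocols enabled: " + ", ".join(weak))
    # Each required header must appear in some add_header directive.
    for header in REQUIRED_HEADERS:
        if not re.search(r"add_header\s+" + re.escape(header) + r"\b", config):
            findings.append(f"missing header: {header}")
    if not re.search(r"server_tokens\s+off\s*;", config):
        findings.append("server_tokens is not off")
    return findings
```

An empty list means none of these particular gaps were found; it does not mean the configuration is secure.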

Available Tools

Configuration Management Tools

read_file

Reads nginx configuration files from multiple locations.

Parameters:

  • file_path (string): Relative or absolute path to file

Search Directories:

  • /tmp/shared_data
  • /etc/nginx
  • /etc/nginx/conf.d
  • /etc/nginx-configs
  • /var/nginx-configs
  • /shared/configs

Example:

kagent invoke --agent "nginx-config-agent" --task "Read the nginx configuration from /etc/nginx/nginx.conf"
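A lookup across the search directories listed above might work roughly like this. The sketch assumes directories are tried in the listed order and that the first match wins; `resolve_config_path` is a hypothetical name, and the forbidden-path checks described under Security Features would still need to run before any read.

```python
from pathlib import Path
from typing import Optional

SEARCH_DIRS = [
    "/tmp/shared_data", "/etc/nginx", "/etc/nginx/conf.d",
    "/etc/nginx-configs", "/var/nginx-configs", "/shared/configs",
]


def resolve_config_path(file_path: str, search_dirs=SEARCH_DIRS) -> Optional[Path]:
    """Return the first existing file matching file_path, or None."""
    candidate = Path(file_path)
    if candidate.is_absolute():
        # Absolute paths are used as given.
        return candidate if candidate.is_file() else None
    for directory in search_dirs:
        full = Path(directory) / candidate
        if full.is_file():
            return full
    return None
```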

list_nginx_configs

Lists all available nginx configuration files.

Parameters: None

Returns:

  • List of config files with paths, sizes, and modification times

Example:

kagent invoke --agent "nginx-config-agent" --task "List all nginx configuration files"
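The returned listing (path, size, modification time) could be assembled as in the sketch below. The `*.conf` filter and the field names are assumptions, since this README does not specify which files the tool treats as nginx configs or how it names the fields.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_configs(search_dirs):
    """Collect path, size, and UTC modification time for *.conf files."""
    entries = []
    for directory in search_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip search directories that do not exist
        for path in sorted(root.glob("*.conf")):
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            entries.append({
                "path": str(path),
                "size_bytes": stat.st_size,
                "modified": modified.isoformat(),
            })
    return entries
```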

validate_nginx_config

Validates nginx configuration for common issues.

Parameters:

  • config_content (string): The nginx configuration to validate

Checks:

  • Syntax errors (missing semicolons, braces)
  • Security issues (SSL config, headers, server tokens)
  • Performance problems (inefficient regex, missing optimizations)
  • Best practices violations

Returns:

{
  "issues": ["Line 10: Missing semicolon in proxy_pass"],
  "warnings": ["Port 443 without SSL configuration"],
  "recommendations": ["Consider adding 'server_tokens off'"],
  "total_lines": 50,
  "has_ssl": true,
  "has_proxy": true
}
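To make the return shape concrete, here is a small heuristic validator that produces the same fields. It is a sketch under stated assumptions, not the code in file_reader.py: it treats every non-block line as a directive that must end in a semicolon, so directives split across lines are flagged incorrectly, and it strips comments naively. `nginx -t` remains the authoritative syntax check.

```python
def validate_config(config_content: str) -> dict:
    """Heuristic nginx config checks returning the documented result shape."""
    issues, warnings, recommendations = [], [], []
    lines = config_content.splitlines()
    depth = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        if not line:
            continue
        depth += line.count("{") - line.count("}")
        if depth < 0:
            issues.append(f"Line {number}: Unexpected closing brace")
            depth = 0
        # Directive lines end with ';'; block lines end with '{' or '}'.
        if not line.endswith((";", "{", "}")):
            issues.append(f"Line {number}: Missing semicolon")
    if depth > 0:
        issues.append("Missing closing brace")
    has_ssl = "ssl_certificate" in config_content
    if "listen 443" in config_content and not has_ssl:
        warnings.append("Port 443 without SSL configuration")
    if "server_tokens off" not in config_content:
        recommendations.append("Consider adding 'server_tokens off'")
    return {
        "issues": issues,
        "warnings": warnings,
        "recommendations": recommendations,
        "total_lines": len(lines),
        "has_ssl": has_ssl,
        "has_proxy": "proxy_pass" in config_content,
    }
```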

analyze_nginx_config

Performs comprehensive analysis of nginx configuration.

Parameters:

  • config_content (string): The nginx configuration to analyze

Analysis:

  • Validation results
  • Server block structure
  • Directive usage statistics
  • Configuration complexity metrics

Example:

kagent invoke --agent "nginx-config-agent" --task "Analyze the nginx configuration for performance issues"
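Directive usage statistics and a simple complexity metric can be derived by splitting the configuration on `;`, `{`, and `}`. The sketch below is illustrative only; the function name and result keys are hypothetical, and quoted strings containing those characters would be miscounted.

```python
import re
from collections import Counter


def directive_stats(config_content: str) -> dict:
    """Count directive/block names and measure maximum brace nesting."""
    text = re.sub(r"#[^\n]*", "", config_content)  # drop comments
    counts = Counter()
    for statement in re.split(r"[;{}]", text):
        words = statement.split()
        if words:
            counts[words[0]] += 1  # first word is the directive or block name
    depth = max_depth = 0
    for char in text:
        if char == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif char == "}":
            depth -= 1
    return {
        "server_blocks": counts["server"],
        "directives": dict(counts),
        "max_nesting_depth": max_depth,
    }
```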

Kubernetes Management Tools

update_configmap

Updates a Kubernetes ConfigMap with new content.

Parameters:

  • configmap_name (string): Name of ConfigMap (must be in allowed list)
  • namespace (string): Kubernetes namespace (default, nginx, or kagent)
  • key (string): Key in ConfigMap data (e.g., nginx.conf)
  • content (string): New configuration content

Security:

  • Only allowed ConfigMaps: nginx-config, nginx-html, nginx-conf, etc.
  • Only allowed namespaces: default, nginx, kagent
  • Only allowed keys: nginx.conf, default.conf, index.html, etc.
  • Content validation for dangerous commands

Example:

kagent invoke --agent "nginx-config-agent" --task "Fix the nginx configuration in ConfigMap nginx-config"
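The restrictions above amount to three allowlist checks before a patch is built. The sketch below uses the full lists from the Security Features section and returns a body suitable for a ConfigMap merge patch. The function name and error messages are hypothetical, and how file_reader.py actually submits the update is not described in this README.

```python
ALLOWED_NAMESPACES = {"default", "nginx", "kagent"}
ALLOWED_CONFIGMAPS = {"nginx-config", "nginx-html", "nginx-conf",
                      "nginx-default", "nginx-server"}
ALLOWED_KEYS = {"nginx.conf", "default.conf", "index.html",
                "server.conf", "ssl.conf"}


def build_configmap_patch(configmap_name: str, namespace: str,
                          key: str, content: str) -> dict:
    """Validate the target against the allowlists, then build a merge patch."""
    if namespace not in ALLOWED_NAMESPACES:
        raise PermissionError(f"namespace not allowed: {namespace}")
    if configmap_name not in ALLOWED_CONFIGMAPS:
        raise PermissionError(f"ConfigMap not allowed: {configmap_name}")
    if key not in ALLOWED_KEYS:
        raise PermissionError(f"key not allowed: {key}")
    # A merge patch that replaces only this key in the ConfigMap's data.
    return {"data": {key: content}}
```

Rejecting before building the patch keeps the failure mode simple: a disallowed target never reaches the Kubernetes API.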

restart_deployment

Restarts a Kubernetes deployment to apply changes.

Parameters:

  • deployment_name (string): Name of deployment (must be in allowed list)
  • namespace (string): Kubernetes namespace

Security:

  • Only allowed deployments: nginx-test, nginx, nginx-server, etc.
  • Only allowed namespaces: default, nginx, kagent
  • All restarts are logged for audit

Example:

kagent invoke --agent "nginx-config-agent" --task "Restart the nginx-test deployment"
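For context, `kubectl rollout restart` works by stamping the pod template with a `kubectl.kubernetes.io/restartedAt` annotation, which triggers a rolling update. The sketch below builds an equivalent patch body; whether `restart_deployment` uses this patch or shells out to kubectl is not stated in this README, and `build_restart_patch` is a hypothetical name.

```python
from datetime import datetime, timezone


def build_restart_patch(now=None) -> dict:
    """Build a Deployment patch that triggers a rolling restart."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Changing the pod template forces new pods to roll out.
                        "kubectl.kubernetes.io/restartedAt": stamp,
                    }
                }
            }
        }
    }
```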

get_pods_by_label

Gets pods by label selector with deployment information.

Parameters:

  • label_selector (string): Label selector (e.g., "app=nginx-test")
  • namespace (string): Kubernetes namespace

Returns:

  • List of pods with names, status, and deployment names

Example:

kagent invoke --agent "nginx-config-agent" --task "Find all nginx pods in the kagent namespace"
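An equality-based selector such as "app=nginx-test" matches a pod when every listed key has the given value. The sketch below illustrates that matching rule locally. In practice the Kubernetes API evaluates selectors server-side, and the set-based and `!=` forms are deliberately left out here. Both function names are hypothetical.

```python
def parse_selector(selector: str) -> dict:
    """Parse 'k1=v1,k2=v2' into a dict; only equality terms are supported."""
    pairs = {}
    for term in selector.split(","):
        term = term.strip()
        if not term:
            continue
        if "!=" in term or "=" not in term:
            raise ValueError(f"unsupported selector term: {term!r}")
        key, _, value = term.partition("=")
        pairs[key.strip()] = value.lstrip("=").strip()  # accept '==' too
    return pairs


def matches(labels: dict, selector: str) -> bool:
    """True when the labels satisfy every term of the selector."""
    return all(labels.get(k) == v for k, v in parse_selector(selector).items())
```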

Troubleshooting Tools

Note: The following troubleshooting tools are implemented in the MCP server but are not currently configured in the nginx agent. To enable them, add them to the toolNames list in nginx-agent.yaml.

get_container_logs (Available but not configured)

Retrieves container logs for troubleshooting.

Parameters:

  • pod_name (string): Name of the pod
  • namespace (string): Kubernetes namespace
  • container_name (string, optional): Specific container
  • tail_lines (integer, optional): Number of lines (default: 50)

Security:

  • Only nginx-related pods allowed
  • Limited to allowed namespaces

To enable:

# Add to nginx-agent.yaml toolNames list
toolNames: [..., "get_container_logs"]

check_pod_events (Available but not configured)

Gets Kubernetes events for a pod.

Parameters:

  • pod_name (string): Name of the pod
  • namespace (string): Kubernetes namespace

Returns:

  • Kubernetes events sorted by timestamp

To enable:

# Add to nginx-agent.yaml toolNames list
toolNames: [..., "check_pod_events"]

check_pod_resources (Available but not configured)

Checks pod resource usage and limits.

Parameters:

  • pod_name (string): Name of the pod
  • namespace (string): Kubernetes namespace

Returns:

  • Resource limits and requests
  • Resource usage information
  • Pod conditions

To enable:

# Add to nginx-agent.yaml toolNames list
toolNames: [..., "check_pod_resources"]

test_network_connectivity (Available but not configured)

Tests network connectivity to backend services.

Parameters:

  • service_name (string): Name of the service
  • namespace (string): Kubernetes namespace
  • port (integer, optional): Port to test (default: 80)

To enable:

# Add to nginx-agent.yaml toolNames list
toolNames: [..., "test_network_connectivity"]

Usage Examples

Example 1: Check and Fix Configuration

# Simple check
kagent invoke --agent "nginx-config-agent" --task "Check the nginx configuration and fix any issues"

# Agent will:
# 1. List available configs
# 2. Read nginx.conf from ConfigMap
# 3. Validate configuration
# 4. Fix detected issues
# 5. Update ConfigMap
# 6. Restart deployment
# 7. Verify fix

Example 2: Security Audit

kagent invoke --agent "nginx-config-agent" --task "Perform a security audit of the nginx configuration. Check for SSL/TLS issues, missing security headers, and vulnerabilities. Fix any issues found."

# Agent will:
# 1. Read configuration
# 2. Check SSL/TLS protocols
# 3. Verify security headers
# 4. Check server_tokens setting
# 5. Apply security fixes
# 6. Restart deployment

Example 3: Performance Optimization

kagent invoke --agent "nginx-config-agent" --task "Analyze nginx configuration for performance issues. Check for caching, compression, and optimize worker settings."

# Agent will:
# 1. Analyze configuration structure
# 2. Check worker_processes and connections
# 3. Verify gzip compression
# 4. Check caching headers
# 5. Recommend optimizations
# 6. Apply improvements

Example 4: Troubleshoot Crashing Pod

kagent invoke --agent "nginx-config-agent" --task "The nginx-test pod is crashing. Investigate the issue and fix the problem."

# Agent will:
# 1. Get pod status using get_pods_by_label
# 2. Get deployment info using get_deployment_from_pod
# 3. Find ConfigMaps using get_configmaps_from_deployment
# 4. Read configuration using read_configmap
# 5. Analyze configuration using validate_nginx_config and analyze_nginx_config
# 6. Apply fix using update_configmap
# 7. Restart deployment using restart_deployment
# 8. Verify resolution

Example 5: Configuration Migration

kagent invoke --agent "nginx-config-agent" --task "We need to migrate to nginx 1.25 configuration. Update the configuration to use modern best practices and remove deprecated directives."

# Agent will:
# 1. Read current configuration
# 2. Identify deprecated directives
# 3. Update to modern syntax
# 4. Add recommended settings
# 5. Test configuration
# 6. Apply changes

Security Features

Namespace Restrictions

The agent may operate only in these namespaces:

  • default - Default namespace
  • nginx - Nginx-specific namespace
  • kagent - KAgent system namespace

Any attempt to access other namespaces is denied with a security violation.

Resource Whitelisting

Allowed ConfigMaps:

  • nginx-config
  • nginx-html
  • nginx-conf
  • nginx-default
  • nginx-server

Allowed Deployments:

  • nginx-test
  • nginx
  • nginx-server
  • nginx-proxy
  • nginx-ingress

Allowed ConfigMap Keys:

  • nginx.conf
  • default.conf
  • index.html
  • server.conf
  • ssl.conf

Path Security

Forbidden Paths:

  • Sensitive system files and directories: /etc/passwd, /etc/shadow, /root/
  • User directories: /home/
  • System logs: /var/log/
  • Kernel and device filesystems: /proc/, /sys/, /dev/
  • Binary directories: /usr/bin/, /sbin/, /bin/
  • Sensitive configs: /etc/ssl/, /etc/ssh/

Content Validation

Dangerous Command Detection:

  • rm -rf - Recursive deletion
  • sudo - Privilege escalation
  • chmod 777 - Insecure permissions
  • wget, curl - Arbitrary downloads
  • Shell execution commands
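A pattern scan over submitted content is one plausible way to implement this detection. The patterns below are assumptions that cover the listed commands; the README does not define "shell execution commands", so the `sh -c` / `bash -c` pattern in particular is a guess. Word-based matching also produces false positives, for example an upstream host named `curl-service`.

```python
import re

# Label -> pattern. All patterns are illustrative assumptions.
DANGEROUS_PATTERNS = {
    "recursive deletion": r"\brm\s+-rf\b",
    "privilege escalation": r"\bsudo\b",
    "insecure permissions": r"\bchmod\s+777\b",
    "arbitrary download": r"\b(?:wget|curl)\b",
    "shell execution": r"\b(?:bash|sh)\s+-c\b",
}


def find_dangerous(content: str) -> list[str]:
    """Return the labels of every dangerous pattern found in content."""
    return [
        label
        for label, pattern in DANGEROUS_PATTERNS.items()
        if re.search(pattern, content)
    ]
```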

Audit Logging

Every operation is logged with:

  • Timestamp: UTC timestamp
  • Event Type: Operation performed
  • User Context: Who/what initiated the action
  • Details: Full operation details
  • Security Validated: Confirmation of security checks

Logged Operations:

  • ConfigMap updates
  • Deployment restarts
  • Configuration changes
  • Resource access
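An audit entry with the five fields above could be serialized as one JSON object per line, which most log shippers handle well. The JSON key names and the `audit_record` helper are hypothetical; the actual log format of file_reader.py is not shown in this README.

```python
import json
from datetime import datetime, timezone


def audit_record(event_type: str, user_context: str, details: dict,
                 security_validated: bool, now=None) -> str:
    """Serialize one audit entry as a single JSON line with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),
        "event_type": event_type,
        "user_context": user_context,
        "details": details,
        "security_validated": security_validated,
    }, sort_keys=True)


print(audit_record("configmap_update", "nginx-config-agent",
                   {"configmap": "nginx-config", "namespace": "kagent"}, True))
```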

Secure Agent Configuration

Use secure_yamls/nginx-agent-secure.yaml for production:

# Removed dangerous tools:
# - write_file (direct file writing)
# - apply_manifest (arbitrary resource creation)

# Retained safe tools:
# - read_file (read-only access)
# - validate_nginx_config (validation only)
# - analyze_nginx_config (analysis only)
# - update_configmap (restricted updates)
# - restart_deployment (restricted restarts)

Monitoring and Hooks

Hook Configuration

The nginx-config-monitoring.yaml defines event-driven automation:

Pod Restart Hook

Triggers when nginx pods restart unexpectedly.

Monitored Event: pod-restart

Agent Actions:

  1. Configuration check for syntax errors
  2. Review nginx error logs
  3. Verify upstream configurations
  4. Check SSL/TLS certificates
  5. Automated remediation
  6. Validation with nginx -t

Pod Pending Hook

Triggers when nginx pods can't be scheduled.

Monitored Event: pod-pending

Agent Actions:

  1. Check resource requests
  2. Verify ConfigMaps exist
  3. Check for missing secrets
  4. Validate volume mounts
  5. Fix configuration dependencies

Probe Failed Hook

Triggers when health checks fail.

Monitored Event: probe-failed

Agent Actions:

  1. Check health endpoint configuration
  2. Verify backend service health
  3. Analyze SSL termination
  4. Check rate limiting
  5. Fix health check endpoints

OOM Kill Hook

Triggers when pods are killed due to memory.

Monitored Event: oom-kill

Agent Actions:

  1. Review worker_processes configuration
  2. Check buffer size settings
  3. Analyze memory usage patterns
  4. Optimize configuration for memory
  5. Adjust resource limits

Custom Hook Creation

Create your own hooks for specific scenarios:

apiVersion: kagent.dev/v1alpha2
kind: Hook
metadata:
  name: nginx-custom-hook
  namespace: kagent
spec:
  eventConfigurations:
  - eventType: custom-event  # placeholder: use an event type the Hook controller supports, e.g. pod-restart
    agentRef:
      name: nginx-config-agent
      namespace: kagent
    prompt: |
      Custom analysis prompt for your specific use case.
      
      Pod Details:
      - Pod: {{.ResourceName}}
      - Namespace: {{.Namespace}}
      - Event: {{.EventMessage}}
      
      Please analyze and provide remediation.
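The `{{.ResourceName}}`-style placeholders in the prompt are filled in with event details before the agent is invoked. To preview how a prompt will read, the substitution can be mimicked locally as below. This only imitates simple field replacement and is not the Hook controller's renderer; `render_prompt` is a hypothetical helper.

```python
import re


def render_prompt(template: str, event: dict) -> str:
    """Replace {{.Field}} placeholders with values from the event dict."""
    def substitute(match):
        name = match.group(1)
        if name not in event:
            raise KeyError(f"no value for placeholder {name}")
        return str(event[name])

    return re.sub(r"\{\{\s*\.(\w+)\s*\}\}", substitute, template)


preview = render_prompt(
    "Pod: {{.ResourceName}} in {{.Namespace}}",
    {"ResourceName": "nginx-test-abc", "Namespace": "kagent"},
)
print(preview)  # Pod: nginx-test-abc in kagent
```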

Troubleshooting

Common Issues

Issue 1: Agent Not Responding

Symptoms:

  • Agent commands timeout
  • No logs from agent pod

Diagnosis:

# Check agent pod status
kubectl get pods -n kagent -l app=nginx-config-agent

# Check agent logs
kubectl logs -n kagent deployment/nginx-config-agent --tail=50

# Check agent description
kubectl describe agent nginx-config-agent -n kagent

Solution:

# Restart agent
kubectl rollout restart deployment/nginx-config-agent -n kagent

# Check MCP server connectivity
kubectl exec -n kagent deployment/file-reader-mcpserver -- curl http://localhost:3000

Issue 2: ConfigMap Update Fails

Symptoms:

  • Error: "ConfigMap validation failed"
  • Security violation messages

Diagnosis:

# Check ConfigMap name
kubectl get configmap -n kagent | grep nginx

# Check agent security configuration
kubectl get agent nginx-config-agent -n kagent -o yaml | grep -A 20 toolNames

Solution:

  1. Ensure ConfigMap name is in allowed list
  2. Verify namespace is correct
  3. Check content doesn't contain dangerous commands

Issue 3: Tools Not Available

Symptoms:

  • "Unknown tool" errors
  • Missing tool capabilities

Diagnosis:

# List available tools
kubectl exec -n kagent deployment/file-reader-mcpserver -- python3 -c "
import urllib.request, json
req = urllib.request.Request('http://localhost:3000/mcp',
    data=json.dumps({'jsonrpc': '2.0', 'id': 1, 'method': 'tools/list', 'params': {}}).encode(),
    headers={'Content-Type': 'application/json'})
response = urllib.request.urlopen(req, timeout=5)
print(json.loads(response.read().decode()).get('result', {}).get('tools', []))
"

Solution:

# Rebuild and reload Docker image
docker build -t local-file-reader:v1 .
kind load docker-image local-file-reader:v1

# Restart MCP server
kubectl rollout restart deployment/file-reader-mcpserver -n kagent

Issue 4: Nginx Pod Still Crashing

Symptoms:

  • Agent fixes config but pod still crashes
  • CrashLoopBackOff status

Diagnosis:

# Use agent for comprehensive troubleshooting
kagent invoke --agent "nginx-config-agent" --task "The nginx-test pod is still crashing after config fix. Investigate the issue and provide detailed analysis."

Manual Diagnosis:

# Check logs
kubectl logs -n kagent <nginx-pod-name> --tail=100

# Check events
kubectl get events -n kagent --field-selector involvedObject.name=<nginx-pod-name>

# Check resources
kubectl describe pod <nginx-pod-name> -n kagent

# Test manually
kubectl exec -n kagent <nginx-pod-name> -- nginx -t

Debug Mode

Follow component logs and reach the MCP server directly while troubleshooting:

# Check MCP server logs
kubectl logs -n kagent deployment/file-reader-mcpserver -f

# Check agent logs with context
kubectl logs -n kagent deployment/nginx-config-agent -f --tail=100

# Port forward for direct testing
kubectl port-forward -n kagent deployment/file-reader-mcpserver 3000:3000

Testing Tools Directly

Test MCP server tools without the agent:

# Test read_file
kubectl exec -n kagent deployment/file-reader-mcpserver -- python3 -c "
import urllib.request, json
data = {
    'jsonrpc': '2.0',
    'id': 1,
    'method': 'tools/call',
    'params': {
        'name': 'read_file',
        'arguments': {'file_path': '/etc/nginx/nginx.conf'}
    }
}
req = urllib.request.Request('http://localhost:3000/mcp',
    data=json.dumps(data).encode(),
    headers={'Content-Type': 'application/json'})
response = urllib.request.urlopen(req, timeout=5)
print(json.dumps(json.loads(response.read().decode()), indent=2))
"

# Test validate_nginx_config
kubectl exec -n kagent deployment/file-reader-mcpserver -- python3 -c "
import urllib.request, json
config = '''
server {
    listen 80;
    server_name example.com;
}
'''
data = {
    'jsonrpc': '2.0',
    'id': 2,
    'method': 'tools/call',
    'params': {
        'name': 'validate_nginx_config',
        'arguments': {'config_content': config}
    }
}
req = urllib.request.Request('http://localhost:3000/mcp',
    data=json.dumps(data).encode(),
    headers={'Content-Type': 'application/json'})
response = urllib.request.urlopen(req, timeout=5)
print(json.dumps(json.loads(response.read().decode()), indent=2))
"

Advanced Configuration

Enabling Additional Tools

The MCP server implements several troubleshooting tools that are not currently configured in the nginx agent. To enable them:

  1. Edit the nginx-agent.yaml file:
toolNames: [
  "read_file", 
  "validate_nginx_config", 
  "analyze_nginx_config", 
  "list_nginx_configs", 
  "write_file", 
  "update_configmap", 
  "restart_deployment", 
  "apply_manifest", 
  "get_deployment_from_pod", 
  "get_pods_by_label", 
  "get_configmaps_from_deployment", 
  "read_configmap",
  # Add these for enhanced troubleshooting:
  "get_container_logs",
  "check_pod_events", 
  "check_pod_resources",
  "test_network_connectivity"
]
  2. Apply the updated configuration:
kubectl apply -f nginx-agent.yaml
  3. Verify the tools are available:
kagent invoke --agent "nginx-config-agent" --task "List all available tools"

Custom Search Directories

Modify file_reader.py to add custom nginx config locations:

DEFAULT_SEARCH_DIRS = [
    "/tmp/shared_data",
    "/etc/nginx",
    "/etc/nginx/conf.d",
    "/etc/nginx-configs",
    "/var/nginx-configs",
    "/shared/configs",
    "/your/custom/path",  # Add your path
]

Custom Validation Rules

Extend validation in file_reader.py:

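def validate_nginx_config(config_content: str) -> dict:
    issues = []

    # Add custom validation rule
    if 'custom_directive' not in config_content:
        issues.append("Missing required custom_directive")

    # Your custom checks here

    # Keep returning the other fields the existing function produces
    # (warnings, recommendations, ...) alongside the issues list.
    return {"issues": issues}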

Integration with External Systems

Webhook Integration

Send notifications when issues are detected:

# Add to Hook prompt
After analysis, send notification:
curl -X POST https://your-webhook.com/nginx-alert \
  -d '{"pod": "{{.ResourceName}}", "status": "fixed"}'

Metrics Export

Export metrics to Prometheus:

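# Add to file_reader.py
from prometheus_client import Counter, start_http_server

nginx_fixes = Counter('nginx_config_fixes_total', 'Total nginx fixes')
nginx_validations = Counter('nginx_config_validations_total', 'Total validations')

# Expose /metrics once at startup (port 8000 is an example value), then call
# nginx_validations.inc() / nginx_fixes.inc() where the tools do their work.
start_http_server(8000)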

Multi-Cluster Setup

Deploy agents across multiple clusters:

# Cluster 1
kubectl --context cluster1 apply -f nginx-agent.yaml

# Cluster 2
kubectl --context cluster2 apply -f nginx-agent.yaml

# Monitor all
kubectl --context cluster1 get agents -n kagent &
kubectl --context cluster2 get agents -n kagent &

Production Recommendations

  1. Use Secure Agent Configuration

    kubectl apply -f secure_yamls/nginx-agent-secure.yaml
  2. Configure Resource Limits

    resources:
      limits:
        memory: "512Mi"
        cpu: "500m"
      requests:
        memory: "256Mi"
        cpu: "250m"
  3. Enable Audit Logging

    • Configure persistent volume for audit logs
    • Ship logs to external SIEM system
  4. Set Up Monitoring

    • Monitor agent success/failure rates
    • Track configuration fix frequency
    • Alert on repeated failures
  5. Test in Staging First

    • Validate agent behavior in non-prod
    • Test all workflows before production
    • Monitor for false positives
  6. Regular Updates

    # Update Docker image
    docker build -t local-file-reader:v2 .
    kind load docker-image local-file-reader:v2
    
    # Update deployment
    kubectl set image deployment/file-reader-mcpserver \
      mcp-server=local-file-reader:v2 -n kagent

Conclusion

The Nginx Self-Healing Infrastructure provides autonomous configuration management and troubleshooting for nginx deployments in Kubernetes. By combining AI-powered agents, comprehensive validation tools, and event-driven automation, it reduces operational overhead and ensures nginx configurations remain secure, performant, and reliable.

Key Benefits

  • ⚡ Faster Recovery: 30 minutes instead of hours (realistic timeframe)
  • 🔒 Security: Automatic enforcement of best practices
  • 🤖 Autonomous: No human intervention required
  • 📊 Transparent: Full audit trail of all actions
  • 🔄 Kubernetes-Native: Seamless integration

Current Limitations

  • Recovery Time: Realistic recovery time is 30 minutes, not seconds
  • Troubleshooting Tools: Some advanced troubleshooting tools are available but not configured by default
  • Log Access: Container logs, events, and resource monitoring require additional configuration
  • Network Testing: Backend connectivity testing is available but not enabled by default

Next Steps

  1. Deploy the infrastructure in your cluster
  2. Customize agent prompts for your use cases
  3. Add custom validation rules
  4. Integrate with your monitoring systems
  5. Extend to other applications (Apache, HAProxy, etc.)

Support

For issues, questions, or contributions, please refer to the project repository.


Version: 1.0.0
Last Updated: 2025
License: MIT
