This guide provides comprehensive documentation for the autonomous nginx self-healing infrastructure built on KAgent, a Kubernetes-native AI agent framework.
- Overview
- Architecture
- Key Features
- Prerequisites
- Quick Start
- Components
- Deployment
- Self-Healing Workflows
- Available Tools
- Usage Examples
- Security Features
- Monitoring and Hooks
- Troubleshooting
- Advanced Configuration
The Nginx Self-Healing Infrastructure is an autonomous system that monitors, validates, and automatically fixes nginx configurations in Kubernetes environments. It leverages AI-powered agents to detect configuration issues, security vulnerabilities, and performance problems, then applies fixes without human intervention.
- Monitors nginx deployments for configuration issues and pod failures
- Validates nginx configurations for syntax errors, security vulnerabilities, and best practices
- Analyzes comprehensive configuration structure and performance patterns
- Fixes detected issues automatically by updating ConfigMaps and restarting deployments
- Logs all actions for audit and compliance tracking
Traditional nginx management requires manual intervention when configurations fail or need updates. This self-healing system:
- Reduces MTTR (Mean Time To Recovery) from hours to 30 minutes
- Prevents configuration-related outages
- Ensures security best practices are automatically enforced
- Provides detailed audit trails for compliance
- Enables autonomous remediation without human intervention
┌─────────────────────────────────────────────────────────────────┐
│                       Kubernetes Cluster                        │
│                                                                 │
│  ┌──────────────┐        ┌─────────────────────────────┐        │
│  │ Nginx Pods   │        │ KAgent Hook System          │        │
│  │              │───────▶│ - Pod Restart Events        │        │
│  │ - nginx-test │        │ - Pod Pending Events        │        │
│  │ - ConfigMaps │        │ - Probe Failures            │        │
│  │              │        │ - OOM Kill Events           │        │
│  └──────────────┘        └──────────────┬──────────────┘        │
│                                         │                       │
│                                         ▼                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Nginx Config Agent                                        │  │
│  │ - nginx-config-agent (declarative AI agent)               │  │
│  │ - System Message: Nginx expert + troubleshooting          │  │
│  │ - Tools: validate, analyze, update, restart               │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│                                ▼                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ RemoteMCPServer                                           │  │
│  │ - file-reader-remote-mcp                                  │  │
│  │ - Connection bridge to MCP Server                         │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│                                ▼                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ MCPServer (file-reader-mcpserver)                         │  │
│  │ - HTTP-based tool provider                                │  │
│  │ - Python script: file_reader.py                           │  │
│  │ - Tools implementation with security restrictions         │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
│                                │                                │
│                                ▼                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Kubernetes API / ConfigMaps / Deployments                 │  │
│  │ - Read nginx configurations                               │  │
│  │ - Update ConfigMaps (nginx-config)                        │  │
│  │ - Restart deployments (nginx-test)                        │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Flow:
1. Event Detection → 2. Agent Triggered → 3. Config Analysis →
4. Issue Identification → 5. Automatic Fix → 6. Apply & Restart → 7. Verify
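The seven-step flow can be pictured as a single control loop. The sketch below is only an illustration of that ordering. In the real system the LLM agent decides what to do, and the `read_config`, `find_issues`, `propose_fix` and `apply_and_restart` callables here are hypothetical stand-ins for tool calls such as `read_configmap`, `validate_nginx_config`, `update_configmap` and `restart_deployment`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RemediationResult:
    """Outcome of one pass through the self-healing flow."""
    steps: List[str] = field(default_factory=list)
    fixed: bool = False


def remediate(
    event: dict,
    read_config: Callable[[dict], str],
    find_issues: Callable[[str], List[str]],
    propose_fix: Callable[[str, List[str]], str],
    apply_and_restart: Callable[[dict, str], None],
) -> RemediationResult:
    """Run steps 1-7 once; every callable is a hypothetical stand-in for a tool call."""
    result = RemediationResult(steps=["event_detection", "agent_triggered"])  # steps 1-2
    config = read_config(event)                                   # 3. config analysis
    result.steps.append("config_analysis")
    issues = find_issues(config)                                  # 4. issue identification
    result.steps.append("issue_identification")
    if not issues:
        return result                                             # healthy: nothing to fix
    new_config = propose_fix(config, issues)                      # 5. automatic fix
    result.steps.append("automatic_fix")
    apply_and_restart(event, new_config)                          # 6. apply & restart
    result.steps.append("apply_and_restart")
    result.fixed = not find_issues(read_config(event))            # 7. verify by re-reading
    result.steps.append("verify")
    return result


# Usage with an in-memory stand-in for the nginx-config ConfigMap
store = {"nginx.conf": "server { listen 80 }"}
outcome = remediate(
    {"pod": "nginx-test-abc"},  # hypothetical pod name
    read_config=lambda ev: store["nginx.conf"],
    find_issues=lambda cfg: [] if "listen 80;" in cfg else ["missing semicolon after listen"],
    propose_fix=lambda cfg, issues: cfg.replace("listen 80 }", "listen 80; }"),
    apply_and_restart=lambda ev, cfg: store.update({"nginx.conf": cfg}),
)
print(outcome)
```

The verify step deliberately re-reads the configuration instead of trusting the value it just wrote. This matches step 11 of the first workflow below, where the agent re-reads the configuration after restarting.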
- Automatic detection and remediation of nginx configuration issues
- No human intervention required for common problems
- Intelligent analysis using AI-powered decision making
- Realistic 30-minute recovery timeframe
- Namespace restrictions (default, nginx, kagent only)
- ConfigMap and deployment whitelisting
- Audit logging for all operations
- Content validation and dangerous command detection
- Syntax validation (missing semicolons, braces, etc.)
- Security checks (SSL/TLS, headers, server tokens)
- Performance analysis (caching, compression, rate limiting)
- Best practices validation
- Operates entirely within Kubernetes
- Uses ConfigMaps for configuration management
- Integrates with standard Kubernetes events
- RBAC-compliant operations
- Every action is logged with timestamps
- Security validations recorded
- Command execution tracking
- Compliance-ready audit logs
Ensure you have the following installed and configured:
- Kubernetes Cluster (Kind, Minikube, or any K8s cluster)
- kubectl (configured to access your cluster)
- Docker (for building images)
- kagent CLI (KAgent command-line tool)
- KAgent CRDs (installed in your cluster)
# Install KAgent CRDs and controllers
kubectl apply -f https://github.com/kagent-dev/kagent/releases/latest/download/install.yaml
# Verify KAgent is running
kubectl get pods -n kagent-system

Get the nginx self-healing infrastructure running in 5 minutes:
# 1. Clone the repository
git clone <your-repo-url>
cd file_reader_kagent_experiment
# 2. Build and load the Docker image
docker build -t local-file-reader:v1 .
kind load docker-image local-file-reader:v1
# 3. Deploy the MCP Server
kubectl apply -f mcpserver.yaml
# 4. Deploy the Remote MCP Server
kubectl apply -f remotemcpserver.yaml
# 5. Deploy the Nginx Agent
kubectl apply -f nginx-agent.yaml
# 6. Deploy the test nginx deployment
kubectl apply -f nginx-test-deployment.yaml
# 7. Deploy monitoring hooks
kubectl apply -f nginx-config-monitoring.yaml
# 8. Verify all components are running
kubectl get mcpservers,agents,remotemcpservers,hooks -n kagent
kubectl get pods -n kagent
# 9. Test the agent
kagent invoke --agent "nginx-config-agent" --task "Check the nginx configuration and fix any issues"

Nginx Config Agent (nginx-agent.yaml): the AI agent responsible for nginx configuration management and self-healing.
Key Specifications:
- Type: Declarative Agent
- System Message: Nginx expert + Kubernetes troubleshooting assistant
- Tools: 12 specialized tools for nginx management (4 additional tools available but not configured)
- Model Config: Uses default LLM model
Capabilities:
- Read nginx configurations from multiple locations
- Validate configurations for common issues
- Analyze comprehensive configuration structure
- Update ConfigMaps with fixed configurations
- Restart deployments to apply changes
- Basic troubleshooting with pod and deployment discovery
Currently Configured Tools (12):
- read_file: Read nginx config files
- validate_nginx_config: Check for issues
- analyze_nginx_config: Comprehensive analysis
- list_nginx_configs: List available configs
- write_file: Write content to files
- update_configmap: Update Kubernetes ConfigMaps
- restart_deployment: Restart nginx deployments
- apply_manifest: Apply Kubernetes manifests
- get_deployment_from_pod: Get deployment from pod name
- get_configmaps_from_deployment: Get ConfigMaps from deployment
- read_configmap: Read ConfigMap content
- get_pods_by_label: Find nginx pods by label
Available but Not Configured (4):
- get_container_logs: Retrieve pod logs
- check_pod_events: Get Kubernetes events
- check_pod_resources: Check resource usage
- test_network_connectivity: Test backend connections
MCP Server (mcpserver.yaml, file-reader-mcpserver): the tool server that implements all nginx management functions.
Key Specifications:
- Image: local-file-reader:v1
- Port: 3000
- Protocol: HTTP-based MCP
- Script: file_reader.py
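The troubleshooting snippets later in this guide POST JSON-RPC 2.0 messages to `http://localhost:3000/mcp`. A small helper keeps those calls consistent. This is a sketch. Like those snippets, it assumes the server answers a plain JSON POST. An MCP server that implements the full Streamable HTTP transport may also expect an `initialize` request first and an `Accept` header that lists both `application/json` and `text/event-stream`.

```python
import json
import urllib.request
from typing import Any, Dict

MCP_URL = "http://localhost:3000/mcp"  # endpoint used by the troubleshooting snippets


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 'tools/call' request body for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def call_tool(name: str, arguments: Dict[str, Any], url: str = MCP_URL, timeout: float = 5.0) -> Any:
    """POST the request and return its 'result'; raise if the server returns a JSON-RPC error."""
    body = json.dumps(build_tool_call(name, arguments)).encode()
    request = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode())
    if "error" in reply:
        error = reply["error"]
        raise RuntimeError(f"MCP error {error.get('code')}: {error.get('message')}")
    return reply.get("result")
```

Run it from inside the MCP server pod, or locally after a `kubectl port-forward`. For example: `call_tool("validate_nginx_config", {"config_content": "server { listen 80; }"})`.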
Implemented Tools:
- read_file: Read nginx config files
- validate_nginx_config: Check for issues
- analyze_nginx_config: Comprehensive analysis
- list_nginx_configs: List available configs
- write_file: Write content to files
- update_configmap: Update Kubernetes ConfigMaps
- restart_deployment: Restart nginx deployments
- apply_manifest: Apply Kubernetes manifests
- get_deployment_from_pod: Get deployment from pod name
- get_configmaps_from_deployment: Get ConfigMaps from deployment
- read_configmap: Read ConfigMap content
- get_pods_by_label: Find nginx pods by label
- get_container_logs: Retrieve pod logs (available but not in agent)
- check_pod_events: Get Kubernetes events (available but not in agent)
- check_pod_resources: Check resource usage (available but not in agent)
- test_network_connectivity: Test backend connections (available but not in agent)
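The agent can only call tools that appear in its `toolNames`. To see which server tools it is not using, compare the two lists. This is a minimal sketch with the names copied from this section. In practice, the first set could come from a `tools/list` response and the second from `nginx-agent.yaml`.

```python
# Tools implemented by file_reader.py (list above)
IMPLEMENTED = {
    "read_file", "validate_nginx_config", "analyze_nginx_config", "list_nginx_configs",
    "write_file", "update_configmap", "restart_deployment", "apply_manifest",
    "get_deployment_from_pod", "get_configmaps_from_deployment", "read_configmap",
    "get_pods_by_label", "get_container_logs", "check_pod_events",
    "check_pod_resources", "test_network_connectivity",
}

# Tools listed in the agent's toolNames (Currently Configured Tools)
CONFIGURED = {
    "read_file", "validate_nginx_config", "analyze_nginx_config", "list_nginx_configs",
    "write_file", "update_configmap", "restart_deployment", "apply_manifest",
    "get_deployment_from_pod", "get_configmaps_from_deployment", "read_configmap",
    "get_pods_by_label",
}


def tool_gap(implemented: set, configured: set) -> dict:
    """Tools the server offers but the agent ignores, and names the server does not know."""
    return {
        "not_configured": sorted(implemented - configured),
        "unknown_to_server": sorted(configured - implemented),
    }


print(tool_gap(IMPLEMENTED, CONFIGURED))
```

A non-empty `unknown_to_server` list usually means a typo in `toolNames` or an outdated MCP server image. The troubleshooting section covers that case under "Unknown tool" errors.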
Test Nginx Deployment (nginx-test-deployment.yaml): a sample nginx deployment for testing the self-healing infrastructure.
Components:
- Deployment: nginx-test (nginx:1.25)
- ConfigMap: nginx-config (contains nginx.conf)
- ConfigMap: nginx-html (contains index.html)
- Service: nginx-test-service (ClusterIP)
Monitoring Hooks (nginx-config-monitoring.yaml): Kubernetes event hooks that trigger the agent when issues occur.
Monitored Events:
- pod-restart: Detects nginx pod restarts
- pod-pending: Detects scheduling issues
- probe-failed: Detects health check failures
- oom-kill: Detects memory issues
Deploy all components in the correct order:
# Step 1: Build and load Docker image
docker build -t local-file-reader:v1 .
kind load docker-image local-file-reader:v1
# Step 2: Deploy core infrastructure
kubectl apply -f mcpserver.yaml
kubectl apply -f remotemcpserver.yaml
# Step 3: Deploy nginx agent
kubectl apply -f nginx-agent.yaml
# Step 4: Deploy test nginx
kubectl apply -f nginx-test-deployment.yaml
# Step 5: Enable monitoring
kubectl apply -f nginx-config-monitoring.yaml
# Step 6: Verify deployment
kubectl get all -n kagent | grep -E "(nginx|mcp|agent)"

For production environments, use the secure configuration:
# Deploy with security restrictions
kubectl apply -f secure_yamls/nginx-agent-secure.yaml
# Apply RBAC restrictions
kubectl apply -f rbac-secure.yaml
# Verify security settings
kubectl describe agent nginx-config-agent-secure -n kagent

# Check component status
kubectl get mcpservers,agents,remotemcpservers -n kagent
# Check pods
kubectl get pods -n kagent -l app=nginx-test
# Check logs
kubectl logs -n kagent deployment/file-reader-mcpserver --tail=20
kubectl logs -n kagent deployment/nginx-config-agent --tail=20
# Test connectivity
kubectl exec -n kagent deployment/file-reader-mcpserver -- curl -s http://localhost:3000

1. Nginx pod crashes/restarts
↓
2. Hook detects pod-restart event
↓
3. Agent triggered with pod details
↓
4. Agent uses 'list_nginx_configs' to find config files
↓
5. Agent uses 'read_file' to read nginx.conf
↓
6. Agent uses 'validate_nginx_config' to check for issues
↓
7. Agent uses 'analyze_nginx_config' for comprehensive analysis
↓
8. Issues detected: syntax errors, missing headers, etc.
↓
9. Agent uses 'update_configmap' to fix configuration
↓
10. Agent uses 'restart_deployment' to apply changes
↓
11. Agent verifies fix by re-reading configuration
↓
12. Report success and log audit trail
1. Nginx pod continues to crash after config fix
↓
2. Agent uses 'get_pods_by_label' to find related pods
↓
3. Agent uses 'get_deployment_from_pod' to get deployment info
↓
4. Agent uses 'get_configmaps_from_deployment' to find ConfigMaps
↓
5. Agent uses 'read_configmap' to analyze configuration
↓
6. Agent uses 'validate_nginx_config' and 'analyze_nginx_config'
↓
7. Agent reports detailed diagnostic information
↓
8. Escalates with actionable recommendations
1. Security scan detects vulnerable nginx config
↓
2. Agent analyzes configuration for security issues
↓
3. Issues found: TLSv1 enabled, missing security headers
↓
4. Agent updates config with:
   - ssl_protocols TLSv1.2 TLSv1.3;
   - add_header X-Frame-Options "SAMEORIGIN";
   - add_header X-Content-Type-Options "nosniff";
   - server_tokens off;
↓
5. Apply changes and restart deployment
↓
6. Verify security improvements
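Step 4 of the hardening workflow can be sketched as a text transformation. It first removes every existing variant of the four directives, then adds each of them exactly once at the top of the `http` block. This is a minimal sketch, not the agent's actual logic. It assumes a single top-level `http {` line and one directive per line, and the result should still be checked with `nginx -t`.

```python
import re
from typing import List, Tuple

# (pattern matching any existing variant, hardened directive) pairs from the workflow above
HARDENING: List[Tuple[str, str]] = [
    (r"ssl_protocols\b", "ssl_protocols TLSv1.2 TLSv1.3;"),
    (r"add_header\s+X-Frame-Options\b", 'add_header X-Frame-Options "SAMEORIGIN";'),
    (r"add_header\s+X-Content-Type-Options\b", 'add_header X-Content-Type-Options "nosniff";'),
    (r"server_tokens\b", "server_tokens off;"),
]


def harden(config: str, indent: str = "    ") -> str:
    """Remove existing variants of the hardening directives, then add them at the top of the http block."""
    kept = [
        line for line in config.splitlines()
        if not any(re.match(pattern, line.strip()) for pattern, _ in HARDENING)
    ]
    out: List[str] = []
    inserted = False
    for line in kept:
        out.append(line)
        if not inserted and re.match(r"http\s*\{", line.strip()):
            base = line[: len(line) - len(line.lstrip())]  # keep the http block's own indentation
            out.extend(base + indent + directive for _, directive in HARDENING)
            inserted = True
    if not inserted:
        raise ValueError("no 'http {' block found")
    return "\n".join(out) + "\n"


before = """events {}
http {
    server_tokens on;
    server {
        listen 443 ssl;
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    }
}"""
print(harden(before))
```

Placing the headers at the `http` level has a catch. nginx inherits `add_header` from the enclosing level only when the current level defines no `add_header` of its own, so any `server` or `location` block that sets other headers would need these two lines repeated there.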
read_file: Reads nginx configuration files from multiple locations.
Parameters:
- file_path (string): Relative or absolute path to the file
Search Directories:
- /tmp/shared_data
- /etc/nginx
- /etc/nginx/conf.d
- /etc/nginx-configs
- /var/nginx-configs
- /shared/configs
Example:
kagent invoke --agent "nginx-config-agent" --task "Read the nginx configuration from /etc/nginx/nginx.conf"

list_nginx_configs: Lists all available nginx configuration files.
Parameters: None
Returns:
- List of config files with paths, sizes, and modification times
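A listing like this can be built with the standard library. This is a minimal sketch. It assumes the search directories documented for `read_file` and a non-recursive `*.conf` filter, and the real `list_nginx_configs` in `file_reader.py` may filter or format its results differently.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable, List

# Search directories documented for read_file
SEARCH_DIRS = [
    "/tmp/shared_data", "/etc/nginx", "/etc/nginx/conf.d",
    "/etc/nginx-configs", "/var/nginx-configs", "/shared/configs",
]


def list_nginx_configs(search_dirs: Iterable[str] = SEARCH_DIRS, pattern: str = "*.conf") -> List[Dict]:
    """Return path, size in bytes and UTC modification time for matching files (non-recursive)."""
    results = []
    for directory in search_dirs:
        base = Path(directory)
        if not base.is_dir():
            continue  # skip locations that do not exist in this container
        for path in sorted(base.glob(pattern)):
            if path.is_file():
                info = path.stat()
                results.append({
                    "path": str(path),
                    "size": info.st_size,
                    "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
                })
    return results


for entry in list_nginx_configs():
    print(entry)
```

Missing directories are skipped silently. This lets the same search list work in containers that mount only some of these locations.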
Example:
kagent invoke --agent "nginx-config-agent" --task "List all nginx configuration files"

validate_nginx_config: Validates nginx configuration for common issues.
Parameters:
- config_content (string): The nginx configuration to validate
Checks:
- Syntax errors (missing semicolons, braces)
- Security issues (SSL config, headers, server tokens)
- Performance problems (inefficient regex, missing optimizations)
- Best practices violations
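Several of these checks can be approximated with plain text rules. The sketch below is illustrative only. It returns the same top-level keys as the sample response that follows, but its rules are assumptions: brace balance, a missing-semicolon heuristic, a port-443 check, weak TLS protocols, and `server_tokens`. It is no substitute for `nginx -t`.

```python
import re
from typing import Dict, List

WEAK_PROTOCOLS = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1"}


def validate_nginx_config(config_content: str) -> Dict:
    """Heuristic checks returning the same top-level keys as the sample response."""
    issues: List[str] = []
    warnings: List[str] = []
    recommendations: List[str] = []
    lines = config_content.splitlines()

    # Syntax: brace balance and a missing-semicolon heuristic, line by line
    depth = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].rstrip()  # drop comments (naive: ignores '#' inside quotes)
        depth += line.count("{") - line.count("}")
        if depth < 0:
            issues.append(f"Line {number}: unexpected closing brace")
            depth = 0
        if line.strip() and not line.endswith((";", "{", "}")):
            issues.append(f"Line {number}: possible missing semicolon")
    if depth > 0:
        issues.append(f"{depth} unclosed brace(s) at end of file")

    # Security: TLS port without a certificate, weak protocols, version disclosure
    if re.search(r"listen\s+[^;]*\b443\b", config_content) and "ssl_certificate" not in config_content:
        warnings.append("Port 443 without SSL configuration")
    for match in re.finditer(r"ssl_protocols\s+([^;]*);", config_content):
        weak = WEAK_PROTOCOLS & set(match.group(1).split())
        if weak:
            warnings.append("Weak TLS protocols enabled: " + ", ".join(sorted(weak)))
    if not re.search(r"server_tokens\s+off\s*;", config_content):
        recommendations.append("Consider adding 'server_tokens off'")

    return {
        "issues": issues,
        "warnings": warnings,
        "recommendations": recommendations,
        "total_lines": len(lines),
        "has_ssl": bool(re.search(r"\bssl_certificate\b|listen[^;]*\bssl\b", config_content)),
        "has_proxy": "proxy_pass" in config_content,
    }


print(validate_nginx_config("server {\n    listen 80\n    server_name example.com;\n}\n"))
```

The semicolon rule is line-based. It will flag directives that legitimately span several lines, such as a multi-line `log_format`, which is one reason a real validator should defer to `nginx -t`.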
Returns:
{
"issues": ["Line 10: Missing semicolon in proxy_pass"],
"warnings": ["Port 443 without SSL configuration"],
"recommendations": ["Consider adding 'server_tokens off'"],
"total_lines": 50,
"has_ssl": true,
"has_proxy": true
}

analyze_nginx_config: Performs comprehensive analysis of nginx configuration.
Parameters:
- config_content (string): The nginx configuration to analyze
Analysis:
- Validation results
- Server block structure
- Directive usage statistics
- Configuration complexity metrics
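Server block structure and directive statistics can be extracted with a small tokenizer. The sketch makes an assumption about what such an analysis might compute: directive counts, each `server` block's `listen`/`server_name` values, and the maximum nesting depth. It ignores quoted strings containing spaces and does not follow `include` files.

```python
import re
from collections import Counter
from typing import Dict, List

TOKEN = re.compile(r"[{};]|[^\s{};]+")  # braces, semicolons, or bare words


def analyze_nginx_config(config_content: str) -> Dict:
    """Count directives, list server blocks and report the maximum block nesting depth."""
    text = "\n".join(line.split("#", 1)[0] for line in config_content.splitlines())
    directives: Counter = Counter()
    servers: List[Dict[str, List[str]]] = []
    stack: List[str] = []      # names of currently open blocks
    statement: List[str] = []  # tokens of the statement being read
    max_depth = 0

    for token in TOKEN.findall(text):
        if token == "{":                      # a block opens: its name is the first word
            name = statement[0] if statement else "?"
            directives[name] += 1
            stack.append(name)
            max_depth = max(max_depth, len(stack))
            if name == "server":
                servers.append({"listen": [], "server_name": []})
            statement = []
        elif token == "}":                    # a block closes
            if stack:
                stack.pop()
            statement = []
        elif token == ";":                    # a simple directive ends
            if statement:
                name, args = statement[0], statement[1:]
                directives[name] += 1
                if stack and stack[-1] == "server" and name in ("listen", "server_name"):
                    servers[-1][name].extend(args)
            statement = []
        else:
            statement.append(token)

    return {"server_blocks": servers, "directive_counts": dict(directives), "max_depth": max_depth}


sample = """http {
  server {
    listen 80;
    server_name example.com www.example.com;
    location / {
      root /usr/share/nginx/html;
    }
  }
}"""
print(analyze_nginx_config(sample))
```

Only `server` blocks are recorded as structure. A `server` line inside an `upstream` block ends with `;`, so it is counted as a directive instead.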
Example:
kagent invoke --agent "nginx-config-agent" --task "Analyze the nginx configuration for performance issues"

update_configmap: Updates a Kubernetes ConfigMap with new content.
Parameters:
- configmap_name (string): Name of the ConfigMap (must be in the allowed list)
- namespace (string): Kubernetes namespace (default, nginx, or kagent)
- key (string): Key in the ConfigMap data (e.g., nginx.conf)
- content (string): New configuration content
Security:
- Only allowed ConfigMaps: nginx-config, nginx-html, nginx-conf, etc.
- Only allowed namespaces: default, nginx, kagent
- Only allowed keys: nginx.conf, default.conf, index.html, etc.
- Content validation for dangerous commands
Example:
kagent invoke --agent "nginx-config-agent" --task "Fix the nginx configuration in ConfigMap nginx-config"

restart_deployment: Restarts a Kubernetes deployment to apply changes.
Parameters:
- deployment_name (string): Name of the deployment (must be in the allowed list)
- namespace (string): Kubernetes namespace
Security:
- Only allowed deployments: nginx-test, nginx, nginx-server, etc.
- Only allowed namespaces: default, nginx, kagent
- All restarts are logged for audit
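`kubectl rollout restart` works by adding a `kubectl.kubernetes.io/restartedAt` timestamp annotation to the Deployment's pod template. Any change to the template makes the Deployment controller roll out new pods. A `restart_deployment` tool could do the same with a patch. This sketch only builds the patch body. Sending it, for example with the official `kubernetes` Python client's `AppsV1Api().patch_namespaced_deployment(name, namespace, body)`, is left out, as are the allowlist checks described above.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"


def restart_patch(now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build a pod-template annotation patch with the same effect as `kubectl rollout restart`.

    A new annotation value changes the pod template, so the controller replaces every pod.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"spec": {"template": {"metadata": {"annotations": {RESTART_ANNOTATION: stamp}}}}}


print(restart_patch())
```

Because the patch only touches one annotation, it leaves the rest of the Deployment spec untouched, and the rollout follows the Deployment's normal update strategy.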
Example:
kagent invoke --agent "nginx-config-agent" --task "Restart the nginx-test deployment"

get_pods_by_label: Gets pods by label selector with deployment information.
Parameters:
- label_selector (string): Label selector (e.g., "app=nginx-test")
- namespace (string): Kubernetes namespace
Returns:
- List of pods with names, status, and deployment names
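Equality-based selectors such as `app=nginx-test` can be matched against pod labels as shown below. This is a minimal sketch. It handles `key=value`, `key==value` and `key!=value` terms joined by commas, but not set-based selectors (`in`, `notin`, exists). The pod records are illustrative data, not the tool's actual output format.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, str, str]  # (key, operator, value)


def parse_selector(selector: str) -> List[Term]:
    """Parse equality-based terms such as 'app=nginx-test,tier!=db'."""
    terms: List[Term] = []
    for term in (t.strip() for t in selector.split(",")):
        if not term:
            continue
        for op in ("!=", "==", "="):  # check the two-character operators first
            if op in term:
                key, value = term.split(op, 1)
                terms.append((key.strip(), "!=" if op == "!=" else "=", value.strip()))
                break
        else:
            raise ValueError(f"unsupported selector term: {term!r}")
    return terms


def matches(labels: Dict[str, str], terms: List[Term]) -> bool:
    """'=' needs an equal label; '!=' also matches pods that lack the label entirely."""
    for key, op, value in terms:
        if op == "=" and labels.get(key) != value:
            return False
        if op == "!=" and labels.get(key) == value:
            return False
    return True


# Illustrative pod records (names and fields are hypothetical)
pods = [
    {"name": "nginx-test-7d9f-abc", "labels": {"app": "nginx-test"}, "status": "Running"},
    {"name": "file-reader-mcpserver-x1", "labels": {"app": "file-reader"}, "status": "Running"},
]
terms = parse_selector("app=nginx-test")
print([p["name"] for p in pods if matches(p["labels"], terms)])
```

The `!=` rule follows Kubernetes semantics: a resource without the key also satisfies `key!=value`.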
Example:
kagent invoke --agent "nginx-config-agent" --task "Find all nginx pods in the kagent namespace"

Note: The following troubleshooting tools are implemented in the MCP server but are not currently configured in the nginx agent. To enable them, add them to the toolNames list in nginx-agent.yaml.
get_container_logs: Retrieves container logs for troubleshooting.
Parameters:
- pod_name (string): Name of the pod
- namespace (string): Kubernetes namespace
- container_name (string, optional): Specific container
- tail_lines (integer, optional): Number of lines (default: 50)
Security:
- Only nginx-related pods allowed
- Limited to allowed namespaces
To enable:
# Add to nginx-agent.yaml toolNames list
toolNames: [..., "get_container_logs"]

check_pod_events: Gets Kubernetes events for a pod.
Parameters:
- pod_name (string): Name of the pod
- namespace (string): Kubernetes namespace
Returns:
- Kubernetes events sorted by timestamp
To enable:
# Add to nginx-agent.yaml toolNames list
toolNames: [..., "check_pod_events"]

check_pod_resources: Checks pod resource usage and limits.
Parameters:
- pod_name (string): Name of the pod
- namespace (string): Kubernetes namespace
Returns:
- Resource limits and requests
- Resource usage information
- Pod conditions
To enable:
# Add to nginx-agent.yaml toolNames list
toolNames: [..., "check_pod_resources"]

test_network_connectivity: Tests network connectivity to backend services.
Parameters:
- service_name (string): Name of the service
- namespace (string): Kubernetes namespace
- port (integer, optional): Port to test (default: 80)
To enable:
# Add to nginx-agent.yaml toolNames list
toolNames: [..., "test_network_connectivity"]

# Simple check
kagent invoke --agent "nginx-config-agent" --task "Check the nginx configuration and fix any issues"
# Agent will:
# 1. List available configs
# 2. Read nginx.conf from ConfigMap
# 3. Validate configuration
# 4. Fix detected issues
# 5. Update ConfigMap
# 6. Restart deployment
# 7. Verify fix

# Security audit
kagent invoke --agent "nginx-config-agent" --task "Perform a security audit of the nginx configuration. Check for SSL/TLS issues, missing security headers, and vulnerabilities. Fix any issues found."
# Agent will:
# 1. Read configuration
# 2. Check SSL/TLS protocols
# 3. Verify security headers
# 4. Check server_tokens setting
# 5. Apply security fixes
# 6. Restart deployment

# Performance optimization
kagent invoke --agent "nginx-config-agent" --task "Analyze nginx configuration for performance issues. Check for caching, compression, and optimize worker settings."
# Agent will:
# 1. Analyze configuration structure
# 2. Check worker_processes and connections
# 3. Verify gzip compression
# 4. Check caching headers
# 5. Recommend optimizations
# 6. Apply improvements

# Troubleshooting a crashing pod
kagent invoke --agent "nginx-config-agent" --task "The nginx-test pod is crashing. Investigate the issue and fix the problem."
# Agent will:
# 1. Get pod status using get_pods_by_label
# 2. Get deployment info using get_deployment_from_pod
# 3. Find ConfigMaps using get_configmaps_from_deployment
# 4. Read configuration using read_configmap
# 5. Analyze configuration using validate_nginx_config and analyze_nginx_config
# 6. Apply fix using update_configmap
# 7. Restart deployment using restart_deployment
# 8. Verify resolution

# Configuration migration
kagent invoke --agent "nginx-config-agent" --task "We need to migrate to nginx 1.25 configuration. Update the configuration to use modern best practices and remove deprecated directives."
# Agent will:
# 1. Read current configuration
# 2. Identify deprecated directives
# 3. Update to modern syntax
# 4. Add recommended settings
# 5. Test configuration
# 6. Apply changes

Namespace restrictions: the tools only operate in specific namespaces:
- default: Default namespace
- nginx: Nginx-specific namespace
- kagent: KAgent system namespace
Any attempt to access other namespaces is denied with a security violation.
Allowed ConfigMaps:
- nginx-config
- nginx-html
- nginx-conf
- nginx-default
- nginx-server
Allowed Deployments:
- nginx-test
- nginx
- nginx-server
- nginx-proxy
- nginx-ingress
Allowed ConfigMap Keys:
- nginx.conf
- default.conf
- index.html
- server.conf
- ssl.conf
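Taken together, these lists amount to a default-deny check on every write. This is a minimal sketch that uses exactly the names listed above. The function names and the `SecurityViolation` exception are illustrative, not necessarily what `file_reader.py` uses.

```python
ALLOWED_NAMESPACES = frozenset({"default", "nginx", "kagent"})
ALLOWED_CONFIGMAPS = frozenset({"nginx-config", "nginx-html", "nginx-conf", "nginx-default", "nginx-server"})
ALLOWED_DEPLOYMENTS = frozenset({"nginx-test", "nginx", "nginx-server", "nginx-proxy", "nginx-ingress"})
ALLOWED_KEYS = frozenset({"nginx.conf", "default.conf", "index.html", "server.conf", "ssl.conf"})


class SecurityViolation(Exception):
    """Raised when a request targets a resource outside the allowlists."""


def check_configmap_update(namespace: str, configmap_name: str, key: str) -> None:
    """Default-deny: namespace, ConfigMap name and data key must all be allowlisted."""
    if namespace not in ALLOWED_NAMESPACES:
        raise SecurityViolation(f"namespace {namespace!r} is not allowed")
    if configmap_name not in ALLOWED_CONFIGMAPS:
        raise SecurityViolation(f"ConfigMap {configmap_name!r} is not allowed")
    if key not in ALLOWED_KEYS:
        raise SecurityViolation(f"key {key!r} is not allowed")


def check_deployment_restart(namespace: str, deployment_name: str) -> None:
    """Default-deny: namespace and Deployment name must both be allowlisted."""
    if namespace not in ALLOWED_NAMESPACES:
        raise SecurityViolation(f"namespace {namespace!r} is not allowed")
    if deployment_name not in ALLOWED_DEPLOYMENTS:
        raise SecurityViolation(f"deployment {deployment_name!r} is not allowed")


check_configmap_update("kagent", "nginx-config", "nginx.conf")  # passes silently
try:
    check_deployment_restart("kube-system", "coredns")
except SecurityViolation as err:
    print("denied:", err)
```

Exact-match `frozenset` lookups avoid the pitfalls of prefix or substring matching, where a name like `nginx-config-evil` could slip past a check for `nginx-config`.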
Forbidden Paths:
- Account files and root home: /etc/passwd, /etc/shadow, /root/
- User directories: /home/
- System logs: /var/log/
- Virtual filesystems: /proc/, /sys/, /dev/
- Binary directories: /usr/bin/, /sbin/, /bin/
- Sensitive configs: /etc/ssl/, /etc/ssh/
Dangerous Command Detection:
- rm -rf: Recursive deletion
- sudo: Privilege escalation
- chmod 777: Insecure permissions
- wget, curl: Arbitrary downloads
- Shell execution commands
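A content scan for these patterns can be a handful of regular expressions. This is a minimal sketch. The patterns approximate the list above, and the "shell execution" pattern is an assumption that matches backticks, `$(...)`, and `sh -c`/`bash -c`. Pattern matching like this is easy to evade, so it is only one layer of defense alongside the allowlists.

```python
import re
from typing import List

DANGEROUS_PATTERNS = {
    "recursive deletion": re.compile(r"\brm\s+-[a-z]*(rf|fr)", re.IGNORECASE),
    "privilege escalation": re.compile(r"\bsudo\b"),
    "insecure permissions": re.compile(r"\bchmod\s+777\b"),
    "arbitrary download": re.compile(r"\b(wget|curl)\b"),
    "shell execution": re.compile(r"`[^`]*`|\$\([^)]*\)|\b(ba)?sh\s+-c\b"),
}


def find_dangerous(content: str) -> List[str]:
    """Return the names of all dangerous patterns found, in the order they are defined."""
    return [name for name, pattern in DANGEROUS_PATTERNS.items() if pattern.search(content)]


print(find_dangerous("server { listen 80; }"))
print(find_dangerous("sudo rm -rf /"))
```

nginx variables such as `$host` or `${var}` do not trigger the `$(...)` rule, because that rule requires an opening parenthesis right after the dollar sign.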
Every operation is logged with:
- Timestamp: UTC timestamp
- Event Type: Operation performed
- User Context: Who/what initiated the action
- Details: Full operation details
- Security Validated: Confirmation of security checks
Logged Operations:
- ConfigMap updates
- Deployment restarts
- Configuration changes
- Resource access
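One way to emit these records is to write one JSON line per operation, a format that log shippers and SIEM tools ingest easily. This is a minimal sketch. The field names mirror the list above, but the exact schema, the logger name, and the `user_context` value are assumptions.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict

audit_logger = logging.getLogger("nginx_audit")  # hypothetical logger name


def audit_record(event_type: str, user_context: str, details: Dict[str, Any],
                 security_validated: bool) -> str:
    """Serialize one audit entry as a single JSON line and log it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC, ISO 8601
        "event_type": event_type,                             # e.g. configmap_update
        "user_context": user_context,                         # who/what initiated it
        "details": details,                                   # full operation details
        "security_validated": security_validated,             # did the allowlist checks pass?
    }
    line = json.dumps(entry, sort_keys=True)
    audit_logger.info(line)
    return line


logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_record("deployment_restart", "hook:pod-restart",  # hypothetical initiator label
             {"deployment": "nginx-test", "namespace": "kagent"}, True)
```

Keeping each record on one line with sorted keys makes the log easy to grep and to diff between runs.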
Use secure_yamls/nginx-agent-secure.yaml for production:
# Removed dangerous tools:
# - write_file (direct file writing)
# - apply_manifest (arbitrary resource creation)
# Retained safe tools:
# - read_file (read-only access)
# - validate_nginx_config (validation only)
# - analyze_nginx_config (analysis only)
# - update_configmap (restricted updates)
# - restart_deployment (restricted restarts)

The nginx-config-monitoring.yaml file defines event-driven automation:
Triggers when nginx pods restart unexpectedly.
Monitored Event: pod-restart
Agent Actions:
- Configuration check for syntax errors
- Review nginx error logs
- Verify upstream configurations
- Check SSL/TLS certificates
- Automated remediation
- Validation with nginx -t
Triggers when nginx pods can't be scheduled.
Monitored Event: pod-pending
Agent Actions:
- Check resource requests
- Verify ConfigMaps exist
- Check for missing secrets
- Validate volume mounts
- Fix configuration dependencies
Triggers when health checks fail.
Monitored Event: probe-failed
Agent Actions:
- Check health endpoint configuration
- Verify backend service health
- Analyze SSL termination
- Check rate limiting
- Fix health check endpoints
Triggers when nginx pods are killed for running out of memory.
Monitored Event: oom-kill
Agent Actions:
- Review worker_processes configuration
- Check buffer size settings
- Analyze memory usage patterns
- Optimize configuration for memory
- Adjust resource limits
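Hook prompts can include Go-template placeholders such as `{{.ResourceName}}`, `{{.Namespace}}` and `{{.EventMessage}}`. The custom hook example in this guide uses all three, and they are presumably filled in by the hook controller before the agent is invoked. The sketch below imitates that substitution in Python so a prompt can be previewed locally. It handles only plain `{{.Field}}` placeholders (no pipelines or conditionals), and the one-line focus per event type is condensed from the agent actions listed above.

```python
import re
from typing import Dict

# One-line focus per monitored event, condensed from the agent actions listed above
EVENT_FOCUS = {
    "pod-restart": "Check nginx syntax, error logs, upstreams and TLS certificates.",
    "pod-pending": "Check resource requests, ConfigMaps, secrets and volume mounts.",
    "probe-failed": "Check the health endpoint, backend health, SSL termination and rate limiting.",
    "oom-kill": "Review worker_processes, buffer sizes and memory limits.",
}

# Same placeholders as the custom hook prompt in this guide
TEMPLATE = "Pod Details:\n- Pod: {{.ResourceName}}\n- Namespace: {{.Namespace}}\n- Event: {{.EventMessage}}"

PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute {{.Field}} placeholders; a missing field raises KeyError instead of rendering blank."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def build_prompt(event_type: str, values: Dict[str, str]) -> str:
    """Prefix the rendered pod details with the focus line for this event type."""
    return f"[{event_type}] {EVENT_FOCUS[event_type]}\n" + render(TEMPLATE, values)


print(build_prompt("oom-kill", {
    "ResourceName": "nginx-test-abc",  # hypothetical pod name
    "Namespace": "kagent",
    "EventMessage": "Container nginx was OOMKilled",
}))
```

Failing loudly on an unknown placeholder catches typos such as `{{.Namepsace}}` before they reach the agent as an empty field.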
Create your own hooks for specific scenarios:
apiVersion: kagent.dev/v1alpha2
kind: Hook
metadata:
  name: nginx-custom-hook
  namespace: kagent
spec:
  eventConfigurations:
    - eventType: custom-event
      agentRef:
        name: nginx-config-agent
        namespace: kagent
      prompt: |
        Custom analysis prompt for your specific use case.
        Pod Details:
        - Pod: {{.ResourceName}}
        - Namespace: {{.Namespace}}
        - Event: {{.EventMessage}}
        Please analyze and provide remediation.

Symptoms:
- Agent commands timeout
- No logs from agent pod
Diagnosis:
# Check agent pod status
kubectl get pods -n kagent -l app=nginx-config-agent
# Check agent logs
kubectl logs -n kagent deployment/nginx-config-agent --tail=50
# Check agent description
kubectl describe agent nginx-config-agent -n kagent

Solution:
# Restart agent
kubectl rollout restart deployment/nginx-config-agent -n kagent
# Check MCP server connectivity
kubectl exec -n kagent deployment/file-reader-mcpserver -- curl http://localhost:3000

Symptoms:
- Error: "ConfigMap validation failed"
- Security violation messages
Diagnosis:
# Check ConfigMap name
kubectl get configmap -n kagent | grep nginx
# Check agent security configuration
kubectl get agent nginx-config-agent -n kagent -o yaml | grep -A 20 toolNames

Solution:
- Ensure ConfigMap name is in allowed list
- Verify namespace is correct
- Check that the content doesn't contain dangerous commands
Symptoms:
- "Unknown tool" errors
- Missing tool capabilities
Diagnosis:
# List available tools
kubectl exec -n kagent deployment/file-reader-mcpserver -- python3 -c "
import urllib.request, json
req = urllib.request.Request('http://localhost:3000/mcp',
data=json.dumps({'jsonrpc': '2.0', 'id': 1, 'method': 'tools/list', 'params': {}}).encode(),
headers={'Content-Type': 'application/json'})
response = urllib.request.urlopen(req, timeout=5)
print(json.loads(response.read().decode()).get('result', {}).get('tools', []))
"

Solution:
# Rebuild and reload Docker image
docker build -t local-file-reader:v1 .
kind load docker-image local-file-reader:v1
# Restart MCP server
kubectl rollout restart deployment/file-reader-mcpserver -n kagent

Symptoms:
- Agent fixes config but pod still crashes
- CrashLoopBackOff status
Diagnosis:
# Use agent for comprehensive troubleshooting
kagent invoke --agent "nginx-config-agent" --task "The nginx-test pod is still crashing after config fix. Investigate the issue and provide detailed analysis."

Manual Diagnosis:
# Check logs
kubectl logs -n kagent <nginx-pod-name> --tail=100
# Check events
kubectl get events -n kagent --field-selector involvedObject.name=<nginx-pod-name>
# Check resources
kubectl describe pod <nginx-pod-name> -n kagent
# Test manually
kubectl exec -n kagent <nginx-pod-name> -- nginx -t

Stream logs and test the MCP server directly for troubleshooting:
# Check MCP server logs
kubectl logs -n kagent deployment/file-reader-mcpserver -f
# Check agent logs with context
kubectl logs -n kagent deployment/nginx-config-agent -f --tail=100
# Port forward for direct testing
kubectl port-forward -n kagent deployment/file-reader-mcpserver 3000:3000

Test MCP server tools without the agent:
# Test read_file
kubectl exec -n kagent deployment/file-reader-mcpserver -- python3 -c "
import urllib.request, json
data = {
'jsonrpc': '2.0',
'id': 1,
'method': 'tools/call',
'params': {
'name': 'read_file',
'arguments': {'file_path': '/etc/nginx/nginx.conf'}
}
}
req = urllib.request.Request('http://localhost:3000/mcp',
data=json.dumps(data).encode(),
headers={'Content-Type': 'application/json'})
response = urllib.request.urlopen(req, timeout=5)
print(json.dumps(json.loads(response.read().decode()), indent=2))
"
# Test validate_nginx_config
kubectl exec -n kagent deployment/file-reader-mcpserver -- python3 -c "
import urllib.request, json
config = '''
server {
listen 80;
server_name example.com;
}
'''
data = {
'jsonrpc': '2.0',
'id': 2,
'method': 'tools/call',
'params': {
'name': 'validate_nginx_config',
'arguments': {'config_content': config}
}
}
req = urllib.request.Request('http://localhost:3000/mcp',
data=json.dumps(data).encode(),
headers={'Content-Type': 'application/json'})
response = urllib.request.urlopen(req, timeout=5)
print(json.dumps(json.loads(response.read().decode()), indent=2))
"

The MCP server implements several troubleshooting tools that are not currently configured in the nginx agent. To enable them:
- Edit the nginx-agent.yaml file:
toolNames: [
"read_file",
"validate_nginx_config",
"analyze_nginx_config",
"list_nginx_configs",
"write_file",
"update_configmap",
"restart_deployment",
"apply_manifest",
"get_deployment_from_pod",
"get_pods_by_label",
"get_configmaps_from_deployment",
"read_configmap",
# Add these for enhanced troubleshooting:
"get_container_logs",
"check_pod_events",
"check_pod_resources",
"test_network_connectivity"
]
- Apply the updated configuration:
kubectl apply -f nginx-agent.yaml
- Verify the tools are available:
kagent invoke --agent "nginx-config-agent" --task "List all available tools"

Modify file_reader.py to add custom nginx config locations:
DEFAULT_SEARCH_DIRS = [
"/tmp/shared_data",
"/etc/nginx",
"/etc/nginx/conf.d",
"/etc/nginx-configs",
"/your/custom/path", # Add your path
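]

Extend validation in file_reader.py:
def validate_nginx_config(config_content: str) -> dict:
    issues = []
    # ... existing syntax, security and performance checks ...
    # Add a custom validation rule (custom_directive is a placeholder name)
    if 'custom_directive' not in config_content:
        issues.append("Missing required custom_directive")
    # Your custom checks here
    # Return the other existing result fields (warnings, recommendations, ...) alongside issues
    return {"issues": issues}

Send notifications when issues are detected: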
# Add to Hook prompt
After analysis, send notification:
curl -X POST https://your-webhook.com/nginx-alert \
  -H "Content-Type: application/json" \
  -d '{"pod": "{{.ResourceName}}", "status": "fixed"}'

Export metrics to Prometheus:
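# Add to file_reader.py
from prometheus_client import Counter, start_http_server
nginx_fixes = Counter('nginx_config_fixes_total', 'Total nginx fixes')
nginx_validations = Counter('nginx_config_validations_total', 'Total validations')
# Serve the metrics over HTTP for Prometheus to scrape (port 8000 is an example)
start_http_server(8000)
# Call nginx_validations.inc() in validate_nginx_config and nginx_fixes.inc() after each successful fix

Deploy agents across multiple clusters: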
# Cluster 1
kubectl --context cluster1 apply -f nginx-agent.yaml
# Cluster 2
kubectl --context cluster2 apply -f nginx-agent.yaml
# Monitor all (run both queries in parallel, then wait for them to finish)
kubectl --context cluster1 get agents -n kagent &
kubectl --context cluster2 get agents -n kagent &
wait

Production best practices:
- Use Secure Agent Configuration
  kubectl apply -f secure_yamls/nginx-agent-secure.yaml
- Configure Resource Limits
  resources:
    limits:
      memory: "512Mi"
      cpu: "500m"
    requests:
      memory: "256Mi"
      cpu: "250m"
- Enable Audit Logging
  - Configure a persistent volume for audit logs
  - Ship logs to an external SIEM system
- Set Up Monitoring
  - Monitor agent success/failure rates
  - Track configuration fix frequency
  - Alert on repeated failures
- Test in Staging First
  - Validate agent behavior in non-production environments
  - Test all workflows before production
  - Monitor for false positives
- Regular Updates
  # Update Docker image
  docker build -t local-file-reader:v2 .
  kind load docker-image local-file-reader:v2
  # Update deployment
  kubectl set image deployment/file-reader-mcpserver \
    mcp-server=local-file-reader:v2 -n kagent
The Nginx Self-Healing Infrastructure provides autonomous configuration management and troubleshooting for nginx deployments in Kubernetes. By combining AI-powered agents, comprehensive validation tools, and event-driven automation, it reduces operational overhead and ensures nginx configurations remain secure, performant, and reliable.
- Faster Recovery: 30 minutes instead of hours (realistic timeframe)
- Security: Automatic enforcement of best practices
- Autonomous: No human intervention required
- Transparent: Full audit trail of all actions
- Kubernetes-Native: Seamless integration
- Recovery Time: Realistic recovery time is 30 minutes, not seconds
- Troubleshooting Tools: Some advanced troubleshooting tools are available but not configured by default
- Log Access: Container logs, events, and resource monitoring require additional configuration
- Network Testing: Backend connectivity testing is available but not enabled by default
- Deploy the infrastructure in your cluster
- Customize agent prompts for your use cases
- Add custom validation rules
- Integrate with your monitoring systems
- Extend to other applications (Apache, HAProxy, etc.)
For issues, questions, or contributions, please refer to the project repository.
Version: 1.0.0
Last Updated: 2025
License: MIT