Production Deployment
Deploy Aixgo agents to production with best practices for scaling and monitoring.
Aixgo is built for production deployment. This guide covers deployment patterns, best practices, and strategies for running AI agents at scale with enterprise security and full observability.
Deployment Patterns
Pattern 1: Single Binary on Cloud Run / Lambda
The simplest production deployment: compile to a single binary and deploy to serverless platforms.
Pros:
- Zero infrastructure management
- Auto-scaling included
- Pay-per-request pricing
- <100ms cold start
Best for:
- API endpoints
- Event-driven workflows
- Low to medium throughput (<1000 req/s)
Example: Cloud Run
# Build stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o agent main.go
# Runtime stage
FROM scratch
COPY --from=builder /app/agent /agent
COPY --from=builder /app/config/ /config/
CMD ["/agent"]# Build and deploy
docker build -t gcr.io/my-project/aixgo-agent .
docker push gcr.io/my-project/aixgo-agent
gcloud run deploy aixgo-agent \
--image gcr.io/my-project/aixgo-agent \
--platform managed \
--region us-central1 \
--allow-unauthenticatedConfiguration:
# config/agents.yaml
supervisor:
name: coordinator
mode: local # Single process
max_rounds: 10Pattern 2: Container on Kubernetes
For higher control and customization, deploy as containers on Kubernetes.
Pros:
- Full control over scaling
- Persistent connections
- Custom networking
- Multi-region support
Best for:
- High throughput (>1000 req/s)
- Stateful workflows
- Complex networking requirements
Example: Kubernetes Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: aixgo-agent
spec:
replicas: 3
selector:
matchLabels:
app: aixgo-agent
template:
metadata:
labels:
app: aixgo-agent
spec:
containers:
- name: agent
image: my-registry/aixgo-agent:latest
ports:
- containerPort: 8080
env:
- name: CONFIG_PATH
value: '/config/agents.yaml'
- name: LOG_LEVEL
value: 'info'
resources:
requests:
memory: '128Mi'
cpu: '100m'
limits:
memory: '512Mi'
cpu: '500m'
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 10
periodSeconds: 30
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: aixgo-agent
spec:
selector:
app: aixgo-agent
ports:
- port: 80
targetPort: 8080
type: LoadBalancerPattern 3: Edge Deployment
Deploy to edge devices, IoT gateways, or resource-constrained environments.
Pros:
- Minimal footprint (8MB binary)
- No external dependencies
- Works offline
- Low latency (local processing)
Best for:
- IoT devices
- Edge computing
- Offline-first applications
- Latency-sensitive workloads
Example: Raspberry Pi Deployment
# Cross-compile for ARM
GOOS=linux GOARCH=arm64 go build -o agent-arm64 main.go
# Deploy to device
scp agent-arm64 pi@192.168.1.100:/home/pi/aixgo/
scp config/agents.yaml pi@192.168.1.100:/home/pi/aixgo/config/
# Run as systemd service
ssh pi@192.168.1.100 'sudo systemctl start aixgo-agent'Systemd service file:
# /etc/systemd/system/aixgo-agent.service
[Unit]
Description=Aixgo Agent
After=network.target
[Service]
Type=simple
User=pi
WorkingDirectory=/home/pi/aixgo
ExecStart=/home/pi/aixgo/agent-arm64
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.targetConfiguration Management
Environment-Based Configs
Use different configs for different environments:
// main.go
package main
import (
"fmt"
"os"
"github.com/aixgo-dev/aixgo"
_ "github.com/aixgo-dev/aixgo/agents"
)
func main() {
env := os.Getenv("ENV")
if env == "" {
env = "local"
}
configPath := fmt.Sprintf("config/agents-%s.yaml", env)
if err := aixgo.Run(configPath); err != nil {
panic(err)
}
}Config files:
config/agents-local.yaml- Developmentconfig/agents-staging.yaml- Stagingconfig/agents-prod.yaml- Production
Secrets Management
Never hardcode API keys or secrets. Use environment variables or secret managers.
# config/agents-prod.yaml
supervisor:
name: coordinator
agents:
- name: analyzer
role: react
model: gpt-4-turbo
api_key: ${GROK_API_KEY} # From environment
prompt: 'Analyze the data'# Cloud Run
gcloud run deploy aixgo-agent \
--set-env-vars GROK_API_KEY=your-secret-key
# Kubernetes Secret
kubectl create secret generic aixgo-secrets \
--from-literal=GROK_API_KEY=your-secret-key
# Reference in deployment
env:
- name: GROK_API_KEY
valueFrom:
secretKeyRef:
name: aixgo-secrets
key: GROK_API_KEYHealth Checks & Monitoring
Health Endpoints
Implement health check endpoints for orchestration platforms:
// health.go
package main
import (
"net/http"
"time"
)
func healthHandler(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("OK"))
}
func readyHandler(w http.ResponseWriter, r *http.Request) {
// Check if agents are initialized
if !agentsReady() {
w.WriteHeader(http.StatusServiceUnavailable)
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("READY"))
}
func main() {
http.HandleFunc("/health", healthHandler)
http.HandleFunc("/ready", readyHandler)
go func() {
http.ListenAndServe(":8080", nil)
}()
// Start agents
aixgo.Run("config/agents.yaml")
}Metrics & Logging
Use structured logging and metrics for observability:
import (
"log/slog"
"os"
)
func main() {
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
slog.SetDefault(logger)
slog.Info("Starting Aixgo agent", "env", os.Getenv("ENV"))
if err := aixgo.Run("config/agents.yaml"); err != nil {
slog.Error("Agent failed", "error", err)
os.Exit(1)
}
}Scaling Strategies
Vertical Scaling
Increase resources for a single instance:
# Kubernetes resources
resources:
requests:
memory: '256Mi' # Increase from 128Mi
cpu: '200m' # Increase from 100m
limits:
memory: '1Gi'
cpu: '1000m'When to use:
- Single-instance bottleneck
- Memory-intensive LLM operations
- Before horizontal scaling
Horizontal Scaling
Add more instances:
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: aixgo-agent-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: aixgo-agent
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70When to use:
- High request volume
- Redundancy requirements
- Multi-region deployment
Best Practices
1. Use Minimal Base Images
Prefer FROM scratch or alpine for smallest attack surface:
FROM scratch # 0MB base
# or
FROM alpine:latest # ~5MB base2. Enable Observability from Day One
Configure OpenTelemetry before deploying:
# config/agents-prod.yaml
observability:
tracing: true
service_name: 'aixgo-prod'
exporter: 'otlp'
endpoint: 'otel-collector:4317'3. Set Resource Limits
Always define resource requests and limits:
resources:
requests: # Minimum guaranteed
memory: '128Mi'
cpu: '100m'
limits: # Maximum allowed
memory: '512Mi'
cpu: '500m'4. Use Rolling Updates
Deploy with zero downtime:
# Kubernetes deployment strategy
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 05. Implement Graceful Shutdown
Handle termination signals properly:
import (
"context"
"os"
"os/signal"
"syscall"
"time"
)
func main() {
ctx, cancel := context.WithCancel(context.Background())
// Handle shutdown signals
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
go func() {
<-sigChan
slog.Info("Shutdown signal received")
cancel()
}()
// Run agents with context
if err := aixgo.RunWithContext(ctx, "config/agents.yaml"); err != nil {
slog.Error("Agent failed", "error", err)
os.Exit(1)
}
}6. Monitor Cold Start Times
Track startup performance:
import "time"
func main() {
startTime := time.Now()
if err := aixgo.Run("config/agents.yaml"); err != nil {
panic(err)
}
slog.Info("Agent started", "duration_ms", time.Since(startTime).Milliseconds())
}Target: <100ms for serverless, <1s for containers
7. Use Configuration Validation
Validate configs before deployment:
# Pre-deployment validation
go run main.go --validate-config config/agents-prod.yamlPerformance Benchmarks
Real-world production metrics:
| Metric | Python (LangChain) | Aixgo | Improvement |
|---|---|---|---|
| Container Size | 1.2GB | 8MB | 150x smaller |
| Cold Start | 45 seconds | <100ms | 450x faster |
| Memory Baseline | 512MB | 50MB | 10x more efficient |
| Throughput | 500-1,000 req/s | 10,000 req/s | 10-20x higher |
Troubleshooting
High Memory Usage
Symptom: OOM kills, high memory consumption
Solutions:
- Reduce max_rounds to limit workflow iterations
- Implement message batching
- Increase memory limits
- Use pagination for large datasets
Slow Response Times
Symptom: High P99 latency
Solutions:
- Enable distributed tracing to identify bottlenecks
- Optimize LLM prompts (shorter = faster)
- Use faster models (gpt-3.5-turbo vs gpt-4)
- Add caching layer for repeated queries
Failed Health Checks
Symptom: Pods restarting, traffic not routing
Solutions:
- Increase initialDelaySeconds for slow startup
- Check agent initialization logic
- Verify configuration is valid
- Review logs for startup errors
Next Steps
- Observability & Monitoring - Set up comprehensive monitoring
- Building Docker Images - Optimize container builds
- Single Binary vs Distributed - Understand scaling patterns