# Provider Integration Guide

Integrate Aixgo with OpenAI, Anthropic, Google Gemini, xAI, Vertex AI, HuggingFace, and vector databases.

## Provider Status

### LLM Providers
| Provider | Status | Notes |
|---|---|---|
| OpenAI | AVAILABLE | Chat, streaming SSE, function calling, JSON mode |
| Anthropic (Claude) | AVAILABLE | Messages API, streaming SSE, tool use |
| Google Gemini | AVAILABLE | GenerateContent API, streaming SSE, function calling |
| xAI (Grok) | AVAILABLE | Chat, streaming SSE, function calling (OpenAI-compatible) |
| Vertex AI | AVAILABLE | Google Cloud AI Platform, streaming SSE, function calling |
| HuggingFace | AVAILABLE | Basic inference, streaming, Ollama/vLLM/cloud backends |
### Vector Databases
| Provider | Status | Notes |
|---|---|---|
| Firestore | AVAILABLE | Google Cloud serverless vector search |
| In-Memory | AVAILABLE | Development and testing |
| Qdrant | PLANNED | Planned for v0.2 |
| pgvector | PLANNED | Planned for v0.2 |
### Embedding Providers
| Provider | Status | Notes |
|---|---|---|
| OpenAI | AVAILABLE | text-embedding-3-small, text-embedding-3-large |
| HuggingFace API | AVAILABLE | Free inference API, 100+ models |
| HuggingFace TEI | AVAILABLE | Self-hosted high-performance server |
## LLM Providers

### OpenAI (GPT-4, GPT-3.5)

Supported models:

- `gpt-4` - Most capable, higher cost
- `gpt-4-turbo` - Faster, lower cost than GPT-4
- `gpt-3.5-turbo` - Fast, cost-effective

Configuration:

```yaml
# config/agents.yaml
agents:
  - name: analyzer
    role: react
    model: gpt-4-turbo
    provider: openai
    api_key: ${OPENAI_API_KEY}
    temperature: 0.7
    max_tokens: 1000
```

Environment variables:

```bash
export OPENAI_API_KEY=sk-...
```

Go code:

```go
import "github.com/aixgo-dev/aixgo/providers/openai"

agent := aixgo.NewAgent(
    aixgo.WithName("analyzer"),
    aixgo.WithModel("gpt-4-turbo"),
    aixgo.WithProvider(openai.Provider{
        APIKey: os.Getenv("OPENAI_API_KEY"),
    }),
)
```

Features:
- ✅ Chat completions
- ✅ Function calling (tools)
- ✅ Streaming SSE responses (sketch after this list)
- ✅ JSON mode
- ✅ Token usage tracking
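Streaming maps naturally onto a callback; a minimal sketch, assuming a hypothetical `WithStreamHandler` option (illustrative only, not confirmed Aixgo API):

```go
// WithStreamHandler is a hypothetical option name used for illustration.
// The idea: SSE chunks are surfaced incrementally instead of waiting for
// the full completion.
agent := aixgo.NewAgent(
    aixgo.WithName("streaming-analyzer"),
    aixgo.WithModel("gpt-4-turbo"),
    aixgo.WithProvider(openai.Provider{
        APIKey: os.Getenv("OPENAI_API_KEY"),
    }),
    aixgo.WithStreamHandler(func(chunk string) {
        fmt.Print(chunk) // render tokens as they arrive
    }),
)
```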
Pricing (as of 2025):
- GPT-4 Turbo: $0.01 per 1K input tokens, $0.03 per 1K output tokens
- GPT-3.5 Turbo: $0.0005 per 1K input tokens, $0.0015 per 1K output tokens
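For budgeting, the per-1K rates translate directly into a back-of-envelope estimator (rates hard-coded from the list above; verify current pricing before relying on it):

```go
// Rough request cost at the GPT-4 Turbo rates quoted above:
// $0.01 per 1K input tokens, $0.03 per 1K output tokens.
func gpt4TurboCostUSD(inputTokens, outputTokens int) float64 {
    return float64(inputTokens)/1000.0*0.01 + float64(outputTokens)/1000.0*0.03
}

// Example: gpt4TurboCostUSD(2000, 500) returns 0.035, i.e. 3.5 cents.
```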
### Anthropic (Claude)

Supported models:

- `claude-3-opus` - Most capable
- `claude-3-sonnet` - Balanced performance/cost
- `claude-3-haiku` - Fastest, lowest cost
Configuration:

```yaml
agents:
  - name: analyst
    role: react
    model: claude-3-sonnet
    provider: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    temperature: 0.5
    max_tokens: 2000
```

Environment variables:

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

Go code:

```go
import "github.com/aixgo-dev/aixgo/providers/anthropic"

agent := aixgo.NewAgent(
    aixgo.WithName("analyst"),
    aixgo.WithModel("claude-3-sonnet"),
    aixgo.WithProvider(anthropic.Provider{
        APIKey: os.Getenv("ANTHROPIC_API_KEY"),
    }),
)
```

Features:
- ✅ Long context window (200K tokens supported by API)
- ✅ Tool use
- ✅ Streaming SSE responses
- 🚧 Vision support (Planned)
Pricing:
- Claude 3 Opus: $0.015 per 1K input tokens, $0.075 per 1K output tokens
- Claude 3 Sonnet: $0.003 per 1K input tokens, $0.015 per 1K output tokens
- Claude 3 Haiku: $0.00025 per 1K input tokens, $0.00125 per 1K output tokens
### Google Vertex AI (Gemini)

Supported models:

- `gemini-1.5-pro` - Most capable
- `gemini-1.5-flash` - Fast, cost-effective
Configuration:

```yaml
agents:
  - name: processor
    role: react
    model: gemini-1.5-flash
    provider: vertexai
    project_id: ${GCP_PROJECT_ID}
    location: us-central1
    temperature: 0.8
```

Authentication:

```bash
# Service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Or use gcloud default credentials
gcloud auth application-default login
```

Go code:

```go
import "github.com/aixgo-dev/aixgo/providers/vertexai"

agent := aixgo.NewAgent(
    aixgo.WithName("processor"),
    aixgo.WithModel("gemini-1.5-flash"),
    aixgo.WithProvider(vertexai.Provider{
        ProjectID: os.Getenv("GCP_PROJECT_ID"),
        Location:  "us-central1",
    }),
)
```

Features:
- ✅ Long context (2M tokens for Gemini 1.5)
- ✅ Multimodal (text, images, video)
- ✅ Function calling
- ✅ Grounding with Google Search
Pricing:
- Gemini 1.5 Pro: $0.00125 per 1K input chars, $0.005 per 1K output chars
- Gemini 1.5 Flash: $0.000125 per 1K input chars, $0.000375 per 1K output chars
### HuggingFace Inference API
Supported backends:
- HuggingFace Inference API (cloud)
- Ollama (local)
- vLLM (self-hosted)
Supported models:
- Any model on HuggingFace with Inference API enabled
- Popular: `meta-llama/Llama-2-70b-chat-hf`, `mistralai/Mixtral-8x7B-Instruct-v0.1`
Configuration:

```yaml
agents:
  - name: classifier
    role: react
    model: meta-llama/Llama-2-70b-chat-hf
    provider: huggingface
    api_key: ${HUGGINGFACE_API_KEY}
    endpoint: https://api-inference.huggingface.co
```

Environment variables:

```bash
export HUGGINGFACE_API_KEY=hf_...
```

Go code:

```go
import "github.com/aixgo-dev/aixgo/providers/huggingface"

agent := aixgo.NewAgent(
    aixgo.WithName("classifier"),
    aixgo.WithModel("meta-llama/Llama-2-70b-chat-hf"),
    aixgo.WithProvider(huggingface.Provider{
        APIKey:   os.Getenv("HUGGINGFACE_API_KEY"),
        Endpoint: "https://api-inference.huggingface.co",
    }),
)
```

Features:
- ✅ Open-source models
- ✅ Self-hosted option (Ollama, vLLM)
- ✅ Cloud backends
- ✅ Streaming support
- ✅ Custom fine-tuned models
- ⚠️ Tool calling support (model-dependent)
Pricing:
- Pay-per-request or dedicated endpoints
- Varies by model size and usage
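Since the provider also targets the Ollama and vLLM backends listed above, here is a minimal sketch of pointing it at a local Ollama server; whether the `Endpoint` field accepts an Ollama URL this way is an assumption to verify against the provider docs:

```go
// Sketch: HuggingFace provider aimed at a local Ollama server
// (11434 is Ollama's default port). Illustrative only; confirm the
// provider's expected endpoint format before use.
agent := aixgo.NewAgent(
    aixgo.WithName("local-classifier"),
    aixgo.WithModel("llama2"),
    aixgo.WithProvider(huggingface.Provider{
        Endpoint: "http://localhost:11434",
    }),
)
```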
### xAI (Grok)

Supported models:

- `grok-beta` - Latest Grok model

Configuration:

```yaml
agents:
  - name: researcher
    role: react
    model: grok-beta
    provider: xai
    api_key: ${XAI_API_KEY}
```

Environment variables:

```bash
export XAI_API_KEY=xai-...
```

Features:
- ✅ Real-time web access
- ✅ Tool calling
- ✅ Long context window
## Provider Comparison
| Provider | Best For | Context Length | Tool Support | Cost |
|---|---|---|---|---|
| OpenAI | General purpose, function calling | 128K tokens | ✅ Excellent | $$$ |
| Anthropic | Long documents, safety | 200K tokens | ✅ Excellent | $$$$ |
| Google Vertex | Multimodal, grounding | 2M tokens | ✅ Good | $$ |
| HuggingFace | Open source, custom models | Varies | ⚠️ Limited | $ |
| xAI | Real-time info, research | 128K tokens | ✅ Good | $$$ |
## Multi-Provider Strategy

### Fallback Configuration

Use multiple providers with automatic fallback:

```yaml
agents:
  - name: resilient-analyzer
    role: react
    providers:
      - model: gpt-4-turbo
        provider: openai
        api_key: ${OPENAI_API_KEY}
      - model: claude-3-sonnet
        provider: anthropic
        api_key: ${ANTHROPIC_API_KEY}
      - model: gemini-1.5-flash
        provider: vertexai
        project_id: ${GCP_PROJECT_ID}
    fallback_strategy: cascade  # Try each in order
```

If OpenAI fails, Aixgo automatically tries Anthropic, then Google.
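If you wire providers up in Go instead of YAML, the cascade is simple to hand-roll. A minimal sketch, assuming a generic `Provider` interface with a `Complete` method (an illustration, not Aixgo's confirmed API):

```go
import (
    "context"
    "fmt"
)

// Provider is an assumed interface used for illustration only.
type Provider interface {
    Complete(ctx context.Context, prompt string) (string, error)
}

// completeWithFallback tries each provider in order and returns the
// first successful response, mirroring fallback_strategy: cascade.
func completeWithFallback(ctx context.Context, prompt string, providers []Provider) (string, error) {
    var lastErr error
    for _, p := range providers {
        out, err := p.Complete(ctx, prompt)
        if err == nil {
            return out, nil
        }
        lastErr = err // remember the failure, try the next provider
    }
    return "", fmt.Errorf("all providers failed: %w", lastErr)
}
```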
### Cost Optimization

Route based on complexity:

```yaml
# Simple tasks: cheap model
- name: simple-classifier
  role: react
  model: gpt-3.5-turbo
  provider: openai

# Complex reasoning: capable model
- name: complex-analyzer
  role: react
  model: gpt-4-turbo
  provider: openai
```
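The split is typically driven by a small router in application code; an illustrative heuristic (the length threshold, keyword, and routing function are arbitrary choices, not Aixgo conventions):

```go
import "strings"

// pickAgent routes short, simple prompts to the cheap model and longer
// analytical prompts to the capable one. Tune the heuristic for your
// own workload; this is a sketch, not a prescribed policy.
func pickAgent(prompt string) string {
    if len(prompt) < 280 && !strings.Contains(strings.ToLower(prompt), "analyze") {
        return "simple-classifier" // backed by gpt-3.5-turbo
    }
    return "complex-analyzer" // backed by gpt-4-turbo
}
```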
### Region-Specific Routing

```yaml
# US region: Vertex AI (low latency)
- name: us-agent
  role: react
  model: gemini-1.5-flash
  provider: vertexai
  location: us-central1

# EU region: OpenAI EU endpoint
- name: eu-agent
  role: react
  model: gpt-4-turbo
  provider: openai
  endpoint: https://api.openai.com/v1  # or EU-specific endpoint
```

## Vector Databases & Embeddings
### Overview
Aixgo provides integrated support for vector databases and embeddings, enabling Retrieval-Augmented Generation (RAG) systems. The architecture separates embedding generation from vector storage for maximum flexibility.
Architecture:

```
Documents → Embeddings Service → Vector Database → Semantic Search
```
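This separation shows up as two independent pieces in code; an illustrative sketch of their shapes (interface names, and the `SearchResult` type, are assumptions inferred from the `embeddings` and `vectorstore` packages used later in this guide):

```go
import (
    "context"

    "github.com/aixgo-dev/aixgo/pkg/vectorstore"
)

// Embedder and VectorStore are assumed shapes, not Aixgo's exact
// definitions. The point: either side can be swapped independently.
type Embedder interface {
    Embed(ctx context.Context, text string) ([]float32, error)
    Dimensions() int
}

type VectorStore interface {
    Upsert(ctx context.Context, docs []vectorstore.Document) error
    Search(ctx context.Context, q vectorstore.SearchQuery) ([]vectorstore.SearchResult, error)
}
```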
### Embedding Providers

#### OpenAI Embeddings
Best for: Production deployments, highest quality
Configuration:

```yaml
embeddings:
  provider: openai
  openai:
    api_key: ${OPENAI_API_KEY}
    model: text-embedding-3-small  # or text-embedding-3-large
```

Go code:
import "github.com/aixgo-dev/aixgo/pkg/embeddings"
config := embeddings.Config{
Provider: "openai",
OpenAI: &embeddings.OpenAIConfig{
APIKey: os.Getenv("OPENAI_API_KEY"),
Model: "text-embedding-3-small",
},
}
embSvc, err := embeddings.New(config)
if err != nil {
log.Fatal(err)
}
defer embSvc.Close()
// Generate embedding
embedding, err := embSvc.Embed(ctx, "Your text here")Models:
text-embedding-3-small: 1536 dimensions, $0.02 per 1M tokenstext-embedding-3-large: 3072 dimensions, $0.13 per 1M tokenstext-embedding-ada-002: 1536 dimensions (legacy)
#### HuggingFace Inference API
Best for: Development, cost-sensitive deployments
Configuration:

```yaml
embeddings:
  provider: huggingface
  huggingface:
    model: sentence-transformers/all-MiniLM-L6-v2
    api_key: ${HUGGINGFACE_API_KEY}  # Optional
    wait_for_model: true
    use_cache: true
```

Popular models:

- `sentence-transformers/all-MiniLM-L6-v2`: 384 dims, fast
- `BAAI/bge-large-en-v1.5`: 1024 dims, excellent quality
- `thenlper/gte-large`: 1024 dims, multilingual
Pricing: FREE (Inference API) with rate limits
#### HuggingFace TEI (Self-Hosted)
Best for: High-throughput production workloads
Docker setup:

```bash
docker run -d \
  --name tei \
  -p 8080:8080 \
  --gpus all \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5
```

Configuration:

```yaml
embeddings:
  provider: huggingface_tei
  huggingface_tei:
    endpoint: http://localhost:8080
    model: BAAI/bge-large-en-v1.5
    normalize: true
```
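Before wiring TEI into Aixgo, you can smoke-test the server directly; a minimal sketch against TEI's `/embed` route (request shape `{"inputs": ...}`, one vector per input):

```go
import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// Smoke test against a local TEI server: POST /embed with
// {"inputs": "..."} and decode the returned vectors.
func teiEmbed(text string) ([][]float32, error) {
    payload, err := json.Marshal(map[string]string{"inputs": text})
    if err != nil {
        return nil, err
    }
    resp, err := http.Post("http://localhost:8080/embed", "application/json", bytes.NewReader(payload))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("TEI returned %s", resp.Status)
    }
    var vectors [][]float32
    if err := json.NewDecoder(resp.Body).Decode(&vectors); err != nil {
        return nil, err
    }
    return vectors, nil
}
```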
### Vector Store Providers

#### Firestore Vector Search
Best for: Serverless production deployments on GCP
Setup:

```bash
# Enable Firestore
gcloud services enable firestore.googleapis.com

# Create vector index
gcloud firestore indexes composite create \
  --collection-group=embeddings \
  --query-scope=COLLECTION \
  --field-config=field-path=embedding,vector-config='{"dimension":"384","flat":{}}'
```

Configuration:

```yaml
vectorstore:
  provider: firestore
  embedding_dimensions: 384
  firestore:
    project_id: ${GCP_PROJECT_ID}
    collection: embeddings
    credentials_file: /path/to/key.json  # Optional
```

Go code:
import "github.com/aixgo-dev/aixgo/pkg/vectorstore"
config := vectorstore.Config{
Provider: "firestore",
EmbeddingDimensions: 384,
Firestore: &vectorstore.FirestoreConfig{
ProjectID: os.Getenv("GCP_PROJECT_ID"),
Collection: "embeddings",
},
}
store, err := vectorstore.New(config)
if err != nil {
log.Fatal(err)
}
defer store.Close()
// Upsert documents
doc := vectorstore.Document{
ID: "doc-1",
Content: "Your document content",
Embedding: embedding,
Metadata: map[string]interface{}{
"category": "documentation",
},
}
store.Upsert(ctx, []vectorstore.Document{doc})
// Search
results, err := store.Search(ctx, vectorstore.SearchQuery{
Embedding: queryEmbedding,
TopK: 5,
MinScore: 0.7,
})Features:
- ✅ Serverless, auto-scaling
- ✅ Persistent storage
- ✅ Real-time updates
- ✅ ACID transactions
Pricing: ~$0.06 per 100K reads + storage
#### In-Memory Vector Store
Best for: Development, testing, prototyping
Configuration:

```yaml
vectorstore:
  provider: memory
  embedding_dimensions: 384
  memory:
    max_documents: 10000
```

Features:
- ✅ Zero setup
- ✅ Fast for small datasets
- ❌ Data lost on restart
- ❌ Limited capacity
#### Qdrant (Planned - v0.2)

High-performance dedicated vector database:

```yaml
# Coming soon
vectorstore:
  provider: qdrant
  embedding_dimensions: 384
  qdrant:
    host: localhost
    port: 6333
    collection: knowledge_base
```

#### pgvector (Planned - v0.2)

PostgreSQL extension for vector search:

```yaml
# Coming soon
vectorstore:
  provider: pgvector
  embedding_dimensions: 384
  pgvector:
    connection_string: postgresql://user:pass@localhost/db
    table: embeddings
```

### Complete RAG Example
```go
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/aixgo-dev/aixgo/pkg/embeddings"
    "github.com/aixgo-dev/aixgo/pkg/vectorstore"
)

func main() {
    ctx := context.Background()

    // Setup embeddings
    embConfig := embeddings.Config{
        Provider: "huggingface",
        HuggingFace: &embeddings.HuggingFaceConfig{
            Model: "sentence-transformers/all-MiniLM-L6-v2",
        },
    }
    embSvc, err := embeddings.New(embConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer embSvc.Close()

    // Setup vector store
    storeConfig := vectorstore.Config{
        Provider:            "firestore",
        EmbeddingDimensions: embSvc.Dimensions(),
        Firestore: &vectorstore.FirestoreConfig{
            ProjectID:  "my-project",
            Collection: "knowledge_base",
        },
    }
    store, err := vectorstore.New(storeConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer store.Close()

    // Index documents
    docs := []string{
        "Aixgo is a production-grade AI framework",
        "RAG combines retrieval with generation",
    }
    for i, content := range docs {
        emb, _ := embSvc.Embed(ctx, content)
        doc := vectorstore.Document{
            ID:        fmt.Sprintf("doc-%d", i),
            Content:   content,
            Embedding: emb,
        }
        store.Upsert(ctx, []vectorstore.Document{doc})
    }

    // Search
    query := "What is Aixgo?"
    queryEmb, _ := embSvc.Embed(ctx, query)
    results, _ := store.Search(ctx, vectorstore.SearchQuery{
        Embedding: queryEmb,
        TopK:      3,
    })
    for _, result := range results {
        fmt.Printf("Score: %.2f - %s\n", result.Score, result.Document.Content)
    }
}
```

### Provider Comparison: Embeddings
| Provider | Cost | Quality | Speed | Best For |
|---|---|---|---|---|
| OpenAI | $0.02-0.13/1M tokens | Excellent | Fast | Production |
| HuggingFace API | Free | Good-Excellent | Medium | Development |
| HuggingFace TEI | Free (self-host) | Good-Excellent | Very Fast | High-volume |
### Provider Comparison: Vector Stores
| Provider | Persistence | Scalability | Setup | Cost |
|---|---|---|---|---|
| Memory | No | Low | None | Free |
| Firestore | Yes | Unlimited | Medium | $$ |
| Qdrant (planned) | Yes | Very High | Medium | Self-host |
| pgvector (planned) | Yes | High | Medium | Self-host |
### Learn More
- Vector Databases Guide - Complete RAG implementation guide
- Extending Aixgo - Add custom vector store providers
- RAG Agent Example - Full working example
## API Key Management

### Environment Variables
```bash
# .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GCP_PROJECT_ID=my-project
HUGGINGFACE_API_KEY=hf_...
```

Load with:

```bash
export $(cat .env | xargs)
```

### Kubernetes Secrets
```bash
kubectl create secret generic llm-keys \
  --from-literal=OPENAI_API_KEY=sk-... \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-...
```

Reference in deployment:

```yaml
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-keys
        key: OPENAI_API_KEY
```

### Cloud Secret Managers
Google Secret Manager:

```go
import (
    secretmanager "cloud.google.com/go/secretmanager/apiv1"
    "cloud.google.com/go/secretmanager/apiv1/secretmanagerpb"
)

func getAPIKey(ctx context.Context, secretName string) (string, error) {
    client, err := secretmanager.NewClient(ctx)
    if err != nil {
        return "", err
    }
    defer client.Close()
    result, err := client.AccessSecretVersion(ctx,
        &secretmanagerpb.AccessSecretVersionRequest{Name: secretName})
    if err != nil {
        return "", err
    }
    return string(result.Payload.Data), nil
}
```

## Rate Limiting & Retries
### Provider Rate Limits
| Provider | Tier | Requests/Min | Tokens/Min |
|---|---|---|---|
| OpenAI | Free | 3 | 40,000 |
| OpenAI | Paid Tier 1 | 500 | 90,000 |
| Anthropic | Free | 5 | 25,000 |
| Anthropic | Paid | 50 | 100,000 |
| Vertex AI | Default | 60 | 60,000 |
### Retry Configuration

```yaml
agents:
  - name: resilient-agent
    role: react
    model: gpt-4-turbo
    provider: openai
    retry:
      max_attempts: 3
      initial_backoff: 1s
      max_backoff: 10s
      multiplier: 2
      retry_on:
        - rate_limit
        - timeout
        - server_error
```
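The same policy is straightforward to express in Go if you call providers directly; a minimal sketch of the backoff loop the YAML above describes (`isRetryable` stands in for your own error classification):

```go
import (
    "context"
    "time"
)

// withRetry mirrors the YAML policy above: up to 3 attempts, 1s initial
// backoff doubling per attempt, capped at 10s. isRetryable classifies
// rate-limit, timeout, and server errors as worth retrying.
func withRetry(ctx context.Context, call func() error, isRetryable func(error) bool) error {
    backoff := time.Second
    var err error
    for attempt := 0; attempt < 3; attempt++ {
        if err = call(); err == nil || !isRetryable(err) {
            return err
        }
        select {
        case <-time.After(backoff):
        case <-ctx.Done():
            return ctx.Err()
        }
        if backoff *= 2; backoff > 10*time.Second {
            backoff = 10 * time.Second
        }
    }
    return err
}
```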
## Monitoring Provider Performance

### Track Latency by Provider
import "github.com/prometheus/client_golang/prometheus"
var providerLatency = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "llm_provider_latency_seconds",
Help: "LLM API call latency by provider",
},
[]string{"provider", "model"},
)
// Aixgo tracks this automaticallyCost Tracking
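If you also record your own observations against this histogram, it needs registering and a timing wrapper; a minimal sketch:

```go
import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
)

func init() {
    // Register once at startup so the histogram is exported on /metrics.
    prometheus.MustRegister(providerLatency)
}

// observeCall times one provider call and records it under its labels.
func observeCall(provider, model string, call func() error) error {
    start := time.Now()
    err := call()
    providerLatency.WithLabelValues(provider, model).Observe(time.Since(start).Seconds())
    return err
}
```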
### Cost Tracking

```yaml
observability:
  cost_tracking: true
  cost_alert_threshold: 100  # Alert if daily cost > $100
```

## Best Practices
### 1. Use Environment-Specific Keys
```bash
# Development
OPENAI_API_KEY=sk-dev-...

# Production
OPENAI_API_KEY=sk-prod-...
```

### 2. Implement Fallback Providers
Always have a backup provider to avoid a single point of failure.

### 3. Monitor Token Usage
Track and alert on unexpected token consumption:
```yaml
observability:
  llm_observability:
    enabled: true
    track_tokens: true
    daily_token_limit: 1000000
```

### 4. Choose Models Strategically
- Simple tasks: gpt-3.5-turbo, gemini-flash, claude-haiku
- Complex reasoning: gpt-4-turbo, claude-3-opus
- Long documents: claude-3-opus (200K), gemini-pro (2M)
- Cost-sensitive: gemini-flash, gpt-3.5-turbo
### 5. Use Caching

Cache LLM responses for repeated queries:

```go
import "github.com/aixgo-dev/aixgo/cache"

agent := aixgo.NewAgent(
    aixgo.WithName("cached-analyzer"),
    aixgo.WithCache(cache.NewRedisCache("localhost:6379")),
    aixgo.WithCacheTTL(1 * time.Hour),
)
```

## Troubleshooting
### Authentication Errors
Error: 401 Unauthorized
Solution:
- Verify API key is correct
- Check key has not expired
- Ensure the environment variable is loaded (see the guard sketch below)
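A cheap startup guard for the last point:

```go
import (
    "log"
    "os"
)

// requireEnv fails fast at startup when a key is missing, which is
// easier to diagnose than a 401 deep inside a request.
func requireEnv(name string) string {
    v := os.Getenv(name)
    if v == "" {
        log.Fatalf("%s is not set; load your .env or secret first", name)
    }
    return v
}
```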
### Rate Limit Exceeded
Error: 429 Too Many Requests
Solution:
- Implement exponential backoff
- Reduce request rate
- Upgrade to higher tier
- Add multiple API keys for rotation
### Timeout Errors

Error: Request timeout

Solution:

```yaml
agents:
  - name: patient-agent
    role: react
    model: gpt-4-turbo
    timeout: 60s  # Increase timeout
```

## Next Steps
- Type Safety & LLM Integration - Type-safe provider usage
- Observability & Monitoring - Monitor provider performance
- Production Deployment - Deploy with secrets management