ALPHA RELEASE (v0.1) — Aixgo is in active development. Not all features are complete. Production release planned for late 2025.

Provider Integration Guide

Integrate Aixgo with OpenAI, Anthropic, Google Vertex AI, HuggingFace, and vector databases.

Provider Status

LLM Providers

| Provider | Status | Notes |
|---|---|---|
| OpenAI | AVAILABLE | Chat, streaming SSE, function calling, JSON mode |
| Anthropic (Claude) | AVAILABLE | Messages API, streaming SSE, tool use |
| Google Gemini | AVAILABLE | GenerateContent API, streaming SSE, function calling |
| xAI (Grok) | AVAILABLE | Chat, streaming SSE, function calling (OpenAI-compatible) |
| Vertex AI | AVAILABLE | Google Cloud AI Platform, streaming SSE, function calling |
| HuggingFace | AVAILABLE | Basic inference, streaming, Ollama/vLLM/cloud backends |

Vector Databases

| Provider | Status | Notes |
|---|---|---|
| Firestore | AVAILABLE | Google Cloud serverless vector search |
| In-Memory | AVAILABLE | Development and testing |
| Qdrant | PLANNED | Planned for v0.2 |
| pgvector | PLANNED | Planned for v0.2 |

Embedding Providers

| Provider | Status | Notes |
|---|---|---|
| OpenAI | AVAILABLE | text-embedding-3-small, text-embedding-3-large |
| HuggingFace API | AVAILABLE | Free inference API, 100+ models |
| HuggingFace TEI | AVAILABLE | Self-hosted high-performance server |

LLM Providers

OpenAI (GPT-4, GPT-3.5)

Supported models:

  • gpt-4 - Most capable, higher cost
  • gpt-4-turbo - Faster, lower cost than GPT-4
  • gpt-3.5-turbo - Fast, cost-effective

Configuration:

# config/agents.yaml
agents:
  - name: analyzer
    role: react
    model: gpt-4-turbo
    provider: openai
    api_key: ${OPENAI_API_KEY}
    temperature: 0.7
    max_tokens: 1000

Environment variables:

export OPENAI_API_KEY=sk-...

Go code:

import (
    "os"

    "github.com/aixgo-dev/aixgo"
    "github.com/aixgo-dev/aixgo/providers/openai"
)

agent := aixgo.NewAgent(
    aixgo.WithName("analyzer"),
    aixgo.WithModel("gpt-4-turbo"),
    aixgo.WithProvider(openai.Provider{
        APIKey: os.Getenv("OPENAI_API_KEY"),
    }),
)

Features:

  • ✅ Chat completions
  • ✅ Function calling (tools)
  • ✅ Streaming SSE responses
  • ✅ JSON mode
  • ✅ Token usage tracking

Pricing (as of 2025):

  • GPT-4 Turbo: $0.01 per 1K input tokens, $0.03 per 1K output tokens
  • GPT-3.5 Turbo: $0.0005 per 1K input tokens, $0.0015 per 1K output tokens
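
The per-token rates above translate directly into a cost estimate for a call. A minimal sketch with the rates from the table hard-coded (in practice, pull current rates from the provider's pricing page rather than constants):

```go
package main

import "fmt"

// rate holds USD cost per 1K input/output tokens, from the table above.
type rate struct{ in, out float64 }

var rates = map[string]rate{
	"gpt-4-turbo":   {0.01, 0.03},
	"gpt-3.5-turbo": {0.0005, 0.0015},
}

// estimateCost returns the USD cost of a single completion call.
func estimateCost(model string, inputTokens, outputTokens int) float64 {
	r := rates[model]
	return float64(inputTokens)/1000*r.in + float64(outputTokens)/1000*r.out
}

func main() {
	// 2,000 prompt tokens + 500 completion tokens on GPT-4 Turbo
	fmt.Printf("$%.4f\n", estimateCost("gpt-4-turbo", 2000, 500)) // $0.0350
}
```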

Anthropic (Claude)

Supported models:

  • claude-3-opus - Most capable
  • claude-3-sonnet - Balanced performance/cost
  • claude-3-haiku - Fastest, lowest cost

Configuration:

agents:
  - name: analyst
    role: react
    model: claude-3-sonnet
    provider: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    temperature: 0.5
    max_tokens: 2000

Environment variables:

export ANTHROPIC_API_KEY=sk-ant-...

Go code:

import (
    "os"

    "github.com/aixgo-dev/aixgo"
    "github.com/aixgo-dev/aixgo/providers/anthropic"
)

agent := aixgo.NewAgent(
    aixgo.WithName("analyst"),
    aixgo.WithModel("claude-3-sonnet"),
    aixgo.WithProvider(anthropic.Provider{
        APIKey: os.Getenv("ANTHROPIC_API_KEY"),
    }),
)

Features:

  • ✅ Long context window (200K tokens supported by API)
  • ✅ Tool use
  • ✅ Streaming SSE responses
  • 🚧 Vision support (Planned)

Pricing:

  • Claude 3 Opus: $0.015 per 1K input tokens, $0.075 per 1K output tokens
  • Claude 3 Sonnet: $0.003 per 1K input tokens, $0.015 per 1K output tokens
  • Claude 3 Haiku: $0.00025 per 1K input tokens, $0.00125 per 1K output tokens

Google Vertex AI (Gemini)

Supported models:

  • gemini-1.5-pro - Most capable
  • gemini-1.5-flash - Fast, cost-effective

Configuration:

agents:
  - name: processor
    role: react
    model: gemini-1.5-flash
    provider: vertexai
    project_id: ${GCP_PROJECT_ID}
    location: us-central1
    temperature: 0.8

Authentication:

# Service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Or use gcloud default credentials
gcloud auth application-default login

Go code:

import (
    "os"

    "github.com/aixgo-dev/aixgo"
    "github.com/aixgo-dev/aixgo/providers/vertexai"
)

agent := aixgo.NewAgent(
    aixgo.WithName("processor"),
    aixgo.WithModel("gemini-1.5-flash"),
    aixgo.WithProvider(vertexai.Provider{
        ProjectID: os.Getenv("GCP_PROJECT_ID"),
        Location:  "us-central1",
    }),
)

Features:

  • ✅ Long context (2M tokens for Gemini 1.5)
  • ✅ Multimodal (text, images, video)
  • ✅ Function calling
  • ✅ Grounding with Google Search

Pricing:

  • Gemini 1.5 Pro: $0.00125 per 1K input chars, $0.005 per 1K output chars
  • Gemini 1.5 Flash: $0.000125 per 1K input chars, $0.000375 per 1K output chars

HuggingFace (Inference API, Ollama, vLLM)

Supported backends:

  • HuggingFace Inference API (cloud)
  • Ollama (local)
  • vLLM (self-hosted)

Supported models:

  • Any model on HuggingFace with Inference API enabled
  • Popular: meta-llama/Llama-2-70b-chat-hf, mistralai/Mixtral-8x7B-Instruct-v0.1

Configuration:

agents:
  - name: classifier
    role: react
    model: meta-llama/Llama-2-70b-chat-hf
    provider: huggingface
    api_key: ${HUGGINGFACE_API_KEY}
    endpoint: https://api-inference.huggingface.co

Environment variables:

export HUGGINGFACE_API_KEY=hf_...

Go code:

import (
    "os"

    "github.com/aixgo-dev/aixgo"
    "github.com/aixgo-dev/aixgo/providers/huggingface"
)

agent := aixgo.NewAgent(
    aixgo.WithName("classifier"),
    aixgo.WithModel("meta-llama/Llama-2-70b-chat-hf"),
    aixgo.WithProvider(huggingface.Provider{
        APIKey:   os.Getenv("HUGGINGFACE_API_KEY"),
        Endpoint: "https://api-inference.huggingface.co",
    }),
)

Features:

  • ✅ Open-source models
  • ✅ Self-hosted option (Ollama, vLLM)
  • ✅ Cloud backends
  • ✅ Streaming support
  • ✅ Custom fine-tuned models
  • ⚠️ Tool calling support (model-dependent)

Pricing:

  • Pay-per-request or dedicated endpoints
  • Varies by model size and usage

xAI (Grok)

Supported models:

  • grok-beta - Latest Grok model

Configuration:

agents:
  - name: researcher
    role: react
    model: grok-beta
    provider: xai
    api_key: ${XAI_API_KEY}

Environment variables:

export XAI_API_KEY=xai-...

Features:

  • ✅ Real-time web access
  • ✅ Tool calling
  • ✅ Long context window

Provider Comparison

| Provider | Best For | Context Length | Tool Support | Cost |
|---|---|---|---|---|
| OpenAI | General purpose, function calling | 128K tokens | ✅ Excellent | $$$ |
| Anthropic | Long documents, safety | 200K tokens | ✅ Excellent | $$$$ |
| Google Vertex | Multimodal, grounding | 2M tokens | ✅ Good | $$ |
| HuggingFace | Open source, custom models | Varies | ⚠️ Limited | $ |
| xAI | Real-time info, research | 128K tokens | ✅ Good | $$$ |

Multi-Provider Strategy

Fallback Configuration

Use multiple providers with automatic fallback:

agents:
  - name: resilient-analyzer
    role: react
    providers:
      - model: gpt-4-turbo
        provider: openai
        api_key: ${OPENAI_API_KEY}
      - model: claude-3-sonnet
        provider: anthropic
        api_key: ${ANTHROPIC_API_KEY}
      - model: gemini-1.5-flash
        provider: vertexai
        project_id: ${GCP_PROJECT_ID}
    fallback_strategy: cascade # Try each in order

If OpenAI fails, automatically try Anthropic, then Google.

Cost Optimization

Route based on complexity:

# Simple tasks: cheap model
- name: simple-classifier
  role: react
  model: gpt-3.5-turbo
  provider: openai

# Complex reasoning: capable model
- name: complex-analyzer
  role: react
  model: gpt-4-turbo
  provider: openai

Region-Specific Routing

# US region: Vertex AI (low latency)
- name: us-agent
  role: react
  model: gemini-1.5-flash
  provider: vertexai
  location: us-central1

# EU region: OpenAI EU endpoint
- name: eu-agent
  role: react
  model: gpt-4-turbo
  provider: openai
  endpoint: https://api.openai.com/v1 # or EU-specific endpoint

Vector Databases & Embeddings

Overview

Aixgo provides integrated support for vector databases and embeddings, enabling Retrieval-Augmented Generation (RAG) systems. The architecture separates embedding generation from vector storage for maximum flexibility.

Architecture:

Documents → Embeddings Service → Vector Database → Semantic Search
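
The retrieval step of this pipeline is nearest-neighbour search over embedding vectors, typically by cosine similarity. A self-contained sketch of that core with toy 3-dimensional vectors (real embeddings have hundreds of dimensions and come from an embedding provider):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type doc struct {
	id  string
	vec []float64
}

// search returns doc IDs ranked by cosine similarity to the query vector.
func search(docs []doc, query []float64) []string {
	sort.Slice(docs, func(i, j int) bool {
		return cosine(docs[i].vec, query) > cosine(docs[j].vec, query)
	})
	ids := make([]string, len(docs))
	for i, d := range docs {
		ids[i] = d.id
	}
	return ids
}

func main() {
	docs := []doc{
		{"a", []float64{1, 0, 0}},
		{"b", []float64{0, 1, 0}},
	}
	fmt.Println(search(docs, []float64{0.9, 0.1, 0})) // [a b]
}
```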

Embedding Providers

OpenAI Embeddings

Best for: Production deployments, highest quality

Configuration:

embeddings:
  provider: openai
  openai:
    api_key: ${OPENAI_API_KEY}
    model: text-embedding-3-small # or text-embedding-3-large

Go code:

import (
    "context"
    "log"
    "os"

    "github.com/aixgo-dev/aixgo/pkg/embeddings"
)

ctx := context.Background()

config := embeddings.Config{
    Provider: "openai",
    OpenAI: &embeddings.OpenAIConfig{
        APIKey: os.Getenv("OPENAI_API_KEY"),
        Model:  "text-embedding-3-small",
    },
}

embSvc, err := embeddings.New(config)
if err != nil {
    log.Fatal(err)
}
defer embSvc.Close()

// Generate embedding
embedding, err := embSvc.Embed(ctx, "Your text here")

Models:

  • text-embedding-3-small: 1536 dimensions, $0.02 per 1M tokens
  • text-embedding-3-large: 3072 dimensions, $0.13 per 1M tokens
  • text-embedding-ada-002: 1536 dimensions (legacy)

HuggingFace Inference API

Best for: Development, cost-sensitive deployments

Configuration:

embeddings:
  provider: huggingface
  huggingface:
    model: sentence-transformers/all-MiniLM-L6-v2
    api_key: ${HUGGINGFACE_API_KEY} # Optional
    wait_for_model: true
    use_cache: true

Popular models:

  • sentence-transformers/all-MiniLM-L6-v2: 384 dims, fast
  • BAAI/bge-large-en-v1.5: 1024 dims, excellent quality
  • thenlper/gte-large: 1024 dims, multilingual

Pricing: FREE (Inference API) with rate limits

HuggingFace TEI (Self-Hosted)

Best for: High-throughput production workloads

Docker setup:

docker run -d \
  --name tei \
  -p 8080:8080 \
  --gpus all \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5

Configuration:

embeddings:
  provider: huggingface_tei
  huggingface_tei:
    endpoint: http://localhost:8080
    model: BAAI/bge-large-en-v1.5
    normalize: true
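
`normalize: true` asks TEI to return unit-length vectors, which makes cosine similarity equivalent to a plain dot product. The operation itself is just L2 normalization, sketched here:

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales v to unit length (L2 norm 1), in place.
// A zero vector is left unchanged.
func normalize(v []float64) {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	n := math.Sqrt(sum)
	if n == 0 {
		return
	}
	for i := range v {
		v[i] /= n
	}
}

func main() {
	v := []float64{3, 4}
	normalize(v)
	fmt.Println(v) // [0.6 0.8]
}
```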

Vector Store Providers

Firestore Vector Search

Best for: Serverless production deployments on GCP

Setup:

# Enable Firestore
gcloud services enable firestore.googleapis.com

# Create vector index
gcloud firestore indexes composite create \
  --collection-group=embeddings \
  --query-scope=COLLECTION \
  --field-config=field-path=embedding,vector-config='{"dimension":"384","flat":{}}'

Configuration:

vectorstore:
  provider: firestore
  embedding_dimensions: 384
  firestore:
    project_id: ${GCP_PROJECT_ID}
    collection: embeddings
    credentials_file: /path/to/key.json # Optional

Go code:

import (
    "context"
    "log"
    "os"

    "github.com/aixgo-dev/aixgo/pkg/vectorstore"
)

ctx := context.Background()

config := vectorstore.Config{
    Provider:            "firestore",
    EmbeddingDimensions: 384,
    Firestore: &vectorstore.FirestoreConfig{
        ProjectID:  os.Getenv("GCP_PROJECT_ID"),
        Collection: "embeddings",
    },
}

store, err := vectorstore.New(config)
if err != nil {
    log.Fatal(err)
}
defer store.Close()

// Upsert documents
doc := vectorstore.Document{
    ID:        "doc-1",
    Content:   "Your document content",
    Embedding: embedding,
    Metadata: map[string]interface{}{
        "category": "documentation",
    },
}
store.Upsert(ctx, []vectorstore.Document{doc})

// Search
results, err := store.Search(ctx, vectorstore.SearchQuery{
    Embedding: queryEmbedding,
    TopK:      5,
    MinScore:  0.7,
})

Features:

  • ✅ Serverless, auto-scaling
  • ✅ Persistent storage
  • ✅ Real-time updates
  • ✅ ACID transactions

Pricing: ~$0.06 per 100K reads + storage

In-Memory Vector Store

Best for: Development, testing, prototyping

Configuration:

vectorstore:
  provider: memory
  embedding_dimensions: 384
  memory:
    max_documents: 10000

Features:

  • ✅ Zero setup
  • ✅ Fast for small datasets
  • ❌ Data lost on restart
  • ❌ Limited capacity

Qdrant (Planned - v0.2)

High-performance dedicated vector database:

# Coming soon
vectorstore:
  provider: qdrant
  embedding_dimensions: 384
  qdrant:
    host: localhost
    port: 6333
    collection: knowledge_base

pgvector (Planned - v0.2)

PostgreSQL extension for vector search:

# Coming soon
vectorstore:
  provider: pgvector
  embedding_dimensions: 384
  pgvector:
    connection_string: postgresql://user:pass@localhost/db
    table: embeddings

Complete RAG Example

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/aixgo-dev/aixgo/pkg/embeddings"
    "github.com/aixgo-dev/aixgo/pkg/vectorstore"
)

func main() {
    ctx := context.Background()

    // Setup embeddings
    embConfig := embeddings.Config{
        Provider: "huggingface",
        HuggingFace: &embeddings.HuggingFaceConfig{
            Model: "sentence-transformers/all-MiniLM-L6-v2",
        },
    }
    embSvc, err := embeddings.New(embConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer embSvc.Close()

    // Setup vector store
    storeConfig := vectorstore.Config{
        Provider:            "firestore",
        EmbeddingDimensions: embSvc.Dimensions(),
        Firestore: &vectorstore.FirestoreConfig{
            ProjectID:  "my-project",
            Collection: "knowledge_base",
        },
    }
    store, err := vectorstore.New(storeConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer store.Close()

    // Index documents
    docs := []string{
        "Aixgo is a production-grade AI framework",
        "RAG combines retrieval with generation",
    }

    for i, content := range docs {
        emb, _ := embSvc.Embed(ctx, content)
        doc := vectorstore.Document{
            ID:        fmt.Sprintf("doc-%d", i),
            Content:   content,
            Embedding: emb,
        }
        store.Upsert(ctx, []vectorstore.Document{doc})
    }

    // Search
    query := "What is Aixgo?"
    queryEmb, _ := embSvc.Embed(ctx, query)
    results, _ := store.Search(ctx, vectorstore.SearchQuery{
        Embedding: queryEmb,
        TopK:      3,
    })

    for _, result := range results {
        fmt.Printf("Score: %.2f - %s\n", result.Score, result.Document.Content)
    }
}

Provider Comparison: Embeddings

| Provider | Cost | Quality | Speed | Best For |
|---|---|---|---|---|
| OpenAI | $0.02-0.13 / 1M tokens | Excellent | Fast | Production |
| HuggingFace API | Free | Good-Excellent | Medium | Development |
| HuggingFace TEI | Free (self-host) | Good-Excellent | Very Fast | High-volume |

Provider Comparison: Vector Stores

| Provider | Persistence | Scalability | Setup | Cost |
|---|---|---|---|---|
| Memory | No | Low | None | Free |
| Firestore | Yes | Unlimited | Medium | $$ |
| Qdrant (planned) | Yes | Very High | Medium | Self-host |
| pgvector (planned) | Yes | High | Medium | Self-host |


API Key Management

Environment Variables

# .env file
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GCP_PROJECT_ID=my-project
HUGGINGFACE_API_KEY=hf_...

Load with:

set -a; source .env; set +a

Kubernetes Secrets

kubectl create secret generic llm-keys \
  --from-literal=OPENAI_API_KEY=sk-... \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-...

Reference in deployment:

env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-keys
        key: OPENAI_API_KEY

Cloud Secret Managers

Google Secret Manager:

import (
    "context"

    secretmanager "cloud.google.com/go/secretmanager/apiv1"
    "cloud.google.com/go/secretmanager/apiv1/secretmanagerpb"
)

func getAPIKey(ctx context.Context, secretName string) (string, error) {
    client, err := secretmanager.NewClient(ctx)
    if err != nil {
        return "", err
    }
    defer client.Close()
    result, err := client.AccessSecretVersion(ctx, &secretmanagerpb.AccessSecretVersionRequest{
        Name: secretName, // e.g. projects/PROJECT/secrets/NAME/versions/latest
    })
    if err != nil {
        return "", err
    }
    return string(result.Payload.Data), nil
}

Rate Limiting & Retries

Provider Rate Limits

| Provider | Tier | Requests/Min | Tokens/Min |
|---|---|---|---|
| OpenAI | Free | 3 | 40,000 |
| OpenAI | Paid Tier 1 | 500 | 90,000 |
| Anthropic | Free | 5 | 25,000 |
| Anthropic | Paid | 50 | 100,000 |
| Vertex AI | Default | 60 | 60,000 |
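
To stay under these limits client-side, a token-bucket limiter is the standard approach. In production the `golang.org/x/time/rate` package provides this; the stdlib-only, single-goroutine sketch below shows the mechanism:

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a simple token bucket: up to capacity requests may burst,
// and tokens refill at rate per second. Not goroutine-safe.
type bucket struct {
	capacity float64
	rate     float64
	tokens   float64
	last     time.Time
}

func newBucket(capacity, perSecond float64) *bucket {
	return &bucket{capacity: capacity, rate: perSecond, tokens: capacity, last: time.Now()}
}

// allow reports whether one request may proceed now.
func (b *bucket) allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// OpenAI free tier: 3 requests/min, i.e. refill one token every 20s
	limiter := newBucket(3, 3.0/60.0)
	for i := 0; i < 4; i++ {
		fmt.Println(limiter.allow()) // true, true, true, false
	}
}
```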

Retry Configuration

agents:
  - name: resilient-agent
    role: react
    model: gpt-4-turbo
    provider: openai
    retry:
      max_attempts: 3
      initial_backoff: 1s
      max_backoff: 10s
      multiplier: 2
      retry_on:
        - rate_limit
        - timeout
        - server_error

Monitoring Provider Performance

Track Latency by Provider

import "github.com/prometheus/client_golang/prometheus"

var providerLatency = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name: "llm_provider_latency_seconds",
        Help: "LLM API call latency by provider",
    },
    []string{"provider", "model"},
)

// Aixgo tracks this automatically

Cost Tracking

observability:
  cost_tracking: true
  cost_alert_threshold: 100 # Alert if daily cost > $100

Best Practices

1. Use Environment-Specific Keys

# Development
OPENAI_API_KEY=sk-dev-...

# Production
OPENAI_API_KEY=sk-prod-...

2. Implement Fallback Providers

Always have a backup provider to avoid single point of failure.

3. Monitor Token Usage

Track and alert on unexpected token consumption:

observability:
  llm_observability:
    enabled: true
    track_tokens: true
    daily_token_limit: 1000000
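
Enforcement of a `daily_token_limit` boils down to a running counter that trips once the budget is exceeded. A single-goroutine sketch (daily reset and alerting omitted; not the Aixgo implementation):

```go
package main

import "fmt"

// budget tracks cumulative token usage against a cap.
type budget struct {
	limit int64
	used  int64
}

// record adds n tokens and reports whether the limit is now exceeded.
func (b *budget) record(n int64) (exceeded bool) {
	b.used += n
	return b.used > b.limit
}

func main() {
	b := &budget{limit: 1_000_000}
	b.record(900_000)
	fmt.Println(b.record(200_000)) // true: over the daily limit
}
```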

4. Choose Models Strategically

  • Simple tasks: gpt-3.5-turbo, gemini-flash, claude-haiku
  • Complex reasoning: gpt-4-turbo, claude-3-opus
  • Long documents: claude-3-opus (200K), gemini-pro (2M)
  • Cost-sensitive: gemini-flash, gpt-3.5-turbo

5. Use Caching

Cache LLM responses for repeated queries:

import (
    "time"

    "github.com/aixgo-dev/aixgo"
    "github.com/aixgo-dev/aixgo/cache"
)

agent := aixgo.NewAgent(
    aixgo.WithName("cached-analyzer"),
    aixgo.WithCache(cache.NewRedisCache("localhost:6379")),
    aixgo.WithCacheTTL(1 * time.Hour),
)

Troubleshooting

Authentication Errors

Error: 401 Unauthorized

Solution:

  • Verify API key is correct
  • Check key has not expired
  • Ensure environment variable is loaded

Rate Limit Exceeded

Error: 429 Too Many Requests

Solution:

  • Implement exponential backoff
  • Reduce request rate
  • Upgrade to higher tier
  • Add multiple API keys for rotation

Timeout Errors

Error: Request timeout

Solution:

agents:
  - name: patient-agent
    role: react
    model: gpt-4-turbo
    timeout: 60s # Increase timeout
