Provider Comparison
Compare vectorstore and embedding providers to choose the right stack
Choosing the right combination of vectorstore and embedding provider is critical for performance, cost, and developer experience. This guide helps you make informed decisions based on your requirements.
Why Provider Choice Matters
Your choice of vectorstore and embedding provider affects:
- Performance: Query latency, throughput, and scalability
- Cost: Development costs, API fees, and infrastructure expenses
- Developer Experience: Setup complexity, debugging tools, and documentation
- Production Readiness: Reliability, monitoring, and operational overhead
The good news: Aixgo’s abstraction layer lets you start simple and swap providers later with minimal code changes, typically just the constructor that wires up the provider.
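One way to take advantage of this is to keep provider selection in a single constructor, so the rest of the application never names a concrete backend. The sketch below is illustrative only: it assumes the vectorstore packages satisfy a shared interface, called vectorstore.Store here as a hypothetical name (check the aixgo API for the real type), and reuses the constructors shown later in this guide.
import (
    "context"
    "os"

    gcfirestore "cloud.google.com/go/firestore"
    "github.com/aixgo-dev/aixgo/pkg/vectorstore"           // hypothetical package exposing the shared Store interface
    "github.com/aixgo-dev/aixgo/pkg/vectorstore/firestore"
    "github.com/aixgo-dev/aixgo/pkg/vectorstore/memory"
)

// newStore picks a provider from configuration; callers only ever see the shared interface.
// vectorstore.Store is a hypothetical name for that interface.
func newStore(ctx context.Context) (vectorstore.Store, error) {
    switch os.Getenv("VECTORSTORE") {
    case "firestore":
        client, err := gcfirestore.NewClient(ctx, os.Getenv("GCP_PROJECT"))
        if err != nil {
            return nil, err
        }
        return firestore.New(ctx, client)
    default:
        return memory.New() // in-memory store for development and tests
    }
}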
Vector Store Providers
Detailed Comparison
| Feature | Memory | Firestore | Qdrant | pgvector |
|---|---|---|---|---|
| Persistence | No | Yes | Yes | Yes |
| Scalability | 10K docs | Unlimited | Very High | High |
| Setup Complexity | None | Medium | Medium | Medium |
| Cost | Free | Pay-per-use | Self-host/Cloud | Self-host |
| Collections | Yes | Yes | Yes | Yes |
| TTL Support | Yes | Yes | Yes | No |
| Multi-tenancy | Yes | Yes | Yes | Yes |
| Query Speed | Very Fast | Fast | Very Fast | Fast |
| Distributed | No | Yes | Yes | Yes |
| Backup/Recovery | No | Automatic | Manual | Manual |
| Best For | Dev/Test | Production | High Performance | PostgreSQL Apps |
Memory
Ideal for: Development, testing, prototyping
import "github.com/aixgo-dev/aixgo/pkg/vectorstore/memory"
store, err := memory.New()
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Zero setup required
- Blazing fast (in-memory)
- Perfect for development and testing
- No external dependencies
Cons:
- Data lost on restart
- Limited to single process
- Not suitable for production
- Memory constraints (typically 10K docs max)
When to use:
- Local development
- Integration tests
- Proof of concepts
- Learning and experimentation
Firestore
Ideal for: Managed production deployments, serverless applications
import (
"github.com/aixgo-dev/aixgo/pkg/vectorstore/firestore"
"cloud.google.com/go/firestore"
)
client, _ := gcfirestore.NewClient(ctx, "project-id")
store, err := firestore.New(ctx, client)
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Fully managed (no ops overhead)
- Automatic scaling
- Built-in backup and recovery
- Global distribution
- Strong consistency guarantees
- Generous free tier
Cons:
- Vendor lock-in (GCP only)
- Pay-per-operation pricing
- Query complexity limitations
- Network latency for queries
Pricing (approximate):
- Free tier: 1 GB storage, 50K reads/day, 20K writes/day
- Beyond free tier: $0.18/GB/month storage, $0.06 per 100K reads
When to use:
- Production applications on GCP
- Serverless deployments (Cloud Run, Cloud Functions)
- Multi-region applications
- When you want zero operational overhead
Qdrant
Ideal for: High-performance production, large-scale deployments
Status: Coming soon
import "github.com/aixgo-dev/aixgo/pkg/vectorstore/qdrant"
store, err := qdrant.New(ctx, qdrant.Config{
URL: "http://localhost:6333",
})
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Extremely fast queries
- Rich filtering capabilities
- Excellent documentation
- Active development
- Cloud or self-hosted options
- REST and gRPC APIs
Cons:
- Requires infrastructure setup
- Self-hosting operational overhead
- Cloud pricing can be expensive at scale
Pricing (approximate):
- Self-hosted: Free (infrastructure costs only)
- Qdrant Cloud: Starting at $25/month for 1GB
When to use:
- High-throughput applications
- Complex filtering requirements
- Large datasets (millions of vectors)
- When you need maximum performance
- Self-hosted infrastructure preferred
pgvector
Ideal for: PostgreSQL-based applications, existing database infrastructure
Status: Coming soon
import "github.com/aixgo-dev/aixgo/pkg/vectorstore/pgvector"
store, err := pgvector.New(ctx, "postgresql://user:pass@localhost/db")
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Leverage existing PostgreSQL infrastructure
- ACID transactions
- Mature ecosystem
- Familiar SQL interface
- Great for hybrid relational + vector data
Cons:
- Manual scaling required
- Performance depends on PostgreSQL tuning
- No built-in TTL support
- Less optimized for pure vector search
When to use:
- Already using PostgreSQL
- Need transactional consistency
- Hybrid relational/vector queries
- Want to avoid additional infrastructure
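Until the aixgo pgvector provider ships, the extension is easy to reach from Go's standard database/sql package, which also shows what the hybrid relational plus vector queries mentioned above look like. A minimal sketch, assuming a hypothetical documents table and the pgx stdlib driver; pgvector's <=> operator computes cosine distance (enable the extension first with CREATE EXTENSION IF NOT EXISTS vector).
import (
    "context"
    "database/sql"
    "fmt"
    "strings"

    _ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)

// toVectorLiteral renders a []float32 as a pgvector literal such as "[0.1,0.2,0.3]".
func toVectorLiteral(v []float32) string {
    parts := make([]string, len(v))
    for i, f := range v {
        parts[i] = fmt.Sprintf("%g", f)
    }
    return "[" + strings.Join(parts, ",") + "]"
}

// nearestDocs runs a hybrid query: a relational filter plus vector similarity ordering.
// The documents(id, tenant_id, content, embedding vector(384)) schema is hypothetical.
func nearestDocs(ctx context.Context, db *sql.DB, tenantID string, queryVec []float32) (*sql.Rows, error) {
    return db.QueryContext(ctx,
        `SELECT id, content
           FROM documents
          WHERE tenant_id = $1
          ORDER BY embedding <=> $2::vector
          LIMIT 10`,
        tenantID, toVectorLiteral(queryVec))
}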
Embedding Providers
Detailed Comparison
| Provider | Quality | Speed | Cost | Dimensions | Best For |
|---|---|---|---|---|---|
| HuggingFace API | Good-Excellent | Medium | Free | 384-1024 | Development |
| HuggingFace TEI | Good-Excellent | Very Fast | Self-host | 384-1024 | Production |
| OpenAI | Excellent | Fast | $0.02-0.13/1M tokens | 1536-3072 | Production |
HuggingFace API
Ideal for: Development, testing, learning
import "github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
embedder, err := huggingface.New(ctx, huggingface.Config{
APIKey: os.Getenv("HF_API_KEY"),
Model: "sentence-transformers/all-MiniLM-L6-v2",
})
if err != nil {
log.Fatal(err)
}
embedding, err := embedder.EmbedText(ctx, "Hello world")
Popular Models:
- sentence-transformers/all-MiniLM-L6-v2 (384 dims, fast, good quality)
- BAAI/bge-small-en-v1.5 (384 dims, excellent for English)
- intfloat/multilingual-e5-base (768 dims, multilingual)
Pros:
- Free tier available
- Many model choices
- Good quality embeddings
- Easy to get started
Cons:
- API rate limits
- Network latency
- Potential availability issues
- Slower than self-hosted
Pricing:
- Free tier with rate limits
- Paid tiers starting at $9/month
When to use:
- Development and prototyping
- Low-volume applications
- Budget-conscious projects
- Testing different models
HuggingFace TEI (Text Embeddings Inference)
Ideal for: Production, high-throughput, cost-sensitive deployments
import "github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
embedder, err := huggingface.New(ctx, huggingface.Config{
BaseURL: "http://localhost:8080",
Model: "BAAI/bge-small-en-v1.5",
})
if err != nil {
log.Fatal(err)
}
Deployment (Docker):
docker run -p 8080:80 \
-v $PWD/data:/data \
ghcr.io/huggingface/text-embeddings-inference:latest \
--model-id BAAI/bge-small-en-v1.5
Pros:
- Very fast (optimized inference)
- No API costs (after infrastructure)
- Batch processing support
- GPU acceleration available
- Full control over deployment
Cons:
- Requires infrastructure setup
- Operational overhead
- GPU costs for maximum performance
- Manual scaling required
Cost Estimate:
- CPU instance: ~$50-100/month (Cloud Run, ECS)
- GPU instance: ~$200-500/month (for high throughput)
- Unlimited embeddings after infrastructure cost
When to use:
- Production applications
- High-volume embedding generation
- Cost optimization at scale
- When you need predictable latency
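Before wiring TEI into the application, it is worth confirming that the container started by the docker run command above answers on its /embed route. A minimal sketch that calls the HTTP API directly; the URL and request shape follow TEI's documented defaults.
import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// embedViaTEI posts texts to a TEI instance and returns one embedding per input.
func embedViaTEI(baseURL string, inputs []string) ([][]float32, error) {
    body, err := json.Marshal(map[string]any{"inputs": inputs})
    if err != nil {
        return nil, err
    }
    resp, err := http.Post(baseURL+"/embed", "application/json", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("tei: unexpected status %s", resp.Status)
    }
    var vectors [][]float32
    if err := json.NewDecoder(resp.Body).Decode(&vectors); err != nil {
        return nil, err
    }
    return vectors, nil
}

// Example: vectors, err := embedViaTEI("http://localhost:8080", []string{"hello world"})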
OpenAI
Ideal for: Production applications prioritizing quality over cost
import "github.com/aixgo-dev/aixgo/pkg/embeddings/openai"
embedder, err := openai.New(ctx, openai.Config{
APIKey: os.Getenv("OPENAI_API_KEY"),
Model: "text-embedding-3-small",
})
if err != nil {
log.Fatal(err)
}
Available Models:
- text-embedding-3-small (1536 dims, $0.02/1M tokens)
- text-embedding-3-large (3072 dims, $0.13/1M tokens)
- text-embedding-ada-002 (1536 dims, $0.10/1M tokens, legacy)
Pros:
- Excellent quality
- Reliable infrastructure
- Fast response times
- Simple API
- No operational overhead
Cons:
- Costs scale with usage
- Vendor lock-in
- API rate limits (tier-based)
- Less control over model
Pricing Examples (text-embedding-3-small at $0.02/1M tokens, assuming roughly 100 tokens per document; a small helper for this arithmetic follows the list):
- 1K documents: ~$0.002 (essentially free)
- 100K documents: ~$0.20
- 1M documents: ~$2.00
- 10M documents: ~$20.00
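The arithmetic is simple enough to keep in a small helper when budgeting. A minimal sketch; the tokens-per-document figure is an assumption you should replace with an average measured on your own corpus.
// embeddingCostUSD estimates the one-off cost of embedding a corpus.
// avgTokensPerDoc is an assumption; measure your own documents for real numbers.
func embeddingCostUSD(docs int, avgTokensPerDoc, pricePerMillionTokens float64) float64 {
    totalTokens := float64(docs) * avgTokensPerDoc
    return totalTokens / 1_000_000 * pricePerMillionTokens
}

// Example: embeddingCostUSD(1_000_000, 100, 0.02) is about $2.00 with text-embedding-3-small.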
When to use:
- Production applications
- When quality is critical
- Moderate to high volume
- When you want reliable, managed service
Recommended Stacks
Development Stack
Best for: Local development, prototyping, learning
Embedding: HuggingFace API (free tier)
Vectorstore: Memory
Setup time: 5 minutes
// Complete working example
import (
"github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
"github.com/aixgo-dev/aixgo/pkg/vectorstore/memory"
)
// Embedding provider
embedder, _ := huggingface.New(ctx, huggingface.Config{
APIKey: os.Getenv("HF_API_KEY"),
Model: "sentence-transformers/all-MiniLM-L6-v2",
})
// Vector store
store, _ := memory.New()
collection := store.Collection("dev-data")Cost: Free Performance: Fast (local storage, API network latency) Best for: Getting started quickly
Production Stack (Managed)
Best for: Production applications on GCP, serverless deployments
Embedding: OpenAI text-embedding-3-small
Vectorstore: Firestore
Setup time: 30 minutes
import (
"github.com/aixgo-dev/aixgo/pkg/embeddings/openai"
"github.com/aixgo-dev/aixgo/pkg/vectorstore/firestore"
"cloud.google.com/go/firestore"
)
// Embedding provider
embedder, _ := openai.New(ctx, openai.Config{
APIKey: os.Getenv("OPENAI_API_KEY"),
Model: "text-embedding-3-small",
})
// Vector store
client, _ := gcfirestore.NewClient(ctx, "project-id")
store, _ := firestore.New(ctx, client)
collection := store.Collection("prod-data")Cost: ~$50-200/month (depending on scale) Performance: Fast, globally distributed Best for: Most production applications
Production Stack (Self-Hosted)
Best for: High-volume, cost-sensitive, maximum performance
Embedding: HuggingFace TEI (self-hosted)
Vectorstore: Qdrant (self-hosted or cloud)
Setup time: 2-4 hours
Infrastructure (Docker Compose):
version: '3.8'
services:
  embeddings:
    image: ghcr.io/huggingface/text-embeddings-inference:latest
    command: --model-id BAAI/bge-small-en-v1.5
    ports:
      - "8080:80"
    volumes:
      - ./data:/data
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_data:/qdrant/storage
Application Code:
import (
"github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
"github.com/aixgo-dev/aixgo/pkg/vectorstore/qdrant"
)
// Embedding provider (self-hosted TEI)
embedder, _ := huggingface.New(ctx, huggingface.Config{
BaseURL: "http://localhost:8080",
Model: "BAAI/bge-small-en-v1.5",
})
// Vector store (self-hosted Qdrant)
store, _ := qdrant.New(ctx, qdrant.Config{
URL: "http://localhost:6333",
})
collection := store.Collection("prod-data")Cost: ~$100-300/month (infrastructure only) Performance: Extremely fast Best for: High-volume applications, cost optimization
Budget Stack
Best for: Bootstrapped startups, side projects, MVPs
Embedding: HuggingFace API (free tier)
Vectorstore: Firestore (free tier)
Setup time: 15 minutes
Cost: Free up to limits, then pay-as-you-go
Performance: Good for low-moderate volume
Best for: Getting to market fast with minimal costs
High-Volume Stack
Best for: Large-scale applications, millions of queries/day
Embedding: HuggingFace TEI (GPU-accelerated)
Vectorstore: Qdrant cluster
Cost: ~$500-2000/month
Performance: Maximum throughput and minimal latency
Best for: Applications with millions of users
Cost Calculator
The scenarios below assume roughly 1,000 tokens per document for OpenAI embedding costs; storage and infrastructure figures are rough approximations.
Scenario 1: Small Application
Volume: 10K documents, 100K queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Dev (HF API + Memory) | Free | Free | $0 |
| Budget (HF API + Firestore) | Free | Free | $0 |
| Managed (OpenAI + Firestore) | $0.20 | $0.18 | $0.38 |
| Self-Hosted (TEI + Qdrant) | $50 | $50 | $100 |
Recommendation: Budget stack (HF API + Firestore)
Scenario 2: Medium Application
Volume: 100K documents, 1M queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Budget (HF API + Firestore) | $9 | $18 | $27 |
| Managed (OpenAI + Firestore) | $2.00 | $18 | $20 |
| Self-Hosted (TEI + Qdrant) | $100 | $100 | $200 |
Recommendation: Managed stack (OpenAI + Firestore)
Scenario 3: Large Application
Volume: 1M documents, 10M queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Managed (OpenAI + Firestore) | $20 | $180 | $200 |
| Self-Hosted (TEI + Qdrant) | $200 | $200 | $400 |
Recommendation: Self-hosted stack becomes cost-effective at this scale
Scenario 4: Enterprise Application
Volume: 10M documents, 100M queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Managed (OpenAI + Firestore) | $200 | $1,800 | $2,000 |
| Self-Hosted (TEI + Qdrant) | $500 | $1,000 | $1,500 |
Recommendation: Self-hosted with dedicated infrastructure
Performance Benchmarks
Query Latency (p95)
| Provider | Single Vector | Batch (10) | Batch (100) |
|---|---|---|---|
| Memory | 0.1ms | 0.5ms | 3ms |
| Firestore | 50ms | 75ms | 200ms |
| Qdrant (local) | 1ms | 5ms | 25ms |
| Qdrant (cloud) | 25ms | 40ms | 100ms |
Notes:
- Memory is fastest but not persistent
- Firestore latency includes network round-trip
- Qdrant local is nearly as fast as memory
- All tested with 100K documents, 384-dimensional vectors
Embedding Generation
| Provider | Single Text | Batch (10) | Batch (100) |
|---|---|---|---|
| HF API | 200ms | 500ms | 2000ms |
| HF TEI (CPU) | 50ms | 200ms | 800ms |
| HF TEI (GPU) | 10ms | 30ms | 100ms |
| OpenAI | 100ms | 300ms | 1500ms |
Notes:
- HF TEI with GPU is fastest
- Batch processing significantly improves throughput (see the sketch below)
- OpenAI offers good balance of speed and quality
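If your provider exposes a native batch endpoint, prefer it. When only a per-text call such as EmbedText is available, a bounded worker pool recovers much of the throughput. A minimal sketch, with embedOne standing in for whichever embedder you use.
import "sync"

// embedAll runs embedOne over texts with at most `workers` concurrent requests.
// embedOne is a stand-in, e.g. func(t string) ([]float32, error) { return embedder.EmbedText(ctx, t) }.
func embedAll(texts []string, workers int, embedOne func(string) ([]float32, error)) ([][]float32, error) {
    results := make([][]float32, len(texts))
    errs := make([]error, len(texts))
    sem := make(chan struct{}, workers)
    var wg sync.WaitGroup
    for i, t := range texts {
        wg.Add(1)
        sem <- struct{}{}
        go func(i int, t string) {
            defer wg.Done()
            defer func() { <-sem }()
            results[i], errs[i] = embedOne(t)
        }(i, t)
    }
    wg.Wait()
    for _, err := range errs {
        if err != nil {
            return nil, err
        }
    }
    return results, nil
}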
Throughput (queries per second)
| Provider | Single Thread | 10 Threads | 100 Threads |
|---|---|---|---|
| Memory | 10,000 | 50,000 | 100,000 |
| Firestore | 100 | 500 | 2,000 |
| Qdrant (local) | 5,000 | 25,000 | 50,000 |
| Qdrant (cloud) | 500 | 2,500 | 10,000 |
Notes:
- Memory throughput limited only by CPU
- Firestore throughput limited by API quotas
- Qdrant scales well with parallelism
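These figures are environment-specific, so measure against your own data and infrastructure before committing to a stack. A minimal sketch for collecting a p95 over repeated calls; queryOnce is a placeholder for whatever store query or embedding call you want to benchmark.
import (
    "sort"
    "time"
)

// p95Latency runs queryOnce n times and returns the 95th-percentile duration.
// queryOnce is a placeholder; wrap your own store query or embedding call in it.
func p95Latency(n int, queryOnce func() error) (time.Duration, error) {
    durations := make([]time.Duration, 0, n)
    for i := 0; i < n; i++ {
        start := time.Now()
        if err := queryOnce(); err != nil {
            return 0, err
        }
        durations = append(durations, time.Since(start))
    }
    sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
    return durations[int(float64(len(durations))*0.95)], nil
}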
Migration Paths
Memory to Firestore
When: Moving from development to production
Process:
- Swap the store constructor (the rest of your code stays the same)
- Re-index your documents
- Update monitoring/alerting
// Before (Memory)
store, _ := memory.New()
// After (Firestore) - same interface!
client, _ := gcfirestore.NewClient(ctx, "project-id") // gcfirestore = cloud.google.com/go/firestore
store, _ := firestore.New(ctx, client)
Downtime: None (parallel indexing possible)
Difficulty: Easy
Time: 1-2 hours
HuggingFace API to TEI
When: Scaling beyond free tier, need better performance
Process:
- Deploy TEI container
- Update base URL in configuration
- Test embedding compatibility
- Cutover
// Before (API)
embedder, _ := huggingface.New(ctx, huggingface.Config{
APIKey: os.Getenv("HF_API_KEY"),
Model: "BAAI/bge-small-en-v1.5",
})
// After (TEI) - same interface!
embedder, _ := huggingface.New(ctx, huggingface.Config{
BaseURL: "http://localhost:8080",
Model: "BAAI/bge-small-en-v1.5",
})
Downtime: None
Difficulty: Medium
Time: 2-4 hours
Firestore to Qdrant
When: Need better performance, higher volume, cost optimization
Process:
- Deploy Qdrant
- Export documents from Firestore
- Re-index in Qdrant
- Parallel run for validation
- Cutover
// Before (Firestore)
client, _ := gcfirestore.NewClient(ctx, "project-id") // gcfirestore = cloud.google.com/go/firestore
store, _ := firestore.New(ctx, client)
// After (Qdrant) - same interface!
store, _ := qdrant.New(ctx, qdrant.Config{
URL: "http://localhost:6333",
})
Downtime: None (with parallel indexing)
Difficulty: Medium-Hard
Time: 1-2 days
OpenAI to HuggingFace
When: Cost optimization, need offline capability
Challenges:
- Different embedding dimensions (requires re-indexing)
- Potential quality differences
- Need to test thoroughly
Process:
- Choose comparable HF model
- Generate test embeddings
- Evaluate quality on your use case (see the sketch below)
- Re-index all documents
- Cutover
Downtime: Depends on index size
Difficulty: Hard
Time: 1-2 weeks (including testing)
Important: Different embedding models are NOT compatible. You must re-index all data.
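One concrete way to run the quality evaluation step above is to embed a sample of real queries and documents with both models and check how often the two rankings agree. The helpers below are plain Go and make no assumption about which providers produced the vectors; overlap well below 1.0 at your usual k is a signal to review results manually before committing to the re-index.
import (
    "math"
    "sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
    var dot, na, nb float64
    for i := range a {
        dot += float64(a[i]) * float64(b[i])
        na += float64(a[i]) * float64(a[i])
        nb += float64(b[i]) * float64(b[i])
    }
    if na == 0 || nb == 0 {
        return 0
    }
    return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the indices of the k documents most similar to the query vector.
func topK(query []float32, docs [][]float32, k int) []int {
    scores := make([]float64, len(docs))
    idx := make([]int, len(docs))
    for i, d := range docs {
        scores[i] = cosine(query, d)
        idx[i] = i
    }
    sort.Slice(idx, func(i, j int) bool { return scores[idx[i]] > scores[idx[j]] })
    if k > len(idx) {
        k = len(idx)
    }
    return idx[:k]
}

// overlapAtK measures how many of the old model's top-k results the new model also returns.
func overlapAtK(oldTop, newTop []int) float64 {
    if len(oldTop) == 0 {
        return 0
    }
    seen := make(map[int]bool, len(oldTop))
    for _, i := range oldTop {
        seen[i] = true
    }
    match := 0
    for _, i := range newTop {
        if seen[i] {
            match++
        }
    }
    return float64(match) / float64(len(oldTop))
}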
Decision Framework
Start Here
Ask yourself these questions:
What’s your deployment environment?
- GCP → Consider Firestore
- AWS/Azure → Consider self-hosted options
- Multi-cloud → Memory (dev) or self-hosted (prod)
What’s your budget?
- $0/month → HuggingFace API + Memory/Firestore
- $50-200/month → OpenAI + Firestore
- $200+/month → Self-hosted TEI + Qdrant
What’s your scale?
- <100K docs → Managed solutions
- 100K-1M docs → Either managed or self-hosted
- >1M docs → Self-hosted recommended
What’s your team’s expertise?
- Small team, no ops → Managed solutions
- DevOps capability → Self-hosted for better economics
What’s your performance requirement?
- <100ms p95 → Self-hosted required
- <500ms p95 → Managed solutions work
- >500ms p95 → Any solution works
Quick Decision Tree
Are you in production?
├─ No → HuggingFace API + Memory
└─ Yes
   └─ On GCP?
      ├─ Yes → OpenAI + Firestore
      └─ No
         └─ High volume (>1M queries/day)?
            ├─ Yes → HuggingFace TEI + Qdrant
            └─ No → OpenAI + Firestore
Next Steps
- Start with the Development Stack: Get familiar with the APIs
- Benchmark your use case: Real performance depends on your data
- Plan for migration: Design with future scaling in mind
- Monitor costs: Set up billing alerts early
Additional Resources
- Vector Databases Guide: Deep dive into vector search
- Production Deployment: Best practices for production
- Observability: Monitoring your vector infrastructure
- Provider Integration: Integrating custom providers