Provider Comparison
Compare vectorstore and embedding providers to choose the right stack
Choosing the right combination of vectorstore and embedding provider is critical for performance, cost, and developer experience. This guide helps you make informed decisions based on your requirements.
Why Provider Choice Matters
Your choice of vectorstore and embedding provider affects:
- Performance: Query latency, throughput, and scalability
- Cost: Development costs, API fees, and infrastructure expenses
- Developer Experience: Setup complexity, debugging tools, and documentation
- Production Readiness: Reliability, monitoring, and operational overhead
The good news: Aixgo’s abstraction layer lets you start simple and swap providers later with minimal code changes, typically just the constructor that wires up the provider.
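One way to take advantage of this is to keep provider selection in a single constructor, so the rest of the application never names a concrete backend. The sketch below is illustrative only: it assumes the vectorstore packages satisfy a shared interface, called vectorstore.Store here as a hypothetical name (check the aixgo API for the real type), and reuses the constructors shown later in this guide.
import (
    "context"
    "os"

    gcfirestore "cloud.google.com/go/firestore"
    "github.com/aixgo-dev/aixgo/pkg/vectorstore"           // hypothetical package exposing the shared Store interface
    "github.com/aixgo-dev/aixgo/pkg/vectorstore/firestore"
    "github.com/aixgo-dev/aixgo/pkg/vectorstore/memory"
)

// newStore picks a provider from configuration; callers only ever see the shared interface.
// vectorstore.Store is a hypothetical name for that interface.
func newStore(ctx context.Context) (vectorstore.Store, error) {
    switch os.Getenv("VECTORSTORE") {
    case "firestore":
        client, err := gcfirestore.NewClient(ctx, os.Getenv("GCP_PROJECT"))
        if err != nil {
            return nil, err
        }
        return firestore.New(ctx, client)
    default:
        return memory.New() // in-memory store for development and tests
    }
}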
Vector Store Providers
Detailed Comparison
| Feature | Memory | Firestore | Qdrant | pgvector |
|---|---|---|---|---|
| Persistence | No | Yes | Yes | Yes |
| Scalability | 10K docs | Unlimited | Very High | High |
| Setup Complexity | None | Medium | Medium | Medium |
| Cost | Free | Pay-per-use | Self-host/Cloud | Self-host |
| Collections | Yes | Yes | Yes | Yes |
| TTL Support | Yes | Yes | Yes | No |
| Multi-tenancy | Yes | Yes | Yes | Yes |
| Query Speed | Very Fast | Fast | Very Fast | Fast |
| Distributed | No | Yes | Yes | Yes |
| Backup/Recovery | No | Automatic | Manual | Manual |
| Best For | Dev/Test | Production | High Performance | PostgreSQL Apps |
Memory
Ideal for: Development, testing, prototyping
import "github.com/aixgo-dev/aixgo/pkg/vectorstore/memory"
store, err := memory.New()
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Zero setup required
- Blazing fast (in-memory)
- Perfect for development and testing
- No external dependencies
Cons:
- Data lost on restart
- Limited to single process
- Not suitable for production
- Memory constraints (typically 10K docs max)
When to use:
- Local development
- Integration tests
- Proof of concepts
- Learning and experimentation
Firestore
Ideal for: Managed production deployments, serverless applications
import (
"github.com/aixgo-dev/aixgo/pkg/vectorstore/firestore"
"cloud.google.com/go/firestore"
)
client, _ := gcfirestore.NewClient(ctx, "project-id")
store, err := firestore.New(ctx, client)
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Fully managed (no ops overhead)
- Automatic scaling
- Built-in backup and recovery
- Global distribution
- Strong consistency guarantees
- Generous free tier
Cons:
- Vendor lock-in (GCP only)
- Pay-per-operation pricing
- Query complexity limitations
- Network latency for queries
Pricing (approximate):
- Free tier: 1 GB storage, 50K reads/day, 20K writes/day
- Beyond free tier: $0.18/GB/month storage, $0.06 per 100K reads
When to use:
- Production applications on GCP
- Serverless deployments (Cloud Run, Cloud Functions)
- Multi-region applications
- When you want zero operational overhead
Qdrant
Ideal for: High-performance production, large-scale deployments
Status: Coming soon
import "github.com/aixgo-dev/aixgo/pkg/vectorstore/qdrant"
store, err := qdrant.New(ctx, qdrant.Config{
URL: "http://localhost:6333",
})
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Extremely fast queries
- Rich filtering capabilities
- Excellent documentation
- Active development
- Cloud or self-hosted options
- REST and gRPC APIs
Cons:
- Requires infrastructure setup
- Self-hosting operational overhead
- Cloud pricing can be expensive at scale
Pricing (approximate):
- Self-hosted: Free (infrastructure costs only)
- Qdrant Cloud: Starting at $25/month for 1GB
When to use:
- High-throughput applications
- Complex filtering requirements
- Large datasets (millions of vectors)
- When you need maximum performance
- Self-hosted infrastructure preferred
pgvector
Ideal for: PostgreSQL-based applications, existing database infrastructure
Status: Coming soon
import "github.com/aixgo-dev/aixgo/pkg/vectorstore/pgvector"
store, err := pgvector.New(ctx, "postgresql://user:pass@localhost/db")
if err != nil {
log.Fatal(err)
}
collection := store.Collection("my-data")Pros:
- Leverage existing PostgreSQL infrastructure
- ACID transactions
- Mature ecosystem
- Familiar SQL interface
- Great for hybrid relational + vector data
Cons:
- Manual scaling required
- Performance depends on PostgreSQL tuning
- No built-in TTL support
- Less optimized for pure vector search
When to use:
- Already using PostgreSQL
- Need transactional consistency
- Hybrid relational/vector queries
- Want to avoid additional infrastructure
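Until the aixgo pgvector provider ships, the extension is easy to reach from Go's standard database/sql package, which also shows what the hybrid relational plus vector queries mentioned above look like. A minimal sketch, assuming a hypothetical documents table and the pgx stdlib driver; pgvector's <=> operator computes cosine distance (enable the extension first with CREATE EXTENSION IF NOT EXISTS vector).
import (
    "context"
    "database/sql"
    "fmt"
    "strings"

    _ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)

// toVectorLiteral renders a []float32 as a pgvector literal such as "[0.1,0.2,0.3]".
func toVectorLiteral(v []float32) string {
    parts := make([]string, len(v))
    for i, f := range v {
        parts[i] = fmt.Sprintf("%g", f)
    }
    return "[" + strings.Join(parts, ",") + "]"
}

// nearestDocs runs a hybrid query: a relational filter plus vector similarity ordering.
// The documents(id, tenant_id, content, embedding vector(384)) schema is hypothetical.
func nearestDocs(ctx context.Context, db *sql.DB, tenantID string, queryVec []float32) (*sql.Rows, error) {
    return db.QueryContext(ctx,
        `SELECT id, content
           FROM documents
          WHERE tenant_id = $1
          ORDER BY embedding <=> $2::vector
          LIMIT 10`,
        tenantID, toVectorLiteral(queryVec))
}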
Embedding Providers
Detailed Comparison
| Provider | Quality | Speed | Cost | Dimensions | Best For |
|---|---|---|---|---|---|
| HuggingFace API | Good-Excellent | Medium | Free | 384-1024 | Development |
| HuggingFace TEI | Good-Excellent | Very Fast | Self-host | 384-1024 | Production |
| OpenAI | Excellent | Fast | $0.02-0.13/1M tokens | 1536-3072 | Production |
HuggingFace API
Ideal for: Development, testing, learning
import "github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
embedder, err := huggingface.New(ctx, huggingface.Config{
APIKey: os.Getenv("HF_API_KEY"),
Model: "sentence-transformers/all-MiniLM-L6-v2",
})
if err != nil {
log.Fatal(err)
}
embedding, err := embedder.EmbedText(ctx, "Hello world")
Popular Models:
- sentence-transformers/all-MiniLM-L6-v2 (384 dims, fast, good quality)
- BAAI/bge-small-en-v1.5 (384 dims, excellent for English)
- intfloat/multilingual-e5-base (768 dims, multilingual)
Pros:
- Free tier available
- Many model choices
- Good quality embeddings
- Easy to get started
Cons:
- API rate limits
- Network latency
- Potential availability issues
- Slower than self-hosted
Pricing:
- Free tier with rate limits
- Paid tiers starting at $9/month
When to use:
- Development and prototyping
- Low-volume applications
- Budget-conscious projects
- Testing different models
HuggingFace TEI (Text Embeddings Inference)
Ideal for: Production, high-throughput, cost-sensitive deployments
import "github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
embedder, err := huggingface.New(ctx, huggingface.Config{
BaseURL: "http://localhost:8080",
Model: "BAAI/bge-small-en-v1.5",
})
if err != nil {
log.Fatal(err)
}
Deployment (Docker):
docker run -p 8080:80 \
-v $PWD/data:/data \
ghcr.io/huggingface/text-embeddings-inference:latest \
--model-id BAAI/bge-small-en-v1.5
Pros:
- Very fast (optimized inference)
- No API costs (after infrastructure)
- Batch processing support
- GPU acceleration available
- Full control over deployment
Cons:
- Requires infrastructure setup
- Operational overhead
- GPU costs for maximum performance
- Manual scaling required
Cost Estimate:
- CPU instance: ~$50-100/month (Cloud Run, ECS)
- GPU instance: ~$200-500/month (for high throughput)
- Unlimited embeddings after infrastructure cost
When to use:
- Production applications
- High-volume embedding generation
- Cost optimization at scale
- When you need predictable latency
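Before wiring TEI into the application, it is worth confirming that the container started by the docker run command above answers on its /embed route. A minimal sketch that calls the HTTP API directly; the URL and request shape follow TEI's documented defaults.
import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// embedViaTEI posts texts to a TEI instance and returns one embedding per input.
func embedViaTEI(baseURL string, inputs []string) ([][]float32, error) {
    body, err := json.Marshal(map[string]any{"inputs": inputs})
    if err != nil {
        return nil, err
    }
    resp, err := http.Post(baseURL+"/embed", "application/json", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("tei: unexpected status %s", resp.Status)
    }
    var vectors [][]float32
    if err := json.NewDecoder(resp.Body).Decode(&vectors); err != nil {
        return nil, err
    }
    return vectors, nil
}

// Example: vectors, err := embedViaTEI("http://localhost:8080", []string{"hello world"})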
OpenAI
Ideal for: Production applications prioritizing quality over cost
import "github.com/aixgo-dev/aixgo/pkg/embeddings/openai"
embedder, err := openai.New(ctx, openai.Config{
APIKey: os.Getenv("OPENAI_API_KEY"),
Model: "text-embedding-3-small",
})
if err != nil {
log.Fatal(err)
}
Available Models:
- text-embedding-3-small (1536 dims, $0.02/1M tokens)
- text-embedding-3-large (3072 dims, $0.13/1M tokens)
- text-embedding-ada-002 (1536 dims, $0.10/1M tokens, legacy)
Pros:
- Excellent quality
- Reliable infrastructure
- Fast response times
- Simple API
- No operational overhead
Cons:
- Costs scale with usage
- Vendor lock-in
- API rate limits (tier-based)
- Less control over model
Pricing Examples (text-embedding-3-small at $0.02/1M tokens, assuming roughly 100 tokens per document; a small helper for this arithmetic follows the list):
- 1K documents: ~$0.002 (essentially free)
- 100K documents: ~$0.20
- 1M documents: ~$2.00
- 10M documents: ~$20.00
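The arithmetic is simple enough to keep in a small helper when budgeting. A minimal sketch; the tokens-per-document figure is an assumption you should replace with an average measured on your own corpus.
// embeddingCostUSD estimates the one-off cost of embedding a corpus.
// avgTokensPerDoc is an assumption; measure your own documents for real numbers.
func embeddingCostUSD(docs int, avgTokensPerDoc, pricePerMillionTokens float64) float64 {
    totalTokens := float64(docs) * avgTokensPerDoc
    return totalTokens / 1_000_000 * pricePerMillionTokens
}

// Example: embeddingCostUSD(1_000_000, 100, 0.02) is about $2.00 with text-embedding-3-small.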
When to use:
- Production applications
- When quality is critical
- Moderate to high volume
- When you want reliable, managed service
Recommended Stacks
Development Stack
Best for: Local development, prototyping, learning
Embedding: HuggingFace API (free tier)
Vectorstore: Memory
Setup time: 5 minutes
// Complete working example
import (
"github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
"github.com/aixgo-dev/aixgo/pkg/vectorstore/memory"
)
// Embedding provider
embedder, _ := huggingface.New(ctx, huggingface.Config{
APIKey: os.Getenv("HF_API_KEY"),
Model: "sentence-transformers/all-MiniLM-L6-v2",
})
// Vector store
store, _ := memory.New()
collection := store.Collection("dev-data")Cost: Free Performance: Fast (local storage, API network latency) Best for: Getting started quickly
Production Stack (Managed)
Best for: Production applications on GCP, serverless deployments
Embedding: OpenAI text-embedding-3-small
Vectorstore: Firestore
Setup time: 30 minutes
import (
"github.com/aixgo-dev/aixgo/pkg/embeddings/openai"
"github.com/aixgo-dev/aixgo/pkg/vectorstore/firestore"
"cloud.google.com/go/firestore"
)
// Embedding provider
embedder, _ := openai.New(ctx, openai.Config{
APIKey: os.Getenv("OPENAI_API_KEY"),
Model: "text-embedding-3-small",
})
// Vector store
client, _ := gcfirestore.NewClient(ctx, "project-id")
store, _ := firestore.New(ctx, client)
collection := store.Collection("prod-data")Cost: ~$50-200/month (depending on scale) Performance: Fast, globally distributed Best for: Most production applications
Production Stack (Self-Hosted)
Best for: High-volume, cost-sensitive, maximum performance
Embedding: HuggingFace TEI (self-hosted)
Vectorstore: Qdrant (self-hosted or cloud)
Setup time: 2-4 hours
Infrastructure (Docker Compose):
version: '3.8'
services:
  embeddings:
    image: ghcr.io/huggingface/text-embeddings-inference:latest
    command: --model-id BAAI/bge-small-en-v1.5
    ports:
      - "8080:80"
    volumes:
      - ./data:/data
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - ./qdrant_data:/qdrant/storage
Application Code:
import (
"github.com/aixgo-dev/aixgo/pkg/embeddings/huggingface"
"github.com/aixgo-dev/aixgo/pkg/vectorstore/qdrant"
)
// Embedding provider (self-hosted TEI)
embedder, _ := huggingface.New(ctx, huggingface.Config{
BaseURL: "http://localhost:8080",
Model: "BAAI/bge-small-en-v1.5",
})
// Vector store (self-hosted Qdrant)
store, _ := qdrant.New(ctx, qdrant.Config{
URL: "http://localhost:6333",
})
collection := store.Collection("prod-data")Cost: ~$100-300/month (infrastructure only) Performance: Extremely fast Best for: High-volume applications, cost optimization
Budget Stack
Best for: Bootstrapped startups, side projects, MVPs
Embedding: HuggingFace API (free tier)
Vectorstore: Firestore (free tier)
Setup time: 15 minutes
Cost: Free up to limits, then pay-as-you-go
Performance: Good for low-moderate volume
Best for: Getting to market fast with minimal costs
High-Volume Stack
Best for: Large-scale applications, millions of queries/day
Embedding: HuggingFace TEI (GPU-accelerated)
Vectorstore: Qdrant cluster
Cost: ~$500-2000/month
Performance: Maximum throughput and minimal latency
Best for: Applications with millions of users
Cost Calculator
The scenarios below assume roughly 1,000 tokens per document for OpenAI embedding costs; storage and infrastructure figures are rough approximations.
Scenario 1: Small Application
Volume: 10K documents, 100K queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Dev (HF API + Memory) | Free | Free | $0 |
| Budget (HF API + Firestore) | Free | Free | $0 |
| Managed (OpenAI + Firestore) | $0.20 | $0.18 | $0.38 |
| Self-Hosted (TEI + Qdrant) | $50 | $50 | $100 |
Recommendation: Budget stack (HF API + Firestore)
Scenario 2: Medium Application
Volume: 100K documents, 1M queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Budget (HF API + Firestore) | $9 | $18 | $27 |
| Managed (OpenAI + Firestore) | $2.00 | $18 | $20 |
| Self-Hosted (TEI + Qdrant) | $100 | $100 | $200 |
Recommendation: Managed stack (OpenAI + Firestore)
Scenario 3: Large Application
Volume: 1M documents, 10M queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Managed (OpenAI + Firestore) | $20 | $180 | $200 |
| Self-Hosted (TEI + Qdrant) | $200 | $200 | $400 |
Recommendation: Self-hosted stack becomes cost-effective at this scale
Scenario 4: Enterprise Application
Volume: 10M documents, 100M queries/month
| Stack | Embedding Cost | Storage Cost | Total/Month |
|---|---|---|---|
| Managed (OpenAI + Firestore) | $200 | $1,800 | $2,000 |
| Self-Hosted (TEI + Qdrant) | $500 | $1,000 | $1,500 |
Recommendation: Self-hosted with dedicated infrastructure
Performance Benchmarks
Query Latency (p95)
| Provider | Single Vector | Batch (10) | Batch (100) |
|---|---|---|---|
| Memory | 0.1ms | 0.5ms | 3ms |
| Firestore | 50ms | 75ms | 200ms |
| Qdrant (local) | 1ms | 5ms | 25ms |
| Qdrant (cloud) | 25ms | 40ms | 100ms |
Notes:
- Memory is fastest but not persistent
- Firestore latency includes network round-trip
- Qdrant local is nearly as fast as memory
- All tested with 100K documents, 384-dimensional vectors
Embedding Generation
| Provider | Single Text | Batch (10) | Batch (100) |
|---|---|---|---|
| HF API | 200ms | 500ms | 2000ms |
| HF TEI (CPU) | 50ms | 200ms | 800ms |
| HF TEI (GPU) | 10ms | 30ms | 100ms |
| OpenAI | 100ms | 300ms | 1500ms |
Notes:
- HF TEI with GPU is fastest
- Batch processing significantly improves throughput (see the sketch below)
- OpenAI offers good balance of speed and quality
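If your provider exposes a native batch endpoint, prefer it. When only a per-text call such as EmbedText is available, a bounded worker pool recovers much of the throughput. A minimal sketch, with embedOne standing in for whichever embedder you use.
import "sync"

// embedAll runs embedOne over texts with at most `workers` concurrent requests.
// embedOne is a stand-in, e.g. func(t string) ([]float32, error) { return embedder.EmbedText(ctx, t) }.
func embedAll(texts []string, workers int, embedOne func(string) ([]float32, error)) ([][]float32, error) {
    results := make([][]float32, len(texts))
    errs := make([]error, len(texts))
    sem := make(chan struct{}, workers)
    var wg sync.WaitGroup
    for i, t := range texts {
        wg.Add(1)
        sem <- struct{}{}
        go func(i int, t string) {
            defer wg.Done()
            defer func() { <-sem }()
            results[i], errs[i] = embedOne(t)
        }(i, t)
    }
    wg.Wait()
    for _, err := range errs {
        if err != nil {
            return nil, err
        }
    }
    return results, nil
}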
Throughput (queries per second)
| Provider | Single Thread | 10 Threads | 100 Threads |
|---|---|---|---|
| Memory | 10,000 | 50,000 | 100,000 |
| Firestore | 100 | 500 | 2,000 |
| Qdrant (local) | 5,000 | 25,000 | 50,000 |
| Qdrant (cloud) | 500 | 2,500 | 10,000 |
Notes:
- Memory throughput limited only by CPU
- Firestore throughput limited by API quotas
- Qdrant scales well with parallelism
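These figures are environment-specific, so measure against your own data and infrastructure before committing to a stack. A minimal sketch for collecting a p95 over repeated calls; queryOnce is a placeholder for whatever store query or embedding call you want to benchmark.
import (
    "sort"
    "time"
)

// p95Latency runs queryOnce n times and returns the 95th-percentile duration.
// queryOnce is a placeholder; wrap your own store query or embedding call in it.
func p95Latency(n int, queryOnce func() error) (time.Duration, error) {
    durations := make([]time.Duration, 0, n)
    for i := 0; i < n; i++ {
        start := time.Now()
        if err := queryOnce(); err != nil {
            return 0, err
        }
        durations = append(durations, time.Since(start))
    }
    sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
    return durations[int(float64(len(durations))*0.95)], nil
}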
Migration Paths
Memory to Firestore
When: Moving from development to production
Process:
- Swap the store constructor (the rest of your code stays the same)
- Re-index your documents
- Update monitoring/alerting
// Before (Memory)
store, _ := memory.New()
// After (Firestore) - same interface!
client, _ := gcfirestore.NewClient(ctx, "project-id") // gcfirestore = cloud.google.com/go/firestore
store, _ := firestore.New(ctx, client)
Downtime: None (parallel indexing possible)
Difficulty: Easy
Time: 1-2 hours
HuggingFace API to TEI
When: Scaling beyond free tier, need better performance
Process:
- Deploy TEI container
- Update base URL in configuration
- Test embedding compatibility
- Cutover
// Before (API)
embedder, _ := huggingface.New(ctx, huggingface.Config{
APIKey: os.Getenv("HF_API_KEY"),
Model: "BAAI/bge-small-en-v1.5",
})
// After (TEI) - same interface!
embedder, _ := huggingface.New(ctx, huggingface.Config{
BaseURL: "http://localhost:8080",
Model: "BAAI/bge-small-en-v1.5",
})
Downtime: None
Difficulty: Medium
Time: 2-4 hours
Firestore to Qdrant
When: Need better performance, higher volume, cost optimization
Process:
- Deploy Qdrant
- Export documents from Firestore
- Re-index in Qdrant
- Parallel run for validation
- Cutover
// Before (Firestore)
client, _ := gcfirestore.NewClient(ctx, "project-id") // gcfirestore = cloud.google.com/go/firestore
store, _ := firestore.New(ctx, client)
// After (Qdrant) - same interface!
store, _ := qdrant.New(ctx, qdrant.Config{
URL: "http://localhost:6333",
})
Downtime: None (with parallel indexing)
Difficulty: Medium-Hard
Time: 1-2 days
OpenAI to HuggingFace
When: Cost optimization, need offline capability
Challenges:
- Different embedding dimensions (requires re-indexing)
- Potential quality differences
- Need to test thoroughly
Process:
- Choose comparable HF model
- Generate test embeddings
- Evaluate quality on your use case (see the sketch below)
- Re-index all documents
- Cutover
Downtime: Depends on index size
Difficulty: Hard
Time: 1-2 weeks (including testing)
Important: Different embedding models are NOT compatible. You must re-index all data.
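One concrete way to run the quality evaluation step above is to embed a sample of real queries and documents with both models and check how often the two rankings agree. The helpers below are plain Go and make no assumption about which providers produced the vectors; overlap well below 1.0 at your usual k is a signal to review results manually before committing to the re-index.
import (
    "math"
    "sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
    var dot, na, nb float64
    for i := range a {
        dot += float64(a[i]) * float64(b[i])
        na += float64(a[i]) * float64(a[i])
        nb += float64(b[i]) * float64(b[i])
    }
    if na == 0 || nb == 0 {
        return 0
    }
    return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the indices of the k documents most similar to the query vector.
func topK(query []float32, docs [][]float32, k int) []int {
    scores := make([]float64, len(docs))
    idx := make([]int, len(docs))
    for i, d := range docs {
        scores[i] = cosine(query, d)
        idx[i] = i
    }
    sort.Slice(idx, func(i, j int) bool { return scores[idx[i]] > scores[idx[j]] })
    if k > len(idx) {
        k = len(idx)
    }
    return idx[:k]
}

// overlapAtK measures how many of the old model's top-k results the new model also returns.
func overlapAtK(oldTop, newTop []int) float64 {
    if len(oldTop) == 0 {
        return 0
    }
    seen := make(map[int]bool, len(oldTop))
    for _, i := range oldTop {
        seen[i] = true
    }
    match := 0
    for _, i := range newTop {
        if seen[i] {
            match++
        }
    }
    return float64(match) / float64(len(oldTop))
}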
Decision Framework
Start Here
Ask yourself these questions:
What’s your deployment environment?
- GCP → Consider Firestore
- AWS/Azure → Consider self-hosted options
- Multi-cloud → Memory (dev) or self-hosted (prod)
What’s your budget?
- $0/month → HuggingFace API + Memory/Firestore
- $50-200/month → OpenAI + Firestore
- $200+/month → Self-hosted TEI + Qdrant
What’s your scale?
- <100K docs → Managed solutions
- 100K-1M docs → Either managed or self-hosted
- >1M docs → Self-hosted recommended
What’s your team’s expertise?
- Small team, no ops → Managed solutions
- DevOps capability → Self-hosted for better economics
What’s your performance requirement?
- <100ms p95 → Self-hosted required
- <500ms p95 → Managed solutions work
- >500ms p95 → Any solution works
Quick Decision Tree
Are you in production?
├─ No → HuggingFace API + Memory
└─ Yes
   └─ On GCP?
      ├─ Yes → OpenAI + Firestore
      └─ No
         └─ High volume (>1M queries/day)?
            ├─ Yes → HuggingFace TEI + Qdrant
            └─ No → OpenAI + Firestore
Next Steps
- Start with the Development Stack: Get familiar with the APIs
- Benchmark your use case: Real performance depends on your data
- Plan for migration: Design with future scaling in mind
- Monitor costs: Set up billing alerts early
Additional Resources
- Vector Databases Guide: Deep dive into vector search
- Production Deployment: Best practices for production
- Observability: Monitoring your vector infrastructure
- Provider Integration: Integrating custom providers