Docker Deployment Specification
This document provides specifications for deploying the vector database system in AWS using Kubernetes.
System Components
| Service | Container Image | Purpose | Ports |
|---|---|---|---|
| Ollama Embedding Service | mingzilla/ollama-nomic-embed:latest | Provides text embedding generation using the nomic-embed-text model | 11434 (HTTP) |
| Qdrant Vector Database | qdrant/qdrant:latest | Stores and searches vector embeddings | 6333 (HTTP API), 6334 (gRPC) |
Deployment Pattern
- Deploy one set of services (Ollama + Qdrant) per customer cluster
- Each Panintelligence (PI) dashboard cluster should have its own dedicated vector database services
- All nodes in a PI dashboard cluster share the same vector database services
Network Configuration
Security Model
- Primary security relies on network isolation via a private subnet
- For any token verification requirements, use a reverse proxy in front of the services
- Both services should be deployed in a private subnet
- Access should be restricted to the PI dashboard application nodes only
Connectivity Requirements
| Service | Source | Destination | Port | Protocol |
|---|---|---|---|---|
| PI Dashboard → Ollama | Dashboard Nodes | Ollama Service | 11434 | HTTP |
| PI Dashboard → Qdrant | Dashboard Nodes | Qdrant Service | 6333 | HTTP |
| Ollama → Internet | Ollama Service | Internet | 443 | HTTPS |
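The connectivity rules above can be spot-checked from a dashboard node with a plain TCP reachability test. The following is a minimal sketch, not part of the library: the `PortCheck` class and `canConnect` method are hypothetical names, and the host names `ollama-service` and `qdrant-service` are assumed to match the Kubernetes service names used in your cluster.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
final class PortCheck {
    private PortCheck() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, host unresolved or unreachable, or timed out
            return false;
        }
    }
}
```

For example, `PortCheck.canConnect("ollama-service", 11434, 2000)` and `PortCheck.canConnect("qdrant-service", 6333, 2000)` should both return true from a dashboard node, and false from any host outside the allowed sources.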
Resource Requirements
| Service | CPU | Memory | Storage |
|---|---|---|---|
| Ollama | 2 cores (minimum), 4 cores (recommended) | 4 GB (minimum), 8 GB (recommended) | 2 GB |
| Qdrant | 1 core (minimum), 2 cores (recommended) | 2 GB (minimum), 4 GB (recommended) | Ephemeral (see persistence notes) |
Persistence Configuration
- Qdrant: No persistent volume needed
  - Vectors are rebuilt on service restart
  - This is an intentional design choice to avoid volume maintenance
  - The embedding model ensures consistent vector generation
- Ollama: No persistent volume needed
  - The custom image mingzilla/ollama-nomic-embed has the model pre-installed
  - No model downloading or storage is required
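Because Qdrant storage is ephemeral, the collection itself disappears on restart and has to exist again before vectors can be rebuilt. The sketch below shows how a create-collection request could be built against Qdrant's REST API (`PUT /collections/{name}`). Whether the library already does this internally is not stated in this document. The `QdrantBootstrap` class is a hypothetical name, the `Cosine` distance is an assumption, and 768 is the output dimension of nomic-embed-text.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds the request that recreates a Qdrant collection. */
final class QdrantBootstrap {
    /** Output dimension of the nomic-embed-text model. */
    static final int NOMIC_EMBED_DIMENSIONS = 768;

    private QdrantBootstrap() {}

    /** JSON body for PUT /collections/{name}; distance is e.g. "Cosine" (assumed here). */
    static String createCollectionBody(int vectorSize, String distance) {
        return "{\"vectors\":{\"size\":" + vectorSize
                + ",\"distance\":\"" + distance + "\"}}";
    }

    /** Builds the PUT request; the collection name is assumed to be URL-safe. */
    static HttpRequest createCollectionRequest(String qdrantUrl, String collection) {
        String body = createCollectionBody(NOMIC_EMBED_DIMENSIONS, "Cosine");
        return HttpRequest.newBuilder(URI.create(qdrantUrl + "/collections/" + collection))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

A caller would send `QdrantBootstrap.createCollectionRequest("http://qdrant-service:6333", "vector_store")` with a `java.net.http.HttpClient` after each Qdrant restart, then re-embed and upsert the source documents.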
Health Checks
| Service | Endpoint | Initial Delay | Interval | Timeout |
|---|---|---|---|---|
| Ollama | http://localhost:11434/api/version | 30s | 10s | 5s |
| Qdrant | http://localhost:6333/healthz | 5s | 10s | 5s |
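The same checks can be run from application code, for example as a startup gate before the dashboard begins sending requests. This is a minimal sketch using the JDK's `java.net.http` client. The `HealthProbe` class and its methods are hypothetical names, and treating any 2xx status as healthy is an assumption rather than a documented contract of either service.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical probe helper that applies a per-request timeout. */
final class HealthProbe {
    private HealthProbe() {}

    /** Builds a GET request for the endpoint with the given timeout. */
    static HttpRequest buildRequest(String endpoint, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(timeout)
                .GET()
                .build();
    }

    /** Treats any 2xx status as healthy (an assumption of this sketch). */
    static boolean isHealthy(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    /** Sends one probe; returns false on non-2xx, timeout, or connection failure. */
    static boolean probe(HttpClient client, String endpoint, Duration timeout) {
        try {
            HttpResponse<Void> response = client.send(
                    buildRequest(endpoint, timeout),
                    HttpResponse.BodyHandlers.discarding());
            return isHealthy(response.statusCode());
        } catch (IOException e) {
            return false; // includes HttpTimeoutException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

For example, `HealthProbe.probe(HttpClient.newHttpClient(), "http://ollama-service:11434/api/version", Duration.ofSeconds(5))` mirrors the Ollama row with its 5s timeout.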
Scaling Considerations
- Ollama: Does not need to scale horizontally; vertical scaling recommended
- Qdrant: Does not need to scale horizontally for current workloads
- Resource allocation should be adjusted based on customer size and usage patterns
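When adjusting Qdrant memory for a given customer, a rough estimate of vector memory is a useful starting point. The sketch below applies the rule of thumb from Qdrant's capacity planning guidance (vector count × dimensions × 4 bytes × 1.5 overhead). The `QdrantSizing` class is a hypothetical name, and the result ignores payload and index tuning, so treat it as a lower bound rather than a sizing guarantee.

```java
/** Hypothetical sizing helper based on a vectors x dimensions x 4 bytes x 1.5 rule of thumb. */
final class QdrantSizing {
    private QdrantSizing() {}

    /**
     * Estimates memory needed to hold the vectors in RAM.
     * Each dimension is a 4-byte float; the 3/2 factor adds 50% overhead.
     */
    static long estimateVectorMemoryBytes(long vectorCount, int dimensions) {
        return vectorCount * dimensions * Float.BYTES * 3 / 2;
    }
}
```

With nomic-embed-text (768 dimensions), `QdrantSizing.estimateVectorMemoryBytes(100_000L, 768)` gives about 0.46 GB, which fits comfortably in the 2 GB minimum, while one million vectors comes to about 4.6 GB and would exceed the 4 GB recommendation.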
Environment Variables and Configuration
Environment Variables
| Service | Variable | Value | Purpose |
|---|---|---|---|
| Ollama | OLLAMA_HOST | 0.0.0.0 | Binds to all network interfaces |
| Qdrant | QDRANT_ALLOW_RECOVERY_MODE | false | Prevents automatic recovery attempts on startup |
Using a Reverse Proxy for Token Verification
If authentication is required for these services, we recommend implementing a reverse proxy:
- Deploy a reverse proxy (such as NGINX or Envoy) in front of the Ollama and Qdrant services
- Configure the reverse proxy to handle token verification
- Update the VectorStoreConfig in your application to include the required tokens:
    VectorStoreConfig config = VectorStoreConfig.create(
        "http://ollama-service:11434/api/embeddings", // Embedding service URL
        "nomic-embed-text",                           // Embedding model
        "your_ollama_token",                          // Token for Ollama (verified by reverse proxy)
        "http://qdrant-service:6333",                 // Qdrant URL
        "your_qdrant_token",                          // Token for Qdrant (verified by reverse proxy)
        "default",                                    // Namespace
        "vector_store"                                // Collection name
    );
The library will include these tokens in the appropriate headers when making requests to these services:
- For Ollama: Currently sent in a custom header, but will be changed to the Authorization header with Bearer format
- For Qdrant: Currently sent in the “api-key” header, but will be changed to the Authorization header with Bearer format
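To make the two header formats concrete for whoever configures the reverse proxy, the sketch below builds one request in each style with the JDK's `java.net.http` API. It is illustrative only and does not reproduce the library's internals. `AuthHeaders` is a hypothetical name, and the current custom header used for Ollama is not named in this document, so it is not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper showing the two header formats described above. */
final class AuthHeaders {
    private AuthHeaders() {}

    /** Current Qdrant format: token sent in the "api-key" header. */
    static HttpRequest withQdrantApiKey(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("api-key", token)
                .GET()
                .build();
    }

    /** Planned format for both services: "Authorization: Bearer <token>". */
    static HttpRequest withBearerToken(String url, String token) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }
}
```

A reverse proxy that accepts both formats during the transition avoids a coordinated cutover between the library upgrade and the proxy configuration.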
Additional Notes
- Service Discovery:
  - Use standard Kubernetes service discovery
  - Services should be accessible within the cluster by name
- Monitoring:
  - Both services expose metrics that can be scraped by Prometheus
  - Qdrant exposes metrics at /metrics
- Disaster Recovery:
  - No special backup procedures required
  - The system is designed to rebuild vectors as needed
- Deployment Schedule:
  - Services can be deployed ahead of PI dashboard updates
  - No special initialization is required beyond container startup
- Rolling Updates:
  - Standard Kubernetes rolling update procedures can be used
  - No special handling required for upgrades
- Security Considerations:
  - The primary security mechanism should be network isolation via VPC/private subnet
  - If token verification is needed, implement it consistently via a reverse proxy
  - Store authentication tokens in Kubernetes secrets or other secure credential storage
  - Rotate tokens periodically according to your security policy
  - Token verification provides an additional layer of protection if network configuration errors occur
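When tokens are stored in Kubernetes secrets, a common pattern is to expose them to the application as environment variables and fail fast if one is missing. The following is a minimal sketch of that lookup. The `TokenSource` class and the variable names `OLLAMA_TOKEN` and `QDRANT_TOKEN` are hypothetical and should be replaced with whatever your deployment defines.

```java
import java.util.Map;

/** Hypothetical helper: reads a required token from an environment map. */
final class TokenSource {
    private TokenSource() {}

    /**
     * Returns the trimmed token value, or throws if it is absent or blank,
     * so a misconfigured secret is caught at startup rather than at request time.
     */
    static String requireToken(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required token: " + name);
        }
        return value.trim();
    }
}
```

In the application, `TokenSource.requireToken(System.getenv(), "QDRANT_TOKEN")` would supply the value passed to `VectorStoreConfig.create` in place of a hard-coded string. Taking the map as a parameter keeps the lookup easy to test.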