# Ollama Integration

Ollama runs open-weight language models locally, so inference stays on your own hardware and incurs no third-party API calls. This guide explains how to install Ollama, connect it to RAG Loom, and optimise performance.
## When to Use Ollama
- Offline inference or strict data residency requirements.
- Avoiding per-token API charges from hosted providers.
- Rapid experimentation with community-maintained models before promoting to production.
If you prefer hosted providers, configure the relevant environment variables for OpenAI, Cohere, or Hugging Face instead.
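As a rough sketch, hosted-provider setup usually amounts to exporting an API key before starting RAG Loom. The exact variable names RAG Loom reads are an assumption here; the ones below follow each provider's common convention.

```bash
# Hypothetical example: confirm the variable names against your
# RAG Loom configuration before relying on them.
export OPENAI_API_KEY="sk-..."   # OpenAI
export COHERE_API_KEY="..."      # Cohere
export HF_TOKEN="hf_..."         # Hugging Face
```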
## System Requirements
| Tier | CPU | Memory | Storage | Notes |
|---|---|---|---|---|
| Development | 4 cores | 16 GB | 20 GB | Suitable for 7B models |
| Staging | 8 cores | 32 GB | 50 GB | Recommended for 13B models |
| Production | 16 cores | 64 GB | 100 GB SSD | Supports 34B+ models; consider dedicated hardware |
Apple Silicon Macs (M1 or later) and GPU-backed Linux servers deliver the best throughput.
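Before committing to a tier, it can help to confirm what the host actually has available. A minimal check, assuming a standard Linux or macOS shell:

```bash
# Linux: available RAM and free disk space
free -h
df -h /

# macOS: physical RAM in bytes
sysctl hw.memsize
```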
## Installation
### macOS (Homebrew)
```bash
# Install the Ollama runtime and start it as a background service
brew install ollama
brew services start ollama

# Confirm the binary is on your PATH
ollama --version
```
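With the service running, pull a model and smoke-test the local HTTP API. The model name below is just an example from the Ollama library; choose one that fits your hardware tier. Ollama serves its API on port 11434 by default.

```bash
# Download an example model from the Ollama library
ollama pull llama3

# Verify the local API answers; 11434 is Ollama's default port
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Reply with OK.", "stream": false}'
```

A JSON response containing a `response` field confirms the server is reachable and the model loaded successfully.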