REST API Reference
RAG Loom exposes a REST API for document ingestion, semantic search, and answer generation. All endpoints are prefixed with /api/v1 unless noted otherwise.
- Base URL (local): `http://localhost:8000`
- OpenAPI documentation: `GET /docs`
- Prometheus metrics: `GET /metrics`
Service Health
GET /
Returns a lightweight banner confirming the service is running.
{
"message": "RAG Microservice API",
"version": "0.2.0"
}
GET /health
Returns service status and dependency diagnostics.
{
"status": "healthy",
"timestamp": "2024-03-01T10:15:00.123456+00:00",
"service": "RAG Loom API",
"version": "0.2.0",
"vector_store": {
"status": "up",
"type": "chroma"
},
"embedding": {
"status": "up",
"model": "sentence-transformers/all-MiniLM-L6-v2",
"provider": "local"
},
"llm": {
"status": "up",
"provider": "ollama",
"model": "gemma2:2b"
}
}
If any dependency check fails, the top-level status is set to "degraded" and the affected subsystem reports "status": "down" along with an error message.
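A client can use the `/health` payload to decide which subsystems need attention. The sketch below assumes the response shape shown above; the helper name `down_subsystems` is illustrative, not part of the API.

```python
import json

# Sample /health payload (shape taken from the documented response;
# the "degraded" values here are illustrative).
health = json.loads("""
{
  "status": "degraded",
  "vector_store": {"status": "up", "type": "chroma"},
  "embedding": {"status": "down", "error": "model not loaded"},
  "llm": {"status": "up", "provider": "ollama", "model": "gemma2:2b"}
}
""")

def down_subsystems(payload: dict) -> list[str]:
    """Return the names of dependency blocks not reporting status "up"."""
    deps = ("vector_store", "embedding", "llm")
    return [name for name in deps
            if payload.get(name, {}).get("status") != "up"]
```

A monitoring probe could alert whenever `down_subsystems(health)` is non-empty rather than string-matching the top-level status.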
Ingestion API
POST /api/v1/ingest
Uploads a single document for processing and indexing.
Request (multipart form):
- `file`: binary file (`pdf` or `txt`)
- `chunk_params` (optional JSON): `{ "chunk_size": 1000, "chunk_overlap": 200 }`
Response (200 OK):
{
"message": "Document processed successfully",
"file_id": "playbook.pdf",
"file_name": "playbook.pdf",
"file_type": "pdf",
"chunks_created": 12,
"processing_time": 1.42,
"metadata": {
"document_id": "a4bf2c30-23f1-40cc-8d40-9b8c8338ff49"
}
}
Maximum upload size is governed by `MAX_FILE_SIZE` in `.env`.
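Because `chunk_params` is sent as a JSON string inside the multipart form, it is easy to submit values the server will reject. A minimal client-side helper, assuming the field semantics shown above (the invariant that overlap must be smaller than the chunk size is a reasonable client-side check, not a documented server rule):

```python
import json

def build_chunk_params(chunk_size: int = 1000, chunk_overlap: int = 200) -> str:
    """Serialise the optional chunk_params form field for /api/v1/ingest.

    Raises ValueError early rather than letting the server reject the
    request, assuming overlap must be strictly smaller than chunk size.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    return json.dumps({"chunk_size": chunk_size, "chunk_overlap": chunk_overlap})
```

The returned string is what goes into the `chunk_params` form field alongside the `file` part.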
POST /api/v1/ingest/batch
Accepts multiple files in a single multipart request. Each file is processed independently and returns an array of per-file ingestion results.
Request (multipart form):
- `files`: repeatable binary field, one per document (`pdf` or `txt`)
- `chunk_params` (optional JSON string): `{ "chunk_size": 1000, "chunk_overlap": 200 }`
Each uploaded file is chunked, embedded, and stored in the configured vector store in the same way as the single-file endpoint.
Response (200 OK):
[
{
"message": "Document processed successfully",
"file_id": "handbook.pdf",
"file_name": "handbook.pdf",
"file_type": "pdf",
"chunks_created": 18,
"processing_time": 2.11
},
{
"message": "Failed to process notes.txt",
"file_id": "notes.txt",
"file_name": "notes.txt",
"file_type": "unknown",
"chunks_created": 0,
"processing_time": 0.0
}
]
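Since the batch endpoint returns a mixed array of successes and failures, callers typically need to partition the results. A small sketch, assuming (as the sample above suggests) that a failed file reports `chunks_created: 0`; the `summarise` helper is hypothetical:

```python
# Per-file results in the shape returned by POST /api/v1/ingest/batch
# (values copied from the sample response above).
results = [
    {"message": "Document processed successfully", "file_name": "handbook.pdf",
     "file_type": "pdf", "chunks_created": 18, "processing_time": 2.11},
    {"message": "Failed to process notes.txt", "file_name": "notes.txt",
     "file_type": "unknown", "chunks_created": 0, "processing_time": 0.0},
]

def summarise(batch: list[dict]) -> dict:
    """Partition batch results by file name, treating zero chunks as failure."""
    ok = [r["file_name"] for r in batch if r.get("chunks_created", 0) > 0]
    failed = [r["file_name"] for r in batch if r.get("chunks_created", 0) == 0]
    return {"ok": ok, "failed": failed}
```

Failed files can then be retried individually via the single-file `/api/v1/ingest` endpoint.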
Search
POST /api/v1/search
Performs a semantic similarity search across ingested content.
Request:
{
"query": "Summarise the data ingestion pipeline",
"top_k": 5,
"similarity_threshold": 0.7,
"filters": {
"file_name": "handbook.pdf"
}
}
Response (200 OK):
[
{
"id": "cb8b0dd5-7a49-4a75-a705-6fe2f3fc51fa",
"content": "The ingestion pipeline extracts text...",
"metadata": {
"document_id": "21f5329e-86c7-42b0-938a-4c815b635f2c",
"file_name": "handbook.pdf",
"chunk_index": 3
},
"similarity_score": 0.83,
"document_id": "21f5329e-86c7-42b0-938a-4c815b635f2c"
}
]
Results are filtered by `similarity_threshold`. If no chunks meet the threshold, the `top_k` highest-scoring results are returned as a fallback.
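The threshold-with-fallback behaviour described above can be mirrored client-side, which is handy when re-ranking cached results. A minimal sketch; the function name and the assumption that results carry a `similarity_score` field follow the response shape documented above:

```python
def filter_with_fallback(results: list[dict], threshold: float, top_k: int) -> list[dict]:
    """Keep chunks at or above the similarity threshold; if none qualify,
    fall back to the top_k highest-scoring results."""
    hits = [r for r in results if r["similarity_score"] >= threshold]
    if hits:
        return hits
    return sorted(results, key=lambda r: r["similarity_score"], reverse=True)[:top_k]

sample = [
    {"id": "a", "similarity_score": 0.83},
    {"id": "b", "similarity_score": 0.41},
]
```

With `threshold=0.7` only chunk `"a"` survives; with `threshold=0.9` nothing qualifies, so the top `top_k` hits are returned instead.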
Generation
POST /api/v1/generate
Combines retrieval and language model inference to produce an answer.
Request:
{
"query": "How do I deploy RAG Loom to production?",
"search_params": {
"top_k": 5,
"similarity_threshold": 0.7,
"filters": {
"document_id": "21f5329e-86c7-42b0-938a-4c815b635f2c"
}
},
"temperature": 0.7,
"max_tokens": 500
}
Response (200 OK):
{
"answer": "To deploy RAG Loom...",
"sources": [
{
"id": "cb8b0dd5-7a49-4a75-a705-6fe2f3fc51fa",
"content": "For production deployments...",
"metadata": {
"document_id": "21f5329e-86c7-42b0-938a-4c815b635f2c",
"file_name": "deployment_guide.pdf",
"chunk_index": 9
},
"similarity_score": 0.82,
"document_id": "21f5329e-86c7-42b0-938a-4c815b635f2c"
}
],
"generation_time": 1.86
}
If the request omits context, the service automatically retrieves context using the provided search_params or defaults from configuration.
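The default-merging behaviour can be sketched as a simple dictionary overlay. The default values here are taken from the sample request above and are assumptions, not the service's authoritative configuration:

```python
# Illustrative defaults; the real values come from the service configuration.
DEFAULT_SEARCH_PARAMS = {"top_k": 5, "similarity_threshold": 0.7}

def resolve_search_params(request: dict) -> dict:
    """Overlay caller-supplied search_params on the configured defaults,
    mirroring the fallback behaviour described above."""
    return {**DEFAULT_SEARCH_PARAMS, **(request.get("search_params") or {})}
```

Any key the caller supplies wins; everything else falls back to the defaults.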
Errors
API responses follow standard FastAPI error formats. Handle the following status codes:
| Status | Meaning |
|---|---|
| 400 | Invalid payload or missing required fields |
| 401 | Authentication failed (when `ENABLE_AUTH=true`) |
| 404 | Requested resource not found |
| 422 | Validation error |
| 500 | Unexpected server error |
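A client can translate these status codes into exceptions with a simple lookup. The helper below is an illustrative sketch based on the table above, not part of the service:

```python
API_ERRORS = {
    400: "Invalid payload or missing required fields",
    401: "Authentication failed",
    404: "Requested resource not found",
    422: "Validation error",
    500: "Unexpected server error",
}

def raise_for_api_error(status: int, body: dict) -> None:
    """Raise RuntimeError for any 4xx/5xx status, with the documented meaning."""
    if status >= 400:
        meaning = API_ERRORS.get(status, "Unknown error")
        raise RuntimeError(f"{status}: {meaning} - {body}")
```

Successful responses (2xx) pass through untouched.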
Testing Endpoints
Use the bundled test_service.py or quick curl commands:
pytest test_service.py
curl -X POST "http://localhost:8000/api/v1/search" \
-H "Content-Type: application/json" \
-d '{"query": "What models are supported?", "top_k": 3}'
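The same search call can be issued from Python with the standard library alone. This sketch builds the request without sending it; uncommenting the `urlopen` line requires the service to be running locally:

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = json.dumps({"query": "What models are supported?", "top_k": 3}).encode()

req = urllib.request.Request(
    "http://localhost:8000/api/v1/search",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
```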
For SDK integrations, explore the ready-made snippets in the API Script Playbook.