Hippo: RAG & Retrieval 🦛

Build intelligent Q&A systems over documents with flagship accuracy at mini model cost. Hippo delivers precision retrieval with 70% smaller context windows, achieving 80% cost reduction while maintaining 99.5% accuracy match to flagship models.

80% COST REDUCTION Only retrieve relevant chunks, not entire documents

99.5% ACCURACY MATCH Flagship model quality with intelligent retrieval

70% SMALLER CONTEXT Precision RAG eliminates noise, keeps what matters

What is Hippo?

Hippo is Cerevox’s RAG (Retrieval-Augmented Generation) API that enables AI agents to search and query document collections with natural language. Instead of sending entire documents to your LLM (expensive, slow, noisy), Hippo:
  1. Indexes your documents with semantic understanding
  2. Retrieves only the most relevant chunks (70% smaller context)
  3. Generates AI answers with source citations
  4. Saves you 80% on LLM costs while matching flagship accuracy
Perfect for: Customer support bots, internal knowledge bases, document Q&A, research assistants, and any AI system that needs to “know” information from documents.

Core Concepts

Folders are collections of documents that form a searchable knowledge base.
  • Each folder is an isolated knowledge domain
  • Upload PDFs, DOCX, PPTX, and more
  • Automatically indexed for semantic search
  • Scales from 1 to 10,000+ documents per folder
Use cases: Product docs, customer records, research papers, legal cases
Files are the documents you upload to folders.
  • Supports 12+ formats: PDF, DOCX, PPTX, XLSX, TXT, HTML, CSV, etc.
  • Upload from local files or URLs
  • Automatic processing and indexing
  • Rich metadata extraction
Processing: Files are automatically parsed, chunked, and indexed for retrieval
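Hippo performs the parse/chunk/index steps automatically, but a toy chunker helps build intuition for what "chunked" means. This is a deliberately naive sketch for illustration only; it is not how Hippo chunks documents, and the size and overlap parameters are arbitrary.

```python
def chunk_text(text, max_chars=500, overlap=50):
    """Naive fixed-size chunker with overlap, for intuition only.

    Hippo's own parsing, chunking, and indexing happen automatically
    on upload; these parameters are purely illustrative.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap  # overlap preserves context across edges
    return chunks

print(len(chunk_text("word " * 300)), "chunks")  # ~1,500 characters of input
```

Each chunk is then indexed individually, which is what lets retrieval return only the relevant pieces of a document rather than the whole file.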
Chat sessions maintain conversation context for Q&A.
  • Each chat is connected to a folder
  • Maintains conversation history
  • Supports follow-up questions
  • Multiple chats per folder
Use cases: Support conversations, research sessions, document analysis
Asks are questions submitted to a chat that generate AI-powered answers.
  • Natural language questions
  • AI-generated answers with source citations
  • Confidence scores for each answer
  • Full conversation history accessible
Returns: Answer text + source documents + page numbers + confidence scores
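A small helper shows how those return fields might be rendered as citations. Only `file_name` (and `answer.response` / `answer.sources`) appears in the quick example below; the `page_number` and `confidence` attribute names are assumptions based on the description above, and the stand-in objects exist only so the sketch runs without a live API call.

```python
from types import SimpleNamespace

def format_citations(sources):
    """Render ask sources as 'file (p. N, confidence P%)' strings.

    Assumes each source exposes file_name, page_number, and confidence;
    only file_name is confirmed elsewhere on this page.
    """
    return [
        f"{s.file_name} (p. {s.page_number}, confidence {s.confidence:.0%})"
        for s in sources
    ]

# Stand-ins mirroring the documented return shape:
# answer text + source documents + page numbers + confidence scores.
sources = [
    SimpleNamespace(file_name="user-guide.pdf", page_number=12, confidence=0.93),
    SimpleNamespace(file_name="api-docs.pdf", page_number=4, confidence=0.81),
]
print(format_citations(sources))
# With a live client this would be: format_citations(answer.sources)
```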

How It Works

  1. Create a Folder: Organize documents into a knowledge base
  2. Upload Files: Add documents from local files or URLs
  3. Create Chat: Start a conversation session linked to the folder
  4. Ask Questions: Submit natural language questions and get AI answers with sources

Quick Example

from cerevox import Hippo

# Initialize
hippo = Hippo(api_key="your-api-key")

# 1. Create knowledge base
folder = hippo.create_folder("Product Documentation")

# 2. Upload documents
hippo.upload_file(folder.id, "user-guide.pdf")
hippo.upload_file_from_url(folder.id, "https://example.com/api-docs.pdf")

# 3. Create chat
chat = hippo.create_chat(folder.id, "Support Q&A")

# 4. Ask questions
answer = hippo.submit_ask(
    chat.id,
    "How do I authenticate users?"
)

print(f"Answer: {answer.response}")
print(f"Sources: {[s.file_name for s in answer.sources]}")
# 80% cost reduction vs. full document retrieval!

Key Features

Semantic Search

AI-powered understanding
  • Finds relevant content by meaning, not just keywords
  • Handles synonyms and context
  • Multi-language support

Source Citations

Verify every answer
  • Exact source documents
  • Page numbers included
  • Confidence scores

Conversation Memory

Contextual follow-ups
  • Chats remember previous questions
  • Support clarifying questions
  • Full history accessible
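Because each chat keeps its history, follow-up questions can be submitted to the same chat ID. The helper below is a sketch, not part of the SDK; the `create_chat` and `submit_ask` method names follow the quick example on this page.

```python
def ask_with_followups(hippo, folder_id, questions, title="Q&A session"):
    """Run a sequence of questions through one chat so follow-ups can
    build on earlier answers (conversation memory).

    Sketch only: the helper is not part of the Hippo SDK, though the
    client methods it calls match the quick example above.
    """
    chat = hippo.create_chat(folder_id, title)
    # All questions share chat.id, so the second can say "that" and
    # have it resolved from the first answer's context.
    return [hippo.submit_ask(chat.id, q) for q in questions]

# With a live client:
# answers = ask_with_followups(hippo, folder.id,
#     ["What were Q3 revenue drivers?", "How did that compare to Q2?"])
```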

Multi-format Support

12+ file formats
  • PDF, DOCX, PPTX, XLSX
  • TXT, HTML, CSV, and more
  • Automatic format detection

Async Operations

High performance
  • Full async/await support
  • Concurrent uploads
  • Batch processing

Enterprise Ready

Production proven
  • Automatic retries
  • Error handling
  • Usage tracking

The Cost Savings Advantage

Traditional RAG:
# Traditional approach: send entire documents
from openai import OpenAI

client = OpenAI()
question = "How do I authenticate users?"

documents = load_all_documents()  # large context
context = "\n\n".join(doc.content for doc in documents)

# Send everything to the LLM - expensive
llm_response = client.chat.completions.create(
    model="gpt-4",  # flagship model required
    messages=[{
        "role": "user",
        "content": f"Context: {context}\n\nQuestion: {question}",
    }],
)
# High token costs: 10,000+ tokens per query
# Slow: large context = slower processing
# Noisy: irrelevant content confuses the model
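The Hippo side of the comparison reduces to the single `submit_ask` call from the quick example. The sketch below restates it and checks the page's own numbers; the token counts and per-query dollar figures are the upper bounds from the comparison table on this page, used purely for illustration.

```python
# Hippo approach: ask against an indexed folder instead of shipping
# full documents (see the quick example above). With a live client:
#
#   answer = hippo.submit_ask(chat.id, "How do I authenticate users?")
#
# Why that is cheaper, using the comparison table's upper-bound figures:
traditional_tokens = 10_000   # full-document context
hippo_tokens = 3_000          # relevant chunks only
traditional_cost = 0.50       # flagship model, $ per query
hippo_cost = 0.10             # mini model, $ per query

context_reduction = 1 - hippo_tokens / traditional_tokens
cost_reduction = 1 - hippo_cost / traditional_cost
print(f"{context_reduction:.0%} smaller context, {cost_reduction:.0%} lower cost")
```

The cost reduction outpaces the context reduction because the smaller, cleaner context lets a mini model match flagship accuracy, so both the token count and the per-token rate drop.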

Use Cases

Build AI support agents that answer customer questions
  • Upload help docs, FAQs, and knowledge base
  • Customers ask questions in natural language
  • Get instant answers with source citations
  • 80% reduction in support costs
Example: “How do I reset my password?” → Answer + link to help article
Make company knowledge searchable
  • Upload policies, procedures, onboarding docs
  • Employees ask questions, get instant answers
  • Reduce time spent searching for information
  • Keep knowledge always accessible
Example: “What’s our remote work policy?” → Answer from HR handbook
Query research papers and technical docs
  • Upload papers, reports, technical documentation
  • Ask research questions
  • Get synthesized answers from multiple sources
  • Citations to original papers
Example: “What methods did Smith et al. use?” → Answer from relevant papers
Query financial reports and filings
  • Upload 10-Ks, earnings reports, analyst notes
  • Ask about metrics, trends, risks
  • Get answers with exact page references
  • Compare across multiple documents
Example: “What were Q3 revenue drivers?” → Answer from earnings call

Hippo vs. Traditional RAG

Feature | Traditional RAG | Hippo RAG
Context Size | Full documents (10,000+ tokens) | Relevant chunks only (3,000 tokens)
Cost per Query | $0.10 - $0.50 | $0.02 - $0.10 (80% reduction)
Accuracy | Good (with flagship models) | 99.5% match (with mini models)
Response Time | Slow (large context) | Fast (smaller context)
Source Citations | Manual implementation | Built-in with confidence scores
Setup Complexity | High (vector DB, embeddings, retrieval logic) | Low (API-only, no infrastructure)
Maintenance | Ongoing (infrastructure, tuning) | None (managed service)

API Clients

Hippo provides both synchronous and asynchronous clients:
from cerevox import Hippo

# Best for: Simple scripts, notebooks, learning
hippo = Hippo(api_key="your-api-key")

folder = hippo.create_folder("Docs")
chat = hippo.create_chat(folder.id)
answer = hippo.submit_ask(chat.id, "Question?")
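Only the synchronous client is shown above. For the async side, the client name `AsyncHippo` and its awaitable `upload_file` method are assumptions (the page does not show the async API); the concurrency pattern itself is standard `asyncio.gather`.

```python
import asyncio

# Hypothetical async client usage — AsyncHippo and its method
# signatures are assumed, not confirmed by this page:
#
#   from cerevox import AsyncHippo
#   hippo = AsyncHippo(api_key="your-api-key")

async def upload_all(hippo, folder_id, paths):
    """Upload many files concurrently with one gather call.

    Sketch only: assumes the async client exposes an awaitable
    upload_file(folder_id, path) mirroring the sync client.
    """
    return await asyncio.gather(
        *(hippo.upload_file(folder_id, p) for p in paths)
    )

# asyncio.run(upload_all(hippo, folder.id, ["a.pdf", "b.pdf", "c.pdf"]))
```

Batching uploads this way lets indexing for a large folder start on all files at once instead of one at a time.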

Next Steps


Ready to save 80%? Check out the quickstart guide or explore RAG examples.