File Operations

Upload and manage documents that power your RAG Q&A system.

Upload Files

Upload from Local File

from cerevox import Hippo

hippo = Hippo(api_key="your-api-key")

# Upload a local file
file = hippo.upload_file(
    folder_id="folder_123",
    file_path="documents/user-guide.pdf"
)

print(f"Uploaded: {file.name}")
print(f"File ID: {file.id}")
print(f"Status: {file.status}")

Upload from URL

# Upload directly from a URL
file = hippo.upload_file_from_url(
    folder_id="folder_123",
    file_url="https://example.com/whitepaper.pdf",
    file_name="whitepaper.pdf"  # Optional custom name
)

print(f"Uploaded from URL: {file.name}")

Batch Upload

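folder_id = "folder_123"
files = []

for file_path in ["doc1.pdf", "doc2.docx", "doc3.pptx"]:
    # Sequential: each upload starts only after the previous one returns
    file = hippo.upload_file(folder_id=folder_id, file_path=file_path)
    files.append(file)
    print(f"Uploaded: {file.name}")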
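In the loop above, one failed upload raises an exception and the remaining files are never attempted. A minimal sketch of one way to keep going and report failures at the end follows. `upload_each` is a hypothetical helper that accepts any upload function, so with the SDK you might pass `lambda p: hippo.upload_file(folder_id=folder_id, file_path=p)`; here a fake upload function stands in so the example runs on its own.

```python
from typing import Any, Callable, Iterable


def upload_each(
    paths: Iterable[str],
    upload: Callable[[str], Any],
) -> tuple[list[Any], list[tuple[str, Exception]]]:
    """Upload files one at a time, collecting failures instead of stopping.

    Returns (uploaded, failed), where `failed` pairs each path with its error.
    """
    uploaded: list[Any] = []
    failed: list[tuple[str, Exception]] = []
    for path in paths:
        try:
            uploaded.append(upload(path))
        except Exception as exc:  # record the error and move on to the next file
            failed.append((path, exc))
    return uploaded, failed


def fake_upload(path: str) -> str:
    """Stand-in for the real SDK call so this example runs on its own."""
    if path.endswith(".exe"):
        raise ValueError("unsupported format")
    return f"uploaded:{path}"


ok, errors = upload_each(["doc1.pdf", "setup.exe", "doc2.docx"], fake_upload)
print(ok)                                      # ['uploaded:doc1.pdf', 'uploaded:doc2.docx']
print([(path, str(e)) for path, e in errors])  # [('setup.exe', 'unsupported format')]
```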

Supported File Formats

Documents

PDF (.pdf)
Word (.docx, .doc)
PowerPoint (.pptx, .ppt)
Text (.txt)
RTF (.rtf)

Spreadsheets

Excel (.xlsx, .xls)
CSV (.csv)
TSV (.tsv)

Web & Other

HTML (.html)
MHTML (.mhtml)
Markdown (.md)

File size limits:

Max file size: 100 MB per file
Contact support for larger files or custom formats
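Checking format and size locally before uploading avoids a round trip for files that would be rejected anyway. The sketch below copies the extension list above; `check_upload` is a hypothetical helper, and reading "100MB" as 100,000,000 bytes is an assumption (the stricter of the two common readings, so a file that passes here is under either limit). The service may also inspect file contents, which this check does not.

```python
import tempfile
from pathlib import Path

# Extensions copied from the "Supported File Formats" list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".pptx", ".ppt", ".txt", ".rtf",  # documents
    ".xlsx", ".xls", ".csv", ".tsv",                           # spreadsheets
    ".html", ".mhtml", ".md",                                  # web & other
}
# Assumption: "100MB" read as 100 * 1000 * 1000 bytes (the stricter reading).
MAX_FILE_SIZE_BYTES = 100 * 1000 * 1000


def check_upload(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds the 100 MB limit")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    guide = Path(tmp) / "user-guide.pdf"
    guide.write_bytes(b"%PDF-1.7\n")
    print(check_upload(str(guide)))                    # []
    print(check_upload(str(Path(tmp) / "setup.exe")))  # ['unsupported format: .exe', 'file not found']
```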

List Files

# Get all files in a folder
files = hippo.get_files(folder_id="folder_123")

for file in files:
    print(f"{file.name} - {file.status} - {file.size_bytes} bytes")
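The listing prints raw byte counts. For display, a small formatter makes sizes easier to scan. `format_size` below is a hypothetical helper, not part of the SDK, and uses 1024-based units.

```python
UNITS = ("bytes", "KB", "MB", "GB", "TB")


def format_size(size_bytes: int) -> str:
    """Format a byte count with 1024-based units, e.g. 1536 -> '1.5 KB'."""
    size = float(size_bytes)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop at the first unit where the value is below 1024, or at the largest unit.
        if size < 1024 or unit == UNITS[-1]:
            break
        size /= 1024
    if unit == "bytes":
        return f"{int(size)} bytes"
    return f"{size:.1f} {unit}"


print(format_size(512))        # 512 bytes
print(format_size(1536))       # 1.5 KB
print(format_size(5_242_880))  # 5.0 MB
```

In the listing loop, this would replace the raw count: `print(f"{file.name} - {file.status} - {format_size(file.size_bytes)}")`.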

File status values:

uploading: File is being uploaded
processing: File is being indexed
completed: File is ready for Q&A
failed: Processing failed
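These four values split into in-progress states (`uploading`, `processing`) and terminal states (`completed`, `failed`). The sketch below assumes each file object exposes `status` as one of these plain strings, as in the listing example. `summarize_statuses` and `all_settled` are hypothetical helpers, and `SimpleNamespace` objects stand in for real file objects.

```python
from collections import Counter
from types import SimpleNamespace

TERMINAL = {"completed", "failed"}


def summarize_statuses(files) -> Counter:
    """Count files by their `status` attribute."""
    return Counter(f.status for f in files)


def all_settled(files) -> bool:
    """True once no file is still uploading or processing."""
    return all(f.status in TERMINAL for f in files)


# Stand-ins with a `status` attribute, in place of objects from get_files().
sample = [SimpleNamespace(status=s) for s in ("completed", "processing", "completed", "failed")]
print(summarize_statuses(sample))  # Counter({'completed': 2, 'processing': 1, 'failed': 1})
print(all_settled(sample))         # False
```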

Get File Details

# Get specific file information
file = hippo.get_file(file_id="file_456")

print(f"Name: {file.name}")
print(f"Status: {file.status}")
print(f"Size: {file.size_bytes} bytes")
print(f"Pages: {file.page_count}")
print(f"Uploaded: {file.created_at}")

Delete Files

# Delete a file
hippo.delete_file(file_id="file_456")

print("File deleted successfully")

Deleted files cannot be recovered. The file will be removed from all chats and answers that referenced it.

File Processing

Processing Time

Files are processed automatically after upload, in this order:

Upload: the file is uploaded to Cerevox (a few seconds)
Parsing: the document is parsed for text and structure (10 seconds to 2 minutes)
Chunking: the content is split into semantic chunks (a few seconds)
Indexing: the chunks are indexed for search (10 seconds to 1 minute)
Ready: the file is ready for Q&A

Typical total processing time:

Small files (< 10 pages): 10-30 seconds
Medium files (10-100 pages): 30-120 seconds
Large files (> 100 pages): 2-5 minutes
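If you know a document's page count, these ranges can help choose a sensible polling timeout. The sketch below simply encodes the three buckets above; the figures are typical values from this page, not guarantees, and `estimated_processing_seconds` is a hypothetical helper.

```python
def estimated_processing_seconds(page_count: int) -> tuple[int, int]:
    """Return the typical (low, high) processing time in seconds for a file.

    Buckets mirror the list above: < 10 pages, 10-100 pages, > 100 pages.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count < 10:
        return (10, 30)
    if page_count <= 100:
        return (30, 120)
    return (120, 300)  # 2-5 minutes


low, high = estimated_processing_seconds(250)
print(f"Expect roughly {low}-{high} s")  # Expect roughly 120-300 s
```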

Monitor Processing Status

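import time

folder_id = "folder_123"

# Upload file
file = hippo.upload_file(folder_id, "large-document.pdf")

# Poll until processing finishes, successfully or not
while file.status not in ("completed", "failed"):
    time.sleep(5)
    file = hippo.get_file(file_id=file.id)
    print(f"Status: {file.status}")

if file.status == "completed":
    print("File ready for Q&A!")
else:
    print("Processing failed")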
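The loop above waits indefinitely if a file never leaves `processing`. A variant with a deadline is sketched below. `wait_for_status` is a hypothetical helper that takes any zero-argument function returning the current status string, so with the SDK you might pass `lambda: hippo.get_file(file_id=file.id).status`. The 600-second default (twice the upper end of the large-file estimate) is an arbitrary choice, and `sleep` is a parameter only so the example can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until it returns "completed" or "failed".

    Raises TimeoutError if neither status is reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        sleep(interval)


# Simulated status sequence in place of real get_file() calls.
statuses = iter(["uploading", "processing", "completed"])
print(wait_for_status(lambda: next(statuses), sleep=lambda s: None))  # completed
```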

Best Practices

Verify File Quality

Before uploading:

Ensure PDFs are text-based rather than scanned images
Check that documents aren’t password-protected
Verify the file isn’t corrupted

Tip: Scanned PDFs still work because they are processed with OCR, but they may have lower accuracy. Use text-based PDFs when possible.
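Two of these checks can be approximated without any PDF library. The sketch below is a byte-level heuristic, not a PDF parser: it looks for the `%PDF-` header a well-formed PDF starts with, and for an `/Encrypt` entry, which marks an encrypted (possibly password-protected) document. It cannot tell scanned PDFs from text-based ones, and it reads the whole file into memory. `pdf_warnings` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def pdf_warnings(path: str) -> list[str]:
    """Cheap byte-level heuristics for common PDF problems (not a PDF parser)."""
    data = Path(path).read_bytes()
    warnings = []
    # A well-formed PDF begins with a header such as b"%PDF-1.7".
    if not data.startswith(b"%PDF-"):
        warnings.append("missing %PDF- header (corrupted or not a PDF)")
    # An /Encrypt entry in the trailer indicates an encrypted document.
    if b"/Encrypt" in data:
        warnings.append("encrypted (may be password-protected)")
    return warnings


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4\ntrailer << /Encrypt 5 0 R >>\n")
    print(pdf_warnings(str(sample)))  # ['encrypted (may be password-protected)']
```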

Optimize File Size

Reduce processing time:

Remove unnecessary pages (covers, blanks, ads)
Compress images in PDFs
Split very large documents (500+ pages)

Smaller, focused documents process faster and tend to produce better search results.
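The split itself needs a PDF tool (not shown), but planning it is simple arithmetic. `page_ranges` below is a hypothetical helper that divides a document into consecutive, inclusive page ranges; the 250-page default is an arbitrary choice that keeps each part well under the 500-page mark mentioned above.

```python
def page_ranges(page_count: int, max_pages: int = 250) -> list[tuple[int, int]]:
    """Split pages 1..page_count into consecutive inclusive ranges of <= max_pages."""
    if page_count < 1 or max_pages < 1:
        raise ValueError("page_count and max_pages must be positive")
    return [
        (start, min(start + max_pages - 1, page_count))
        for start in range(1, page_count + 1, max_pages)
    ]


print(page_ranges(600))  # [(1, 250), (251, 500), (501, 600)]
```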

Upload Related Documents Together

# Good: Upload product docs together
folder = hippo.create_folder("Product V2 Docs")

docs = ["overview.pdf", "features.pdf", "api.pdf"]
for doc in docs:
    hippo.upload_file(folder.id, doc)

Related documents in the same folder enable better cross-document Q&A.

Use Descriptive Filenames

Good: product-api-authentication-guide.pdf
Bad: doc1.pdf, untitled.pdf

Descriptive names help with source citations and debugging.
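If your documents have human-readable titles, a filename in the style of the good example can be generated from them. `descriptive_filename` is a hypothetical helper: it strips accents, lowercases the title, and joins words with hyphens, refusing titles that leave nothing usable rather than falling back to a generic name.

```python
import re
import unicodedata


def descriptive_filename(title: str, extension: str) -> str:
    """Build a lowercase, hyphen-separated filename from a human-readable title."""
    # Decompose accented characters, then drop anything that isn't ASCII.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if not slug:
        raise ValueError(f"title {title!r} has no usable characters")
    return f"{slug}.{extension.lstrip('.')}"


print(descriptive_filename("Product API: Authentication Guide", "pdf"))
# product-api-authentication-guide.pdf
```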

Use Async for Batch Uploads

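import asyncio
from cerevox import AsyncHippo

# Async uploads run concurrently: often ~10x faster for many files
async def upload_all(folder_id, files):
    async with AsyncHippo(api_key="your-api-key") as hippo:
        tasks = [hippo.upload_file(folder_id, f) for f in files]
        return await asyncio.gather(*tasks)

Because the uploads run concurrently instead of one after another, total time is usually far lower than uploading the same files sequentially.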
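Starting every upload at once can be a lot of simultaneous requests for a large directory. A common pattern is to cap concurrency with `asyncio.Semaphore`; the sketch below does that. `gather_limited` is a hypothetical helper, and the default limit of 5 is arbitrary, since this page does not document any rate limits. With the SDK, the worker could be `lambda p: hippo.upload_file(folder_id, p)`; here a fake upload that records peak concurrency stands in.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def gather_limited(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[Any]],
    limit: int = 5,
) -> list[Any]:
    """Run `worker` on every item with at most `limit` calls in flight.

    Results come back in the same order as `items`, like asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: str) -> Any:
        async with semaphore:  # waits here while `limit` calls are running
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))


async def demo(limit: int = 3) -> tuple[list[str], int]:
    """Exercise gather_limited with a fake upload that records peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_upload(path: str) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for network time
        in_flight -= 1
        return f"uploaded:{path}"

    paths = [f"doc{i}.pdf" for i in range(10)]
    return await gather_limited(paths, fake_upload, limit=limit), peak


results, peak = asyncio.run(demo())
print(results[0], peak)
```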

Complete Example: Batch Upload

import asyncio
from pathlib import Path
from cerevox import AsyncHippo

async def batch_upload_directory(folder_id, directory_path):
    """Upload all PDFs from a directory"""
    async with AsyncHippo(api_key="your-api-key") as hippo:
        # Get all PDF files
        pdf_files = list(Path(directory_path).glob("*.pdf"))

        print(f"Found {len(pdf_files)} PDF files")

        # Upload concurrently
        tasks = [
            hippo.upload_file(folder_id, str(pdf))
            for pdf in pdf_files
        ]

        files = await asyncio.gather(*tasks)

        # Report results
        print(f"\n✅ Uploaded {len(files)} files:")
        for file in files:
            print(f"  - {file.name} ({file.status})")

        return files

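# Usage: create the target folder, then upload into it
async def main():
    async with AsyncHippo(api_key="your-api-key") as hippo:
        folder = await hippo.create_folder("Uploaded Docs")
    files = await batch_upload_directory(folder.id, "./documents")

asyncio.run(main())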
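By default, `asyncio.gather` raises the first exception any upload produces, so one bad file makes `batch_upload_directory` raise and the caller never receives the files that did upload. Passing `return_exceptions=True` puts exceptions into the result list instead. The sketch below separates successes from failures that way; `upload_all_report_failures` is a hypothetical helper, and a fake upload coroutine stands in for `hippo.upload_file`.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def upload_all_report_failures(
    paths: list[str],
    upload: Callable[[str], Awaitable[Any]],
) -> tuple[list[Any], dict[str, BaseException]]:
    """Upload concurrently; return (successes, {path: error}) instead of raising."""
    # With return_exceptions=True, each failed upload appears as its exception object.
    results = await asyncio.gather(*(upload(p) for p in paths), return_exceptions=True)
    successes = [r for r in results if not isinstance(r, BaseException)]
    failures = {p: r for p, r in zip(paths, results) if isinstance(r, BaseException)}
    return successes, failures


async def fake_upload(path: str) -> str:
    """Stand-in for the real SDK call so this example runs on its own."""
    await asyncio.sleep(0)
    if path.endswith(".exe"):
        raise ValueError("unsupported format")
    return f"uploaded:{path}"


ok, failed = asyncio.run(upload_all_report_failures(["a.pdf", "b.exe", "c.md"], fake_upload))
print(ok)      # ['uploaded:a.pdf', 'uploaded:c.md']
print(failed)  # {'b.exe': ValueError('unsupported format')}
```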

Error Handling

from cerevox import Hippo, HippoError

hippo = Hippo(api_key="your-api-key")
folder_id = "folder_123"

try:
    file = hippo.upload_file(folder_id, "document.pdf")
    print(f"Uploaded: {file.name}")

except FileNotFoundError:
    print("Error: File not found")
except HippoError as e:
    # Classify the error by its message text
    message = str(e).lower()
    if "unsupported format" in message:
        print("Error: File format not supported")
    elif "too large" in message:
        print("Error: File exceeds size limit")
    else:
        print(f"Error: {e}")
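Matching on the error message handles permanent failures such as unsupported formats. Transient failures (dropped connections, timeouts) are often worth retrying instead. The sketch below retries with exponential backoff. `with_retries` is hypothetical, and this page does not say which exceptions the SDK raises for transient failures, so the default `retry_on` uses Python's built-in `ConnectionError` and `TimeoutError` as placeholders; substitute whatever your client actually raises. `sleep` is a parameter only so the example can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying listed errors with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Errors not in `retry_on` propagate immediately.
    """
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


calls = {"n": 0}
delays: list[float] = []


def flaky_upload() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network failure")
    return "uploaded"


print(with_retries(flaky_upload, sleep=delays.append))  # uploaded
print(delays)                                           # [1.0, 2.0]
```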

Next Steps

Chat Sessions: Create chats to ask questions
Q&A System: Ask questions over uploaded files
Best Practices: Optimize file preparation
