Client Initialization

Synchronous Client

from cerevox import Lexa

# Initialize with API key
client = Lexa(api_key="your-api-key")

# Parse documents
documents = client.parse(["document.pdf"])

Asynchronous Client

import asyncio
from cerevox import AsyncLexa

async def main():
    async with AsyncLexa(api_key="your-api-key") as client:
        documents = await client.parse(["document.pdf"])
        return documents

asyncio.run(main())

Client Configuration

Parameters

api_key (string, required)
  Your Cerevox API key. Get one at cerevox.ai/lexa.

base_url (string, default: "https://data.cerevox.ai")
  Base URL for the Cerevox API endpoint.

timeout (float, default: 60.0)
  Request timeout in seconds for API calls.

max_retries (int, default: 3)
  Maximum number of retry attempts for failed requests.

retry_delay (float, default: 1.0)
  Delay in seconds between retry attempts.

Environment Variables

# Required
export CEREVOX_API_KEY="your-api-key"

# Optional overrides
export CEREVOX_BASE_URL="https://data.cerevox.ai"
export CEREVOX_TIMEOUT="120"
export CEREVOX_MAX_RETRIES="5"
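The variables above can be wired into the client constructor explicitly. Whether the SDK picks them up automatically isn't stated here, so this sketch reads them by hand, falling back to the documented defaults:

```python
import os

def config_from_env() -> dict:
    """Build Lexa constructor kwargs from environment variables,
    falling back to the documented defaults when an override is unset."""
    return {
        "api_key": os.environ["CEREVOX_API_KEY"],  # required; KeyError if unset
        "base_url": os.environ.get("CEREVOX_BASE_URL", "https://data.cerevox.ai"),
        "timeout": float(os.environ.get("CEREVOX_TIMEOUT", "60.0")),
        "max_retries": int(os.environ.get("CEREVOX_MAX_RETRIES", "3")),
    }

# client = Lexa(**config_from_env())
```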

Client Methods

Core Parsing Methods

parse()
Parse local files or file-like objects.

documents = client.parse(
    files=["document.pdf", "report.docx"],
    mode=ProcessingMode.DEFAULT,
    progress_callback=None,
    timeout=60.0,
    poll_interval=2.0
)

parse_urls()
Parse files from URLs.

documents = client.parse_urls(
    urls=["https://example.com/document.pdf"],
    mode=ProcessingMode.DEFAULT,
    progress_callback=None,
    timeout=120.0,
    poll_interval=2.0
)

get_job_status()
Get the current status of a parsing job.

status = client.get_job_status(job_id="job_123")
print(f"Status: {status.status}")
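If you manage jobs yourself rather than letting parse() block, get_job_status can be combined with a simple polling loop. A sketch with an injectable status getter — the terminal-state names here are illustrative assumptions, not SDK constants, so check the status.status values your jobs actually report:

```python
import time

def wait_for_completion(get_status, poll_interval=2.0, timeout=60.0,
                        terminal=("complete", "failed")):
    """Poll until the job reaches a terminal state or the timeout expires.
    `get_status` stands in for something like
    lambda: client.get_job_status(job_id).status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status()
        if state in terminal:
            return state
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state in time")
```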

Cloud Storage Methods

# List S3 buckets
buckets = client.list_s3_buckets()

# List S3 folder contents
contents = client.list_s3_folder("bucket-name", "folder-path/")

# Parse S3 folder
documents = client.parse_s3_folder(
    bucket="bucket-name",
    folder_path="documents/",
    mode=ProcessingMode.DEFAULT
)

# List SharePoint sites
sites = client.list_sharepoint_sites()

# List drives in a site
drives = client.list_sharepoint_drives("site-id")

# Parse SharePoint folder
documents = client.parse_sharepoint_folder(
    site_id="site-id",
    drive_id="drive-id",
    folder_path="Documents/",
    mode=ProcessingMode.DEFAULT
)

# List Box folders
folders = client.list_box_folders(parent_folder_id="0")

# Parse Box folder
documents = client.parse_box_folder(
    folder_id="123456789",
    mode=ProcessingMode.DEFAULT
)

Processing Modes

Choose the right processing mode for your use case:

DEFAULT

Fast and efficient
  • Optimized for speed
  • Good accuracy
  • Lower resource usage
  • Recommended for most use cases

ADVANCED

Maximum accuracy
  • Highest accuracy
  • Enhanced table extraction
  • More thorough analysis
  • Best for complex documents

from cerevox import Lexa, ProcessingMode

client = Lexa(api_key="your-api-key")

# Default mode (recommended) - fast and efficient
documents = client.parse(["document.pdf"], mode=ProcessingMode.DEFAULT)

# Advanced mode for maximum accuracy
documents = client.parse(["document.pdf"], mode=ProcessingMode.ADVANCED)
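Because the mode is just a per-call argument, the choice can be centralized in a small helper. The extension-based heuristic below is purely illustrative (not SDK guidance), and it returns mode names as strings so the sketch stays self-contained — map the result to ProcessingMode at the call site:

```python
def pick_mode(path: str, complex_suffixes=(".xlsx", ".pptx")) -> str:
    """Return "ADVANCED" for files whose extension suggests heavy tables or
    layout, else "DEFAULT". The suffix list is an illustrative assumption;
    tune it to your own documents."""
    return "ADVANCED" if path.lower().endswith(complex_suffixes) else "DEFAULT"

# mode = ProcessingMode[pick_mode("report.xlsx")]
```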

Error Handling

The Lexa client provides comprehensive error handling:

from cerevox import Lexa, LexaError

client = Lexa(api_key="your-api-key")

try:
    documents = client.parse(["document.pdf"])
    print(f"Successfully parsed {len(documents)} documents")
except LexaError as e:
    print(f"Lexa API error: {e.message}")
    print(f"Error code: {e.error_code}")
except Exception as e:
    print(f"Unexpected error: {e}")
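The client already retries internally according to max_retries, but code around it (staging files, post-processing results) may want the same policy. A minimal, SDK-independent sketch with a fixed delay between attempts:

```python
import time

def call_with_retries(fn, max_retries=3, retry_delay=1.0, retryable=(TimeoutError,)):
    """Call fn(), retrying on the given exception types up to max_retries
    times with a fixed delay between attempts; re-raise once exhausted.
    The retryable exception types are an assumption — adjust to the
    errors your surrounding code can actually recover from."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise
            time.sleep(retry_delay)
```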

Best Practices

Use the async client for better performance when processing multiple files:

import asyncio
from cerevox import AsyncLexa

async def process_documents(file_paths):
    async with AsyncLexa(api_key="your-api-key") as client:
        # Process multiple batches concurrently
        tasks = []
        batch_size = 10
        
        for i in range(0, len(file_paths), batch_size):
            batch = file_paths[i:i + batch_size]
            task = client.parse(batch)
            tasks.append(task)
        
        results = await asyncio.gather(*tasks)
        return [doc for batch in results for doc in batch]

Use progress callbacks for long-running operations:

def progress_callback(status):
    print(f"Status: {status.status}")
    if hasattr(status, 'progress') and status.progress:
        print(f"Progress: {status.progress}")

documents = client.parse(
    ["large-document.pdf"],
    progress_callback=progress_callback,
    timeout=300.0  # 5 minutes for large files
)

Properly manage client resources:

# ✅ Good: Use context manager for async
async with AsyncLexa(api_key="key") as client:
    documents = await client.parse(["file.pdf"])

# ✅ Good: Reuse sync client
client = Lexa(api_key="key")
for file_batch in file_batches:
    documents = client.parse(file_batch)
    process_documents(documents)

# ❌ Avoid: Creating new clients repeatedly
for file in files:
    client = Lexa(api_key="key")  # Wasteful
    documents = client.parse([file])

Next Steps

Ready to start parsing? Check out our quickstart guide or explore real-world examples.