FAQ

What makes Lexa different?

How quickly can I integrate?

You can start parsing documents in under 5 minutes:

pip install cerevox

from cerevox import Lexa

client = Lexa(api_key="your-api-key")
documents = client.parse(["document.pdf"])

# Get vector DB ready chunks
chunks = documents.get_all_text_chunks(target_size=500)

Our Python SDK handles authentication, retries, and error handling automatically. We also provide comprehensive examples for Django, Flask, FastAPI, and async applications.
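
To make "retries" concrete, here is a minimal sketch of retry with exponential backoff. It is not the SDK's internal implementation, which is not shown here; `TransientError`, `call_with_retries`, and `flaky_parse` are hypothetical names used only for illustration.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout (hypothetical)."""


def call_with_retries(func, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles after each failed attempt: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky operation: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_parse():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return "parsed"


delays = []  # record the requested delays instead of actually sleeping
result = call_with_retries(flaky_parse, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in application code the default `time.sleep` would apply.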

What formats does Lexa support?

How does async processing work?

Lexa provides native async support with the AsyncLexa client:

import asyncio
from cerevox import AsyncLexa

async def main():
    async with AsyncLexa(api_key="your-api-key") as client:
        files = ["report1.pdf", "report2.docx", "data.xlsx"]

        # Process multiple documents concurrently
        documents = await client.parse(files)

        # Batch process with progress tracking
        async for status in client.parse_with_progress(files):
            print(f"Progress: {status.progress}")

asyncio.run(main())

This enables concurrent processing of multiple documents, significantly improving throughput for batch operations.
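
If you schedule your own concurrent work around the client, a common pattern is to cap the number of tasks in flight. The sketch below shows that pattern with the standard library only; `fake_parse` is a hypothetical stand-in for a network-bound call and is not part of the Lexa API.

```python
import asyncio


async def fake_parse(path):
    """Stand-in for a network-bound parse call (hypothetical, not a Lexa API)."""
    await asyncio.sleep(0.01)
    return f"parsed:{path}"


async def parse_all(paths, max_concurrency=2):
    """Run fake_parse over all paths with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(path):
        async with semaphore:
            return await fake_parse(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(bounded(p) for p in paths))


results = asyncio.run(parse_all(["report1.pdf", "report2.docx", "data.xlsx"]))
print(results)
```

The semaphore bounds resource use on the caller's side; whether the service applies its own limits is not covered here.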

How do I optimize for RAG applications?

Lexa is designed specifically for RAG workflows with built-in vector database optimization:

# Get optimally sized chunks for embeddings
chunks = documents.get_all_text_chunks(
    target_size=500,  # Optimal for most embedding models
    overlap=50,       # Maintain context between chunks
    include_metadata=True  # Rich metadata for filtering
)

# Direct integration with vector databases
# (vector_db and embed are placeholders for your own client and embedding function)
for chunk in chunks:
    # Each chunk includes source document, page numbers,
    # element types, and confidence scores
    vector_db.upsert(
        id=chunk.id,
        vector=embed(chunk.content),
        metadata=chunk.metadata
    )

We provide pre-built integration examples for Pinecone, Weaviate, ChromaDB, and Qdrant.
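
To illustrate how `target_size` and `overlap` interact, here is a minimal sliding-window sketch. It assumes sizes are measured in characters and is not Lexa's chunking algorithm; `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, target_size=500, overlap=50):
    """Split text into windows of target_size characters sharing overlap characters."""
    if not 0 <= overlap < target_size:
        raise ValueError("overlap must be non-negative and smaller than target_size")
    step = target_size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + target_size])
        if start + target_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of dummy text
chunks = chunk_text(sample, target_size=500, overlap=50)
print([len(c) for c in chunks])
```

Each chunk repeats the last 50 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.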

What cloud storage integrations are available?

Lexa integrates with 7+ major cloud storage platforms:

Amazon S3: Direct parsing from S3 buckets and folders
Microsoft SharePoint: Sites, drives, and document libraries
Google Drive: Files and folders with permission management
Box: Enterprise file storage with advanced metadata
Dropbox: Personal and business accounts
Salesforce: Document attachments and files
Coming Soon: Azure Blob, OneDrive, Notion

# Parse entire S3 folder
documents = client.parse_s3_folder(
    bucket="my-bucket",
    folder="documents/",
    recursive=True
)
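
S3 has no real directories; a "folder" is a key prefix. The sketch below shows one plausible reading of `recursive` as a filter over object keys. It is an assumption about the semantics, not the documented behavior of `parse_s3_folder`, and `select_keys` is a hypothetical helper.

```python
def select_keys(keys, folder, recursive=True):
    """Return object keys under folder; skip nested prefixes unless recursive."""
    prefix = folder if folder.endswith("/") else folder + "/"
    selected = []
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue  # outside the folder, or the folder placeholder itself
        remainder = key[len(prefix):]
        # A "/" in the remainder means the object sits in a nested prefix
        if recursive or "/" not in remainder:
            selected.append(key)
    return selected


keys = [
    "documents/a.pdf",
    "documents/2024/b.pdf",
    "images/c.png",
]
flat = select_keys(keys, "documents/", recursive=False)
deep = select_keys(keys, "documents/", recursive=True)
print(flat)
print(deep)
```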

What are Lexa's performance benchmarks?

How does Lexa handle large-scale document processing?

Lexa is built for enterprise scale with several optimization strategies:

Horizontal Scaling: Our API automatically scales to handle spikes in demand
Batch Processing: Process up to 100 documents per API call
Async Processing: Non-blocking operations with progress callbacks
Caching: Intelligent caching reduces processing time for similar documents
Load Balancing: Global infrastructure ensures low latency worldwide

# Batch processing example
# (await requires an async context, e.g. inside async def main() with an AsyncLexa client)
large_batch = ["doc1.pdf", "doc2.docx", ...] # 100+ files
documents = await client.parse(
    large_batch,
    progress_callback=lambda status: print(f"Progress: {status.completed}/{status.total}")
)
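
Given the limit of 100 documents per API call listed above, a larger file list can be split on the caller's side before submission. This is a minimal sketch; `batched` is a hypothetical helper, and whether the SDK already splits oversized lists for you is not stated here.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


files = [f"doc{i}.pdf" for i in range(250)]  # hypothetical file names
batches = list(batched(files))
print([len(b) for b in batches])
```

Each batch can then be passed to a separate parse call, either sequentially or concurrently.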

What are the API rate limits and pricing?

How does table extraction work in complex documents?

Lexa uses advanced computer vision and ML models to extract tables with high fidelity:

documents = client.parse("financial_report.pdf")

# Access extracted tables
for doc in documents:
    for table in doc.tables:
        print(f"Table on page {table.page_number}")
        print(f"Dimensions: {table.rows}x{table.columns}")
        
        # Export to pandas DataFrame
        df = table.to_pandas()
        
        # Or get raw structured data
        data = table.to_dict()

Features include:

Structure preservation: Maintains cell relationships and formatting
Multi-page tables: Automatically combines split tables
Header detection: Identifies and preserves table headers
Data type inference: Automatically detects numbers, dates, etc.
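
As a rough illustration of what data type inference means for table cells, the sketch below converts raw cell strings to numbers or ISO dates when they parse cleanly. It is a deliberately crude stand-in, not Lexa's inference logic; `infer_cell` and the sample rows are hypothetical.

```python
from datetime import date


def infer_cell(value):
    """Convert a raw cell string to int, float, or date when it parses cleanly."""
    text = value.strip()
    number = text.replace(",", "")  # crude: tolerate thousands separators like 1,250
    attempts = ((int, number), (float, number), (date.fromisoformat, text))
    for convert, candidate in attempts:
        try:
            return convert(candidate)
        except ValueError:
            continue  # try the next, less specific type
    return text  # leave anything unrecognized as a string


rows = [
    ["Quarter", "Revenue", "Closed"],
    ["Q1", "1,250", "2024-03-31"],
    ["Q2", "1310.5", "2024-06-30"],
]
typed = [[infer_cell(cell) for cell in row] for row in rows[1:]]  # skip the header row
print(typed)
```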

Can I customize the parsing behavior for my use case?

Lexa offers several processing modes and customization options:

from cerevox import ProcessingMode

# Standard processing (fastest)
docs = client.parse("document.pdf", mode=ProcessingMode.DEFAULT)

# Advanced processing (highest accuracy)
docs = client.parse("document.pdf", mode=ProcessingMode.ADVANCED)

# Custom chunking parameters
chunks = docs.get_text_chunks(
    target_size=1000,      # Larger chunks for long-form content
    tolerance=0.2,         # 20% size variance allowed
    respect_boundaries=True, # Don't break sentences/paragraphs
    include_tables=True    # Include table content in chunks
)
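
To show how `tolerance` and boundary-respecting chunking can fit together, here is a minimal sketch that packs whole sentences into chunks. It assumes sizes are in characters and that tolerance lets a chunk grow up to that fraction above `target_size`; both are assumptions, and `chunk_sentences` is a hypothetical helper rather than Lexa's algorithm.

```python
import re


def chunk_sentences(text, target_size=1000, tolerance=0.2):
    """Group whole sentences into chunks of up to target_size * (1 + tolerance) characters."""
    limit = round(target_size * (1 + tolerance))
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > limit:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence      # an over-long single sentence is kept whole
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "First sentence here. Second one follows. Third is short. Fourth ends it."
chunks = chunk_sentences(text, target_size=40, tolerance=0.2)
print(chunks)
```

With `target_size=40` and `tolerance=0.2` the limit is 48 characters, so the first two sentences (40 characters combined) share a chunk and the third starts a new one.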

Contact our team for specialized processing modes for specific document types or industries.

What security and compliance features does Lexa provide?

What support options are available?

How can I stay updated on new features and improvements?
