
Parse Your First Document in 3 Minutes ⚡

This quickstart gets you from zero to parsing documents with Lexa. By the end, you’ll have made your first successful API call.
Requirements: Python 3.9+ • 3 minutes of your time

Step 1: Installation (30 seconds)

pip install cerevox
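Before moving on, you can confirm that your interpreter meets the Python 3.9+ requirement and that the package installed. Below is a minimal standard-library sketch. It assumes the installed distribution is named cerevox, matching the pip command above. The helper names python_meets_minimum and installed_version are hypothetical and are not part of the SDK.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def python_meets_minimum(minimum: Tuple[int, int] = (3, 9)) -> bool:
    """True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    py_status = "OK" if python_meets_minimum() else "needs 3.9+"
    print(f"Python {sys.version.split()[0]}: {py_status}")
    version = installed_version("cerevox")
    print(f"cerevox: {version or 'not installed; run pip install cerevox'}")
```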

Step 2: Get API Key (30 seconds)

1. Get your API key
   Visit cerevox.ai/lexa and sign up for your free API key.

2. Copy the key
   Save your API key; you’ll need it in the next step.

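Step 3: Authentication Setup (30 seconds)

Store your API key in the CEREVOX_API_KEY environment variable so the client can pick it up automatically. In a macOS or Linux shell, for example: export CEREVOX_API_KEY="your-api-key". Alternatively, pass the key directly when creating the client: Lexa(api_key="your-api-key").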
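Lexa() reads the key from the CEREVOX_API_KEY environment variable. A key that was copied by hand often picks up stray spaces or quotes, which is one of the authentication pitfalls listed under Having Issues?. The sketch below is a standard-library helper that fails early with a clear message. load_api_key is a hypothetical name and is not part of the cerevox package.

```python
import os


def load_api_key(env_var: str = "CEREVOX_API_KEY") -> str:
    """Read the API key from the environment, dropping stray whitespace and wrapping quotes."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    # Strip whitespace, then one layer of matching quotes, then whitespace again
    key = raw.strip()
    if len(key) >= 2 and key[0] == key[-1] and key[0] in "\"'":
        key = key[1:-1].strip()
    if not key:
        raise RuntimeError(f"{env_var} is set but empty")
    return key
```

You could then create the client with Lexa(api_key=load_api_key()). This way a missing or malformed key surfaces before any request is sent.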

Step 4: First API Call (60 seconds)

Copy and run this code to parse your first document:
from cerevox import Lexa

# Initialize client (uses CEREVOX_API_KEY from environment)
client = Lexa()

# Parse a local file - replace with your file path
documents = client.parse(["path/to/your/document.pdf"])

# See what you got back
doc = documents[0]
print(f"✅ Success! Extracted {len(doc.content)} characters")
print(f"📊 Found {len(doc.tables)} tables")
print(f"📄 Content preview: {doc.content[:200]}...")

# That's it! Your document is now structured data
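Because parse() takes a list of paths, you can build that list from a folder rather than typing each path yourself. The sketch below uses pathlib. collect_paths is a hypothetical helper, and the glob patterns are only examples.

```python
from pathlib import Path
from typing import Iterable, List


def collect_paths(folder: str, patterns: Iterable[str] = ("*.pdf",)) -> List[str]:
    """Return sorted, de-duplicated paths of regular files in `folder` matching any pattern."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(folder)
    # A set removes duplicates when one file matches several patterns
    found = {str(p) for pattern in patterns for p in root.glob(pattern) if p.is_file()}
    return sorted(found)
```

Usage might look like documents = client.parse(collect_paths("path/to/your/folder", ("*.pdf", "*.docx"))). Plain patterns only match the top level of the folder. To descend into subfolders, use a recursive pattern such as "**/*.pdf".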

Step 5: Test Your Setup (30 seconds)

Run this verification script to confirm everything works:
from cerevox import Lexa, LexaError

def verify_setup():
    try:
        client = Lexa()
        
        # Quick test with sample content
        test_content = b"Test document for Lexa API verification."
        documents = client.parse(test_content)
        
        # An empty result means the API answered but extracted nothing
        if documents:
            print("🎉 Perfect! Lexa is working correctly.")
            print(f"📄 Test result: {documents[0].content}")
            return True

        print("⚠️ Parse returned no documents")
        return False

    except LexaError as e:
        print(f"❌ API Error: {e}")
        print("💡 Check your API key and try again")
        return False
    except Exception as e:
        print(f"❌ Error: {e}")
        return False

# Run verification
if verify_setup():
    print("\n✅ You're ready to parse documents with Lexa!")

You’re Ready! 🎉 Lexa is configured and working. Start parsing your documents!

What’s Next?

Common Next Steps

from cerevox import Lexa

client = Lexa()

# Parse from Amazon S3
s3_documents = client.parse_s3_folder(
    bucket_name="my-documents",
    folder_path="invoices/"
)

# Parse from SharePoint
sharepoint_documents = client.parse_sharepoint_folder(
    site_id="your-site-id",
    drive_id="your-drive-id",
    folder_path="Documents"
)

total = len(s3_documents) + len(sharepoint_documents)
print(f"✅ Processed {total} documents from cloud storage")

# Parse and get RAG-optimized chunks
documents = client.parse(["document.pdf"])

chunks = documents.get_all_text_chunks(
    target_size=500,        # Target chunk size; a common starting point for embeddings
    overlap_size=50,        # Text shared between adjacent chunks to preserve context
    include_metadata=True   # Attach source metadata to each chunk
)

# Each chunk is ready for your vector database
for chunk in chunks[:3]:
    print(f"Chunk: {chunk.content[:100]}...")
    print(f"Metadata: {chunk.metadata}")
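To build intuition for what target_size and overlap_size control, here is a simplified sliding window that works on characters. It is an illustration only and not how get_all_text_chunks is implemented: the SDK may split on sentence or section boundaries and may measure size differently. sliding_window_chunks is a hypothetical name.

```python
from typing import List


def sliding_window_chunks(text: str, target_size: int = 500, overlap_size: int = 50) -> List[str]:
    """Split text into windows of target_size characters, each overlapping the previous one."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if not 0 <= overlap_size < target_size:
        raise ValueError("overlap_size must be in [0, target_size)")
    step = target_size - overlap_size  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + target_size])
        if start + target_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Every window after the first starts with the last overlap_size characters of the previous one. Because of this, a sentence that straddles a boundary still appears intact in at least one chunk, provided it is no longer than the overlap.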

Having Issues?

Error: Authentication failed or Invalid API key
Quick fixes:
  • Double-check your API key from cerevox.ai/lexa
  • Verify CEREVOX_API_KEY environment variable is set correctly
  • Remove any extra spaces or quotes around the key
  • Try passing the key directly: Lexa(api_key="your-key")
Error: Connection timeout or Network error
Quick fixes:
  • Check internet connection
  • Increase timeout: client.parse(files, timeout=120.0)
  • Verify firewall allows HTTPS connections to data.cerevox.ai
  • Test with smaller files first
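When timeouts are intermittent rather than constant, retrying with a growing pause can help. Below is a generic sketch. with_retries is a hypothetical helper, and it is an assumption which exception types the cerevox client raises on network failure. If the client wraps network errors in its own exception type, add that type to retry_on, but avoid retrying authentication errors.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            # Wait base_delay, then 2x, 4x, ... between successive attempts
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Usage might look like documents = with_retries(lambda: client.parse(files, timeout=120.0)). The sleep parameter exists so the backoff can be tested without actually waiting.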
Error: File not found or parsing errors
Quick fixes:
  • Verify file path is correct: os.path.exists("your-file.pdf")
  • Check file permissions (must be readable)
  • Supported formats: PDF, DOCX, XLSX, PPTX, HTML, CSV, TXT, and more
  • Try with the test content example above first
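The file checks above can be bundled into one pre-flight step that runs before calling parse(). Below is a minimal standard-library sketch. preflight_check is a hypothetical helper. The format list above ends with "and more", so an unrecognized extension is reported as a warning rather than treated as a hard failure.

```python
import os
from pathlib import Path
from typing import List

# Extensions named in the supported-formats list above; that list ends with "and more",
# so anything outside this set is only flagged, not rejected.
LISTED_FORMATS = {".pdf", ".docx", ".xlsx", ".pptx", ".html", ".csv", ".txt"}


def preflight_check(path: str) -> List[str]:
    """Return human-readable problems with `path`; an empty list means no obvious issue."""
    p = Path(path)
    if not p.exists():
        return [f"{path}: file not found"]
    if not p.is_file():
        return [f"{path}: not a regular file"]
    problems = []
    if not os.access(p, os.R_OK):
        problems.append(f"{path}: not readable (check file permissions)")
    if p.stat().st_size == 0:
        problems.append(f"{path}: file is empty")
    if p.suffix.lower() not in LISTED_FORMATS:
        ext = p.suffix or "(no extension)"
        problems.append(f"{path}: extension {ext} is not in the listed formats; it may still be supported")
    return problems
```

For example, run problems = preflight_check("your-file.pdf") and only call client.parse(["your-file.pdf"]) when problems is empty. Otherwise print the messages to see what to fix.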

Ready to build? Check out our code examples or join the Discord community for help.