Uploading Documents
Upload, process, and search documents with AI-powered text extraction and semantic search
RAG-Ready Documents
Documents are automatically processed through a three-stage pipeline: text extraction (PDF/DOCX parsing) → chunking (splitting into searchable segments) → embedding (generating vectors for semantic search). The resulting vectors are what make semantic search and retrieval-augmented generation (RAG) in chat possible.
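To make the chunking stage concrete, here is a minimal sketch of one common approach: fixed-size character windows with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical, and the service's actual chunking strategy is not described in this page, so treat this as an illustration of the idea rather than the pipeline's implementation.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so we don't emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval quality.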
Supported Formats
.pdf, .docx, .txt, and .md. Maximum file size is 50MB per document.
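A local pre-flight check can catch unsupported files before any network call is made. The sketch below uses only the standard library; `check_uploadable` is a hypothetical helper, not part of the SDK, and it assumes "50MB" means 50 × 1024 × 1024 bytes (the service may instead count 50 × 10^6).

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
# Assumption: 50MB interpreted as 50 MiB; adjust if the service uses decimal megabytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_uploadable(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds 50MB limit")
    return problems


if __name__ == "__main__":
    print(check_uploadable("report.pdf"))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a file at once.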
Quick Start
Upload a Document
    async with AionVision(api_key="aion_...") as client:
        result = await client.documents.upload_one("report.pdf")

        print(f"Document ID: {result.document_id}")
        print(f"Status: {result.text_extraction_status}")
        print(f"Pages: {result.page_count}")
        print(f"Chunks: {result.chunk_count}")

Upload Options (SDK)
    result = await client.documents.upload_one(
        "report.pdf",
        wait_for_processing=False,   # Don't wait for extraction (returns immediately)
        processing_timeout=120.0,    # Custom timeout for processing polling
        storage_target="default",    # "default" or "custom" (BYOB)
    )

Batch Upload
Upload Multiple Documents
    results = await client.documents.upload([
        "report1.pdf",
        "report2.docx",
        "notes.txt",
    ])

    print(f"Uploaded {len(results)} documents")
    for result in results:
        print(f"  {result.filename}: {result.text_extraction_status}")

Processing Status
Check Processing Status
    status = await client.documents.get_status(document_id="doc_abc123")

    print(f"Extraction: {status.text_extraction_status}")  # pending | processing | completed | failed
    print(f"Pages: {status.page_count}")
    print(f"Chunks: {status.chunk_count}")
    print(f"Embedding: {status.embedding_status}")

Semantic Search
AI-Powered Search
Search uses semantic similarity: it matches by meaning, not just by keywords. A query for "damaged equipment" will find content about "broken machinery" even if those exact words aren't present.
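As a rough illustration of how similarity-based matching with a threshold can work, the sketch below scores made-up embedding vectors with cosine similarity and keeps those at or above a cutoff. Everything here is an assumption for illustration: the helper names, the three-dimensional toy vectors, and the use of cosine similarity itself, since this page does not say which metric or embedding model the service uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(query_vec, chunks, threshold=0.3):
    """Keep (text, score) pairs scoring at or above threshold, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    kept = [(text, score) for text, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for this example; real embeddings have many more dimensions.
query = [1.0, 0.0, 0.0]  # stands in for "damaged equipment"
chunks = [
    ("broken machinery", [0.9, 0.1, 0.0]),
    ("quarterly revenue", [0.0, 1.0, 0.0]),
]
matches = filter_by_similarity(query, chunks)

if __name__ == "__main__":
    for text, score in matches:
        print(f"{text}: {score:.2f}")
```

A lower threshold returns more, looser matches; a higher one returns fewer, closer matches.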
Search Documents
    results = await client.documents.search(
        query="safety inspection requirements",
        limit=20,
        similarity_threshold=0.3,
    )

    for chunk in results.results:
        print(f"Document: {chunk.document_filename}")
        print(f"Score: {chunk.score:.2f}")
        print(f"Content: {chunk.content[:200]}...")

Search Specific Documents
    results = await client.documents.search(
        query="compliance requirements",
        document_ids=["doc_abc123", "doc_def456"],
        limit=10,
    )

    for chunk in results.results:
        print(f"{chunk.document_filename}: {chunk.content[:100]}...")

Document Management
SDK operations for managing documents
List, Get, and Delete
    # List documents
    documents = await client.documents.list(page=1, page_size=20, status_filter="completed")
    print(f"Total: {documents.total_count}")

    # Get document details
    doc = await client.documents.get(document_id="doc_abc123")
    print(f"Filename: {doc.filename}, Pages: {doc.page_count}")

    # Get extracted text
    text = await client.documents.get_text(document_id="doc_abc123")
    print(f"Text length: {len(text)} characters")

    # Get chunks (for RAG debugging)
    chunks = await client.documents.get_chunks(document_id="doc_abc123")
    print(f"Total chunks: {chunks.total_chunks}")

    # Download original file
    download_url = await client.documents.download(document_id="doc_abc123")

    # Delete document and all chunks
    await client.documents.delete(document_id="doc_abc123")

Documents in Chat (RAG)
Automatic Document Access
The chat system's document search agent has access to all your uploaded documents automatically. There is no need to attach documents explicitly: the AI searches the relevant documents based on your query.
Chat with Documents
    async with AionVision(api_key="aion_...") as client:
        async with client.chat_session() as session:
            response = await session.send(
                "What are the key safety requirements mentioned in the documents?"
            )
            print(response.content)

            response2 = await session.send("Which section covers equipment maintenance?")
            print(response2.content)

Deduplication
Documents are deduplicated by the SHA-256 hash of their content, so it is safe to retry a failed upload.
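To see why identical content maps to the same document regardless of filename, the sketch below computes a SHA-256 digest locally with Python's standard `hashlib`. The helper `sha256_of_file` is hypothetical and not part of the SDK; deduplication itself happens on the service side, and it is an assumption here that the service hashes the raw file bytes.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 1024 * 1024) -> str:
    """Hex SHA-256 digest of a file, read in blocks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(sha256_of_file(name), name)
```

Two files with byte-identical content produce the same digest even if their names differ, while a single changed byte produces a different one.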
    result1 = await client.documents.upload_one("report.pdf")
    result2 = await client.documents.upload_one("report.pdf")
    # Both reference the same document if content is identical

DocumentUploadResult Type
    @dataclass(frozen=True)
    class DocumentUploadResult:
        document_id: str                    # Unique document identifier
        filename: str                       # Original filename
        object_key: str                     # Storage object key
        content_type: str                   # MIME type
        size_bytes: int                     # File size
        text_extraction_status: DocumentProcessingStatus  # PENDING | PROCESSING | COMPLETED | FAILED
        embedding_status: Optional[str]
        page_count: Optional[int]
        chunk_count: Optional[int]
        extracted_text_preview: Optional[str]
        created_at: Optional[datetime]
        processing_error: Optional[str]
        processing_error_type: Optional[str]
        processing_is_retryable: Optional[bool]

Limits & Quotas
- Max file size: 50MB per document
- Batch size: 1-100 documents per batch
- Concurrent batches: 200 active batches per tenant
- Supported formats: PDF, DOCX, TXT, MD
- Chat context: Up to 100 documents per session
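Given the 1-100 documents-per-batch limit above, a large set of files may need to be split before uploading. The sketch below shows one way to do that; `split_into_batches` is a hypothetical helper, and whether `client.documents.upload` already splits or rejects oversized lists is not stated on this page, so this is a defensive client-side step rather than a documented requirement.

```python
MAX_BATCH_SIZE = 100  # from the documented limit: 1-100 documents per batch


def split_into_batches(paths: list[str], batch_size: int = MAX_BATCH_SIZE) -> list[list[str]]:
    """Split paths into consecutive batches of at most batch_size items."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


if __name__ == "__main__":
    files = [f"doc_{n}.pdf" for n in range(250)]
    for batch in split_into_batches(files):
        print(len(batch))
```

Each resulting batch could then be passed to `client.documents.upload(...)` in turn, keeping the number of batches in flight well under the concurrent-batch limit.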