Uploading Documents

Upload, process, and search documents with AI-powered text extraction and semantic search

RAG-Ready Documents

Documents are automatically processed through a pipeline: text extraction (PDF/DOCX parsing) → chunking (splitting into searchable segments) → embedding (generating vectors for semantic search). The resulting chunks and vectors power both semantic search and retrieval-augmented generation (RAG) in chat.
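To make the chunking stage concrete, here is a minimal sketch of splitting extracted text into overlapping windows. The function name, the character-based windows, and the default sizes are assumptions made for illustration; the service's actual chunking strategy and chunk size are not described here.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative only).

    This is NOT the service's chunker; sizes and strategy are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which is a common reason for chunkers to use it.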

Supported Formats

Supported file types are .pdf, .docx, .txt, and .md. The maximum file size is 50MB per document.
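A local pre-flight check can catch unsupported files before any upload is attempted. The sketch below is a hypothetical helper, not part of the SDK, and it assumes that "50MB" means 50 × 1024 × 1024 bytes; the service may count the limit differently.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: 50MB interpreted as 50 MiB


def check_uploadable(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    p = Path(path)
    # Compare extensions case-insensitively so "NOTES.MD" is accepted.
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds 50MB")
    return problems
```

A caller might skip any path for which check_uploadable(path) returns a non-empty list and report the problems to the user.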

Quick Start

Upload a Document

async with AionVision(api_key="aion_...") as client:
    result = await client.documents.upload_one("report.pdf")
    print(f"Document ID: {result.document_id}")
    print(f"Status: {result.text_extraction_status}")
    print(f"Pages: {result.page_count}")
    print(f"Chunks: {result.chunk_count}")

Upload Options (SDK)

result = await client.documents.upload_one(
    "report.pdf",
    wait_for_processing=False,  # Return immediately without waiting for extraction
    processing_timeout=120.0,   # Custom timeout for processing polling
    storage_target="default",   # "default" or "custom" (BYOB)
)

Batch Upload

Upload Multiple Documents

results = await client.documents.upload([
    "report1.pdf",
    "report2.docx",
    "notes.txt",
])
print(f"Uploaded {len(results)} documents")
for result in results:
    print(f"  {result.filename}: {result.text_extraction_status}")

Processing Status

Check Processing Status

status = await client.documents.get_status(document_id="doc_abc123")
print(f"Extraction: {status.text_extraction_status}") # pending | processing | completed | failed
print(f"Pages: {status.page_count}")
print(f"Chunks: {status.chunk_count}")
print(f"Embedding: {status.embedding_status}")
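When a document is uploaded with wait_for_processing=False, the status call above can be polled until extraction finishes. The helper below is a sketch, not an SDK function: wait_until_processed, its parameters, and the status normalization are assumptions. It takes the status coroutine as an argument so it can be exercised without a live client, and it accepts the status either as a plain string or as an enum-like object with a value attribute.

```python
import asyncio
import time

TERMINAL_STATUSES = {"completed", "failed"}


def _normalize(raw) -> str:
    # Accept either a plain string or an enum-like object exposing `.value`.
    return str(getattr(raw, "value", raw)).lower()


async def wait_until_processed(get_status, document_id, timeout=120.0, interval=2.0):
    """Poll get_status(document_id=...) until extraction completes or fails.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = await get_status(document_id=document_id)
        if _normalize(status.text_extraction_status) in TERMINAL_STATUSES:
            return status
        # Give up if the next sleep would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"{document_id} still processing after {timeout}s")
        await asyncio.sleep(interval)
```

With a real client this would be called as: status = await wait_until_processed(client.documents.get_status, "doc_abc123").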

Semantic Search

AI-Powered Search

Search ranks content by semantic similarity, so it matches on meaning rather than exact keywords. A query for "damaged equipment" can find content about "broken machinery" even when those exact words are not present.

Search Documents

results = await client.documents.search(
    query="safety inspection requirements",
    limit=20,
    similarity_threshold=0.3,
)
for chunk in results.results:
    print(f"Document: {chunk.document_filename}")
    print(f"Score: {chunk.score:.2f}")
    print(f"Content: {chunk.content[:200]}...")
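To show what limit and similarity_threshold do conceptually, the sketch below ranks toy vectors by cosine similarity, drops anything under the threshold, and keeps the top results. Cosine similarity, the rank_chunks helper, and the hand-written two-dimensional vectors are all assumptions for illustration; the metric and embedding model the service uses are not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, limit=20, similarity_threshold=0.3):
    """Return up to `limit` (score, text) pairs scoring at least the threshold, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]


# Toy vectors standing in for real embeddings (hypothetical values).
chunks = [
    ("broken machinery", [0.9, 0.1]),
    ("quarterly revenue", [0.0, 1.0]),
]
hits = rank_chunks([1.0, 0.0], chunks)
```

In this toy setup only "broken machinery" survives the threshold, because its vector points in nearly the same direction as the query vector.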

Search Specific Documents

results = await client.documents.search(
    query="compliance requirements",
    document_ids=["doc_abc123", "doc_def456"],
    limit=10,
)
for chunk in results.results:
    print(f"{chunk.document_filename}: {chunk.content[:100]}...")

Document Management

SDK operations for managing documents

List, Get, and Delete

# List documents
documents = await client.documents.list(page=1, page_size=20, status_filter="completed")
print(f"Total: {documents.total_count}")
# Get document details
doc = await client.documents.get(document_id="doc_abc123")
print(f"Filename: {doc.filename}, Pages: {doc.page_count}")
# Get extracted text
text = await client.documents.get_text(document_id="doc_abc123")
print(f"Text length: {len(text)} characters")
# Get chunks (for RAG debugging)
chunks = await client.documents.get_chunks(document_id="doc_abc123")
print(f"Total chunks: {chunks.total_chunks}")
# Download original file
download_url = await client.documents.download(document_id="doc_abc123")
# Delete document and all chunks
await client.documents.delete(document_id="doc_abc123")

Documents in Chat (RAG)

Automatic Document Access

The chat system's document search agent has access to all your uploaded documents automatically. You do not need to attach documents explicitly: the agent searches for relevant documents based on your query.

Chat with Documents

async with AionVision(api_key="aion_...") as client:
    async with client.chat_session() as session:
        response = await session.send(
            "What are the key safety requirements mentioned in the documents?"
        )
        print(response.content)
        response2 = await session.send("Which section covers equipment maintenance?")
        print(response2.content)

Deduplication

Documents are deduplicated by the SHA-256 hash of their content, so it is safe to retry failed uploads.

result1 = await client.documents.upload_one("report.pdf")
result2 = await client.documents.upload_one("report.pdf")
# Both reference the same document if content is identical
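Content hashing can be reproduced locally with the standard library. The helper below is a sketch (sha256_of_file is a hypothetical name, not an SDK function) that shows why two files with identical bytes map to the same hash regardless of filename. Whether the service hashes exactly the raw file bytes is an assumption.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in blocks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Comparing sha256_of_file("report.pdf") against the hash of a renamed copy gives the same value, while changing a single byte produces a different one.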

DocumentUploadResult Type

@dataclass(frozen=True)
class DocumentUploadResult:
    document_id: str                # Unique document identifier
    filename: str                   # Original filename
    object_key: str                 # Storage object key
    content_type: str               # MIME type
    size_bytes: int                 # File size
    text_extraction_status: DocumentProcessingStatus  # PENDING | PROCESSING | COMPLETED | FAILED
    embedding_status: Optional[str]
    page_count: Optional[int]
    chunk_count: Optional[int]
    extracted_text_preview: Optional[str]
    created_at: Optional[datetime]
    processing_error: Optional[str]
    processing_error_type: Optional[str]
    processing_is_retryable: Optional[bool]
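The error fields on the result can drive a simple retry decision. The helper below is a sketch rather than SDK behavior: should_retry is a hypothetical name, and it assumes the status is either a plain string or an enum-like object whose value attribute holds the status name.

```python
def should_retry(result) -> bool:
    """Return True when extraction failed and the failure is flagged as retryable."""
    raw = result.text_extraction_status
    # Accept a plain string or an enum-like object exposing `.value`.
    state = str(getattr(raw, "value", raw)).lower()
    # processing_is_retryable is Optional, so None is treated as "do not retry".
    return state == "failed" and bool(result.processing_is_retryable)
```

Treating a missing processing_is_retryable as "do not retry" is a conservative choice; a caller could instead log processing_error and processing_error_type and decide manually.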

Limits & Quotas

  • Max file size: 50MB per document
  • Batch size: 1-100 documents per batch
  • Concurrent batches: 200 active batches per tenant
  • Supported formats: PDF, DOCX, TXT, MD
  • Chat context: Up to 100 documents per session
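Given the batch size limit above, a large list of files has to be split before uploading. The sketch below uses a hypothetical helper, batched_paths, and assumes that one batch corresponds to one client.documents.upload([...]) call, which is not confirmed here.

```python
from typing import Iterator, Sequence

MAX_BATCH_SIZE = 100  # from the limits list: 1-100 documents per batch


def batched_paths(paths: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of `paths`, each holding at most `size` items."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

Each yielded list could then be passed to a single upload call, for example: for batch in batched_paths(all_paths): results = await client.documents.upload(batch).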