Documents API Reference
Upload, process, and search documents with AI-powered text extraction
Document Processing Pipeline
- Supported formats: PDF, DOCX, TXT, MD (max 50MB per file)
- Processing: Text extraction → Content segmentation → Search indexing
- Search: Semantic similarity search for finding content by meaning
- Chat Integration: Documents can be added to chat sessions for grounded AI responses
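The format and size limits above can be checked client-side before an upload URL is requested. The following is a minimal sketch, assuming the file type is inferred from the filename extension; the function name `check_uploadable` and the extension-to-content-type mapping are illustrative and not part of the API.

```python
from pathlib import Path

MAX_SIZE_BYTES = 50 * 1024 * 1024  # 50MB per-file limit stated above

# Extension -> content type for the four supported formats (illustrative mapping).
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".txt": "text/plain",
    ".md": "text/markdown",
}


def check_uploadable(filename: str, size_bytes: int) -> str:
    """Return the content type for a supported file, or raise ValueError."""
    ext = Path(filename).suffix.lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if not 0 < size_bytes <= MAX_SIZE_BYTES:
        raise ValueError(f"size must be between 1 and {MAX_SIZE_BYTES} bytes")
    return CONTENT_TYPES[ext]
```

Rejecting a file locally avoids spending a presigned URL, and possibly quota, on an upload that would be refused anyway.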
Document Upload
/api/v2/document-uploads/request-presigned-url
Generate presigned URL for direct S3 document upload (expires in 10 minutes)
Request
{ "filename": "report.pdf", "content_type": "application/pdf", "size_bytes": 2048576, "storage_target": "default", "idempotency_key": "unique-retry-key-123"}
// Supported content_type values:
// - application/pdf
// - application/vnd.openxmlformats-officedocument.wordprocessingml.document (DOCX)
// - text/plain
// - text/markdown
//
// storage_target: "default" | "custom"
// idempotency_key: optional, 1-128 chars, alphanumeric with _-. allowed
Response
{ "upload_url": "https://nyc3.digitaloceanspaces.com/bucket/presigned-url...", "upload_method": "PUT", "upload_fields": null, "upload_headers": { "Content-Type": "application/pdf" }, "object_key": "documents/{tenant_id}/{uuid}/report.pdf", "expires_at": "2025-01-15T10:40:00", "max_size_bytes": 52428800, "storage_target": "default", "bucket_name": null}
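One way a client might turn this response into an actual upload is sketched below using only the standard library. It assumes a PUT upload sends the raw file bytes along with the returned `upload_headers`. The helper name `build_put_request` is hypothetical, and POST-style uploads are deliberately left out.

```python
import urllib.request


def build_put_request(presign: dict, data: bytes) -> urllib.request.Request:
    """Build (but do not send) the upload request described by a presign response."""
    if presign["upload_method"] != "PUT":
        raise ValueError("this sketch only handles PUT uploads")
    if len(data) > presign["max_size_bytes"]:
        raise ValueError("file exceeds max_size_bytes")
    headers = presign.get("upload_headers") or {}
    return urllib.request.Request(
        presign["upload_url"], data=data, headers=headers, method="PUT"
    )


# Sending is a single call; it is commented out because it needs a live URL:
# with urllib.request.urlopen(build_put_request(presign, data)) as resp:
#     print(resp.status)
```

Keeping request construction separate from sending makes the logic easy to test without network access.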
// upload_fields: populated for POST uploads (null for PUT)
// upload_headers: populated for PUT uploads (null for POST)
// storage_target: "default" or "custom"
// bucket_name: populated only for custom storage targets
/api/v2/document-uploads/confirm
Confirm document upload and trigger text extraction processing (returns 201)
Request
{ "object_key": "documents/{tenant_id}/{uuid}/report.pdf", "size_bytes": 2048576, "content_type": "application/pdf", "checksum": "sha256:abc123def456..."}
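The `checksum` value in this example suggests a `sha256:` prefix followed by a hex digest. Below is a minimal sketch of building the confirm body under that assumption; the exact digest encoding the server expects is not stated here, and `build_confirm_body` is a hypothetical helper.

```python
import hashlib


def build_confirm_body(object_key: str, content_type: str, data: bytes) -> dict:
    """Assemble the confirm payload, hashing the uploaded bytes with SHA-256."""
    digest = hashlib.sha256(data).hexdigest()
    return {
        "object_key": object_key,
        "size_bytes": len(data),
        "content_type": content_type,
        # Assumption: "sha256:" + lowercase hex digest, mirroring the example above.
        "checksum": f"sha256:{digest}",
    }
```

For large files, the digest could instead be computed incrementally with `hashlib.sha256().update(...)` while the file is streamed.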
// checksum is optional; if omitted, a hash is generated from object_key:size_bytes
Response
{ "document_id": "550e8400-e29b-41d4-a716-446655440000", "object_key": "documents/{tenant_id}/{uuid}/report.pdf", "filename": "report.pdf", "document_type": "pdf", "status": "queued", "confirmed": true, "deduplicated": false, "is_idempotent_retry": false}
/api/v2/document-uploads/{document_id}/status
Check document processing status
Response
{ "document_id": "550e8400-e29b-41d4-a716-446655440000", "filename": "report.pdf", "document_type": "pdf", "text_extraction_status": "completed", "page_count": 15, "chunk_count": 42, "created_at": "2025-01-15T10:30:00Z", "processing_started_at": "2025-01-15T10:30:05Z", "completed_at": "2025-01-15T10:32:00Z"}
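Because processing runs asynchronously, a client would typically poll this endpoint until the status is terminal. The loop below is a sketch of that pattern: `fetch_status` is a hypothetical callable standing in for the HTTP GET, the interval and timeout values are arbitrary, and `completed` and `failed` are assumed to be the only terminal states.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def wait_for_extraction(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> dict:
    """Poll until text_extraction_status is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status["text_extraction_status"] in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("document still processing")
        time.sleep(interval_s)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client.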
// text_extraction_status values: pending | processing | completed | failed
/api/v2/document-uploads/quota-check
Check document upload quota before starting an upload
Request
// Query parameters:
?file_count=5 // required
Response
{ "can_proceed": true, "requested": 5, "available": 995, "monthly_limit": 1000, "current_usage": 5, "message": null}
Batch Document Upload
/api/v2/document-uploads/batch-prepare
Prepare batch document upload (1-100 files per batch)
Request
{ "files": [ { "filename": "report1.pdf", "size_bytes": 1048576, "content_type": "application/pdf", "idempotency_key": "file-1-retry-key", "storage_target": "default" }, { "filename": "manual.docx", "size_bytes": 2097152, "content_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document" } ], "additional_params": {}}
// Note: custom storage_target is not supported for batch uploads.
// Use the single-file endpoint for custom storage.
Response
{ "batch_id": "batch_doc_550e8400", "upload_plan": { "strategy": "parallel", "max_concurrent": 5, "retry_policy": {"max_retries": 3, "backoff_ms": 1000}, "timeout_per_upload": 300 }, "presigned_urls": [ { "file_index": 0, "filename": "report1.pdf", "upload_url": "https://nyc3.digitaloceanspaces.com/...", "upload_method": "PUT", "upload_fields": null, "upload_headers": null, "object_key": "documents/batch_doc_550e8400/report1.pdf", "expires_at": "2025-01-15T10:40:00Z", "upload_intent_id": "intent_abc123" } ], "total_size_bytes": 3145728, "estimated_time_seconds": 45, "expires_at": "2025-01-15T10:40:00Z"}
/api/v2/document-uploads/batch-confirm
Confirm batch document uploads and trigger processing
Request
{ "batch_id": "batch_doc_550e8400", "confirmations": [ { "object_key": "documents/batch_doc_550e8400/report1.pdf", "success": true, "file_size": 1048576, "checksum": "sha256:abc123...", "error_message": null } ]}
Response
{ "batch_id": "batch_doc_550e8400", "successful_uploads": 2, "failed_uploads": 0, "processing_status": "queued", "failed_files": null, "message": "All 2 documents confirmed and queued for processing", "documents": [ { "document_id": "doc_550e8400_001", "object_key": "documents/batch_doc_550e8400/report1.pdf", "filename": "report1.pdf", "document_type": "pdf", "text_extraction_status": "pending", "created_at": "2025-01-15T10:30:00Z" } ]}
// processing_status values: queued | partial | failed | already_processed
/api/v2/document-uploads/batch/{batch_id}/status
Get batch document upload status
Response
{ "batch_id": "batch_doc_550e8400", "status": "completed", "total_documents": 2, "confirmed_documents": 2, "processed_documents": 2, "failed_documents": 0, "created_at": "2025-01-15T10:30:00Z", "updated_at": "2025-01-15T10:35:00Z", "documents": [ { "document_id": "doc_550e8400_001", "object_key": "documents/batch_doc_550e8400/report1.pdf", "filename": "report1.pdf", "document_type": "pdf", "text_extraction_status": "completed", "created_at": "2025-01-15T10:30:00Z" } ]}
// status values: pending | uploading | processing | completed | failed | expired
Document Management
/api/v2/documents
List all documents with pagination and filtering
Request
// Query parameters:
?page=1 // optional, default: 1
&page_size=20 // optional, default: 20, range: 1-100
&status_filter=completed // optional, options: pending | processing | completed | failed
Response
{ "documents": [ { "id": "550e8400-e29b-41d4-a716-446655440000", "filename": "safety_manual.pdf", "content_type": "application/pdf", "size_bytes": 2048576, "page_count": 45, "chunk_count": 128, "extraction_status": "completed", "created_at": "2025-01-15T10:30:00Z" } ], "total_count": 15, "page": 1, "page_size": 20}
/api/v2/documents/{document_id}
Get document metadata
Response
{ "id": "550e8400-e29b-41d4-a716-446655440000", "filename": "safety_manual.pdf", "content_type": "application/pdf", "size_bytes": 2048576, "page_count": 45, "chunk_count": 128, "extraction_status": "completed", "created_at": "2025-01-15T10:30:00Z"}
/api/v2/documents/{document_id}
Update document metadata (title and tags). Only provided fields are updated.
Request
{ "title": "Updated Safety Manual 2025", "tags": ["safety", "compliance", "manual"]}
// Both fields are optional; send only what you want to update
// title: max 255 characters
// tags: max 40 tags, each tag max 50 characters
Response
{ "id": "550e8400-e29b-41d4-a716-446655440000", "title": "Updated Safety Manual 2025", "tags": ["safety", "compliance", "manual"], "updated_at": "2025-01-15T11:00:00Z"}
/api/v2/documents/{document_id}/text
Get full extracted text from document
Response
{ "document_id": "550e8400-e29b-41d4-a716-446655440000", "filename": "safety_manual.pdf", "text": "SAFETY MANUAL\n\nChapter 1: Introduction\n\nThis manual provides...", "page_count": 45, "metadata": { "extraction_method": "pdfplumber", "language": "en" }}
/api/v2/documents/{document_id}/chunks
Get all chunks from a document
Request
// Query parameters:
?include_embeddings=true // optional, default: false
Response
{ "document_id": "550e8400-e29b-41d4-a716-446655440000", "chunks": [ { "chunk_id": "chunk_001", "document_id": "550e8400-e29b-41d4-a716-446655440000", "document_filename": "", "chunk_index": 0, "content": "Safety inspections must be conducted quarterly...", "page_numbers": [12, 13], "heading_hierarchy": ["Chapter 3", "Inspections"], "similarity_score": 1.0, "metadata": { "token_count": 256, "chunk_type": "paragraph", "embedding_status": "completed" } } ], "total_chunks": 128, "status_counts": {"completed": 128, "pending": 0, "failed": 0}}
// status_counts is only included when include_embeddings=true
// similarity_score is always 1.0 (not from a search query)
// embedding_status in metadata is null when include_embeddings=false
/api/v2/documents/{document_id}/download
Download original document file (redirects to presigned S3 URL)
Response
// Returns 302 Redirect to presigned S3 URL
// URL expires in 1 hour
// Content-Disposition header set for download
/api/v2/documents/{document_id}
Delete document with cascade cleanup (chunks and S3 file)
Response
// Returns 204 No Content on success
// Note: Cannot delete while document is being processed
// Returns 409 Conflict if text extraction or embedding in progress
/api/v2/documents/batch-delete
Delete multiple documents at once (max 100 per batch)
Request
{ "document_ids": [ "550e8400-e29b-41d4-a716-446655440000", "660f9500-f39c-52e5-b827-557766550111" ]}
Response
{ "deleted": [ { "id": "550e8400-e29b-41d4-a716-446655440000", "status": "deleted", "message": null, "deleted_at": "2025-01-15T10:30:00Z" } ], "skipped": [], "failed": [ { "id": "660f9500-f39c-52e5-b827-557766550111", "status": "failed", "message": "Document is currently being processed", "deleted_at": null } ], "summary": { "total": 2, "deleted": 1, "skipped": 0, "failed": 1 }}
Semantic Search
AI-Powered Search
Search uses semantic similarity to find content by meaning rather than by exact keywords: a query for "damaged equipment" matches "broken machinery" even though the two share no words. Multiple embeddings power search across both documents and image descriptions.
/api/v2/documents/search
Search document chunks by semantic similarity
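To illustrate the idea, similarity between embedding vectors is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors; the embedding model, vector dimensions, and scoring function the service actually uses are not specified here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors: the first two point in similar directions, the third does not.
damaged_equipment = [0.9, 0.1, 0.2]
broken_machinery = [0.8, 0.2, 0.3]
lunch_menu = [0.1, 0.9, 0.1]

close = cosine_similarity(damaged_equipment, broken_machinery)
far = cosine_similarity(damaged_equipment, lunch_menu)
```

With these toy values, `close` is much higher than `far`, which is the property a similarity threshold relies on to separate related from unrelated content.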
Request
{ "query": "safety inspection requirements", "limit": 20, "similarity_threshold": 0.3, "document_ids": ["550e8400-e29b-41d4-a716-446655440000"]}
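A small helper can assemble this body and catch out-of-range values before the request is sent. This sketch assumes the documented limits (query of 1-1000 characters, 1-100 results) and that `similarity_threshold` lies between 0 and 1. That range, the function name `build_search_body`, and the default values (copied from the example, not documented defaults) are all assumptions.

```python
from typing import Optional


def build_search_body(
    query: str,
    limit: int = 20,                    # example value, not a documented default
    similarity_threshold: float = 0.3,  # example value, not a documented default
    document_ids: Optional[list[str]] = None,
) -> dict:
    """Validate inputs and return a JSON-serializable search request body."""
    if not 1 <= len(query) <= 1000:
        raise ValueError("query must be 1-1000 characters")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be 1-100")
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold assumed to be within 0-1")
    body = {
        "query": query,
        "limit": limit,
        "similarity_threshold": similarity_threshold,
    }
    if document_ids:  # leave the key out entirely to search all documents
        body["document_ids"] = list(document_ids)
    return body
```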
// document_ids is optional; omit to search all documents
Response
{ "query": "safety inspection requirements", "results": [ { "chunk_id": "chunk_001", "document_id": "550e8400-e29b-41d4-a716-446655440000", "document_filename": "safety_manual.pdf", "chunk_index": 5, "content": "Safety inspections must be conducted quarterly...", "page_numbers": [12, 13], "heading_hierarchy": ["Chapter 3", "Inspections", "Schedule"], "similarity_score": 0.87, "metadata": {"chunk_type": "paragraph"} } ], "total_count": 15, "search_time_ms": 45}
Limits & Constraints
Document Limits
- Max file size: 50MB per document
- Batch size: 1-100 documents per batch
- Concurrent batches: 200 active batches per tenant
- Presigned URL expiry: 10 minutes
- Search query length: 1-1000 characters
- Search results: 1-100 per query
- Tags per document: max 40 tags, each max 50 characters
- Title length: max 255 characters
- Batch delete: max 100 documents per request
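Several of these limits are straightforward to enforce client-side. The sketch below uses two hypothetical helpers: `batches` splits a list of files or document IDs into groups of at most 100, and `metadata_errors` checks a metadata update against the title and tag limits.

```python
from typing import Iterator, Optional, Sequence

MAX_BATCH = 100
MAX_TITLE_LEN = 255
MAX_TAGS = 40
MAX_TAG_LEN = 50


def batches(items: Sequence, size: int = MAX_BATCH) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def metadata_errors(
    title: Optional[str] = None, tags: Optional[list[str]] = None
) -> list[str]:
    """Return a list of limit violations (empty when the update is acceptable)."""
    errors = []
    if title is not None and len(title) > MAX_TITLE_LEN:
        errors.append("title exceeds 255 characters")
    if tags is not None:
        if len(tags) > MAX_TAGS:
            errors.append("more than 40 tags")
        errors.extend(
            f"tag exceeds 50 characters: {t[:10]}..." for t in tags if len(t) > MAX_TAG_LEN
        )
    return errors
```

For example, 250 document IDs would be split into groups of 100, 100, and 50, each small enough for a single batch-delete request.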