Files API Reference
Unified file resource for images, documents, and videos — uploads, retrieval, search, digitization, and management
One Resource for Every Media Type
All file operations live under /api/v2/files/*. Images, documents, videos, and links share the same CRUD endpoints; media-specific sub-paths (variants, text, chunks, digitization, similar) return 400 if used on the wrong media type.
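A client can avoid the 400 by checking `media_type` before calling a media-specific sub-path. The sketch below is only an illustration: the lookup table is assembled from the image and document sections of this reference, and the helper name `subpath_allowed` is hypothetical, not part of any SDK.

```python
# Sub-path -> media type it is restricted to, as grouped in this reference.
# Sub-paths that are not listed (e.g. "download") are treated as unrestricted.
SUBPATH_MEDIA_TYPE = {
    "variant": "image",
    "trigger-variants": "image",
    "similar": "image",
    "text": "document",
    "chunks": "document",
    "digitization": "document",
}


def subpath_allowed(subpath, media_type):
    """Return True if calling `subpath` on a file of `media_type` should not 400."""
    required = SUBPATH_MEDIA_TYPE.get(subpath)
    return required is None or required == media_type
```

For example, `subpath_allowed("text", "image")` is false, so the client can skip that request instead of handling a 400.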
Automatic File Type Detection
The API auto-detects file types from content using magic byte signatures. You don't need to set the correct Content-Type header in multipart form data — if omitted or mismatched, the server inspects the payload and routes the file to the right pipeline. Unrecognizable or unsafe files (executables, scripts) are rejected.
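To make magic-byte detection concrete, here is a minimal sketch that sniffs three well-known signatures (JPEG, PNG, PDF) from the first bytes of a payload. The server's actual signature table and routing are not documented here, so the `SIGNATURES` list and `sniff_content_type` are illustrative assumptions only.

```python
# Well-known file signatures; the server's real table is likely much larger.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]


def sniff_content_type(head):
    """Return a MIME type guessed from leading bytes, or None if unrecognized."""
    for magic, content_type in SIGNATURES:
        if head.startswith(magic):
            return content_type
    return None
```

A client could run a check like this before uploading to catch files that would be rejected as unrecognizable.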
Choosing an Upload Method
Three REST endpoints cover three scenarios. If you use the Python SDK, client.files.upload() auto-routes by file size, so you don't need to pick.
| Use case | Endpoint | Size cap | Roundtrips |
|---|---|---|---|
| Single small file | POST /files/upload | 100 MB | 1 |
| Multiple small files | POST /files/upload/batch | 100 MB / file | 1 |
| Large file (>100 MB) | POST /files/direct-uploads → PUT to S3 → POST /files/direct-uploads/{id}/complete | 5 TB | 3 |
If you send a file over 100 MB to the streaming endpoints, the response includes a suggestion.endpoint pointing you at the presigned flow.
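The table above can be expressed as a small routing function for clients that call the REST endpoints directly. This is a sketch under one labeled assumption: that "100 MB" means 100 × 1,048,576 bytes. The function name is hypothetical.

```python
# Assumption: the documented 100 MB cap is measured in binary megabytes.
STREAMING_LIMIT_BYTES = 100 * 1024 * 1024


def choose_upload_endpoint(size_bytes, file_count=1):
    """Pick the endpoint from the table for a file of `size_bytes`.

    `file_count` is the number of small files being sent together.
    """
    if size_bytes > STREAMING_LIMIT_BYTES:
        return "POST /files/direct-uploads"
    if file_count > 1:
        return "POST /files/upload/batch"
    return "POST /files/upload"
```

Routing up front saves the round trip that would otherwise end in a 413 and a `suggestion.endpoint` hint.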
Streaming Upload
Single-request multipart upload for files up to 100 MB. The recommended path for almost every upload — no init/complete dance, no client-side hashing.
/api/v2/files/upload
Upload a single file (multipart/form-data). Auto-routes by detected media type. Returns 201 Created.
Request
// multipart/form-data fields:
// Required:
//   file: (binary) - file to upload (up to 100 MB)
// Optional:
//   title: (string, max 255) - file title
//   tags: (string) - comma-separated tags
//   auto_describe: (boolean, default true) - run AI description pipeline
//   skip_duplicates: (boolean, default false) - skip if hash already exists
//   storage_target: (string, default "default") - "default" or "custom"
//   folder_id: (string) - destination folder UUID
//   project_id: (string) - project workspace UUID (used when no folder_id)
//   content_category: (string, default "general") - content category for tailored AI
//     Valid values: general, blueprint, ce_plan, technical_diagram,
//     architectural_design, product_photo, real_estate, mining, robotics,
//     artwork, screenshot, document, map, pid, pfd, construction,
//     facility_assessment
//   custom_schema_id: (string) - optional saved custom extraction schema UUID;
//     triggers a second VLM pass with that schema
//   compliance_type: (string) - "mls" or "marketplace"
//   compliance_standard: (string) - required if compliance_type is set
//     (e.g. "nar_baseline", "amazon")
//   compliance_image_type: (string, default "main") - "main" or "secondary"
curl -X POST https://api.scopix.ai/api/v2/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@photo.jpg" \
  -F "title=Site Inspection" \
  -F "tags=inspection,site"
Response
// 201 Created
{
  "file_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_method": "STREAMING",
  "status": "completed",            // "completed" | "processing" | "skipped"
  "processing_time_ms": 1250.5,
  "upload_completed": true,
  "thumbnail_generation_started": true,
  "analysis_started": true,
  "skipped": false,
  "existing_file_id": null,         // set when skipped=true and a prior copy exists
  "storage_target": "default",
  "media_type": "image",            // "image" | "document" | "video"
  "document_type": null,            // "pdf" | "docx" | "txt" | "md" (documents only)
  "text_extraction_status": null    // "pending" | "processing" | "completed" | "failed" (documents)
}
// 429 Too Many Requests - backpressure (Retry-After header set)
// 413 Payload Too Large - file exceeds streaming limit (use /files/direct-uploads multipart)
What does "batch" mean here?
"Batch" in the Scopix API means multiple files uploaded in one HTTP request — the endpoint groups them into a tracked upload session. It is not a job queue. All AI processing of uploaded files happens automatically in the background; you don't submit jobs separately.
/api/v2/files/upload/batch
Multi-file batch upload. Per-tier file count: FREE 10, STARTER 50, PROFESSIONAL 100, ENTERPRISE 200. Each file is capped at 100 MB. Returns 201 Created.
Request
// multipart/form-data fields:
// Required:
//   files: (binary[]) - multiple files (each up to 100 MB)
// Optional:
//   tags: (string) - comma-separated tags applied to all files
//   auto_describe: (boolean, default true) - run AI description pipeline
//   skip_duplicates: (boolean, default false)
//   storage_target: (string, default "default")
//   folder_id: (string) - destination folder UUID
//   project_id: (string) - project workspace UUID
//   content_category: (string, default "general")
//   custom_schema_id: (string) - optional saved custom extraction schema UUID
//     applied to every file in the batch
//   compliance_type: (string) - "mls" or "marketplace"
//   compliance_standard: (string) - required if compliance_type is set
//   compliance_image_type: (string, default "main") - "main" or "secondary"
curl -X POST https://api.scopix.ai/api/v2/files/upload/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@photo1.jpg" \
  -F "files=@photo2.jpg" \
  -F "files=@report.pdf"
Response
// 201 Created
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "total_files": 3,
  "accepted_files": 3,
  "rejected_files": 0,
  "status": "completed",          // "completed" | "partial" | "processing" | "rejected"
  "immediate_results": [
    {
      "file_id": "660f9500-e29b-41d4-a716-446655440000",
      "filename": "photo1.jpg",
      "status": "completed",      // "completed" | "failed" | "skipped"
      "processing_time_ms": 850.2,
      "skipped": false,
      "existing_file_id": null,   // set when skipped=true and a prior copy exists
      "error": null,
      "storage_target": "default",
      "media_type": "image",
      "document_type": null,
      "text_extraction_status": null
    }
  ],
  "status_url": "/api/v2/files/sessions/{session_id}/status",
  "websocket_channel": "batch.{session_id}",
  "rejections": null
}
// For larger batches, poll status_url or subscribe to websocket_channel
Presigned & Multipart Upload
For files larger than 100 MB or when you want the bytes to bypass the API entirely, use the upload-intent flow: request → PUT directly to S3 → complete. Use upload_mode: "single_shot" for files up to 5 GB; "multipart" for anything larger (videos, large datasets).
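Before requesting an intent, the client needs the file's SHA-256 and, for multipart, a part count. The sketch below shows both using only the standard library. The helper names are hypothetical, and treating "5 GB" as 5 × 1024³ bytes is an assumption; the server may also pick the mode itself when `upload_mode` is omitted.

```python
import base64
import hashlib

# Assumption: the documented 5 GB single-shot cap is measured in binary units.
SINGLE_SHOT_MAX_BYTES = 5 * 1024 ** 3


def sha256_digests(chunks):
    """Hash an iterable of byte chunks; return (hex digest, base64 digest)."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest(), base64.b64encode(hasher.digest()).decode("ascii")


def pick_upload_mode(size_bytes):
    """Follow the guidance above: single_shot up to 5 GB, multipart beyond."""
    return "single_shot" if size_bytes <= SINGLE_SHOT_MAX_BYTES else "multipart"


def count_parts(size_bytes, part_size_bytes):
    """Number of parts needed to cover the file (ceiling division)."""
    return (size_bytes + part_size_bytes - 1) // part_size_bytes
```

The hex digest is the 64-character form expected in `claimed_file_hash`; the base64 form is the encoding S3 uses for SHA-256 checksum headers. With the sizes used in this section's examples, `count_parts(524288000, 8388608)` gives 63, which matches the `total_parts` value in the multipart response.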
/api/v2/files/direct-uploads
Create an upload intent. Returns a presigned PUT URL (single-shot) or per-part presigned URLs (multipart). The client must compute SHA-256 of the file and pin it as claimed_file_hash; the server verifies on /complete. Accepts an optional Idempotency-Key header (1-128 chars, [a-zA-Z0-9_-.]).
Request
{
  "filename": "inspection.mp4",
  "content_type": "video/mp4",
  "size_bytes": 524288000,
  "claimed_file_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "upload_mode": "multipart",         // optional - omit to let server pick by size.
                                      // "single_shot" (<=5 GB) | "multipart" (>=5 MB)
  "part_size_bytes": 8388608,         // multipart only - min 5 MB per part
  "title": "Site Inspection",         // optional
  "tags": ["inspection", "site-a"],   // optional, max 20 tags (1-50 chars each)
  "folder_id": null,                  // optional folder UUID
  "project_id": null,                 // optional project UUID
  "skip_duplicates": false,           // optional
  "storage_target": "default",        // optional (not currently honored server-side)
  "auto_describe": true,              // optional, default true
  "content_category": "general",      // optional
  "custom_schema_id": null,           // optional saved schema UUID
  "compliance_type": null,            // optional: "mls" | "marketplace"
  "compliance_standard": null,        // required if compliance_type is set
  "compliance_image_type": "main"     // optional: "main" | "secondary"
}
// Required: filename, content_type, size_bytes, claimed_file_hash
// upload_mode is OPTIONAL - the server auto-selects by size_bytes
// claimed_file_hash: 64-char SHA-256 hex (server verifies post-upload)
Response
// Single-shot response:
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_mode": "single_shot",
  "media_type": "video",
  "method": "PUT",
  "presigned_url": "https://s3.amazonaws.com/...",
  "headers": {
    "Content-Type": "video/mp4",
    "x-amz-checksum-sha256": "<base64(sha256)>",
    "x-amz-sdk-checksum-algorithm": "SHA256"
  },
  "object_key": "videos/<tenant>/<hash>.mp4",
  "expires_at": "2026-04-15T10:40:00Z",
  "max_size_bytes": 524288000,
  "bucket_name": "scopix-uploads"
}
// Multipart response:
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_mode": "multipart",
  "media_type": "video",
  "s3_upload_id": "abc...XYZ",
  "object_key": "videos/<tenant>/<hash>.mp4",
  "part_urls": [
    {"part_number": 1, "url": "https://s3.amazonaws.com/...", "expires_at": "2026-04-15T10:40:00Z"},
    {"part_number": 2, "url": "https://s3.amazonaws.com/...", "expires_at": "2026-04-15T10:40:00Z"}
  ],
  "part_size_bytes": 8388608,
  "total_parts": 63,
  "expires_at": "2026-04-15T10:40:00Z",
  "bucket_name": "scopix-uploads"
}
/api/v2/files/direct-uploads/{upload_id}
Get the current state of an upload intent (PENDING, UPLOADED, COMPLETED, FAILED) and per-part progress for multipart.
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "upload_mode": "multipart",
  "status": "UPLOADED",           // PENDING | UPLOADED | COMPLETED | FAILED
  "media_type": "video",
  "object_key": "videos/<tenant>/<hash>.mp4",
  "filename": "inspection.mp4",
  "size_bytes": 524288000,
  "total_parts": 63,              // null for single_shot
  "parts_confirmed": 63,          // null for single_shot
  "progress_percent": 100.0,      // null for single_shot
  "created_at": "2026-04-15T10:30:00Z",
  "expires_at": "2026-04-15T10:40:00Z",
  "confirmed_at": "2026-04-15T10:38:00Z",
  "error_message": null
}
/api/v2/files/direct-uploads/{upload_id}/parts/confirm
Confirm a successfully uploaded multipart chunk. Call after each PUT to S3 with the returned ETag.
Request
{
  "part_number": 1,
  "etag": "\"abc123def456\"",
  "size_bytes": 8388608
}
// part_number: 1-indexed
// etag: from S3 PUT response (quoted form is fine)
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "part_number": 1,
  "parts_confirmed": 1,
  "total_parts": 63,
  "progress_percent": 1.59
}
/api/v2/files/direct-uploads/{upload_id}/parts/retry
Get a fresh presigned URL for re-uploading a failed multipart chunk.
Request
{
  "part_number": 5
}
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "part_number": 5,
  "url": "https://s3.amazonaws.com/...",
  "expires_at": "2026-04-15T10:50:00Z"
}
/api/v2/files/direct-uploads/{upload_id}/complete
Finalize an upload (single-shot or multipart). The server completes the S3 multipart upload, verifies the SHA-256 against claimed_file_hash, creates the file record, and queues media-specific processing (variants/description for images, extraction for documents, ffprobe + analysis for videos). The body is empty: the server is fully authoritative. Supports the Idempotency-Key header; see /docs/api/idempotency.
Request
{}
// Body must be empty by design. The server uses claimed_file_hash from the
// initiate request and the parts list it tracked from /parts/confirm calls.
// No client-supplied duration/analysis params - videos use server-side
// ffprobe and a 2-credit reservation that the worker reconciles.
Response
// 202 Accepted - file stored, downstream processing queued.
// Poll GET /files/{file_id} until text_extraction_status /
// description_status / video_analysis_status leave "pending".
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "file_id": "660f9500-e29b-41d4-a716-446655440000",
  "media_type": "video",          // "image" | "document" | "video"
  "filename": "inspection.mp4",
  "object_key": "videos/<tenant>/<hash>.mp4",
  "size_bytes": 524288000,
  "deduplicated": false,          // true if an existing file had the same hash
  "status": "processing"          // "processing" | "completed"
}
// 409 Conflict - claimed_file_hash mismatch (SHA-256 didn't match S3 object)
// 422 Unprocessable Entity - required parts missing on multipart complete
/api/v2/files/direct-uploads/{upload_id}
Abort an upload intent. For multipart, also aborts the underlying S3 multipart upload (refunds reserved credits if applicable).
Request
// Optional query parameter:
?reason=User%20cancelled  // up to 255 chars
Response
{
  "upload_id": "550e8400-e29b-41d4-a716-446655440000",
  "aborted": true,
  "reason": "User cancelled"
}
File Listing & Retrieval
/api/v2/files
List files with full-text search and filters. Results are heterogeneous across media types; use the media_types query parameter to scope them.
Request
// Query parameters:
?search=damage report                   // optional, full-text search
&search_mode=all                        // optional, default: all, options: all | metadata | visible_text
&tags=safety&tags=inspection            // optional, multi-value filter by tags
&media_types=image&media_types=document // optional, multi-value: image | document | video | link
&folder_id=folder_abc123                // optional, filter by folder
&project_id=uuid                        // optional, filter by project workspace
&has_description=true                   // optional, filter by description status
&ids=uuid1&ids=uuid2                    // optional, multi-value filter by file IDs
&compliance_status=passed               // optional, filter by compliance status
&date_from=2026-01-01T00:00:00Z         // optional
&date_to=2026-01-31T23:59:59Z           // optional
&sort_by=content_created_at             // optional, options: created_at | content_created_at | title | size_bytes
&sort_order=desc                        // optional, default: desc
&limit=20                               // optional, default: 20, 1-100
&offset=0                               // optional, default: 0
Response
{
  "items": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "title": "Site Photo A",
      "filename": "site_photo.jpg",
      "thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
      "upload_description": "Damaged concrete pillar with visible cracks...",
      "visible_text": "WARNING: STRUCTURAL DAMAGE",
      "tags": ["damage", "concrete"],
      "size_bytes": 2048576,
      "created_at": "2026-01-15T10:30:00Z",
      "content_created_at": "2026-01-14T08:00:00Z",
      "has_full_description": true,
      "dimensions": {"width": 4000, "height": 3000},
      "format": "jpeg",
      "primary_status": "completed",
      "variant_status": "completed",
      "variant_count": 5,
      "medium_url": "https://cdn.scopix.ai/medium/...",
      "full_url": "https://cdn.scopix.ai/large/...",
      "blur_hash": "L6PZfSi_.AyE_3t7t7R**0o#DgR4",
      "description_status": "completed",
      "description_error": null,
      "content_type": "image/jpeg",
      "media_type": "image",
      "content_category": "general",
      "document_type": null,
      "source_url": null
    }
  ],
  "total_count": 150,
  "limit": 20,
  "offset": 0,
  "has_more": true
}
// primary_status is the canonical "poll until done" field - one of:
//   "pending" | "processing" | "completed" | "failed" | "partially_completed"
// It's derived per media type from the component statuses below; prefer it over
// branching on media_type + per-component statuses in client code.
//
// Component statuses (all share the ComponentStatus enum:
// "pending" | "queued" | "processing" | "completed" | "failed" | "skipped"):
//   image: variant_status, description_status
//   document: text_extraction_status, digitization_status, description_status
//   video: video_analysis_status, description_status
//   link: crawl_status, description_status
//
// Conditional fields by media_type:
//   document: document_type, page_count, text_extraction_status, chunk_count, document_url
//   video: duration_seconds, frame_rate, video_codec, resolution, video_analysis_status
//   link: source_url, domain, og_metadata, favicon_url, crawl_status,
//     extracted_images, extracted_images_count
/api/v2/files/{file_id}
Get detailed file information. Discriminated by media_type: variant-specific fields appear only on the matching variant. Accepts full UUID or 8-character prefix.
Request
// Optional query parameter:
?format=markdown  // optional - when set to "markdown" on an image, the
                  // response includes a formatted_document rendering
                  // of CE plan / legend / schedule / description data
Response
// media_type: "image"
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "media_type": "image",
  "title": "Site Photo A",
  "tags": ["damage", "concrete"],
  "size_bytes": 2048576,
  "content_type": "image/jpeg",
  "dimensions": {"width": 4000, "height": 3000},
  "format": "jpeg",
  "full_url": "https://cdn.scopix.ai/large/...",
  "thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
  "medium_url": "https://cdn.scopix.ai/medium/...",
  "original_url": "https://cdn.scopix.ai/originals/...",
  "variant_status": "completed",
  "variant_count": 5,
  "upload_description": "Damaged concrete pillar...",
  "visible_text": "WARNING: STRUCTURAL DAMAGE",
  "text_regions": [
    {"text": "WARNING: STRUCTURAL DAMAGE", "bounding_box": {"x_min": 0.25, "y_min": 0.4, "x_max": 0.75, "y_max": 0.52}}
  ],
  "description_generated_at": "2026-01-15T10:32:00Z",
  "full_descriptions": [...],
  "created_at": "2026-01-15T10:30:00Z",
  "updated_at": "2026-01-15T10:35:00Z",
  "blur_hash": "L6PZfSi_.AyE_3t7t7R**0o#DgR4",
  "description_status": "completed",
  "content_category": "general"
}
// media_type: "document"
{
  "id": "...",
  "media_type": "document",
  "filename": "safety_manual.pdf",
  "document_type": "pdf",
  "page_count": 45,
  "chunk_count": 128,
  "text_extraction_status": "completed",
  "extracted_text": "SAFETY MANUAL\n\nChapter 1...",
  ...
}
// media_type: "video"
{
  "id": "...",
  "media_type": "video",
  "filename": "inspection.mp4",
  "duration_seconds": 240.5,
  "frame_rate": 30.0,
  "video_codec": "h264",
  "resolution": "1920x1080",
  "analysis_status": "completed",
  "thumbnail_url": "https://...",
  ...
}
/api/v2/files/{file_id}/download
Download original file. Returns 302 redirect to a temporary download URL with Content-Disposition header.
Response
// Returns 302 Redirect to presigned download URL
// URL expires in 5 minutes (300 seconds)
// Content-Disposition header set for download
File Updates & Deletion
/api/v2/files/{file_id}
Update file metadata (title, tags, user_description). Pass only the fields you want to change.
Request
{
  "title": "Updated Photo Title",
  "tags": ["updated", "reviewed"],
  "user_description": "Quarterly inspection — minor surface cracks only"
}
// title: optional, max 255 characters
// tags: optional, max 40 tags, each max 50 characters
// user_description: optional, max 10000 chars; pass null to reset to AI-generated description
Response
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Updated Photo Title",
  "tags": ["updated", "reviewed"],
  "user_description": "Quarterly inspection — minor surface cracks only",
  "upload_description": "A concrete pillar with visible damage...",
  "updated_at": "2026-01-15T11:00:00Z"
}
/api/v2/files/{file_id}
Soft-delete a file. Recoverable within 30 days.
Response
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "deleted_at": "2026-01-15T11:00:00Z",
  "message": "File deleted successfully"
}
// 409 Conflict - cannot delete while document text extraction or
// embedding is in progress
/api/v2/files/batch-delete
Delete up to 100 files in a single request. Each file is reported individually so partial failures don't block the batch.
Request
{
  "file_ids": [
    "550e8400-e29b-41d4-a716-446655440000",
    "660f9500-f39c-52e5-b827-557766550111"
  ]
}
// 1-100 unique UUIDs
Response
{
  "deleted": [
    {"id": "550e8400-e29b-41d4-a716-446655440000", "status": "deleted", "message": null, "deleted_at": "2026-01-15T11:00:00Z"}
  ],
  "skipped": [],
  "failed": [
    {"id": "660f9500-f39c-52e5-b827-557766550111", "status": "failed", "message": "File not found", "deleted_at": null}
  ],
  "summary": {"total": 2, "deleted": 1, "skipped": 0, "failed": 1}
}
Image Operations
Image-only sub-paths. Calling these on a non-image file returns 400.
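When requesting a variant, a client usually wants the smallest rendition that still covers the display size. The sketch below picks one from the variant types listed for the variant endpoint in this section (tiny_64, small_256, medium_750, large_1024, original). The helper `pick_variant` is hypothetical, and falling back to `original` for anything above 1024 px is a design assumption.

```python
# (variant_type, max dimension in px), smallest first, as listed in this section.
VARIANT_MAX_DIMENSION = [
    ("tiny_64", 64),
    ("small_256", 256),
    ("medium_750", 750),
    ("large_1024", 1024),
]


def pick_variant(target_px):
    """Return the smallest variant whose max dimension covers `target_px`."""
    for variant_type, max_dimension in VARIANT_MAX_DIMENSION:
        if max_dimension >= target_px:
            return variant_type
    return "original"
```

For a 200 px thumbnail slot this selects `small_256`, which avoids downloading the 750 px or 1024 px renditions.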
/api/v2/files/{file_id}/variant/{variant_type}
Get a specific image variant. Returns 302 redirect to the variant URL (1-hour expiry).
Request
// variant_type options:
// - original: Original uploaded image
// - tiny_64: 64px max dimension
// - small_256: 256px max dimension
// - medium_750: 750px max dimension
// - large_1024: 1024px max dimension
Response
// Returns 302 Redirect to variant URL
// 400 Bad Request if file media_type != "image"
/api/v2/files/{file_id}/trigger-variants
Manually re-queue variant generation. Useful for recovery if the original variant pipeline failed.
Response
{
  "success": true,
  "message": "Variant generation triggered",
  "task_id": "task_550e8400",
  "current_status": "processing",
  "image_id": "550e8400-e29b-41d4-a716-446655440000"
}
// If already processing:
// {"success": true, "message": "Variant generation already in progress",
//  "skipped_duplicate": true, ...}
/api/v2/files/{file_id}/similar
Find visually similar images using hybrid embedding + semantic similarity.
Request
// Query parameters:
?limit=20  // optional, 1-50, default: 20
Response
{
  "reference_image_id": "550e8400-e29b-41d4-a716-446655440000",
  "items": [
    {
      "image_id": "660f9500-e29b-41d4-a716-446655440000",
      "title": "Similar beam photo",
      "description": "Steel beam with surface corrosion...",
      "relevance_score": 0.92,
      "vector_similarity": 0.88,
      "thumbnail_url": "https://cdn.scopix.ai/thumbs/...",
      "medium_url": "https://cdn.scopix.ai/medium/...",
      "full_url": "https://cdn.scopix.ai/large/...",
      "folder_id": "770a0600-e29b-41d4-a716-446655440000",
      "created_at": "2026-01-10T08:00:00Z"
    }
  ],
  "total_count": 1
}
// 400 Bad Request if file media_type != "image"
/api/v2/files/{file_id}/extractions/{domain_name}/review
Review AI extraction results: confirm, reject, or edit extracted items for a domain. Corrections layer on top of AI outputs (originals preserved). Multiple calls merge additively.
Request
{
  "item_reviews": {
    "furniture_items.0": "confirmed",
    "furniture_items.1": "rejected",
    "materials.2": "confirmed"
  },
  "field_edits": {
    "furniture_items.0.name": "Barcelona Chair",
    "furniture_items.0.material": "leather"
  }
}
// At least one of item_reviews or field_edits is required.
//
// domain_name: one of:
//   architectural_design, ce_plan, layout_region, legend,
//   mining, real_estate, technical_diagram, pid, pfd,
//   text_regions, mls_compliance, schedule
//
// item_reviews: keys are dot-path identifiers (e.g. "items.0"),
//   values must be "confirmed" or "rejected"
// field_edits: keys are dot-path field identifiers (e.g. "items.0.name"),
//   values are the corrected data
Response
{
  "image_id": "550e8400-e29b-41d4-a716-446655440000",
  "domain_name": "architectural_design",
  "corrections": {
    "item_reviews": {"furniture_items.0": "confirmed", "furniture_items.1": "rejected"},
    "field_edits": {"furniture_items.0.name": "Barcelona Chair"}
  },
  "updated_at": "2026-04-13T10:30:00Z"
}
// 400 Bad Request if file media_type != "image" or invalid domain
// 404 Not Found if file or extraction does not exist
Document Operations
Document-only sub-paths. Calling these on a non-document file returns 400.
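The digitization endpoints in this section return bounding boxes as `x`, `y`, `w`, `h` values normalized to 0-1 relative to page dimensions. To draw them on a rendered page, a client has to scale them to pixels. A minimal sketch follows. It assumes `x`/`y` mark the top-left corner of the box, which this reference does not state explicitly, and the function name is hypothetical.

```python
def bbox_to_pixels(bbox, page_width_px, page_height_px):
    """Scale a normalized {"x", "y", "w", "h"} box to integer pixel values.

    Returns (left, top, width, height). Assumes x/y are the top-left corner.
    """
    left = round(bbox["x"] * page_width_px)
    top = round(bbox["y"] * page_height_px)
    width = round(bbox["w"] * page_width_px)
    height = round(bbox["h"] * page_height_px)
    return left, top, width, height
```

Applied to the heading element from the digitization example on a 1000 × 2000 px render, `bbox_to_pixels({"x": 0.15, "y": 0.05, "w": 0.70, "h": 0.04}, 1000, 2000)` yields `(150, 100, 700, 80)`.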
/api/v2/files/{file_id}/text
Get the full extracted plain text from a document.
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "safety_manual.pdf",
  "text": "SAFETY MANUAL\n\nChapter 1: Introduction\n\nThis manual provides...",
  "page_count": 45,
  "metadata": {"language": "en"}
}
/api/v2/files/{file_id}/chunks
Get all chunks (for RAG / search) from a document. Optionally include the embedding vectors.
Request
// Query parameters:
?include_embeddings=false  // optional, default: false
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "chunks": [
    {
      "chunk_id": "chunk_001",
      "document_id": "550e8400-e29b-41d4-a716-446655440000",
      "document_filename": "safety_manual.pdf",
      "chunk_index": 0,
      "content": "Safety inspections must be conducted quarterly...",
      "page_numbers": [12, 13],
      "heading_hierarchy": ["Chapter 3", "Inspections"],
      "similarity_score": null,
      "metadata": {
        "token_count": 256,
        "chunk_type": "paragraph",
        "embedding_status": "completed"
      }
    }
  ],
  "total_chunks": 128,
  "status_counts": {"completed": 128, "pending": 0, "failed": 0}
}
// status_counts is only included when include_embeddings=true
// similarity_score is null for direct-fetch (only populated in search results)
/api/v2/files/{file_id}/digitization
Get the full structural digitization (per-page elements with bounding boxes) for a document.
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "total_pages": 3,
  "completed_pages": 3,
  "failed_pages": 0,
  "pages": [
    {
      "page_number": 1,
      "status": "completed",
      "element_count": 5,
      "elements": [
        {
          "type": "heading",
          "content": "Safety Manual",
          "bounding_box": {"x": 0.15, "y": 0.05, "w": 0.70, "h": 0.04},
          "metadata": {"level": 1}
        },
        {
          "type": "paragraph",
          "content": "This manual provides comprehensive safety guidelines...",
          "bounding_box": {"x": 0.10, "y": 0.12, "w": 0.80, "h": 0.15}
        },
        {
          "type": "table",
          "content": "| Category | Frequency |\n|---|---|\n| Fire | Quarterly |",
          "bounding_box": {"x": 0.10, "y": 0.30, "w": 0.80, "h": 0.20}
        }
      ],
      "error_message": null
    }
  ]
}
// status: pending | processing | completed | failed
// element types: heading, paragraph, table, key_value, list, figure
// bounding_box coordinates are normalized 0-1 relative to page dimensions
/api/v2/files/{file_id}/digitization/pages/{page_number}
Get digitization elements for a single page (1-indexed).
Response
{
  "page_number": 2,
  "status": "completed",
  "element_count": 3,
  "elements": [
    {
      "type": "heading",
      "content": "Chapter 2: Fire Safety",
      "bounding_box": {"x": 0.10, "y": 0.05, "w": 0.60, "h": 0.04},
      "metadata": {"level": 2}
    }
  ],
  "error_message": null
}
// 404 Not Found if no digitization exists for the requested page
/api/v2/files/{file_id}/digitization/status
Lightweight status check for digitization progress (no element data).
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "total_pages": 5,
  "page_statuses": {
    "1": "completed",
    "2": "completed",
    "3": "processing",
    "4": "pending",
    "5": "pending"
  }
}
/api/v2/files/{file_id}/processing-status
Cross-media processing status (works for image, document, and video). Includes per-component subprocess statuses.
Response
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "filename": "safety_manual.pdf",
  "document_type": "pdf",
  "text_extraction_status": "completed",
  "page_count": 45,
  "chunk_count": 128,
  "created_at": "2026-01-15T10:30:00Z",
  "processing_started_at": "2026-01-15T10:30:05Z",
  "completed_at": "2026-01-15T10:32:00Z",
  "error_message": null
}
Semantic Search & Analyze
AI-Powered Search
Search uses semantic similarity to find content by meaning, not just keywords — "damaged equipment" matches "broken machinery" even without exact words.
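As a rough intuition for how meaning-based matching and a `similarity_threshold` interact, the sketch below scores toy embedding vectors with cosine similarity and drops results under the threshold. This is an illustration only: the vectors are made up, and whether the service uses cosine similarity or some other metric is not stated in this reference.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates, similarity_threshold=0.3):
    """Score {name: vector} candidates, keep those at or above the threshold,
    and return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= similarity_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Raising the threshold trades recall for precision: fewer chunks come back, but each is closer in meaning to the query.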
/api/v2/files/search
Semantic search over document chunks. Scope to specific documents via document_ids; omit to search all.
Request
{
  "query": "safety inspection requirements",
  "limit": 20,
  "similarity_threshold": 0.3,
  "document_ids": ["550e8400-e29b-41d4-a716-446655440000"]
}
// document_ids is optional - omit to search all documents
Response
{
  "query": "safety inspection requirements",
  "items": [
    {
      "chunk_id": "chunk_001",
      "document_id": "550e8400-e29b-41d4-a716-446655440000",
      "document_filename": "safety_manual.pdf",
      "chunk_index": 5,
      "content": "Safety inspections must be conducted quarterly...",
      "page_numbers": [12, 13],
      "heading_hierarchy": ["Chapter 3", "Inspections", "Schedule"],
      "similarity_score": 0.87,
      "metadata": {"chunk_type": "paragraph"}
    }
  ],
  "total_count": 15,
  "search_time_ms": 45
}
/api/v2/files/analyze
Upload a document and receive extracted text in a single call. Waits up to timeout seconds (default 60). If processing exceeds the timeout, the response has status: processing and a job_id for polling. Max 10 MB. For larger files, use POST /files/upload (or /files/direct-uploads for >100 MB).
Request
curl -X POST https://api.scopix.ai/api/v2/files/analyze \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@report.pdf" \
  -F "timeout=60"
// Form fields:
//   file: required (PDF, DOCX, TXT, MD)
//   timeout: optional, 5-120 (default: 60)
//   skip_duplicates: optional (default: false)
//   folder_id: optional
//   project_id: optional
Response
// Discriminated union - check status first.
// status: "completed"
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "filename": "report.pdf",
  "size_bytes": 2048576,
  "processing_time_ms": 4500.0,
  "document_type": "pdf",
  "text_extraction_status": "completed",
  "page_count": 15,
  "chunk_count": 42,
  "extracted_text": "SAFETY MANUAL\n\nChapter 1: Introduction..."
}
// status: "processing" (timeout exceeded - poll GET /job/{job_id})
{
  "document_id": "...",
  "status": "processing",
  "job_id": "...",
  "poll_url": "/api/v2/job/...",
  "document_type": "pdf",
  "text_extraction_status": "pending"
}
// status: "failed" or "skipped" (content-hash duplicate)
/api/v2/files/analyze/async
Same input as POST /files/analyze but always returns 202 immediately with a job_id. Use for fire-and-forget or concurrent document processing.
Request
curl -X POST https://api.scopix.ai/api/v2/files/analyze/async \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@report.pdf"
Response
// 202 Accepted
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "poll_url": "/api/v2/job/550e8400-e29b-41d4-a716-446655440000"
}
Export, Quota & Deduplication
/api/v2/files/export/columns
Get available columns grouped by category for building export requests.
Response
{
  "groups": {
    "basic": [
      {"field_key": "id", "display_name": "ID", "group": "basic"},
      {"field_key": "filename", "display_name": "Filename", "group": "basic"},
      {"field_key": "title", "display_name": "Title", "group": "basic"},
      {"field_key": "size_bytes", "display_name": "Size (bytes)", "group": "basic"},
      {"field_key": "created_at", "display_name": "Created At", "group": "basic"}
    ],
    "descriptions": [
      {"field_key": "upload_description", "display_name": "AI Description", "group": "descriptions"},
      {"field_key": "user_description", "display_name": "User Description", "group": "descriptions"},
      {"field_key": "tags", "display_name": "Tags", "group": "descriptions"}
    ]
  }
}
/api/v2/files/export
Export file metadata as CSV, XLSX, DOCX, or Google Sheets.
Request
{
  "format": "csv",
  "columns": [
    {"field_key": "filename"},
    {"field_key": "title"},
    {"field_key": "upload_description", "display_name": "AI Description"},
    {"field_key": "tags"},
    {"field_key": "created_at"}
  ],
  "folder_id": "550e8400-e29b-41d4-a716-446655440000",
  "include_subfolders": true,
  "flatten_tags": true,
  "sheet_name": "Files"
}
// format: required - "csv", "xlsx", "docx", or "google_sheets"
// columns: required, at least 1 column
//   field_key: required - from the /export/columns registry
// file_ids: optional UUIDs to scope export
// folder_id: optional folder scope
// include_subfolders: optional, default: false
// flatten_tags: optional, default: true
// google_sheets_title: optional (for google_sheets format)
// connection_id: optional UUID - Google Drive connection (required for google_sheets)
Response
{
  "download_url": "https://storage.example.com/exports/files_2026-04-13.csv",
  "spreadsheet_url": null,
  "record_count": 42,
  "format": "csv"
}
/api/v2/files/quota-check
Check upload quota before starting (prevents failed uploads from quota exhaustion).
Request
// Query parameters:
?file_count=10  // required
Response
{
  "can_proceed": true,
  "requested": 10,
  "available": 990,
  "monthly_limit": 1000,
  "current_usage": 10,
  "prepaid_credits": 0,
  "max_batch_size": 50,
  "max_concurrent_uploads": 10,
  "message": null
}
// monthly_limit: -1 for unlimited tiers
// When quota exceeded, can_proceed=false and message describes the shortfall
/api/v2/files/check-duplicates
Check which file hashes already exist for this tenant before uploading.
Request
{
  "hashes": [
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592"
  ]
}
// hashes: SHA-256 content hashes (1-250 items)
Response
{
  "duplicates": [
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  ],
  "unique": [
    "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592"
  ]
}
Upload Sessions & Status
Batch uploads create a session — a per-batch tracking record. Use these endpoints to poll progress, retrieve per-file results, cancel pending work, and look up the unified processing status of any individual file by ID.
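Polling a session typically means repeating the status call until the session stops changing. The loop below is a sketch with the HTTP call injected as `fetch_status`, so it works with any client library. Treating `completed`, `failed`, `expired`, and `cancelled` as terminal is an assumption drawn from the session status values listed in this section, and the function name, interval, and poll cap are hypothetical.

```python
import time

# Assumption: these session statuses are final; the others are still in flight.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_session(fetch_status, interval_s=2.0, max_polls=60, sleep=time.sleep):
    """Call `fetch_status()` until it returns a payload with a terminal status.

    `fetch_status` should return the parsed JSON body of the session status
    endpoint. Raises TimeoutError after `max_polls` non-terminal responses.
    """
    for _ in range(max_polls):
        payload = fetch_status()
        if payload["status"] in TERMINAL_STATUSES:
            return payload
        sleep(interval_s)
    raise TimeoutError("session did not reach a terminal status")
```

For large batches, subscribing to the session's `websocket_channel` avoids polling altogether. The loop is the simpler fallback.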
/api/v2/files/uploads-status/{file_id}
Get the unified upload + processing status for any file ID.
Response
{
  "file_id": "550e8400-e29b-41d4-a716-446655440000",
  "session_id": "660f9511-f3ac-52e5-b827-557766551111",
  "unified_status": "processing",  // uploading | confirming | queued | processing |
                                   // completed | failed | partially_completed
  "component_statuses": {
    "variant_status": "completed",
    "description_status": "processing",
    "upload_status": "completed",
    "processing_status": "processing"
  },
  "processing_ids": ["task_001", "task_002"],
  "error_message": null,
  "last_error_at": null,
  "created_at": "2026-01-15T10:30:00Z",
  "last_updated_at": "2026-01-15T10:30:45Z",
  "completed_at": null,
  "retry_count": 0,
  "processing_duration_seconds": 45.2,
  "is_stuck": false,
  "is_terminal": false
}
/api/v2/files/sessions
List upload sessions for the authenticated tenant.
Request
// Query parameters:
?status=processing // optional, filter by status
&upload_method=streaming // optional, "streaming" or "presigned"
&offset=0 // pagination (default: 0)
&limit=20 // default: 20, 1-100
Response
{
  "items": [
    {
      "session_id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "upload_method": "streaming",
      "total_files": 20,
      "completed_files": 18,
      "failed_files": 1,
      "skipped_files": 1,
      "progress_percentage": 100.0,
      "created_at": "2026-01-15T10:30:00Z",
      "completed_at": "2026-01-15T10:32:00Z"
    }
  ],
  "total_count": 15,
  "limit": 20,
  "offset": 0,
  "has_more": false
}

/api/v2/files/sessions/{session_id}/status
Get current progress and recent activity for an upload session.
Response
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  // pending | uploading | processing |
  // completed | failed | expired | cancelled
  "total_files": 20,
  "completed_files": 15,
  "failed_files": 1,
  "skipped_files": 2,
  "pending_files": 2,
  "progress_percentage": 90.0,
  "created_at": "2026-01-15T10:30:00Z",
  "started_at": "2026-01-15T10:30:00Z",
  "completed_at": null,
  "estimated_completion_time": null,
  "recent_completions": [
    {
      "filename": "photo1.jpg",
      "file_id": "660e8400-e29b-41d4-a716-446655440001",
      "status": "completed",
      "description": "A site inspection showing...",
      "processing_time_ms": null
    }
  ],
  "recent_errors": [],
  "results_url": "/api/v2/files/sessions/{session_id}/results",
  "websocket_channel": "batch.{session_id}"
}

/api/v2/files/sessions/{session_id}/results
Paginated per-file results from a session.
Request
// Query parameters:
?include_failed=true // include failed files (default: true)
&offset=0 // default: 0
&limit=100 // 1-500, default: 100
Response
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "results": [
    {
      "file_id": "660e8400-e29b-41d4-a716-446655440001",
      "filename": "photo1.jpg",
      "status": "completed",
      "description": "Safety inspection showing...",
      "visible_text": "EXIT sign visible...",
      "tags": ["safety", "construction"],
      "processing_time_ms": null,
      "error_message": null,
      "thumbnail_url": "https://...",
      "created_at": "2026-01-15T10:30:05Z"
    }
  ],
  "total_count": 20,
  "offset": 0,
  "limit": 100,
  "has_more": false,
  "summary": {"total_files": 20, "completed": 17, "failed": 1, "skipped": 2}
}

/api/v2/files/sessions/{session_id}/cancel
Cancel a pending or in-progress session. Already-processed files keep their results.
Response
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "cancelled",
  "total_files": 20,
  "completed_files": 10,
  "failed_files": 0,
  "skipped_files": 0,
  "pending_files": 10,
  "progress_percentage": 50.0,
  "created_at": "2026-01-15T10:30:00Z",
  "started_at": "2026-01-15T10:30:00Z",
  "completed_at": "2026-01-15T10:31:00Z",
  "estimated_completion_time": null,
  "recent_completions": [],
  "recent_errors": [],
  "results_url": "/api/v2/files/sessions/{session_id}/results",
  "websocket_channel": "batch.{session_id}"
}
// 400 Bad Request if session is already completed/cancelled/expired

/api/v2/files/sessions/{session_id}/summary
Aggregated per-file status counts for every file in a session (uploading / processing / completed / failed / stuck) plus description status counts. Optimised for dashboards.
Response
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "overall_status": "processing",
  "completion_percentage": 65.0,
  "counts": {
    "total": 20,
    "uploading": 2,
    "confirming": 0,
    "queued": 1,
    "processing": 4,
    "completed": 13,
    "failed": 0,
    "partially_completed": 0,
    "stuck": 0
  },
  "description_counts": {
    "pending": 0,
    "processing": 4,
    "completed": 13,
    "failed": 0,
    "skipped": 0
  },
  "error_summary": { "count": 0, "messages": [] },
  "created_at": "2026-01-15T10:30:00Z",
  "last_activity_at": "2026-01-15T10:33:15Z"
}

/api/v2/files/sessions/stuck
List uploads that have not made progress within the threshold window. Useful for client-side recovery flows. Each entry has the same shape as the GET /files/uploads-status/{file_id} response.
Request
// Query parameters:
?stuck_minutes=30 // default: 30, threshold for "stuck" (min 1)
&limit=100 // default: 100, 1-500
Response
{
  "stuck_count": 1,
  "images": [
    {
      "file_id": "550e8400-e29b-41d4-a716-446655440000",
      "session_id": "770a0600-e29b-41d4-a716-446655440000",
      "unified_status": "uploading",
      "component_statuses": {
        "variant_status": null,
        "description_status": null,
        "upload_status": "streaming",
        "processing_status": null
      },
      "processing_ids": [],
      "error_message": null,
      "last_error_at": null,
      "created_at": "2026-01-15T10:00:00Z",
      "last_updated_at": "2026-01-15T10:00:15Z",
      "completed_at": null,
      "retry_count": 0,
      "processing_duration_seconds": 1845.0,
      "is_stuck": true,
      "is_terminal": false
    }
  ]
}

Telemetry
/api/v2/files/log-upload-event
Fire-and-forget client-side upload telemetry (e.g., browser-side errors, retries). Unauthenticated. Server logs the event for diagnostics; never blocks the upload.
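Because the event is fire-and-forget, a client may prefer to trim an oversized payload rather than fail. The sketch below builds a request body that stays within the field limits documented for this endpoint (100 characters for `event_type`, 2000 for `message`, 10 KB for serialized `data`). The function name `build_upload_event` is hypothetical, and two details are assumptions: that "10 KB" means 10 × 1024 bytes of UTF-8 JSON, and that dropping an oversized `data` object is acceptable for your diagnostics.

```python
import json
from typing import Any, Dict, Optional

MAX_EVENT_TYPE_CHARS = 100
MAX_MESSAGE_CHARS = 2000
MAX_DATA_BYTES = 10 * 1024  # assumption: 10 KB measured as UTF-8 JSON bytes


def build_upload_event(
    event_type: str,
    message: str,
    data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a telemetry body, trimming fields that exceed documented limits."""
    if not event_type or not message:
        raise ValueError("event_type and message are required")
    event: Dict[str, Any] = {
        "event_type": event_type[:MAX_EVENT_TYPE_CHARS],
        "message": message[:MAX_MESSAGE_CHARS],
    }
    if data is not None:
        serialized = json.dumps(data).encode("utf-8")
        # Omit data entirely if it would exceed the size limit.
        if len(serialized) <= MAX_DATA_BYTES:
            event["data"] = data
    return event
```

The optional `timestamp`, `session_id`, `file_index`, and `file_name` fields are left out of the sketch; they can be added to the returned dictionary before sending.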
Request
{
  "event_type": "upload_retry", // required, max 100 chars
  "message": "Chunk 5 failed with NetworkError, retrying", // required, max 2000 chars
  "data": { // optional, serialized size <= 10 KB
    "upload_id": "550e8400-e29b-41d4-a716-446655440000",
    "part_number": 5,
    "user_agent": "Mozilla/5.0 ..."
  },
  "timestamp": "2026-04-15T10:35:00Z", // optional client-side timestamp
  "session_id": "660f9511-f3ac-52e5-b827-557766551111", // optional, max 100 chars (matches session_id from POST /files/upload/batch)
  "file_index": 5, // optional, 0-10000
  "file_name": "huge.pdf" // optional, max 500 chars
}
Response
{
  "status": "logged",
  "timestamp": "2026-04-15T10:35:00.123456+00:00"
}

Limits & Constraints
- Streaming upload max: 100 MB per file
- Single-shot presigned max: 5 GB (S3 PUT cap)
- Multipart max: 5 TB (per S3 limits)
- Multipart part size: 5 MB minimum per part; S3 imposes a 5 GB per-part hard limit
- Synchronous document analyze: 10 MB (`/files/analyze`); use `/files/upload` for larger files
- Streaming batch size: 10–200 files per request (tier-dependent); each file capped at 100 MB
- Batch delete: max 100 files per request
- Search query length: 1–1000 characters
- Tags per file: max 40 tags, each max 50 characters
- Title length: max 255 characters
- User description: max 10000 characters
- Hash dedup batch: 1–250 hashes per call
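
Several of these limits can be checked on the client before any request is sent. The following is a minimal sketch under stated assumptions: sizes are treated as binary units (1 MB = 1024 × 1024 bytes), which the reference does not specify, and character limits are counted with Python's `len()`. The function names and the returned labels (`streaming`, `presigned_single`, `presigned_multipart`) are hypothetical and are not API values.

```python
from typing import List, Optional, Sequence

MB = 1024 * 1024  # assumption: binary units; the reference does not say
GB = 1024 * MB
TB = 1024 * GB

STREAMING_MAX = 100 * MB
SINGLE_PUT_MAX = 5 * GB
MULTIPART_MAX = 5 * TB


def choose_upload_method(size_bytes: int) -> str:
    """Pick the smallest upload flow whose size cap fits the file."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if size_bytes <= STREAMING_MAX:
        return "streaming"
    if size_bytes <= SINGLE_PUT_MAX:
        return "presigned_single"
    if size_bytes <= MULTIPART_MAX:
        return "presigned_multipart"
    raise ValueError("file exceeds the 5 TB multipart limit")


def metadata_problems(
    title: Optional[str] = None,
    tags: Sequence[str] = (),
    description: Optional[str] = None,
) -> List[str]:
    """Return human-readable violations of the documented metadata limits."""
    problems: List[str] = []
    if title is not None and len(title) > 255:
        problems.append("title longer than 255 characters")
    if len(tags) > 40:
        problems.append("more than 40 tags")
    for tag in tags:
        if len(tag) > 50:
            problems.append(f"tag longer than 50 characters: {tag[:20]}...")
    if description is not None and len(description) > 10000:
        problems.append("description longer than 10000 characters")
    return problems
```

An empty list from `metadata_problems` means the metadata passes these client-side checks; the server remains the authority on what it accepts.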

