Documentation

File Types

Types for file management operations

UserFile

Summary of a user file (in list responses)

python
@dataclass(frozen=True)
class UserFile:
    id: str  # Unique file identifier
    size_bytes: int  # File size
    has_full_description: bool  # Whether full descriptions exist
    title: Optional[str]  # User or auto-generated title
    filename: Optional[str]  # Original filename
    thumbnail_url: Optional[str]  # URL to thumbnail
    upload_description: Optional[str]  # Quick AI description
    visible_text: Optional[str]  # OCR text
    tags: Optional[list[str]]  # Tags
    created_at: Optional[datetime]  # Upload timestamp
    content_created_at: Optional[datetime]  # EXIF date
    dimensions: Optional[dict[str, int]]  # {width, height}
    format: Optional[str]  # jpeg, png, etc.
    variant_status: Optional[str]  # Variant generation status
    variant_count: Optional[int]  # Number of variants generated
    medium_url: Optional[str]  # URL to medium variant
    full_url: Optional[str]  # URL to full-size
    blur_hash: Optional[str]  # BlurHash for placeholders
    description_status: Optional[str]  # Description generation status
    description_error: Optional[str]  # Error if description failed
    variant_error: Optional[str]  # Error if variant generation failed
    confidence_score: Optional[float]  # Always null for descriptions
    content_type: Optional[str]  # MIME type (image/jpeg, etc.)
    media_type: Optional[str]  # "image" | "document" | "link"
    content_category: Optional[str]  # Content category for tailored AI analysis

    # Domain extraction data (populated by content_category)
    legend_data: Optional[LegendData]
    architectural_design_data: Optional[ArchitecturalDesignData]
    ce_plan_data: Optional[CEPlanData]
    layout_region_data: Optional[dict[str, Any]]
    real_estate_data: Optional[dict[str, Any]]
    mining_data: Optional[dict[str, Any]]
    pid_data: Optional[dict[str, Any]]
    pfd_data: Optional[dict[str, Any]]
    schedule_data: Optional[dict[str, Any]]
    extraction_corrections: Optional[dict[str, Any]]  # Per-domain review corrections

    # Document-specific fields
    document_type: Optional[str]  # pdf, docx, txt, etc.
    page_count: Optional[int]  # Number of pages
    text_extraction_status: Optional[str]  # Text extraction status
    chunk_count: Optional[int]  # Number of text chunks
    document_url: Optional[str]  # URL to document file

    # Link-specific fields
    source_url: Optional[str]  # Original URL
    domain: Optional[str]  # Domain of the link
    og_metadata: Optional[dict[str, Any]]  # Open Graph metadata
    favicon_url: Optional[str]  # URL to favicon
    crawl_status: Optional[str]  # Web crawl status
    extracted_images: Optional[dict[str, Any]]  # Images extracted from link
    extracted_images_count: Optional[int]  # Number of extracted images

UserFileDetails

Full file details, returned by files.get()

python
@dataclass(frozen=True)
class UserFileDetails:
    id: str
    size_bytes: int
    content_type: str  # MIME type
    hash: str  # File hash
    title: Optional[str]
    tags: Optional[list[str]]
    dimensions: Optional[dict[str, int]]
    format: Optional[str]
    full_url: Optional[str]  # 1024px variant URL
    thumbnail_url: Optional[str]
    medium_url: Optional[str]
    original_url: Optional[str]  # Always available fallback
    upload_description: Optional[str]
    visible_text: Optional[str]  # OCR text (plain string)
    text_regions: Optional[list[dict[str, Any]]]  # OCR text with bounding boxes (0-1 normalized)
    description_generated_at: Optional[datetime]
    full_descriptions: Optional[list[FullDescription]]
    processing_history: Optional[list[ProcessingHistory]]
    created_at: Optional[datetime]
    updated_at: Optional[datetime]
    content_created_at: Optional[datetime]  # EXIF metadata date
    original_filename: Optional[str]  # Original filename from upload
    variant_status: Optional[str]  # pending | processing | completed | failed
    variant_count: Optional[int]  # Number of variants generated
    blur_hash: Optional[str]  # BlurHash for placeholders
    description_status: Optional[str]  # pending | processing | completed | failed
    content_category: Optional[str]  # Content category for tailored AI analysis
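
Because the variant URLs are optional while original_url is described as an always-available fallback, a caller may want to pick whichever URL exists at render time. The following is a minimal sketch, not part of the SDK: pick_display_url is a hypothetical helper, and the preference order (full, medium, thumbnail, original) is an assumption. It reads attributes by name, so it works on any object shaped like UserFileDetails.

```python
from types import SimpleNamespace
from typing import Any, Optional


def pick_display_url(details: Any) -> Optional[str]:
    """Return the first URL that is set, preferring larger variants.

    Hypothetical helper; the preference order is an assumption.
    """
    for attr in ("full_url", "medium_url", "thumbnail_url", "original_url"):
        url = getattr(details, attr, None)
        if url:
            return url
    return None


# Stand-in for a UserFileDetails whose variants are still being generated.
pending = SimpleNamespace(
    full_url=None,
    medium_url=None,
    thumbnail_url=None,
    original_url="https://example.com/original.jpg",
    variant_status="processing",
)
display_url = pick_display_url(pending)
```

Checking variant_status before choosing a URL is another option; the attribute scan above simply avoids depending on the exact status strings.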

FileList

Paginated list of files

python
@dataclass(frozen=True)
class FileList:
    files: list[UserFile]  # File summaries
    total_count: int  # Total files matching query
    has_more: bool  # More files available
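
The has_more flag is what a caller would typically use to decide whether to request another page. The sketch below illustrates that loop under stated assumptions: fetch_page is a hypothetical callable that takes an offset and returns an object shaped like FileList. The actual list method, its parameters, and whether it pages by offset or cursor are not specified here, and a real client call would likely need to be awaited.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator


def iter_files(fetch_page: Callable[[int], Any]) -> Iterator[Any]:
    """Yield every file by requesting pages until has_more is False.

    fetch_page is a hypothetical callable: it takes an offset and returns
    an object with files, total_count, and has_more attributes.
    """
    offset = 0
    while True:
        page = fetch_page(offset)
        yield from page.files
        offset += len(page.files)
        # Stop on the last page; the empty-page check prevents an endless loop.
        if not page.has_more or not page.files:
            break


# In-memory stand-in for a list endpoint that serves two items per page.
_ALL_FILES = ["a", "b", "c", "d", "e"]


def fake_fetch(offset: int, limit: int = 2) -> SimpleNamespace:
    chunk = _ALL_FILES[offset:offset + limit]
    return SimpleNamespace(
        files=chunk,
        total_count=len(_ALL_FILES),
        has_more=offset + len(chunk) < len(_ALL_FILES),
    )


all_files = list(iter_files(fake_fetch))
```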

FullDescription

Detailed AI-generated description

python
@dataclass(frozen=True)
class FullDescription:
    id: str  # Description identifier
    description: str  # Full description text
    visible_text: Optional[str]  # OCR text (plain string)
    text_regions: Optional[list[dict[str, Any]]]  # OCR text with bounding boxes (0-1 normalized)
    confidence_score: Optional[float]  # Always null for descriptions
    processing_time_ms: Optional[int]
    created_at: Optional[datetime]

Text Regions (Bounding Boxes)

The text_regions field on UserFileDetails and FullDescription contains structured OCR data with a bounding box for each detected text region. Coordinates are normalized to the 0-1 range relative to the image dimensions, with the origin at the top-left corner (x_min is the left edge, y_min is the top edge).

python
# Each text region is a dict with this structure:
{
    "text": "EXIT",  # Verbatim text as it appears
    "bounding_box": {
        "x_min": 0.12,  # Left edge (0-1)
        "y_min": 0.05,  # Top edge (0-1)
        "x_max": 0.28,  # Right edge (0-1)
        "y_max": 0.14  # Bottom edge (0-1)
    }
}

# Access bounding boxes from file details:
details = await client.files.get(file_id)
for region in details.text_regions or []:
    text = region["text"]
    bbox = region["bounding_box"]
    print(f"'{text}' at ({bbox['x_min']:.2f}, {bbox['y_min']:.2f})"
          f" → ({bbox['x_max']:.2f}, {bbox['y_max']:.2f})")
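
To draw or crop a region, the normalized coordinates have to be scaled by the image size. The sketch below shows one way to do that, assuming the width and height come from the dimensions field ({width, height}). to_pixel_box is a hypothetical helper, not an SDK function, and rounding to the nearest pixel is a choice made for illustration.

```python
from typing import Any


def to_pixel_box(region: dict[str, Any], width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a normalized bounding box to (left, top, right, bottom) pixels."""
    bbox = region["bounding_box"]
    return (
        round(bbox["x_min"] * width),
        round(bbox["y_min"] * height),
        round(bbox["x_max"] * width),
        round(bbox["y_max"] * height),
    )


# Same region shape as documented above.
region = {
    "text": "EXIT",
    "bounding_box": {"x_min": 0.12, "y_min": 0.05, "x_max": 0.28, "y_max": 0.14},
}
# Illustrative image size in the {width, height} form of the dimensions field.
dimensions = {"width": 1000, "height": 500}
pixel_box = to_pixel_box(region, dimensions["width"], dimensions["height"])
```

The returned tuple uses a (left, top, right, bottom) order, which matches what many imaging libraries expect for crop boxes.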

ProcessingHistory

Processing history entry for a file

python
@dataclass(frozen=True)
class ProcessingHistory:
    id: str  # History entry identifier
    status: str  # Processing status
    created_at: Optional[datetime]  # When the operation started
    completed_at: Optional[datetime]  # When the operation completed
    error_message: Optional[str]  # Error message if failed
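
Since created_at marks the start of an operation and completed_at its end, the elapsed time can be derived from a history entry. This is a small sketch with a hypothetical helper, processing_duration, which returns None when either timestamp is missing (for example, while an operation is still running). The stand-in entry and its "completed" status value are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace
from typing import Any, Optional


def processing_duration(entry: Any) -> Optional[timedelta]:
    """Return completed_at - created_at, or None if either is missing."""
    if entry.created_at is None or entry.completed_at is None:
        return None
    return entry.completed_at - entry.created_at


# Stand-in for a ProcessingHistory entry; the values are made up.
entry = SimpleNamespace(
    id="hist_1",
    status="completed",
    created_at=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    completed_at=datetime(2024, 1, 1, 12, 1, 30, tzinfo=timezone.utc),
    error_message=None,
)
duration = processing_duration(entry)
```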

UpdateFileResult

Result of file update operation, returned by files.update()

python
@dataclass(frozen=True)
class UpdateFileResult:
    id: str  # File identifier
    title: Optional[str]  # Updated title
    tags: Optional[list[str]]  # Updated tags
    updated_at: Optional[datetime]  # Update timestamp

DeleteFileResult

Result of file deletion operation, returned by files.delete()

python
@dataclass(frozen=True)
class DeleteFileResult:
    id: str  # Deleted file identifier
    deleted_at: Optional[datetime]  # Deletion timestamp
    message: str  # Confirmation message

BatchDeleteFileResult

Result for a single file in a batch delete operation

python
@dataclass(frozen=True)
class BatchDeleteFileResult:
    id: str  # File identifier
    status: str  # "deleted", "skipped", or "failed"
    message: Optional[str]  # Additional details about the operation
    deleted_at: Optional[datetime]  # Deletion timestamp (if deleted)

BatchDeleteFilesResponse

Response for batch delete operation, returned by files.batch_delete()

python
@dataclass(frozen=True)
class BatchDeleteFilesResponse:
    deleted: list[BatchDeleteFileResult]  # Successfully deleted files
    skipped: list[BatchDeleteFileResult]  # Files skipped (e.g., currently processing)
    failed: list[BatchDeleteFileResult]  # Files that failed to delete
    summary: dict[str, int]  # Stats: {total, deleted, skipped, failed}
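
A batch delete can partially succeed, so callers will usually want to inspect the three result lists rather than treat the call as all-or-nothing. The sketch below groups file ids by outcome. group_ids_by_outcome is a hypothetical helper, and the stand-in response only mimics the documented field names; its ids and messages are made up.

```python
from types import SimpleNamespace
from typing import Any


def group_ids_by_outcome(response: Any) -> dict[str, list[str]]:
    """Collect file ids from the deleted, skipped, and failed lists."""
    return {
        "deleted": [result.id for result in response.deleted],
        "skipped": [result.id for result in response.skipped],
        "failed": [result.id for result in response.failed],
    }


# Stand-in shaped like BatchDeleteFilesResponse.
response = SimpleNamespace(
    deleted=[SimpleNamespace(id="f1", status="deleted", message=None)],
    skipped=[SimpleNamespace(id="f2", status="skipped", message="currently processing")],
    failed=[SimpleNamespace(id="f3", status="failed", message="storage error")],
    summary={"total": 3, "deleted": 1, "skipped": 1, "failed": 1},
)
outcome = group_ids_by_outcome(response)
ids_to_retry = outcome["skipped"] + outcome["failed"]
```

Whether skipped files are worth retrying depends on why they were skipped; the message field on each result is the place to look.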