Document Understander
Extract fields and structured information from documents (images) with optional OCR data using various AI algorithms.
Document Understander Block
The Document Understander block extracts fields from documents (images), optionally using Optical Character Recognition (OCR) data supplied with the input. It offers several algorithms, from simple field extraction to multi-modal analysis, that interpret document structure and extract the requested fields automatically.
Overview
The Document Understander identifies and extracts specific information from documents by combining document understanding, field extraction, and structured data processing. It handles various document types and returns structured data that can be passed on for further processing.
Configuration Options
Algorithm Selection
Select an operation from the Choose Algorithm dropdown:
- Field Extraction: Extract specific fields from documents
- Structured Data Extraction: Extract structured information (tables, forms)
- Entity Recognition: Identify and extract named entities
- Layout Analysis: Analyze document layout and structure
- Multi-modal Understanding: Advanced document understanding with visual and textual analysis
Input Configuration
Document Input
- Property: msg.payload.document_path
- Type: string
- Description: Path to the document image file
- Supported formats: .png, .jpg, .jpeg, .pdf, .tiff
Field Definitions
- Property: msg.payload.fields
- Type: array
- Description: List of fields to extract from the document
- Example:
[
  { "name": "invoice_number", "type": "text", "description": "Invoice number" },
  { "name": "total_amount", "type": "currency", "description": "Total amount" }
]
OCR Data (Optional)
- Property: msg.payload.ocr_data
- Type: object
- Description: Pre-extracted OCR text data
- Format: JSON object containing the recognized text and its confidence score, for example { "text": "CONTRACT AGREEMENT...", "confidence": 0.95 }
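The three input properties above can be assembled and checked before the message reaches the block. The following is a minimal sketch: buildDocumentMessage and SUPPORTED_EXTENSIONS are hypothetical names, not part of the block, and the rules that fields must be non-empty and that extensions are matched case-insensitively are assumptions.

```javascript
// Hypothetical helper (not part of the block): builds the input message
// described above and rejects file types outside the documented list.
const SUPPORTED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".pdf", ".tiff"];

function buildDocumentMessage(documentPath, fields, ocrData) {
  if (typeof documentPath !== "string" || documentPath.length === 0) {
    throw new TypeError("document_path must be a non-empty string");
  }
  const dot = documentPath.lastIndexOf(".");
  // Assumption: extensions are compared case-insensitively.
  const extension = dot === -1 ? "" : documentPath.slice(dot).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    throw new RangeError(`Unsupported document format: "${extension}"`);
  }
  // Assumption: at least one field definition is required.
  if (!Array.isArray(fields) || fields.length === 0) {
    throw new TypeError("fields must be a non-empty array");
  }
  const payload = { document_path: documentPath, fields };
  // ocr_data is optional, so it is attached only when supplied.
  if (ocrData !== undefined) {
    payload.ocr_data = ocrData;
  }
  return { payload };
}

const msg = buildDocumentMessage("documents/invoice_001.pdf", [
  { name: "invoice_number", type: "text", description: "Invoice number" },
]);
```

Failing early on an unsupported path or an empty field list keeps malformed messages out of the flow, where they would be harder to diagnose.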
Processing Options
Extraction Confidence
- Type: number
- Range: 0.0 to 1.0
- Default: 0.7
- Description: Minimum confidence threshold for field extraction
Output Format
- Type: string
- Options: ["json", "structured", "key_value"]
- Default: "json"
- Description: Format of the extracted data
Include Metadata
- Type: boolean
- Default: true
- Description: Include extraction metadata and confidence scores
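The defaults and allowed values listed above can be captured in one place. This is a sketch only: the option key names (extractionConfidence, outputFormat, includeMetadata) and the resolveOptions helper are hypothetical, since the documentation gives only the option labels, not programmatic names.

```javascript
// Hypothetical option keys; the documented defaults and ranges are real,
// the property names are assumptions made for this sketch.
const DEFAULT_OPTIONS = {
  extractionConfidence: 0.7,
  outputFormat: "json",
  includeMetadata: true,
};
const OUTPUT_FORMATS = ["json", "structured", "key_value"];

function resolveOptions(overrides = {}) {
  // Caller-supplied values win over the documented defaults.
  const options = { ...DEFAULT_OPTIONS, ...overrides };
  const confidence = options.extractionConfidence;
  if (typeof confidence !== "number" || Number.isNaN(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError("extractionConfidence must be a number from 0.0 to 1.0");
  }
  if (!OUTPUT_FORMATS.includes(options.outputFormat)) {
    throw new RangeError(`outputFormat must be one of: ${OUTPUT_FORMATS.join(", ")}`);
  }
  if (typeof options.includeMetadata !== "boolean") {
    throw new TypeError("includeMetadata must be a boolean");
  }
  return options;
}

const options = resolveOptions({ extractionConfidence: 0.85 });
```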
Use Cases
Invoice Processing
Extract key fields from invoices:
Invoice image → Document Understander → Extracted fields → Database storage
Form Processing
Extract information from forms:
Form image → Document Understander → Form data → Validation and processing
Contract Analysis
Extract key terms and information:
Contract document → Document Understander → Contract terms → Legal review
Receipt Processing
Extract transaction details:
Receipt image → Document Understander → Transaction data → Expense tracking
Common Patterns
Basic Field Extraction
// Configuration
// Algorithm: Field Extraction
// Output Format: json
// Input message:
{
  "payload": {
    "document_path": "documents/invoice_001.pdf",
    "fields": [
      {
        "name": "invoice_number",
        "type": "text",
        "description": "Invoice number"
      },
      {
        "name": "total_amount",
        "type": "currency",
        "description": "Total amount"
      },
      {
        "name": "due_date",
        "type": "date",
        "description": "Payment due date"
      }
    ]
  }
}
// Example flow:
// inject → Document Understander → debug (extracted fields)
Structured Data Extraction
// Configuration
// Algorithm: Structured Data Extraction
// Include Metadata: true
// Input message:
{
  "payload": {
    "document_path": "forms/application.pdf",
    "fields": [
      {
        "name": "applicant_name",
        "type": "text",
        "description": "Full name of applicant"
      },
      {
        "name": "contact_info",
        "type": "object",
        "description": "Contact information object"
      }
    ]
  }
}
// Example flow:
// inject → Document Understander → debug (structured data)
Multi-modal Understanding
// Configuration
// Algorithm: Multi-modal Understanding
// Output Format: structured
// Input message:
{
  "payload": {
    "document_path": "documents/contract.png",
    "fields": [
      {
        "name": "contract_type",
        "type": "text",
        "description": "Type of contract"
      },
      {
        "name": "parties",
        "type": "array",
        "description": "Contracting parties"
      }
    ],
    "ocr_data": {
      "text": "CONTRACT AGREEMENT...",
      "confidence": 0.95
    }
  }
}
// Example flow:
// OCR → Document Understander → debug (multi-modal extraction)
Advanced Features
Custom Field Types
Define custom field types for specific extraction needs:
{
  "fields": [
    {
      "name": "signature_present",
      "type": "boolean",
      "description": "Whether document contains a signature"
    },
    {
      "name": "table_data",
      "type": "table",
      "description": "Extract table data as structured object"
    },
    {
      "name": "contact_info",
      "type": "contact",
      "description": "Extract contact information object"
    }
  ]
}
Confidence Scoring
Detailed confidence analysis for each extracted field:
{
  "extracted_fields": {
    "invoice_number": {
      "value": "INV-2024-001",
      "confidence": 0.95,
      "location": {
        "page": 1,
        "coordinates": [100, 200, 300, 250]
      }
    },
    "total_amount": {
      "value": "$1,250.00",
      "confidence": 0.92,
      "location": {
        "page": 1,
        "coordinates": [400, 300, 500, 350]
      }
    }
  },
  "overall_confidence": 0.94
}
Layout Analysis
Understand document structure and layout:
{
  "document_analysis": {
    "layout_type": "invoice",
    "sections": [
      {
        "type": "header",
        "content": "Company information",
        "confidence": 0.98
      },
      {
        "type": "body",
        "content": "Invoice details",
        "confidence": 0.95
      },
      {
        "type": "footer",
        "content": "Payment information",
        "confidence": 0.92
      }
    ],
    "tables_detected": 1,
    "signatures_detected": 1
  }
}
Output Structure
Basic Field Extraction
{
  "document_path": "documents/invoice_001.pdf",
  "extracted_fields": {
    "invoice_number": "INV-2024-001",
    "total_amount": "$1,250.00",
    "due_date": "2024-02-15"
  },
  "extraction_metadata": {
    "algorithm_used": "Field Extraction",
    "processing_time": 3.2,
    "overall_confidence": 0.94,
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
Detailed Extraction with Confidence
{
  "document_path": "forms/application.pdf",
  "extracted_fields": {
    "applicant_name": {
      "value": "John Smith",
      "confidence": 0.96,
      "type": "text"
    },
    "contact_info": {
      "value": {
        "email": "[email protected]",
        "phone": "+1-555-123-4567",
        "address": "123 Main St, City, State 12345"
      },
      "confidence": 0.89,
      "type": "contact"
    }
  },
  "extraction_metadata": {
    "algorithm_used": "Structured Data Extraction",
    "fields_extracted": 2,
    "fields_failed": 0,
    "processing_time": 4.1
  }
}
Multi-modal Results
{
  "document_path": "contracts/agreement.pdf",
  "extracted_fields": {
    "contract_type": {
      "value": "Service Agreement",
      "confidence": 0.92,
      "sources": ["visual_analysis", "text_analysis"]
    },
    "parties": {
      "value": ["ABC Company", "XYZ Corporation"],
      "confidence": 0.88,
      "sources": ["text_analysis"]
    }
  },
  "document_analysis": {
    "layout_type": "legal_contract",
    "visual_elements": ["signatures", "company_logos"],
    "text_quality": "high"
  }
}
Algorithm Details
Field Extraction
- Best for: Extracting specific, well-defined fields
- Features: Pattern recognition and text extraction
- Use cases: Invoice numbers, dates, amounts, names
Structured Data Extraction
- Best for: Complex structured information
- Features: Layout analysis and structured data parsing
- Use cases: Forms, tables, nested information
Entity Recognition
- Best for: Identifying named entities and relationships
- Features: NLP and entity extraction
- Use cases: Names, organizations, locations, dates
Layout Analysis
- Best for: Understanding document structure
- Features: Computer vision and layout recognition
- Use cases: Document classification, section identification
Multi-modal Understanding
- Best for: Complex documents requiring both visual and textual analysis
- Features: Combines computer vision and NLP
- Use cases: Complex forms, contracts, mixed-content documents
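The "Best for" guidance above can be read as a simple decision order. The sketch below is one possible encoding, not the block's own logic: suggestAlgorithm, its flag names, and the precedence between flags are all assumptions. Only the returned algorithm names come from the Choose Algorithm dropdown.

```javascript
// Hypothetical helper: maps coarse requirements to one of the five
// documented algorithm names. The precedence order is an assumption.
function suggestAlgorithm(needs = {}) {
  // Documents needing both visual and textual analysis.
  if (needs.visualAndTextual) return "Multi-modal Understanding";
  // Classification or section identification without field values.
  if (needs.layoutOnly) return "Layout Analysis";
  // Forms, tables, and nested information.
  if (needs.tablesOrForms) return "Structured Data Extraction";
  // Names, organizations, locations, dates.
  if (needs.namedEntities) return "Entity Recognition";
  // Specific, well-defined fields such as invoice numbers or amounts.
  return "Field Extraction";
}

const algorithm = suggestAlgorithm({ tablesOrForms: true });
```

Falling back to Field Extraction reflects the troubleshooting advice elsewhere in this page to prefer simpler algorithms when processing time matters.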
Tips for Best Results
Field Definition
- Be specific: Define fields with clear, specific descriptions
- Use appropriate types: Choose field types that match the expected data
- Provide examples: Include examples of expected field values
- Consider variations: Account for different formats and variations
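Some of these tips can be checked mechanically before a flow runs. The following sketch assumes a hypothetical lintFieldDefinitions helper. The type list contains only the types that appear in this page's examples and may not be exhaustive, and the two-word minimum for descriptions is an arbitrary heuristic.

```javascript
// Types that appear in this page's examples; the block may accept others.
const TYPES_SEEN_IN_EXAMPLES = [
  "text", "currency", "date", "object", "array", "boolean", "table", "contact",
];

// Hypothetical linter: returns human-readable warnings, never throws.
function lintFieldDefinitions(fields) {
  const warnings = [];
  const seenNames = new Set();
  fields.forEach((field, index) => {
    const label = field.name || `#${index}`;
    if (typeof field.name !== "string" || field.name.trim() === "") {
      warnings.push(`${label}: missing name`);
    } else if (seenNames.has(field.name)) {
      warnings.push(`${label}: duplicate name`);
    } else {
      seenNames.add(field.name);
    }
    if (!TYPES_SEEN_IN_EXAMPLES.includes(field.type)) {
      warnings.push(`${label}: type "${field.type}" does not appear in the documented examples`);
    }
    const description = typeof field.description === "string" ? field.description.trim() : "";
    // Heuristic: a one-word description is unlikely to be specific enough.
    if (description.split(/\s+/).filter(Boolean).length < 2) {
      warnings.push(`${label}: description is very short; consider stating the expected format`);
    }
  });
  return warnings;
}

const warnings = lintFieldDefinitions([
  { name: "due_date", type: "date", description: "Payment due date, e.g. 2024-02-15" },
]);
```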
Document Quality
- High resolution: Use clear, high-quality document images
- Good contrast: Ensure text is clearly readable
- Complete documents: Include all relevant pages
- Consistent format: Use consistent document layouts when possible
Configuration
- Choose appropriate algorithm: Select based on document complexity
- Set confidence thresholds: Adjust based on accuracy requirements
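- Include metadata: Enable Include Metadata so that confidence scores and extraction metadata are available when reviewing results
- Use structured format: Choose the "structured" output format for complex extraction requirements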
Common Issues and Solutions
Low Extraction Accuracy
- Issue: Fields not extracted correctly
- Solution: Improve field definitions, enhance document quality, adjust confidence thresholds
Missing Fields
- Issue: Some fields not found
- Solution: Check field definitions, verify document contains the information, try different algorithms
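To see which requested fields did not come back, the input field list can be compared with the extracted_fields object in the output. This is a sketch with a hypothetical findMissingFields helper. Treating null and empty-string values as missing is an assumption, as is the idea that a failed field may simply be absent from the output.

```javascript
// Hypothetical helper: lists requested field names with no usable value.
function findMissingFields(requestedFields, result) {
  const extracted = (result && result.extracted_fields) || {};
  return requestedFields
    .map((field) => field.name)
    .filter((name) => {
      const entry = extracted[name];
      // Detailed output wraps the value in an object with a "value" key;
      // basic output stores the value directly.
      const isWrapped = entry !== null && typeof entry === "object" && "value" in entry;
      const value = isWrapped ? entry.value : entry;
      return value === undefined || value === null || value === "";
    });
}

const requested = [
  { name: "invoice_number", type: "text", description: "Invoice number" },
  { name: "total_amount", type: "currency", description: "Total amount" },
  { name: "due_date", type: "date", description: "Payment due date" },
];
const result = {
  extracted_fields: {
    invoice_number: "INV-2024-001",
    total_amount: { value: "$1,250.00", confidence: 0.92 },
  },
};
const missing = findMissingFields(requested, result);
```

The resulting list can be used to decide whether to retry with a different algorithm or to route the document to manual review.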
False Positives
- Issue: Incorrect field extractions
- Solution: Increase confidence thresholds, refine field definitions, use more specific descriptions
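When the output includes per-field confidence scores (as in the Confidence Scoring example), a downstream step can apply a stricter threshold than the block itself. A minimal sketch follows, using a hypothetical partitionByConfidence helper. Sending fields without a numeric confidence to review is an assumption.

```javascript
// Hypothetical post-processing step: splits extracted fields into those
// at or above the threshold and those that should be reviewed.
function partitionByConfidence(extractedFields, threshold = 0.7) {
  const accepted = {};
  const needsReview = {};
  for (const [name, entry] of Object.entries(extractedFields)) {
    // Entries without a numeric confidence (for example plain values from
    // the basic output) cannot be verified, so they go to review.
    const confident =
      entry != null && typeof entry.confidence === "number" && entry.confidence >= threshold;
    (confident ? accepted : needsReview)[name] = entry;
  }
  return { accepted, needsReview };
}

const { accepted, needsReview } = partitionByConfidence(
  {
    invoice_number: { value: "INV-2024-001", confidence: 0.95 },
    total_amount: { value: "$1,250.00", confidence: 0.92 },
  },
  0.93
);
```

With the example values, a threshold of 0.93 keeps invoice_number and flags total_amount, which shows how raising the threshold trades recall for fewer false positives.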
Slow Processing
- Issue: Long processing times
- Solution: Use simpler algorithms, optimize document size, process in batches
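Batch processing can be sketched as follows. The documentation does not say how the block handles batches, so this example assumes the batching happens outside the block: toBatches and processInBatches are hypothetical helpers, and processOne stands in for whatever sends a single document through the flow.

```javascript
// Splits a list into consecutive groups of at most batchSize items.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// processOne is a caller-supplied async function (hypothetical) that
// handles one document path and resolves with its extraction result.
async function processInBatches(documentPaths, batchSize, processOne) {
  const results = [];
  for (const batch of toBatches(documentPaths, batchSize)) {
    // Documents within a batch run concurrently; batches run in sequence,
    // which caps the number of documents in flight at batchSize.
    const batchResults = await Promise.all(batch.map((path) => processOne(path)));
    results.push(...batchResults);
  }
  return results;
}

const batches = toBatches(
  ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"],
  2
);
```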
Related Blocks
- OCR - For text extraction before field extraction
- Document Question Answering - For interactive document querying
- Entity Extractor - For named entity extraction
- Table Structure Recognition - For table data extraction
- Document Classifier - For document type classification