
Document Understander

Extract fields and structured information from documents (images) with optional OCR data using various AI algorithms.

Document Understander Block

The Document Understander block extracts fields from documents (images), optionally using Optical Character Recognition (OCR) data supplied with the input. It applies one of several AI algorithms to analyze the document's structure and extract the requested information fields automatically.

Overview

The Document Understander combines layout analysis, field extraction, and structured data processing to identify and extract specific information from documents such as invoices, forms, contracts, and receipts. It returns the extracted data in a structured format for further processing.

Configuration Options

Algorithm Selection

Select an operation from the Choose Algorithm dropdown:

  • Field Extraction: Extract specific fields from documents
  • Structured Data Extraction: Extract structured information (tables, forms)
  • Entity Recognition: Identify and extract named entities
  • Layout Analysis: Analyze document layout and structure
  • Multi-modal Understanding: Combined visual and textual analysis for complex, mixed-content documents

Input Configuration

Document Input

  • Property: msg.payload.document_path
  • Type: string
  • Description: Path to the document file (image or PDF)
  • Supported formats: .png, .jpg, .jpeg, .pdf, .tiff

Field Definitions

  • Property: msg.payload.fields
  • Type: array
  • Description: List of fields to extract from the document
  • Example:
    [
      {
        "name": "invoice_number",
        "type": "text",
        "description": "Invoice number"
      },
      {
        "name": "total_amount",
        "type": "currency",
        "description": "Total amount"
      }
    ]
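The two inputs above can be assembled and sanity-checked before the message reaches the block. The sketch below is a hypothetical helper (`buildExtractionMessage` and `SUPPORTED_EXTENSIONS` are illustrative names, not part of the block); it assumes the Node-RED-style `msg.payload` convention implied by the property names and checks only what this page documents: the supported file extensions and the `name`/`type`/`description` keys of each field definition.

```javascript
// Hypothetical helper: builds the input message described above and rejects
// obviously invalid input before it reaches the Document Understander block.
const SUPPORTED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".pdf", ".tiff"];

function buildExtractionMessage(documentPath, fields, ocrData) {
  const lower = documentPath.toLowerCase();
  if (!SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    throw new Error(`Unsupported document format: ${documentPath}`);
  }
  if (!Array.isArray(fields) || fields.length === 0) {
    throw new Error("fields must be a non-empty array");
  }
  for (const field of fields) {
    // Every field definition on this page carries name, type and description.
    for (const key of ["name", "type", "description"]) {
      if (typeof field[key] !== "string" || field[key].length === 0) {
        throw new Error(`Field definition is missing "${key}"`);
      }
    }
  }
  const payload = { document_path: documentPath, fields };
  // ocr_data is optional, so attach it only when provided.
  if (ocrData !== undefined) {
    payload.ocr_data = ocrData;
  }
  return { payload };
}

// Usage: the invoice fields from the example above.
const msg = buildExtractionMessage("documents/invoice_001.pdf", [
  { name: "invoice_number", type: "text", description: "Invoice number" },
  { name: "total_amount", type: "currency", description: "Total amount" },
]);
```

In a function node (assuming Node-RED-style semantics), the body would end with `return buildExtractionMessage(...)` so the built message is passed on.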

OCR Data (Optional)

  • Property: msg.payload.ocr_data
  • Type: object
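  • Description: Pre-extracted OCR text data, for example from an upstream OCR block
  • Format: JSON object with the recognized text and a confidence score, e.g. { "text": "CONTRACT AGREEMENT...", "confidence": 0.95 }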

Processing Options

Extraction Confidence

  • Type: number
  • Range: 0.0 to 1.0
  • Default: 0.7
  • Description: Minimum confidence threshold for field extraction

Output Format

  • Type: string
  • Options: ["json", "structured", "key_value"]
  • Default: "json"
  • Description: Format of the extracted data

Include Metadata

  • Type: boolean
  • Default: true
  • Description: Include extraction metadata and confidence scores

Use Cases

Invoice Processing

Extract key fields from invoices:

Invoice image → Document Understander → Extracted fields → Database storage
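Between the block and the database, a small transformation step usually turns the extracted strings into typed columns. The sketch below is hypothetical (`toInvoiceRecord`, `parseCurrencyToCents`, and the record's column names are illustrative assumptions); it takes the json-format output, whose `total_amount` is a string such as "$1,250.00", and assumes dollar-style amounts with "," as the thousands separator and "." as the decimal point.

```javascript
// Hypothetical function-node logic: map the block's json output to a record
// for a database insert step. Column names are assumptions.
function parseCurrencyToCents(text) {
  // Keep digits, the decimal point and a minus sign: "$1,250.00" -> "1250.00".
  const cleaned = text.replace(/[^0-9.\-]/g, "");
  const amount = Number(cleaned);
  if (cleaned === "" || Number.isNaN(amount)) {
    throw new Error(`Cannot parse currency value: ${text}`);
  }
  // Store money as integer cents to avoid floating-point drift later.
  return Math.round(amount * 100);
}

function toInvoiceRecord(result) {
  const fields = result.extracted_fields;
  return {
    invoice_number: fields.invoice_number,
    total_cents: parseCurrencyToCents(fields.total_amount),
    due_date: fields.due_date, // ISO 8601 string, e.g. "2024-02-15"
    source_document: result.document_path,
  };
}

const record = toInvoiceRecord({
  document_path: "documents/invoice_001.pdf",
  extracted_fields: {
    invoice_number: "INV-2024-001",
    total_amount: "$1,250.00",
    due_date: "2024-02-15",
  },
});
```

Converting to integer cents is a design choice for storage; amounts in other locales (for example "1.250,00 €") would need a different parser.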

Form Processing

Extract information from forms:

Form image → Document Understander → Form data → Validation and processing
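The validation step in this flow can start with a check that every required field was actually extracted. The hypothetical `findMissingFields` helper below accepts either plain values (json output) or `{ value, confidence }` objects (detailed output) and treats `undefined`, `null`, and empty strings as missing; which fields count as required is up to your flow.

```javascript
// Hypothetical validation step: list required fields that are missing or
// empty in the extraction result. Handles plain values and detailed
// { value, confidence, ... } objects.
function findMissingFields(extractedFields, requiredNames) {
  return requiredNames.filter((name) => {
    const field = extractedFields[name];
    const isDetailed =
      field !== null && typeof field === "object" && !Array.isArray(field) && "value" in field;
    const value = isDetailed ? field.value : field;
    return value === undefined || value === null || value === "";
  });
}

const missing = findMissingFields(
  {
    applicant_name: { value: "John Smith", confidence: 0.96 },
    contact_info: { value: null, confidence: 0.2 },
  },
  ["applicant_name", "contact_info", "signature_date"]
);
```

A non-empty result can then be routed to a manual-review branch instead of the normal processing path.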

Contract Analysis

Extract key terms and information:

Contract document → Document Understander → Contract terms → Legal review

Receipt Processing

Extract transaction details:

Receipt image → Document Understander → Transaction data → Expense tracking

Common Patterns

Basic Field Extraction

// Configuration
// Algorithm: Field Extraction
// Output Format: json

// Input message:
{
  "payload": {
    "document_path": "documents/invoice_001.pdf",
    "fields": [
      {
        "name": "invoice_number",
        "type": "text",
        "description": "Invoice number"
      },
      {
        "name": "total_amount",
        "type": "currency",
        "description": "Total amount"
      },
      {
        "name": "due_date",
        "type": "date",
        "description": "Payment due date"
      }
    ]
  }
}

// Example flow:
// inject → Document Understander → debug (extracted fields)

Structured Data Extraction

// Configuration
// Algorithm: Structured Data Extraction
// Include Metadata: true

// Input message:
{
  "payload": {
    "document_path": "forms/application.pdf",
    "fields": [
      {
        "name": "applicant_name",
        "type": "text",
        "description": "Full name of applicant"
      },
      {
        "name": "contact_info",
        "type": "contact",
        "description": "Contact information object"
      }
    ]
  }
}

// Example flow:
// inject → Document Understander → debug (structured data)

Multi-modal Understanding

// Configuration
// Algorithm: Multi-modal Understanding
// Output Format: structured

// Input message:
{
  "payload": {
    "document_path": "documents/contract.png",
    "fields": [
      {
        "name": "contract_type",
        "type": "text",
        "description": "Type of contract"
      },
      {
        "name": "parties",
        "type": "array",
        "description": "Contracting parties"
      }
    ],
    "ocr_data": {
      "text": "CONTRACT AGREEMENT...",
      "confidence": 0.95
    }
  }
}

// Example flow:
// OCR → Document Understander → debug (multi-modal extraction)
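An adapter between the OCR step and the Document Understander can reshape the OCR output into the `ocr_data` object. The sketch below assumes, purely for illustration, that the upstream OCR step emits `msg.payload.lines` as an array of `{ text, confidence }` entries; `attachOcrData` is a hypothetical name, and averaging the per-line scores into one document-level confidence is one reasonable choice, not the block's documented behavior.

```javascript
// Hypothetical adapter: OCR step -> Document Understander.
// ASSUMPTION: the OCR step emits msg.payload.lines = [{ text, confidence }, ...].
function attachOcrData(msg, documentPath, fields) {
  const lines = msg.payload.lines || [];
  const text = lines.map((line) => line.text).join("\n");
  // Collapse per-line scores into one document-level score (simple mean).
  const confidence =
    lines.length === 0
      ? 0
      : lines.reduce((sum, line) => sum + line.confidence, 0) / lines.length;
  msg.payload = {
    document_path: documentPath,
    fields,
    ocr_data: { text, confidence },
  };
  return msg;
}

const adapted = attachOcrData(
  {
    payload: {
      lines: [
        { text: "CONTRACT AGREEMENT", confidence: 0.9 },
        { text: "between ABC Company and XYZ Corporation", confidence: 1.0 },
      ],
    },
  },
  "documents/contract.png",
  [{ name: "contract_type", type: "text", description: "Type of contract" }]
);
```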

Advanced Features

Custom Field Types

Define custom field types for specific extraction needs:

{
  "fields": [
    {
      "name": "signature_present",
      "type": "boolean",
      "description": "Whether document contains a signature"
    },
    {
      "name": "table_data",
      "type": "table",
      "description": "Extract table data as structured object"
    },
    {
      "name": "contact_info",
      "type": "contact",
      "description": "Extract contact information object"
    }
  ]
}
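Once custom types are in play, a downstream step can check that each returned value has the shape its type implies. The value shapes in this sketch are assumptions rather than a documented contract: booleans as JSON booleans, `array` as a JSON array, `object`, `contact`, and `table` values as plain objects, and `text`, `currency`, and `date` as strings. `matchesFieldType` and `findTypeMismatches` are hypothetical names.

```javascript
// Hypothetical post-extraction check. The expected shapes per type are
// ASSUMPTIONS, not a documented contract of the block.
function matchesFieldType(value, type) {
  switch (type) {
    case "boolean":
      return typeof value === "boolean";
    case "array":
      return Array.isArray(value);
    case "object":
    case "contact":
    case "table":
      return value !== null && typeof value === "object" && !Array.isArray(value);
    case "text":
    case "currency":
    case "date":
      return typeof value === "string";
    default:
      // Unknown custom type: nothing to check against.
      return true;
  }
}

// Names of extracted fields whose values do not match their definitions.
function findTypeMismatches(fields, extracted) {
  return fields
    .filter((f) => f.name in extracted && !matchesFieldType(extracted[f.name], f.type))
    .map((f) => f.name);
}

const mismatches = findTypeMismatches(
  [
    { name: "signature_present", type: "boolean" },
    { name: "contact_info", type: "contact" },
  ],
  { signature_present: "yes", contact_info: { email: "john.smith@example.com" } }
);
```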

Confidence Scoring

Detailed confidence analysis for each extracted field:

{
  "extracted_fields": {
    "invoice_number": {
      "value": "INV-2024-001",
      "confidence": 0.95,
      "location": {
        "page": 1,
        "coordinates": [100, 200, 300, 250]
      }
    },
    "total_amount": {
      "value": "$1,250.00",
      "confidence": 0.92,
      "location": {
        "page": 1,
        "coordinates": [400, 300, 500, 350]
      }
    }
  },
  "overall_confidence": 0.94
}
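Per-field confidence scores make it possible to send only uncertain fields to a human reviewer. The hypothetical `splitByConfidence` helper below works on the shape above (each field carrying `value` and `confidence`). Its default of 0.7 mirrors the block's default Extraction Confidence; whether the block itself already drops fields below that threshold is not stated on this page, so treat this as an additional downstream check.

```javascript
// Hypothetical review step: separate confident fields from ones that need a
// human check. Input shape: { fieldName: { value, confidence, ... }, ... }.
function splitByConfidence(extractedFields, threshold = 0.7) {
  const accepted = {};
  const needsReview = {};
  for (const [name, field] of Object.entries(extractedFields)) {
    if (field.confidence >= threshold) {
      accepted[name] = field.value;
    } else {
      needsReview[name] = field;
    }
  }
  return { accepted, needsReview };
}

// A stricter threshold than the default, for illustration.
const split = splitByConfidence(
  {
    invoice_number: { value: "INV-2024-001", confidence: 0.95 },
    total_amount: { value: "$1,250.00", confidence: 0.92 },
  },
  0.93
);
```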

Layout Analysis

Understand document structure and layout:

{
  "document_analysis": {
    "layout_type": "invoice",
    "sections": [
      {
        "type": "header",
        "content": "Company information",
        "confidence": 0.98
      },
      {
        "type": "body",
        "content": "Invoice details",
        "confidence": 0.95
      },
      {
        "type": "footer",
        "content": "Payment information",
        "confidence": 0.92
      }
    ],
    "tables_detected": 1,
    "signatures_detected": 1
  }
}
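The detected `layout_type` can drive routing to different downstream flows. In this sketch, the layout values `invoice` and `legal_contract` are the ones used in this page's examples; the branch names and the `routeByLayout` helper are hypothetical placeholders, with anything unrecognized falling back to manual triage.

```javascript
// Hypothetical routing step: pick a downstream branch from the detected
// layout type. Branch names are placeholders.
const ROUTES = {
  invoice: "invoice_processing",
  legal_contract: "legal_review",
};

function routeByLayout(result) {
  const analysis = result.document_analysis || {};
  // Unknown or missing layout types go to a human.
  return ROUTES[analysis.layout_type] || "manual_triage";
}

const branch = routeByLayout({
  document_analysis: { layout_type: "invoice", tables_detected: 1 },
});
```

With Node-RED-style function nodes that have several outputs, the branch name could be mapped to an output index; a switch node on `msg.payload.document_analysis.layout_type` is an equivalent no-code option.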

Output Structure

Basic Field Extraction

{
  "document_path": "documents/invoice_001.pdf",
  "extracted_fields": {
    "invoice_number": "INV-2024-001",
    "total_amount": "$1,250.00",
    "due_date": "2024-02-15"
  },
  "extraction_metadata": {
    "algorithm_used": "Field Extraction",
    "processing_time": 3.2,
    "overall_confidence": 0.94,
    "timestamp": "2024-01-15T10:30:00Z"
  }
}

Detailed Extraction with Confidence

{
  "document_path": "forms/application.pdf",
  "extracted_fields": {
    "applicant_name": {
      "value": "John Smith",
      "confidence": 0.96,
      "type": "text"
    },
    "contact_info": {
      "value": {
        "email": "john.smith@example.com",
        "phone": "+1-555-123-4567",
        "address": "123 Main St, City, State 12345"
      },
      "confidence": 0.89,
      "type": "contact"
    }
  },
  "extraction_metadata": {
    "algorithm_used": "Structured Data Extraction",
    "fields_extracted": 2,
    "fields_failed": 0,
    "processing_time": 4.1
  }
}
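Downstream steps are simpler when they see one shape regardless of whether detailed output is enabled. The hypothetical `flattenFields` helper below reduces `{ value, confidence, type }` entries to plain name/value pairs, the shape of the basic json output, and passes already-flat values through. One limitation: a flat object value that happens to contain its own `value` key would be unwrapped by mistake.

```javascript
// Hypothetical normalizer: detailed per-field objects -> plain name/value
// pairs. Already-flat values pass through unchanged.
function flattenFields(extractedFields) {
  const flat = {};
  for (const [name, field] of Object.entries(extractedFields)) {
    const isDetailed =
      field !== null && typeof field === "object" && !Array.isArray(field) && "value" in field;
    flat[name] = isDetailed ? field.value : field;
  }
  return flat;
}

const flat = flattenFields({
  applicant_name: { value: "John Smith", confidence: 0.96, type: "text" },
  invoice_number: "INV-2024-001",
});
```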

Multi-modal Results

{
  "document_path": "documents/contract.png",
  "extracted_fields": {
    "contract_type": {
      "value": "Service Agreement",
      "confidence": 0.92,
      "sources": ["visual_analysis", "text_analysis"]
    },
    "parties": {
      "value": ["ABC Company", "XYZ Corporation"],
      "confidence": 0.88,
      "sources": ["text_analysis"]
    }
  },
  "document_analysis": {
    "layout_type": "legal_contract",
    "visual_elements": ["signatures", "company_logos"],
    "text_quality": "high"
  }
}
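The `sources` arrays in multi-modal results show which analyses supported each field. The hypothetical helper below lists fields backed by both `visual_analysis` and `text_analysis`; treating that agreement as stronger evidence is a policy choice for your flow, not a guarantee made by the block.

```javascript
// Hypothetical check on multi-modal results: names of fields supported by
// both the visual and the text analysis.
function fieldsConfirmedByBothSources(extractedFields) {
  return Object.entries(extractedFields)
    .filter(([, field]) => {
      const sources = field.sources || [];
      return sources.includes("visual_analysis") && sources.includes("text_analysis");
    })
    .map(([name]) => name);
}

const confirmed = fieldsConfirmedByBothSources({
  contract_type: {
    value: "Service Agreement",
    confidence: 0.92,
    sources: ["visual_analysis", "text_analysis"],
  },
  parties: {
    value: ["ABC Company", "XYZ Corporation"],
    confidence: 0.88,
    sources: ["text_analysis"],
  },
});
```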

Algorithm Details

Field Extraction

  • Best for: Extracting specific, well-defined fields
  • Features: Pattern recognition and text extraction
  • Use cases: Invoice numbers, dates, amounts, names

Structured Data Extraction

  • Best for: Complex structured information
  • Features: Layout analysis and structured data parsing
  • Use cases: Forms, tables, nested information

Entity Recognition

  • Best for: Identifying named entities and relationships
  • Features: NLP and entity extraction
  • Use cases: Names, organizations, locations, dates

Layout Analysis

  • Best for: Understanding document structure
  • Features: Computer vision and layout recognition
  • Use cases: Document classification, section identification

Multi-modal Understanding

  • Best for: Complex documents requiring both visual and textual analysis
  • Features: Combines computer vision and NLP
  • Use cases: Complex forms, contracts, mixed-content documents

Tips for Best Results

Field Definition

  • Be specific: Define fields with clear, specific descriptions
  • Use appropriate types: Choose field types that match the expected data
  • Provide examples: Mention typical values in the field description (for example, "Invoice number, e.g. INV-2024-001")
  • Consider variations: Account for different formats and variations

Document Quality

  • High resolution: Use clear, high-quality document images
  • Good contrast: Ensure text is clearly readable
  • Complete documents: Include all relevant pages
  • Consistent format: Use consistent document layouts when possible

Configuration

  • Choose appropriate algorithm: Select based on document complexity
  • Set confidence thresholds: Adjust based on accuracy requirements
  • Include metadata: Enable Include Metadata to receive confidence scores and extraction details for reviewing results
  • Use structured format: Choose the structured output format for nested or complex fields

Common Issues and Solutions

Low Extraction Accuracy

  • Issue: Fields not extracted correctly
  • Solution: Improve field definitions, enhance document quality, adjust confidence thresholds

Missing Fields

  • Issue: Some fields not found
  • Solution: Check field definitions, verify document contains the information, try different algorithms

False Positives

  • Issue: Incorrect field extractions
  • Solution: Increase confidence thresholds, refine field definitions, use more specific descriptions

Slow Processing

  • Issue: Long processing times
  • Solution: Use a simpler algorithm, reduce image resolution or page count where accuracy allows, process documents in batches