OCR Block

The OCR (Optical Character Recognition) block extracts text from images and documents. It's one of the most commonly used blocks for document processing workflows, converting visual text into a machine-readable format.

Overview

The OCR block uses advanced computer vision models to recognize and extract text from various image formats including scanned documents, photographs, and digital images. It supports multiple languages and can handle different document layouts and text orientations.

Configuration Options

Input Source

Image Source: Select from message payload, file upload, or specific message property
Supported Formats: PNG, JPG, JPEG, TIFF, PDF (first page), BMP

OCR Settings

Language: Select target language for text recognition (default: English)
OCR Engine: Choose the recognition engine
Confidence Threshold: Minimum confidence level for text recognition (0.0 - 1.0)

Output Options

Text Only: Extract plain text without positioning
Structured Output: Include bounding boxes and confidence scores
Word Level: Extract individual words with coordinates
Line Level: Extract text lines with positioning

Processing Options

Image Preprocessing: Enhance image quality before OCR
Deskew: Automatically correct image rotation
Noise Reduction: Remove image noise for better recognition
Resolution Enhancement: Upscale low-resolution images

Input Message Format

The OCR block expects an image in the message payload:

{
    payload: "<image Buffer or base64 string>", // required
    filename: "document.png", // optional
    mimetype: "image/png" // optional
}
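
As an illustration, the message above could be assembled in a JavaScript step that runs before the OCR block. This is a minimal sketch under stated assumptions: `buildOcrMessage` is a hypothetical helper and not part of the block, and it assumes a Node.js runtime where `Buffer` is available globally.

```javascript
// Hypothetical helper: wraps image data in the message shape shown above.
function buildOcrMessage(image, filename, mimetype) {
    if (!Buffer.isBuffer(image) && typeof image !== "string") {
        throw new TypeError("image must be a Buffer or a base64 string");
    }
    const msg = { payload: image };
    if (filename) msg.filename = filename; // optional
    if (mimetype) msg.mimetype = mimetype; // optional
    return msg;
}

// Stand-in for real image data: the 8-byte signature that starts every PNG file.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const ocrMsg = buildOcrMessage(pngSignature.toString("base64"), "document.png", "image/png");
```

In a real flow the payload would come from a file upload or an HTTP request body rather than a hard-coded byte array.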

Output Message Format

Text Only Output

{
    payload: {
        text: "Extracted text content from the image...",
        confidence: 0.92,
        language: "en"
    }
}

Structured Output

{
    payload: {
        text: "Complete extracted text",
        words: [
            {
                text: "Hello",
                confidence: 0.98,
                bbox: [10, 20, 45, 35]
            },
            {
                text: "World",
                confidence: 0.95,
                bbox: [50, 20, 85, 35]
            }
        ],
        lines: [
            {
                text: "Hello World",
                confidence: 0.96,
                bbox: [10, 20, 85, 35]
            }
        ],
        confidence: 0.93
    }
}
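
Downstream steps often want to drop uncertain words before using the text. The sketch below shows one way to do that against the structured output shape above. `filterWordsByConfidence` is a hypothetical helper, and rebuilding the text by joining the kept words with single spaces is an assumption made for illustration; the block's own Confidence Threshold setting may behave differently.

```javascript
// Hypothetical helper: keep only words at or above minConfidence.
function filterWordsByConfidence(payload, minConfidence) {
    const words = (payload.words || []).filter((w) => w.confidence >= minConfidence);
    return {
        words,
        // Assumption: rebuild the text from the surviving words, space-separated.
        text: words.map((w) => w.text).join(" "),
    };
}

const structured = {
    text: "Hello World",
    words: [
        { text: "Hello", confidence: 0.98, bbox: [10, 20, 45, 35] },
        { text: "World", confidence: 0.95, bbox: [50, 20, 85, 35] },
    ],
    confidence: 0.93,
};

const kept = filterWordsByConfidence(structured, 0.96);
// kept.text is "Hello" because "World" (0.95) falls below 0.96
```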

Bounding Box Format

Bounding boxes are provided as [x1, y1, x2, y2] coordinates:

x1, y1: Top-left corner
x2, y2: Bottom-right corner
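
Many drawing and cropping APIs expect an origin plus width and height rather than two corners. The following sketch converts between the two and merges several word boxes into one enclosing box. Both function names are hypothetical, and the units of the coordinates are not stated here, so the code treats them as plain numbers.

```javascript
// Convert an [x1, y1, x2, y2] box into an origin-plus-size rectangle.
function bboxToRect([x1, y1, x2, y2]) {
    return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}

// Smallest box that encloses every box in the list.
function unionBbox(boxes) {
    if (boxes.length === 0) {
        throw new RangeError("unionBbox needs at least one box");
    }
    return [
        Math.min(...boxes.map((b) => b[0])),
        Math.min(...boxes.map((b) => b[1])),
        Math.max(...boxes.map((b) => b[2])),
        Math.max(...boxes.map((b) => b[3])),
    ];
}

const rect = bboxToRect([10, 20, 45, 35]);
// rect is { x: 10, y: 20, width: 35, height: 15 }

const lineBox = unionBbox([[10, 20, 45, 35], [50, 20, 85, 35]]);
// lineBox is [10, 20, 85, 35], matching the line-level box in the structured output example
```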

Common Use Cases

Document Digitization

Convert scanned documents to editable text:

File Upload → OCR → Text Processor → Save Document

Invoice Processing

Extract text from invoice images:

http in → OCR → Entity Extractor → Template Matcher → http response

Form Processing

Extract data from filled forms:

Image Input → OCR → Field Extractor → Validation → Database Save

Multi-language Documents

Process documents in different languages:

Document → Language Detection → OCR (with language setting) → Translation

Best Practices

Image Quality: Use sharp, high-contrast input images (at least 300 DPI for scans) for better accuracy
Preprocessing: Use image enhancement for poor-quality scans
Language Setting: Set the correct language for better recognition
Confidence Filtering: Filter out low-confidence results
Post-processing: Clean extracted text for better downstream processing
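
As one possible shape for the post-processing step, the sketch below normalizes whitespace and rejoins words that were hyphenated across line breaks. `cleanOcrText` is a hypothetical helper, and the specific rules are assumptions chosen for illustration; what counts as "clean" depends on the downstream block.

```javascript
// Hypothetical helper: light cleanup of raw OCR text.
function cleanOcrText(text) {
    return text
        .replace(/\r\n?/g, "\n")                  // normalize line endings
        .replace(/(\p{L})-\n(\p{L})/gu, "$1$2")   // rejoin words hyphenated across a line break
        .replace(/[ \t]+/g, " ")                  // collapse runs of spaces and tabs
        .replace(/ *\n */g, "\n")                 // trim spaces around line breaks
        .replace(/\n{3,}/g, "\n\n")               // allow at most one blank line in a row
        .trim();
}

const cleaned = cleanOcrText("Invoice   from\r\nABC  Corp \n\n\n\ndocu-\nment ");
// cleaned is "Invoice from\nABC Corp\n\ndocument"
```

Note that the hyphen rule also joins genuine compounds that happen to break at the hyphen (for example "high-" / "quality"), so it may need to be dropped or refined for some document types.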

Image Quality Tips

Optimal Input Images

Resolution: At least 300 DPI for scanned documents
Format: PNG or TIFF for best quality
Contrast: High contrast between text and background
Orientation: Properly oriented (not rotated)

Problematic Images

Very low resolution (< 150 DPI)
Blurry or out-of-focus images
Heavily skewed or rotated text
Poor lighting or shadows
Complex backgrounds

Common Flow Patterns

Basic OCR Pipeline

Image Input → OCR → Text Cleaning → Output

Document Processing Workflow

PDF Input → Page Extraction → OCR → Entity Extraction → Data Validation

Multi-page Document Processing

PDF Input → Split Pages → Array Loop → OCR → Combine Results
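
The final "Combine Results" step could look like the sketch below, which merges per-page text-only outputs into one result. `combinePageResults` is a hypothetical helper; separating pages with a blank line and averaging the per-page confidence values are assumptions, not documented behavior.

```javascript
// Hypothetical helper: merge an array of per-page OCR payloads ({ text, confidence }).
function combinePageResults(pageResults) {
    if (pageResults.length === 0) {
        return { text: "", pages: 0, confidence: null };
    }
    const total = pageResults.reduce((sum, page) => sum + page.confidence, 0);
    return {
        text: pageResults.map((page) => page.text).join("\n\n"), // blank line between pages
        pages: pageResults.length,
        confidence: total / pageResults.length, // unweighted mean across pages
    };
}

const combined = combinePageResults([
    { text: "Page one text", confidence: 1.0 },
    { text: "Page two text", confidence: 0.5 },
]);
// combined.pages is 2 and combined.confidence is 0.75
```

An unweighted mean treats a nearly empty page the same as a full one; weighting by word count is a reasonable alternative when page lengths vary widely.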

OCR with Quality Check

Image → Image Quality Check → OCR → Confidence Filter → Text Output
                            ↓
                    Low Quality → Image Enhancement → OCR

Error Handling

Common issues and solutions:

No Text Detected

Check image quality and resolution
Verify the image contains readable text
Try image preprocessing options

Low Confidence Scores

Improve image quality
Check language settings
Consider image enhancement preprocessing

Incorrect Text Recognition

Verify correct language setting
Check for image skew or rotation
Consider manual image correction

Performance Issues

Reduce image size while maintaining quality
Process pages individually for multi-page documents
Use appropriate image formats

Integration Examples

With Entity Extractor

// OCR output feeds into Entity Extractor
{
    payload: {
        text: "Invoice from ABC Corp dated January 15, 2024 for $1,500.00",
        confidence: 0.95
    }
}

With Template Matcher

// OCR provides text for template matching
{
    payload: {
        text: "Complete document text...",
        structure: "invoice" // detected document type
    }
}

With Document Classifier

// OCR text used for document classification
{
    payload: {
        text: "Extracted text content",
        metadata: {
            pages: 1,
            words: 245,
            confidence: 0.89
        }
    }
}
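
If the metadata shown above has to be derived from structured output, a sketch along these lines would work. `summarizeOcr` is a hypothetical helper, and computing `confidence` as the mean of the word-level scores is an assumption; the block may calculate its overall score differently.

```javascript
// Hypothetical helper: derive classifier-friendly metadata from structured OCR output.
function summarizeOcr(payload, pages = 1) {
    const words = payload.words || [];
    const total = words.reduce((sum, w) => sum + w.confidence, 0);
    return {
        text: payload.text,
        metadata: {
            pages,
            words: words.length,
            // Assumption: mean word confidence; null when no words were recognized.
            confidence: words.length > 0 ? total / words.length : null,
        },
    };
}

const summary = summarizeOcr({
    text: "Hello World",
    words: [
        { text: "Hello", confidence: 1.0 },
        { text: "World", confidence: 0.5 },
    ],
});
// summary.metadata is { pages: 1, words: 2, confidence: 0.75 }
```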

Performance Considerations

Image Size: Larger images take longer to process
Resolution: Higher resolution improves accuracy but increases processing time
Batch Processing: Process multiple images sequentially for better resource utilization
Memory Usage: Large images consume more memory during processing

Tips

Test with sample images to determine optimal settings
Use debug blocks to inspect OCR output structure
Consider combining with image preprocessing blocks for better results
Monitor confidence scores to ensure quality
Use structured output when you need positioning information

Enhance OCR results by combining with Image Processor for preprocessing and Entity Extractor for data extraction.
