OCR

Extract text from images and documents using Optical Character Recognition

OCR Block

The OCR (Optical Character Recognition) block extracts text from images and documents. It is one of the most commonly used blocks in document processing workflows, converting visual text into a machine-readable format.

Overview

The OCR block uses computer vision models to recognize and extract text from a range of image sources, including scanned documents, photographs, and digital images. It supports multiple languages and can handle different document layouts and text orientations.

Configuration Options

Input Source

  • Image Source: Select from message payload, file upload, or specific message property
  • Supported Formats: PNG, JPG, JPEG, TIFF, PDF (first page), BMP

OCR Settings

  • Language: Select target language for text recognition
    • English (default)
  • OCR Engine: Choose the recognition engine
  • Confidence Threshold: Minimum confidence level for text recognition (0.0 - 1.0)

Output Options

  • Text Only: Extract plain text without positioning
  • Structured Output: Include bounding boxes and confidence scores
  • Word Level: Extract individual words with coordinates
  • Line Level: Extract text lines with positioning

Processing Options

  • Image Preprocessing: Enhance image quality before OCR
  • Deskew: Automatically correct image rotation
  • Noise Reduction: Remove image noise for better recognition
  • Resolution Enhancement: Upscale low-resolution images

Input Message Format

The OCR block expects an image in the message payload:

{
    payload: imageData, // image Buffer or base64-encoded string
    filename: "document.png", // optional
    mimetype: "image/png" // optional
}
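A minimal sketch of building such a message, assuming a Node.js runtime where `Buffer` is available. The helper name `buildOcrMessage` is hypothetical and not part of the OCR block; it only illustrates the message shape described above.

```javascript
// Hypothetical helper: wraps image data in the message shape shown above.
// Assumes a Node.js runtime (Buffer is a Node.js global).
function buildOcrMessage(imageData, filename, mimetype) {
    if (!Buffer.isBuffer(imageData) && typeof imageData !== "string") {
        throw new TypeError("payload must be a Buffer or a base64 string");
    }
    const msg = { payload: imageData };
    if (filename) msg.filename = filename; // optional
    if (mimetype) msg.mimetype = mimetype; // optional
    return msg;
}

// The 8-byte PNG signature stands in for real image data.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Same data passed once as a Buffer and once as a base64 string.
const msgFromBuffer = buildOcrMessage(pngSignature, "document.png", "image/png");
const msgFromBase64 = buildOcrMessage(pngSignature.toString("base64"));
```

Setting `filename` and `mimetype` is optional, so the sketch only adds them when they are supplied.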

Output Message Format

Text Only Output

{
    payload: {
        text: "Extracted text content from the image...",
        confidence: 0.92,
        language: "en"
    }
}

Structured Output

{
    payload: {
        text: "Complete extracted text",
        words: [
            {
                text: "Hello",
                confidence: 0.98,
                bbox: [10, 20, 45, 35]
            },
            {
                text: "World",
                confidence: 0.95,
                bbox: [50, 20, 85, 35]
            }
        ],
        lines: [
            {
                text: "Hello World",
                confidence: 0.96,
                bbox: [10, 20, 85, 35]
            }
        ],
        confidence: 0.93
    }
}
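As an illustration of working with the structured output, the sketch below drops words under a chosen confidence and rebuilds the text from what remains. The function name `filterWordsByConfidence` and the cut-off value are assumptions made for this example, not settings of the OCR block.

```javascript
// Hypothetical helper: keeps only words at or above minConfidence and
// rebuilds the text from the surviving words (line breaks are not preserved).
function filterWordsByConfidence(payload, minConfidence) {
    const words = (payload.words || []).filter((w) => w.confidence >= minConfidence);
    return {
        ...payload,
        words,
        text: words.map((w) => w.text).join(" ")
    };
}

// Sample data copied from the structured output shown above.
const structured = {
    text: "Hello World",
    words: [
        { text: "Hello", confidence: 0.98, bbox: [10, 20, 45, 35] },
        { text: "World", confidence: 0.95, bbox: [50, 20, 85, 35] }
    ],
    confidence: 0.93
};

const filtered = filterWordsByConfidence(structured, 0.96);
console.log(filtered.text); // "Hello"
```

The overall `confidence` field is passed through unchanged here; whether to recompute it after filtering depends on how downstream blocks use it.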

Bounding Box Format

Bounding boxes are provided as [x1, y1, x2, y2] coordinates:

  • x1, y1: Top-left corner
  • x2, y2: Bottom-right corner
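Many drawing and cropping tools expect a position plus width and height rather than two corners. The sketch below converts between the two and merges several boxes into one; both helper names are hypothetical.

```javascript
// Hypothetical helper: converts [x1, y1, x2, y2] into position and size.
function bboxToRect([x1, y1, x2, y2]) {
    return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}

// Hypothetical helper: smallest box that encloses all the given boxes.
function unionBbox(boxes) {
    return [
        Math.min(...boxes.map((b) => b[0])),
        Math.min(...boxes.map((b) => b[1])),
        Math.max(...boxes.map((b) => b[2])),
        Math.max(...boxes.map((b) => b[3]))
    ];
}

console.log(bboxToRect([10, 20, 45, 35]));
// { x: 10, y: 20, width: 35, height: 15 }

console.log(unionBbox([[10, 20, 45, 35], [50, 20, 85, 35]]));
// [ 10, 20, 85, 35 ]
```

With the sample values from the structured output, the union of the two word boxes equals the line box `[10, 20, 85, 35]`.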

Common Use Cases

Document Digitization

Convert scanned documents to editable text:

File Upload → OCR → Text Processor → Save Document

Invoice Processing

Extract text from invoice images:

http in → OCR → Entity Extractor → Template Matcher → http response

Form Processing

Extract data from filled forms:

Image Input → OCR → Field Extractor → Validation → Database Save

Multi-language Documents

Process documents in different languages:

Document → Language Detection → OCR (with language setting) → Translation

Best Practices

  1. Image Quality: Use high-quality input images; scans of at least 300 DPI give better accuracy
  2. Preprocessing: Enable image preprocessing (deskew, noise reduction, resolution enhancement) for poor-quality scans
  3. Language Setting: Set the language to match the document for better recognition
  4. Confidence Filtering: Filter out results that fall below your confidence threshold
  5. Post-processing: Clean the extracted text before passing it to downstream blocks
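The post-processing step can be sketched as a small text-cleaning function. This is one possible set of rules, not the behavior of any built-in block, and `cleanOcrText` is a hypothetical name.

```javascript
// Hypothetical helper: normalizes whitespace in OCR text.
function cleanOcrText(text) {
    return text
        .replace(/\r\n?/g, "\n")          // normalize line endings
        .replace(/(\w)-\n(\w)/g, "$1$2")  // rejoin words hyphenated across lines
        .replace(/[ \t]+/g, " ")          // collapse runs of spaces and tabs
        .replace(/ *\n */g, "\n")         // trim spaces around line breaks
        .replace(/\n{3,}/g, "\n\n")       // keep at most one blank line
        .trim();
}

const raw = "Invoice   num-\nber  42 \r\n\r\n\r\nTotal:\t$10 ";
console.log(JSON.stringify(cleanOcrText(raw)));
// "Invoice number 42\n\nTotal: $10"
```

The hyphen rule also joins genuinely hyphenated words that happen to break at a line end, so it may need to be dropped for documents where that matters.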

Image Quality Tips

Optimal Input Images

  • Resolution: At least 300 DPI for scanned documents
  • Format: PNG or TIFF for best quality
  • Contrast: High contrast between text and background
  • Orientation: Properly oriented (not rotated)

Problematic Images

  • Very low resolution (< 150 DPI)
  • Blurry or out-of-focus images
  • Heavily skewed or rotated text
  • Poor lighting or shadows
  • Complex backgrounds

Common Flow Patterns

Basic OCR Pipeline

Image Input → OCR → Text Cleaning → Output

Document Processing Workflow

PDF Input → Page Extraction → OCR → Entity Extraction → Data Validation

Multi-page Document Processing

PDF Input → Split Pages → Array Loop → OCR → Combine Results
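The final "Combine Results" step could look like the sketch below, assuming each page produced a text-only OCR message. The function name `combinePageResults`, the `pages` field, and the choice of the lowest page confidence as the overall score are assumptions for illustration.

```javascript
// Hypothetical helper: merges per-page OCR messages into one message.
// Pages are assumed to arrive in document order.
function combinePageResults(pageMessages) {
    const pages = pageMessages.map((msg, index) => ({
        page: index + 1,
        text: msg.payload.text,
        confidence: msg.payload.confidence
    }));
    return {
        payload: {
            text: pages.map((p) => p.text).join("\n\n"),
            pages,
            // Conservative overall score: the weakest page decides.
            confidence: pages.length === 0 ? 0 : Math.min(...pages.map((p) => p.confidence))
        }
    };
}

const combined = combinePageResults([
    { payload: { text: "Page one text", confidence: 0.9 } },
    { payload: { text: "Page two text", confidence: 0.8 } }
]);
console.log(combined.payload.confidence); // 0.8
```

Using the minimum rather than the average makes a single badly scanned page visible to a downstream confidence filter.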

OCR with Quality Check

Image → Image Quality Check → OCR → Confidence Filter → Text Output
        Image Quality Check (low quality) → Image Enhancement → OCR
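The branching in this pattern can be expressed as a small routing decision. The sketch below is an assumption about how such a decision might be coded: `nextStep`, the `ocrAttempts` counter, and the default limits are all hypothetical and not part of the OCR block.

```javascript
// Hypothetical router: decides where an OCR result goes next.
// Returns "output", "enhance" (retry after image enhancement), or "reject".
function nextStep(msg, minConfidence = 0.8, maxRetries = 1) {
    const attempts = msg.ocrAttempts || 0; // hypothetical retry counter
    const { text = "", confidence = 0 } = msg.payload || {};
    const hasText = text.trim().length > 0;

    if (hasText && confidence >= minConfidence) return "output";
    // Cap the retries so a hopeless image cannot loop forever.
    return attempts < maxRetries ? "enhance" : "reject";
}

console.log(nextStep({ payload: { text: "Hi", confidence: 0.9 } }));                 // "output"
console.log(nextStep({ payload: { text: "Hi", confidence: 0.5 } }));                 // "enhance"
console.log(nextStep({ payload: { text: "Hi", confidence: 0.5 }, ocrAttempts: 1 })); // "reject"
```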

Error Handling

Common issues and solutions:

No Text Detected

  • Check image quality and resolution
  • Verify the image contains readable text
  • Try image preprocessing options

Low Confidence Scores

  • Improve image quality
  • Check language settings
  • Consider image enhancement preprocessing

Incorrect Text Recognition

  • Verify correct language setting
  • Check for image skew or rotation
  • Consider manual image correction

Performance Issues

  • Reduce image size while maintaining quality
  • Process pages individually for multi-page documents
  • Use appropriate image formats

Integration Examples

With Entity Extractor

// OCR output feeds into Entity Extractor
{
    payload: {
        text: "Invoice from ABC Corp dated January 15, 2024 for $1,500.00",
        confidence: 0.95
    }
}

With Template Matcher

// OCR provides text for template matching
{
    payload: {
        text: "Complete document text...",
        structure: "invoice" // detected document type
    }
}

With Document Classifier

// OCR text used for document classification
{
    payload: {
        text: "Extracted text content",
        metadata: {
            pages: 1,
            words: 245,
            confidence: 0.89
        }
    }
}
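A message of this shape could be derived from a text-only OCR result along the following lines. The helper `withMetadata` is hypothetical, and counting words by splitting on whitespace is a simplification.

```javascript
// Hypothetical helper: adds simple metadata to an OCR payload.
// Word count is a whitespace split, which is only an approximation.
function withMetadata(payload, pages = 1) {
    const words = payload.text.trim().split(/\s+/).filter(Boolean).length;
    return {
        payload: {
            text: payload.text,
            metadata: { pages, words, confidence: payload.confidence }
        }
    };
}

const classified = withMetadata({
    text: "Invoice from ABC Corp dated January 15, 2024 for $1,500.00",
    confidence: 0.95
});
console.log(classified.payload.metadata);
// { pages: 1, words: 10, confidence: 0.95 }
```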

Performance Considerations

  • Image Size: Larger images take longer to process
  • Resolution: Higher resolution improves accuracy but increases processing time
  • Batch Processing: Process multiple images one after another rather than all at once to keep resource usage predictable
  • Memory Usage: Large images consume more memory during processing
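Sequential processing can be sketched as a plain loop that waits for each image before starting the next. The OCR call is passed in as a function because this page does not document a programmatic API; `processSequentially` and the stub `fakeOcr` are hypothetical.

```javascript
// Hypothetical helper: runs an OCR function over images one at a time,
// so only one image is being processed at any moment.
async function processSequentially(images, runOcr) {
    const results = [];
    for (const image of images) {
        try {
            results.push({ ok: true, result: await runOcr(image) });
        } catch (err) {
            // Record the failure and continue with the remaining images.
            results.push({ ok: false, error: err.message });
        }
    }
    return results;
}

// Stub standing in for a real OCR call; rejects empty input.
async function fakeOcr(image) {
    if (!image) throw new Error("empty image");
    return { text: `text of ${image}`, confidence: 0.9 };
}

const batchDone = processSequentially(["a", "", "c"], fakeOcr);
batchDone.then((results) => console.log(results.map((r) => r.ok))); // [ true, false, true ]
```

Catching errors per image keeps one unreadable file from aborting the whole batch.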

Tips

  • Test with sample images to determine optimal settings
  • Use debug blocks to inspect OCR output structure
  • Consider combining with image preprocessing blocks for better results
  • Monitor confidence scores to ensure quality
  • Use structured output when you need positioning information

Enhance OCR results by combining with Image Processor for preprocessing and Entity Extractor for data extraction.