OCR
Extract text from images and documents using Optical Character Recognition
OCR Block
The OCR (Optical Character Recognition) block extracts text from images and documents. It's one of the most commonly used blocks for document processing workflows, converting visual text into machine-readable format.
Overview
The OCR block uses computer vision models to recognize and extract text from many kinds of images, including scanned documents, photographs, and digital images. It supports multiple languages and can handle different document layouts and text orientations.
Configuration Options
Input Source
- Image Source: Select from the message payload, a file upload, or a specific message property
- Supported Formats: PNG, JPG, JPEG, TIFF, PDF (first page), BMP
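The supported formats map onto the optional `filename` and `mimetype` fields described under Input Message Format. As a minimal sketch, assuming you build the message yourself upstream of the OCR block, the hypothetical helper `buildOcrInput` infers the MIME type from the file extension, rejects formats outside the supported list, and base64-encodes the image. The helper name, the `.tif` alias, and the choice of base64 over a raw buffer are illustrative assumptions, not documented behavior.

```javascript
// Hypothetical helper: builds an OCR input message from raw image bytes.
// Assumes the block accepts { payload, filename, mimetype } as documented.
const SUPPORTED_TYPES = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  tif: "image/tiff",  // assumption: .tif treated the same as .tiff
  tiff: "image/tiff",
  pdf: "application/pdf", // only the first page is recognized
  bmp: "image/bmp",
};

function buildOcrInput(bytes, filename) {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  const mimetype = SUPPORTED_TYPES[ext];
  if (!mimetype) {
    throw new Error(`Unsupported format for OCR: "${filename}"`);
  }
  return {
    payload: Buffer.from(bytes).toString("base64"), // base64 string variant
    filename,
    mimetype,
  };
}

// Usage: the 4-byte PNG signature stands in for real image data.
const msg = buildOcrInput(Buffer.from([0x89, 0x50, 0x4e, 0x47]), "scan.PNG");
console.log(msg.mimetype); // → image/png
```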
OCR Settings
- Language: Select the target language for text recognition
  - English (default)
- OCR Engine: Choose the recognition engine
- Confidence Threshold: Minimum confidence level for text recognition (0.0 - 1.0)
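The confidence threshold is easiest to reason about on structured output, where every word carries its own score. The sketch below assumes the threshold acts as a simple cut-off that drops any word scoring below it; whether the block filters words, lines, or whole results this way is an assumption, so the hypothetical `applyConfidenceThreshold` is a downstream equivalent rather than the block's implementation.

```javascript
// Hypothetical downstream equivalent of the Confidence Threshold setting.
// Words are shaped like the Structured Output example: { text, confidence, bbox }.
function applyConfidenceThreshold(words, threshold) {
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("threshold must be between 0.0 and 1.0");
  }
  // Keep only words recognized at or above the threshold.
  return words.filter((word) => word.confidence >= threshold);
}

const words = [
  { text: "Hello", confidence: 0.98, bbox: [10, 20, 45, 35] },
  { text: "World", confidence: 0.95, bbox: [50, 20, 85, 35] },
];

console.log(applyConfidenceThreshold(words, 0.96).map((w) => w.text)); // → [ 'Hello' ]
```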
Output Options
- Text Only: Extract plain text without positioning
- Structured Output: Include bounding boxes and confidence scores
- Word Level: Extract individual words with coordinates
- Line Level: Extract text lines with positioning
Processing Options
- Image Preprocessing: Enhance image quality before OCR
- Deskew: Automatically correct image rotation
- Noise Reduction: Remove image noise for better recognition
- Resolution Enhancement: Upscale low-resolution images
Input Message Format
The OCR block expects an image in the message payload:
```javascript
{
  payload: imageData,       // placeholder: image Buffer or base64-encoded string
  filename: "document.png", // optional
  mimetype: "image/png"     // optional
}
```
Output Message Format
Text Only Output
```javascript
{
  payload: {
    text: "Extracted text content from the image...",
    confidence: 0.92,
    language: "en"
  }
}
```
Structured Output
```javascript
{
  payload: {
    text: "Complete extracted text",
    words: [
      {
        text: "Hello",
        confidence: 0.98,
        bbox: [10, 20, 45, 35]
      },
      {
        text: "World",
        confidence: 0.95,
        bbox: [50, 20, 85, 35]
      }
    ],
    lines: [
      {
        text: "Hello World",
        confidence: 0.96,
        bbox: [10, 20, 85, 35]
      }
    ],
    confidence: 0.93
  }
}
```
Bounding Box Format
Bounding boxes are provided as [x1, y1, x2, y2] coordinates:
- x1, y1: Top-left corner
- x2, y2: Bottom-right corner
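Because each box stores two opposite corners, its width is x2 - x1 and its height is y2 - y1, and one box lies inside another when both of its corners fall within the other's. A minimal sketch using the values from the Structured Output example; the helper names are hypothetical, and it assumes x1 <= x2 and y1 <= y2, which follows from the top-left/bottom-right corner definitions.

```javascript
// Hypothetical helpers for [x1, y1, x2, y2] bounding boxes.
// Assumes (x1, y1) is the top-left corner and (x2, y2) the bottom-right.
function bboxSize([x1, y1, x2, y2]) {
  return { width: x2 - x1, height: y2 - y1 };
}

function bboxContains(outer, inner) {
  return (
    inner[0] >= outer[0] &&
    inner[1] >= outer[1] &&
    inner[2] <= outer[2] &&
    inner[3] <= outer[3]
  );
}

// Values taken from the Structured Output example.
const line = { text: "Hello World", bbox: [10, 20, 85, 35] };
const words = [
  { text: "Hello", bbox: [10, 20, 45, 35] },
  { text: "World", bbox: [50, 20, 85, 35] },
];

console.log(bboxSize(words[0].bbox)); // → { width: 35, height: 15 }
console.log(words.every((w) => bboxContains(line.bbox, w.bbox))); // → true
```

Containment like this is one way to associate word-level results with the line they belong to.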
Common Use Cases
Document Digitization
Convert scanned documents to editable text:
```text
File Upload → OCR → Text Processor → Save Document
```
Invoice Processing
Extract text from invoice images:
```text
http in → OCR → Entity Extractor → Template Matcher → http response
```
Form Processing
Extract data from filled forms:
```text
Image Input → OCR → Field Extractor → Validation → Database Save
```
Multi-language Documents
Process documents in different languages:
```text
Document → Language Detection → OCR (with language setting) → Translation
```
Best Practices
- Image Quality: Ensure high-quality input images for better accuracy
- Preprocessing: Use image enhancement for poor-quality scans
- Language Setting: Set the correct language for better recognition
- Confidence Filtering: Filter out low-confidence results
- Post-processing: Clean extracted text for better downstream processing
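For the post-processing step, a few generic clean-ups cover many OCR artifacts: rejoining words hyphenated across line breaks, collapsing runs of spaces and tabs, and limiting blank lines. `cleanOcrText` is a hypothetical sketch of such a Text Cleaning step, not a built-in block; the right rules depend on your documents, and the hyphen rule will also join genuine compounds that happen to break at a line end.

```javascript
// Hypothetical post-processing step for OCR text.
function cleanOcrText(text) {
  return text
    .replace(/\r\n?/g, "\n")         // normalize line endings
    .replace(/(\w)-\n(\w)/g, "$1$2") // rejoin words hyphenated across lines
    .replace(/[ \t]+/g, " ")         // collapse runs of spaces and tabs
    .replace(/ *\n */g, "\n")        // trim spaces around line breaks
    .replace(/\n{3,}/g, "\n\n")      // keep at most one blank line
    .trim();
}

console.log(JSON.stringify(cleanOcrText("Invoice  from ABC Cor-\nporation\n\n\n\nTotal:\t$1,500.00  ")));
// → "Invoice from ABC Corporation\n\nTotal: $1,500.00"
```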
Image Quality Tips
Optimal Input Images
- Resolution: At least 300 DPI for scanned documents
- Format: PNG or TIFF for best quality
- Contrast: High contrast between text and background
- Orientation: Properly oriented (not rotated)
Problematic Images
- Very low resolution (< 150 DPI)
- Blurry or out-of-focus images
- Heavily skewed or rotated text
- Poor lighting or shadows
- Complex backgrounds
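The resolution guidance can be checked before OCR if you know the physical page size: effective DPI is the image width in pixels divided by the page width in inches, so a US Letter page (8.5 in wide) scanned at 300 DPI is 2550 pixels wide. The sketch applies the 300 DPI recommendation and the 150 DPI lower bound from the lists above; the function name and the three labels are hypothetical, and it assumes the page width is known, which image files do not always record reliably.

```javascript
// Hypothetical pre-OCR resolution check.
// Effective DPI = image width in pixels / physical page width in inches.
const RECOMMENDED_DPI = 300; // "At least 300 DPI for scanned documents"
const MINIMUM_DPI = 150;     // below this, images are considered problematic

function classifyResolution(pixelWidth, pageWidthInches) {
  const dpi = pixelWidth / pageWidthInches;
  if (dpi >= RECOMMENDED_DPI) return { dpi, quality: "optimal" };
  if (dpi >= MINIMUM_DPI) return { dpi, quality: "below recommended" };
  return { dpi, quality: "too low" };
}

console.log(classifyResolution(2550, 8.5)); // → { dpi: 300, quality: 'optimal' }
console.log(classifyResolution(1275, 8.5)); // → { dpi: 150, quality: 'below recommended' }
console.log(classifyResolution(850, 8.5));  // → { dpi: 100, quality: 'too low' }
```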
Common Flow Patterns
Basic OCR Pipeline
```text
Image Input → OCR → Text Cleaning → Output
```
Document Processing Workflow
```text
PDF Input → Page Extraction → OCR → Entity Extraction → Data Validation
```
Multi-page Document Processing
```text
PDF Input → Split Pages → Array Loop → OCR → Combine Results
```
OCR with Quality Check
```text
Image → Image Quality Check → OCR → Confidence Filter → Text Output
        ↓
        Low Quality → Image Enhancement → OCR
```
Error Handling
Common issues and solutions:
No Text Detected
- Check image quality and resolution
- Verify the image contains readable text
- Try image preprocessing options
Low Confidence Scores
- Improve image quality
- Check language settings
- Consider image enhancement preprocessing
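These checks can be automated downstream of the OCR block, in the spirit of the OCR with Quality Check pattern: a result with no text, or with an overall confidence below a chosen cut-off, is sent back through image enhancement, and everything else continues. `routeOcrResult`, the 0.8 default, the retry limit, and the `"review"` outcome (for manual correction once retries run out) are hypothetical choices; only the `text` and `confidence` fields come from the Text Only Output format.

```javascript
// Hypothetical router for OCR results in the Text Only Output shape.
// Returns "output" to continue, "enhance" to retry after image enhancement,
// or "review" when the retry budget is used up.
function routeOcrResult(payload, { minConfidence = 0.8, attempt = 1, maxAttempts = 2 } = {}) {
  const noText = !payload.text || payload.text.trim() === "";
  const lowConfidence = payload.confidence < minConfidence;

  if (!noText && !lowConfidence) return "output";
  // Retry through Image Enhancement only a limited number of times.
  return attempt < maxAttempts ? "enhance" : "review";
}

console.log(routeOcrResult({ text: "Invoice #123", confidence: 0.92 })); // → output
console.log(routeOcrResult({ text: "", confidence: 0.0 }));              // → enhance
console.log(routeOcrResult({ text: "Inv0ice", confidence: 0.41 }, { attempt: 2 })); // → review
```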
Incorrect Text Recognition
- Verify correct language setting
- Check for image skew or rotation
- Consider manual image correction
Performance Issues
- Reduce image size while maintaining quality
- Process pages individually for multi-page documents
- Use appropriate image formats
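When a multi-page document is processed one page at a time, as in the Multi-page Document Processing pattern, the per-page results must be merged again. A minimal sketch, assuming each page produced a Text Only Output payload: texts are joined in page order, and the overall confidence is weighted by each page's word count so a nearly empty page does not skew it. The result mirrors the `text` plus `metadata` shape from the Document Classifier integration example; `combinePages` and the weighting scheme are assumptions.

```javascript
// Hypothetical merge step for per-page OCR results.
// Each page is assumed to look like { text, confidence }.
function combinePages(pages) {
  let weightedConfidence = 0;
  let totalWords = 0;

  for (const page of pages) {
    const wordCount = page.text.split(/\s+/).filter(Boolean).length;
    weightedConfidence += page.confidence * wordCount;
    totalWords += wordCount;
  }

  return {
    text: pages.map((page) => page.text).join("\n\n"), // blank line between pages
    metadata: {
      pages: pages.length,
      words: totalWords,
      // Word-weighted mean; 0 when no words were recognized at all.
      confidence: totalWords > 0 ? weightedConfidence / totalWords : 0,
    },
  };
}

const combined = combinePages([
  { text: "Invoice from ABC Corp", confidence: 0.9 },
  { text: "Total due", confidence: 0.8 },
]);
console.log(combined.metadata);
```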
Integration Examples
With Entity Extractor
```javascript
// OCR output feeds into Entity Extractor
{
  payload: {
    text: "Invoice from ABC Corp dated January 15, 2024 for $1,500.00",
    confidence: 0.95
  }
}
```
With Template Matcher
```javascript
// OCR provides text for template matching
{
  payload: {
    text: "Complete document text...",
    structure: "invoice" // detected document type
  }
}
```
With Document Classifier
```javascript
// OCR text used for document classification
{
  payload: {
    text: "Extracted text content",
    metadata: {
      pages: 1,
      words: 245,
      confidence: 0.89
    }
  }
}
```
Performance Considerations
- Image Size: Larger images take longer to process
- Resolution: Higher resolution improves accuracy but increases processing time
- Batch Processing: Process multiple images sequentially for better resource utilization
- Memory Usage: Large images consume more memory during processing
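Since processing time and memory grow with image size, one way to balance them against accuracy is to downscale high-resolution scans to the 300 DPI recommended earlier instead of sending, say, a 600 DPI scan as-is. The arithmetic is a single scale factor, target DPI divided by source DPI, capped at 1 so images are never upscaled. `downscaleToDpi` is a hypothetical sketch that only computes the target size; the resize itself would happen in an image-processing step.

```javascript
// Hypothetical helper: target pixel size when downscaling a scan to a DPI
// that is still sufficient for OCR. Never upscales (scale is capped at 1).
function downscaleToDpi(widthPx, heightPx, sourceDpi, targetDpi = 300) {
  const scale = Math.min(1, targetDpi / sourceDpi);
  return {
    width: Math.round(widthPx * scale),
    height: Math.round(heightPx * scale),
    scale,
  };
}

// A US Letter page (8.5 x 11 in) scanned at 600 DPI is 5100 x 6600 pixels.
console.log(downscaleToDpi(5100, 6600, 600)); // → { width: 2550, height: 3300, scale: 0.5 }
// A 200 DPI scan is left at its original size.
console.log(downscaleToDpi(1700, 2200, 200)); // → { width: 1700, height: 2200, scale: 1 }
```

Halving both dimensions leaves a quarter of the original pixel count.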
Tips
- Test with sample images to determine optimal settings
- Use debug blocks to inspect OCR output structure
- Consider combining with image preprocessing blocks for better results
- Monitor confidence scores to ensure quality
- Use structured output when you need positioning information
Enhance OCR results by combining with Image Processor for preprocessing and Entity Extractor for data extraction.
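To see what an Entity Extractor stage receives, the sketch below pulls a company name, date, and amount out of the OCR text from the Integration Examples. It is a regex stand-in written for that one sentence pattern, not the Entity Extractor block's method; `extractInvoiceFields` and its field names are hypothetical.

```javascript
// Hypothetical regex stand-in for an entity-extraction step.
// Written for text like: "Invoice from ABC Corp dated January 15, 2024 for $1,500.00"
function extractInvoiceFields(text) {
  const vendor = text.match(/from\s+(.+?)\s+dated/i);
  const date = text.match(/dated\s+([A-Z][a-z]+ \d{1,2}, \d{4})/);
  const amount = text.match(/\$([\d,]+(?:\.\d{2})?)/);

  return {
    vendor: vendor ? vendor[1] : null,
    date: date ? date[1] : null,
    // Strip thousands separators before converting to a number.
    amount: amount ? Number(amount[1].replace(/,/g, "")) : null,
  };
}

const ocrPayload = {
  text: "Invoice from ABC Corp dated January 15, 2024 for $1,500.00",
  confidence: 0.95,
};
console.log(extractInvoiceFields(ocrPayload.text));
// → { vendor: 'ABC Corp', date: 'January 15, 2024', amount: 1500 }
```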