Document Understander

Extracts specific fields and their values from document images. It identifies and labels entities like invoice numbers, amounts, dates, and other structured fields based on the trained model.

Quick Start

To get started:

Choose a trained model from the Model to use dropdown
Select Output Type: Collated Fields (structured) or Uncollated Fields (raw labels)
Send document image via msg.payload.image_path
Optionally provide OCR data via msg.payload.words_and_bboxes
Receive extracted fields in msg.payload

Configuration

Model to use (required)

Select a pre-trained model from the dropdown menu. Models must be trained beforehand using the sequence labeler trainer block.

Output Type (required)

Choose how extracted fields should be formatted:

Collated Fields: Returns structured field-value pairs with coordinates and confidence scores
Uncollated Fields: Returns raw word-level labels with individual bounding boxes

Common Input Format (All Algorithms)

msg.payload.image_path (string)

Relative path of the document image file on shared storage.

Example: "documents/invoice.png"

Supported formats: .png, .jpg, .jpeg (case insensitive)

msg.payload.words_and_bboxes (array, optional)

Optional OCR data containing words and bounding boxes. Can improve extraction accuracy.

Format: [[[x1, y1, x2, y2], "word"], ...]

Example: [[[100, 100, 300, 150], "Invoice"], [[100, 200, 200, 250], "Total"]]

Output by Output Type

Collated Fields

msg.payload contains an output array of field objects:

{
  "output": [
    {
      "field_name": "invoice_number",
      "co_ordinate": [470, 100, 180, 50],
      "field_value": "INV-001",
      "confidence_score": null
    },
    {
      "field_name": "total_amount",
      "co_ordinate": [220, 200, 130, 50],
      "field_value": "$500.00",
      "confidence_score": null
    }
  ]
}

Note: co_ordinate is in [x, y, width, height] format. confidence_score is currently not computed.

Uncollated Fields

msg.payload contains word-level data:

{
  "words": ["Invoice", "Number:", "INV-001", "Total:", "$500.00"],
  "normalized_bboxes": [[...], [...], ...],
  "unormalized_bboxes": [
    [100, 100, 300, 150],
    [320, 100, 450, 150],
    [470, 100, 650, 150],
    [100, 200, 200, 250],
    [220, 200, 350, 250]
  ],
  "labels": ["other", "other", "invoice_number", "other", "total_amount"]
}

Example

Input (msg.payload)

{
  "image_path": "documents/invoice.png",
  "words_and_bboxes": [
    [[100, 100, 300, 150], "Invoice"],
    [[470, 100, 650, 150], "INV-001"]
  ]
}

Output (msg.payload)

{
  "output": [
    {
      "field_name": "invoice_number",
      "co_ordinate": [470, 100, 180, 50],
      "field_value": "INV-001",
      "confidence_score": null
    }
  ]
}

Errors

When the block fails, it raises an error. Use a Catch block in your flow to handle failures and inspect the error payload.

Common mistakes

Missing image path: msg.payload.image_path is required and must point to a file on shared storage.
Wrong OCR structure: If provided, msg.payload.words_and_bboxes must follow [[[x1, y1, x2, y2], "word"], ...].
Wrong output type: Use Collated Fields for structured field-value pairs, and Uncollated Fields for raw labels.

Best Practices

Use clear, well-scanned document images for better field extraction accuracy
Provide OCR data (words_and_bboxes) whenever available to improve extraction results
Use Collated Fields for most use cases as it provides structured, ready-to-use field-value pairs
Use Uncollated Fields when you need raw word-level labels or want to implement custom collation logic
Ensure your training data includes diverse examples of each field type you want to extract
Regularly retrain models as document layouts or field types evolve
Always validate extracted fields in production applications, especially for critical data

Document Understander

On this page