Table Structure Recognition

This block detects and extracts table structures from document images. It identifies table cells, rows, and columns, and outputs the table data as HTML or JSON.

Quick Start

To get started:

Select a trained model from the Model to use dropdown
Choose Output Type: HTML or JSON
Configure header settings (Has Header and Header Type)
Send the table image path in msg.payload.image_path
Provide the detected table boxes in msg.payload.tables
Provide the OCR words in msg.payload.ocr
Receive the extracted table structure in msg.payload

Configuration

Model to use (required)

Select a trained model from the dropdown menu. Models must be trained before they can be used.

Output Type (required)

Choose between HTML or JSON output format.

HTML: Returns table as HTML markup
JSON: Returns structured JSON with cell data and coordinates

Has Header (required)

Specify whether the table has header rows or columns. Options: True, False.

When Has Header is True and Output Type is JSON, you must provide msg.payload.header_detail (and msg.payload.keyname_detail for dual-header layouts).
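
The rule above can be checked before a message is sent. The following is a minimal Python sketch, not part of the block itself: the function name missing_header_fields and the dual_header flag are hypothetical, and it assumes the payload is available as a plain dictionary.

```python
def missing_header_fields(payload, output_type, has_header, dual_header=False):
    """Return the header-related payload keys that are required but absent.

    Hypothetical helper: it mirrors the documented rule that JSON output with
    Has Header = True needs header_detail (plus keyname_detail for
    dual-header layouts). It does not inspect the contents of those objects.
    """
    if output_type.lower() != "json" or not has_header:
        return []  # no header mapping is required in these cases
    required = ["header_detail"]
    if dual_header:
        required.append("keyname_detail")
    return [key for key in required if key not in payload]
```

An empty list means the header-related fields are all present; anything else names the keys still to be supplied.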

Header Type (required)

Specify header orientation. Options: horizontal_header, vertical_header.

Common Input Format

msg.payload.image_path (string)

Relative path of the table image on shared storage.

Example: "documents/table_image.png"

msg.payload.tables (array)

List of table bounding boxes in the image.

Format: [[x1, y1, x2, y2], ...]
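
A quick sanity check on the boxes can catch malformed input early. This is a rough Python sketch under one assumption that this page does not state: each box is [left, top, right, bottom], so x1 < x2 and y1 < y2. The helper name validate_table_boxes is hypothetical.

```python
def validate_table_boxes(tables):
    """Check that every entry is a four-number box with x1 < x2 and y1 < y2.

    Assumption (not confirmed by this page): boxes are given as
    [left, top, right, bottom]. Raises ValueError naming the first bad box.
    """
    for index, box in enumerate(tables):
        if len(box) != 4 or not all(isinstance(v, (int, float)) for v in box):
            raise ValueError(f"tables[{index}] must be four numbers: {box!r}")
        x1, y1, x2, y2 = box
        if x1 >= x2 or y1 >= y2:
            raise ValueError(f"tables[{index}] has zero or negative size: {box!r}")
    return tables
```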

msg.payload.ocr (array)

OCR text with bounding boxes.

Format: [[[x1, y1, x2, y2], "text"], ...]
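
When debugging poor results, it can help to see which OCR words actually fall inside a given table box. The Python sketch below is illustrative only: the helper words_in_box is hypothetical, and testing the centre point of each word is a choice made for this example, not a description of how the block matches words to tables.

```python
def words_in_box(ocr, table_box):
    """Return the OCR entries whose box centre lies inside table_box.

    ocr follows the documented shape [[[x1, y1, x2, y2], "text"], ...].
    The centre-point test is an illustrative choice, not the block's rule.
    """
    tx1, ty1, tx2, ty2 = table_box
    selected = []
    for (x1, y1, x2, y2), text in ocr:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        if tx1 <= cx <= tx2 and ty1 <= cy <= ty2:
            selected.append([[x1, y1, x2, y2], text])
    return selected
```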

msg.payload.header_detail (object, required for JSON with headers)

Header mapping used when output_type is json and has_header is True.

msg.payload.keyname_detail (object, required for some header layouts)

Key name mapping used for dual-header layouts.
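
The input fields above can be assembled as follows. This is a minimal Python sketch, assuming the payload is built outside the flow and serialized to JSON; build_payload is a hypothetical helper, and the header mappings are passed through untouched because their internal structure is not described here.

```python
import json

def build_payload(image_path, tables, ocr, header_detail=None, keyname_detail=None):
    """Assemble a msg.payload dictionary using the documented field names.

    Optional header mappings are included only when provided; their internal
    structure is not described on this page, so they are passed through as-is.
    """
    payload = {"image_path": image_path, "tables": tables, "ocr": ocr}
    if header_detail is not None:
        payload["header_detail"] = header_detail
    if keyname_detail is not None:
        payload["keyname_detail"] = keyname_detail
    return payload

payload = build_payload(
    "documents/table_image.png",
    [[10, 20, 620, 380]],
    [[[15, 25, 60, 45], "Header"], [[20, 60, 80, 90], "Value"]],
)
print(json.dumps(payload, indent=2))
```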

Common Output Format

msg.payload contains an output field with the extracted table data.

HTML Output:

{"output": "<table><tr><th>Header</th></tr><tr><td>Data</td></tr></table>"}
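
If the HTML output needs to be processed rather than displayed, the markup can be turned back into rows of cell text. The following Python sketch uses only the standard library html.parser module; the class and function names are hypothetical, and it deliberately ignores rowspan and colspan attributes.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")      # start a new cell in the current row
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # accumulate text inside the cell

def html_table_to_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows

rows = html_table_to_rows(
    "<table><tr><th>Header</th></tr><tr><td>Data</td></tr></table>"
)
print(rows)  # [['Header'], ['Data']]
```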

JSON Output:

{"output": {"Header A": ["Cell 1", "Cell 2"], "Header B": ["Cell 3"]}}
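
The JSON sample above maps each header to a list of cells. For row-oriented processing, that mapping can be pivoted as in the Python sketch below. It assumes that each list is ordered top to bottom and that cells at the same index belong to the same row, which this page does not confirm; columns_to_rows is a hypothetical helper.

```python
from itertools import zip_longest

def columns_to_rows(output):
    """Turn {"header": [cell, ...]} into a list of per-row dictionaries.

    Columns may have different lengths (as in the sample above), so short
    columns are padded with None. Assumes cells align by list index.
    """
    headers = list(output)
    return [
        dict(zip(headers, cells))
        for cells in zip_longest(*output.values(), fillvalue=None)
    ]

rows = columns_to_rows({"Header A": ["Cell 1", "Cell 2"], "Header B": ["Cell 3"]})
print(rows)
```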

Example

Input (msg.payload)

{
  "image_path": "documents/table_image.png",
  "tables": [[10, 20, 620, 380]],
  "ocr": [[[15, 25, 60, 45], "Header"], [[20, 60, 80, 90], "Value"]]
}

Output (msg.payload) - JSON

{
  "output": {
    "Header": ["Value"]
  }
}

Errors

When the block fails, it raises an error. Use a Catch block in your flow to handle failures and inspect the error payload.

Common mistakes

Model not found: The selected model doesn't exist. Ensure the model is available.
Invalid image path: The table image path is incorrect or the file doesn't exist.
No table detected: Image doesn't contain a recognizable table structure.
Invalid output type: Output type must be either "html" or "json".
Service unavailable: The service is unavailable or unreachable.

Best Practices

Use clear, well-scanned table images for better extraction accuracy
Ensure table borders are visible if possible
Use JSON output when you need cell coordinates for further processing
Use HTML output for direct display or simple data extraction
Set Has Header correctly to improve extraction accuracy
Use auto_detect for Header Type, where that option is available, when header orientation is uncertain
Test with sample tables to verify extraction quality
Preprocess images (deskew, enhance contrast) for better results
