Table Structure Recognition

Detects and extracts table structures from document images. The block identifies table cells, rows, and columns, and returns the table data in HTML or JSON format.

Quick Start

To get started:

  • Select a trained model from the Model to use dropdown
  • Choose Output Type: HTML or JSON
  • Configure header settings (Has Header and Header Type)
  • Send the table image path via msg.payload.image_path
  • Provide the detected table boxes via msg.payload.tables
  • Provide the OCR words via msg.payload.ocr
  • Receive the extracted table structure in msg.payload

Configuration

[Figure: Table Structure Recognition configuration with HTML output format]

[Figure: Table Structure Recognition configuration with JSON output format]

Model to use (required)

Select a trained model from the dropdown menu. The model must be trained before it can be used here.

Output Type (required)

Choose between HTML or JSON output format.

  • HTML: Returns table as HTML markup
  • JSON: Returns structured JSON with cell data and coordinates

Has Header (required)

Specify whether the table has header rows or columns. Options: True, False.

When Has Header is True and Output Type is JSON, you must provide msg.payload.header_detail (and msg.payload.keyname_detail for dual-header layouts).
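The rule above can be checked before a message is sent. The following is a minimal sketch in Python, not part of the block itself: the function name missing_header_inputs and the dual_header flag are hypothetical, and the contents of header_detail and keyname_detail are not inspected because their structure is not described here.

```python
def missing_header_inputs(output_type, has_header, payload, dual_header=False):
    """Return the names of header fields the payload still needs.

    dual_header is a hypothetical flag the caller sets for dual-header layouts.
    """
    if output_type.lower() != "json" or not has_header:
        return []  # header mappings are only required for JSON output with headers
    required = ["header_detail"]
    if dual_header:
        required.append("keyname_detail")  # only needed for dual-header layouts
    return [name for name in required if name not in payload]


payload = {"image_path": "documents/table_image.png", "tables": [], "ocr": []}
print(missing_header_inputs("json", True, payload))  # ['header_detail']
print(missing_header_inputs("html", True, payload))  # []
```

An empty list means the payload carries every header field the chosen configuration requires.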

Header Type (required)

Specify header orientation. Options: horizontal_header, vertical_header.

Common Input Format

msg.payload.image_path (string)

Relative path of the table image on shared storage.

Example: "documents/table_image.png"

msg.payload.tables (array)

List of table bounding boxes in the image.

Format: [[x1, y1, x2, y2], ...]

msg.payload.ocr (array)

OCR text with bounding boxes.

Format: [[[x1, y1, x2, y2], "text"], ...]

msg.payload.header_detail (object, required for JSON with headers)

Header mapping used when output_type is json and has_header is True.

msg.payload.keyname_detail (object, required for some header layouts)

Key name mapping used for dual-header layouts.
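The shapes of image_path, tables, and ocr described above can be checked on the sending side before the block is invoked. This is a minimal sketch in Python under the assumption that coordinates are plain numbers; the helper names is_box and validate_inputs are hypothetical, and the check says nothing about what the block itself accepts or rejects.

```python
def is_box(box):
    """True if box looks like [x1, y1, x2, y2] with numeric values."""
    return (
        isinstance(box, (list, tuple))
        and len(box) == 4
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in box)
    )


def validate_inputs(payload):
    """Return a list of problems; an empty list means the shapes look right."""
    problems = []
    if not isinstance(payload.get("image_path"), str):
        problems.append("image_path must be a string")
    for i, box in enumerate(payload.get("tables", [])):
        if not is_box(box):
            problems.append(f"tables[{i}] is not [x1, y1, x2, y2]")
    for i, word in enumerate(payload.get("ocr", [])):
        ok = (
            isinstance(word, (list, tuple))
            and len(word) == 2
            and is_box(word[0])
            and isinstance(word[1], str)
        )
        if not ok:
            problems.append(f'ocr[{i}] is not [[x1, y1, x2, y2], "text"]')
    return problems


payload = {
    "image_path": "documents/table_image.png",
    "tables": [[10, 20, 620, 380]],
    "ocr": [[[15, 25, 60, 45], "Header"], [[20, 60, 80, 90], "Value"]],
}
bad = {"image_path": "x.png", "tables": [[10, 20, 620]], "ocr": [["Header"]]}
print(validate_inputs(payload))  # []
print(validate_inputs(bad))
```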

Common Output Format

msg.payload contains an output field with the extracted table data.

HTML Output:

{"output": "<table><tr><th>Header</th></tr><tr><td>Data</td></tr></table>"}

JSON Output:

{"output": {"Header A": ["Cell 1", "Cell 2"], "Header B": ["Cell 3"]}}
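When the HTML output needs to be turned back into rows downstream, the Python standard library is enough for simple tables. The sketch below assumes the markup resembles the HTML example above (plain tr, th, and td elements); the class name TableRows is hypothetical, and rowspan or colspan attributes are ignored.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")      # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # append text to the current cell


result = {"output": "<table><tr><th>Header</th></tr><tr><td>Data</td></tr></table>"}
parser = TableRows()
parser.feed(result["output"])
parser.close()
print(parser.rows)  # [['Header'], ['Data']]
```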

Example

Input (msg.payload)

{
  "image_path": "documents/table_image.png",
  "tables": [[10, 20, 620, 380]],
  "ocr": [[[15, 25, 60, 45], "Header"], [[20, 60, 80, 90], "Value"]]
}

Output (msg.payload) - JSON

{
  "output": {
    "Header": ["Value"]
  }
}
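For further processing, the JSON output can be reshaped into rows. This is a minimal sketch in Python that assumes the output maps each header to a list of cell values in order, as in the examples in this document; the function name columns_to_rows is hypothetical, and shorter columns are padded with an empty string.

```python
from itertools import zip_longest


def columns_to_rows(output, fill=""):
    """Turn {"header": [cells...]} into (headers, rows), padding short columns."""
    headers = list(output)
    rows = [list(r) for r in zip_longest(*output.values(), fillvalue=fill)]
    return headers, rows


result = {"output": {"Header A": ["Cell 1", "Cell 2"], "Header B": ["Cell 3"]}}
headers, rows = columns_to_rows(result["output"])
print(headers)  # ['Header A', 'Header B']
print(rows)     # [['Cell 1', 'Cell 3'], ['Cell 2', '']]
```

Whether a list holds the cells of a column or of a row may depend on the Header Type setting, so verify the orientation against a known table first.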

Errors

When the block fails, it raises an error. Use a Catch block in your flow to handle failures and inspect the error payload.

Common errors

  • Model not found: The selected model doesn't exist. Ensure the model is available.
  • Invalid image path: The table image path is incorrect or the file doesn't exist.
  • No table detected: The image doesn't contain a recognizable table structure.
  • Invalid output type: The output type must be either "html" or "json".
  • Service unavailable: The service is down or unreachable.

Best Practices

  • Use clear, well-scanned table images for better extraction accuracy
  • Ensure table borders are visible if possible
  • Use JSON output when you need cell coordinates for further processing
  • Use HTML output for direct display or simple data extraction
  • Set Has Header correctly to improve extraction accuracy
  • Use auto_detect for Header Type, where that option is available, when the header orientation is uncertain
  • Test with sample tables to verify extraction quality
  • Preprocess images (deskew, enhance contrast) for better results