Recognize and extract table structures from documents using advanced computer vision techniques.

Table Structure Recognition Block

The Table Structure Recognition block is designed to identify, analyze, and extract table structures from documents and images. It uses advanced computer vision and machine learning techniques to detect table boundaries, rows, columns, and cell relationships.

Overview

The Table Structure Recognition block covers three stages of table analysis: table detection, structure recognition, and data extraction. It is suited to processing tabular data in documents, forms, and images.

Configuration Options

Recognition Mode

Choose the type of table processing:

Table Detection: Locate table regions in documents
Structure Recognition: Identify rows, columns, and cell boundaries
Data Extraction: Extract text content from table cells
Complete Analysis: Full table structure and content analysis

Input Processing

Document Type: Specify the input document type (PDF, image, etc.)
Page Range: Select specific pages for processing
Table Region: Define specific areas to analyze (optional)

Output Format

Choose the output format for extracted table data:

JSON: Structured JSON with table metadata
CSV: Comma-separated values format
HTML: HTML table format
Excel: Excel-compatible format
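
To illustrate how the CSV and HTML formats relate to the JSON output, here is a minimal Python sketch using only the standard library. It assumes the table rows are available as a list of lists, like the `data` field shown under Output Structure. The helper names `rows_to_csv` and `rows_to_html` are hypothetical and are not part of the block; the block's own serialization may differ in quoting and markup.

```python
import csv
import html
import io


def rows_to_csv(rows):
    """Serialize a list of row lists to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_html(rows, header=True):
    """Render rows as a minimal HTML table (hypothetical helper).

    When header is True, the first row is emitted as <th> cells.
    """
    lines = ["<table>"]
    for index, row in enumerate(rows):
        tag = "th" if header and index == 0 else "td"
        cells = "".join(
            f"<{tag}>{html.escape(str(cell))}</{tag}>" for cell in row
        )
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Illustrative rows only; not real output from the block.
rows = [
    ["Header 1", "Header 2"],
    ["Row 1, Col 1", "Row 1 Col 2"],
]
csv_text = rows_to_csv(rows)
html_text = rows_to_html(rows)
```

Note that the CSV writer quotes any cell containing a comma, which matters for table cells holding addresses or formatted numbers.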

Use Cases

Document Processing

Extract tables from business documents:

PDF input → Table Structure Recognition → CSV output → Data processing

Form Analysis

Process structured forms and applications:

Form image → Table Structure Recognition → JSON structure → Validation

Data Migration

Convert tabular data between formats:

Legacy document → Table Structure Recognition → Modern format → Database import

Financial Document Processing

Extract financial data from reports:

Financial report → Table Structure Recognition → Structured data → Analysis

Common Patterns

Basic Table Extraction

// Configuration
// Recognition Mode: Complete Analysis
// Output Format: JSON

// Example flow:
// document input → Table Structure Recognition → debug (table structure)

Multi-format Output

// Configuration
// Recognition Mode: Data Extraction
// Output Format: CSV

// Example flow:
// image input → Table Structure Recognition → CSV output → file storage

Batch Processing

// Configuration
// Recognition Mode: Table Detection
// Page Range: All pages

// Example flow:
// document batch → Table Structure Recognition → individual table files

Advanced Features

Table Quality Assessment

The block can report quality metrics alongside the recognized table structure:

// Output includes quality metrics
{
  "table_quality": {
    "structure_confidence": 0.95,
    "cell_detection_accuracy": 0.92,
    "text_extraction_quality": 0.88
  }
}
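
One way to act on these metrics downstream is to flag any score that falls below a chosen threshold. The sketch below assumes the `table_quality` object has the shape shown above; the function name `metrics_below` and the threshold of 0.90 are illustrative choices, not values defined by the block.

```python
def metrics_below(table_quality, threshold):
    """Return the names of quality metrics scoring under the threshold."""
    return sorted(
        name for name, score in table_quality.items() if score < threshold
    )


# Same values as the example output above.
output = {
    "table_quality": {
        "structure_confidence": 0.95,
        "cell_detection_accuracy": 0.92,
        "text_extraction_quality": 0.88,
    }
}

# The 0.90 cutoff is an assumption; tune it for your documents.
flagged = metrics_below(output["table_quality"], threshold=0.90)
```

With these values, only `text_extraction_quality` is flagged, which could route the table to manual review or a second OCR pass.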

Multi-table Handling

Process documents with multiple tables:

// Configuration
// Recognition Mode: Complete Analysis
// Multi-table: Enabled

// Output includes array of tables
{
  "tables": [
    {
      "table_id": 1,
      "structure": {...},
      "data": [...]
    },
    {
      "table_id": 2,
      "structure": {...},
      "data": [...]
    }
  ]
}
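
A consumer of this output typically needs to look tables up by `table_id` and handle each one separately. The following is a minimal sketch under the assumption that the output has the `tables` array shown above and that each `data` field is a list of row lists. The helper names and the sample cell values are hypothetical.

```python
def index_tables(output):
    """Map each table_id to its table object, rejecting duplicate ids."""
    indexed = {}
    for table in output.get("tables", []):
        table_id = table["table_id"]
        if table_id in indexed:
            raise ValueError(f"duplicate table_id: {table_id!r}")
        indexed[table_id] = table
    return indexed


def table_shapes(output):
    """Return {table_id: (row_count, widest_row)} computed from each data field."""
    shapes = {}
    for table_id, table in index_tables(output).items():
        rows = table["data"]
        widest = max((len(row) for row in rows), default=0)
        shapes[table_id] = (len(rows), widest)
    return shapes


# Illustrative data only; the structure field is omitted here.
output = {
    "tables": [
        {"table_id": 1, "data": [["Item", "Qty"], ["Widget", "3"]]},
        {"table_id": 2, "data": [["Region", "Q1", "Q2"]]},
    ]
}
shapes = table_shapes(output)
```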

Custom Table Templates

Define custom table structures for specific document types:

// Configuration
// Template Mode: Custom
// Template: Financial Report Template

// Optimized recognition for specific table types

Output Structure

Table Structure JSON

{
  "table_id": "table_001",
  "page_number": 1,
  "bounding_box": {
    "x": 100,
    "y": 200,
    "width": 800,
    "height": 400
  },
  "structure": {
    "rows": 5,
    "columns": 4,
    "cells": [
      {
        "row": 0,
        "column": 0,
        "text": "Header 1",
        "confidence": 0.95
      }
    ]
  },
  "data": [
    ["Header 1", "Header 2", "Header 3", "Header 4"],
    ["Row 1 Col 1", "Row 1 Col 2", "Row 1 Col 3", "Row 1 Col 4"]
  ],
  "metadata": {
    "processing_time": 1.2,
    "quality_score": 0.92
  }
}
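
The `structure.cells` list can be turned back into a row-by-column grid, and the per-cell `confidence` values can be used to locate cells worth rechecking. The sketch below assumes zero-based `row` and `column` indices, as the example above suggests, and leaves any cell missing from the list as an empty string. The function names, the sample cells, and the 0.8 threshold are hypothetical.

```python
def cells_to_grid(structure):
    """Build a rows x columns grid of cell text from structure.cells."""
    grid = [
        ["" for _ in range(structure["columns"])]
        for _ in range(structure["rows"])
    ]
    for cell in structure["cells"]:
        grid[cell["row"]][cell["column"]] = cell["text"]
    return grid


def low_confidence_cells(structure, threshold):
    """Return (row, column) positions whose confidence is below threshold."""
    return [
        (cell["row"], cell["column"])
        for cell in structure["cells"]
        if cell["confidence"] < threshold
    ]


# Illustrative structure only; not real output from the block.
structure = {
    "rows": 2,
    "columns": 2,
    "cells": [
        {"row": 0, "column": 0, "text": "Item", "confidence": 0.97},
        {"row": 0, "column": 1, "text": "Qty", "confidence": 0.95},
        {"row": 1, "column": 0, "text": "Widget", "confidence": 0.91},
        {"row": 1, "column": 1, "text": "3", "confidence": 0.62},
    ],
}
grid = cells_to_grid(structure)
suspect = low_confidence_cells(structure, threshold=0.8)
```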

Tips

High-quality input: Ensure clear, high-resolution images for better recognition
Consistent formatting: Tables with clear borders and consistent spacing work best
Batch processing: Use batch mode for processing multiple documents efficiently
Quality validation: Check confidence scores in the output to assess recognition quality
Post-processing: Combine with other blocks for data validation and formatting
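
As an example of the kind of post-processing validation mentioned in the tips, the sketch below checks that every row in a table's `data` field has as many cells as `structure.columns` declares. It assumes the JSON shape shown under Output Structure; the function name `shape_problems` and the sample table are hypothetical, and a real pipeline might apply further checks such as type or range validation.

```python
def shape_problems(table):
    """List rows in table['data'] whose length differs from structure.columns."""
    expected = table["structure"]["columns"]
    problems = []
    for index, row in enumerate(table["data"]):
        if len(row) != expected:
            problems.append(
                f"row {index}: expected {expected} cells, found {len(row)}"
            )
    return problems


# Illustrative table only: the second data row is missing a cell.
table = {
    "structure": {"rows": 3, "columns": 3},
    "data": [
        ["Name", "Date", "Amount"],
        ["Alice", "2024-01-05"],
        ["Bob", "2024-01-06", "19.99"],
    ],
}
problems = shape_problems(table)
```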

Related Blocks

OCR - For text extraction from table cells
PDF Processor - For PDF document preprocessing
Template Matcher - For template-based table recognition
csv - For CSV output formatting
