Table Structure Recognition

Recognize and extract table structures from documents using advanced computer vision techniques.

Table Structure Recognition Block

The Table Structure Recognition block identifies, analyzes, and extracts table structures from documents and images. It uses computer vision and machine learning to detect table boundaries, rows, columns, and cell relationships.

Overview

The block covers three tasks: table detection, structure recognition, and data extraction. Use it to process tabular data in documents, forms, and images.

Configuration Options

Recognition Mode

Choose the type of table processing:

  • Table Detection: Locate table regions in documents
  • Structure Recognition: Identify rows, columns, and cell boundaries
  • Data Extraction: Extract text content from table cells
  • Complete Analysis: Full table structure and content analysis

Input Processing

  • Document Type: Specify the input document type (PDF, image, etc.)
  • Page Range: Select specific pages for processing
  • Table Region: Define specific areas to analyze (optional)

Output Format

Choose the output format for extracted table data:

  • JSON: Structured JSON with table metadata
  • CSV: Comma-separated values format
  • HTML: HTML table format
  • Excel: Excel-compatible format
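
The options above can be captured in a small typed configuration object. The following is a minimal sketch, assuming a hypothetical `TableRecognitionConfig` class; the class, its field names, and the `(first, last)` page-range tuple are illustrative and not part of the block's documented interface. Only the mode and format names are taken from the lists above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class RecognitionMode(Enum):
    # Values mirror the "Recognition Mode" options listed above.
    TABLE_DETECTION = "Table Detection"
    STRUCTURE_RECOGNITION = "Structure Recognition"
    DATA_EXTRACTION = "Data Extraction"
    COMPLETE_ANALYSIS = "Complete Analysis"


class OutputFormat(Enum):
    # Values mirror the "Output Format" options listed above.
    JSON = "JSON"
    CSV = "CSV"
    HTML = "HTML"
    EXCEL = "Excel"


@dataclass(frozen=True)
class TableRecognitionConfig:
    """Hypothetical container for the block's options (not an official API)."""

    mode: RecognitionMode = RecognitionMode.COMPLETE_ANALYSIS
    output_format: OutputFormat = OutputFormat.JSON
    # Assumed convention: (first, last), 1-based and inclusive; None means all pages.
    page_range: Optional[Tuple[int, int]] = None

    def __post_init__(self) -> None:
        # Reject ranges that start before page 1 or end before they start.
        if self.page_range is not None:
            first, last = self.page_range
            if first < 1 or last < first:
                raise ValueError(f"invalid page range: {self.page_range}")


# Looking an option up by its display name fails early on a typo.
config = TableRecognitionConfig(
    mode=RecognitionMode("Data Extraction"),
    output_format=OutputFormat("CSV"),
    page_range=(1, 3),
)
```

Modelling the options as enums means a misspelled mode or format raises a `ValueError` when the configuration is built, rather than surfacing later as an unexpected result.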

Use Cases

Document Processing

Extract tables from business documents:

PDF input → Table Structure Recognition → CSV output → Data processing

Form Analysis

Process structured forms and applications:

Form image → Table Structure Recognition → JSON structure → Validation

Data Migration

Convert tabular data between formats:

Legacy document → Table Structure Recognition → Modern format → Database import

Financial Document Processing

Extract financial data from reports:

Financial report → Table Structure Recognition → Structured data → Analysis

Common Patterns

Basic Table Extraction

// Configuration
// Recognition Mode: Complete Analysis
// Output Format: JSON

// Example flow:
// document input → Table Structure Recognition → debug (table structure)

Multi-format Output

// Configuration
// Recognition Mode: Data Extraction
// Output Format: CSV

// Example flow:
// image input → Table Structure Recognition → CSV output → file storage

Batch Processing

// Configuration
// Recognition Mode: Table Detection
// Page Range: All pages

// Example flow:
// document batch → Table Structure Recognition → individual table files

Advanced Features

Table Quality Assessment

The block can report quality metrics alongside the recognized structure:

// Output includes quality metrics
{
  "table_quality": {
    "structure_confidence": 0.95,
    "cell_detection_accuracy": 0.92,
    "text_extraction_quality": 0.88
  }
}
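
A downstream step might use these metrics to decide whether a table needs manual review. The sketch below assumes the output has already been received as JSON text in the shape shown above; the `failing_metrics` helper and the 0.9 threshold are illustrative choices, not values defined by the block.

```python
import json

# Sample output copied from the example above.
raw = """
{
  "table_quality": {
    "structure_confidence": 0.95,
    "cell_detection_accuracy": 0.92,
    "text_extraction_quality": 0.88
  }
}
"""


def failing_metrics(output: dict, threshold: float = 0.9) -> dict:
    """Return the quality metrics whose score is below `threshold`."""
    metrics = output.get("table_quality", {})
    return {name: score for name, score in metrics.items() if score < threshold}


low = failing_metrics(json.loads(raw))
print(low)  # {'text_extraction_quality': 0.88}
```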

Multi-table Handling

Process documents with multiple tables:

// Configuration
// Recognition Mode: Complete Analysis
// Multi-table: Enabled

// Output includes array of tables
{
  "tables": [
    {
      "table_id": 1,
      "structure": {...},
      "data": [...]
    },
    {
      "table_id": 2,
      "structure": {...},
      "data": [...]
    }
  ]
}
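
When the output contains several tables, each one can be handled independently by iterating over the `tables` array. The following sketch renders every table's `data` rows as CSV text keyed by `table_id`. The cell values are made up for illustration, and `tables_to_csv` is a hypothetical helper, not part of the block.

```python
import csv
import io

# Illustrative output in the shape shown above; the cell values are made up.
output = {
    "tables": [
        {"table_id": 1, "structure": {}, "data": [["Item", "Qty"], ["Widget", "2"]]},
        {"table_id": 2, "structure": {}, "data": [["Name", "Total"], ["Alice", "10.50"]]},
    ]
}


def tables_to_csv(output: dict) -> dict:
    """Map each table_id to its `data` rows rendered as CSV text."""
    rendered = {}
    for table in output.get("tables", []):
        buffer = io.StringIO()
        # The csv module handles quoting of commas and quotes inside cells.
        csv.writer(buffer, lineterminator="\n").writerows(table.get("data", []))
        rendered[table["table_id"]] = buffer.getvalue()
    return rendered


csv_by_id = tables_to_csv(output)
print(csv_by_id[1])
```

Keying the result by `table_id` keeps each table traceable to its source when the rendered text is later written to separate files.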

Custom Table Templates

Define custom table structures for specific document types:

// Configuration
// Template Mode: Custom
// Template: Financial Report Template

// Optimized recognition for specific table types

Output Structure

Table Structure JSON

{
  "table_id": "table_001",
  "page_number": 1,
  "bounding_box": {
    "x": 100,
    "y": 200,
    "width": 800,
    "height": 400
  },
  "structure": {
    "rows": 5,
    "columns": 4,
    "cells": [
      {
        "row": 0,
        "column": 0,
        "text": "Header 1",
        "confidence": 0.95
      }
    ]
  },
  "data": [
    ["Header 1", "Header 2", "Header 3", "Header 4"],
    ["Row 1 Col 1", "Row 1 Col 2", "Row 1 Col 3", "Row 1 Col 4"]
  ],
  "metadata": {
    "processing_time": 1.2,
    "quality_score": 0.92
  }
}
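
The structure above can be consumed with a few small helpers. This is a minimal sketch under stated assumptions: `row` and `column` are treated as zero-based indices, and `bounding_box` is assumed to give the top-left corner plus width and height. Neither convention is confirmed here, and the helper names are hypothetical.

```python
import json

# Abbreviated copy of the sample output above.
raw = """
{
  "table_id": "table_001",
  "page_number": 1,
  "bounding_box": {"x": 100, "y": 200, "width": 800, "height": 400},
  "structure": {
    "rows": 5,
    "columns": 4,
    "cells": [
      {"row": 0, "column": 0, "text": "Header 1", "confidence": 0.95}
    ]
  }
}
"""


def cells_to_grid(table: dict) -> list:
    """Place each cell's text in a rows x columns grid; unlisted cells stay None."""
    structure = table["structure"]
    grid = [[None] * structure["columns"] for _ in range(structure["rows"])]
    for cell in structure["cells"]:
        # Assumption: row and column are zero-based indices.
        grid[cell["row"]][cell["column"]] = cell["text"]
    return grid


def low_confidence_cells(table: dict, threshold: float = 0.9) -> list:
    """Return (row, column) positions whose confidence is below `threshold`."""
    return [
        (cell["row"], cell["column"])
        for cell in table["structure"]["cells"]
        if cell["confidence"] < threshold
    ]


def box_corners(table: dict) -> tuple:
    """Convert the bounding box to (left, top, right, bottom).

    Assumption: x and y mark the top-left corner.
    """
    box = table["bounding_box"]
    return (box["x"], box["y"], box["x"] + box["width"], box["y"] + box["height"])


table = json.loads(raw)
grid = cells_to_grid(table)
print(grid[0][0])                   # Header 1
print(low_confidence_cells(table))  # []
print(box_corners(table))           # (100, 200, 900, 600)
```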

Tips

  • High-quality input: Ensure clear, high-resolution images for better recognition
  • Consistent formatting: Tables with clear borders and consistent spacing work best
  • Batch processing: Use batch mode for processing multiple documents efficiently
  • Quality validation: Check confidence scores in the output to assess recognition quality
  • Post-processing: Combine with other blocks for data validation and formatting

Related Blocks

  • OCR: For text extraction from table cells
  • PDF Processor: For PDF document preprocessing
  • Template Matcher: For template-based table recognition
  • csv: For CSV output formatting