Table Structure Recognition
Recognize and extract table structures from documents using advanced computer vision techniques.
Table Structure Recognition Block
The Table Structure Recognition block is designed to identify, analyze, and extract table structures from documents and images. It uses advanced computer vision and machine learning techniques to detect table boundaries, rows, columns, and cell relationships.
Overview
The block covers three tasks: table detection, structure recognition, and data extraction. It is intended for processing tabular data in documents, forms, and images.
Configuration Options
Recognition Mode
Choose the type of table processing:
- Table Detection: Locate table regions in documents
- Structure Recognition: Identify rows, columns, and cell boundaries
- Data Extraction: Extract text content from table cells
- Complete Analysis: Full table structure and content analysis
Input Processing
- Document Type: Specify the input document type (PDF, image, etc.)
- Page Range: Select specific pages for processing
- Table Region: Define specific areas to analyze (optional)
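When results need to be filtered after the fact, a region check can also be applied to the detected tables. The sketch below is a hypothetical post-processing helper, not part of the block: it assumes boxes use the x/y/width/height shape shown in the Table Structure JSON output, and that the region is given in the same (unspecified) coordinate units.

```python
def overlap_fraction(box, region):
    """Return the fraction of `box` area that lies inside `region`.

    Both arguments are dicts with "x", "y", "width" and "height" keys
    (assumed shape, mirroring the bounding_box field in the output).
    """
    box_area = box["width"] * box["height"]
    if box_area <= 0:
        return 0.0

    # Edges of the intersection rectangle.
    left = max(box["x"], region["x"])
    top = max(box["y"], region["y"])
    right = min(box["x"] + box["width"], region["x"] + region["width"])
    bottom = min(box["y"] + box["height"], region["y"] + region["height"])

    if right <= left or bottom <= top:
        return 0.0  # no overlap at all
    return (right - left) * (bottom - top) / box_area


def tables_in_region(tables, region, min_overlap=0.5):
    """Keep tables whose bounding box overlaps the region by at least min_overlap."""
    return [
        t for t in tables
        if overlap_fraction(t["bounding_box"], region) >= min_overlap
    ]


table_box = {"x": 100, "y": 200, "width": 800, "height": 400}
upper_region = {"x": 0, "y": 0, "width": 1000, "height": 400}
fraction = overlap_fraction(table_box, upper_region)
```

The `min_overlap` default of 0.5 is an arbitrary illustration; a stricter value suits layouts where tables sit close together.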
Output Format
Choose the output format for extracted table data:
- JSON: Structured JSON with table metadata
- CSV: Comma-separated values format
- HTML: HTML table format
- Excel: Excel-compatible format
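To make the relationship between these formats concrete, here is a minimal sketch that converts a list of rows into CSV and HTML text. It assumes the rows look like the `data` field in the Table Structure JSON output (a list of lists of strings) and that the first row is a header; the function names are hypothetical, and the block's own serializers may differ.

```python
import csv
import html
import io


def table_to_csv(rows):
    """Serialize a list of row lists to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def table_to_html(rows):
    """Render rows as an HTML table, treating the first row as the header."""
    if not rows:
        return "<table></table>"
    header, *body = rows
    lines = ["<table>"]
    header_cells = "".join(f"<th>{html.escape(str(c))}</th>" for c in header)
    lines.append(f"  <tr>{header_cells}</tr>")
    for row in body:
        cells = "".join(f"<td>{html.escape(str(c))}</td>" for c in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


sample_rows = [
    ["Header 1", "Header 2"],
    ["Row 1 Col 1", "Row 1, Col 2"],
]
csv_text = table_to_csv(sample_rows)
html_text = table_to_html(sample_rows)
```

Cell text is quoted in the CSV and escaped in the HTML, so commas or angle brackets inside a cell do not break the output.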
Use Cases
Document Processing
Extract tables from business documents:
PDF input → Table Structure Recognition → CSV output → Data processing
Form Analysis
Process structured forms and applications:
Form image → Table Structure Recognition → JSON structure → Validation
Data Migration
Convert tabular data between formats:
Legacy document → Table Structure Recognition → Modern format → Database import
Financial Document Processing
Extract financial data from reports:
Financial report → Table Structure Recognition → Structured data → Analysis
Common Patterns
Basic Table Extraction
// Configuration
// Recognition Mode: Complete Analysis
// Output Format: JSON
// Example flow:
// document input → Table Structure Recognition → debug (table structure)
Multi-format Output
// Configuration
// Recognition Mode: Data Extraction
// Output Format: CSV
// Example flow:
// image input → Table Structure Recognition → CSV output → file storage
Batch Processing
// Configuration
// Recognition Mode: Table Detection
// Page Range: All pages
// Example flow:
// document batch → Table Structure Recognition → individual table files
Advanced Features
Table Quality Assessment
The block can assess table quality and structure:
// Output includes quality metrics
{
  "table_quality": {
    "structure_confidence": 0.95,
    "cell_detection_accuracy": 0.92,
    "text_extraction_quality": 0.88
  }
}
Multi-table Handling
Process documents with multiple tables:
// Configuration
// Recognition Mode: Complete Analysis
// Multi-table: Enabled
// Output includes array of tables
{
  "tables": [
    {
      "table_id": 1,
      "structure": {...},
      "data": [...]
    },
    {
      "table_id": 2,
      "structure": {...},
      "data": [...]
    }
  ]
}
Custom Table Templates
Define custom table structures for specific document types:
// Configuration
// Template Mode: Custom
// Template: Financial Report Template
// Optimized recognition for specific table types
Output Structure
Table Structure JSON
{
  "table_id": "table_001",
  "page_number": 1,
  "bounding_box": {
    "x": 100,
    "y": 200,
    "width": 800,
    "height": 400
  },
  "structure": {
    "rows": 5,
    "columns": 4,
    "cells": [
      {
        "row": 0,
        "column": 0,
        "text": "Header 1",
        "confidence": 0.95
      }
    ]
  },
  "data": [
    ["Header 1", "Header 2", "Header 3", "Header 4"],
    ["Row 1 Col 1", "Row 1 Col 2", "Row 1 Col 3", "Row 1 Col 4"]
  ],
  "metadata": {
    "processing_time": 1.2,
    "quality_score": 0.92
  }
}
Tips
- High-quality input: Ensure clear, high-resolution images for better recognition
- Consistent formatting: Tables with clear borders and consistent spacing work best
- Batch processing: Use batch mode for processing multiple documents efficiently
- Quality validation: Check confidence scores in the output to assess recognition quality
- Post-processing: Combine with other blocks for data validation and formatting
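The quality-validation tip can be sketched as a small check over the JSON output. This is a hypothetical downstream helper, assuming the `structure.cells[].confidence` and `metadata.quality_score` fields shown in the Table Structure JSON; the 0.9 thresholds are arbitrary placeholders, not recommended values.

```python
def low_confidence_cells(table, threshold=0.9):
    """Return (row, column, text, confidence) for cells scoring below threshold."""
    flagged = []
    for cell in table.get("structure", {}).get("cells", []):
        # Assumption: a cell without a score is treated as low confidence.
        confidence = cell.get("confidence", 0.0)
        if confidence < threshold:
            flagged.append(
                (cell["row"], cell["column"], cell.get("text", ""), confidence)
            )
    return flagged


def needs_review(table, min_quality=0.9, cell_threshold=0.9):
    """True if the overall score or any single cell falls below its threshold."""
    quality = table.get("metadata", {}).get("quality_score", 0.0)
    return quality < min_quality or bool(low_confidence_cells(table, cell_threshold))


sample_table = {
    "table_id": "table_001",
    "structure": {
        "rows": 1,
        "columns": 2,
        "cells": [
            {"row": 0, "column": 0, "text": "Header 1", "confidence": 0.95},
            {"row": 0, "column": 1, "text": "Header 2", "confidence": 0.62},
        ],
    },
    "metadata": {"quality_score": 0.92},
}
flagged = low_confidence_cells(sample_table)
```

For multi-table output, the same check can be applied to each entry of the `tables` array, routing only the flagged tables to manual review.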
Related Blocks
- OCR - For text extraction from table cells
- PDF Processor - For PDF document preprocessing
- Template Matcher - For template-based table recognition
- CSV - For CSV output formatting