Answer questions about documents (images) using various AI algorithms and document understanding techniques.

Document Question Answering Block

The Document Question Answering block answers questions about documents supplied as image or PDF files. It combines document understanding, OCR, and natural language processing to answer questions about document content.

Overview

The block lets a flow query a document interactively. It interprets both the visual and the textual elements of a document to answer questions about its content and structure.

Configuration Options

Algorithm Selection

Select an algorithm from the Choose Algorithm dropdown:

- Visual Question Answering: Answer questions based on visual content
- Text-based Q&A: Answer questions using extracted text content
- Hybrid Q&A: Combine visual and textual understanding
- Structured Data Q&A: Answer questions about tables and structured content
- Multi-modal Q&A: Advanced multi-modal document understanding

Input Configuration

Document Input

Property: msg.payload.document_path
Type: string
Description: Path to the document file (image or PDF)
Supported formats: .png, .jpg, .jpeg, .pdf, .tiff

Question Input

Property: msg.payload.question
Type: string
Description: The question to ask about the document
Examples:
- "What is the total amount on this invoice?"
- "Who is the recipient of this letter?"
- "What date is shown on this document?"

Context (Optional)

Property: msg.payload.context
Type: string
Description: Additional context to help with answering
Example: "This is a financial document from 2023"
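
The three input properties above can be assembled and checked before the message reaches the block. The following is a minimal sketch in plain Node.js, not part of the block itself: `buildQuestionMessage` is a hypothetical helper, and treating extensions case-insensitively is an assumption.

```javascript
const path = require("path");

// Extensions taken from the "Supported formats" line above.
const SUPPORTED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".pdf", ".tiff"];

// Hypothetical helper: builds the input message for the Document Q&A block.
function buildQuestionMessage(documentPath, question, context) {
  // Assumption: extension matching is case-insensitive.
  const ext = path.extname(documentPath).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(ext)) {
    throw new Error(`Unsupported document format: ${ext || "(none)"}`);
  }
  if (typeof question !== "string" || question.trim() === "") {
    throw new Error("question must be a non-empty string");
  }
  const payload = { document_path: documentPath, question: question.trim() };
  // context is optional, so the key is omitted when nothing is supplied.
  if (typeof context === "string" && context.trim() !== "") {
    payload.context = context.trim();
  }
  return { payload };
}

const msg = buildQuestionMessage(
  "documents/invoice_001.pdf",
  "What is the total amount on this invoice?",
  "This is a financial document from 2023"
);
console.log(JSON.stringify(msg, null, 2));
```

Rejecting unsupported files early keeps format errors out of the Q&A step, where they would be harder to tell apart from low-confidence answers.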

Processing Options

Answer Confidence

Type: number
Range: 0.0 to 1.0
Default: 0.7
Description: Minimum confidence threshold for answers

Answer Format

Type: string
Options: ["short", "detailed", "structured"]
Default: "short"
Description: Format of the answer output

Include Evidence

Type: boolean
Default: true
Description: Include supporting evidence with answers
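
The defaults and allowed values above can be captured in a small validation step. This is a sketch under stated assumptions: the option keys (`answerConfidence`, `answerFormat`, `includeEvidence`) are hypothetical names chosen for illustration, and treating the threshold as inclusive is an assumption, since the source does not say how the block stores or applies these settings.

```javascript
// Defaults as documented above; the key names are hypothetical.
const DEFAULT_OPTIONS = {
  answerConfidence: 0.7,
  answerFormat: "short",
  includeEvidence: true,
};
const ANSWER_FORMATS = ["short", "detailed", "structured"];

// Merge user overrides with the defaults and reject out-of-range values.
function resolveOptions(overrides = {}) {
  const options = { ...DEFAULT_OPTIONS, ...overrides };
  const c = options.answerConfidence;
  if (!Number.isFinite(c) || c < 0 || c > 1) {
    throw new RangeError("answerConfidence must be a number from 0.0 to 1.0");
  }
  if (!ANSWER_FORMATS.includes(options.answerFormat)) {
    throw new RangeError(`answerFormat must be one of: ${ANSWER_FORMATS.join(", ")}`);
  }
  if (typeof options.includeEvidence !== "boolean") {
    throw new TypeError("includeEvidence must be a boolean");
  }
  return options;
}

// Assumption: an answer passes when its confidence is at or above the threshold.
function meetsThreshold(result, options) {
  return Number.isFinite(result.confidence) && result.confidence >= options.answerConfidence;
}

const options = resolveOptions({ answerFormat: "structured" });
console.log(meetsThreshold({ answer: "$1,250.00", confidence: 0.95 }, options));
```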

Use Cases

Invoice Processing

Answer questions about invoice details:

Invoice image → Document Q&A → "What is the total amount?" → Answer: "$1,250.00"

Contract Analysis

Extract specific information from contracts:

Contract document → Document Q&A → "What is the contract end date?" → Answer: "December 31, 2024"

Report Analysis

Query financial or business reports:

Report document → Document Q&A → "What was the revenue for Q3?" → Answer: "$2.5M"

Form Processing

Extract information from forms:

Form image → Document Q&A → "What is the applicant's name?" → Answer: "John Smith"

Common Patterns

Basic Question Answering

// Configuration
// Algorithm: Text-based Q&A
// Answer Format: short

// Input message:
{
  "payload": {
    "document_path": "documents/invoice_001.pdf",
    "question": "What is the invoice number?"
  }
}

// Example flow:
// inject → Document Q&A → debug (answer)

Visual Question Answering

// Configuration
// Algorithm: Visual Question Answering
// Include Evidence: true

// Input message:
{
  "payload": {
    "document_path": "scans/contract.png",
    "question": "Is there a signature on this document?",
    "context": "This is a legal contract"
  }
}

// Example flow:
// inject → Document Q&A → debug (visual answer)

Multi-question Processing

// Configuration
// Algorithm: Hybrid Q&A
// Answer Format: structured

// Process multiple questions about the same document
// Example flow:
// document → Document Q&A (question 1) → Document Q&A (question 2) → combined results
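
The flow above chains one block per question. An alternative, sketched below, is to generate one input message per question and send each through a single block. `fanOutQuestions` is a hypothetical helper, and how the individual answers are later combined is left open because the source does not specify it.

```javascript
// Hypothetical helper: one input message per question, all for the same document.
function fanOutQuestions(documentPath, questions, context) {
  return questions.map((question) => {
    const payload = { document_path: documentPath, question };
    // Only attach context when the caller supplied one.
    if (context !== undefined) {
      payload.context = context;
    }
    return { payload };
  });
}

const messages = fanOutQuestions("documents/report.pdf", [
  "What was the revenue for Q3?",
  "What is the profit margin?",
]);
console.log(messages.length); // one message per question
```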

Advanced Features

Structured Answer Format

With the structured answer format, the output looks like this:

{
  "question": "What is the total amount on this invoice?",
  "answer": "$1,250.00",
  "confidence": 0.95,
  "evidence": {
    "text_evidence": "Total Amount: $1,250.00",
    "location": {
      "page": 1,
      "coordinates": [100, 200, 300, 250]
    }
  },
  "supporting_facts": ["Subtotal: $1,000.00", "Tax: $250.00"]
}
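
A downstream node can flatten such a structured answer into a single log line. The sketch below assumes only the fields shown in the example above; `summarizeStructuredAnswer` is a hypothetical helper, and it deliberately leaves `coordinates` untouched because the source does not define their layout.

```javascript
// Hypothetical helper: one-line summary of a structured answer.
function summarizeStructuredAnswer(result) {
  const percent = Math.round(result.confidence * 100);
  const parts = [`${result.answer} (confidence ${percent}%)`];

  const evidence = result.evidence;
  if (evidence && evidence.text_evidence) {
    const page = evidence.location && evidence.location.page;
    parts.push(
      page !== undefined
        ? `evidence on page ${page}: "${evidence.text_evidence}"`
        : `evidence: "${evidence.text_evidence}"`
    );
  }
  if (Array.isArray(result.supporting_facts) && result.supporting_facts.length > 0) {
    parts.push(`supporting facts: ${result.supporting_facts.join("; ")}`);
  }
  return parts.join(" | ");
}

const structured = {
  question: "What is the total amount on this invoice?",
  answer: "$1,250.00",
  confidence: 0.95,
  evidence: {
    text_evidence: "Total Amount: $1,250.00",
    location: { page: 1, coordinates: [100, 200, 300, 250] },
  },
  supporting_facts: ["Subtotal: $1,000.00", "Tax: $250.00"],
};
console.log(summarizeStructuredAnswer(structured));
```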

An answer that combines visual and textual analysis carries an analysis object with both feature sets:

{
  "question": "What type of document is this?",
  "answer": "This is an invoice from ABC Company",
  "confidence": 0.92,
  "analysis": {
    "visual_features": {
      "document_type": "invoice",
      "company_logo": "ABC Company",
      "layout_type": "standard_invoice"
    },
    "textual_features": {
      "keywords": ["invoice", "payment", "due date"],
      "entities": ["ABC Company", "$1,250.00", "2024-01-15"]
    }
  }
}

Evidence-based Answers

Answers can include supporting evidence and alternative answers:

{
  "question": "Who is the recipient?",
  "answer": "John Smith",
  "confidence": 0.88,
  "evidence": {
    "source_text": "Bill To: John Smith",
    "confidence": 0.88,
    "location": "top right section"
  },
  "alternative_answers": [
    {
      "answer": "J. Smith",
      "confidence": 0.75,
      "evidence": "Signature line"
    }
  ]
}
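
When alternative answers are present, a consumer may want every candidate ranked in one list. The sketch below assumes the field names from the example above; `rankCandidates` is a hypothetical helper, and its default cutoff of 0.7 simply mirrors the documented Answer Confidence default.

```javascript
// Hypothetical helper: merge the main answer with its alternatives,
// drop candidates below the cutoff, and sort by confidence (highest first).
function rankCandidates(result, minConfidence = 0.7) {
  const alternatives = (result.alternative_answers || []).map((alt) => ({
    answer: alt.answer,
    confidence: alt.confidence,
  }));
  const candidates = [
    { answer: result.answer, confidence: result.confidence },
    ...alternatives,
  ];
  return candidates
    .filter((candidate) => candidate.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}

const evidenceResult = {
  question: "Who is the recipient?",
  answer: "John Smith",
  confidence: 0.88,
  alternative_answers: [
    { answer: "J. Smith", confidence: 0.75, evidence: "Signature line" },
  ],
};
console.log(rankCandidates(evidenceResult));
console.log(rankCandidates(evidenceResult, 0.8));
```

With the default cutoff both candidates are kept; raising it to 0.8 leaves only "John Smith".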

Output Structure

Basic Answer

{
  "document_path": "documents/invoice_001.pdf",
  "question": "What is the total amount?",
  "answer": "$1,250.00",
  "confidence": 0.95,
  "algorithm_used": "Text-based Q&A",
  "processing_time": 2.1,
  "timestamp": "2024-01-15T10:30:00Z"
}

Detailed Answer with Evidence

{
  "document_path": "documents/contract.pdf",
  "question": "What is the contract duration?",
  "answer": "12 months starting from January 1, 2024",
  "confidence": 0.92,
  "evidence": {
    "text_evidence": "Contract Term: 12 months commencing January 1, 2024",
    "location": {
      "page": 2,
      "section": "Terms and Conditions",
      "coordinates": [150, 300, 400, 350]
    }
  },
  "supporting_information": [
    "Start Date: January 1, 2024",
    "End Date: December 31, 2024",
    "Duration: 12 months"
  ],
  "algorithm_used": "Hybrid Q&A"
}

Multi-question Results

{
  "document_path": "documents/report.pdf",
  "questions_answered": [
    {
      "question": "What is the revenue?",
      "answer": "$2.5M",
      "confidence": 0.94
    },
    {
      "question": "What is the profit margin?",
      "answer": "15%",
      "confidence": 0.87
    }
  ],
  "overall_confidence": 0.91,
  "processing_time": 4.2
}
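
A multi-question result can be turned into a question-to-answer lookup for later nodes. The sketch below assumes the shape shown above; `indexAnswers` is a hypothetical helper. The mean it computes is a local convenience only, since the source does not state how `overall_confidence` is derived.

```javascript
// Hypothetical helper: index answers by question and average their confidences.
function indexAnswers(results) {
  const answers = {};
  let total = 0;
  for (const item of results.questions_answered) {
    answers[item.question] = item.answer;
    total += item.confidence;
  }
  const count = results.questions_answered.length;
  // null signals that there was nothing to average.
  return { answers, meanConfidence: count > 0 ? total / count : null };
}

const multiResult = {
  document_path: "documents/report.pdf",
  questions_answered: [
    { question: "What is the revenue?", answer: "$2.5M", confidence: 0.94 },
    { question: "What is the profit margin?", answer: "15%", confidence: 0.87 },
  ],
};
const indexed = indexAnswers(multiResult);
console.log(indexed.answers["What is the revenue?"]);
console.log(indexed.meanConfidence.toFixed(3));
```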