Document Question Answering

The Document Question Answering block answers questions about documents supplied as images. It combines document understanding, OCR, and natural language processing to locate the answer in the document's content.

Overview

The block supports interactive querying of document content. It interprets both the visual and the textual elements of a document to answer questions about its content and structure.

Configuration Options

Algorithm Selection

Select an algorithm from the Choose Algorithm dropdown:

  • Visual Question Answering: Answer questions based on visual content
  • Text-based Q&A: Answer questions using extracted text content
  • Hybrid Q&A: Combine visual and textual understanding
  • Structured Data Q&A: Answer questions about tables and structured content
  • Multi-modal Q&A: Advanced multi-modal document understanding

Input Configuration

Document Input

  • Property: msg.payload.document_path
  • Type: string
  • Description: Path to the document file (an image or a PDF)
  • Supported formats: .png, .jpg, .jpeg, .pdf, .tiff

Question Input

  • Property: msg.payload.question
  • Type: string
  • Description: The question to ask about the document
  • Examples:
    • "What is the total amount on this invoice?"
    • "Who is the recipient of this letter?"
    • "What date is shown on this document?"

Context (Optional)

  • Property: msg.payload.context
  • Type: string
  • Description: Additional context to help with answering
  • Example: "This is a financial document from 2023"
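
The input properties above can be assembled and checked before the message reaches the block. The following is a minimal sketch, not part of the block itself: buildDocQaMessage is a hypothetical helper, and matching the file extension case-insensitively is an assumption.

```javascript
// Sketch only: buildDocQaMessage is a hypothetical helper, not a block API.
// Extensions come from the "Supported formats" list above.
const SUPPORTED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".pdf", ".tiff"];

function buildDocQaMessage(documentPath, question, context) {
  if (typeof documentPath !== "string" || documentPath.length === 0) {
    throw new TypeError("document_path must be a non-empty string");
  }
  if (typeof question !== "string" || question.trim().length === 0) {
    throw new TypeError("question must be a non-empty string");
  }

  // Assumption: the extension check ignores letter case.
  const dot = documentPath.lastIndexOf(".");
  const extension = dot === -1 ? "" : documentPath.slice(dot).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    throw new RangeError(`Unsupported document format: "${extension}"`);
  }

  const payload = { document_path: documentPath, question: question.trim() };
  // context is optional, so it is only set when a non-empty string is given.
  if (typeof context === "string" && context.length > 0) {
    payload.context = context;
  }
  return { payload };
}

const msg = buildDocQaMessage(
  "documents/invoice_001.pdf",
  "What is the total amount on this invoice?",
  "This is a financial document from 2023"
);
console.log(JSON.stringify(msg, null, 2));
```

Rejecting an unsupported extension early keeps the failure close to its cause instead of surfacing later as a missing answer.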

Processing Options

Answer Confidence

  • Type: number
  • Range: 0.0 to 1.0
  • Default: 0.7
  • Description: Minimum confidence threshold for answers

Answer Format

  • Type: string
  • Options: ["short", "detailed", "structured"]
  • Default: "short"
  • Description: Format of the answer output

Include Evidence

  • Type: boolean
  • Default: true
  • Description: Include supporting evidence with answers
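
The defaults and allowed values listed above can be captured in a small validation step. This is a sketch under assumptions: the source names the settings but not their property keys, so answerConfidence, answerFormat, and includeEvidence are hypothetical names, as is resolveOptions.

```javascript
// Sketch only: the option keys are hypothetical names for the
// Answer Confidence, Answer Format, and Include Evidence settings.
const DEFAULT_OPTIONS = Object.freeze({
  answerConfidence: 0.7,
  answerFormat: "short",
  includeEvidence: true,
});
const ANSWER_FORMATS = ["short", "detailed", "structured"];

function resolveOptions(overrides = {}) {
  // Overrides win over defaults; the frozen defaults object is never mutated.
  const options = { ...DEFAULT_OPTIONS, ...overrides };

  const c = options.answerConfidence;
  if (typeof c !== "number" || Number.isNaN(c) || c < 0 || c > 1) {
    throw new RangeError("answerConfidence must be a number from 0.0 to 1.0");
  }
  if (!ANSWER_FORMATS.includes(options.answerFormat)) {
    throw new RangeError(`answerFormat must be one of: ${ANSWER_FORMATS.join(", ")}`);
  }
  if (typeof options.includeEvidence !== "boolean") {
    throw new TypeError("includeEvidence must be a boolean");
  }
  return options;
}

console.log(resolveOptions({ answerFormat: "structured" }));
```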

Use Cases

Invoice Processing

Answer questions about invoice details:

Invoice image → Document Q&A → "What is the total amount?" → Answer: "$1,250.00"

Contract Analysis

Extract specific information from contracts:

Contract document → Document Q&A → "What is the contract end date?" → Answer: "December 31, 2024"

Report Analysis

Query financial or business reports:

Report document → Document Q&A → "What was the revenue for Q3?" → Answer: "$2.5M"

Form Processing

Extract information from forms:

Form image → Document Q&A → "What is the applicant's name?" → Answer: "John Smith"

Common Patterns

Basic Question Answering

// Configuration
// Algorithm: Text-based Q&A
// Answer Format: short

// Input message:
{
  "payload": {
    "document_path": "documents/invoice_001.pdf",
    "question": "What is the invoice number?"
  }
}

// Example flow:
// inject → Document Q&A → debug (answer)

Visual Question Answering

// Configuration
// Algorithm: Visual Question Answering
// Include Evidence: true

// Input message:
{
  "payload": {
    "document_path": "scans/contract.png",
    "question": "Is there a signature on this document?",
    "context": "This is a legal contract"
  }
}

// Example flow:
// inject → Document Q&A → debug (visual answer)

Multi-question Processing

// Configuration
// Algorithm: Hybrid Q&A
// Answer Format: structured

// Process multiple questions about the same document
// Example flow:
// document → Document Q&A (question 1) → Document Q&A (question 2) → combined results
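
The multi-question pattern does not show an input message, so the sketch below illustrates one way to prepare it: a list of questions is expanded into one message per question, all pointing at the same document. buildQuestionMessages is a hypothetical helper, and sending one message per question is an assumption about how the flow is wired.

```javascript
// Sketch only: buildQuestionMessages is a hypothetical helper.
// It produces one input message per question for the same document.
function buildQuestionMessages(documentPath, questions, context) {
  return questions
    .map((q) => q.trim())
    .filter((q) => q.length > 0) // skip blank entries
    .map((question) => {
      const payload = { document_path: documentPath, question };
      if (context) {
        payload.context = context; // optional, shared by all questions
      }
      return { payload };
    });
}

const messages = buildQuestionMessages("documents/report.pdf", [
  "What was the revenue for Q3?",
  "What is the profit margin?",
]);
console.log(messages.length, "messages prepared");
```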

Advanced Features

Structured Answer Format

When Answer Format is set to "structured", the output has the following shape:

{
  "question": "What is the total amount on this invoice?",
  "answer": "$1,250.00",
  "confidence": 0.95,
  "evidence": {
    "text_evidence": "Total Amount: $1,250.00",
    "location": {
      "page": 1,
      "coordinates": [100, 200, 300, 250]
    }
  },
  "supporting_facts": ["Subtotal: $1,000.00", "Tax: $250.00"]
}

Multi-modal Understanding

Multi-modal understanding combines visual and textual analysis, and the output reports the features found by each:

{
  "question": "What type of document is this?",
  "answer": "This is an invoice from ABC Company",
  "confidence": 0.92,
  "analysis": {
    "visual_features": {
      "document_type": "invoice",
      "company_logo": "ABC Company",
      "layout_type": "standard_invoice"
    },
    "textual_features": {
      "keywords": ["invoice", "payment", "due date"],
      "entities": ["ABC Company", "$1,250.00", "2024-01-15"]
    }
  }
}

Evidence-based Answers

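When Include Evidence is enabled, the output carries the supporting evidence for the answer, along with any alternative answers: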

{
  "question": "Who is the recipient?",
  "answer": "John Smith",
  "confidence": 0.88,
  "evidence": {
    "source_text": "Bill To: John Smith",
    "confidence": 0.88,
    "location": "top right section"
  },
  "alternative_answers": [
    {
      "answer": "J. Smith",
      "confidence": 0.75,
      "evidence": "Signature line"
    }
  ]
}
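
A downstream step can use the confidence values in this output to decide which answer to accept. The sketch below is one possible policy, not the block's own behavior: pickAnswer is a hypothetical helper that considers the main answer and the alternatives, drops anything below a threshold, and returns the highest-confidence candidate.

```javascript
// Sketch only: pickAnswer is a hypothetical downstream helper.
// The 0.7 default mirrors the Answer Confidence default.
function pickAnswer(result, minConfidence = 0.7) {
  const candidates = [
    { answer: result.answer, confidence: result.confidence },
    ...(result.alternative_answers || []).map((alt) => ({
      answer: alt.answer,
      confidence: alt.confidence,
    })),
  ];

  const accepted = candidates
    .filter((c) => typeof c.confidence === "number" && c.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence); // highest confidence first

  // null signals that no candidate met the threshold.
  return accepted.length > 0 ? accepted[0] : null;
}

const result = {
  question: "Who is the recipient?",
  answer: "John Smith",
  confidence: 0.88,
  alternative_answers: [{ answer: "J. Smith", confidence: 0.75, evidence: "Signature line" }],
};
console.log(pickAnswer(result));
```

Returning null instead of the best low-confidence candidate forces the caller to handle the "no reliable answer" case explicitly.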

Output Structure

Basic Answer

{
  "document_path": "documents/invoice_001.pdf",
  "question": "What is the total amount?",
  "answer": "$1,250.00",
  "confidence": 0.95,
  "algorithm_used": "Text-based Q&A",
  "processing_time": 2.1,
  "timestamp": "2024-01-15T10:30:00Z"
}

Detailed Answer with Evidence

{
  "document_path": "documents/contract.pdf",
  "question": "What is the contract duration?",
  "answer": "12 months starting from January 1, 2024",
  "confidence": 0.92,
  "evidence": {
    "text_evidence": "Contract Term: 12 months commencing January 1, 2024",
    "location": {
      "page": 2,
      "section": "Terms and Conditions",
      "coordinates": [150, 300, 400, 350]
    }
  },
  "supporting_information": [
    "Start Date: January 1, 2024",
    "End Date: December 31, 2024",
    "Duration: 12 months"
  ],
  "algorithm_used": "Hybrid Q&A"
}

Multi-question Results

{
  "document_path": "documents/report.pdf",
  "questions_answered": [
    {
      "question": "What is the revenue?",
      "answer": "$2.5M",
      "confidence": 0.94
    },
    {
      "question": "What is the profit margin?",
      "answer": "15%",
      "confidence": 0.87
    }
  ],
  "overall_confidence": 0.91,
  "processing_time": 4.2
}
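
Individual answers can be merged into this shape after the flow has run once per question. The sketch below is an illustration under assumptions: combineResults is a hypothetical helper, and overall_confidence is computed as the arithmetic mean of the per-question confidences. The source does not state how the block derives that value, although the mean of 0.94 and 0.87 is consistent with the 0.91 shown above after rounding.

```javascript
// Sketch only: combineResults is a hypothetical helper.
// Assumption: overall_confidence is the mean of the individual confidences.
function combineResults(documentPath, results) {
  if (results.length === 0) {
    throw new Error("at least one result is required");
  }
  // Keep only the fields shown in the multi-question output.
  const questionsAnswered = results.map(({ question, answer, confidence }) => ({
    question,
    answer,
    confidence,
  }));
  const total = questionsAnswered.reduce((sum, r) => sum + r.confidence, 0);
  return {
    document_path: documentPath,
    questions_answered: questionsAnswered,
    overall_confidence: total / questionsAnswered.length,
  };
}

const combined = combineResults("documents/report.pdf", [
  { question: "What is the revenue?", answer: "$2.5M", confidence: 0.94 },
  { question: "What is the profit margin?", answer: "15%", confidence: 0.87 },
]);
console.log(combined.overall_confidence.toFixed(3));
```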

Algorithm Details

Visual Question Answering

  • Best for: Questions about visual elements, layouts, charts
  • Features: Computer vision and image understanding
  • Use cases: "Is there a signature?", "What type of chart is shown?"

Text-based Q&A

  • Best for: Questions about textual content
  • Features: OCR and natural language processing
  • Use cases: "What is the invoice number?", "Who is the sender?"

Hybrid Q&A

  • Best for: Complex questions requiring both visual and textual understanding
  • Features: Combines computer vision and NLP
  • Use cases: "What type of document is this?", "Is this a valid contract?"

Structured Data Q&A

  • Best for: Questions about tables, forms, structured content
  • Features: Table recognition and structured data extraction
  • Use cases: "What is the value in row 3?", "How many items are listed?"
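
The "Best for" descriptions above can be turned into a rough pre-selection step. The sketch below is only a heuristic illustration: suggestAlgorithm and its keyword lists are hypothetical, derived from the example questions in this section, and the plain substring matching will misfire on some inputs (for example, "row" also matches "borrower").

```javascript
// Sketch only: suggestAlgorithm and the keyword lists are hypothetical.
// Rules are checked in order; the first match wins.
const ALGORITHM_RULES = [
  { algorithm: "Structured Data Q&A", keywords: ["row", "column", "table", "how many items"] },
  { algorithm: "Visual Question Answering", keywords: ["signature", "chart", "logo", "layout"] },
  { algorithm: "Hybrid Q&A", keywords: ["type of document", "valid"] },
];

function suggestAlgorithm(question) {
  const text = question.toLowerCase();
  for (const rule of ALGORITHM_RULES) {
    // Crude substring match; a real flow would need something more robust.
    if (rule.keywords.some((keyword) => text.includes(keyword))) {
      return rule.algorithm;
    }
  }
  // Plain textual questions fall through to the text-based algorithm.
  return "Text-based Q&A";
}

console.log(suggestAlgorithm("Is there a signature?"));
console.log(suggestAlgorithm("What is the invoice number?"));
```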

Tips for Best Results

Question Formulation

  • Be specific: Ask for a single fact per question, such as a total, a date, or a name
  • Use natural language: Write questions as you would ask a person
  • Provide context: Supply msg.payload.context when the document type or period matters
  • Avoid ambiguity: Name the field you want when the document contains several similar values

Document Quality

  • High resolution: Use clear, high-quality document images
  • Good contrast: Ensure text is clearly readable
  • Complete documents: Include all relevant pages
  • Proper orientation: Ensure documents are correctly oriented

Configuration

  • Choose appropriate algorithm: Select based on question type
  • Set confidence thresholds: Adjust based on accuracy requirements
  • Include evidence: Enable Include Evidence to see which text or region supports each answer
  • Use the structured format: Choose it for complex answers that require details

Common Issues and Solutions

Low Answer Confidence

  • Issue: Answers with low confidence scores
  • Solution: Improve document quality, rephrase questions, provide more context

Incorrect Answers

  • Issue: Wrong or irrelevant answers
  • Solution: Use more specific questions, check document quality, try different algorithms

No Answer Found

  • Issue: System cannot find relevant information
  • Solution: Verify document contains the information, try different question phrasing

Slow Processing

  • Issue: Long processing times
  • Solution: Use simpler algorithms, optimize document size, process in batches
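
The checks above can be partly automated on the block's output. The sketch below is an illustration under assumptions: diagnoseResult is a hypothetical helper, an empty or missing answer is taken to mean "no answer found", processing_time is assumed to be in seconds, and the 5-second limit is an arbitrary example value. Incorrect answers cannot be detected this way and still need human review.

```javascript
// Sketch only: diagnoseResult is a hypothetical helper.
// Assumptions: an empty answer means "no answer found", and
// processing_time is measured in seconds.
function diagnoseResult(result, limits = {}) {
  const { minConfidence = 0.7, maxSeconds = 5 } = limits;
  const suggestions = [];

  const answer = typeof result.answer === "string" ? result.answer.trim() : "";
  if (answer === "") {
    suggestions.push(
      "No answer found: verify the document contains the information, or rephrase the question"
    );
  } else if (typeof result.confidence === "number" && result.confidence < minConfidence) {
    suggestions.push(
      "Low confidence: improve document quality, rephrase the question, or provide more context"
    );
  }

  if (typeof result.processing_time === "number" && result.processing_time > maxSeconds) {
    suggestions.push(
      "Slow processing: use a simpler algorithm, reduce document size, or process in batches"
    );
  }
  return suggestions;
}

console.log(diagnoseResult({ answer: "$1,250.00", confidence: 0.55, processing_time: 2.1 }));
```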