Document Question Answering
Answer questions about documents (images) using various AI algorithms and document understanding techniques.
Document Question Answering Block
The Document Question Answering block combines document understanding, OCR, and natural language processing to answer questions about the content of documents (images).
Overview
The Document Question Answering block enables interactive querying of document content. It can understand both visual and textual elements of documents to answer questions about their content, structure, and information.
Configuration Options
Algorithm Selection
Select an algorithm from the Choose Algorithm dropdown:
- Visual Question Answering: Answer questions based on visual content
- Text-based Q&A: Answer questions using extracted text content
- Hybrid Q&A: Combine visual and textual understanding
- Structured Data Q&A: Answer questions about tables and structured content
- Multi-modal Q&A: Advanced multi-modal document understanding
Input Configuration
Document Input
- Property: msg.payload.document_path
- Type: string
- Description: Path to the document image file
- Supported formats: .png, .jpg, .jpeg, .pdf, .tiff
Question Input
- Property: msg.payload.question
- Type: string
- Description: The question to ask about the document
- Examples:
- "What is the total amount on this invoice?"
- "Who is the recipient of this letter?"
- "What date is shown on this document?"
Context (Optional)
- Property: msg.payload.context
- Type: string
- Description: Additional context to help with answering
- Example: "This is a financial document from 2023"
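The three input properties above can be assembled and checked before the message reaches the block. The following is a minimal sketch, assuming messages are plain JavaScript objects as the msg.payload notation suggests. The helper name buildQaMessage is hypothetical and not part of the block, and the case-insensitive extension check is an assumption about how the supported formats are matched.

```javascript
// Sketch only: buildQaMessage is a hypothetical helper, not part of the block.
// The format list is the one documented under "Document Input".
const SUPPORTED_FORMATS = [".png", ".jpg", ".jpeg", ".pdf", ".tiff"];

function buildQaMessage(documentPath, question, context) {
  if (typeof documentPath !== "string" || documentPath.length === 0) {
    throw new TypeError("document_path must be a non-empty string");
  }
  if (typeof question !== "string" || question.trim().length === 0) {
    throw new TypeError("question must be a non-empty string");
  }
  // Assumption: extensions are compared case-insensitively.
  const dot = documentPath.lastIndexOf(".");
  const ext = dot === -1 ? "" : documentPath.slice(dot).toLowerCase();
  if (!SUPPORTED_FORMATS.includes(ext)) {
    throw new RangeError(`unsupported document format: "${ext}"`);
  }
  const payload = { document_path: documentPath, question: question.trim() };
  // context is optional, so it is attached only when provided.
  if (typeof context === "string" && context.length > 0) {
    payload.context = context;
  }
  return { payload };
}
```

Calling buildQaMessage("scans/contract.png", "Is there a signature on this document?", "This is a legal contract") returns an object with the same shape as the input messages shown under Common Patterns.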
Processing Options
Answer Confidence
- Type: number
- Range: 0.0 to 1.0
- Default: 0.7
- Description: Minimum confidence threshold for answers
Answer Format
- Type: string
- Options: ["short", "detailed", "structured"]
- Default: "short"
- Description: Format of the answer output
Include Evidence
- Type: boolean
- Default: true
- Description: Include supporting evidence with answers
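The documented defaults and ranges can be captured in a small validation step. This is a sketch under stated assumptions: the key names answerConfidence, answerFormat, and includeEvidence are hypothetical, since the source names the options only by their labels, and the threshold check reflects one plausible reading of "minimum confidence threshold" (answers at or above the value are kept).

```javascript
// Sketch only: the option key names are hypothetical; the defaults and
// allowed values are the ones listed under "Processing Options".
const DEFAULT_OPTIONS = {
  answerConfidence: 0.7,
  answerFormat: "short",
  includeEvidence: true,
};
const ANSWER_FORMATS = ["short", "detailed", "structured"];

function normalizeOptions(options = {}) {
  // Caller-supplied values override the documented defaults.
  const merged = { ...DEFAULT_OPTIONS, ...options };
  const c = merged.answerConfidence;
  if (typeof c !== "number" || Number.isNaN(c) || c < 0 || c > 1) {
    throw new RangeError("answerConfidence must be a number between 0.0 and 1.0");
  }
  if (!ANSWER_FORMATS.includes(merged.answerFormat)) {
    throw new RangeError(`answerFormat must be one of: ${ANSWER_FORMATS.join(", ")}`);
  }
  if (typeof merged.includeEvidence !== "boolean") {
    throw new TypeError("includeEvidence must be a boolean");
  }
  return merged;
}

// Assumption: an answer passes when its confidence is at least the minimum.
function meetsThreshold(result, options) {
  return result.confidence >= options.answerConfidence;
}
```

Validating options up front keeps a mistyped format or an out-of-range threshold from surfacing later as a confusing answer.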
Use Cases
Invoice Processing
Answer questions about invoice details:
Invoice image → Document Q&A → "What is the total amount?" → Answer: "$1,250.00"
Contract Analysis
Extract specific information from contracts:
Contract document → Document Q&A → "What is the contract end date?" → Answer: "December 31, 2024"
Report Analysis
Query financial or business reports:
Report document → Document Q&A → "What was the revenue for Q3?" → Answer: "$2.5M"
Form Processing
Extract information from forms:
Form image → Document Q&A → "What is the applicant's name?" → Answer: "John Smith"
Common Patterns
Basic Question Answering
// Configuration
// Algorithm: Text-based Q&A
// Answer Format: short
// Input message:
{
  "payload": {
    "document_path": "documents/invoice_001.pdf",
    "question": "What is the invoice number?"
  }
}
// Example flow:
// inject → Document Q&A → debug (answer)
Visual Question Answering
// Configuration
// Algorithm: Visual Question Answering
// Include Evidence: true
// Input message:
{
  "payload": {
    "document_path": "scans/contract.png",
    "question": "Is there a signature on this document?",
    "context": "This is a legal contract"
  }
}
// Example flow:
// inject → Document Q&A → debug (visual answer)
Multi-question Processing
// Configuration
// Algorithm: Hybrid Q&A
// Answer Format: structured
// Process multiple questions about the same document
// Example flow:
// document → Document Q&A (question 1) → Document Q&A (question 2) → combined results
Advanced Features
Structured Answer Format
When using structured answer format:
{
  "question": "What is the total amount on this invoice?",
  "answer": "$1,250.00",
  "confidence": 0.95,
  "evidence": {
    "text_evidence": "Total Amount: $1,250.00",
    "location": {
      "page": 1,
      "coordinates": [100, 200, 300, 250]
    }
  },
  "supporting_facts": ["Subtotal: $1,000.00", "Tax: $250.00"]
}
Multi-modal Understanding
Combines visual and textual analysis:
{
  "question": "What type of document is this?",
  "answer": "This is an invoice from ABC Company",
  "confidence": 0.92,
  "analysis": {
    "visual_features": {
      "document_type": "invoice",
      "company_logo": "ABC Company",
      "layout_type": "standard_invoice"
    },
    "textual_features": {
      "keywords": ["invoice", "payment", "due date"],
      "entities": ["ABC Company", "$1,250.00", "2024-01-15"]
    }
  }
}
Evidence-based Answers
Include supporting evidence:
{
  "question": "Who is the recipient?",
  "answer": "John Smith",
  "confidence": 0.88,
  "evidence": {
    "source_text": "Bill To: John Smith",
    "confidence": 0.88,
    "location": "top right section"
  },
  "alternative_answers": [
    {
      "answer": "J. Smith",
      "confidence": 0.75,
      "evidence": "Signature line"
    }
  ]
}
Output Structure
Basic Answer
{
  "document_path": "documents/invoice_001.pdf",
  "question": "What is the total amount?",
  "answer": "$1,250.00",
  "confidence": 0.95,
  "algorithm_used": "Text-based Q&A",
  "processing_time": 2.1,
  "timestamp": "2024-01-15T10:30:00Z"
}
Detailed Answer with Evidence
{
  "document_path": "documents/contract.pdf",
  "question": "What is the contract duration?",
  "answer": "12 months starting from January 1, 2024",
  "confidence": 0.92,
  "evidence": {
    "text_evidence": "Contract Term: 12 months commencing January 1, 2024",
    "location": {
      "page": 2,
      "section": "Terms and Conditions",
      "coordinates": [150, 300, 400, 350]
    }
  },
  "supporting_information": [
    "Start Date: January 1, 2024",
    "End Date: December 31, 2024",
    "Duration: 12 months"
  ],
  "algorithm_used": "Hybrid Q&A"
}
Multi-question Results
{
  "document_path": "documents/report.pdf",
  "questions_answered": [
    {
      "question": "What is the revenue?",
      "answer": "$2.5M",
      "confidence": 0.94
    },
    {
      "question": "What is the profit margin?",
      "answer": "15%",
      "confidence": 0.87
    }
  ],
  "overall_confidence": 0.91,
  "processing_time": 4.2
}
Algorithm Details
Visual Question Answering
- Best for: Questions about visual elements, layouts, charts
- Features: Computer vision and image understanding
- Use cases: "Is there a signature?", "What type of chart is shown?"
Text-based Q&A
- Best for: Questions about textual content
- Features: OCR and natural language processing
- Use cases: "What is the invoice number?", "Who is the sender?"
Hybrid Q&A
- Best for: Complex questions requiring both visual and textual understanding
- Features: Combines computer vision and NLP
- Use cases: "What type of document is this?", "Is this a valid contract?"
Structured Data Q&A
- Best for: Questions about tables, forms, structured content
- Features: Table recognition and structured data extraction
- Use cases: "What is the value in row 3?", "How many items are listed?"
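The "Best for" notes above amount to a lookup from the kind of question to an algorithm. A minimal sketch of that lookup follows. The kind labels (visual, text, mixed, table) are hypothetical names introduced for illustration, while the algorithm names are the dropdown entries described above. Multi-modal Q&A is left out because this section gives no guidance on when to prefer it.

```javascript
// Sketch only: the "kind" labels are hypothetical; the algorithm names are
// the entries listed under "Algorithm Selection".
const ALGORITHM_BY_KIND = {
  visual: "Visual Question Answering", // signatures, layouts, charts
  text: "Text-based Q&A",              // invoice numbers, senders
  mixed: "Hybrid Q&A",                 // needs both image and text
  table: "Structured Data Q&A",        // rows, forms, item counts
};

function pickAlgorithm(kind) {
  // Own-property check so inherited keys such as "toString" are rejected.
  if (!Object.prototype.hasOwnProperty.call(ALGORITHM_BY_KIND, kind)) {
    throw new RangeError(`unknown question kind: ${kind}`);
  }
  return ALGORITHM_BY_KIND[kind];
}
```

A table keeps the choice explicit and easy to review, which matters more here than any automatic classification of the question text.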
Tips for Best Results
Question Formulation
- Be specific: Ask clear, specific questions
- Use natural language: Write questions as you would ask a human
- Provide context: Include relevant context when helpful
- Avoid ambiguous questions: Be clear about what information you need
Document Quality
- High resolution: Use clear, high-quality document images
- Good contrast: Ensure text is clearly readable
- Complete documents: Include all relevant pages
- Proper orientation: Ensure documents are correctly oriented
Configuration
- Choose appropriate algorithm: Select based on question type
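- Set confidence thresholds: Raise Answer Confidence above the 0.7 default when wrong answers are costly; lower it when a tentative answer is acceptable
- Include evidence: Enable evidence to see which text and location support each answer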
- Use structured format: For complex answers requiring details
Common Issues and Solutions
Low Answer Confidence
- Issue: Answers with low confidence scores
- Solution: Improve document quality, rephrase questions, provide more context
Incorrect Answers
- Issue: Wrong or irrelevant answers
- Solution: Use more specific questions, check document quality, try different algorithms
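The advice to try different algorithms can be sketched as a fallback loop. This is an illustration under stated assumptions: ask is a hypothetical function standing in for one run of the block with a given algorithm, and results are assumed to carry the answer and confidence fields shown in the output examples. It is written synchronously to keep the control flow visible; a real flow would more likely be asynchronous.

```javascript
// Sketch only: `ask` is a hypothetical (algorithm, question) => result
// function representing one run of the block; it is not a documented API.
function askWithFallback(ask, question, algorithms, minConfidence = 0.7) {
  let best = null;
  for (const algorithm of algorithms) {
    const result = ask(algorithm, question);
    // Treat a missing result or missing confidence as "no answer found".
    if (result == null || typeof result.confidence !== "number") continue;
    if (best === null || result.confidence > best.confidence) {
      best = { ...result, algorithm_used: algorithm };
    }
    // Stop early once an answer clears the threshold.
    if (best.confidence >= minConfidence) break;
  }
  // Return null when no algorithm produced a sufficiently confident answer.
  return best !== null && best.confidence >= minConfidence ? best : null;
}
```

Listing the cheaper algorithm first, for example Text-based Q&A before Hybrid Q&A, means the more expensive run happens only when the first answer falls below the threshold.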
No Answer Found
- Issue: System cannot find relevant information
- Solution: Verify document contains the information, try different question phrasing
Slow Processing
- Issue: Long processing times
- Solution: Use simpler algorithms, optimize document size, process in batches
Related Blocks
- OCR - For text extraction before Q&A
- Document Understander - For document analysis
- LLM Query - For advanced language understanding
- Table Structure Recognition - For table-based Q&A
- Document Classifier - For document type classification