Document Classifier
Automatically classify documents into predefined categories using AI
Document Classifier Block
The Document Classifier block automatically categorizes documents into predefined types using AI. It analyzes document content, layout, and visual features to determine the document type, making it essential for automated document processing workflows.
Overview
The Document Classifier uses machine learning models to identify document types such as invoices, contracts, receipts, forms, reports, and more. It can process both text content and visual document features to provide accurate classification with confidence scores.
Configuration Options
Input Source
- Document Source: Select the message payload, a file upload, or a specific message property
- Input Type:
  - Document file (PDF, images)
  - Extracted text
  - Combined (text + visual features)
Classification Settings
- Model Type: Choose classification model
  - Standard Document Types
  - Custom Trained Model
  - Industry-specific Models
- Confidence Threshold: Minimum confidence for classification (0.0 - 1.0)
- Return Top N: Number of top predictions to return
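As a rough illustration of how Confidence Threshold and Return Top N might combine, the sketch below filters a list of predictions and then keeps the best few. The helper name `applyClassificationSettings` and its option names are hypothetical, and the order of operations (filter first, then cut to N) is an assumption; the block's actual internals are not described here.

```javascript
// Hypothetical helper, not part of the block's API: shows one way the
// Confidence Threshold and Return Top N settings could interact.
function applyClassificationSettings(predictions, { confidenceThreshold = 0.5, topN = 3 } = {}) {
  return predictions
    .filter((p) => p.confidence >= confidenceThreshold) // drop predictions below the threshold
    .sort((a, b) => b.confidence - a.confidence)        // highest confidence first
    .slice(0, topN);                                    // keep at most N predictions
}

const predictions = [
  { type: "receipt", confidence: 0.78, category: "financial" },
  { type: "invoice", confidence: 0.94, category: "financial" },
  { type: "contract", confidence: 0.23, category: "legal" },
];

const kept = applyClassificationSettings(predictions, { confidenceThreshold: 0.5, topN: 2 });
console.log(kept.map((p) => p.type)); // [ 'invoice', 'receipt' ]
```

Because `filter` returns a new array, the original `predictions` list is left untouched.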
Document Categories
Common document types supported:
- Financial: Invoices, receipts, bank statements
- Legal: Contracts, agreements, legal documents
- Forms: Applications, surveys, questionnaires
- Reports: Business reports, analysis documents
- Correspondence: Letters, emails, memos
- Identification: IDs, passports, licenses
Output Options
- Single Prediction: Return only the top prediction
- Multiple Predictions: Return ranked list of predictions
- Include Confidence: Include confidence scores
- Include Features: Return extracted features used for classification
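Downstream logic often has to cope with either output shape. The sketch below is one possible way to normalize both into a ranked array; `toRankedPredictions` is a hypothetical helper, and the field names `classification` and `predictions` are taken from the Single Prediction and Multiple Predictions examples under Output Message Format.

```javascript
// Hypothetical helper: converts either output shape into a ranked array.
function toRankedPredictions(msg) {
  const payload = (msg && msg.payload) || {};
  if (Array.isArray(payload.predictions)) {
    // Multiple Predictions: copy, then sort so the best match comes first.
    return [...payload.predictions].sort((a, b) => b.confidence - a.confidence);
  }
  if (payload.classification) {
    // Single Prediction: wrap the single result in an array.
    return [payload.classification];
  }
  return []; // nothing usable, for example an error payload
}

const ranked = toRankedPredictions({
  payload: { classification: { type: "invoice", confidence: 0.94, category: "financial" } },
});
console.log(ranked.length); // 1
```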
Input Message Format
Document File Input
{
  payload: /* document buffer */,
  filename: "document.pdf",
  mimetype: "application/pdf"
}
Text Input
{
  payload: {
    text: "Invoice from ABC Company...",
    metadata: {
      pages: 2,
      words: 350
    }
  }
}
Combined Input
{
  payload: {
    text: "Document text content",
    image: /* document image buffer */,
    layout: /* layout information */
  }
}
Output Message Format
Single Prediction
{
  payload: {
    classification: {
      type: "invoice",
      confidence: 0.94,
      category: "financial"
    },
    input_info: {
      pages: 1,
      text_length: 245,
      has_tables: true
    }
  }
}
Multiple Predictions
{
  payload: {
    predictions: [
      {
        type: "invoice",
        confidence: 0.94,
        category: "financial"
      },
      {
        type: "receipt",
        confidence: 0.78,
        category: "financial"
      },
      {
        type: "contract",
        confidence: 0.23,
        category: "legal"
      }
    ],
    top_prediction: "invoice",
    features: {
      has_company_header: true,
      has_line_items: true,
      has_totals: true,
      layout_type: "structured"
    }
  }
}
Document Types
Financial Documents
- Invoices: Bills and billing documents
- Receipts: Purchase receipts and proof of payment
- Bank Statements: Financial account statements
- Purchase Orders: Procurement documents
- Credit Notes: Credit and refund documents
Legal Documents
- Contracts: Legal agreements and contracts
- Terms of Service: Service agreements
- Privacy Policies: Data protection documents
- Legal Notices: Official legal communications
Business Forms
- Applications: Various application forms
- Surveys: Questionnaires and feedback forms
- Registration Forms: Sign-up and registration documents
- Compliance Forms: Regulatory and compliance documents
Reports & Analysis
- Business Reports: Analytical reports
- Financial Reports: Financial analysis documents
- Research Papers: Academic and research documents
- Presentations: Slide presentations and summaries
Common Use Cases
Automated Document Routing
Route different document types to appropriate processing workflows:
Document Input → Document Classifier → Switch Node → Type-specific Processing
Invoice Processing System
Identify invoices for specialized processing:
File Upload → Document Classifier → Invoice Processor → Entity Extraction
Multi-type Document Processing
Handle mixed document batches:
Batch Upload → Document Classifier → Route by Type → Process Accordingly
Document Validation
Verify document types match expected categories:
Expected Type → Document Classifier → Type Validation → Process or Reject
Best Practices
- Quality Input: Provide clear, well-scanned documents for better accuracy
- Confidence Thresholds: Set appropriate confidence levels for your use case
- Fallback Handling: Handle uncertain classifications gracefully
- Type-specific Routing: Use classification results to route to specialized processors
- Model Selection: Choose appropriate models for your document domain
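The fallback and routing practices above could be sketched as a small lookup with a default. The route names are plain strings borrowed from the Smart Document Router flow in this document; `ROUTES`, `FALLBACK_ROUTE`, and `chooseRoute` are hypothetical names for illustration, and the default minimum confidence of 0.5 is an assumption.

```javascript
// Hypothetical routing table. The values are labels only,
// not real block identifiers.
const ROUTES = new Map([
  ["invoice", "Invoice Processor"],
  ["contract", "Contract Analyzer"],
  ["receipt", "Receipt Processor"],
]);
const FALLBACK_ROUTE = "Manual Review Queue";

function chooseRoute(classification, minConfidence = 0.5) {
  // Missing or malformed results go straight to the fallback.
  if (!classification || typeof classification.confidence !== "number") {
    return FALLBACK_ROUTE;
  }
  // Uncertain classifications are not trusted for automated routing.
  if (classification.confidence < minConfidence) {
    return FALLBACK_ROUTE;
  }
  // Unknown types also fall back instead of failing.
  return ROUTES.get(classification.type) ?? FALLBACK_ROUTE;
}

console.log(chooseRoute({ type: "invoice", confidence: 0.94 })); // Invoice Processor
console.log(chooseRoute({ type: "invoice", confidence: 0.3 }));  // Manual Review Queue
```

Keeping the fallback as the single default path means a new or misspelled type never leaves a document unrouted.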
Integration Patterns
With Document Processing Pipeline
Document → Classifier → OCR (if needed) → Entity Extractor → Data Validation
With Conditional Processing
Document → Classifier → Switch → [Invoice Flow | Contract Flow | Form Flow]
With Quality Assurance
Document → Classifier → Confidence Check → Human Review (if low confidence)
With Batch Processing
Document Batch → Array Loop → Classifier → Group by Type → Process Batches
Confidence Score Interpretation
High Confidence (above 0.8)
- Very reliable classification
- Proceed with automated processing
- Document clearly matches known patterns
Medium Confidence (0.5 to 0.8)
- Reasonable classification
- Consider additional validation
- May benefit from human review
Low Confidence (below 0.5)
- Uncertain classification
- Recommend human review
- Document may be an edge case or a new type
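The three bands can be expressed as a small function. This is a minimal sketch: `confidenceBand` is a hypothetical name, and the boundary handling (exactly 0.8 and exactly 0.5 both count as medium) follows the `>0.8` and `<0.5` cut-offs used in the Quality-based Processing flow in this document.

```javascript
// Hypothetical helper: maps a confidence score to the band names used above.
function confidenceBand(confidence) {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError("confidence must be a number between 0.0 and 1.0");
  }
  if (confidence > 0.8) return "high";    // proceed with automated processing
  if (confidence >= 0.5) return "medium"; // consider additional validation
  return "low";                           // recommend human review
}

console.log(confidenceBand(0.94)); // high
console.log(confidenceBand(0.78)); // medium
console.log(confidenceBand(0.23)); // low
```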
Error Handling
Common Issues
Unrecognized Document Types
- Document type not in training data
- Poor quality or corrupted document
- Mixed or composite document types
Low Confidence Scores
- Unusual document layout
- Poor image quality
- Ambiguous document content
Processing Errors
- Unsupported file format
- Corrupted or encrypted documents
- Oversized documents
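Some of the processing errors listed above can be caught before a document reaches the classifier. The sketch below shows one possible pre-check. Everything specific in it is an assumption made for illustration: the MIME type list (the documentation only says "PDF, images"), the 20 MiB size limit, the problem codes, and the `precheckDocument` name.

```javascript
// Assumed values for illustration only; the block's real limits may differ.
const ASSUMED_MIME_TYPES = new Set(["application/pdf", "image/png", "image/jpeg", "image/tiff"]);
const ASSUMED_MAX_BYTES = 20 * 1024 * 1024; // 20 MiB, an arbitrary example limit

// Hypothetical pre-check for a Document File Input message.
function precheckDocument(msg) {
  const problems = [];
  const data = msg && msg.payload;
  // Buffers and typed arrays both expose their byte count as .length.
  const size = data && typeof data.length === "number" ? data.length : 0;
  if (size === 0) problems.push("empty_payload");
  if (size > ASSUMED_MAX_BYTES) problems.push("oversized_document");
  if (!ASSUMED_MIME_TYPES.has(msg && msg.mimetype)) problems.push("unsupported_format");
  return { ok: problems.length === 0, problems };
}

const check = precheckDocument({
  payload: new Uint8Array(1024),
  filename: "document.pdf",
  mimetype: "application/pdf",
});
console.log(check.ok); // true
```

A check like this cannot detect corrupted or encrypted files; those still need to be handled when classification itself fails.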
Solutions
// Error handling flow
{
  payload: {
    error: "classification_failed",
    message: "Document type could not be determined",
    confidence: 0.12,
    suggestions: ["manual_review", "image_enhancement"]
  }
}
Flow Examples
Smart Document Router
Upload → Document Classifier → Switch:
├─ invoice → Invoice Processor
├─ contract → Contract Analyzer
├─ receipt → Receipt Processor
└─ unknown → Manual Review Queue
Quality-based Processing
Document → Classifier → Confidence Check:
├─ High (>0.8) → Auto Process
├─ Medium (0.5-0.8) → Quick Review
└─ Low (<0.5) → Full Review
Multi-step Validation
Document → Classifier → Expected Type Check → Process if Match
                                            → Flag if Mismatch
Performance Considerations
- Document Size: Larger documents take longer to classify
- Model Complexity: More sophisticated models are more accurate but slower to run
- Batch Size: Process documents individually when results are needed in real time; batch them when throughput matters more than latency
- Caching: Cache results for frequently processed document types
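One reading of the caching advice is to reuse a result when the exact same file is submitted again. The sketch below assumes that reading and keys an in-memory cache on a SHA-256 hash of the document bytes, using Node.js's built-in `crypto` module. `createCachedClassifier` and the `classify` callback are hypothetical; `classify` stands in for whatever actually invokes the classifier.

```javascript
const { createHash } = require("crypto");

// Hypothetical wrapper: caches classification results by content hash.
function createCachedClassifier(classify, maxEntries = 1000) {
  const cache = new Map();
  return async function cachedClassify(buffer) {
    const key = createHash("sha256").update(buffer).digest("hex");
    if (cache.has(key)) return cache.get(key); // same bytes seen before
    const result = await classify(buffer);
    // A Map iterates in insertion order, so the first key is the oldest entry.
    if (cache.size >= maxEntries) cache.delete(cache.keys().next().value);
    cache.set(key, result);
    return result;
  };
}
```

Hashing the content rather than the filename means a renamed copy of the same file still hits the cache, while an edited file with the same name does not.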
Tips
- Start with broad categories and refine based on results
- Use debug blocks to examine classification features
- Monitor confidence distributions to tune thresholds
- Combine with other blocks for comprehensive document understanding
- Consider custom training for domain-specific document types
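To monitor confidence distributions, one simple option is to bucket observed scores into a histogram and see where most documents land. The sketch below is illustrative; `confidenceHistogram` is a hypothetical helper, and how the scores are collected from your flows is left out.

```javascript
// Hypothetical helper: counts confidence scores into equal-width bins over [0, 1].
function confidenceHistogram(confidences, binCount = 10) {
  const bins = new Array(binCount).fill(0);
  for (const c of confidences) {
    if (!Number.isFinite(c) || c < 0 || c > 1) continue; // skip invalid scores
    // A score of exactly 1.0 belongs in the last bin.
    const index = Math.min(Math.floor(c * binCount), binCount - 1);
    bins[index] += 1;
  }
  return bins;
}

// Five bins of width 0.2: [0-0.2), [0.2-0.4), [0.4-0.6), [0.6-0.8), [0.8-1.0]
console.log(confidenceHistogram([0.94, 0.78, 0.23, 0.12, 1.0], 5)); // [ 1, 1, 0, 1, 2 ]
```

If many documents cluster just below the configured threshold, that is a hint the threshold or the chosen model may need revisiting.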
Combine the Document Classifier with OCR for text extraction and with Entity Extractor for data extraction workflows.