BlocksLLM Guard Blocks
LLM Judge
Evaluates text with three modes: input scanning, output scanning, and response evaluation using Large Language Models.
Quick Start
To get started:
- Choose a mode from Choose Operation
- Provide input fields for that mode
- Receive evaluation results in
msg.payload
Configuration
Model to use (required)
Select an LLM model for judging/evaluation.
Input by Mode
Input Scanners
msg.payload.user_input(string) - text to scanmsg.payload.ban_substrings(array, optional) - required when ban-substring scanning is enabled
Output Scanners
msg.payload.model_output(string) - model response to scanmsg.payload.ban_substrings(array, optional) - required when ban-substring scanning is enabled
Judge Metric
msg.payload.user_input(string) - original user querymsg.payload.model_output(string) - model response to evaluatemsg.payload.retrieval_context(array, optional) - retrieved context passagesmsg.payload.rubric(array, optional) - list of rubric items, each withscore_rangeandinstructionmsg.payload.evaluation_steps(array, optional) - ordered evaluation steps
Output by Mode
msg.payload contains an output field with the results.
Input Scanners / Output Scanners
msg.payload.output is an object with:
is_valid(boolean)scan_results(object)
Judge Metric
msg.payload.output is an object with:
status(string)evaluation_results(object)
Example
Input (msg.payload)
{
"user_input": "Summarize the contract terms.",
"model_output": "The contract lasts for 12 months and renews automatically.",
"retrieval_context": ["Contract duration is 12 months with auto-renewal."]
}Output (msg.payload)
{
"output": {
"status": "completed",
"evaluation_results": {
"g_eval": { "score": 0.78, "reason": "Relevant and consistent." },
"context_relevancy": { "score": 0.92, "reason": "Matches retrieved context." }
}
}
}Errors
When the block fails, it raises an error. Use a Catch block in your flow to handle failures and inspect the error payload.
Common mistakes
- Missing required field: Provide
user_inputormodel_outputbased on the selected mode. - Invalid type: Ensure input fields are strings and arrays are arrays.
Best Practices
- Provide clear evaluation criteria for consistent results
- Use appropriate models based on evaluation complexity
- Test with known examples to calibrate expectations
- Use LLM Judge for quality assurance in content generation workflows
- Combine with human review for critical evaluations
- Document evaluation criteria for reproducibility
- Monitor evaluation consistency over time