LLM Query v2

Allows you to interact with various Large Language Models (LLMs) to generate text-based responses for a wide range of tasks. It supports multiple providers including OpenAI, Azure OpenAI, Anyscale, Vertex AI, and In-House Hosted models with provider-specific configurations.

Basic Configuration

Note: Field names vary by provider. See the Provider-Specific Details section below for exact requirements.

Model to Use (string)

Select the specific model you want to use from the available options.

Query (string)

The main question or input you want answered. Set via msg.payload.query. Note: Used by OpenAI, Azure OpenAI, and Anyscale. In-House Hosted and Vertex AI use prompt instead.

Prompt / System Prompt (string)

Instructions or context that guide the model's behavior (e.g., tone, rules, output format).

  • In-House Hosted and Vertex AI: Use msg.payload.prompt
  • OpenAI, Azure OpenAI, and Anyscale: Use msg.payload.system_prompt
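The field-name split above can be handled with a small helper in an upstream Function block. A minimal sketch, assuming illustrative provider labels (the block itself does not define these strings):

```javascript
// Sketch: pick the right payload field for the selected provider.
// The provider labels below are illustrative, not values defined by the block.
function buildPayload(provider, text) {
  const usesQuery = ["openai", "azure-openai", "anyscale"].includes(provider);
  const field = usesQuery ? "query" : "prompt"; // In-House Hosted / Vertex AI use "prompt"
  return { [field]: text };
}

// e.g. in a Function block feeding LLM Query v2:
// msg.payload = buildPayload("openai", "What is the warranty period?");
```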

A basic guide for creating prompts:

  • Define the model's role and task. For example: "You are a helpful assistant who knows about instruction manuals for parts. Your task is to answer the user's query from the provided context."
  • Break the task into direct points, stated as headings or rules, for example "You will be given a context to answer from" or "If you are not confident about the answer, say 'I don't know'."
  • Specify a clear output format, for example "Answer as a list of points" or "Generate a JSON object with XYZ keys."
  • If the answers are still off, add explicit warnings such as "Make sure the answer comes from the context; do not hallucinate."
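The guidelines above can be combined into a single system prompt. The wording below is only an illustration of that structure, not a prompt shipped with the block:

```javascript
// Illustrative system prompt following the structure above:
// role + task, direct rules, output format, and a closing warning.
const systemPrompt = [
  "You are a helpful assistant who knows about instruction manuals for parts.",
  "Your task is to answer the user's query from the provided context.",
  "Rules:",
  "- You will be given a context; answer only from it.",
  "- If you are not confident about the answer, say \"I don't know\".",
  "Output format: answer as a short list of points.",
  "Make sure the answer comes from the context. Do not hallucinate."
].join("\n");
```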

Number of Responses (number)

Specify how many different responses you want the model to generate. Default: auto. Note: Only available for In-House Hosted models.

Max Output Tokens (number)

Set the maximum number of tokens in the generated response. Default: auto.

History Configuration (use these options to enable chat mode)

The following options are available in an expandable "History" section:

History Name (string)

Optional: A name to identify and manage conversation history.

History (array)

Optional: Previous conversation history to maintain context in multi-turn interactions. This is typically set via msg.payload.history.

Conversation ID (string)

Optional: A unique identifier for the conversation, useful for tracking multiple conversations. This is typically set via msg.payload.conversation_id.
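To keep context across turns, append each exchange to the history array before the next request. A minimal sketch using the { user_query, llm_response } item shape shown in the examples below:

```javascript
// Sketch: maintain multi-turn history in the { user_query, llm_response } shape.
function appendTurn(history, userQuery, llmResponse) {
  return [...history, { user_query: userQuery, llm_response: llmResponse }];
}

let history = [];
history = appendTurn(history, "Hello", "Hi! How can I help?");
// The next request then carries the context, e.g.:
// msg.payload = { prompt: "...", history, conversation_id: "conv-001" };
```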

Advanced Configuration

The following options are available in an expandable "Advanced Config" section:

Loading Type (string)

Choose the quantization level for model loading:

  • 4-bit: Least Infrastructure Requirements
  • 8-bit: Low Infrastructure Requirements
  • 16-bit: Moderate Infrastructure Requirements
  • 32-bit: High Infrastructure Requirements

Repetition Penalty (number)

Control how much the model avoids repeating the same phrases. Higher values reduce repetition. Default: auto.

Temperature (number)

Control the randomness of the output. Higher values (e.g., 0.8) make the output more random and creative, while lower values (e.g., 0.2) make it more focused and deterministic. Default: auto.

Top P (number)

Limits the pool of tokens considered at each step to the most likely ones until the sum of their probabilities reaches this value. Use it to balance quality and variety. Default: auto.

Top K (number)

Limits the pool of tokens considered at each step to the top K most likely options. Lower values make outputs more focused. Default: auto.

Provider-Specific Details

In‑House Hosted Models


Run models managed by your organization. Useful when you need full control over data and tuning.

Required in message: prompt (instructions or context that guide the model's behavior).

Key settings: Loading Type, Max Output Tokens, Number of Responses, Temperature, Repetition Penalty, Top P, Top K.

Chat history: Optional. Provide msg.payload.history to maintain context across turns. Note: History is not supported for trained models.

Auto defaults: If you leave temperature, repetition_penalty, top_p, top_k, number_of_responses, or max_output_tokens as auto, the system chooses sensible values for you.

Model Config: Set these in the block settings: Loading Type, Max Output Tokens, Number of Responses, Temperature, Repetition Penalty, Top P, Top K.

Example (In‑House):

{
  "payload": {
    "prompt": "You are a helpful assistant. Answer briefly. What is the warranty period?",
    "history": [
      {"user_query": "Hello", "llm_response": "Hi! How can I help?"}
    ]
  }
}

OpenAI


Two ways to use OpenAI:

  • Model: Send a system prompt and query to a chosen model. Supports images and documents as inputs.
  • Assistant: Talk to an existing assistant using its ID.

Where credentials are read from: Provide your OpenAI API key in msg.payload.api_key.

Where to put model configuration: Set generation options in the block settings for Model mode. Do not add model_config to the payload.

Model Config (per model): These depend on the provider/model documentation.

  • GPT‑4 family example:
{
  "temperature": 0.7,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0
}
  • GPT‑5 family example:
{
  "text": {"verbosity": "low"},
  "reasoning": {"effort": "minimal"}
}

When available, GPT‑5 style models accept only these keys.
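Because the accepted keys differ by model family, an upstream Function block can select the right shape before configuring the block. A hedged sketch; detecting the family from the model-name prefix is an assumption for illustration, not behavior the block guarantees:

```javascript
// Sketch: choose a model_config shape by model family.
// Prefix-based family detection is an illustrative assumption.
function configFor(modelName) {
  if (modelName.startsWith("gpt-5")) {
    // GPT-5 style models accept only these keys.
    return { text: { verbosity: "low" }, reasoning: { effort: "minimal" } };
  }
  // GPT-4 style models use sampling parameters.
  return { temperature: 0.7, presence_penalty: 0.0, frequency_penalty: 0.0 };
}
```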

Example (OpenAI - Model):

{
  "payload": {
    "system_prompt": "You write concise answers.",
    "query": "Summarize: <text here>",
    "api_key": "sk-...",
    "model_name": "gpt-4o-mini",
    "history": [],
    "image_path": "images/input.jpg",
    "document_path": "docs/manual.pdf"
  }
}

Use Temperature, Max Output Tokens, Presence/Frequency penalties in the block settings. Optional: Leave these as auto to let the service choose.

Example (OpenAI - Assistant):

{
  "payload": {
    "query": "What can you do for data cleanup?",
    "history": [],
    "assistant_id": "asst_123...",
    "api_key": "sk-..."
  }
}

Structured JSON output (Model mode):

{
  "payload": {
    "system_prompt": "Return only the requested JSON.",
    "query": "Summarize the text into the schema.",
    "api_key": "sk-...",
    "model_name": "gpt-4o-mini",
    "response_schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "points": {"type": "array", "items": {"type": "string"}}
      },
      "required": ["title", "points"]
    }
  }
}

Azure OpenAI

Use OpenAI models hosted on Azure.

Required in message: system_prompt, query, api_key, azure_endpoint (must start with https://), api_version.

Model choice: Provide either model_name or azure_deployment (at least one is required). If both are provided, model_name takes priority.
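The precedence rule above can be sketched as a small pre-flight check in a Function block. This mirrors the stated behavior (model_name wins, at least one required); the helper name is illustrative:

```javascript
// Sketch of the stated precedence: model_name takes priority over
// azure_deployment, and at least one of the two must be present.
function resolveAzureModel(payload) {
  const name = payload.model_name || payload.azure_deployment;
  if (!name) throw new Error("Provide model_name or azure_deployment");
  return name;
}
```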

Optional authentication: azure_ad_token or azure_ad_token_provider for Azure AD authentication.

Where credentials and URLs are read from: Provide api_key, azure_endpoint, and api_version in msg.payload. The block passes these to the service.

Where to put model configuration: Set generation options in the block settings. The block passes them as model_config to the service. Do not add model_config directly to the payload.

Example (Azure OpenAI):

{
  "payload": {
    "system_prompt": "Follow the style guide.",
    "query": "Draft a 3‑point summary for this note.",
    "history": [],
    "model_name": "gpt-4o-mini",
    "api_key": "az-...",
    "azure_endpoint": "https://<your-resource>.openai.azure.com/",
    "api_version": "2024-12-01-preview"
  }
}

You can also attach images/documents via image_path or document_path when your model supports them. Optional: Most generation parameters can be left as auto.

Anyscale

Use hosted models with options similar to OpenAI.

Required in message: system_prompt, query, api_key, model_name, base_url.

Key settings: Temperature, Max Output Tokens, Presence Penalty, Frequency Penalty, Chat Mode.

Chat mode: Set in block settings. Turn on to keep the tone conversational across turns.

Chat history: Optional. Provide msg.payload.history to maintain context across turns.

Where credentials are read from: Provide your Anyscale API key in msg.payload.api_key and base URL in msg.payload.base_url.

Example (Anyscale):

{
  "payload": {
    "system_prompt": "Be brief and friendly.",
    "query": "Explain the return policy.",
    "api_key": "anyscale-...",
    "model_name": "meta-llama/Llama-2-7b-chat-hf",
    "base_url": "https://api.endpoints.anyscale.com/v1",
    "history": []
  }
}

Vertex AI

Use Google's models with your project credentials.

Required in message: prompt (instructions or context), project, location, service_account_info (JSON object), and model_name.

Optional: history, image_path, document_path, and response_schema if you want JSON‑shaped output.

Where to put model configuration: Set generation options in the block settings. The block passes them as model_config to the service. Do not add model_config directly to the payload.

Files and Images with Vertex AI: Optional. Use image_path / document_path with a single path or a list of paths.

Example (Vertex AI):

{
  "payload": {
    "prompt": "Answer with actionable steps.",
    "model_name": "gemini-1.5-pro",
    "project": "my-gcp-project",
    "location": "us-central1",
    "service_account_info": {
      "type": "service_account",
      "client_email": "[email protected]",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
    },
    "history": [],
    "image_path": ["images/sample.jpg"],
    "document_path": ["docs/context.pdf"],
    "response_schema": {
      "type": "object",
      "properties": {"summary": {"type": "string"}},
      "required": ["summary"]
    }
  }
}

Files and Images

Attach inputs by passing file paths in your message:

  • image_path: Optional. A single path or a list of paths.
  • document_path: Optional. A single path or a list of paths.

Optional: Use absolute paths or ensure your storage directory is configured so relative paths can be resolved.
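Since both fields accept either a single path string or a list of paths, a small normalizer keeps downstream logic simple. A minimal sketch:

```javascript
// Sketch: normalize image_path / document_path, which may be
// a single path string or a list of paths, into a list.
function toPathList(value) {
  if (value == null) return [];
  return Array.isArray(value) ? value : [value];
}
```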

Example:

{
  "payload": {
    "system_prompt": "Describe what you see in the image.",
    "query": "What's in this picture?",
    "image_path": ["images/sample.jpg"],
    "document_path": ["docs/context.pdf"]
  }
}

Example

Input (msg.payload)

{
  "system_prompt": "You write concise answers.",
  "query": "What is the warranty period?",
  "history": []
}

Output (msg.payload)

{
  "output": {
    "response": "The warranty period is 12 months.",
    "input": "What is the warranty period?"
  }
}
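A downstream Function block can read the response from the output shape shown above. A minimal sketch:

```javascript
// Sketch: extract the generated text from the output shape above.
function extractResponse(msg) {
  const out = msg.payload && msg.payload.output;
  return out ? out.response : undefined;
}

const msg = {
  payload: {
    output: {
      response: "The warranty period is 12 months.",
      input: "What is the warranty period?"
    }
  }
};
const answer = extractResponse(msg);
```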

Errors

When the block fails, it raises an error. Use a Catch block in your flow to handle failures and inspect the error payload.

Common mistakes

  • Using the wrong message field for the provider: Some providers expect query, others expect prompt. Follow the provider section you selected.
  • Missing credentials: Provide required keys/endpoints via the message fields shown in the provider section.
  • History shape mismatch: If you pass history, keep it as a list of { user_query, llm_response } items.
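The history-shape mistake above can be caught before sending. A minimal validation sketch:

```javascript
// Sketch: guard against the history-shape mismatch described above.
// history must be a list of { user_query, llm_response } items.
function isValidHistory(history) {
  return Array.isArray(history) && history.every(
    (t) => t && typeof t.user_query === "string" && typeof t.llm_response === "string"
  );
}
```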

Output

msg.payload contains an output field with the generated text response(s) from the selected language model.

Tips for Best Results

  • Be clear and specific in your prompts; include desired format and constraints.
  • Optional: Use History and Conversation ID to keep context across turns.
  • Tune Model Config in the block settings. Match keys to the chosen model's documentation (e.g., GPT‑4 uses temperature/penalties; GPT‑5 uses text.verbosity and reasoning.effort).
  • Start with defaults; change one parameter at a time to see impact.
  • Optional: When sending images/documents, ensure paths are accessible and within allowed size limits.

Note

The performance and capabilities of the LLM Query block may vary depending on the selected model and configuration. Always review and validate the generated outputs, especially for critical applications.