Entity Extractor

Extracts named entities (such as people, organizations, locations, and dates) from text, using the model option selected in the block UI.

Quick Start

To get started:

  • Choose a trained model from the Model to use dropdown
  • Send text via msg.payload.text
  • Receive extracted entities in msg.payload

Configuration

Model to use (required)

Select a previously trained model from the dropdown menu. Models must be trained beforehand using the entity extraction trainer block.

Entity List to Extract (required for Low Infra - Low Accuracy)

Provide the list of entity labels to extract (for example, PERSON and ORG). This setting is required only when using the deprecated Low Infra - Low Accuracy option.

Common Input Format (All Algorithms)

msg.payload.text (string | array)

The input text from which to extract entities. Can be a single string or an array of strings.

Example: "John Smith works as an engineer at Microsoft in Seattle since January 2020."

Output by Algorithm Option

Low Infra - Low Accuracy (Deprecated)

msg.payload is a list of [label, text] pairs.

Example: [["PERSON", "John Smith"], ["ORG", "Microsoft"]]
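Downstream blocks often need the entities grouped by label rather than as a flat list. The following is a minimal sketch of that regrouping, assuming the payload has already been parsed into Python lists; the helper name group_by_label is hypothetical and not part of the block.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group [label, text] pairs into {label: [text, ...]}.

    Hypothetical helper: assumes each item is a two-element
    [label, text] pair, as in the example payload above.
    """
    grouped = defaultdict(list)
    for label, text in pairs:
        grouped[label].append(text)
    return dict(grouped)


payload = [["PERSON", "John Smith"], ["ORG", "Microsoft"]]
print(group_by_label(payload))
```

Entities that share a label keep the order in which they appeared in the payload.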

Low Infra - Good Accuracy - v1 / Low Infra - Good Accuracy - v2

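msg.payload contains an output object that maps each token to its entity label. In the example below, a B- prefix marks the first token of an entity and an I- prefix marks a token that continues it.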

Example:

{
  "output": {
    "John": "B-PER",
    "Smith": "I-PER",
    "Microsoft": "B-ORG"
  }
}
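Token-level labels usually need to be merged back into whole entities before use. The sketch below shows one way to do that, assuming the output object preserves token order and that labels follow the B-/I- prefixes seen in the example. The function name merge_bio_tags is hypothetical, and any label without a B- or I- prefix is treated as ending the current entity.

```python
def merge_bio_tags(output):
    """Merge B-/I- labelled tokens into (label, text) spans.

    Assumptions (not confirmed by the block documentation):
    - `output` iterates tokens in their original order;
    - an I- token continues a span only if its type matches the open span;
    - an I- token with no open span is dropped.
    """
    spans = []
    label, tokens = None, []
    for token, tag in output.items():
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tokens and tag_label == label:
            tokens.append(token)  # continue the open entity
            continue
        if tokens:
            spans.append((label, " ".join(tokens)))  # close the open entity
        if prefix == "B" and tag_label:
            label, tokens = tag_label, [token]  # start a new entity
        else:
            label, tokens = None, []
    if tokens:
        spans.append((label, " ".join(tokens)))
    return spans


payload = {"output": {"John": "B-PER", "Smith": "I-PER", "Microsoft": "B-ORG"}}
print(merge_bio_tags(payload["output"]))
```

Because the object is keyed by token, a Python dict built from it holds only one label per distinct token, so repeated tokens cannot be told apart with this approach.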

Example (Low Infra - Low Accuracy)

Input (msg.payload)

{
  "text": "John Smith works as an engineer at Microsoft in Seattle since January 2020."
}

Output (msg.payload)

[["PERSON", "John Smith"], ["ORG", "Microsoft"], ["GPE", "Seattle"], ["DATE", "January 2020"]]

Errors

When the block fails, it raises an error. Use a Catch block in your flow to handle failures and inspect the error payload.

Common Mistakes

  • Empty text: msg.payload.text is required and must be a non-empty string (or an array of strings, as described under Common Input Format).
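Checking the text before it reaches the block avoids this failure. The following is a minimal sketch of such a check, assuming the text is available as a Python value. The function validate_text is hypothetical, and the rules that whitespace-only strings and empty arrays are rejected are assumptions made for illustration, not documented behavior of the block.

```python
def validate_text(text):
    """Return the input as a list of strings, or raise ValueError.

    Hypothetical pre-check: accepts a single string or a list of strings.
    Treating whitespace-only strings and empty lists as invalid is an
    assumption, not documented block behavior.
    """
    if isinstance(text, str):
        texts = [text]
    elif isinstance(text, list) and text:
        texts = text
    else:
        raise ValueError("text must be a non-empty string or a non-empty list of strings")
    for item in texts:
        if not isinstance(item, str) or not item.strip():
            raise ValueError("every text entry must be a non-empty string")
    return texts


payload = {"text": "John Smith works as an engineer at Microsoft in Seattle since January 2020."}
texts = validate_text(payload["text"])
print(len(texts))
```

Normalizing to a list means the rest of the flow can handle single strings and arrays the same way.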

Best Practices

  • Use clear, well-structured text for more accurate entity extraction
  • Ensure your training data includes diverse examples of each entity type you want to extract
  • Use domain-specific models when working with specialized text (medical, legal, technical, etc.)
  • Consider text preprocessing (removing noise, fixing encoding issues) before extraction
  • Regularly retrain models as entity types and language patterns evolve
  • Always validate extracted entities in production applications, especially for critical data