Entity Extractor

Quick Start

To get started:

Select a pre-trained model from the dropdown menu. Models must be trained beforehand using the entity extraction trainer block.

If you use the deprecated low-accuracy option, provide the list of entity labels to extract.

Input: the text from which to extract entities, given as a single string or an array of strings.

Example: "John Smith works as an engineer at Microsoft in Seattle since January 2020."
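A step that prepares the text upstream may want to reject anything other than these two shapes before it reaches the extractor. The following is a minimal Python sketch, assuming the text is held as a plain Python value. to_text_list is a hypothetical helper and is not part of the block.

```python
from typing import List, Union


def to_text_list(value: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept a string or a list of strings, return a list of strings."""
    if isinstance(value, str):
        return [value]  # a single string becomes a one-element batch
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return list(value)  # copy so later changes don't affect the caller's list
    raise TypeError("expected a string or a list of strings")


texts = to_text_list("John Smith works as an engineer at Microsoft in Seattle since January 2020.")
print(len(texts))  # 1
```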

Output: msg.payload is a list of [label, text] pairs, one per extracted entity.

Example: [["PERSON", "John Smith"], ["ORG", "Microsoft"]]
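Downstream steps often need all entities of one type together. The following is a minimal Python sketch that groups the [label, text] pairs by label, assuming msg.payload has already been parsed into a Python list. group_by_label is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(pairs: List[List[str]]) -> Dict[str, List[str]]:
    """Collect entity texts under their label; labels and texts keep first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label, text in pairs:
        grouped[label].append(text)
    return dict(grouped)


payload = [["PERSON", "John Smith"], ["ORG", "Microsoft"]]
print(group_by_label(payload))  # {'PERSON': ['John Smith'], 'ORG': ['Microsoft']}
```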

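msg.payload may instead be an object whose output field maps each token to its entity label, using BIO tags: B- marks the first token of an entity, and I- marks each following token of the same entity.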

Example:

{
  "output": {
    "John": "B-PER",
    "Smith": "I-PER",
    "Microsoft": "B-ORG"
  }
}
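To compare this form with the [label, text] form, B-/I- tagged tokens can be merged back into pairs. The following is a minimal Python sketch. It assumes the output object has already been parsed, for example with json.loads, which keeps keys in document order, and that this order follows the token order in the text. bio_to_pairs is a hypothetical helper. The format itself imposes two limitations. First, a mapping keyed by token cannot hold the same token twice. Second, non-entity tokens may be left out, as they are in the example, so the sketch relies on the B-/I- prefixes and cannot verify that tokens were adjacent. Labels are kept exactly as tagged (PER, not PERSON), and tokens are joined with single spaces.

```python
from typing import Dict, List, Optional


def bio_to_pairs(output: Dict[str, str]) -> List[List[str]]:
    """Merge B-/I- tagged tokens into [label, text] pairs (hypothetical helper).

    Assumes the mapping's insertion order matches the token order in the text.
    """
    pairs: List[List[str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the open entity, if any, joining its tokens with single spaces.
        nonlocal current_label, current_tokens
        if current_label is not None:
            pairs.append([current_label, " ".join(current_tokens)])
        current_label, current_tokens = None, []

    for token, tag in output.items():
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            current_tokens.append(token)  # continue the open entity
        elif prefix in ("B", "I"):
            flush()  # close any open entity, then start a new one
            current_label, current_tokens = label, [token]
        else:
            flush()  # "O" or an unrecognized tag ends the open entity
    flush()
    return pairs


output = {"John": "B-PER", "Smith": "I-PER", "Microsoft": "B-ORG"}
print(bio_to_pairs(output))  # [['PER', 'John Smith'], ['ORG', 'Microsoft']]
```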

Example input:

{
  "text": "John Smith works as an engineer at Microsoft in Seattle since January 2020."
}

Example output:

[["PERSON", "John Smith"], ["ORG", "Microsoft"], ["GPE", "Seattle"], ["DATE", "January 2020"]]
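To highlight entities in the original text, each pair can be mapped back to character offsets. The following is a minimal Python sketch in which locate_entities is a hypothetical helper. It assumes each entity string appears verbatim in the input. That holds for this example, but the block does not necessarily guarantee it. The search runs left to right, so a repeated entity string maps to successive occurrences.

```python
from typing import List, Tuple


def locate_entities(text: str, pairs: List[List[str]]) -> List[Tuple[str, str, int, int]]:
    """Attach (start, end) character offsets to each [label, entity] pair.

    Pairs whose entity text does not occur verbatim in the input are skipped.
    """
    located: List[Tuple[str, str, int, int]] = []
    cursor = 0
    for label, entity in pairs:
        start = text.find(entity, cursor)
        if start == -1:
            # Pairs may not follow text order; retry from the beginning.
            start = text.find(entity)
        if start == -1:
            continue  # not present verbatim in the input
        end = start + len(entity)
        located.append((label, entity, start, end))
        cursor = end
    return located


text = "John Smith works as an engineer at Microsoft in Seattle since January 2020."
pairs = [["PERSON", "John Smith"], ["ORG", "Microsoft"], ["GPE", "Seattle"], ["DATE", "January 2020"]]
for label, entity, start, end in locate_entities(text, pairs):
    print(label, start, end)
```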

When the block fails, it raises an error. Use a Catch block in your flow to handle failures and inspect the error payload.

Tips for best results

Use clear, well-structured text for more accurate entity extraction
Ensure your training data includes diverse examples of each entity type you want to extract
Use domain-specific models when working with specialized text (medical, legal, technical, etc.)
Consider text preprocessing (removing noise, fixing encoding issues) before extraction
Regularly retrain models as entity types and language patterns evolve
Always validate extracted entities in production applications, especially for critical data
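The preprocessing and validation tips can be combined into one light step around the block. The following is a minimal Python sketch in which preprocess and validate_entities are hypothetical helpers. The allowed label set is whatever your trained model is expected to produce; the labels below are taken from the example output. Validation should run against the same text that was sent to the block, so if you preprocess, pass the preprocessed text. Note that collapsing whitespace also removes line breaks, which may matter for some documents.

```python
import re
import unicodedata
from typing import Iterable, List


def preprocess(text: str) -> str:
    """Light cleanup: Unicode NFC normalization, then collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def validate_entities(text: str, pairs: List[List[str]], allowed_labels: Iterable[str]) -> List[str]:
    """Return problems found in [label, text] pairs; an empty list means all checks passed."""
    allowed = set(allowed_labels)
    problems: List[str] = []
    for i, pair in enumerate(pairs):
        # Each item should be exactly [label, entity text].
        if not (isinstance(pair, (list, tuple)) and len(pair) == 2):
            problems.append(f"item {i}: expected a [label, text] pair, got {pair!r}")
            continue
        label, entity = pair
        if label not in allowed:
            problems.append(f"item {i}: unexpected label {label!r}")
        if not isinstance(entity, str) or not entity.strip():
            problems.append(f"item {i}: missing or empty entity text")
        elif entity not in text:
            # The extracted span should appear verbatim in the text the block received.
            problems.append(f"item {i}: {entity!r} does not occur in the input text")
    return problems


raw = "John  Smith works as an engineer\nat Microsoft."
clean = preprocess(raw)
pairs = [["PERSON", "John Smith"], ["ORG", "Microsoft"], ["GPE", "Seattle"]]
print(validate_entities(clean, pairs, {"PERSON", "ORG", "GPE", "DATE"}))
```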