What is the DIETClassifier?

In the previous blog, we explored CRFEntityExtractor, a sequence-labeling model that learns how entities appear in context using statistical features.

CRF represented a major step forward from pure rule-based extraction.
But as conversational systems evolved, maintaining separate models for intent classification and entity extraction started to show its limits.

Modern NLU pipelines favor shared representations, joint learning, and deep learning–based generalization.

That’s where DIETClassifier comes in.

Contents of this blog

  • What is DIETClassifier
  • Why DIET was introduced
  • How DIET works at a high level
  • Intent classification with DIET
  • Entity extraction with DIET
  • Training data format
  • When to use DIETClassifier

What is the DIETClassifier?

DIET stands for Dual Intent and Entity Transformer.

It is a single neural network that performs:

  • Intent classification
  • Entity extraction

…at the same time.

Unlike CRFEntityExtractor, which focuses only on entities, DIET jointly learns:

  • The meaning of the full sentence (intent)
  • The role of each token (entity labels)

This shared learning allows the model to use intent-level context to improve entity prediction, and vice versa.

Why was DIET introduced?

Traditional pipelines looked like this:
Intent classifier → predicts intent
Entity extractor → predicts entities independently

This separation has drawbacks:

  1. Duplicate feature computation
  2. No shared understanding between intent and entities
  3. More models to train, tune, and maintain

DIET solves this by using one model to learn shared embeddings and optimize both tasks together.

This leads to better performance, especially when training data is limited.

How DIET works

DIET is based on a Transformer architecture.
At a high level, it:

  • Tokenizes the input text
  • Converts tokens into embeddings
  • Applies transformer layers to model context

…and then predicts:

  • A sentence embedding → intent
  • Token-level labels → entities

Instead of hand-engineered features (as in CRF), DIET learns its features automatically.
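The steps above can be sketched with a toy example. Everything here is illustrative, not Rasa's implementation: the vocabulary and embedding table are made up, and a simple "mix in the sentence mean" step stands in for real transformer layers.

```python
import numpy as np

# Toy sketch of the DIET-style flow (NOT the real Rasa implementation).
rng = np.random.default_rng(0)

vocab = {"book": 0, "a": 1, "flight": 2, "to": 3, "paris": 4}
embed = rng.normal(size=(len(vocab), 8))  # token embedding table

def tokenize(text):
    return text.lower().rstrip(".").split()

def encode(tokens):
    """Look up embeddings and add crude context: each token vector is
    mixed with the sentence mean (a stand-in for transformer layers)."""
    vecs = embed[[vocab[t] for t in tokens]]
    return vecs + vecs.mean(axis=0)

tokens = tokenize("Book a flight to Paris.")
token_vecs = encode(tokens)               # one contextual vector per token -> entities
sentence_vec = token_vecs.mean(axis=0)    # one vector for the sentence -> intent
print(token_vecs.shape, sentence_vec.shape)
```

The key structural point this illustrates: a single encoder produces both per-token vectors (for entity labels) and one pooled sentence vector (for the intent).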

Intent classification with DIET

For intent classification, DIET:

  • Embeds the entire sentence
  • Compares it against learned intent embeddings
  • Uses similarity scoring to choose the best intent

Example:

“Book a flight to Paris.”

The model learns that this sentence embedding is closest to the book_flight intent. This approach allows DIET to generalize well to paraphrases and unseen phrasing.
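A minimal sketch of this similarity scoring follows. The intent names come from the example, but the vectors are made up; in real DIET, both the sentence embedding and the intent embeddings are learned during training.

```python
import numpy as np

def best_intent(sentence_vec, intent_embeddings):
    """Pick the intent whose embedding is most similar (cosine) to the sentence."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cos(sentence_vec, vec) for name, vec in intent_embeddings.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Made-up learned intent embeddings
intents = {
    "book_flight": np.array([0.9, 0.1, 0.0]),
    "cancel_flight": np.array([-0.8, 0.2, 0.1]),
    "greet": np.array([0.0, 0.0, 1.0]),
}
sentence = np.array([0.8, 0.2, 0.1])  # pretend embedding of "Book a flight to Paris."

name, score = best_intent(sentence, intents)
print(name, round(score, 2))  # → book_flight 0.98
```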

Entity extraction with DIET

For entities, DIET performs token-level classification, similar to CRF. Each token receives a BIO-scheme label: B- marks the beginning of an entity, I- marks its continuation, and O marks tokens outside any entity.

Book    O
a       O
flight  O
from    O
New     B-location
York    I-location
to      O
Paris   B-location

The difference is that DIET uses contextual embeddings produced by transformers instead of manually designed features.
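Decoding those token labels into entity spans can be sketched as follows. The function is a simplified illustration of BIO decoding, not Rasa's actual decoder.

```python
def group_entities(tokens, labels):
    """Merge B-/I- tagged tokens into (entity_type, value) pairs."""
    entities, current = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current:
                entities.append(current)
            current = (label[2:], [token])          # start a new entity
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[1].append(token)                # continue the current entity
        else:                                       # "O" or a stray I- tag
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(etype, " ".join(words)) for etype, words in entities]

tokens = ["Book", "a", "flight", "from", "New", "York", "to", "Paris"]
labels = ["O", "O", "O", "O", "B-location", "I-location", "O", "B-location"]
print(group_entities(tokens, labels))
# → [('location', 'New York'), ('location', 'Paris')]
```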

Training data format

DIET uses the same annotated NLU data as CRF.

version: "3.1"

nlu:
  - intent: book_flight
    examples: |
      - Book a flight from [New York](location) to [Paris](location)
      - Fly from [Berlin](location) to [London](location)

There is no separate configuration for intent vs entity training. DIET learns both from the same data.
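As an illustration, a config.yml that enables DIET might look like the sketch below. The component names follow Rasa 3.x defaults; the epochs value is illustrative, and real pipelines often add more featurizers.

```yaml
# config.yml — pipeline sketch (component names per Rasa 3.x defaults)
recipe: default.v1
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
```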

Internal working (simplified)

At runtime, DIET:

  1. Tokenizes the message
  2. Generates embeddings
  3. Applies transformer layers
  4. Predicts:
    • Intent with confidence
    • Entity labels per token
  5. Groups entity tokens
  6. Outputs structured NLU results

Example output:

{
  "intent": {
    "name": "book_flight",
    "confidence": 0.92
  },
  "entities": [
    {
      "entity": "location",
      "value": "Paris",
      "start": 23,
      "end": 28
    }
  ]
}
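Steps 5 and 6, grouping entity tokens and emitting structured results, can be sketched as follows. The helper function and the 0.92 confidence are illustrative, not Rasa internals; offsets are computed from the raw text.

```python
def to_entity_dict(text, entity_type, value):
    """Locate `value` in `text` and build a Rasa-style entity dict."""
    start = text.index(value)
    return {"entity": entity_type, "value": value,
            "start": start, "end": start + len(value)}

text = "Book a flight to Paris."
result = {
    "intent": {"name": "book_flight", "confidence": 0.92},  # confidence is illustrative
    "entities": [to_entity_dict(text, "location", "Paris")],
}
print(result["entities"][0])
# → {'entity': 'location', 'value': 'Paris', 'start': 17, 'end': 22}
```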

When should you use DIETClassifier?

DIETClassifier is the default choice when:

  • You want a single model for both intents and entities
  • The user language is flexible and conversational
  • You care about long-term scalability or are building production-grade assistants

CRFEntityExtractor and RegexEntityExtractor still have value, especially for highly structured or deterministic entities, but DIET is the backbone of modern Rasa NLU pipelines.

With this, we have completed most of the major entity and intent mappers. Following this, we shall begin to see how bots are developed using code.

Until next time.
