How AI Generates Brand Names: The Real Pipeline

I spent three weeks trying to name a side project last year. Three weeks. I had a spreadsheet with 200 entries, half of them portmanteaus that sounded like prescription medications. That’s when I got curious about how AI name generators actually work under the hood.

Turns out the problem is far more interesting than “just ask GPT for a name.” Let me walk you through the real engineering.

Why Name Generation Is Deceptively Hard

Think of a good brand name off the top of your head. Got one? Now check if the .com is available. It’s not. That, in miniature, is the whole problem.

But it goes deeper than domain squatting. A name generator has to satisfy constraints that fight each other:

  • Phonetic quality: The name needs to be pronounceable, memorable, and pleasant to say. “Spotify” rolls off the tongue. “Qwrtyp” does not.
  • Semantic relevance: It should hint at what the product does, or at least not contradict it.
  • Uniqueness: It can’t sound like an existing trademark. Call your fintech startup “Paypel” and see what happens.
  • Cross-language safety: “Nova” means “doesn’t go” in Spanish (the Chevy Nova legend is actually a myth, but the concern is real). “Siri” means something unfortunate in Georgian.
  • Domain availability: There are roughly 350 million registered domains. Your perfect five-letter .com is taken.

The combinatorial space of possible names is enormous. English has about 44 phonemes, and a two-syllable name runs 4-6 phonemes, so even four phonemes give you 44^4, roughly 3.7 million combinations. Most of them are garbage. The engineering challenge is generating candidates that land in the narrow band between “that sounds like a real word” and “that’s already trademarked.”

Approach 1: Markov Chains (The Simple Baseline)

The oldest trick in the book. Train a character-level Markov chain on a corpus of existing brand names, then sample from it. Each character prediction depends only on the previous N characters.

Here’s a minimal implementation:

from collections import defaultdict
import random

def build_chain(names, order=3):
    """Map each `order`-character context to the characters that follow it."""
    chain = defaultdict(list)
    for name in names:
        # Pad with "^" start markers so early characters get full context,
        # and a "$" end marker so the chain learns where names stop.
        padded = "^" * order + name.lower() + "$"
        for i in range(len(padded) - order):
            key = padded[i:i + order]
            chain[key].append(padded[i + order])
    return chain

def generate(chain, order=3, max_len=12):
    """Walk the chain from the start state, sampling one character at a time."""
    result = ""
    key = "^" * order
    for _ in range(max_len):
        if key not in chain:
            break
        # Repeated entries in the list make common transitions more likely
        next_char = random.choice(chain[key])
        if next_char == "$":
            break
        result += next_char
        key = key[1:] + next_char
    return result.capitalize()

# Train on real brand names
brands = ["spotify", "shopify", "stripe", "slack", "notion",
          "figma", "vercel", "linear", "retool", "supabase"]
chain = build_chain(brands, order=2)

for _ in range(5):
    print(generate(chain, order=2))

Run that and you’ll get output like “Slace”, “Notify”, “Supa”. Some are surprisingly good. Most are not.

The fatal flaw? Markov chains have no understanding of what makes a name good. They learn character co-occurrence patterns, nothing more. Set the order too low and you get random nonsense. Set it too high and you just recombine chunks of your training data. There’s a sweet spot around order 2-3 for brand names, but even then you’re playing a numbers game where maybe 1 in 50 outputs is usable.

Approach 2: Neural Language Models

RNNs and LSTMs were the first real upgrade. Instead of a fixed context window, recurrent models maintain a hidden state that (theoretically) captures long-range dependencies across the entire name.

You train a character-level LSTM on your brand name corpus, and it learns subtler phonotactic patterns. It picks up that “str-” is a strong opening cluster in English, that names rarely end in “-gk”, and that doubled vowels like “oo” give a name a friendly feel (Google, Yahoo, Voodoo).

The practical difference from Markov chains: LSTMs generate names that sound more like real words because they’re better at learning the statistical structure of English phonology. The tradeoff is training time, model complexity, and the need for a larger corpus. For a hobby project, Markov chains are fine. For a production system, you want something with more capacity.
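
To make that concrete, here’s a minimal character-level LSTM sketch in PyTorch. The architecture and hyperparameters are illustrative assumptions, not any particular product’s model; vocab_size would come from your character inventory plus start/end markers.

import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Predict the next character given the characters so far."""
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        emb = self.embed(x)                  # (batch, seq_len, embed_dim)
        out, state = self.lstm(emb, state)   # hidden state carries the context
        return self.head(out), state         # logits over the next character

# Training is standard next-character prediction: feed "^stripe",
# target "stripe$", and minimize cross-entropy on the logits.
model = CharLSTM(vocab_size=28)  # e.g. a-z plus start/end markers
criterion = nn.CrossEntropyLoss()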

Approach 3: Transformer-Based Generation

This is where things get interesting. Fine-tune a GPT-style model on brand names, and you unlock something Markov chains and LSTMs can’t do: conditional generation.

Want a name that sounds techy? Playful? Premium? You can encode those attributes into your prompt or training data and the model learns to steer its outputs. Here’s where temperature and sampling strategy become critical:

import numpy as np

def sample_with_temperature(logits, temperature=1.0, top_k=10):
    """
    Lower temperature = more conservative, predictable names
    Higher temperature = more creative, risky names
    top_k limits sampling to the K most likely next tokens
    """
    # Apply temperature scaling (copy so we never mutate the caller's array)
    scaled = np.asarray(logits, dtype=float) / temperature

    # Top-k filtering: mask out everything below the k-th largest logit
    if top_k > 0:
        threshold = np.sort(scaled)[-top_k]
        scaled = np.where(scaled < threshold, -np.inf, scaled)

    # Softmax (subtract the max first so np.exp can't overflow)
    scaled -= scaled.max()
    probs = np.exp(scaled)
    probs /= probs.sum()

    return np.random.choice(len(probs), p=probs)

# In practice:
# temperature=0.3 → safe names like "Bluecore", "Datafy"  
# temperature=0.7 → balanced like "Zentiq", "Colvara"
# temperature=1.2 → wild like "Xyphora", "Quenbi"

Temperature is the single most important hyperparameter in name generation. Too low and every output sounds like every other B2B SaaS startup. Too high and you get names that look like someone fell asleep on a keyboard. Most production systems let users control this indirectly through a “creativity slider” or style selector.

Approach 4: Generator-Discriminator Architectures

Here’s a pattern borrowed from GANs but adapted for discrete text. One model generates name candidates. A separate model scores them. The scorer is trained on human preference data, and you use its signal to improve the generator over time.

The scorer typically evaluates multiple dimensions:

  • Phonetic quality (does it sound good?)
  • Semantic fit (does it match the industry?)
  • Memorability (how easy is it to recall after one exposure?)
  • Visual balance (does it look good written down?)

This is closer to how RLHF works for chat models, but applied to a much narrower domain. The advantage is that your generator keeps improving as you collect more human feedback. The downside: you need that feedback data, and collecting it is slow.
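
As a sketch of that generate-then-score loop (the scorer heuristics below are toy stand-ins for what would really be trained models):

def rerank_candidates(candidates, scorers, weights, top_n=10):
    """Score each candidate on several dimensions and keep the best.
    scorers maps dimension -> callable(name) -> float in [0, 1];
    weights maps dimension -> importance.
    """
    def composite(name):
        return sum(weights[d] * scorers[d](name) for d in weights)
    return sorted(candidates, key=composite, reverse=True)[:top_n]

# Toy usage with stand-in heuristics:
scorers = {
    "phonetic": lambda n: 1.0 - abs(len(n) - 6) / 10,   # favor ~6 letters
    "memorability": lambda n: 1.0 / (1 + len(set(n))),  # fewer distinct chars
}
weights = {"phonetic": 0.7, "memorability": 0.3}
print(rerank_candidates(["zentiq", "bx", "colvara"], scorers, weights, top_n=2))
# ['zentiq', 'colvara']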

Phonetic Scoring: The Secret Weapon

Ask any naming professional what separates forgettable names from sticky ones, and the answer usually involves sound symbolism. This isn’t mysticism. It’s backed by decades of linguistics research.

Certain sound patterns trigger consistent associations across languages:

  • Front vowels (like /i/ in “sweet”) feel small, fast, light
  • Back vowels (like /u/ in “brute”) feel large, heavy, powerful
  • Plosive consonants (/b/, /k/, /t/) feel strong and decisive
  • Fricatives (/f/, /s/, /v/) feel soft and sophisticated

You can encode this into a scoring function:

PHONETIC_FEATURES = {
    'b': {'strength': 0.8, 'softness': 0.1, 'energy': 0.7},
    'k': {'strength': 0.9, 'softness': 0.0, 'energy': 0.8},
    's': {'strength': 0.2, 'softness': 0.9, 'energy': 0.4},
    'f': {'strength': 0.1, 'softness': 0.8, 'energy': 0.3},
    'i': {'brightness': 0.9, 'warmth': 0.3, 'weight': 0.1},
    'o': {'brightness': 0.4, 'warmth': 0.8, 'weight': 0.7},
}

def score_name_phonetics(name, target_profile):
    """
    Score how well a name's phonetic features match
    a desired brand personality profile.

    target_profile example: {'strength': 0.7, 'softness': 0.3}
    """
    scores = []
    for char in name.lower():
        if char in PHONETIC_FEATURES:
            features = PHONETIC_FEATURES[char]
            for trait, target_val in target_profile.items():
                if trait in features:
                    diff = abs(features[trait] - target_val)
                    scores.append(1.0 - diff)

    return sum(scores) / len(scores) if scores else 0.0

# "Kraft" scores high on strength. "Silvia" scores high on softness.
print(score_name_phonetics("kraft", {"strength": 0.8}))   # ~0.87
print(score_name_phonetics("silvia", {"softness": 0.8}))  # ~0.85

This is simplified, obviously. Production systems use IPA transcription, syllable stress patterns, and cross-language phoneme databases. But the core idea holds: you can computationally score how a name feels before any human ever reads it.

The Filtering Pipeline

Raw generation is maybe 20% of the work. The real engineering is in filtering. A production name generator pushes candidates through a pipeline like this:

Stage 1: Linguistic filtering. Check for profanity (including in other languages), slang, or unfortunate double meanings. This is harder than it sounds. “Therapist” contains an unfortunate substring. “Pen Island” is a classic. Your filter needs to catch both exact matches and embedded patterns.
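
A sketch of the embedded-pattern side of that check, with a one-term blocklist standing in for a real multilingual lexicon:

def embedded_pattern_check(name, blocklist):
    """Return every blocked term appearing anywhere inside the name,
    catching embedded substrings as well as exact matches."""
    lowered = name.lower()
    return [term for term in blocklist if term in lowered]

print(embedded_pattern_check("Therapist", {"rapist"}))  # ['rapist']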

Stage 2: Domain availability. DNS lookups make a fast first pass, but a domain can be registered without any DNS records, and WHOIS queries are rate-limited. Most systems check multiple TLDs (.com, .io, .co, .ai), surface available options, and confirm candidates through registrar APIs like Namecheap or GoDaddy, which support bulk checks.
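
A crude first pass using only the standard library; a failed lookup is a hint, not proof, so anything it flags still needs registrar confirmation:

import socket

def maybe_available(domain):
    """If the domain resolves, it is definitely registered. If it
    doesn't, it *might* be available -- confirm with a registrar API
    before showing it to a user."""
    try:
        socket.gethostbyname(domain)
        return False  # resolves, so it's taken
    except socket.gaierror:
        return True   # no DNS record; promising but unconfirmed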

Stage 3: Trademark screening. The USPTO offers free trademark search tooling (the system formerly known as TESS) plus bulk data, and EUIPO provides the equivalent for European marks. You’re looking for exact matches and phonetic similarity. “Gogle” would fail even though it’s spelled differently. Levenshtein distance and phonetic hashing (Soundex, Metaphone) handle the fuzzy matching.

Stage 4: Phonetic deduplication. If your generator produced “Zentiq” and “Zentik”, you probably only want to show one. Metaphone encoding or phonetic distance scoring collapses these into equivalence classes.
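
Stages 3 and 4 both lean on phonetic hashing. Here’s a from-scratch Soundex sketch; production systems tend to prefer Metaphone or Double Metaphone, but the principle is identical: similar-sounding strings collapse to the same code.

def soundex(name):
    """Map a word to the classic 4-character Soundex code."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    result, prev = [], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "hw":  # h/w don't break a run of same-coded consonants
            prev = code
    return (name[0].upper() + "".join(result) + "000")[:4]

# Fuzzy matching and deduplication both fall out of code equality:
print(soundex("Gogle") == soundex("Google"))   # True (both G240)
print(soundex("Zentiq") == soundex("Zentik"))  # True (both Z532)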

Stage 5: Human scoring. The best systems incorporate user feedback loops. Every name a user saves, dismisses, or edits becomes training data for the next iteration.

How It All Fits Together

In a production system, these components form a pipeline:

  1. User specifies keywords, industry, and style preferences
  2. Multiple generators run in parallel (transformer, Markov, phonetic assembly)
  3. Candidates merge into a pool (typically 500-2,000 raw names)
  4. The filtering pipeline removes roughly 90% of candidates
  5. A ranking model scores survivors on phonetic quality, semantic relevance, and user preference alignment
  6. The top 20-50 names reach the user, with domain availability and basic trademark status attached
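
Sketched as code, with every component name hypothetical:

def run_pipeline(brief, generators, filters, ranker, n_final=30):
    """Pool candidates from several generators, prune them through
    the filter stages, then rank whatever survives."""
    pool = {name for gen in generators for name in gen(brief)}   # steps 2-3
    for passes in filters:                                       # step 4
        pool = {name for name in pool if passes(name)}
    return sorted(pool, key=ranker, reverse=True)[:n_final]      # steps 5-6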

Tools like nametastic.com combine several of these techniques in their pipeline. If you’ve used any AI name generator recently, you’ve seen this architecture in action even if the specific model choices vary between products.

The interesting engineering challenge isn’t any single component. It’s the orchestration. How do you balance generation diversity against quality? How aggressively should you filter? Too aggressive and you show 5 boring safe names. Too loose and you bury the gems in noise.

What’s Coming Next

Two emerging approaches are worth watching:

Reinforcement learning from human preferences. Instead of training a separate scorer, you fine-tune the generator directly on preference data. Every time a user picks “Zentiq” over “Blandco”, that signal flows back into the model. This is the same idea behind RLHF in ChatGPT, applied to a constrained generation task. The smaller output space (short strings vs. paragraphs) actually makes this more tractable.
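
In its simplest form, each “A over B” click becomes a Bradley-Terry style training signal. A minimal sketch:

import numpy as np

def preference_loss(score_preferred, score_rejected):
    """Negative log-likelihood that the model's scores agree with the
    user's choice; minimizing it pushes preferred names' scores up."""
    margin = score_preferred - score_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# The user picked "Zentiq" (model score 0.4) over "Blandco" (0.9):
# the large loss tells the model its scores are backwards.
print(preference_loss(0.4, 0.9))  # ~0.97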

Diffusion models for discrete text. Models like D3PM and MDLM are adapting the diffusion framework from image generation to text. Instead of denoising a blurry image, you iteratively refine a corrupted token sequence into a clean name. Early results are promising for short text generation because the fixed-length output structure maps well to the diffusion paradigm. This is still research-stage, but the name generation use case is almost tailor-made for it.

And honestly? The biggest unsolved problem is cultural sensitivity at scale. You can check a name against five or ten languages. But there are 7,000 languages worldwide, and a name that’s perfect in English might be offensive in a language your product expands into three years later. No model handles that well yet.

Wrapping Up

If you’re thinking about building a name generator yourself, start with a Markov chain. Seriously. Get the filtering pipeline right first, because that’s where the real value lives. Then swap in increasingly sophisticated generators as your needs grow.

The gap between a basic Markov chain generator and a production system like nametastic.com isn’t just the model. It’s the filtering, the phonetic scoring, the domain checking, and the preference learning loop that turns user behavior into better outputs over time.

For the ML engineers in the room: this is a weirdly satisfying problem space. The outputs are short enough to iterate fast, the evaluation is immediate (does this name sound good?), and you get to combine NLP, information retrieval, and human-computer interaction in a single system.

And if nothing else, you’ll never look at a startup name the same way again.
