Synthetic Personas & Data

AI can generate user personas in minutes. That’s genuinely useful – in the right context. The problem is that every team is drawing from the same source, and without your own data the output is the same as everyone else’s.

Key Takeaways

Synthetic personas are fast and cheap – good for early exploration and quick concept tests. The issue is that all AI tools draw from the same public knowledge base. Without your own user data, the output is generic. The teams getting real value are the ones enriching AI with their own research, customer data, and session insights.

In this article

What Synthetic Personas Are
The Commoditisation Problem
The Cultural Blind Spot
When Synthetic Personas Actually Make Sense
Tools Worth Knowing
The Risk of Overconfidence
Where Real Research Matters Most

Synthetic personas have entered the UX toolkit. The idea is simple: describe a user type to an AI tool, and it generates a realistic-sounding profile. The persona reacts to your product, answers questions, and flags potential problems – without any real users being involved.

The speed and cost advantages are real. What used to take weeks of research can be sketched out in an afternoon. For early concepts or workshop prep, that matters.

The problem is what most teams are actually feeding into these tools.

What Synthetic Personas Are

A synthetic persona is a simulated user profile generated by an AI. You describe who the user is, and the AI responds as if it were that person – reacting to your product, answering questions, pointing out friction.

Used well, they work as a starting point. Good for testing assumptions before committing to real research, exploring a concept quickly, or generating different viewpoints in a workshop. They are not a substitute for talking to real users. But as a faster first step before doing that, they have genuine value.

Where they deliver real value:

Quick hypothesis testing – explore an assumption about user behaviour before starting a full research round
Early concept checks – find obvious gaps in an idea before spending time on user recruiting
Workshop coverage – generate perspectives across several user types simultaneously
Faster iteration – shorten the gap between a design decision and a first signal
Low-cost exploration – qualify which research questions are worth the investment of real user studies

The problem is not that synthetic personas don’t work. The problem is how most teams are using them.

The Commoditisation Problem: Same AI, Same Output

Without your own data, the quality of a synthetic persona depends entirely on what the AI was trained on.

AI tools are trained on public data: websites, research papers, articles, forum posts. They don’t know your users. They know what the internet says about users like yours.

If you and a competitor both use the same AI tool to generate a persona for a European online shopper, you get outputs built from the same information. The personas might look different. The assumptions inside them are largely the same.

The more teams adopt the same tools, the more the results converge. AI-generated personas become a bland, uniform mush (the German word is Einheitsbrei): output that doesn’t reflect any real user, just a shared average of publicly available data. There’s no competitive advantage in something every other team already has access to.

The Cultural Blind Spot

The commoditisation problem gets worse the further you move from the English-speaking West.

AI tools are trained on what the internet has written down. The internet is not evenly distributed. English content dominates. Western European contexts come second. Arabic, Japanese, Thai, Vietnamese, Hindi – these are represented at a fraction of the volume, and much of the nuance around behaviour, context, and local norms is simply missing.

A synthetic persona generated for a Saudi Arabian user is built from a fraction of the training data that shapes a German or American one. Less data means thinner signal – and a higher risk that the model fills the gaps with assumptions pulled from Western defaults.

This bias has a name. Researchers call it the WEIRD problem – Western, Educated, Industrialized, Rich, Democratic. It was first documented in behavioural science by Henrich, Heine & Norenzayan (2010), who showed that the overwhelming majority of psychological and behavioural research was conducted on WEIRD populations and then generalised as universal human behaviour. The same structural problem now runs through AI training data. LLMs inherit the bias of the corpus they were trained on. A model trained predominantly on English-language Western web content will produce outputs that reflect WEIRD assumptions – even when asked to simulate users from entirely different cultural contexts.

It’s not just text direction

The most visible signal of this gap is reading direction. Arabic, Hebrew, Urdu, and Farsi run right to left. Japanese can run vertically. But the implications go far beyond layout.

Spatial cognition and scanning patterns

RTL readers don’t just read from the opposite side – they scan interfaces differently. Navigation hierarchies, primary CTAs, trust signals, error messages: the entire spatial grammar of a UI shifts. A synthetic persona generated by a model steeped in Western UX conventions will consistently misjudge where attention lands and where friction occurs.

Information density and visual hierarchy

Japanese and Chinese interfaces routinely pack information that Western design conventions would call overloaded. What reads as chaotic to a Northern European user is functional and expected to someone used to dense kanji-heavy layouts. A Western-trained persona will flag density as a usability problem that isn’t one.

Trust signals and social proof

What builds confidence varies significantly by region. In many Southeast Asian markets, messaging app integration (LINE, WhatsApp, Zalo) is a primary trust signal – more so than a polished website or an SSL certificate. In the Gulf, brand presence and official endorsements carry different weight than in Germany. A generic AI persona won’t model this correctly.

Infrastructure and device context

In large parts of South and Southeast Asia, the primary internet device is a mid-range Android on a variable mobile connection – not a laptop on a broadband home network. Load time tolerance, navigation depth, offline behaviour, and payment flow expectations are fundamentally different. Western-trained personas assume infrastructure that many of these users simply don’t have.

The data gap compounds the problem

When an AI model has less data about a market, it doesn’t become less confident – it becomes less accurate at the same confidence level. The persona still sounds detailed and plausible. The assumptions inside it are just wrong more often.

This is where the risk of overconfidence hits hardest. A team designing a product for the Indonesian market and using a generic AI persona as input is working with a profile that was partly constructed from guesswork. Not labelled as guesswork. Presented as user insight.

The less data a model has about a market, the more it guesses. The output looks the same either way.

The real consequence: teams that would never skip user research for a German product launch go to market in Thailand or the UAE on a persona the AI invented from thin coverage. Real research matters everywhere – but it matters most where synthetic shortcuts are least reliable.

When Synthetic Personas Actually Make Sense

AI-supported personas only deliver real value under two conditions. The first: you bring your own data.

A team that feeds its AI tool with years of interview transcripts, session recordings, CRM segments, and support data is not generating personas from public knowledge. It’s synthesising its own insight into a usable format. The AI becomes a pattern-recognition tool applied to proprietary data – not a generator of shared assumptions.

The second condition is focus. Generic personas (“mobile user”, “budget shopper”) produce generic output. The teams getting real value work narrow rather than broad, asking specific questions based on what they already know about their users.

Not “what does a 30-year-old urban shopper want?” but “why do users in our highest-value cohort leave at checkout step 3?”

That question only gets a useful answer if you bring the data that’s specific to your product.
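
A minimal sketch of how a question like this might be turned into a number before it goes anywhere near an AI tool. The session records, cohort labels, and the four-step checkout are all hypothetical; a real analytics or CRM export will look different.

```python
# Hypothetical session data: (session_id, cohort, furthest checkout step reached).
# In practice this would come from your own analytics export.
SESSIONS = [
    ("s1", "high_value", 4),
    ("s2", "high_value", 3),
    ("s3", "high_value", 3),
    ("s4", "high_value", 2),
    ("s5", "occasional", 4),
    ("s6", "occasional", 1),
]


def drop_off_by_step(sessions, cohort, total_steps=4):
    """Return {step: share of sessions that reached the step but went no further}."""
    furthest = [step for _, c, step in sessions if c == cohort]
    rates = {}
    # The final step counts as a completed checkout, so it is not a drop-off.
    for step in range(1, total_steps):
        reached = sum(1 for s in furthest if s >= step)
        stopped = sum(1 for s in furthest if s == step)
        rates[step] = stopped / reached if reached else 0.0
    return rates


rates = drop_off_by_step(SESSIONS, "high_value")
worst_step = max(rates, key=rates.get)  # step with the highest drop-off share
print(worst_step, round(rates[worst_step], 2))
```

With this sample data, the highest drop-off for the high-value cohort is at step 3. A figure like that gives the AI something specific to reason about, which a generic persona prompt cannot supply.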

What useful proprietary data looks like:

Interview archives tagged by theme and behaviour – searchable, not buried in project folders
Session recordings and click data from your own product
CRM segments based on actual behaviour, not just demographics
Support tickets and complaints as a signal for unmet needs
Data over time – not just how users behave today, but how they’ve changed
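
As a rough illustration of what “searchable” can mean in practice, the sketch below filters a tagged archive and assembles a prompt that contains only the team’s own evidence. The `Snippet` structure, the tags, the quotes, and the prompt wording are all hypothetical, and no specific AI tool or API is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    source: str      # hypothetical reference, e.g. an interview or ticket ID
    tags: frozenset  # theme and behaviour tags assigned by the research team
    quote: str


# Hypothetical archive entries, invented for illustration only.
ARCHIVE = [
    Snippet("interview-07", frozenset({"checkout", "trust"}),
            "I stopped when it asked for my card before showing shipping costs."),
    Snippet("ticket-1432", frozenset({"checkout", "payment"}),
            "My preferred payment method was missing."),
    Snippet("interview-11", frozenset({"search"}),
            "I never use the filters."),
]


def find_snippets(archive, required_tags):
    """Return every snippet that carries all of the required tags."""
    required = set(required_tags)
    return [s for s in archive if required <= s.tags]


def build_grounded_prompt(question, snippets):
    """Assemble a prompt that limits the model to the supplied evidence."""
    evidence = "\n".join(f"- [{s.source}] {s.quote}" for s in snippets)
    return (
        "Answer only from the evidence below. "
        "If the evidence does not cover the question, say so.\n\n"
        f"Evidence:\n{evidence}\n\nQuestion: {question}"
    )


evidence = find_snippets(ARCHIVE, {"checkout"})
prompt = build_grounded_prompt("Why do users leave at checkout step 3?", evidence)
print(prompt)
```

Keeping the source reference next to each quote means any claim in the resulting persona can be traced back to a real interview or ticket.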

Teams that have built this kind of internal knowledge base can produce personas no competitor can copy – because no competitor has the same data. Everyone can access the same AI. Not everyone has built the data that makes it useful.

The model is a commodity. The research archive is not.

Tools Worth Knowing

Three tools that represent the current state of synthetic persona generation in UX practice – each with a different approach and a different relationship to your own data.

🧪

Synthetic Users

Purpose-built for AI-generated user research. Simulates interviews, concept tests, and usability sessions at scale. One of the most frequently cited dedicated tools in the UX research community – and a useful reference point for understanding both the potential and the limits of the category.

📊

Delve.ai

Generates personas from your own data sources – website analytics, CRM segments, social media behaviour. One of the few tools that moves away from generic public-data output and toward proprietary signal. Closest to the “bring your own data” model described in this article.

🗺️

UXPressia

Persona and customer journey mapping platform with AI-assisted creation and team collaboration. Strong for connecting personas to journey maps and cross-functional alignment – useful when the goal is not just generating a profile but actually working with it across a team.

All three tools are worth exploring with a critical eye. The question to ask of any synthetic persona tool is always the same: what data is this output actually built on? If the answer is “public web content,” the limitations described in this article apply.

The Risk of Overconfidence

There’s one more issue with synthetic personas that’s easy to miss: AI output looks confident.

The results are detailed, internally consistent, and easy to present. They don’t come with the natural uncertainty of real research – no sample size, no caveats about what participants said versus what they actually did.

When a researcher shares interview findings, the limits are visible. When an AI generates a persona, it reads like fact.

This is where teams get into trouble. A synthetic persona built on generic public data gets treated as a real picture of real users. Decisions get made against it. By the time actual user behaviour reveals the gap, a lot has already been built on a wrong assumption.

The answer isn’t to avoid synthetic personas. It’s to know what they can and can’t tell you – and to feed them with data that’s actually yours.

Where Real Research Matters Most

Synthetic shortcuts are least reliable exactly where the stakes are highest: non-Western markets and users with disabilities. These are not edge cases. Together they represent the majority of the global population.

1.3 bn. people worldwide live with some form of disability – around 16 % of the global population (WHO, 2023)

~30 % of real WCAG issues are caught by automated tools – the rest only surface through manual testing with actual Assistive Technology users (Deque Research)

55 % of all web content is in English – yet only 16 % of the world’s population speak English as a first or second language

Accessibility is not a WCAG checklist problem

WCAG defines technical minimum requirements. Whether a product is genuinely usable for people with disabilities only becomes clear through real testing.

Synthetic personas default to an able-bodied, neurotypical user. Disability only appears when you explicitly ask for it – and even then the output stays shallow. This is not an accident: training data from real Assistive Technology users is thin.

Screen reader users navigate sequentially

NVDA, JAWS, VoiceOver – no mouse, no visual scanning. Heading structure, ARIA labels, and focus order determine whether an interface works at all. No synthetic persona tool reliably simulates this interaction pattern.
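
Part of this is mechanically checkable, and a small sketch shows how little that part covers. The script below uses Python’s standard `html.parser` to flag skipped heading levels and buttons with no text or `aria-label`. The sample markup is hypothetical, and the name check is deliberately simplified: it ignores `aria-labelledby`, `title`, and image alt text. It says nothing about whether the page is actually usable with a screen reader.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class StructureCheck(HTMLParser):
    """Collects heading levels and counts buttons without text or aria-label."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []
        self.unnamed_buttons = 0
        self._in_button = False
        self._button_has_name = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.heading_levels.append(int(tag[1]))
        elif tag == "button":
            self._in_button = True
            # Simplification: only aria-label and visible text count as a name.
            self._button_has_name = bool(dict(attrs).get("aria-label"))

    def handle_data(self, data):
        if self._in_button and data.strip():
            self._button_has_name = True

    def handle_endtag(self, tag):
        if tag == "button" and self._in_button:
            if not self._button_has_name:
                self.unnamed_buttons += 1
            self._in_button = False


def skipped_heading_levels(levels):
    """Return (previous, current) pairs where a heading jumps down more than one level."""
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]


# Hypothetical markup for illustration.
SAMPLE = """
<h1>Checkout</h1>
<h3>Payment</h3>
<button><svg></svg></button>
<button aria-label="Close dialog"><svg></svg></button>
<button>Pay now</button>
"""

checker = StructureCheck()
checker.feed(SAMPLE)
skips = skipped_heading_levels(checker.heading_levels)
print(skips, checker.unnamed_buttons)
```

On the sample, the check reports one skipped level (h1 straight to h3) and one unnamed button. Whether the resulting reading order makes sense to someone navigating by headings is a question only a real screen reader user can answer.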

Motor impairments mean different paths

Switch access, keyboard-only, eye-tracking: interaction paths, timeout behaviour, and focus management are fundamentally different from mouse and touch usage. An AI persona not trained on these realities cannot correctly anticipate the experience.

Cognitive accessibility is the hardest to simulate

Dyslexia, ADHD, low literacy, cognitive overload – these user realities depend on line length, contrast, language complexity, distraction-free structure, and pacing. AI systematically underestimates how much is decided at this level of detail.

Real research matters most exactly where synthetic shortcuts are least reliable – in non-Western markets and with users with disabilities. These are not edge cases. Together they represent the majority of the world.

Teams that write these users out of their personas – deliberately or simply by choosing a tool that never included them – build products that don’t work for a significant share of their actual audience. No audit, no automated tool, and no synthetic persona replaces direct contact with these people.

Sources & Further Reading

Nielsen Norman Group. Synthetic Users: AI-Generated Research Participants. Assessment of where AI-generated personas work and where they introduce risk in UX practice.
Nielsen Norman Group. International Usability. Research on how cultural context shapes interface expectations, reading patterns, and user behaviour across markets.
World Wide Web Consortium (W3C). Text Direction and Internationalisation. Technical and design implications of RTL, bidirectional, and vertical text in web and product interfaces.
Hofstede Insights. National Culture Model. The foundational framework for cross-cultural dimensions that influence decision-making, trust, and communication patterns in UX research contexts.
Harvard Business Review. The New Rules of Data Privacy. On proprietary customer data as a strategic asset and the advantage it confers over generic market intelligence.
Gartner. What Is Synthetic Data? On the growing use of synthetic data and the conditions under which it adds versus subtracts value in enterprise AI.
Rosenfeld Media. Research Practice. On building cumulative research knowledge within organisations – the infrastructure that makes proprietary insight possible.
UX Collective. Ongoing practitioner analysis of where synthetic research methods supplement versus compromise real research.
Henrich, J., Heine, S. J., & Norenzayan, A. The Weirdest People in the World? (2010). The foundational paper establishing the WEIRD bias in behavioural research – the structural problem that now applies equally to AI training data.
World Health Organization. Disability and Health. Global prevalence data: 1.3 billion people – 16% of the world’s population – experience significant disability.
Deque Systems. Automated Accessibility Testing Study. Research showing automated tools detect approximately 30–40% of real WCAG issues; the remainder require manual and user testing.
W3C Web Accessibility Initiative. Web Content Accessibility Guidelines (WCAG). The international standard for digital accessibility – and why compliance alone does not guarantee usability for people with disabilities.

René Manikofski is a Senior UX Designer with 10+ years of experience in e-commerce and digital product design across Europe. All articles are based on personal professional experience and supported by AI in writing.