Technical primer
Why AI Text Gets Flagged: The Technical Reason Detectors Work

AI detectors do not check a database. They measure perplexity, burstiness, and signature phrasing. Here is what those metrics actually mean and how to move them.

By the Humanize AI editorial team | 14 min read

What detectors actually do

There is a common misunderstanding: that AI detectors check submitted text against a database of known AI output, the way plagiarism detectors check for matches against published sources. That is not how they work. Modern AI detectors are supervised classifiers trained on millions of examples labelled human or AI. They learn the statistical fingerprint of machine-generated text, then they score new submissions against that fingerprint.

The classifier does not need to have seen your specific text before. It only needs to recognize the pattern. Two numbers do most of the work.

1-9% false positive rate: across major detectors, varies by genre
300+ words minimum: below this, verdicts get noisy
2x-5x signature word skew: vs human prose for words like "delve"
0 database lookups: detectors are statistical, not search-based

Perplexity

Perplexity comes from information theory. Given a sentence, you ask a language model: how surprised were you by each word? Text whose next word the model finds obvious has low perplexity. Text whose next word the model finds unexpected has high perplexity. AI writing is low-perplexity by construction, because it is produced by choosing likely next words. Human writing tends to score higher.
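To make the definition concrete, here is a minimal sketch of the standard calculation: perplexity is the exponential of the average negative log-probability per token. The probability lists are invented for illustration. A real detector would take them from a language model scoring each token, and nothing here reflects any specific detector's implementation.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability a model gave each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token (cross-entropy in nats).
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical per-token probabilities, made up for this example only.
predictable = [0.6, 0.5, 0.7, 0.6, 0.5]   # the model saw every word coming
surprising = [0.6, 0.05, 0.3, 0.02, 0.4]  # a few words caught the model off guard

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

A single low-probability word pulls the average down sharply, which is why a few unexpected word choices can move the score.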

Figure: perplexity (x-axis) against density (y-axis), one curve for AI text and one for human text. AI text clusters low and tight; human text spreads higher with a wider tail.
Low perplexity (AI)
"Photosynthesis is the process by which plants convert sunlight into energy. This complex biochemical pathway involves chlorophyll, which absorbs light and converts carbon dioxide and water into glucose and oxygen."
Higher perplexity (human)
"Plants run on sunlight, but the chemistry is more interesting than that sounds. Chlorophyll grabs photons, kicks electrons loose, and uses the energy to staple carbon dioxide and water into sugar. Oxygen is the leftover."
Same content. The second version uses words a model would not have predicted: grabs, kicks loose, staple, leftover.

Burstiness

Burstiness measures the variance in your writing rhythm. Specifically: how much do sentence lengths and structures change from one sentence to the next? Human writing is bursty. We write a four-word sentence then a forty-word one. We toss in a fragment. We sometimes start a sentence with "And".

AI writing has low burstiness. Sentences cluster around the same length. Paragraphs tend to be three or four sentences each. Lists have parallel construction. The rhythm is metronomic.
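One rough way to put a number on rhythm is the spread of sentence lengths relative to their mean. The sketch below uses that coefficient of variation as a stand-in for burstiness. This is an assumption for illustration: detectors do not publish their exact formulas, and the sentence splitter here is deliberately naive.

```python
import re
import statistics


def sentence_lengths(text):
    """Word count of each sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence length (population std dev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # one sentence has no rhythm to measure
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The cat sat on the mat. The dog lay on the rug. The bird sat in the tree."
varied = ("Stop. The storm rolled in over the hills before anyone "
          "had time to close the windows. We ran.")

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

The uniform sample has three six-word sentences, so its score is zero. The varied sample mixes one-, fifteen- and two-word sentences and scores well above it.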

Figure: sentence length sequence in a 10-sentence sample, one panel for AI rhythm and one for human rhythm. AI clusters around the mean. Human varies wildly.

Signature phrasing

Beyond the two big statistical signals, classifiers also pick up on tokens that disproportionately show up in model output. These signature words are not unique to AI but appear at rates several times higher than in equivalent human writing.

Words that flag at multiples of human-baseline rates
delve, embark, navigate, foster, leverage, robust, intricate, tapestry, realm, landscape, comprehensive, multifaceted, pivotal, nuanced, underscore, elucidate, encompass, paramount

Punctuation matters too. Modern OpenAI models love em dashes. They use them where a human writer would pick a comma or a period. Heavy use of em dashes in a paragraph is a strong AI signal in 2026. See our ChatGPT humanizer page for a fuller list of GPT-4o signature patterns.
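You can check a draft for these signals yourself. The sketch below counts signature words and em dashes per 1,000 words. The word list is the one from this article. The prefix matching is a simplifying assumption: it catches "delves" and "delved" but misses "delving". Real classifiers learn these patterns from data rather than from a fixed list.

```python
import re

# Word list from the article; prefix matching is a crude simplification.
SIGNATURE_WORDS = (
    "delve", "embark", "navigate", "foster", "leverage", "robust",
    "intricate", "tapestry", "realm", "landscape", "comprehensive",
    "multifaceted", "pivotal", "nuanced", "underscore", "elucidate",
    "encompass", "paramount",
)
EM_DASH = "\u2014"


def signature_stats(text):
    """Signature-word and em-dash counts, each scaled to a rate per 1,000 words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"signature_per_1000": 0.0, "em_dashes_per_1000": 0.0}
    # str.startswith accepts a tuple, so one call checks every prefix.
    hits = sum(1 for word in words if word.startswith(SIGNATURE_WORDS))
    scale = 1000 / len(words)
    return {
        "signature_per_1000": hits * scale,
        "em_dashes_per_1000": text.count(EM_DASH) * scale,
    }


sample = "We delve into the robust landscape \u2014 a tapestry."
print(signature_stats(sample))
```

A rate only means something against a baseline, so compare the result with the same count on writing you know is human and in the same genre.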

How the major detectors compare

Detector | Tuned for | Primary signal | False-positive rate
Turnitin AI | Student essays, academic prose | Document-level classifier on academic corpus | <1% (claimed); higher for non-native
GPTZero | General writing, journalism | Perplexity + burstiness | 1-3% in independent tests
Originality.ai | SEO and content marketing | Transformer + heuristics for web copy | 2-5% in independent tests
Copyleaks | Enterprise + multilingual | Deep model + structural analyzer | varies by language, 1-9%
Each detector is tuned for a specific genre, which is why testing humanization across multiple detectors matters.

Why false positives happen

AI detectors flag human writing as AI more often than they should. Several things drive this. Non-native English speakers tend to write with simpler vocabulary and more uniform sentence structure, which raises the false positive rate sharply. Formal academic writing is supposed to be uniform and well-structured, so it reads as low-burstiness. Technical writing has a narrow vocabulary by necessity, which lowers perplexity. Short passages of 200 words or fewer give the classifier too little signal.

The ethical wrinkle
A detector that flags 9% of legitimate human writing as AI in a class of 30 students will wrongly accuse roughly three of them in a single assignment. Detector verdicts should be one signal in a process, not a forensic conclusion.
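The arithmetic behind that figure is short enough to check. The sketch below assumes every student wrote their own work and that each submission is flagged independently at the same rate. Both are simplifications, and the function names are ours.

```python
def expected_false_flags(class_size, false_positive_rate):
    """Average number of honest writers wrongly flagged per assignment."""
    return class_size * false_positive_rate


def prob_at_least_one_false_flag(class_size, false_positive_rate):
    """Chance that at least one honest writer is flagged, assuming independence."""
    return 1.0 - (1.0 - false_positive_rate) ** class_size


print(expected_false_flags(30, 0.09))
print(prob_at_least_one_false_flag(30, 0.09))
```

At a 9% rate the expected count is 2.7 students, and the chance that at least one student in the class is wrongly flagged is about 94%.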

What humanization actually changes

To move text from AI to human in a classifier's view, you have to move three signals while holding the meaning constant. A real humanizer does all four steps in one pass:

1. Raise perplexity. Replace the most predictable wording with less expected choices, as in the photosynthesis example above: grabs, staple, leftover.
2. Raise burstiness. Vary sentence length aggressively. Mix three-word sentences into long-sentence paragraphs. Add fragments.
3. Strip signatures. Replace high-frequency AI vocabulary: utilize with use, delve into with look at. Cut em dashes. Break parallel constructions in lists.
4. Preserve meaning. Keep the message, ideas, and facts intact. This is the hard part and where automated humanizers earn their keep.

What this means for your work

The right mental model: AI detection is a probabilistic measurement of statistical signatures, not a search of a database. Substitution-only humanizers (the kind that swap synonyms one word at a time) leave perplexity and burstiness almost untouched and are easy to detect.
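A small experiment shows the burstiness half of that claim. The sketch below applies word-for-word swaps and compares sentence lengths before and after. The swap table is hypothetical and uses two pairs mentioned in this article. The perplexity half would need a language model to test, so it is not covered here.

```python
import re

# Hypothetical swap table for illustration; each swap keeps the word count.
SWAPS = {"utilize": "use", "delve into": "look at"}


def swap_synonyms(text, swaps=SWAPS):
    """Replace whole words or phrases one for one (output is not re-capitalized)."""
    for old, new in swaps.items():
        text = re.sub(rf"\b{re.escape(old)}\b", new, text, flags=re.IGNORECASE)
    return text


def sentence_lengths(text):
    """Word count of each sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


before = ("We utilize three methods in this study. "
          "We delve into each method in turn. "
          "We compare the results at the end.")
after = swap_synonyms(before)

print(after)
print(sentence_lengths(before), sentence_lengths(after))
```

The vocabulary changes but every sentence keeps its length, so the rhythm a detector measures is exactly what it was.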

A humanizer that actually works restructures sentences, varies length, and removes signature phrasing. That is what we built into the free Humanize AI tool. For specific guidance by detector, see our walkthroughs for Turnitin, GPTZero, Originality.ai, and Copyleaks. By source model, see ChatGPT, Claude, and Gemini.

Frequently asked questions

Do AI detectors check your text against a database?

No. Modern AI detectors are statistical classifiers. They estimate whether the patterns of word choice and sentence structure match what a language model would produce.

What is perplexity in AI detection?

Perplexity measures how surprising the next word is given the previous words, scored by a language model. AI text scores low. Human text scores higher.

What is burstiness?

Burstiness is the variance in sentence length and complexity. Humans write in bursts. AI tends toward uniform sentence length, which produces low burstiness.

Can a detector tell which model produced the text?

Sometimes, but not reliably. Each model has signature words. ChatGPT favors delve, embark, navigate. Claude favors longer flowing prose. Detectors usually answer the binary AI-or-human question.

Why does humanization work?

Humanization rewrites text to raise its perplexity and burstiness. It replaces high-probability words with less expected ones, varies sentence length, and removes signature AI phrasing.

Try the free humanizer

Paste your AI-generated text. Get back something that reads naturally and moves the perplexity and burstiness numbers in the right direction. No signup, no word limit.
