How are tokens counted in ChatGPT?

ChatGPT uses the same tokenizer as the underlying model: cl100k_base for GPT-3.5, o200k_base for GPT-4o. A token is approximately 4 English characters or 0.75 words. Punctuation, numbers, code, and non-English text tokenize differently. Use OpenAI's tiktoken library for exact counts.

How much does GPT-4o cost per million tokens in 2026?

Approximately $2.50 per million input tokens and $10 per million output tokens. Output tokens are 4x more expensive than input. For a typical chat application generating long responses, output cost dominates the bill. Check live pricing for current rates.

What is a context window and how does it affect cost?

The context window is the maximum number of tokens the model can read in one request (currently 128K-200K for major models). Every token in the context is billed at the input rate on every request, even when it is the same long system prompt re-sent each turn. Prompt caching is the main cost reduction lever.


AI Token Counter

A fast AI token counter and cost estimator for the major LLM APIs — GPT-4o, GPT-4 Turbo, GPT-3.5, Claude Opus 4, Claude Sonnet 4, and Claude Haiku 4. Paste or type into the textarea and see live estimates of token count, character count, and word count, plus a per-million-token cost breakdown for input, output, and total spend. Pick a model, optionally specify expected output tokens, and get an instant budget for any prompt. Ideal for sizing up long prompts before you send them and for estimating batch-job costs without having to run them first.

Last updated: May 2026 · Written by Anees Ur Rehman, full-stack developer

Estimate only. Actual token counts depend on the exact tokenizer per model. For precise values, use OpenAI's tiktoken library or Anthropic's count_tokens endpoint. This tool uses the industry-standard 4-characters-per-token heuristic for English text.
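As a rough illustration of the heuristic described above, here is a minimal Python sketch. The function names `estimate_tokens` and `text_stats` are hypothetical, and rounding up is an assumption; the tool itself may round differently.

```python
import math

CHARS_PER_TOKEN = 4.0  # rule of thumb for English prose; an assumption, not a tokenizer

def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate: character count divided by chars-per-token, rounded up."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)

def text_stats(text: str) -> dict:
    """Return the kind of counters a token-counter UI typically shows."""
    return {
        "tokens_est": estimate_tokens(text),
        "characters": len(text),
        "words": len(text.split()),
        "lines": (text.count("\n") + 1) if text else 0,
    }

print(text_stats("Hello, world!"))
```

For anything that feeds a budget, replace `estimate_tokens` with a real tokenizer call; the heuristic is only meant for quick sizing.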


An LLM token is the unit of text a model processes — typically 4 characters of English text or roughly 0.75 words. Each model family (GPT, Claude, Gemini, Mistral) uses a different tokenizer, so the same text produces different counts across providers. This free AI token counter estimates tokens and per-request cost across GPT-4o, GPT-4 Turbo, GPT-3.5, Claude Opus 4, Sonnet 4, and Haiku 4 side by side — useful for budgeting LLM workloads before you ship.

Examples

GPT-4o vs Claude Sonnet 4

Same 100-word English prompt: GPT-4o uses ~125 tokens, Claude Sonnet 4 uses ~138 tokens (~10% more). The same input, different cost.

Cost of one chat turn

A 2,000-token system prompt sent on every request costs ~$0.005 per request on GPT-4o (input). At 100,000 requests/month that is $500 in input alone, before output tokens.

Output tokens cost 3-5x more

Output tokens are priced 3-5x higher than input. A long-form summarization feature can have output dominate the bill, even though the input feels bigger.
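The chat-turn arithmetic above can be reproduced with a few lines of Python. This is a sketch only: `request_cost` is a hypothetical helper, and the prices are the illustrative GPT-4o rates quoted on this page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Cost in USD of one request, given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# 2,000-token system prompt at $2.50/M input; output deliberately left at zero
per_request = request_cost(2_000, 0, 2.50, 10.00)
monthly = per_request * 100_000  # 100,000 requests per month

print(f"${per_request:.4f} per request, ${monthly:,.2f} per month")
```

Passing a non-zero `output_tokens` value shows how quickly the output side overtakes the input side at a 4x price ratio.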

Tokens, tokenizers, and the math of LLM cost — what every AI engineer needs in their head

"Token" is the unit of work for every modern LLM. You pay per token, you fit prompts into a context window measured in tokens, you get rate-limited per token. Yet most developers using these APIs never look under the hood at how text becomes tokens, why the same prompt costs differently across providers, or what actually drives output cost. This guide walks through the byte-pair-encoding mechanics, the per-model tokenizer differences in 2026, the cost-modeling math you should use for budgets, and the specific patterns (prompt caching, batch APIs, smaller models for routing) that cut bills by 50–90%.

What a "token" actually is

An LLM does not see characters or words; it sees integer IDs. Behind that is a fixed vocabulary (typically 50,000 to 200,000 entries) built once, before the model is trained, where each entry maps to a sequence of bytes. Tokenization is the algorithm that splits incoming text into pieces from that vocabulary; a BPE tokenizer does this by replaying its learned merges in priority order rather than by simply taking the longest match.

Almost every modern LLM uses Byte-Pair Encoding (BPE) or a close variant (SentencePiece, WordPiece). BPE works bottom-up:

Start with the byte-level alphabet (256 entries: every possible byte value).
Find the most frequent adjacent pair in the training corpus; merge it into a new token.
Repeat 50,000–200,000 times.

The result: common words become single tokens (" the", " and", " of"), rarer words split into 2–4 sub-tokens ("unbelievable" → "un" + "believable"), and unusual sequences fall back to per-byte tokens (a long random hex string can become 30+ tokens).
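The merge loop can be sketched as a toy trainer in Python. This is a simplified illustration, not how any production tokenizer is built: real implementations pre-split text with a regex, train on far larger corpora, and use faster data structures. The name `train_bpe` is hypothetical.

```python
from collections import Counter

def train_bpe(corpus: bytes, num_merges: int):
    """Learn up to num_merges byte-pair merges from a small corpus."""
    seq = list(corpus)                           # start from raw bytes: IDs 0..255
    vocab = {i: bytes([i]) for i in range(256)}  # token ID -> byte string
    merges = []                                  # learned (left_id, right_id) pairs, in order
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))       # count adjacent pairs
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break                                # nothing repeats, so no merge helps
        new_id = len(vocab)
        vocab[new_id] = vocab[a] + vocab[b]
        merges.append((a, b))
        # Replace every occurrence of (a, b) with new_id, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(new_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return vocab, merges, seq

corpus = b"the cat and the hat and the bat"
vocab, merges, seq = train_bpe(corpus, 10)
print(len(corpus), "bytes ->", len(seq), "tokens")
print([vocab[t] for t in seq])
```

Joining the byte strings of the resulting tokens always reproduces the original input, which is the property that lets a byte-level tokenizer encode any text without an "unknown" token.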

The 2026 tokenizer landscape — same text, different counts

Model	Tokenizer	Vocabulary	Avg. chars/token (English)
GPT-4 / GPT-4 Turbo / GPT-3.5	cl100k_base (BPE)	100 277	~4.0
GPT-4o / GPT-4o-mini / o3	o200k_base (BPE)	200 019	~4.4 (larger gains on non-English text)
Claude 3 / 3.5 / Opus 4 / Sonnet 4 / Haiku 4	Anthropic proprietary BPE	~65 000	~3.7
Gemini 1.5 / 2.0	SentencePiece	~256 000	~4.5
Llama 3 / 3.1	BPE (tiktoken-based)	128 256	~4.1
Mistral Large / Mixtral	SentencePiece	32 768 / 32 000	~3.9

Practical impact: the same 1,000-word English article runs about 1,250 tokens on GPT-4o (o200k_base) but ~1,400 tokens on Claude. A 10,000-word document can differ by 1,500+ tokens, which is meaningful for both cost and context-window planning.

What inflates token count beyond the 4-chars rule

Code: indentation, punctuation, and identifiers chew through tokens. JavaScript and Python both use roughly 30% more tokens per character than English prose.
JSON: structural punctuation ({, }, ", the comma, and the colon) often becomes its own token. JSON typically uses 50% more tokens per character than equivalent prose.
Long random strings: UUIDs, hashes, base64 — each character can become its own token. A 36-char UUID is often 25+ tokens.
Non-English text: older tokenizers (cl100k_base) split many CJK characters into 2-3 byte-level tokens each. Newer tokenizers (o200k_base, Gemini's SentencePiece) handle this better.
Emoji and Unicode: each emoji is typically 2-4 tokens because they're multi-byte.
Tool-call schemas: the JSON schema for a function definition is sent every turn. A 500-token tool definition × 10 tools × 20-turn conversation = 100K wasted tokens.
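The tool-schema figure in the last item is straightforward multiplication, sketched below. `tool_schema_overhead` is a hypothetical helper, and the $2.50/M input price is the illustrative GPT-4o rate used elsewhere on this page.

```python
def tool_schema_overhead(tokens_per_tool: int, num_tools: int, turns: int) -> int:
    """Input tokens spent re-sending tool definitions over one conversation."""
    return tokens_per_tool * num_tools * turns

all_tools = tool_schema_overhead(500, 10, 20)      # every tool on every turn
relevant_only = tool_schema_overhead(500, 3, 20)   # assume only 3 tools matter per step

price_per_m = 2.50  # illustrative input price in USD per million tokens
for label, tokens in [("all tools", all_tools), ("relevant only", relevant_only)]:
    print(f"{label}: {tokens:,} tokens, ${tokens * price_per_m / 1_000_000:.3f}")
```

The per-conversation dollar amount looks small; it becomes significant once multiplied by conversation volume.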

Pricing landscape — January 2026 reference

Model	Input ($/M tokens)	Output ($/M tokens)	Context window
GPT-4o	$2.50	$10	128 K
GPT-4o-mini	$0.15	$0.60	128 K
GPT-4 Turbo	$10	$30	128 K
GPT-3.5 Turbo	$0.50	$1.50	16 K
Claude Opus 4	$15	$75	200 K
Claude Sonnet 4	$3	$15	200 K
Claude Haiku 4	$1	$5	200 K
Gemini 2.0 Flash	$0.075	$0.30	1 M
Gemini 2.0 Pro	$1.25	$5	2 M

Output costs 3 to 5 times as much as input across these providers (a 3:1 to 5:1 output-to-input price ratio). This is because generation is more expensive than prefill: the model has to autoregress one token at a time, while reading input runs in parallel.
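To see the ratio of output price to input price for each row, and how it shifts a bill toward output, here is a small sketch using the table's illustrative prices. The `PRICES` dictionary and `output_share` helper are hypothetical names introduced for this example.

```python
# (input, output) prices in USD per million tokens, copied from the table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "GPT-4 Turbo": (10.00, 30.00),
    "GPT-3.5 Turbo": (0.50, 1.50),
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Haiku 4": (1.00, 5.00),
    "Gemini 2.0 Flash": (0.075, 0.30),
    "Gemini 2.0 Pro": (1.25, 5.00),
}

def output_share(model: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of one request's cost that comes from output tokens."""
    inp, out = PRICES[model]
    input_cost = input_tokens * inp
    output_cost = output_tokens * out
    return output_cost / (input_cost + output_cost)

ratios = {model: out / inp for model, (inp, out) in PRICES.items()}
for model, ratio in ratios.items():
    print(f"{model:<18} output/input = {ratio:.1f}x")

# A GPT-4o request with twice as many input tokens as output tokens:
print(f"{output_share('GPT-4o', 1_000, 500):.0%} of the cost is output")
```

Even when the input is twice as long as the output, a 4x price ratio means output still accounts for most of the request cost.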

Prompt caching — the biggest cost lever in 2026

Both OpenAI and Anthropic now support prompt caching — pay full price the first time you send a long system prompt, then up to 90% off for repeat reads in the next 5 minutes. The math:

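Anthropic: 90% discount on cache reads, 25% premium on cache writes. The write premium is recovered on the first cache read: two requests cost 1.25 + 0.10 = 1.35× the prompt price instead of 2×. Use it whenever the same system prompt fires 2+ times within the cache lifetime.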
OpenAI: 50% discount on cached tokens, no write premium, automatic for prompts ≥ 1024 tokens. No opt-in required.
Bedrock / Vertex: provider-specific; check the AWS/GCP docs.

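If your system prompt is 5,000 tokens and you fire 100 conversations against it per hour, caching saves about $1.33/hour on Sonnet 4. Uncached input costs 100 × 5,000 × $3/M = $1.50. With caching, assuming the cache stays warm, you pay one write at 1.25× ($0.019) plus 99 reads at 0.1× ($0.149), about $0.17 in total.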
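A minimal sketch of this calculation, using the Anthropic multipliers quoted above (1.25x for a cache write, 0.10x for a cache read). It assumes the cache never expires between requests, and unlike a plain 90% discount it also counts the one-time write premium. The function names are hypothetical.

```python
def input_cost_uncached(prompt_tokens: int, requests: int, price_per_m: float) -> float:
    """Input cost in USD when the full prompt is billed at list price every time."""
    return prompt_tokens * requests * price_per_m / 1_000_000

def input_cost_cached(prompt_tokens: int, requests: int, price_per_m: float,
                      write_multiplier: float = 1.25,
                      read_multiplier: float = 0.10) -> float:
    """One cache write followed by (requests - 1) cache reads; cache assumed warm."""
    if requests <= 0:
        return 0.0
    per_prompt = prompt_tokens * price_per_m / 1_000_000
    return per_prompt * (write_multiplier + read_multiplier * (requests - 1))

uncached = input_cost_uncached(5_000, 100, 3.00)
cached = input_cost_cached(5_000, 100, 3.00)
print(f"uncached ${uncached:.2f}, cached ${cached:.3f}, saved ${uncached - cached:.2f}")

# Caching already wins at two requests: 1.35x the prompt price versus 2x.
print(input_cost_cached(5_000, 2, 3.00) < input_cost_uncached(5_000, 2, 3.00))
```

If traffic is sparse enough that the cache expires between requests, each expiry adds another write at the premium rate, so the real saving is lower than this sketch suggests.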

Batch APIs — 50% off for non-urgent work

Both OpenAI and Anthropic offer Batch APIs at a 50% discount for jobs that don't need real-time responses. The API accepts a JSONL file of prompts and returns results within 24 hours. Use cases:

Embedding large document corpora.
Bulk classification (10K product descriptions, 1M support tickets).
Synthetic data generation for fine-tuning.
Backfilling AI summaries on historical data.

Combined with caching, this can cut bills 60–70% on batch workloads.
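As an illustration of the JSONL input such a job takes, the sketch below builds request lines in the shape OpenAI documents for its Batch API (`custom_id`, `method`, `url`, `body`). Treat the field layout as an assumption to verify against the current docs; the helper name, model, and prompts are placeholders, and Anthropic's batch format differs.

```python
import json

def build_batch_lines(prompts, model="gpt-4o-mini", max_tokens=200):
    """Yield one JSON line per prompt in the OpenAI Batch API request shape."""
    for i, prompt in enumerate(prompts):
        yield json.dumps({
            "custom_id": f"req-{i}",          # your own key for matching results later
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,     # cap output, the expensive side
            },
        })

lines = list(build_batch_lines([
    "Classify sentiment: great product",
    "Classify sentiment: arrived broken",
]))
print("\n".join(lines))
```

Writing these lines to a `.jsonl` file, uploading it, and creating the batch are separate API calls that are not shown here.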

Counting tokens precisely — language by language

Python (OpenAI):

import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, world!")
print(len(tokens))   # exact count

Python (Anthropic):

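import anthropic
client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
result = client.messages.count_tokens(
  model="claude-sonnet-4-20250514",   # example model ID; substitute a current one
  messages=[{"role": "user", "content": "Hello"}]
)
print(result.input_tokens)   # input token count for this request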

Node.js (OpenAI):

import { encoding_for_model } from 'tiktoken';
const enc = encoding_for_model('gpt-4o');
const tokens = enc.encode('Hello, world!');
console.log(tokens.length);
enc.free();   // tiktoken uses WASM, free to release memory

Node.js (Anthropic):

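import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();   // reads ANTHROPIC_API_KEY from the environment
const result = await client.messages.countTokens({
  model: 'claude-sonnet-4-20250514',   // example model ID; substitute a current one
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(result.input_tokens);   // input token count for this request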

Go (OpenAI):

import "github.com/pkoukk/tiktoken-go"
enc, _ := tiktoken.EncodingForModel("gpt-4o")
tokens := enc.Encode("Hello, world!", nil, nil)
fmt.Println(len(tokens))

Cost optimization patterns that actually work

Smaller model for routing. Route 80% of requests to Haiku/4o-mini ($0.15-1/M); reserve Opus/GPT-4o for the 20% that need it. Savings: 5-10× on routing-eligible traffic.
Cache the system prompt. If 90%+ of your input is identical across requests, caching alone cuts input cost by ~80%.
Trim conversation history. Don't re-send the full transcript every turn. Use sliding window (last N turns) or semantic compaction (LLM-summarized history).
Cap max_tokens. Output is 3-5× more expensive than input. A 500-token cap saves money even when the model could keep going.
Use Batch API for non-interactive workloads. 50% off, ships in < 24 h.
Strip tool schemas you don't need. Each tool definition is sent every turn. Conditionally include only the tools relevant to the current step.
Move from JSON-mode to structured output schemas. Schema-mode (OpenAI's JSON Schema, Anthropic's tool-call schema) reduces malformed outputs and saves retry tokens.
Pre-tokenize and reject overlong prompts. A prompt that exceeds the context window is rejected by the API, which costs you a failed round trip and the bandwidth spent uploading it. Count tokens first and fail fast on your side.
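The sliding-window idea from the history-trimming item can be sketched as follows. This is a minimal illustration under stated assumptions: `trim_history` is a hypothetical helper, token counts come from the 4-characters heuristic rather than a real tokenizer, and per-message formatting overhead is ignored.

```python
import math

def estimate_tokens(text: str) -> int:
    """4-characters-per-token heuristic; swap in a real tokenizer for exact counts."""
    return math.ceil(len(text) / 4)

def trim_history(system_prompt: str, history: list, budget_tokens: int):
    """Keep the system prompt plus the most recent messages that fit the budget."""
    used = estimate_tokens(system_prompt)
    kept = []
    for msg in reversed(history):            # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget_tokens:
            break                            # older messages are dropped
        kept.append(msg)
        used += cost
    kept.reverse()                           # restore chronological order
    return kept, used

history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 400},  # ~100 tokens
    {"role": "user", "content": "c" * 200},       # ~50 tokens
]
kept, used = trim_history("s" * 80, history, budget_tokens=180)
print(len(kept), "messages kept,", used, "estimated tokens")
```

The same budget check doubles as the pre-flight guard from the last item: if the system prompt alone exceeds the budget, `kept` comes back empty and the caller can reject the request before sending it.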

Common token-counting mistakes

Trusting the 4-chars rule for non-English text. CJK languages can run 1.5-3 tokens per character on older tokenizers, far from 4 characters per token. Use the actual tokenizer.
Forgetting tool calls and function schemas. They count as input tokens every turn.
Counting only the user message and missing the system prompt. Long system prompts (instructions, persona, examples) often dominate.
Counting input tokens but not estimating output. Output dominates cost on most chat workloads.
Comparing prices without comparing models. Claude Haiku 4 ($1/$5) is not equivalent quality to GPT-4o ($2.50/$10). Run quality eval before optimizing on price.
Sending whole files as context. A 10 MB log file dumped into a prompt is > 2 M tokens. Pre-process: chunk + retrieve + summarize.
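Several of these mistakes come from counting only one part of the request. A rough per-component tally, sketched below, makes the hidden parts visible. The helper names are hypothetical, counts use the 4-characters heuristic, and real APIs add some per-message and per-tool formatting overhead that this ignores.

```python
import json
import math

def estimate_tokens(text: str) -> int:
    """4-characters-per-token heuristic; replace with a real tokenizer for budgets."""
    return math.ceil(len(text) / 4)

def estimate_input_tokens(system_prompt: str, tools: list, history: list,
                          user_message: str) -> dict:
    """Estimate input tokens per component so nothing is silently left out."""
    parts = {
        "system": estimate_tokens(system_prompt),
        "tools": sum(estimate_tokens(json.dumps(t)) for t in tools),
        "history": sum(estimate_tokens(m["content"]) for m in history),
        "user": estimate_tokens(user_message),
    }
    parts["total"] = sum(parts.values())
    return parts

breakdown = estimate_input_tokens(
    system_prompt="x" * 4_000,                        # ~1,000 tokens of instructions
    tools=[{"name": "search", "description": "y" * 800}],
    history=[{"role": "user", "content": "z" * 1_200}],
    user_message="What changed since yesterday?",
)
print(breakdown)
```

In this made-up request the user message is a small fraction of the total, which is the typical shape once system prompts, tools, and history are included.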

Frequently Asked Questions

How accurate is the 4-characters-per-token estimate?

The 4 characters/token heuristic is an approximation OpenAI publishes as a rule of thumb for English prose. Actual counts depend on the exact tokenizer — GPT-4o uses o200k_base, GPT-3.5 and GPT-4 Turbo use cl100k_base, and Claude uses Anthropic's proprietary BPE tokenizer. Expect accuracy within 10–15% for typical English text. Code, JSON, non-English languages, unusual punctuation, and long unique identifiers can all produce counts that differ by 30% or more from the estimate.

How do I get exact token counts?

For OpenAI models, install the tiktoken Python library (pip install tiktoken) and call encoding_for_model("gpt-4o") to get the exact tokenizer. For Anthropic, use the count_tokens endpoint on the Messages API, which returns token counts for a request without generating a response. For JavaScript/Node, gpt-tokenizer provides a client-side equivalent for OpenAI models; @anthropic-ai/tokenizer targets older Claude models and is not accurate for Claude 3 and later, so prefer the count_tokens endpoint there. Run these in your prompt pipeline before calling the API if precise counting matters for your budget.

Why does the same text count differently across models?

Each model family uses a different tokenizer. GPT-3.5 and GPT-4 Turbo use cl100k_base (100,277 tokens), GPT-4o uses the newer o200k_base (200,019 tokens) which is roughly 20% more efficient on many non-English languages, and Claude uses its own BPE variant. The same sentence can produce different token counts across providers, which affects both cost (billed per token) and context window usage (tokens consumed against the model's max context).

What are input tokens vs output tokens?

Input tokens are everything you send to the model: the system prompt, user message, prior conversation history, and any tool-call schemas. Output tokens are what the model generates in response. Output tokens are almost always priced 3–5× higher than input, because autoregressive generation is more computationally expensive than prefill. This is why strategies like limiting max_tokens, requesting terse responses, and caching long system prompts have an outsized effect on cost.

How do I reduce my AI API costs?

Four main strategies: (1) Use smaller models for simple tasks, since Haiku or GPT-4o-mini handle most classification and extraction jobs as well as their larger siblings at 10× lower cost; (2) Enable prompt caching to reuse repeated system prompts at up to 90% discount (both OpenAI and Anthropic support this); (3) Trim conversation history so you're not re-sending the whole transcript every turn; (4) Use the Batch API for non-urgent workloads, with a 50% discount on OpenAI and similar on Anthropic. Beyond these four, stripping unnecessary context through careful prompt engineering pays off in most apps.

Are the prices on this page current?

The prices are illustrative values reflecting published rates as of January 2026: GPT-4o at $2.50/$10, GPT-4 Turbo at $10/$30, GPT-3.5 at $0.50/$1.50, Claude Opus 4 at $15/$75, Claude Sonnet 4 at $3/$15, and Claude Haiku 4 at $1/$5 per million input/output tokens respectively. Providers adjust pricing periodically — always check OpenAI's and Anthropic's pricing pages before committing to a cost model for a production budget.
