
AI Token Counter

A fast AI token counter and cost estimator for the major LLM APIs — GPT-4o, GPT-4 Turbo, GPT-3.5, Claude Opus 4, Claude Sonnet 4, and Claude Haiku 4. Paste or type into the textarea and see live estimates of token count, character count, and word count, plus a per-million-token cost breakdown for input, output, and total spend. Pick a model, optionally specify expected output tokens, and get an instant budget for any prompt. Ideal for sizing up long prompts before you send them and for estimating batch-job costs without having to run them first.

Last updated: May 2026 · Reviewed by FreeDevTool AI engineering team
Estimate only. Actual token counts depend on the exact tokenizer per model. For precise values, use OpenAI's tiktoken library or Anthropic's count_tokens endpoint. This tool uses the industry-standard 4-characters-per-token heuristic for English text.
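As a rough illustration, the heuristic can be sketched in a few lines of Python. This is a minimal sketch, not the tool's actual source: the function names are hypothetical, and rounding up with `math.ceil` is an assumption.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English prose; not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using the 4-characters-per-token heuristic."""
    # Rounding up is an assumption; the real counter may round differently.
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def text_stats(text: str) -> dict:
    """Return estimated tokens plus character, word, and line counts."""
    return {
        "tokens": estimate_tokens(text),
        "characters": len(text),
        "words": len(text.split()),
        "lines": len(text.splitlines()),
    }


print(text_stats("Hello, world!\nHow many tokens is this?"))
```

Because the estimate depends only on character count, it cannot tell dense code from plain prose. Treat it as a budgeting aid, not a billing figure.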

Tokens, tokenizers, and the math of LLM cost — what every AI engineer needs in their head

"Token" is the unit of work for every modern LLM. You pay per token, you fit prompts into a context window measured in tokens, you get rate-limited per token. Yet most developers using these APIs never look under the hood at how text becomes tokens, why the same prompt costs differently across providers, or what actually drives output cost. This guide walks through the byte-pair-encoding mechanics, the per-model tokenizer differences in 2026, the cost-modeling math you should use for budgets, and the specific patterns (prompt caching, batch APIs, smaller models for routing) that cut bills by 50–90%.

What a "token" actually is

An LLM does not see characters or words; it sees integer IDs. Behind that is a fixed vocabulary (typically 50,000 to 200,000 entries) built once during model training, where each entry maps to a sequence of bytes. Tokenization is the algorithm that splits incoming text into pieces from that vocabulary; a BPE tokenizer does this by replaying its learned merges in priority order rather than by simple longest-match lookup.

Almost every modern LLM uses Byte-Pair Encoding (BPE) or a close variant (SentencePiece, WordPiece). BPE works bottom-up:

  1. Start with the byte-level alphabet (256 entries: every possible UTF-8 byte).
  2. Find the most frequent adjacent pair in the training corpus; merge it into a new token.
  3. Repeat 50,000–200,000 times.

The result: common words become single tokens (" the", " and", " of"), rarer words split into 2–4 sub-tokens ("unbelievable" → "un" + "believable"), and unusual sequences fall back to per-byte tokens (a long random hex string can become 30+ tokens).
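The merge loop described above can be sketched as a toy trainer and encoder. This is a simplified illustration under stated assumptions, not how tiktoken or any production tokenizer is implemented: real tokenizers first split text into word-like chunks so merges never cross chunk boundaries, and they run tens of thousands of merges over very large corpora. The function names are hypothetical.

```python
from collections import Counter


def apply_merge(symbols, left, right):
    """Replace each adjacent (left, right) pair with the single symbol left + right."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(corpus, num_merges):
    """Learn up to num_merges merges, starting from single bytes."""
    symbols = [bytes([b]) for b in corpus.encode("utf-8")]  # step 1: byte alphabet
    merges = []
    for _ in range(num_merges):  # step 3: repeat
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]  # step 2: most frequent pair
        if count < 2:
            break  # no pair repeats, so another merge would not compress anything
        merges.append((left, right))
        symbols = apply_merge(symbols, left, right)
    return merges


def encode(text, merges):
    """Tokenize text by replaying the learned merges in training order."""
    symbols = [bytes([b]) for b in text.encode("utf-8")]
    for left, right in merges:
        symbols = apply_merge(symbols, left, right)
    return symbols


merges = train_bpe("the cat and the hat and the bat " * 3, num_merges=20)
print(encode("the cat", merges))   # frequent text collapses into a few tokens
print(encode("zqx", merges))       # unseen text stays as single bytes
```

Text the trainer never saw falls back to one token per byte, which is the same effect that makes random hex strings expensive.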

The 2026 tokenizer landscape — same text, different counts

| Model | Tokenizer | Vocabulary | Avg. chars/token (English) |
|---|---|---|---|
| GPT-4 / GPT-4 Turbo / GPT-3.5 | cl100k_base (BPE) | 100,277 | ~4.0 |
| GPT-4o / GPT-4o-mini / o3 | o200k_base (BPE) | 200,019 | ~4.4 (larger gains on non-English text) |
| Claude 3 / 3.5 / Opus 4 / Sonnet 4 / Haiku 4 | Anthropic proprietary BPE | ~65,000 | ~3.7 |
| Gemini 1.5 / 2.0 | SentencePiece | ~256,000 | ~4.5 |
| Llama 3 / 3.1 | tiktoken-based BPE | 128,256 | ~4.1 |
| Mistral Large / Mixtral | SentencePiece | 32,768 / 32,000 | ~3.9 |

Practical impact: the same 1,000-word English article runs about 1,250 tokens on GPT-4o (o200k_base) but about 1,400 tokens on Claude. A 10,000-word document can therefore differ by 1,500 or more tokens, which matters for both cost and context-window planning.

What inflates token count beyond the 4-chars rule

Pricing landscape — January 2026 reference

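| Model | Input ($/M tokens) | Output ($/M tokens) | Context window |
|---|---|---|---|
| GPT-4o | $2.50 | $10 | 128 K |
| GPT-4o-mini | $0.15 | $0.60 | 128 K |
| GPT-4 Turbo | $10 | $30 | 128 K |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16 K |
| Claude Opus 4 | $15 | $75 | 200 K |
| Claude Sonnet 4 | $3 | $15 | 200 K |
| Claude Haiku 4 | $1 | $5 | 200 K |
| Gemini 2.0 Flash | $0.075 | $0.30 | 1 M |
| Gemini 2.0 Pro | $1.25 | $5 | 2 M |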

Output is consistently priced 3 to 5 times higher than input across providers. This is because generation is more expensive than prefill: the model has to produce output autoregressively, one token at a time, while input tokens are processed in parallel.
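The per-request arithmetic behind these prices is simple enough to sketch. The dictionary keys below are hypothetical labels (not API model IDs) and the prices are copied from this section's reference table, so treat both as assumptions to re-check against the providers' pricing pages.

```python
# USD per million tokens as (input, output); January 2026 reference values from this page.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-opus-4": (15.00, 75.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-4": (1.00, 5.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return (input_cost, output_cost, total_cost) in USD for one request."""
    in_price, out_price = PRICES[model]
    input_cost = input_tokens * in_price / 1_000_000
    output_cost = output_tokens * out_price / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


inp, out, total = request_cost("claude-sonnet-4", input_tokens=2_000, output_tokens=500)
print(f"input ${inp:.6f} + output ${out:.6f} = ${total:.6f}")
```

In this example the 500 output tokens cost more than the 2,000 input tokens, which is why output length deserves as much attention as prompt length.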

Prompt caching — the biggest cost lever in 2026

Both OpenAI and Anthropic now support prompt caching — pay full price the first time you send a long system prompt, then up to 90% off for repeat reads in the next 5 minutes. The math:

If your system prompt is 5,000 tokens and you fire 100 conversations against it per hour, caching saves about $1.35/hour in input cost on Sonnet 4 (5,000 tokens × 100 requests × $3/M × 0.9 = $1.35).
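The same calculation as a reusable function, for plugging in your own prompt size and traffic. This is a simplified sketch: it assumes every request is a cache hit at the full 90% discount, and it ignores the first uncached request in each window and any cache-write surcharge a provider may apply. The function name is hypothetical.

```python
def hourly_cache_savings(prompt_tokens, requests_per_hour, input_price_per_m, discount=0.90):
    """Input-cost savings per hour (USD) if every request reads the cached prompt."""
    full_price = prompt_tokens * requests_per_hour * input_price_per_m / 1_000_000
    return full_price * discount


saved = hourly_cache_savings(prompt_tokens=5_000, requests_per_hour=100, input_price_per_m=3.00)
print(f"${saved:.2f} saved per hour")
```

Savings scale linearly with both prompt length and request rate, so long shared system prompts under steady traffic benefit most.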

Batch APIs — 50% off for non-urgent work

Both OpenAI and Anthropic offer Batch APIs at 50% discount for jobs that don't need real-time responses. The API accepts a JSONL file of prompts and returns results within 24 hours. Use cases:

Combined with caching, this can cut bills 60–70% on batch workloads.
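A batch input file is just one JSON request per line. The sketch below builds such a file in the shape OpenAI's Batch API documents for chat completions (`custom_id`, `method`, `url`, `body`); Anthropic's batch endpoint uses a different request structure, so this does not apply there unchanged. The helper name, the default model, and the `max_tokens` value are illustrative assumptions.

```python
import json


def build_batch_lines(prompts, model="gpt-4o-mini", max_tokens=200):
    """Return JSONL text with one chat-completion request per prompt."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": max_tokens,
            },
        }
        # json.dumps escapes newlines inside prompts, so each request stays on one line.
        lines.append(json.dumps(request))
    return "\n".join(lines)


print(build_batch_lines(["Summarize document A", "Summarize document B"]))
```

Uploading the file and creating the batch job are separate API calls, left out here to keep the sketch offline and runnable.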

Counting tokens precisely — language by language

Python (OpenAI):
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, world!")
print(len(tokens))   # exact count
Python (Anthropic):
import anthropic
client = anthropic.Anthropic()
result = client.messages.count_tokens(
  model="claude-sonnet-4-20250514",  # pinned snapshot ID; check Anthropic's model list for current IDs
  messages=[{"role": "user", "content": "Hello"}]
)
print(result.input_tokens)
Node.js (OpenAI):
import { encoding_for_model } from 'tiktoken';
const enc = encoding_for_model('gpt-4o');
const tokens = enc.encode('Hello, world!');
console.log(tokens.length);
enc.free();   // tiktoken uses WASM, free to release memory
Node.js (Anthropic):
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const result = await client.messages.countTokens({
  model: 'claude-sonnet-4-20250514',  // pinned snapshot ID; check Anthropic's model list for current IDs
  messages: [{ role: 'user', content: 'Hello' }]
});
console.log(result.input_tokens);
Go (OpenAI):
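import "github.com/pkoukk/tiktoken-go" // the snippet also needs "fmt"
enc, err := tiktoken.EncodingForModel("gpt-4o")
if err != nil {
	panic(err) // e.g. unknown model name or vocabulary download failure
}
tokens := enc.Encode("Hello, world!", nil, nil)
fmt.Println(len(tokens))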

Cost optimization patterns that actually work

  1. Smaller model for routing. Route 80% of requests to Haiku/4o-mini ($0.15-1/M); reserve Opus/GPT-4o for the 20% that need it. Savings: 5-10× on routing-eligible traffic.
  2. Cache the system prompt. If 90%+ of your input is identical across requests, caching alone cuts input cost by ~80% (90% of the tokens at a 90% discount is an 81% reduction).
  3. Trim conversation history. Don't re-send the full transcript every turn. Use sliding window (last N turns) or semantic compaction (LLM-summarized history).
  4. Cap max_tokens. Output is 3-5× more expensive than input. A 500-token cap saves money even when the model could keep going.
  5. Use the Batch API for non-interactive workloads. 50% off, with results returned within 24 hours.
  6. Strip tool schemas you don't need. Each tool definition is sent every turn. Conditionally include only the tools relevant to the current step.
  7. Move from JSON-mode to structured output schemas. Schema-mode (OpenAI's JSON Schema, Anthropic's tool-call schema) reduces malformed outputs and saves retry tokens.
  8. Pre-tokenize and reject overlong prompts. A user prompt that exceeds the context window costs you a failed call plus the time and bandwidth spent uploading it.
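Patterns 3 and 8 combine naturally into one pre-flight step: trim the history to a token budget and reject the request if even the newest message does not fit. The sketch below uses the 4-characters-per-token estimate for counting; in production you would swap in an exact tokenizer. The function names and the message shape (`role` / `content` dictionaries) are assumptions for illustration.

```python
import math


def estimate_tokens(text):
    """Rough count using the 4-characters-per-token heuristic."""
    return math.ceil(len(text) / 4)


def trim_history(system_prompt, turns, budget):
    """Keep the newest turns that fit within `budget` tokens alongside the system prompt.

    Returns (kept_turns, tokens_used). Raises ValueError when the system prompt
    or the newest turn alone exceeds the budget, so the call can be rejected early.
    """
    used = estimate_tokens(system_prompt)
    if used > budget:
        raise ValueError("system prompt alone exceeds the token budget")
    kept = []
    for turn in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(turn["content"])
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    if turns and not kept:
        raise ValueError("latest message does not fit in the token budget")
    kept.reverse()  # restore chronological order
    return kept, used


history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
kept, used = trim_history("s" * 20, history, budget=26)
print(len(kept), used)
```

A sliding window like this drops the oldest turns first; summarizing the dropped turns instead is the semantic-compaction variant mentioned in pattern 3.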

Common token-counting mistakes

Frequently Asked Questions

How accurate is the 4-characters-per-token estimate?
The 4 characters/token heuristic is an approximation OpenAI publishes as a rule of thumb for English prose. Actual counts depend on the exact tokenizer — GPT-4o uses o200k_base, GPT-3.5 and GPT-4 Turbo use cl100k_base, and Claude uses Anthropic's proprietary BPE tokenizer. Expect accuracy within 10–15% for typical English text. Code, JSON, non-English languages, unusual punctuation, and long unique identifiers can all produce counts that differ by 30% or more from the estimate.
How do I get exact token counts?
For OpenAI models, install the tiktoken Python library (pip install tiktoken) and call tiktoken.encoding_for_model("gpt-4o") to get the exact tokenizer. For Anthropic, use the count_tokens endpoint on the Messages API, which returns exact input token counts without running the model; it is free to call but has its own rate limits. For JavaScript/Node, gpt-tokenizer provides a client-side equivalent for OpenAI models. The older @anthropic-ai/tokenizer package predates the Claude 3 tokenizer, so treat its counts as approximate for current Claude models and prefer count_tokens. Run these in your prompt pipeline before calling the API if precise counting matters for your budget.
Why does the same text count differently across models?
Each model family uses a different tokenizer. GPT-3.5 and GPT-4 Turbo use cl100k_base (100,277 tokens), GPT-4o uses the newer o200k_base (200,019 tokens) which is roughly 20% more efficient on many non-English languages, and Claude uses its own BPE variant. The same sentence can produce different token counts across providers, which affects both cost (billed per token) and context window usage (tokens consumed against the model's max context).
What are input tokens vs output tokens?
Input tokens are everything you send to the model: the system prompt, user message, prior conversation history, and any tool-call schemas. Output tokens are what the model generates in response. Output tokens are almost always priced 3–5× higher than input, because autoregressive generation is more computationally expensive than prefill. This is why strategies like limiting max_tokens, requesting terse responses, and caching long system prompts have an outsized effect on cost.
How do I reduce my AI API costs?
Four main strategies: (1) Use smaller models for simple tasks; Haiku or GPT-4o-mini handle most classification and extraction jobs as well as their larger siblings at roughly 10× lower cost. (2) Enable prompt caching to reuse repeated system prompts at up to 90% discount (both OpenAI and Anthropic support this). (3) Trim conversation history so you're not re-sending the whole transcript every turn. (4) Use the Batch API for non-urgent workloads, at 50% off with both OpenAI and Anthropic. Beyond these four, careful prompt engineering to strip unnecessary context is a large lever in most apps.
Are the prices on this page current?
The prices are illustrative values reflecting published rates as of January 2026: GPT-4o at $2.50/$10, GPT-4 Turbo at $10/$30, GPT-3.5 at $0.50/$1.50, Claude Opus 4 at $15/$75, Claude Sonnet 4 at $3/$15, and Claude Haiku 4 at $1/$5 per million input/output tokens respectively. Providers adjust pricing periodically — always check OpenAI's and Anthropic's pricing pages before committing to a cost model for a production budget.


All tools run in your browser, no signup required, nothing sent to a server.