What does "character count" actually mean?
"Character count" sounds simple — count the letters — but it's deceptively complex once you account for Unicode, emoji, and platform-specific quirks. The same 30-letter sentence can register as 30, 60, or even 90 "characters" depending on whether you measure code points, UTF-16 code units, UTF-8 bytes, or graphemes (visible character clusters). Picking the right metric depends on where the text is going.
| Metric | What it counts | When to use |
|---|---|---|
| Code points (Unicode chars) | Each Unicode character, regardless of byte size | Most "human" character counts — Word, Google Docs, blog editors |
| UTF-16 code units | JavaScript's .length — counts each 16-bit unit | Default in JavaScript, Java, .NET. Emoji often count as 2. |
| UTF-8 bytes | Bytes when stored in UTF-8 | Byte-limited database columns, file sizes, network payloads |
| Graphemes (visible chars) | What humans see as "one character" | Cursor positioning, deletion behavior, what users intuitively expect |
| Words | Sequences separated by whitespace | Reading time, content sizing, Word/Docs counts |
Take the family emoji 👨‍👩‍👧‍👦. To a human, it's one character. It is 7 code points (four emoji joined by three zero-width joiners). JavaScript's "👨‍👩‍👧‍👦".length returns 11 (UTF-16 code units). Stored as UTF-8, it's 25 bytes. Twitter counts it as 2 weighted units. All five answers are technically correct, just for different questions.
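As a rough illustration of the counts above, the following Python sketch measures the same family emoji three ways using only the standard library. The helper name `text_metrics` is hypothetical, and the emoji is spelled with explicit escapes so the zero-width joiners survive copy-paste. Grapheme counting is left out because Python's standard library does not provide it.

```python
def text_metrics(s: str) -> dict:
    """Return three common 'character counts' for the same string."""
    return {
        "code_points": len(s),                          # Python str length is in code points
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # what JavaScript's .length reports
        "utf8_bytes": len(s.encode("utf-8")),           # storage / payload size
    }

# Man + ZWJ + woman + ZWJ + girl + ZWJ + boy
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
print(text_metrics(family))
# {'code_points': 7, 'utf16_units': 11, 'utf8_bytes': 25}
```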
Platform-specific character limits in 2026
| Platform | Limit | Counting rules |
|---|---|---|
| Twitter / X | 280 (Free) / 25,000 (Premium) | Latin-script chars = 1, CJK + emoji = 2. URLs always 23 (t.co shortener). |
| SMS (GSM-7) | 160 single / 153 per segment when concatenated | Latin alphabet only. Extended chars (€, [, ], ~) consume 2 slots. |
| SMS (UCS-2) | 70 single / 67 per segment when concatenated | Triggered by any non-GSM-7 char (emoji, Cyrillic, Arabic, Chinese) |
| Email subject (RFC 5322) | 998 hard / ~78 recommended | 78 chars before wrapping; mobile previews truncate ~50. |
| Email body line | 998 hard / 78 recommended | RFC 5322 §2.1.1. Many MTAs reject > 998. |
| HTML <title> | ~60 chars visible in SERPs | Google truncates around 580 pixels (≈55–60 chars on desktop) |
| Meta description | ~155 chars mobile / 160 desktop | Truncated with "…" beyond. |
| Open Graph title | 60 visible / 70 truncate | Facebook, LinkedIn, Slack previews |
| Open Graph description | 200 visible / 297 truncate | Slack collapses to 1 line; Facebook expands. |
| YouTube title | 100 char limit / 70 visible | Mobile truncates around 50. |
| YouTube description | 5,000 chars total / first 157 visible "above the fold" | Use the first 157 wisely — that's the SERP snippet too. |
| Instagram caption | 2,200 chars | Truncated with "...more" after 125 chars. |
| LinkedIn post | 3,000 chars | Truncated after 210 chars on feed; click "see more" to expand. |
| Reddit title | 300 chars | Most subreddits enforce shorter custom limits. |
| HN title | 80 chars | Hard limit. Concise wins. |
| Slack message | 40,000 chars | Effectively unlimited; mobile collapses long messages. |
| WhatsApp message | 65,536 chars | Effectively unlimited. |
Tweet character counting — the surprising rules
Twitter/X's "280 character" limit isn't 280 raw characters. It's 280 weighted units after Unicode NFC normalization, with four rules:
- Most characters = 1 weight. Latin letters, digits, and common punctuation each count as 1. A skin-tone modifier adds no extra weight on top of the emoji it modifies.
- CJK characters = 2 weight. Chinese, Japanese, Korean characters in specific Unicode blocks count double. This is intentional — they convey more information per glyph.
- URLs = 23 weight always. Twitter wraps every URL in its t.co shortener, so https://example.com/an/extremely/long/url?with=params still costs only 23 units. This is why URL placement strategy matters.
- Mentions and hashtags = standard char count. @username is 9 chars, with no discount.
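The rules above can be sketched as a small weighted counter. This is a simplified approximation, not Twitter's implementation: the function names are hypothetical, the "light" code-point ranges are an assumption modeled on the publicly documented twitter-text configuration and should be checked against the current library, the URL pattern is deliberately naive, and emoji-sequence parsing is not implemented (so multi-code-point emoji will be overcounted).

```python
import re
import unicodedata

# Assumed weight-1 ranges (inclusive); every other code point weighs 2.
LIGHT_RANGES = [(0x0000, 0x10FF), (0x2000, 0x200D), (0x2010, 0x201F), (0x2032, 0x2037)]
URL_WEIGHT = 23
URL_RE = re.compile(r"https?://\S+")  # naive; real URL detection is stricter


def char_weight(ch: str) -> int:
    """Weight of a single code point: 1 inside a light range, else 2."""
    cp = ord(ch)
    return 1 if any(lo <= cp <= hi for lo, hi in LIGHT_RANGES) else 2


def tweet_weight(text: str) -> int:
    """Approximate weighted length: NFC-normalize, charge 23 per URL, weigh the rest."""
    text = unicodedata.normalize("NFC", text)
    url_count = len(URL_RE.findall(text))
    rest = URL_RE.sub("", text)
    return URL_WEIGHT * url_count + sum(char_weight(ch) for ch in rest)


print(tweet_weight("hello"))      # 5
print(tweet_weight("\u4f60\u597d"))  # 4 (two CJK ideographs at weight 2 each)
print(tweet_weight("see https://example.com/an/extremely/long/url?with=params"))  # 27
```

Displaying `tweet_weight(draft)` against 280 gives a closer estimate than a raw length, but for production use the official library is the safer choice.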
SMS character counting — GSM-7, UCS-2, and the segment break
SMS is even trickier than tweets. The 160-character limit comes from a 1980s decision: a 140-byte payload (1,120 bits) divided by 7 bits per character = 160 chars per single message.
GSM-7 (default for Latin-alphabet languages)
The standard SMS encoding fits exactly 160 chars in one SMS. But it has a quirky character set — only 128 chars from a specific Latin/Greek table. Common symbols like €, [, ], {, }, ~, \, | aren't in the basic set; they live in the extended table and consume 2 slots each. So a message with 80 letters + 5 Euro signs = 80 + 10 = 90 effective chars.
UCS-2 (when GSM-7 isn't enough)
Any character outside the GSM-7 table (a single emoji, a Cyrillic letter, a Chinese char, an em dash) automatically switches the entire message to UCS-2, at 2 bytes per char, capping single messages at 70 chars. Most emoji sit outside the Basic Multilingual Plane and are sent as a surrogate pair, so each uses 2 of those 70 slots. One emoji can drop your limit from 160 to 70 in a single message.
Concatenated messages
Long messages get split into segments with a 6-byte User Data Header (UDH). Each segment fits 153 GSM-7 chars or 67 UCS-2 chars. The receiver's phone reassembles them. Billing is per segment — sending a 200-char SMS = 2 segments = 2× the cost.
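The encoding switch and segment math described in this section can be sketched as follows. This is an illustrative approximation: `sms_segments` is a hypothetical helper, the character sets are transcribed from the GSM 03.38 default alphabet and its extension table, and the sketch ignores national language shift tables and the rule that an escape pair or surrogate pair cannot be split across segments.

```python
from typing import Tuple

# GSM 03.38 default alphabet (1 septet each)
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table (escape + char = 2 septets each)
GSM7_EXTENDED = set("\f^{}\\[~]|€")

PAYLOAD_BYTES = 140  # user data per SMS
UDH_BYTES = 6        # concatenation header per segment


def sms_segments(text: str) -> Tuple[str, int, int]:
    """Return (encoding, units_used, segment_count) for a message."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        encoding = "GSM-7"
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        single = PAYLOAD_BYTES * 8 // 7               # 160 septets
        multi = (PAYLOAD_BYTES - UDH_BYTES) * 8 // 7  # 153 septets
    else:
        encoding = "UCS-2"
        units = len(text.encode("utf-16-be")) // 2    # non-BMP chars take 2 units
        single = PAYLOAD_BYTES // 2                   # 70 units
        multi = (PAYLOAD_BYTES - UDH_BYTES) // 2      # 67 units
    segments = 1 if units <= single else -(-units // multi)  # ceiling division
    return encoding, units, segments


print(sms_segments("a" * 80 + "€" * 5))  # ('GSM-7', 90, 1)
print(sms_segments("a" * 200))           # ('GSM-7', 200, 2)
print(sms_segments("hello \U0001F680"))  # ('UCS-2', 8, 1)
```

Deriving 160/153 and 70/67 from the payload size, rather than hard-coding them, keeps the relationship to the 140-byte limit and the 6-byte header visible.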
Reading and speaking time math
| Activity | Average rate | Use case |
|---|---|---|
| Silent reading (adult, English prose) | 200–250 wpm | Blog posts, articles. Use 200 wpm for skim-friendly content. |
| Silent reading (technical content) | 50–125 wpm | Code-heavy or jargon-dense docs. Slower because comprehension matters. |
| Speaking (conversational) | 120–150 wpm | Conversation, presentations |
| Speaking (broadcast / podcast) | 140–160 wpm | Pro speakers; what most podcasts trend toward |
| Speaking (news anchor) | 150–170 wpm | Faster than conversational; trained pace |
| Speaking (auctioneer) | 250+ wpm | The extreme; not useful for content estimation |
Practical estimates: 3-minute blog post = ~600 words. 5-minute podcast monologue = ~700 words. 10-minute conference talk = ~1,300 words spoken at relaxed pace. Use these to size content before writing.
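The estimates above are just word count divided by rate. A minimal sketch, assuming whitespace-separated words and rounding up to whole minutes (the function name `estimate_minutes` and the 1-minute floor are illustrative choices, not a standard):

```python
import math


def estimate_minutes(text: str, wpm: int = 200) -> int:
    """Estimate reading or speaking time in whole minutes, rounded up."""
    words = len(text.split())  # whitespace-separated tokens
    return max(1, math.ceil(words / wpm))


print(estimate_minutes("word " * 600))            # 3  (blog post at 200 wpm)
print(estimate_minutes("word " * 700, wpm=140))   # 5  (podcast monologue)
print(estimate_minutes("word " * 1300, wpm=130))  # 10 (relaxed conference talk)
```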
Counting characters in 8 programming languages
JavaScript
// Naive: counts UTF-16 code units, NOT graphemes
"hello".length; // 5 ✓
"👨‍👩‍👧‍👦".length; // 11 ✗ (one family: 4 surrogate pairs + 3 zero-width joiners)
// Correct grapheme count (modern browsers)
[...new Intl.Segmenter().segment("👨‍👩‍👧‍👦")].length; // 1 ✓
// UTF-8 byte count
new TextEncoder().encode("café").length; // 5 (é = 2 bytes)
// Word count
text.trim().split(/\s+/).filter(Boolean).length;
Python
# Code-point count (Python 3 default — usually what you want)
len("café") # 4 ✓
# UTF-8 byte count
len("café".encode("utf-8")) # 5
# Grapheme count (uses regex package, not stdlib)
import regex
len(regex.findall(r'\X', "👨👩👧👦")) # 1
# Word count
len(text.split())
PHP
// strlen() returns BYTES (not chars) — common bug for UTF-8
strlen("café"); // 5 ✗ (counts bytes)
// mb_strlen — code-point count
mb_strlen("café", "UTF-8"); // 4 ✓
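// Word count (str_word_count is byte-oriented and may split words containing multibyte letters)
str_word_count($text);
// Unicode-aware alternative: split on whitespace
count(preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY));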
Go
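package main
import (
	"fmt"
	"unicode/utf8"
)
func main() {
	s := "café"
	fmt.Println(len(s))                    // 5: len() returns BYTES on strings
	fmt.Println(utf8.RuneCountInString(s)) // 4 ✓: rune count (code points)
	// range iterates over runes (code points), not bytes
	for _, r := range s {
		fmt.Printf("%c\n", r)
	}
}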
Rust
// .len() returns bytes
"café".len(); // 5
// .chars().count() — code points
"café".chars().count(); // 4 ✓
// Grapheme count via unicode-segmentation crate
use unicode_segmentation::UnicodeSegmentation;
"👨👩👧👦".graphemes(true).count(); // 1
Java
// .length() returns UTF-16 code units
"café".length(); // 4
"👨👩👧👦".length(); // 11
// Code-point count
"👨👩👧👦".codePointCount(0, "👨👩👧👦".length()); // 7
// UTF-8 byte count (requires: import java.nio.charset.StandardCharsets;)
"café".getBytes(StandardCharsets.UTF_8).length; // 5
Ruby
"café".length # 4 (chars in default encoding)
"café".bytesize # 5 (bytes)
"café".chars.length # 4
# Grapheme cluster count (Ruby 2.5+)
"👨👩👧👦".grapheme_clusters.length # 1
Bash
# wc — word/char/byte/line count
echo "hello world" | wc -c # bytes (12 — includes newline!)
echo -n "hello world" | wc -c # bytes (11)
echo -n "hello world" | wc -m # chars (locale-aware)
echo "hello world" | wc -w # words (2)
echo -e "line1\nline2" | wc -l # lines (2)
# String length in bash (locale-aware: characters in a UTF-8 locale, bytes under LC_ALL=C)
str="café"
echo "${#str}" # 4 (chars, in a UTF-8 locale)
Best character counter for 2026 — what to compare
Search results for "character counter online", "word counter", and "tweet character count" return many tools but most fail on real-world counts: they count UTF-16 code units instead of characters (so an emoji counts as 2), they ignore Twitter's CJK double-counting rule, or they don't surface SMS GSM-7 vs UCS-2 segment math. Here's how the most-used counters compare in 2026:
| Tool | Unicode-correct | Tweet rule + CJK | SMS GSM-7 / UCS-2 | Reading time | Cost |
|---|---|---|---|---|---|
| FreeDevTool Character Counter | NFC normalized + grapheme clusters | Yes (with CJK 2-unit rule) | Both with segment count | 200 WPM read + 130 WPM speak | Free |
| charactercount.online | Code units only | No | Generic only | Yes | Free, ad-funded |
| wordcounter.net | Code units | No | No | Yes | Free, ad-heavy |
| twittercount.com | Tweet-specific | Yes | No | No | Free |
| Microsoft Word "Word Count" | Code units | No | No | Yes | Built into Office |
How do I count characters for a tweet correctly (with CJK and emoji)?
Twitter's character count is NOT a simple JavaScript str.length. The rules: 1) apply Unicode NFC normalization first; 2) Latin-script letters, digits, and common punctuation count as 1 unit each; 3) characters in certain Chinese, Japanese, Korean, and emoji ranges count as 2 units (so a single Chinese ideograph eats 2 of your 280 budget); 4) URLs are always counted at a fixed 23-character t.co length regardless of original length, even though the visible URL stays full. This counter applies all four rules: paste any tweet draft and the displayed count matches what twitter.com will count. Most generic counters get this wrong by 20-50% on multi-script content.
What's the difference between UTF-8 bytes, characters, and grapheme clusters?
Three distinct quantities are frequently confused. UTF-8 bytes = how the text is encoded on disk or in transit. ASCII = 1 byte, accented Latin = 2 bytes, most CJK = 3 bytes, most emoji = 4 bytes. Byte-limited database columns and HTTP payloads measure this. Characters (code points) = abstract Unicode characters. The letter "é" is 1 character but 2 bytes in UTF-8. Grapheme clusters = what humans perceive as one character. The emoji 👨‍👩‍👧 (family) is 1 grapheme cluster, 5 code points, 18 UTF-8 bytes (three 4-byte emoji plus two 3-byte zero-width joiners). Cursor movement and deletion work on grapheme clusters; byte-limited databases count bytes; JavaScript's str.length counts UTF-16 code units (a surrogate-pair emoji like 🚀 reads as 2). This counter shows all three so you can match whichever your downstream system measures.
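The byte widths quoted above are easy to confirm. A small Python sketch, using one representative character per class (the sample characters and the `utf8_width` helper are illustrative choices):

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "ASCII letter": "a",
    "accented Latin": "\u00e9",   # é (precomposed)
    "CJK ideograph": "\u4e2d",    # 中
    "emoji": "\U0001F680",        # 🚀
}
widths = {label: utf8_width(ch) for label, ch in samples.items()}
print(widths)
# {'ASCII letter': 1, 'accented Latin': 2, 'CJK ideograph': 3, 'emoji': 4}
```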
Character counter alternative to wordcounter.net — 4 reasons writers switched
- Unicode-correct counting. Emoji 🚀 counts as 1 grapheme (correct), not 2 (UTF-16 code units). Critical for any social, marketing, or i18n copy work.
- Platform limit progress bars. Tweet (280), SMS (160/70), Meta description (155), SEO title (60), Google Ads headline (30), Instagram caption (2,200) — all visible simultaneously with overflow warnings.
- SMS GSM-7 vs UCS-2 segment math. Drop a single emoji into an SMS and it switches from 160-char GSM-7 to 70-char UCS-2 — multi-segment cost balloons. This counter shows the segment count + per-segment cost in real time.
- No ads, no popups, no upload. Tools indexed for "character counter online" almost universally inject ads. This page is browser-only and persists nothing.
Pair the character counter with the Lorem Ipsum Generator for placeholder copy, the Case Converter for naming-convention transforms, the String Escape Tool for character-level transformations, and the Code & Text Tools hub for the broader text toolkit.
Character counter best practices
- Pick the right metric for the destination. Twitter wants weighted units; SMS wants encoded bytes; databases want UTF-8 bytes; humans want graphemes.
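- For UTF-8 columns, size by bytes. Where the column limit is measured in bytes, a "VARCHAR(255)" can hold 255 ASCII chars or 63 four-byte emoji. Some databases count characters instead, so check which unit yours uses and plan accordingly.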
- Test with real-world content. Your average user has at least one accented character or emoji somewhere. Lorem ipsum doesn't catch encoding bugs.
- Beware of String.length in JavaScript and Java. Both count UTF-16 code units, not characters. Use Intl.Segmenter or grapheme libraries for user-facing counts.
- Strip URLs before counting tweets. They cost a fixed 23 units regardless of length, so your "real" content has more room than the raw count suggests.
- Reading-time estimates are rough. Use 200 wpm as a default; show "5-min read" not "4 min 47 sec." Precision implies false confidence.
- For SMS, test with a single emoji. One emoji drops your limit from 160 to 70. Marketing messages are often optimized for GSM-7 only.
- Display "X / 280" not just "X". Users want to see the limit too. Color the counter red as it approaches the limit.
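For the byte-sizing practice above, a minimal Python sketch of checking and trimming text against a byte budget. Both helper names are hypothetical, and the 255-byte limit is only an example. Truncation here is safe at the code-point level but can still cut a grapheme cluster (such as a ZWJ emoji sequence) in half.

```python
def fits_in_bytes(text: str, max_bytes: int) -> bool:
    """True if the UTF-8 encoding of text fits in max_bytes."""
    return len(text.encode("utf-8")) <= max_bytes


def truncate_to_bytes(text: str, max_bytes: int) -> str:
    """Cut at the byte limit, then drop any partial code point left at the end."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


rockets = "\U0001F680" * 64                      # 64 emoji = 256 bytes
print(fits_in_bytes("a" * 255, 255))             # True
print(fits_in_bytes(rockets, 255))               # False
print(len(truncate_to_bytes(rockets, 255)))      # 63
```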