SCRIPTA Learn

Learn Unicode

A shared standard meant to cover every human writing system. Learn codepoints, encoding, and decomposition step by step.

◈ How Characters Are Stored

Computers store all information as numbers. Images, sounds, and text are all ultimately sequences of 0s and 1s. Characters are no exception: each character displayed on screen is stored internally as a number.

For example, in ASCII the letter A is stored as 65 and B as 66. A table that maps characters to numbers is called a character encoding.

🔬 Try it yourself

Char	Codepoint	Block
A	U+0041	Basic Latin
B	U+0042	Basic Latin
C	U+0043	Basic Latin
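The character-to-number mapping described above can be checked directly. This is a minimal sketch using Python's built-in `ord()` and `chr()`; the variable names `codes` and `restored` are only illustrative.

```python
# ord() returns the number stored for a character; chr() goes the other way.
for ch in "ABC":
    print(ch, ord(ch), f"U+{ord(ch):04X}")

# Round trip: characters -> numbers -> characters.
codes = [ord(ch) for ch in "ABC"]
restored = "".join(chr(n) for n in codes)
print(codes, restored)
```

The `04X` format pads the hexadecimal value to at least four digits, which matches the U+ notation shown in the widget.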

◈ Why Text Gets Corrupted

The problem is that different countries and companies created their own, mutually incompatible encoding tables. Korea used EUC-KR, Japan used Shift-JIS, and Western Europe used ISO-8859-1.

⚠️ Korean text saved as EUC-KR and then read as Shift-JIS is interpreted as completely different characters. This is the cause of "mojibake" (garbled text).
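The effect can be reproduced in a few lines. The sketch below assumes Python's standard `euc_kr` and `shift_jis` codecs and uses the sample word 한글 as an illustration; `errors="replace"` is there only so that byte sequences invalid in Shift-JIS do not raise an exception.

```python
text = "한글"

# Bytes as a Korean program would write them to disk.
raw = text.encode("euc_kr")

# The same bytes looked up in the wrong table: mojibake.
garbled = raw.decode("shift_jis", errors="replace")

# The same bytes looked up in the right table: the original text.
restored = raw.decode("euc_kr")

print(raw.hex())
print(garbled)
print(restored)
```

The bytes themselves never change. Only the table used to interpret them does, which is why the text can be recovered once the correct encoding is known.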

To resolve this chaos, Unicode set out to create one unified code table for all the world's characters. The first version of the standard was published in 1991.

◈ What Is a Codepoint?

Unicode assigns each character a unique number called a codepoint, written as U+ followed by a hexadecimal number. For example, the Korean character 한 has codepoint U+D55C.

Char	Codepoint	Decimal	Description
A	U+0041	65	Latin Capital Letter A
한	U+D55C	54620	Hangul Syllable HAN
α	U+03B1	945	Greek Small Letter Alpha
🌏	U+1F30F	127759	Earth Globe Asia-Australia
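The rows of this table can be regenerated with the standard library. This is a small sketch; the helper name `describe` is hypothetical, and the official names come from whatever version of the Unicode database ships with the Python build in use.

```python
import unicodedata

def describe(ch):
    """Return (U+ notation, decimal value, official name) for one character."""
    number = ord(ch)
    return f"U+{number:04X}", number, unicodedata.name(ch)

for ch in "A한α🌏":
    print(ch, *describe(ch), sep="\t")
```

Codepoints above U+FFFF, such as the emoji, simply print with five hexadecimal digits because `04X` sets a minimum width, not a maximum.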

Unicode currently contains over 150,000 characters, covering nearly all human writing systems from ancient scripts to emoji.

🔬 Check codepoints

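Char	Codepoint	Block or role
한	U+D55C	Hangul Syllables
ᄒ	U+1112	Initial consonant (choseong)
ᅡ	U+1161	Medial vowel (jungseong)
ᆫ	U+11AB	Final consonant (jongseong)

글	U+AE00	Hangul Syllables
ᄀ	U+1100	Initial consonant (choseong)
ᅳ	U+1173	Medial vowel (jungseong)
ᆯ	U+11AF	Final consonant (jongseong)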
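The breakdown of each syllable into initial, medial, and final parts can be reproduced in code. The sketch below shows two approaches that should agree: canonical decomposition (NFD) through `unicodedata.normalize`, and the arithmetic layout of the Hangul Syllables block, which starts at U+AC00 and holds 21 vowels times 28 final slots for each initial consonant. The function names are hypothetical, and the arithmetic version assumes its input is a single precomposed syllable in the range U+AC00 to U+D7A3.

```python
import unicodedata

def decompose(syllable):
    """Split one precomposed Hangul syllable into its jamo using NFD."""
    return [f"U+{ord(j):04X}" for j in unicodedata.normalize("NFD", syllable)]

def decompose_by_arithmetic(syllable):
    """Same split, computed from the syllable's offset in its block.

    Assumes the input is one character in U+AC00..U+D7A3.
    """
    index = ord(syllable) - 0xAC00        # offset from the first syllable
    lead, rest = divmod(index, 21 * 28)   # which initial consonant
    vowel, tail = divmod(rest, 28)        # which vowel, which final slot
    jamo = [0x1100 + lead, 0x1161 + vowel]
    if tail:                              # slot 0 means no final consonant
        jamo.append(0x11A7 + tail)
    return [f"U+{cp:04X}" for cp in jamo]

for s in "한글":
    print(s, decompose(s), decompose_by_arithmetic(s))
```

For 한 both functions give U+1112, U+1161, U+11AB, and for 글 they give U+1100, U+1173, U+11AF, matching the entries listed above.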