SCRIPTA Learn

Learn Unicode

The shared standard for representing every human writing system: learn codepoints, encoding, and decomposition step by step.

How Characters Are Stored

Computers store all information as numbers. Images, sounds, and text are all ultimately sequences of 0s and 1s. Characters are no exception: each character displayed on screen is stored internally as a number.

For example, the letter A is stored as 65, and B as 66. A table that maps characters to numbers is called a character encoding.

🔬 Try it yourself
A    U+0041    Basic Latin
B    U+0042    Basic Latin
C    U+0043    Basic Latin
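
The same mapping can be checked in a few lines of code. The following is a minimal sketch in Python (the language choice and the helper name describe are illustrative assumptions, not part of the original lesson), using the built-in ord and chr functions:

```python
def describe(ch: str) -> str:
    """Return the character, its number, and that number in U+ hex notation."""
    number = ord(ch)  # character -> number
    return f"{ch} {number} U+{number:04X}"


for ch in "ABC":
    print(describe(ch))

# chr goes the other way: number -> character
print(chr(66))
```

Here ord looks up the number for a character and chr does the reverse, which is all a character table really provides.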

Why Text Gets Corrupted

The problem is that different countries and companies created their own, mutually incompatible encoding tables. Korea used EUC-KR, Japan used Shift-JIS, and Western Europe used ISO-8859-1.

⚠️ Korean text saved as EUC-KR but read as Shift-JIS is decoded as completely different characters. This mismatch is the cause of "mojibake" (garbled text).
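
The effect is easy to reproduce. The sketch below assumes Python's standard euc_kr and shift_jis codecs and uses the sample word 한글 (chosen here for illustration); it writes the text with one table and reads the bytes back with another:

```python
text = "한글"

# Save the text as EUC-KR bytes, as a Korean system might have done.
raw = text.encode("euc_kr")

# Read the same bytes back with the wrong table (Shift-JIS).
# errors="replace" keeps the sketch from raising on an invalid byte sequence.
garbled = raw.decode("shift_jis", errors="replace")

# Read the bytes back with the table they were written with.
restored = raw.decode("euc_kr")

print(raw.hex(" "))  # the stored bytes
print(garbled)       # mojibake
print(restored)      # the original text
```

The bytes never change. Only the table used to interpret them does, which is why the text can be recovered once the correct encoding is known.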

To resolve this chaos, the Unicode project set out to create one unified code table for all characters in the world. The first version of the standard was published in 1991.

What Is a Codepoint?

Unicode assigns each character a unique number called a codepoint, written as U+ followed by a hexadecimal number. For example, the Korean character 한 has codepoint U+D55C.

Char    Codepoint    Decimal    Description
A       U+0041       65         Latin Capital Letter A
한      U+D55C       54620      Hangul Syllable HAN
α       U+03B1       945        Greek Small Letter Alpha
🌏      U+1F30F      127759     Earth Globe Asia-Australia
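
The rows of this table can be regenerated from the characters themselves. This is a small sketch assuming Python's standard unicodedata module; the helper name codepoint_row is hypothetical:

```python
import unicodedata


def codepoint_row(ch: str) -> tuple:
    """Return (character, U+ notation, decimal value, official Unicode name)."""
    cp = ord(ch)
    # U+ notation is the codepoint in uppercase hex, padded to at least 4 digits.
    return ch, f"U+{cp:04X}", cp, unicodedata.name(ch)


for ch in "A한α🌏":
    print(*codepoint_row(ch), sep="\t")
```

Note that unicodedata.name returns the official names in uppercase, for example HANGUL SYLLABLE HAN.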

Unicode currently contains over 150,000 characters, covering nearly all human writing systems from ancient scripts to emoji.

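🔬 Check codepoints
한    U+D55C    Hangul Syllables
  ᄒ    U+1112    initial consonant (choseong)
  ᅡ    U+1161    medial vowel (jungseong)
  ᆫ    U+11AB    final consonant (jongseong)
글    U+AE00    Hangul Syllables
  ᄀ    U+1100    initial consonant (choseong)
  ᅳ    U+1173    medial vowel (jungseong)
  ᆯ    U+11AF    final consonant (jongseong)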
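
The breakdown above, one syllable codepoint split into initial, medial, and final parts, can be reproduced with Unicode normalization. The following is a minimal sketch, assuming Python's unicodedata.normalize with the NFD form; the helper name decompose is hypothetical:

```python
import unicodedata


def decompose(syllable: str) -> list:
    """Split a precomposed Hangul syllable into its conjoining jamo via NFD."""
    jamo = unicodedata.normalize("NFD", syllable)
    return [f"U+{ord(j):04X}" for j in jamo]


print(decompose("한"))
print(decompose("글"))

# NFC goes the other way and recomposes the jamo into a single syllable.
recomposed = unicodedata.normalize("NFC", "\u1112\u1161\u11ab")
print(recomposed == "한")
```

NFD is used here because it is the standard canonical decomposition. The same three codepoints could also be computed arithmetically from the syllable's position in the Hangul Syllables block.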