Learn Unicode
The shared standard that aims to cover every character people write. Learn codepoints, encodings, and decomposition step by step.
◈ How Characters Are Stored
Computers store all information as numbers. Images, sounds, and text are all ultimately sequences of 0s and 1s. Characters are no exception: each character displayed on screen is stored internally as a number.
For example, in ASCII the letter A is stored as 65, and B as 66. A table that maps characters to numbers is called a character encoding.
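A minimal Python sketch of this mapping, using only the built-in `ord` and `chr` functions. The variable names are illustrative, not part of any API.

```python
# Look up the number stored for a character, and go back again.
code_for_a = ord("A")      # character -> number
char_for_66 = chr(66)      # number -> character

# The same number written as the eight 0s and 1s of a single byte.
bits_for_a = format(code_for_a, "08b")

print(code_for_a)    # 65
print(char_for_66)   # B
print(bits_for_a)    # 01000001
```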
◈ Why Text Gets Corrupted
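The problem is that different countries and companies created their own, mutually incompatible encoding tables. Korea used EUC-KR, Japan used Shift-JIS, and Western Europe used ISO-8859-1. The same bytes stand for different characters in each table, so text written with one table and read with another comes out garbled.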
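The corruption can be reproduced in a few lines. The sketch below assumes Python's built-in `euc-kr` and `iso-8859-1` codecs. It encodes a Korean word with the Korean table, then decodes the same bytes with the Western European one.

```python
text = "한글"

# Encode with the Korean table: each syllable becomes two bytes.
raw = text.encode("euc-kr")

# Decode the same bytes with the wrong table (Western European).
garbled = raw.decode("iso-8859-1")

# Decode with the table that was actually used to write them.
restored = raw.decode("euc-kr")

print(raw.hex(" "))   # c7 d1 b1 db
print(garbled)        # ÇÑ±Û
print(restored)       # 한글
```

The bytes never change. Only the table used to interpret them does, which is why the reader must know which encoding the writer used.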
To resolve this chaos, the Unicode project set out to create one unified code table for all the world's characters. The first version of the Unicode Standard was published in 1991.
◈ What Is a Codepoint?
Unicode assigns each character a unique number called a codepoint, written as U+ followed by a hexadecimal number. For example, the Korean character 한 has codepoint U+D55C.
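As a rough illustration, the U+ notation can be produced and parsed in Python. The helpers `to_codepoint` and `from_codepoint` are hypothetical names invented for this sketch, and the parser assumes a well-formed label.

```python
def to_codepoint(ch: str) -> str:
    """Format one character as U+ followed by at least four hex digits."""
    return f"U+{ord(ch):04X}"


def from_codepoint(label: str) -> str:
    """Turn a label such as 'U+D55C' back into its character."""
    # Drop the leading "U+" and read the rest as a base-16 number.
    return chr(int(label[2:], 16))


print(to_codepoint("한"))          # U+D55C
print(to_codepoint("A"))           # U+0041
print(from_codepoint("U+D55C"))    # 한
```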
| Char | Codepoint | Decimal | Description |
|---|---|---|---|
| A | U+0041 | 65 | Latin Capital Letter A |
| 한 | U+D55C | 54620 | Hangul Syllable HAN |
| α | U+03B1 | 945 | Greek Small Letter Alpha |
| 🌏 | U+1F30F | 127759 | Earth Globe Asia-Australia |
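The rows above can be regenerated with Python's standard `unicodedata` module, as sketched below. The `rows` variable is illustrative. Note that `unicodedata.name` returns the official names in upper case, and the names available depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# One tuple per character: (char, codepoint label, decimal value, official name).
rows = [
    (ch, f"U+{ord(ch):04X}", ord(ch), unicodedata.name(ch))
    for ch in "A한α🌏"
]

for ch, label, decimal, name in rows:
    print(f"{ch}  {label}  {decimal}  {name}")
```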
Unicode now contains over 150,000 characters, covering most of the world's writing systems, from ancient scripts to emoji.