Latin script in Unicode

From Wikipedia, the free encyclopedia

More than a thousand characters of the Latin script are encoded in the Unicode Standard, grouped into several basic and extended Latin blocks. The extended ranges consist mainly of precomposed letters with diacritics, which can equivalently be encoded as a base letter followed by combining diacritics, together with some ligatures and distinct letters used, for example, in the orthographies of various African languages (including click symbols in Latin Extended-B) and in the Vietnamese alphabet (Latin Extended Additional). Latin Extended-C contains additions for Uighur and the Claudian letters. Latin Extended-D comprises characters that are mostly of interest to medievalists. Latin Extended-E mostly comprises characters used in German dialectology (Teuthonista).[1] Latin Extended-F and -G contain characters for phonetic transcription.
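The equivalence between a precomposed letter and its combining sequence can be checked with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `canonically_equivalent` is an illustrative assumption, and the sample letter is U+1EBF from Latin Extended Additional.

```python
import unicodedata

# U+1EBF LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE (Latin Extended Additional)
precomposed = "\u1EBF"
# The same letter as a base "e" followed by two combining marks:
# U+0302 COMBINING CIRCUMFLEX ACCENT and U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0302\u0301"


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if both strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # The two strings differ code point by code point ...
    print(precomposed == decomposed)
    # ... but compare equal once both are normalized.
    print(canonically_equivalent(precomposed, decomposed))
    # NFC recomposes the sequence into the single precomposed letter.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Comparing normalized forms rather than raw strings is the usual way to treat the two encodings as the same text.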

Blocks

As of version 15.1 of the Unicode Standard, 1,481 characters in the following 19 blocks are classified as belonging to the Latin script.[2]

In addition, a number of Latin-like characters are encoded in the Currency Symbols, Control Pictures, CJK Compatibility, Enclosed Alphanumerics, Enclosed CJK Letters and Months, Mathematical Alphanumeric Symbols, and Enclosed Alphanumeric Supplement blocks. Although these characters are graphically Latin letters, their script property is Common, so they do not belong to the Latin script in Unicode terms. The Lisu script also consists almost entirely of Latin letterforms, but it has its own script property value.
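Script membership is recorded in the Unicode Character Database file Scripts.txt as lines of the form "range ; script". Python's standard library does not expose the script property, so the sketch below parses that layout directly. The `SAMPLE` text is a hand-written excerpt in the file's format with simplified comments, not a verbatim copy, and the function names are illustrative assumptions.

```python
import re

# Hand-written sample in the layout of Scripts.txt (assumption: simplified
# comments, only four ranges). A real run would read the full file instead.
SAMPLE = """\
0041..005A    ; Latin # [26] capital letters A..Z
00AA          ; Latin # feminine ordinal indicator
24B6..24E9    ; Common # [52] circled Latin letters
A4D0..A4F7    ; Lisu # [40] Lisu letters
"""

# One code point or a "start..end" range, then ";" and the script name.
LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return a list of (start, end, script) tuples from Scripts.txt-style text."""
    ranges = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2) or m.group(1), 16)
            ranges.append((start, end, m.group(3)))
    return ranges


def script_of(cp, ranges, default="Unknown"):
    """Look up the script of a code point; unlisted code points get the default."""
    for start, end, script in ranges:
        if start <= cp <= end:
            return script
    return default


def count_script(ranges, script):
    """Count the code points assigned to the given script."""
    return sum(end - start + 1 for start, end, s in ranges if s == script)


if __name__ == "__main__":
    ranges = parse_scripts(SAMPLE)
    print(script_of(ord("A"), ranges))   # plain capital A
    print(script_of(0x24B6, ranges))     # circled capital A
    print(count_script(ranges, "Latin"))
```

With this sample, the circled letter resolves to Common even though it looks like a Latin letter, which is the distinction the paragraph above describes.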

Table of characters

In this table, characters with the Unicode script property of Latin are highlighted in colour, with the colour indicating the version of Unicode in which they were introduced. Reserved code points (which may be assigned as characters at a future date) have a grey background. All characters that do not belong to the Latin script have a white background (and the version of Unicode in which they were introduced is therefore not indicated).

Legend: Unicode version
Unicode 1.0, 1.1, 2.0, 3.0, 3.2, 4.0, 4.1, 5.0, 5.1, 5.2, 6.0, 6.1, 7.0, 8.0, 9.0, 11.0, 12.0, 13.0, 14.0, 15.0
Reserved
Not Latin script
U+ 0 1 2 3 4 5 6 7 8 9 A B C D E F Block #
0040 @ A B C D E F G H I J K L M N O C0 Controls and Basic Latin
0000–007F
(identical to ASCII)
52
0050 P Q R S T U V W X Y Z [ \ ] ^ _
0060 ` a b c d e f g h i j k l m n o
0070 p q r s t u v w x y z { | } ~ DEL
00A0   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ C1 Controls and Latin-1 Supplement
0080–00FF
(identical to ISO/IEC 8859-1)
64
00B0 ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
00C0 À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
00D0 Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
00E0 à á â ã ä å æ ç è é ê ë ì í î ï
00F0 ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
0100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Latin Extended-A
0100–017F
128
0110 Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
0120 Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į
0130 İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
0140 ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ
0150 Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
0160 Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů
0170 Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
0180 ƀ Ɓ Ƃ ƃ Ƅ ƅ Ɔ Ƈ ƈ Ɖ Ɗ Ƌ ƌ ƍ Ǝ Ə Latin Extended-B
0180–024F
208
0190 Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ ƙ ƚ ƛ Ɯ Ɲ ƞ Ɵ
01A0 Ơ ơ Ƣ ƣ Ƥ ƥ Ʀ Ƨ ƨ Ʃ ƪ ƫ Ƭ ƭ Ʈ Ư
01B0 ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ ƹ ƺ ƻ Ƽ ƽ ƾ ƿ
01C0 ǀ ǁ ǂ ǃ Ǆ ǅ ǆ Ǉ ǈ ǉ Ǌ ǋ ǌ Ǎ ǎ Ǐ
01D0 ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ
01E0 Ǡ ǡ Ǣ ǣ Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ
01F0 ǰ Ǳ ǲ ǳ Ǵ ǵ Ƕ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ
0200 Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ
0210 Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ
0220 Ƞ ȡ Ȣ ȣ Ȥ ȥ Ȧ ȧ Ȩ ȩ Ȫ ȫ Ȭ ȭ Ȯ ȯ
0230 Ȱ ȱ Ȳ ȳ ȴ ȵ ȶ ȷ ȸ ȹ Ⱥ Ȼ ȼ Ƚ Ⱦ ȿ
0240 ɀ Ɂ ɂ Ƀ Ʉ Ʌ Ɇ ɇ Ɉ ɉ Ɋ ɋ Ɍ ɍ Ɏ ɏ
0250 ɐ ɑ ɒ ɓ ɔ ɕ ɖ ɗ ɘ ə ɚ ɛ ɜ ɝ ɞ ɟ IPA Extensions
0250–02AF
96
0260 ɠ ɡ ɢ ɣ ɤ ɥ ɦ ɧ ɨ ɩ ɪ ɫ ɬ ɭ ɮ ɯ
0270 ɰ ɱ ɲ ɳ ɴ ɵ ɶ ɷ ɸ ɹ ɺ ɻ ɼ ɽ ɾ ɿ
0280 ʀ ʁ ʂ ʃ ʄ ʅ ʆ ʇ ʈ ʉ ʊ ʋ ʌ ʍ ʎ ʏ
0290 ʐ ʑ ʒ ʓ ʔ ʕ ʖ ʗ ʘ ʙ ʚ ʛ ʜ ʝ ʞ ʟ
02A0 ʠ ʡ ʢ ʣ ʤ ʥ ʦ ʧ ʨ ʩ ʪ ʫ ʬ ʭ ʮ ʯ
02B0 ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ʹ ʺ ʻ ʼ ʽ ʾ ʿ Spacing Modifier Letters
02B0–02FF
14
02E0 ˠ ˡ ˢ ˣ ˤ ˥ ˦ ˧ ˨ ˩ ˪ ˫ ˬ ˭ ˮ ˯
1D00 Phonetic Extensions
1D00–1D7F
111
1D10
1D20
1D30 ᴿ
1D40
1D50
1D60
1D70 ᵿ
1D80 Phonetic Extensions Supplement
1D80–1DBF
63
1D90
1DA0
1DB0 ᶿ
1E00 Latin Extended Additional
1E00–1EFF
256
1E10
1E20
1E30 ḿ
1E40
1E50
1E60
1E70 ṿ
1E80
1E90
1EA0
1EB0 ế
1EC0
1ED0
1EE0
1EF0 ỿ
2070     Superscripts and Subscripts
2070–209F
15
2090    
2120 Ω Letterlike Symbols
2100–214F
4
2130
2140
2160 Number Forms
2150–218F
41
2170
2180        
2C60 Latin Extended-C
2C60–2C7F
32
2C70 Ɀ
A720 Latin Extended-D
A720–A7FF
188
A730
A740
A750
A760
A770
A780
A790
A7A0
A7B0
A7C0          
A7D0                
A7E0                                
A7F0    
AB30 ꬿ Latin Extended-E
AB30–AB6F
56
AB40
AB50
AB60        
FB00 Alphabetic Presentation Forms
FB00–FB4F
7
FF20 Halfwidth and Fullwidth Forms
(fullwidth Latin letters)
FF00–FFEF
52
FF30 ＿
FF40
FF50
10780 𐞀 𐞁 𐞂 𐞃 𐞄 𐞅   𐞇 𐞈 𐞉 𐞊 𐞋 𐞌 𐞍 𐞎 𐞏 Latin Extended-F
10780–107BF
57
10790 𐞐 𐞑 𐞒 𐞓 𐞔 𐞕 𐞖 𐞗 𐞘 𐞙 𐞚 𐞛 𐞜 𐞝 𐞞 𐞟
107A0 𐞠 𐞡 𐞢 𐞣 𐞤 𐞥 𐞦 𐞧 𐞨 𐞩 𐞪 𐞫 𐞬 𐞭 𐞮 𐞯
107B0 𐞰   𐞲 𐞳 𐞴 𐞵 𐞶 𐞷 𐞸 𐞹 𐞺          
1DF00 𝼀 𝼁 𝼂 𝼃 𝼄 𝼅 𝼆 𝼇 𝼈 𝼉 𝼊 𝼋 𝼌 𝼍 𝼎 𝼏 Latin Extended-G
1DF00–1DFFF
37
1DF10 𝼐 𝼑 𝼒 𝼓 𝼔 𝼕 𝼖 𝼗 𝼘 𝼙 𝼚 𝼛 𝼜 𝼝 𝼞  
1DF20           𝼥 𝼦 𝼧 𝼨 𝼩 𝼪          
Total characters 1,481
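
The total can be cross-checked against the per-block figures in the table. The sketch below simply re-enters the block names and counts listed above (the dictionary name `LATIN_COUNTS` and the function name are illustrative assumptions) and sums them.

```python
# Number of Latin-script characters per block, copied from the table above.
LATIN_COUNTS = {
    "C0 Controls and Basic Latin": 52,
    "C1 Controls and Latin-1 Supplement": 64,
    "Latin Extended-A": 128,
    "Latin Extended-B": 208,
    "IPA Extensions": 96,
    "Spacing Modifier Letters": 14,
    "Phonetic Extensions": 111,
    "Phonetic Extensions Supplement": 63,
    "Latin Extended Additional": 256,
    "Superscripts and Subscripts": 15,
    "Letterlike Symbols": 4,
    "Number Forms": 41,
    "Latin Extended-C": 32,
    "Latin Extended-D": 188,
    "Latin Extended-E": 56,
    "Alphabetic Presentation Forms": 7,
    "Halfwidth and Fullwidth Forms": 52,
    "Latin Extended-F": 57,
    "Latin Extended-G": 37,
}


def total_latin(counts):
    """Sum the per-block counts of Latin-script characters."""
    return sum(counts.values())


if __name__ == "__main__":
    print(len(LATIN_COUNTS), "blocks")
    print(total_latin(LATIN_COUNTS), "characters")
```

The result agrees with the 19 blocks and 1,481 characters stated in the Blocks section.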

References

  1. Everson, Michael; Dicklberger, Alois; Pentzlin, Karl; Wandl-Vogt, Eveline (2011-06-02). "Revised proposal to encode 'Teuthonista' phonetic characters in the UCS" (PDF).
  2. "Scripts-15.1.0.txt". Unicode Consortium. 2023-07-28. Retrieved 2023-09-12.