Talk:List of languages by number of native speakers/Archive 2

From Wikipedia, the free encyclopedia

Czech and Slovak languages

Since there's a request to point out inaccuracies, I'd like to point out that Czech and Slovak are separate languages, and should not be listed together. While similar, neither one of them qualifies as a dialect of the other, and they have evolved independently (see Slovak language#Relationships to other languages for a discussion on the topic). There are certainly more arguments for listing them separately than there are for listing Bulgarian and Macedonian separately, and unlike that example, both Czechs and Slovaks will agree that their languages are separate. Since I'm not familiar with the sources used for the article, I'd like to request that someone with knowledge of where to find reliable data separate the two. Thanks. --Aramգուտանգ 8 July 2005 04:47 (UTC)

Are we going to declare that every dialect with an army is a separate language? When people learn a language, they're interested in where it's spoken, and who they can speak to with it. I don't speak much, but I picked up a little Slovak while in that country. When I went to Czechia, all of a sudden I was speaking Czech. The words coming out of my mouth hadn't changed, only the country I was saying them in had. Sure, there are differences in standardization, but they are clearly two standardized dialects of the same language. Your arguments for separating them (neither a dialect of the other, and evolving independently) could equally be made for separating Bostonian and Brooklynese. As for people's conception, that's a social distinction, not a linguistic one (at least not in the narrow sense of the word). Separate standards would argue for Canadian and Usonian being listed separately. As for the statement "both Czechs and Slovaks will agree that their languages are separate", it's false. True, some people do hold that opinion. I'm sure you could dig up people who'd say the same about US and UK English. But I've also met Slovaks and Czechs who state flatly that they're dialects (usually, of course, that the other is "just a dialect" of their language!) This certainly deserves mention, as any difference in national or regional standardization does, but as a note to a single Czechoslovak language entry. kwami 2005 July 8 06:30 (UTC)
I have lived in the Czech Republic for 8 years, and I am a fluent speaker of Czech, yet I would never claim to be able to speak Slovak. The ability to understand a language by a speaker of another language does not make one a dialect of the other. True, the languages are mutually intelligible to the extent that on Czech TV news reports from Slovakia are in Slovak, and any speaker of Czech, including myself, can understand them, but this does not imply they're the same. Especially in the case of Slavic languages which are very similar in both vocabulary and grammar, differentiating languages is often based on subtle differences. Such differences, however, are significantly strengthened when the speakers of one language claim a separate cultural identity from speakers of the other, as is the case with Czech and Slovak. As an example, consider Eastern Armenian and Western Armenian. I am a native speaker of Eastern Armenian, yet I have incredible difficulty understanding a person speaking in Western Armenian, more so than I do in understanding Slovak. Yet, E. & W. Armenian are considered to be dialects, not separate languages, perhaps in part due to the fact that the groups of speakers both claim the same cultural identity. Of course, there's a flip side to this argument, as can be illustrated by the Moldavian language, but in the case of Czech and Slovak, it's quite clear that they are generally recognised as separate languages. Comparing the differences between them to differences between UK and US English is especially out of place. --Aramգուտանգ 8 July 2005 09:57 (UTC)
You've just demonstrated, very nicely, that Czech and Slovak are dialects of the same language. The description you've given is almost the definition of what dialects are: Not the same, but readily intelligible. If common understanding differs from objective reality, then that warrants a footnote, but not division of the language. Cultural identity does not define a language, mutual intelligibility does. As for Armenian, many people say that Armenian is two separate languages, for exactly the reasons you give. Ethnologue does not reflect that, but again, Ethnologue is rather sloppy. kwami
There's no definition of a dialect. Linguists don't distinguish dialects from language. The only useful criterion for counting purposes is self-identification of the speakers. If they say they speak language X, then they do. Without having to interview them, you can infer from their behaviour: If they use standard language X for formal situations and writing, you can deduce that they view their spoken language as a dialect of X. Problems arise when the situation changes for political reasons, like it did in former Yugoslavia and former Czechoslovakia. Which just illustrates that there's no objective criterion. 213.73.117.218 05:09, 6 October 2005 (UTC)

This is an argument that's come up repeatedly, and will continue to come up. We have two lists of languages by native speakers. Should we make one a list of languages by cultural identification, separating Czechoslovak and unifying Chinese, and the other a list of languages by the criterion of mutual intelligibility? That should make everyone happy. kwami 2005 July 9 17:51 (UTC)

Kwami, intelligibility just can't be the definition of a language. As our Romance languages article notes, standard Italian and Castilian Spanish are mutually intelligible. But Sicilian and Piemontese are not... yet Italian and Spanish are universally considered "separate languages," while Sicilian and Piemontese are almost as universally considered "dialects of Italian" (although this is perhaps inaccurate for the north-Italian dialects, which are supposedly more closely related to the Langue d'Oc than to standard Italian). I think that the distinction has more to do with standardization and institutionalization. "A language is a dialect with an army, a navy, and a police force," as my advisor likes to quote some linguist he talked to as saying. One might add that a language has television broadcasts, a literary tradition, education taught in schools, and so forth. john k 9 July 2005 19:08 (UTC)
You'll notice that several Italian "dialects" are listed as separate languages. As for Italian and Spanish being mutually intelligible, they are, to a degree. But which degree of intelligibility warrants classifying two lects as dialects of the same language? Turkish and Azeri are listed separately, and have a large degree of intelligibility. All I ask is that we use a reasonably consistent criterion.
True, intelligibility isn't the only definition of what constitutes a language, but it is one of them. But by the criterion of self-identification, Zhuang is a Chinese dialect, even though it's a member of the Kadai language family. If we separate Czech and Slovak because some native speakers (hardly all) consider them to be separate languages, should we also include Zhuang in Chinese, because almost all of its speakers consider it to be a dialect of Chinese? Should we then say that Chinese belongs to both the Sino-Tibetan and Kadai language families?
By your definition, most African and American languages are no longer languages, because they aren't taught in schools and aren't used for journalism or TV broadcasts. This ties into the 19th century idea that civilized people speak "languages", and savages speak "dialects". kwami 20:02, 2005 July 9 (UTC)

This is a whole new low with regard to sociolinguistics on Wikipedia - that Czech and Slovak are not merely grouped together, but wholly equated. While trivially useful for some readers to have some sort of an overview of which foreign languages can be grouped together, and (I presume) a fun exercise for linguists, it's also out of touch with reality because it blatantly ignores the behaviour and thoughts of the people speaking those dialects. You simply cannot claim the high ground "genetically they're the same, so there!" and expect for people to just accept it. --Joy [shallot] 9 July 2005 19:15 (UTC)

Then disambiguate them as you did for Serbocroatian. You can always improve an article by editing it yourself! (Actually, they're not wholly equated. They're linked to two separate articles.)
Since we're ranking languages by number of speakers, dividing them up into their constituent dialects makes them look less important than they are. kwami

We have two contrary tendencies here: distinguishing languages genealogically, and distinguishing them culturally. This is going to continue to create conflicts, until we decide on one or the other, or create two lists. When I first saw this article, Chinese was listed as a single language, but there were half a dozen Italian "dialects" listed as separate languages. That's just silly: Italian has about the diversity of Cantonese. We should go one way or the other. I've tried to make the list somewhat more consistent, but of course haven't been able to do everything. If you don't like the direction I've gone, fine: do something better. But let's at least make it internally consistent. kwami 20:02, 2005 July 9 (UTC)

A couple of points: 1) the Italian dialects are, according to Ethnologue, not even all that closely related to each other. Calling Sardinian an Italian dialect would appear to be technically inaccurate. And, according to Ethnologue, Piemontese, Lombardese, and so forth are closer to French and the Langue d'Oc than they are to Italian. As far as general standards, I'd suggest that the presumption should be that languages listed as separate languages by Ethnologue should be treated as separate languages. However, if languages listed by ethnologue as separate languages are often considered to be the same language, especially for political reasons, we should unify them. Thus, Eastern and Western Farsi, or Gheg and Tosk Albanian, should get unified despite being separate languages on Ethnologue. Probably this goes for Arabic as well, if only because differentiating the dialects is so difficult. This should perhaps also be done for some of the Hindi dialects like Awadhi or Haryanvi (but probably not Punjabi or the Bihar dialects). I'm not sure what should be done about Hindi and Urdu, but they should probably be separated out again, as well. Czech and Slovak should definitely be separated, because the languages are listed as separate on Ethnologue, and are not normally considered to be actually the same language. john k 20:16, 9 July 2005 (UTC)

Sardinian is pretty universally considered a separate language, so it's not really relevant to the discussion of Italian dialects. But you're trying to have it both ways. Why should we separate Italian dialects to be "technically accurate", if we unify other lects that are generally considered to be the same language, which Italian is? I believe Panjabi is now unified, despite the fact that Ethnologue classifies W. Panjabi as closer to Sindhi, and E. Panjabi as closer to Hindi, than they are to each other. This was because of vociferous opposition to dividing it up. If we're going to include lects within Hindi that are more divergent than Urdu is, but separate out Urdu, then we're going on sociological criteria alone. If that's the case, we shouldn't follow Ethnologue. Shouldn't we also list "Bosnian" as a separate language and unify all of Chinese?
I think we need to cover our bases. Either (1) have two lists; (2) follow self-identification, but note that (a) Bosnian is essentially the same language as Serbian and Croatian, and Czech and Slovak are essentially the same language, and (b) there are significant differences among Chinese, Arabic, Italian, German, Igbo, and Armenian "dialects"; or (3) follow mutual intelligibility, but note that Serbian/Bosnian/Croatian, Czech/Slovak, Hindi/Urdu, and Mandarin/Cantonese are separate standards and often considered separate languages. I just don't think we should follow one criterion for some languages and a different one for others. kwami
I see you've gone ahead and done it. When I get the chance, I'll unify Chinese, Hindi (all Indic dialect continuum lects not given official status in Indian constitution --> Hindi), and Italian, and separate Serbocroatian and Malay. (Should Malay be three languages, Malaysian, Indonesian, and Malay, the latter for Brunei and Singapore?)
Also unified Fulani, Quechua, etc. Sorry the edit was anonymous, Wikipedia signed me out in the middle of it! kwami 01:00, 2005 July 10 (UTC)
A couple of points: 1) For Punjabi, other sources (e.g. Britannica) give a completely different (and really more sensible) division of the Indo-Aryan languages, which only has one group for Punjabi. 2) In terms of the other dialects, I'm not sure there's reason to remove them. I think it would make sense for Hindi, Italian, German, and so forth to give the number of total speakers of the broader language, but to also list the major subsidiary languages separately as well. We should just note when we are doing this. We could do the same thing for Chinese, I think, although Mandarin is such a distinct dialect/language that we should perhaps note both the total number of Chinese speakers and the total number of Mandarin speakers. I certainly don't think we should remove all the entries for the southern Chinese languages/dialects. In terms of separating languages, I'm not sure either Serbo-Croatian or Malay should be separated out. For Serbo-Croatian, maybe, since there are different writing systems involved (which alphabet do Bosnian Muslims use, by the way?) And what's the deal with "Malaysian" as a language? I've never heard this claim. I've always heard the language of Malaysia called "Malay." At any rate, there's no reason to be dogmatic about this. I think the best thing to do is be flexible. As I said before, use Ethnologue as a baseline for when separate languages exist (e.g. don't separate out Moldovan), but unify in instances where the Ethnologue divisions are contrary to generally accepted usage, like East and West Farsi. I think we should be especially careful about removing information - does any good really come from removing the different Chinese dialects? john k 02:35, 10 July 2005 (UTC)
Why not try for some semblance of consistency? If Czech and Slovak are separate languages, fine: your criterion is speaker identification. Then all of Chinese is a single language. If you want to split up Chinese, fine: your criterion is mutual intelligibility. Then Czechoslovak is a single language. As for Malay, no, in Malaysia it's called Bahasa Malaysia (Malaysian language). The language is called Malay (Bahasa Melayu) in Brunei. Both Malaysians and Indonesians tend to get rather indignant when told that they speak the same language. Most people I've met insist that they're separate, despite the fact that I'm speaking "Indonesian" on one side of the border, "Malaysian" on the other side, and the words out of my mouth haven't changed (except perhaps for a couple, like 'shan't' vs. 'won't' in English). Arguing for half of one and half of the other is just being wishy-washy.
If you want this article to be taken seriously, then apply the same criterion to all languages in a rational manner. If you want to split Czechoslovak and unify Arabic, then split Malay and Serbocroatian, and unify Chinese. What does the writing system have to do with anything, except people's conception of their language, which you say should form the basis of classification? You say you don't want to remove information, but you don't hesitate to remove the fact that Czechoslovak and Hindustani, as spoken, are essentially single languages. That's also information. I'll let the revert go for a while, but eventually we need to decide what our working definition of "language" is. I tried mutual intelligibility, and was reverted because that's 'not what language is'. I've now tried speaker identification, and have been reverted because that's 'not what language is'. Maybe there's a third way someone wants to work out (having both in one list, like you suggested above, so languages are counted double; or having two separate lists [which is what we had prior to this revert]); I'm not going to bother myself. However, I do expect some sort of a halfway intelligent standard. The current mish-mash is not acceptable in an encyclopedia. kwami 04:35, 2005 July 10 (UTC)
Yes, consistency is exactly what my initial request entailed. If Czech/Slovak are together, then so should Russian/Ukrainian be, which are mutually intelligible to a similar extent. Yet grouping the latter pair would not be acceptable to most people, thus showing that intelligibility is not a good criterion. If words thou placest archaic in manner queer within sentence, stayest the union not understood? Perhaps the ability of the speaker of a language to speak the other is a better criterion. Cultural identity is yet another, however with more obvious pitfalls. You must also keep in mind what kind of information people are looking for when they go to a page called "list of languages by total speakers". It seems that the highly ambiguous definition of "what is generally accepted to be a separate language", which is a kind of amalgam of the above, is the most preferable one. No matter what the criterion is, however, consistency is crucial. --Aramգուտանգ 08:05, 10 July 2005 (UTC)
Yes, Aram, exactly. Intelligibility alone is clearly not a good criterion. By that standard, as noted before, Italian and Spanish would have to be combined. Also all the Scandinavian languages. All the East Slavic languages. And so on. Cultural identity alone, though, is just as bad. The southern Chinese dialects are clearly defined and quite distinct from Mandarin. We need to use a combination of all these things and use the basic idea of what is generally considered a language. I do think that in some instances counting double would be appropriate: we list "Chinese" for all the Chinese languages, then list the various dialects separately; we list "Hindustani" for Hindi, Urdu, and all the closely related dialects (Awadhi, Haryanvi, the Bihari dialects, and so forth), and then list them all separately as well. Have "Malay" give the total for Malay+Indonesian, but then give Indonesian a separate entry. That kind of thing... john k 16:11, 10 July 2005 (UTC)
I agree about Slavic, so either cultural identification (what I had done) or John's suggestion of doubling up would work. As for the Chinese "dialects" (which most Chinese insist are dialects and not languages, despite what Europeans feel they "should" think), there's always the second article based (partially) on mutual intelligibility, which I referred to in the intro. Is someone going to follow up with John's suggestion? Because my last edit was at least consistent, and preferable to the current situation. kwami 21:54, 2005 July 10 (UTC)

Dear kwami,

I'm glad you're interested in foreign languages, including Slavic languages. It seems we share the same passion :-) It's nice to meet you.

But I need to say something less pleasant: in my opinion you are somewhat misinformed about Czech and Slovak.

Czech and Slovak share most of their vocabulary, but there are some significant differences between them in phonetics and phonology. Slovak grammar is almost identical to Polish grammar, while Czech grammar is quite different (partly due to peculiar phonetic shifts in Czech).

As regards mutual intelligibility, I think you are overlooking an important fact: it depends not only on the similarity of languages, but also on the exposure to the other language. I'll give you an example. I'm a Pole and I live in Poland. I've been in Slovakia many times since I was a kid. For 10 years I've been living 2 kilometres away from the Slovakian frontier. I often go to Slovakia to have lunch. As a result, I can understand spoken Slovak, even though I never learned to speak it. I didn't have to learn to speak it, because the Slovaks I talked with could understand Polish. That's because Polish and Slovak are closely related, and if one of them is your native language, then it's enough to figure out what the main phonetic differences are and learn a few different words to understand the other language. On the other hand, I can't understand Czech, because I have rarely heard it. So what you need is just some contact with the other language and it will naturally become intelligible to you. That's what happened in Czechoslovakia. Czechs had a lot of contact with the Slovak language and Slovaks had a lot of contact with the Czech language. For example, both languages alternated on the same TV channel. That's why Czechs and Slovaks can easily understand each other. Slovaks who didn't have the opportunity to hear the Czech language often, for example because they lived outside of Czechoslovakia or they were born in the 90s, when there was no common state anymore, find Czech difficult to understand. Of course, they can understand many words, because the two languages are very similar. But they have problems understanding spoken Czech on TV, etc.

The dialectal differentiation is also relevant to mutual intelligibility. Western Slovak dialects are very close to Czech, so they are mutually intelligible with Czech and Moravian dialects. Central Slovak dialects are less similar to Czech and they share many common features with Polish and Slovene (but still Czech is the closest language). Eastern Slovak dialects are more similar to Polish than to literary Czech. The standard Slovak language is primarily based on the central dialects.

You wrote: I don't speak much, but I picked up a little Slovak while in that country. When I went to Czechia, all of a sudden I was speaking Czech. The words coming out of my mouth hadn't changed, only the country I was saying them in had.

Similarly, you could write: I don't speak much, but I picked up a little Spanish while in that country. When I went to Italy, all of a sudden I was speaking Italian. The words coming out of my mouth hadn't changed, only the country I was saying them in had. Spanish and Italian are similar languages. If you use Spanish in Italy, you'll probably be understood (all the more so if the matter is simple and the subject of the message can be guessed from the context). Your interlocutors may even think you are awkwardly trying to speak Italian. Of course, they won't say your Italian is awkward. They'll probably be much more polite. Actually, I had that experience 7 days ago when I was going by train to Bratislava. I had a conversation with a Slovak girl. I spoke an awful mixture of Slovene and Polish with some Slovak words thrown in. I explained that I couldn't really speak Slovak, but I could understand it. She replied, "but you speak Slovak very well"! I said, "oh, no..." but she insisted, "you really speak very well." I said "I can't believe it..." and then she said to another passenger, "he speaks very well, doesn't he?" and the other passenger confirmed. I felt a bit embarrassed and I didn't explain I hadn't even spoken Slovak :-)

I'm astonished that some people you talked with claimed that Czech and Slovak are dialects of the same language. I never met such people. 160 years ago this opinion could be justified, but now that there are two independent standard languages, for a long time officially recognized as separate, it seems anachronistic. Anyway, even if some people share this view (I don't think they are many), I'm sure no linguist claims that Czech and Slovak are the same language. And this is important, because I think Wikipedia is meant to be a source of scientific information.

To sum up, Czech and Slovak are very closely related (they are the closest relatives), but they are definitely separate languages. This is quite clear to linguists, even if the question how to distinguish between separate languages and different dialects of the same language was not satisfactorily answered by linguistics (the related discussion is interestingly presented by Einar Haugen in his article "Dialect, Language, Nation", American Anthropologist, vol. 68 (1966), pp. 922-935; reprinted in: "Sociolinguistics", ed. by J. B. Pride and J. Holmes, Penguin, 1972).

Kwami, I wish you much success in learning foreign languages.

Best regards. Boraczek 23:46, 12 August 2005 (UTC)

Confusion over name of the page

So are we ranking by Native speakers or Total speakers? If it's by total, why does it say otherwise in the first sentence of the page? If it is by Native, then what is this page: List of the most spoken native languages ? Please clarify.--Zereshk 8 July 2005 22:32 (UTC)

Yeah, it's stupid, they're both of native speakers. We need to work this out, I think. john k 9 July 2005 00:11 (UTC)

If it is by total, then we should start re-ranking the page accordingly. Right now, it's ranked by natives. I'll be working on that.--Zereshk 9 July 2005 12:02 (UTC)

I'm the one who started adding second language speakers to this page. Before that, it was entirely native speaking populations. It was the "total number of native speakers", by which the author meant all native speakers of a language, in all countries. The other page is simply a list lifted from the 13th edition of Ethnologue, and might actually be a copyright violation.
If you want to order this list by total (1st + 2nd language) speakers, there are two major problems:
  • We do not have 2nd language data for most languages,
  • Estimates of 2nd language numbers are even less reliable than those for native speakers, so if we rank by the 1st+2nd total, we will get into lots of edit wars by people claiming that language X is exaggerated, and Y really has more speakers, based on varying definitions of what a second-language speaker is.
I think it's safer to order the list by number of native speakers.
I have another suggestion, in the Czechoslovak discussion above: Since our main disagreements center around what's a language and what's a dialect, why not use one list for languages defined by linguistic criteria (intelligibility), and one defined by cultural identification? That is, in one Czechoslovak would be unified and Chinese broken up, and in the other Czechoslovak would be broken up and Chinese unified? kwami 2005 July 9 18:05 (UTC)

I'm fine one way or the other. But I don't think getting estimates of 2nd language speakers would be too difficult. The list can be according to any predefined definition. In any case, all we must do is define what we want to be listed here, and stick to it.--Zereshk 9 July 2005 18:39 (UTC)

I tend to think Chinese and Czech/Slovak should both be broken up. I'd suggest that only in cases like Arabic, where it's not only a complicated question of whether it's one or several languages, but also hard to figure out how exactly to divide it up into separate languages, that we should keep them together. I'd also suggest just redirecting the other page to here, and explaining that this page is listing native speakers. In terms of 2nd language speakers, one problem is how to define "2nd language" - is it anybody who has any knowledge of the language at all? Or is it more specific than that? Is a Dane who speaks some English because he studied it in school the same as a Yoruba who speaks English as a second language and has to use it in everyday communication? Better to avoid the whole question, I think. john k 9 July 2005 19:01 (UTC)

Someone went ahead and reordered the list according to 1st+2nd language speakers, but since we don't (yet) have the data for 2nd language speakers for most languages, I reverted. However, 2nd language data is of interest when you're considering how useful a language is.
If you want to separate Czech and Slovak because of speaker identification, why would we even want to break up Arabic? Or Chinese? Or Italian? They are single languages by cultural definition. kwami 20:09, 2005 July 9 (UTC)
  • I've talked about this earlier: please don't confuse a second language with a language being learned. These are two different subjects. Finding second-language users is not hard! Finding the number of learners of a language is difficult. -Pedro 01:07, 10 July 2005 (UTC)

Language families

I've added a column for language families. So far, I've mostly only had a chance to add in the broadest families (Indo-European, Austronesian, &c.), but hopefully we can add in the more specific branches over the next few days. I think this should be useful, especially for the less well known languages which we don't have specific articles about. As far as I can tell, 18 language families (Indo-European, Uralic, Altaic, Afro-Asiatic, Niger-Congo, Nilo-Saharan, Sino-Tibetan, Dravidian, Tai-Kadai, Hmong-Mien, Austro-Asiatic, Austronesian, Japonic, Quechuan, Aymaran, Uto-Aztecan, Mayan, and Tupian) and 1 language isolate (Korean) are represented among the languages with more than one million speakers. john k 9 July 2005 01:30 (UTC)

And Tupian. kwami 2005 July 9 18:21 (UTC)
Yup, forgot that. john k 9 July 2005 18:55 (UTC)

The language called "Persian" is known internationally and domestically within Iran as "Farsi."

Yup, and in English it's called "Persian". Lots of languages (Spanish, Chinese, French, Japanese) have English or Anglicized names. Even when the native name is sometimes used in English (such as "Ivrit" for "Hebrew", "Bahasa" for "Indonesian", or "Italiano", "kiSwahili", "Russki", etc.), this should be noted in the dedicated articles, but isn't necessary here.
Actually, "Farsi" isn't a good name internationally, because many speakers outside of Iran refer to their language as Dari or Tajiki. kwami 21:22, 2005 July 12 (UTC)

More accurate sources

The CIA World Factbook has a list of the most used native languages:

  • Chinese, Mandarin 13.69%,
  • Spanish 5.05%,
  • English 4.84%,
  • Hindi 2.82%,
  • Portuguese 2.77%,
  • Bengali 2.68%,
  • Russian 2.27%,
  • Japanese 1.99%,
  • German, Standard 1.49%,
  • Chinese, Wu 1.21% (2004 est.)

note: percents are for "first language" speakers only. See reference

Internet World Stats also has a list of the number of people able to speak each language, including second languages (though it is ordered by number of internet users):

  • English 1,109,729,839
  • Chinese 1,316,007,412
  • Japanese 128,137,485
  • Spanish 389,587,559
  • German 96,141,368
  • French 375,066,442
  • Korean 75,189,128
  • Italian 58,608,565
  • Portuguese 227,621,437
  • Dutch 24,218,157

See the stats.

The English numbers seem a bit high compared with other reports, but they claim to have accurate data. :? --Bisho 15:33, 14 July 2005 (UTC)

Language defined by cultural identification/self designation

Okay, despite all the talk, no one's fixed this article. I'm reverting to the last version to define language by speaker identification (keeping Akira's edits), since most people feel that intelligibility tests are unworkable. I'm the one who added the Chinese "dialects" in the first place, and I don't have any problem removing my own additions to this article.

By all means, please add the individual Chinese "dialects" back in if you like, but keep the main heading and make a note under it. I might do that myself. No need to go to a lot of work; the info is all in the page history from when I added it the first time. And put Malay, Czechoslovak, and Serbocroatian back in if you like, as additional info; it's all there in the page history.

If our conception of language is to be cultural or self-identification, then we shouldn't mix in intelligibility tests, unless they're added as additional information, and cross-referenced. We need some sort of consistency in an encyclopedia article, not just whatever feels right for everyone's favorite language. kwami 22:40, 2005 July 14 (UTC)

I say: group Chinese and Arabic (except Maltese and other separated Arabic languages) and add a linguistics note in these cases; group Swiss German with the rest of German; split Malay/Indonesian, Hindi/Urdu, etc. -Pedro 23:37, 14 July 2005 (UTC)

That's pretty much what we have now with the last revert. (Maltese isn't populous enough to include anyway.) kwami 00:45, 2005 July 15 (UTC)

Well, I have an idea... why not group each disputed language under its most commonly used name, e.g. Chinese for the "Chinese" languages and German for the languages spoken in Germany, Austria and most of Switzerland, and then provide a subdivision under which one can see the number of speakers per "dialect"/sub-language? That would make navigation a breeze too! Kenkoo1987 12:37, 6 August 2005 (UTC)

Probably a good idea. You wanna take it on? The info for Chinese and some Hindi is already there. Arabic is easily obtainable from Ethnologue, though I don't know what kind of intelligibility standard they use or whether we'd want to use those numbers. And then there's Quechua... We might want to unify Malay & Dai, since they have names as such. kwami 06:40, 2005 August 7 (UTC)

Turkish language

The Turkish language has many more speakers in total than are given on this page. Azeri, Kyrgyz, Kazakh, Uzbek, Turkmen and other Turkic languages have only dialectal differences from Turkish, and they are called Turkish, as in Azeri Turkish or Kyrgyz Turkish, instead of just Azeri or Kyrgyz. So the total number of Turkish speakers is 165,61 million according to this page, and in fact it is almost 250 million, counting the Turks living as minorities all around the world and in autonomous Turkic regions, especially in the Russian Federation and China.

We are considering listing "languages" both by the criterion of mutual intelligibility and, in a double listing, by social/national convention. Turkish is certainly one of the languages we need to consider.
If you can provide evidence that in general Turks consider their 'dialects' to be a single language, as the Arabs and Chinese do, then we should make the change you suggest even without the double listing. However, this needs to be the general conception of Turkic language speakers, and not just a political platform of pan-Turkish nationalists.
Meanwhile, you've changed the numbers of just Osmanli. Do you have documentation? Everybody and their brother wants to maximize the numbers for their favorite language, so we're automatically reverting such edits unless they're supported. (Read above for the interminable discussion for Persian.) It appears at first glance that you're counting Turkish Kurds as native Turkish speakers.
kwami 23:57, 2005 July 16 (UTC)
Someone else revised Turkish upwards, in effect denying that a quarter of Turkey's population is Kurdish. I reverted. 66.27.205.12 06:33, 10 August 2005 (UTC)
Fine, but 1/4 of the population (72M) gives you 18 million. If 18 million people in Turkey speak Kurdish as their first language, then the 16 million figure for Kurdish doesn't make sense. Besides, there are Kurds in places other than Turkey, but not all people of Kurdish descent speak Kurdish as their first language.

The Ethnologue figure of 46.28M native speakers in Turkey is just the 1987 population times 85%, based on a guess that 15% of the population is Kurdish. Actually, that figure is now generally considered an underestimate, and the Kurdish population could be as high as 25-30%. However, the Kurdish article and most of the estimates I've seen place it at approximately 20%. Given that Turkey's population has since increased dramatically to 69.66M, that would be 56 million native speakers today. The population of Bulgaria has decreased dramatically to 7.45M; at 9.4% ethnic Turk (mostly Turkish speaking), that's a further 700k. Greece: emigration offsets population growth, so perhaps still ~130k. Cyprus: N. Cyprus population is now 210k. Macedonia: 1982 figure 200k; don't know about now. Uzbekistan: population increased dramatically from 1979. At current growth rates, calculating back from 1993 (from the demographics chart for Uzbekistan), the 1979 population was ~17.2M. Assuming the same percentage today, that gives ~300k Osmanli speakers. Germany: 2.1M. Netherlands: 200k. France: 140k. All other countries are, I believe, < 100k, though Moldova has 140k Gagauz speakers. Total: 60.0 million, plus Gagauz. In any case, our major uncertainty is the percentage in Turkey: if the Kurds are just a bit more numerous in Turkey, that would offset Turkish-speaking immigrants in the rest of the world. We could maybe guess it's 61 million? I'm putting in 60M native, and assuming 2nd-language speakers are basically the 14M Kurdish population, ~75M total. kwami 05:39, 29 September 2005 (UTC)

As for the pan-Turkish claims, Uzbek has a distinct identity, and there is a pan-Uzbek movement (mostly orchestrated by the govt.) But I would like to add a note mentioning the number of native speakers of the Oghuz languages (60M Osmanli, 21-22M Azeri, 6.5M Turkmen, 1.5M Qashqai, plus 0.5M other = 90M.) kwami 20:53, 6 October 2005 (UTC)


Hi, I read what you have written, and I have a few things to say:

(0) The Kurdish issue is not as easy to figure out as kwami put it. Since the Kurds have never had a country, all these numbers are ambiguous. Even if the numbers are correct, they do not give the number of native Kurdish speakers. Many Kurdish people in Turkey do not know a word of Kurdish, so it is not easy to claim that Kurdish is those people's native language. Since there is no institution in Turkey that gives Kurdish lessons, it is not very plausible that Kurdish people, at least the ones in Turkey, speak Kurdish properly. That is to say, since the article starts "This is a list of languages ordered by number of first-language speakers", and Turkish is the native/first language of many Kurds in Turkey (sad but true!), population analysis is pointless... What we need is qualitative analysis instead of quantitative analysis.

But this is just like any other language. Ethnicity doesn't equal native language status.
You're right. Is -10M OK with you?
Traditionally the estimates had been 15%, but it's now generally thought that the number is higher, perhaps 20-25% or even 30% in some estimates. However, I have no figure for the percentage of Turkish Kurds who are native speakers. The wiki article estimates 20% of the population of Turkey, which would be -14M. I know that many Kurds in the west of the country do not speak Kurdish, and identify with Turks, but in the (south)east of the country, where most Kurds live, Kurdish is practically universal. (Although of course they're fluent in Turkish as well.) Since most Turks are exposed to Kurds in the west, where they're more assimilated, it's possible they might get a skewed impression of the number of Kurds who don't speak Kurdish. (I was amazed at how many Turks in Istanbul were afraid to go past Diyarbakir, and warned me I would be unsafe, and they were mostly exposed to assimilated Kurds.)

(1) Osmanli is not Turkish. Osmanli is a dead language, a strange combination of Turkish, Arabic and Persian. It cannot be considered Turkish. I can supply you with documents. As a native speaker of Turkish, it is almost impossible for me/us to read a text in Ottoman. Remove the word Osmanli!

Perhaps you're thinking in Turkish? Turkish osmanlı translates as "Ottoman" in English, and yes, of course that's distinct from modern Turkish. But the word "Turkish" is often applied to Oghuz, so we need a term to disambiguate the Turkish of Turkey. I've seen "Osmanli" used with this meaning several times. Less commonly, I've also seen "Anatolian Turkish".
That is not correct. As I've written above, "...impossible for me/us to read a text in Ottoman." Even if you've seen "Osmanli" used for the Turkish of Turkey, it is completely wrong. We must remove it. By the way, my name is oghuz :-))
Okay, but since many references state that Turkish=Osmanli, we should make a note of that, with a link to Ottoman Turkish, and not just delete it. How about "Turkish, or Anatolian Turkish, is ..." instead?

(2) I am reluctant to consider the Kyrgyz, Kazakh, and Uzbek languages as Turkish. I can understand the Azeri and Turkmen languages. The others are certainly Turkic languages, but not Turkish! For instance, there are great similarities between Spanish and Portuguese: native speakers of Portuguese and Spanish are close enough to understand each other when they speak, but Portuguese is not Spanish and vice versa. But I do not know if we can take that as a criterion, because Danes also understand what Swedes say :-)))

The question is really whether the ethno-linguistic identity of the speakers is the same, not how intelligible they are. That is, do the Azeri and Turkmen consider their languages to be dialects of Turkish?
I do not know.
I think that until we have evidence that they identify as Turkish, the Oghuz population should be kept separate, just as the Scandinavian languages are.

(2.1) Uzbek: this is an example that I obtained from the Uzbek_language page: "Barcha odamlar erkin, qadr-qimmat va huquqlarda teng bo'lib tug'iladilar. Ular aql va vijdon sohibidirlar va bir-birlari ila birodarlarcha muomala qilishlari zarur." If there is any Turk out there who claims that he/she can understand this text, then OK, let's add Uzbek to the list as Turkish, but it is almost impossible to understand. There are only 2 or maybe 3 words that I catch, not more! Native speakers of Turkish may do as I did: check http://uz.wikipedia.org/wiki/Main_Page and try to read the articles. Can you? I admit that there are many similarities, but to be honest I couldn't read those articles.

(2.2) Turkmen_language is very close to Turkish, and in my view it is reasonable to add it to our list. I checked Wikipedia's Turkmen edition, and yes, I can read it.

(2.3) When it comes to the Kyrgyz and Kazakh languages, I really do not know much about them... Since they do not use Latin letters, it is impossible, for us at least, to read what they write.

(2.4) The Yakut people, 363,000 speakers (according to Wikipedia), have been forgotten. I partially understand those people when they speak Yakut (the Sakha language). I shall not claim that I understand it as well as I understand Azeri, but it is at least worth considering them as well. I posted a question on the Sakha language page. Let's see if we can communicate in Turkish.

I'd purposefully left out Saha because I thought it would be even more distant than Uzbek. Likewise Chuvash, which I understand is completely unintelligible.
I do not think that Yakut people would call their language Turkish.

(3) Turkish people all around the world must be taken into account as well, especially in Europe/Germany.

They were included above in the 60M figure.
Let's count them separately; then, when the numbers change in a region, we can update our pages easily.
Okay, see below.

(4) Is it really an official language in Bulgaria? I don't think so!

I believe it is recognized officially, at least by local governments.
I could not find a source for it. Do you have any sources? The Bulgaria page on Wikipedia has no information on that.
Ethnologue has: "National or official languages: Bulgarian, Turkish." I don't know what this means in practical terms. Turks are 10% of the population.

(5) It is not Cyprus! It must at least be written as Northern Cyprus.

Turkish is certainly the official language of North Cyprus. But isn't it also co-official in South Cyprus?
On this point you're right. Turkish is official in the southern part. How shall we write it: (a) as Cyprus, or (b) as Northern Cyprus + Southern Cyprus?
Republic of Cyprus is the recognized name of the south. It might be best to list both, since otherwise people would wonder about its status in the south.

(6) Turkish is the first language in Turkey and Northern Cyprus.

(7) Briefly (numbers are taken from Wikipedia):

Turkey --> 70M (-10M is accepted, owing to Kurds)

Azerbaijan --> 22M (8M in Azerbaijan and 16M in Iran)

Turkmen --> 5,4M (It is 6,4, -1M due to Turkmens in Turkey)

Do you mean that they're being counted twice?

Germany --> 2M (We need to check that, I am not sure)

Bulgaria --> 0.7M (hard to say if Turkish is their first language)

Northern Cyprus --> 0.2M

All over --> 100,3M (Without Kyrgyz and Kazakh)

Turkish as first language (let's drop Bulgaria and Kurds) = 89,6M janus_tr 05:00, 19 October 2005 (GMT+1)

Actually, I believe the numbers for Azeri are 10M higher than this. kwami 05:55, 19 October 2005 (UTC)
Could you supply documents for it? janus_tr 15:00, 19 October 2005 (GMT+1)
Ethnologue 15:
  • North Azeri: 6,069,453 in Azerbaijan (1989 census). 7,059,529 total. 8,000,000 second-language speakers.
  • South Azeri: 23,500,000 in Iran (1997). 24,364,000 total.
  • Qashqa'i: 1,500,000 (1997).
= 33 million all Azeri dialects.
  • Turkmen: 3,430,000 in Turkmenistan (1995). 6,403,533 total. Second-language speakers = ? Of which you deducted 1M for reasons I don't follow. Ethnologue gives 925 in Turkey in 1982, but of course that could have changed recently.
Anatolian: Ethnologue has a figure of 46.28M native speakers in Turkey, but this is just the 1987 population times 85%. Taking instead the hopefully more accurate wiki figure of 20% native Kurdish speaking (so 80% native Turkish), and the current population of 69.66M, we get
= 56 million Turkish in Turkey.
Other countries:
  • Bulgaria 7.45M @ 9.4% ethnic Turk (mostly Turkish speaking) = 700k
  • Greece: ~130k
  • North Cyprus: population = 210k
  • Macedonia: 200k (1982)
  • Uzbekistan: ~300k (my estimate above)
  • Germany: 2.1M
  • Netherlands: 200k
  • France: 140k
  • all other countries: < 100k each.
Total Anatolian Turkish, all countries: ~60 million (0.4M Gagauz are not significant with these uncertainties), plus 14M second-language speakers.
Total Oghuz Turkish: 100 million, plus at least 22 million second-language speakers. If the whole population of Turkmenistan speaks Turkmen, a total estimate (1st+2nd) of 125M would not be unreasonable.
kwami 20:10, 19 October 2005 (UTC)
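As a quick arithmetic check, the per-country figures above can be re-added in a few lines of Python. This is only a sketch: every number is copied from the comment (in millions), the 80% share assumes the 20% native-Kurdish figure, and the dictionary layout is purely illustrative.

```python
# Figures copied from the comment above, in millions of native speakers.
turkey_pop = 69.66        # current population of Turkey
turkish_share = 0.80      # assumes 20% of Turkey is native Kurdish-speaking
in_turkey = turkey_pop * turkish_share   # about 55.7

other_countries = {
    "Bulgaria": 7.45 * 0.094,   # 9.4% ethnic Turk, about 0.70
    "Greece": 0.13,
    "North Cyprus": 0.21,
    "Macedonia": 0.20,          # 1982 figure
    "Uzbekistan": 0.30,         # kwami's estimate
    "Germany": 2.1,
    "Netherlands": 0.20,
    "France": 0.14,
}
# Use 56M for Turkey, as rounded in the comment.
anatolian = round(in_turkey) + sum(other_countries.values())

# Ethnologue 15 totals quoted above.
azeri = 7.059529 + 24.364 + 1.5   # North Azeri + South Azeri + Qashqa'i
turkmen = 6.403533
oghuz = anatolian + azeri + turkmen

print(f"Turkish in Turkey: {in_turkey:.1f}M")
print(f"Anatolian Turkish: {anatolian:.1f}M")
print(f"Oghuz total:       {oghuz:.1f}M")
```

Run as-is, this gives about 55.7M in Turkey (rounded to 56M above), 60.0M for Anatolian Turkish and 99.3M for Oghuz as a whole, in line with the ~60 million and 100 million stated.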


Good. After each message, at least, we progress. [But please do not write about politics; these pages then get really creepy, and that is the weak side of Wikipedia.] Anyway...

Kurdish dispute: We need a reference point. I checked the other wiki articles: neither a source nor a citation... You say 20%, I say 30%, X says 60%... Today I checked Britannica, which states: "...The largest minority group is the Kurds, who probably make up at least 15 percent of the population." Kurds. (2005). Britannica Student Encyclopedia. Retrieved October 21, 2005, from Encyclopædia Britannica Online http://search.eb.com/ebi/article-9275335 Even Britannica uses the word "probably", and the reason behind it is very simple: there is no scientific work on that. I hope there will be one day, but in the meantime I think we can at least use this figure. Therefore, (70M x 15) / 100 = 10.5M. What do you say?

Osmanli, Turkish, Anatolian Turkish dilemma: Officially it is called "Turkish", and the "dialect of Istanbul" (which I speak) is regarded as its core. Therefore, if you would like to say something other than Turkish, you may write Turkish (Istanbul). Anatolian Turkish is just a dialect. For instance, near the Black Sea region they speak the Karadeniz dialect, and on the west coast a different dialect, whereas "Osmanli" is misleading and wrong. Osmanli is out of my scope, because it looks like Azeri people could/may understand it. You can easily see this if you ever try to read Osmanli :-)) In Turkey, since 1932, Türk Dil Kurumu (the Turkish Language Association) has been regulating the language. According to this governmental association, what is proper is Istanbulise (I don't know how to spell it) Turkish; in France they have the same system. Seen thus, "Anatolian Turkish" is deceptive as well. Let's write "Turkish", and since Istanbulise Turkish is officially Turkish, we may put a note to that effect and lead readers to www.tdk.gov.tr.

Let's keep the Oghuz population separate; I am OK with that.

Until we have a response from the Yakut people, let's set them aside.

No reliable knowledge about Bulgaria... What is that supposed to mean, "national or official language"?! Very strange.

If we list the whole Oghuz family, then we have to consider http://en.wikipedia.org/wiki/Tatar_language as well. I have a lot of difficulty figuring it out, but it seems closer than Uzbek. If you want, you can add all these Central Asian Turkic languages to the list.


My conclusion:

  • Language: Turkish
  • Official language: in Turkey and in Cyprus (both the north & south parts, with a link to the Cyprus separation pages)
  • Natives:
    • Turkey: 59,5M
    • Germany: 2,1M (It is still their first tongue + Native)
    • N. Cyprus: 0,23M
    • In total: 61,33 M
  • First-tongue:
    • Turkey: 59,5M (Native)
    • Azeri: 7M (as you did put it forward, I didn't check it yet)
    • Turkmen: 5,4M (Turkmens all around, 1M already counted at Turkey line)
    • Germany: 2,1M (It is still their first tongue + Native)
    • N. Cyprus: 0,23M (Native)
    • In total: 74,23M
  • Second-tongue:
    • Azeri: 26M
    • Kurds: 10,5M (Are all of them capable of Turkish?!)
    • Bulgaria: 0,8M (7.45M @ 9.4%)
    • Greece: ~130k
    • Macedonia: 200k (1982)
    • Netherlands: 200k
    • France: 140k
    • In total: 38,17M

Total number = 112,40M :janus 05:43 (GMT+1), October 21, 2005
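The three subtotals in the proposal above can likewise be re-added from the listed figures. A minimal sketch, with the numbers copied as given (millions) and the dictionaries used only for illustration:

```python
# Figures copied from the proposal above, in millions of speakers.
natives = {"Turkey": 59.5, "Germany": 2.1, "N. Cyprus": 0.23}
first_tongue = {"Turkey": 59.5, "Azeri": 7.0, "Turkmen": 5.4,
                "Germany": 2.1, "N. Cyprus": 0.23}
second_tongue = {"Azeri": 26.0, "Kurds": 10.5, "Bulgaria": 0.8,
                 "Greece": 0.13, "Macedonia": 0.20,
                 "Netherlands": 0.20, "France": 0.14}

# Re-add each column as listed.
for name, group in (("Natives", natives),
                    ("First-tongue", first_tongue),
                    ("Second-tongue", second_tongue)):
    print(f"{name}: {sum(group.values()):.2f}M")

# The Bulgaria line is described as 7.45M at 9.4%.
print(f"Bulgaria, 7.45M x 9.4%: {7.45 * 0.094:.2f}M")

# First-tongue plus second-tongue, as in the proposed grand total.
grand_total = sum(first_tongue.values()) + sum(second_tongue.values())
print(f"Grand total: {grand_total:.2f}M")
```

Re-added this way, the first-tongue column comes to 74.23M as stated, while the natives column comes to 61.83M, the second-tongue column to 37.97M and the grand total to 112.20M; the Bulgaria line (7.45M at 9.4%) works out to about 0.70M.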

I will just give a quote from the Constitution of the Republic of Bulgaria (here):

Article 3

Bulgarian shall be the official language of the Republic.

I hope this ends all confusion about the matter. --Mégara (Мегъра) - D. Mavrov 17:05, 24 April 2006 (UTC)

Kypchak not included with Oghuz by integrationists

Thought the New Kypchak language article was interesting. Of course, it's nowhere near reality, but it does show what the conception of a broader standardized Turkic language could be: in this case, Kazakh, Kyrgyz, Tatar, etc, but not Anatolian Turkish or Uzbek. I think we're pretty safe in assuming that Turkic as a whole need not be considered as a language. kwami 10:46, 3 November 2005 (UTC)

I checked the article, as you mentioned "it's nowhere near reality". There'll be so many projects like this one; time shall show us which one(s) shall prevail.

  • You're right, Turkic is not a language. There are the Latin languages and, of course, the language Latin itself; it is not the same case for Turkic: nobody speaks Turkic.
  • I shall say the same, even the concept of "Anatolian Turkish" is very deceptive and not trivial.
  • Anyway, are we ready to rewrite the article? Any comments on my previous post?
  • Along these lines, all these subsets of languages, "language" (politically speaking), and these conceptions are too Euro-centric for me. But that is a different subject; maybe I will write an article, euro-centric_views_in_wikipedia :-)))

janus 04:00, 5 November 2005 (UTC)

What do you want to rewrite? It looks like you want to change Turkish from 60M to 61M, but given the uncertainties, that seems overly precise.
As for 'Anatolian Turkish', we need some way of referring to the national language of Turkey. 'Turkish' is ambiguous, because it includes Gagauz etc, and you object to 'Osmanli'. I'm just looking for an acceptable term so we can talk about what should be included under 'Turkish'. kwami 09:15, 6 November 2005 (UTC)

Korean

Someone just revised the Korean population upward to 71M, but left no ref. However, this looks about right: S Korea 48.4, N Korea 18-20 (officially 23, not considering the famine), China 1.9 (probably not counting recent refugees), USA 1.8, Japan 0.7, Canada and Australia together 0.1. (Few Russian Koreans still speak the language.) This gives us 70.9 million using the lower estimate for North Korea. Perhaps a million or so more wouldn't be unreasonable, but I don't know how the Wikipedia article gets 78. kwami 06:44, 2005 July 19 (UTC)
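The sum above, sketched in Python with the figures from the comment (millions, using the lower 18M estimate for North Korea and, for comparison, the upper 20M one):

```python
# Korean speakers by country, in millions, as listed in the comment above.
korean = {
    "South Korea": 48.4,
    "North Korea": 18.0,   # lower estimate (18-20M; officially 23M)
    "China": 1.9,
    "USA": 1.8,
    "Japan": 0.7,
    "Canada and Australia": 0.1,
}
total = sum(korean.values())
upper = total + 2.0  # same sum with the 20M upper estimate for North Korea
print(f"{total:.1f} million (or {upper:.1f} million with the upper estimate)")
```

This prints 70.9 million, or 72.9 million with the upper North Korean estimate; neither reaches the 78 million mentioned.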

Thai Language

Why are there only 20M native Thai speakers while the population is 67M right now? Although there are several dialects in Thailand, everyone, including old people and adolescents, can use standard Thai.

The numbers should probably be updated. But the 20M refers to native speakers, not everyone who can speak the language. (Take a look at Vietnamese, Burmese, Tagalog.) Many Thai speak something closer to Lao, or are otherwise considered to speak distinct languages, and have been counted appropriately; "Siamese" speakers are less than half the Thai population. (However, 80% of ethnic Chinese are counted as Thai speakers.) There are also quite a few non-Thai languages in Thailand: 3M Malays, a million Khmer, a million Hakka Chinese, etc.
However, the distinctions are Ethnologue distinctions based on intelligibility, which is not the standard we are following for this article. Perhaps all Thai, or at least all non-Lao Thai, should be counted? Do "Northeastern Thai" speakers consider their language to be Standard Thai, or Lao? kwami 01:58, 2005 July 21 (UTC)
I don't think there's any linguistic justification for regarding Thai and Isan as the same language, whatever the criterion- Thai and Lao are always treated as separate languages, and Isan is much closer to the latter. Mark1 06:45, 1 August 2005 (UTC)
I've gone ahead and added 15M Isan to Lao. Isan had not been counted at all! Split it up if you like; Lao proper has 3.2M native + 0.8M 2nd speakers = 4M total (1991 UBS) if you do. kwami 11:07, 2005 August 2 (UTC)

Bosnian and Serbian separate?

It is absurd to separate Bosnian and Serbian. Both the written and spoken languages are, as far as I am aware, virtually identical, and about half of the population of Bosnia are Serbs, who would be surprised to learn that they do not speak Serbian. I'd suggest that, given this confusion, we should merge Serbo-Croatian back together into a single language. john k 05:42, 2 August 2005 (UTC)

It's absurd indeed, one of many absurdities of that national conflict. The paradox is that sometimes, even if you know how a person speaks, you still can't decide whether this person speaks Serbian, Croatian or Bosnian. You need to know what his/her religion or origin is. If this person is Orthodox, then his/her language is Serbian; if Catholic, then Croatian; if Muslim, then Bosnian. Looking at the numbers, it's clear that most Serbs living in Bosnia were counted as Bosnian-speaking, though. Boraczek 00:09, 13 August 2005 (UTC)
I thought we weren't going by mutual intelligibility tests? They're separate national languages. Presumably the Serbs in Bosnia would say that they speak Serbian, not Bosnian. In fact, I believe that's how they were counted. It's also absurd to separate Malaysian and Indonesian, Belorusan and Russian, Turkish and Azeri, etc etc, but these divisions are accepted. Who among us is going to bestow the status of "language" on a particular lect? kwami 08:46, 2005 August 2 (UTC)
I think the extent to which they are "separate national languages" is questionable. Until quite recently, they were considered the same language. It is to be added that the numbers given here clearly include the whole Serbo-Croatian speaking population of the country as speaking "Bosnian," including both Serbs and Croats. john k 00:21, 13 August 2005 (UTC)
That is screwy then. I don't care much one way or the other; I was just afraid that if we allowed edits for personal opinion, then everyone would want special treatment for their favorite language. Serbian and Croatian do, though, have a longer history of being considered separate languages than Bosniak, and there are other national standards out there like "Moldavian" which most speakers agree are just silly to consider as separate languages.
The Ethnologue figures are ridiculous: 35% of the population speaks Croatian, Romani, or Serbian, but 100% speaks Bosniak. I hadn't looked this closely before, and only saw that the 3 Slavic languages were listed separately, without adding up the numbers. If you subtract the other languages from the total population, you get 2.7M. The actual number should be smaller, though, if you go along ethnic lines, for only ~45% (1.8M) are Bosniak.
What do you think we should do? Reduce the Bosniak figure to 2.7M or so, or add that figure to one of the other languages? If the latter, is it Serbian or Croatian that gets the boost? Since there aren't any good linguistic reasons for the split in the first place, I don't know how we decide whether Muslims are more Catholic or more Orthodox. kwami 01:36, 2005 August 13 (UTC)
I'd say standard Croatian and standard Serbian are separate national languages in the sense that they belong to separate nations and they are regulated by different institutions, so they'll develop "independently" (in fact, Croatian linguists try to make Croatian as different from Serbian as possible). Still, the difference between standard Croatian and standard Serbian is similar to that between standard American English and standard British English or even smaller. Maybe the case of Malay and Indonesian is similar, but I'm not acquainted enough with it to be sure. Boraczek 01:18, 13 August 2005 (UTC)
The same is true of Bosniak: separate national standard, separate institutions, a concerted attempt to make the language distinct, for example by using large numbers of Turko-Persian words and pronouncing etymological aitches which have been dropped from Serbian and Croatian. kwami 01:36, 2005 August 13 (UTC)
While we discuss whether or not a separate entry is appropriate, I'll at least fix the double counting that John pointed out. How about a range of 1.8M (ethnic)-2.7M (reasonable interpretation of Ethnologue), ranked at its midpoint of 2.25M? (The figure the Wikipedia article uses is 2.5M.) kwami 04:33, 2005 August 13 (UTC)
Good job! :-) I think we can either split Serbo-Croatian into Serbian, Croatian and Bosnian/Bosniak and list them separately, possibly trying to avoid overlapping, or give one entry "Bosnian/Croatian/Serbian" (I guess this name is easier to swallow for separatists than "Serbo-Croatian"). In my humble opinion both solutions are acceptable. BTW if Serbia and Montenegro split into two separate states, we'll certainly hear much more about the Montenegrin language. Boraczek 11:00, 13 August 2005 (UTC)

Dividing up the table

I just split up the table, both for ease of navigation and for ease of editing. As for the numbers I picked, it's a logarithmic scale: languages with 10^6 (1 million) speakers, 10^6.5 (~3 million) speakers, 10^7 (10 million) speakers, 10^7.5 (~30 million) speakers, and 10^8 (100 million) speakers or more. That way there are similar numbers of entries in each table, though the first is rather shorter and the last somewhat longer than the others. Anyway, that's why the 3 and 30 are there, in case anyone thinks they're odd numbers to use. kwami 08:55, 2005 August 2 (UTC)
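The bracketing described above can be sketched as a lookup against the five lower bounds. The function name `table_for` and its labels are hypothetical, chosen only for illustration; the thresholds are the powers of ten given in the comment.

```python
import bisect
from typing import Optional

# Lower bounds of the five tables: 10^6, 10^6.5, 10^7, 10^7.5 and 10^8 speakers.
EXPONENTS = [6.0, 6.5, 7.0, 7.5, 8.0]
THRESHOLDS = [10 ** e for e in EXPONENTS]  # 1M, ~3.16M, 10M, ~31.6M, 100M
# Hypothetical labels using the rounded numbers mentioned in the comment.
LABELS = ["1 million", "3 million", "10 million", "30 million", "100 million"]


def table_for(speakers: float) -> Optional[str]:
    """Return the label of the table a language falls into, or None if it is
    below the 1 million cut-off. A count exactly on a threshold goes into
    the higher table."""
    i = bisect.bisect_right(THRESHOLDS, speakers)
    return LABELS[i - 1] if i > 0 else None


for n in (2_500_000, 70_900_000, 250_000_000):
    print(n, "->", table_for(n))
```

For example, a language with 2.5 million speakers lands in the "1 million" table and one with 70.9 million in the "30 million" table; `bisect_right` puts a count that sits exactly on a boundary into the higher table.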

Bajar?

There is no "Bajar" language listed in Ethnologue for Malaysia or Indonesia. There are too many speakers for it to be Bajaw, and Banjar should be included in Malay. Any ideas? If not, we should probably delete this. kwami 10:41, 2005 August 2 (UTC)

Deleted. kwami 07:10, 2005 August 9 (UTC)

Maithili

Why is Maithili split out, but the other Bihari languages (e.g. Bhojpuri) are included in Hindi? I would suggest that the Bihari and Rajasthani languages are perhaps distinct enough, and considered distinct enough, from Hindi to warrant not being included. This in contrast to, say, Awadhi, which is usually considered a dialect. john k 06:08, 4 August 2005 (UTC)

All Bihari dialects are counted by many as Hindi, but since 2003 Maithili has had official status. I agree this is bizarre, but no more bizarre than giving Urdu separate status. I think we need to ask: do Maithili speakers consider their language to be Hindi, or separate? Do speakers of the other Bihari dialects consider their languages to be Hindi, part of a Bihari language with Maithili, or separate? I don't know. Do you have evidence as to how this is perceived? If Maithili is perceived as distinct, but Bhojpuri is still perceived as a Hindi dialect, then the article is fine as it is. kwami 06:25, 2005 August 4 (UTC)
How bout we put Maithili back in Hindi until we figure this out? It won't be the only language with more than one officially recognized standard; think of "Chinese" & Cantonese. Perhaps Maithili hasn't been used officially long enough for attitudes to have changed much in this area. kwami 06:33, 2005 August 4 (UTC)
BTW, I don't believe Rajasthani is considered a single language. Marwari may be, but splitting it out and leaving the rest in Hindi would be like splitting out Maithili. kwami 06:58, 2005 August 4 (UTC)
It seems to me that the Rajasthani and Bihari languages are in a situation where they are sometimes seen to be part of Hindi, but often seen as their own languages. I still think this contrasts to Awadhi or Haryanvi, which are almost always considered to be mere dialects of Hindi/Hindustani. john k 05:56, 11 August 2005 (UTC)

Chart request

Would someone skilled in m:EasyTimeline be willing to make a chart of these? – Quadell (talk) (sleuth) 13:43, August 5, 2005 (UTC)

Belarussian

It says "Indo-European, Slavic, deposed and executed 1314"

What does that mean?

Looks like a copy error that was added in with the language family data. I'll delete. kwami 06:14, 2005 August 9 (UTC)

"Berber"?

Should we lump all the Berber languages together, as we have with Karen, Chinese, etc.? Just a thought --kwami

Not sure. Do the Berbers consider themselves to speak a single language? They are, unlike the Chinese and Karens, geographically isolated from one another, which I think makes a difference. john k 05:59, 11 August 2005 (UTC)
I don't know. There's a certain amount of pan-Berber identity, but maybe no more than pan-Turkic or pan-Slavic. I was hoping someone out there might know better than I. kwami 06:36, 2005 August 11 (UTC)
I was just looking at a non-Ethnologue (sigh, how dependent we are on internet sources) language reference in the library today, and it basically said that all the Berber languages were one language, with the possible exception of the Tuareg dialect. john k 23:35, 11 August 2005 (UTC)
Based on the info in the other Wikipedia articles, it seems that there is a fairly strong pan-Berber identity among speakers of the northern lects of Morocco & Algeria, and some attempt to standardize the language. Based on that, I unified Northern Berber. Also adding a unified Tuareg, which makes the list at 1.2M. kwami 20:07, 19 September 2005 (UTC)

remove warning?

We seem to have come to a general consensus on most things here. I've also verified languages down to 2.3 million speakers with Ethnologue 15, and marked those that need further confirmation (because E does not give figures, etc.) (Basically, all those data with the word "million" in them have been confirmed this way.) So, what do people think, remove the warning, or replace it with a general warning that some data is dated, and that the definitions of many languages is fuzzy? kwami 08:20, 2005 August 9 (UTC)

Ethnologue is a bit dubious, though, isn't it? For instance, our number for Thai includes only the numbers for the Ethnologue Thai language. The "Northern Thai," "Northeastern Thai," and "Southern Thai," which between them have another 25 million or so speakers, are excluded. This seems problematic to me. The same problem perhaps adheres to others. john k 02:28, 11 August 2005 (UTC)
I don't see what's problematic. If your point is that there's never going to be agreement on where to draw the line between languages in every case, then that's a good reason for mentioning that in the introduction. If your point is that there are more reliable sources than Ethnologue, or that Ethnologue's standards differ from the consensus among linguists, then that would be news to me. Mark1 03:05, 11 August 2005 (UTC)
I'm saying that including all the Chinese dialects/languages together as one language, but separating out the Thai dialects of Thailand is completely inconsistent. Especially since the other Thai dialects, all of which have more than one million speakers, are not listed separately. And Ethnologue is certainly not authoritative. It is a generally reliable source, but it has weird issues around the margins - there are numerous areas where we differ from Ethnologue, many of them in this article. john k 04:43, 11 August 2005 (UTC)
I think we can work these out, now that we've decided to go on speaker conception and not mutual intelligibility. Thai, Turkish, and Berber are the three that come to mind as being in need of review. Within Thailand, Isan is commonly conceived as closer to Lao than to Siamese. We'd need speaker input here. But you're right, the other Thailand dialects probably should be in with Thai - that's something I can change tonight. How about including everything except Isan, and leaving Isan in with Lao where it is now? Haven't had much real input on Turkish, but that can wait. kwami 05:43, 2005 August 11 (UTC)
That seems fine for this particular case, but who knows how many similar cases there are. I just removed the "Deccan language" which appears to exist only in Ethnologue, for instance. (At least, I think I removed it - it's possible that it was one of the changes that got erased by the edit war). I'm not sure what the issue is with Turkish - I think Azeri is usually conceived to be a different language. Certainly Turkmen is. I'm not sure about Berber - I don't think it would be right to act as though there is a single Berber language, but I'm not sure that "Tamazight" and "Kabyle" should necessarily be conceived as languages of their own, either. john k 05:50, 11 August 2005 (UTC)
Yes, Deccan is still gone. Sorry, that wasn't intended to be an edit war. I overwrote some of my own changes by having multiple windows open, and only noticed your changes when I went to the history page to recover them. I should have been notified that there was an edit conflict; don't know why I wasn't.
Best to leave Turkish and Berber as they are, then. By the way, all figures (except as noted) are now confirmed through Ethnologue. Granted, it's not the most reliable source. But at least the quality is now reasonably consistent! kwami 07:09, 2005 August 11 (UTC)

Yes, that's true. I didn't mean edit war, so much as edit conflict - I was assuming that it was accidental. I'm beginning to think the Berber languages should be combined. john k 23:37, 11 August 2005 (UTC)

Speaker identification would have several languages, but not as many as Ethnologue. From the Tuareg I've met, it seems that they consider their language to simply be "Tuareg". Also, there's a movement to standardize the northern Berber lects of Morocco & Algeria as a single "Tamazight" language. (This according to the Wikipedia articles, but you can certainly see evidence of this on the web.) Therefore I unified those two. The other Berber lects are too insignificant numerically to worry about. kwami 20:37, 19 September 2005 (UTC)

native English-speakers in Switzerland?

Yes, 1% of the population according to Ethnologue 15, presumably expats.

I've been removing 'significant communities' if the language is not native to the country and is less than 1% of the population of the country per Ethnologue 15. So far I've covered America, Europe, Oceania. kwami 06:43, 19 September 2005 (UTC)

Request to move

I haven't even attempted to keep track of all the 'significant communities in' entries that people have been adding. I think a large part of our problem is the title of this article. People read "List of languages by total speakers" and foolishly believe that it is a list of languages by total speakers.

Wanna move this to List of languages by native speaking population or List of languages by number of native speakers (which currently redirects here)? We should probably also rename List of languages by total native speakers and link it to this article as an example of the problems involved. kwami 07:11, 2005 August 13 (UTC)

List of languages by number of native speakers makes more sense, since that is what the article is. john k 04:55, 17 August 2005 (UTC)
Okay, I put in a request to move. They're backlogged, which should give anyone who objects time to chime in. I just reverted another confused edit; they'd moved French up according to its total number of speakers.
Admin: The French edit, and many like it, assumes that the data are for the total number of speakers. They will correct the figure for the language they know, then move it up in the ranks. Problem is, we don't know the number of non-native speakers for most languages, and in any case this is much harder to estimate, and would result in continual edit wars. The Swiss edit, and many like it, assume that if people learn English in school, then there is a significant community of English speakers in that country, again forgetting the native-speaker part. I believe that much of the trouble could be avoided if the word "native" were in the title. kwami 19:38, 2005 August 17 (UTC)

This article has been renamed after the result of a move request. I have renamed this list of languages by number of native speakers as per the request. I did not do anything with the similarly named list of languages by total native speakers as it is not clear from the above what, if anything, you would want done. Dragons flight 23:18, August 22, 2005 (UTC)

Sylheti

Sylheti should be incorporated into Bengali, shouldn't it? john k 04:55, 17 August 2005 (UTC)

Yes, it looks like it should, especially with an alternate name like "Sylhetti Bangla". Can you tell if the population figure in Ethnologue for Bengali includes Sylhetti and Chittagonian? I don't know whether I should increase the number still further. kwami 07:25, 2005 August 17 (UTC)
Tell you what, I'll assume that separate listings mean the numbers are separate as well, and you revert if you think I'm mistaken. I won't add to the total speakers, as I assume that the Sylhetti are already counted there. kwami 07:35, 2005 August 17 (UTC)
Yes, I think this is probably correct. Ethnologue is so fucking random about when it decides that something is a dialect, and when it is a language. john k 06:51, 20 August 2005 (UTC)

Farsi

Is Farsi there? I searched for Iran and didn't find anything.

It's the second language to come up in a search for "Iran", after Arabic. Listed under its English name of Persian.

Cantonese, Shanghainese, Southern Min with "(no recent data)"

These three Chinese languages are the ONLY ones without numbers shown. Really, I don't think it matters how old the data are, as long as the date is clearly given. All of these languages have numbers on their own specific pages, and should be listed here as well. Frencheneesz 13:36, 2005 August 18 (whats UTC?)

Done. kwami 07:21, 2005 August 21 (UTC)

Egyptian Arabic

I think the number (46 million) in the article page needs to be edited to 77 million to match the country's population, or perhaps slightly less, since there are a few minorities who speak other languages.

"Egyptian Arabic" is a specific dialect, not just any Arabic spoken in Egypt. Many in the south of the country speak something closer to Sudanese Arabic, and so aren't counted. kwami 11:54, 2005 August 30 (UTC)
I'm Egyptian, and I know there is no co-official language in Egypt besides plain Arabic. There are minorities in Egypt, such as Nubians and Ababda-Beshareya, who total less than 100,000 in a 77 million population. Even these minorities speak Arabic.
By 'plain Arabic', I assume you mean Modern Standard? kwami 06:19, 24 October 2005 (UTC)
Also, you have increased the population of Arabic without sources or explanation. As stated clearly at the top of the page when you edit, all such changes will be reverted.
I will add Egyptian Arabic back in as the national language. You're probably correct in saying that it's not co-official. But it is used in the media in a way that other forms of Arabic are not in their respective countries (except perhaps Hassaniya). The source for this, as for anything else that hasn't been decided on the talk pages, is Ethnologue 15. kwami 07:13, 24 October 2005 (UTC)

Renaming data article

The sister article List of languages by total native speakers is a compilation of published lists, such as the CIA and Ethnologue, useful as a source of data. I've suggested renaming it to better reflect its contents on its talk page, but there's been no response. How do people here feel about renaming it, and what would be a good name? "Language population data" maybe? kwami 18:55, 2005 August 30 (UTC)

What is "significant"?

We currently say that a "significant" presence of a language in a country is 1% of the population. However, that is not what we actually have. There are many languages in India and China that would need to be taken off the list, because they aren't official and are spoken by less than 10-13 million people. It would also be weird to have a language only listed for Burma, when the main population is in China, because it makes up more than 1% of the Burmese population but less than 1% of the Chinese population. So obviously this 1% thing isn't going to work if we take it literally. Or is it only supposed to apply to immigrant languages?

Should we do that, and explicitly say 'immigrant languages'? Or do we want some other criterion? (Someone just added Urdu to the US, and there's no way there are 3 million Urdu speakers there.)

kwami 02:02, 2005 August 31 (UTC)

  • "Communities" is better (because it includes communities that have long existed there and are a minority), but many countries listed under many languages have percentages smaller than 1%; I noticed that in some languages. As for the case of Burma and China, well... I believe it's an exception. If a language is spoken by less than 1%, it surely is a language without importance, while 1% is a percentage that makes the language useful.
  • "Significant" means the part of a given country's society that speaks the language. That is much better and more realistic than raw numbers: 10,000 speakers in one country do not carry the same weight as in another. In one country it is a language without importance, because millions speak other languages there; in another, where the population is 20,000, 10,000 is a lot, so it is a very important language there. It can have been brought by immigrants or have existed there for a long time, but it is not official. --Pedro 01:03, 1 September 2005 (UTC)
I'm not sure what you mean by 'communities', or if that answers the question. We include all languages spoken by more than 1M people, and these include languages spoken only in China or India which have under 10M speakers. That's less than 1% of the population. So how do we decide which countries to list a language under, if we don't follow the 1% rule stated at the top of the page? When is a language to be listed when it's under 1%, and when is it not to be listed?


Ok. 1% is a figure that makes a language important, but we also know that 1 million is an important number for a language to survive. So there you go: 1% or 1M.... -Pedro 19:45, 2 September 2005 (UTC)

Welsh

Where is Welsh on this Page?

Welsh is spoken by half a million people, and the cut-off for this article is one million (since the number of languages grows exponentially as the population decreases, a limit of half a million might double the size of the article). kwami 19:13, 2005 September 2 (UTC)

Urdu and Hindi

Shouldn't Urdu and Hindi be listed together? Especially if Chinese is listed as one language. Sukh | ਸੁਖ | Talk 20:04, 2 September 2005 (UTC)

By mutual intelligibility, Ukrainian is a dialect of Russian, but people would have a fit if we classified it that way. And "Hindi" isn't a real language either: many forms are not mutually intelligible, even if the diversity isn't as great as Chinese. Also, for most of the languages of the world, there isn't data for which idioms are mutually intelligible. (What do you do with Arabic?) So we decided to go by social criteria: a language is what its speakers think it is. Most Chinese and Arabs think that Chinese and Arabic are languages, so that's how we have it. Most Standard Hindi speakers would object to their language being included under Urdu, so we've kept them separate. kwami 22:03, 2005 September 2 (UTC)
That's true, I suppose. But it ignores the fact that Hindi and Urdu are much more mutually intelligible than either the Chinese dialects or Arabic. The big difference is that they use two different scripts - is that enough to divide a language? Sukh | ਸੁਖ | Talk 22:36, 2 September 2005 (UTC)
The big difference is that they are considered to be separate languages by the people who use them, and have separate standards as official languages. The technical vocabulary is quite different. Level-headed people will admit otherwise, but most Croats will take offense at being told they speak Serbian, and I've even offended Indonesians by calling their national language Malay. (And Indonesian and Malaysian are very very close, basically just a matter of tech vocab. Not even many spelling differences anymore.) I once unified Hindustani in this article, and split up Chinese, and it was a mess. I fought with half the people here. And they were right: there was no way to be consistent. (Is Slavic one language, or more than one? How do you tell?) We finally decided to use speaker identification as the criterion for what is a "language". kwami 23:17, 2005 September 2 (UTC)


Urdu and Hindi are not separate languages and are only considered to be separate languages by rhetoricians who wish to distance Pakistanis from Indians. I would know, I speak them...both? The pronunciation is exactly the same. Each is perfectly understood by speakers of the other.

They are simply written in a different script, and recently religious and socially motivated individuals have sought to bring Sanskrit vocabulary to "Hindi" and Farsi vocabulary to "Urdu" in an attempt to create the impression that speakers are distinct ethnic groups. This is simply not true.

Swahili?

Surely there are enough native Swahili speakers to make this list.

Do a search for SWAH- and it's the first thing that pops up. Don't have a good native speaker estimate, however. kwami 18:35, 2005 September 9 (UTC)

How many people actually speak Mandarin?

In December 2004 the China Daily released an article describing a survey about how many Mandarin speakers there actually are in China. It turned out 18% spoke Mandarin at home, 42% spoke it at work or school, and 53% could speak it. Since 18% of China's population is about 235 million people, it follows that Mandarin should be placed behind Hindi, English, and Spanish.

It appears that the article is using "Mandarin" as the English translation of pútōnghuà, or standard Chinese. The equivalent for English would be to claim that there are only 10 million English speakers in the world, because only 15% of the population of England speaks RP, plus a few more in New Zealand, but practically no one in Australia, the US, or Canada does. The normal meaning of "Mandarin" is broader than just pútōnghuà, just as "English" is broader than RP. kwami 03:01, 14 September 2005 (UTC)
Is it? It contrasts Mandarin with "dialects," which in Chinese context means languages such as Yue, Wu, and Hakka; the article gives Cantonese (Yue) as a specific example of such a dialect. Besides, "can communicate with" implies understanding; although few Britons speak RP natively and few Americans speak GA natively, all can understand both accents.
No, "dialect" doesn't mean language, not even in the Chinese context. Cantonese, for example, includes many different dialects. In some Cantonese dialects you inflect the verb for aspect (by changing the tone), which you don't do in Standard Cantonese. Within Mandarin, some dialects have three tones, and others have five. In Xi'an Mandarin, hua 'flower' is pronounced fa; other words are pronounced va where Standard Mandarin has ma. In other Mandarin dialects, wǒmen "we" specifically excludes the person spoken to; there is a separate word zámen "we" when including the second person. There are dozens of independent sources claiming that somewhere around 80% of Chinese speak a form of Mandarin. Therefore I assume that this newspaper article is wrong. Newspapers are frequently sloppy in their reporting, and often in an attempt to simplify things they misrepresent them. The only way I can see to make sense of your article is to assume that "Mandarin" is a poor translation of pútōnghuà. kwami 18:04, 15 September 2005 (UTC)
Putonghua in fact means the official spoken form of Chinese, but the statement "18% spoke Mandarin" at home does not mean that only 18% of Chinese speak Mandarin as their native language. To correctly interpret the "18%", you need to understand the context. Except for people in the northern part of the country, the rest of the country usually speaks two dialects -- Mandarin at work/school, and the "native" tongue at home. In schools, teachers use Mandarin as the medium of instruction, and students usually communicate using Mandarin because that is the common language that is understood by all.
"Speaking a language at home" does not necessarily mean that a person can use it proficiently. This may be a weird concept, but it is common that many Chinese can speak their "home dialect" only within the family context, and they have to "revert" to Mandarin to talk about things in general. This is mainly due to the fact that Mandarin is what is taught in schools, while the "home dialect" is used at home only.
This is similar to, say, Italian immigrants who came to America. The parents were brought up in Italy and so they know only Italian. Their children are raised in America and receive an American education, and thus speak English in virtually all contexts except at home. These children probably know Italian because they have to communicate with their parents in Italian, but they probably are not considered "native Italian speakers" since they may not be able to use Italian outside of the family. 9/18/2005
If you add up the populations of just three Mandarin-speaking provinces, Sichuan, Henan, and Shandong, you get 35% of the population of the country. However, there were these comments in the Chinese article: 'The term "Mandarin" can also refer to Standard Mandarin, which is based on the Mandarin dialect spoken in Beijing.' and '"Mandarin" usually refers to only standard Mandarin in everyday usage. The broad academic concept of "Mandarin" encompasses a large number of linguistically related dialects, some less mutually intelligible than others, and is very rarely used outside of academic circles as a self-description. Instead, when asked to describe the spoken form they are using, Chinese speaking a form of Mandarin will describe the variant that they are speaking, for example Sichuan dialect or Northeast China dialect ...' That is what I suspect the article is doing. However, in English "Mandarin" is normally a translation of 北方話, and "Standard Chinese" is used for the official standard. kwami 19:48, 18 September 2005 (UTC)
No, Mandarin does not mean 北方話 (Beifanghua), but rather 普通話 (Putonghua). I was actually arguing for you, but somehow you have interpreted my message to be against you. The Putonghua spoken in each area of China definitely has variance, which is influenced partially by customs. (It's similar to English: pop, soda and soft drink mean the same thing, but some terms are more prevalent in one area than another; you still wouldn't consider these to be different "dialects" of English.) However, since 99%+ of the words and grammar are identical, there is no reason to consider "Mandarin to be a large number of linguistically related dialects." You are NOT going to find someone who will say "I speak Sichuan Mandarin" -- this is absurd, as Mandarin is just Mandarin. (Will you hear someone say "I speak the Bostonian dialect of English"? You can clearly hear that someone has a Boston accent and uses words that are more common in New England, but I highly doubt that the person will stress that he/she speaks Bostonian English instead of Midwestern English or California English.) The article simply indicates that at home, people **DO NOT** speak Mandarin, which is very common in China. People in Sichuan will speak BOTH Mandarin and Sichuanese, and the use of Sichuanese is generally limited to the home. When someone goes to school or work, he/she will use Mandarin. Even in Guangdong, where a large number of people speak Cantonese at home, it is difficult for them to tell you whether Cantonese or Mandarin is their "native" language, since they in fact use both. The term "native language", in my opinion, means the language that you used in daily life while you were growing up, and that you can use fairly fluently. I don't see why an individual has to be classified as speaking only one "native" language.
For the definition of Mandarin used in this article, and for the normal usage of the name in English, Sichuanese is a dialect of Mandarin. 北方話 is the language; 普通話 is the standard. Saying that a Sichuanese speaker does not speak Mandarin because they don't speak the standard is like saying a Bostonian does not speak English because they do not speak the standard. (And no, they will not say "I speak Sichuan Mandarin", but they will say "I speak Sichuanese", which was the point of the quote.) As for more than one native language, sure, lots of people have more than one. No problem there. kwami 02:59, 19 September 2005 (UTC)

Assyrian

We have an edit war going on here, which I think we should resolve here on this discussion page rather than simply reverting each other.

This article is based primarily on Ethnologue 15. Now, Ethnologue is hardly the most reliable resource, and I think we're all aware that we can do better, but at least it provides a modicum of stability to a contentious field. We have changed several languages from what Ethnologue has published, but these changes have been discussed here so that there is basic agreement to the changes.

According to Ethnologue, there are 4.5 million ethnic Assyrians. However, most of these people speak Persian or Arabic as their native/home language. Some may speak some Assyrian as a second language, but that's not what this article is about.

Again, according to Ethnologue, Assyrian has 210,000 native/home speakers out of this ethnic population of 4.5 million. (That is, about 4% of the ethnic population.) There is a similar number of Chaldean speakers, and lesser numbers of other idioms such as Turoyo. If these people consider that they all speak the same language, then they should all be lumped together. However, all the Aramaic languages/dialects total only ~ 534,000. Since the cutoff point for this article is 1 million, Aramaic/Assyrian/Chaldean/Syriac does not make the cut even if it is considered a single language.

Assyria 90, if you have evidence that the number of speakers is greater than Ethnologue reports, please present that information here. However, I personally will consider any unsubstantiated attempt to list 4.5 million speakers of Assyrian as politically motivated, and will revert it: Mark isn't the only one doing this! If you wish to convince people of your claims, please provide something more than your personal say-so. We can't take seriously the claim that you personally know all 4.5 million Assyrians and have verified their native language. kwami 23:43, 14 September 2005 (UTC)

I get really sad about this Ethnologue report, which is all false. Nearly 95% of all Assyrians speak their original language, so this makes me very upset. We are about 4.5 million Assyrians worldwide. The Assyrians are a patriotic people and teach their children their language very strictly. There are 4.5 million Assyrians, so how can you write that only 210,000 speak Assyrian? We have Persian and Arabic as our second languages. I don't know what has happened to the net, but it's all away from the future. I'm really sad that people are against me when I'm right. --Sargon 16:02, 15 September 2005 (UTC)
We are not against you, Sargon. I personally would be quite pleased to have millions of Assyrian speakers in the world. Who needs more English or Arabic speakers? There are hundreds of millions of them already, which is quite enough. However, I see several possibilities for your opinion:
  1. You have good evidence for the number of Assyrian speakers. If this is the case, please share your evidence with the rest of us.
  2. You have no evidence, and you say what you do only because you want to believe it.
  3. Your personal experience leads you to believe that it might be true. However, we need better evidence than that. If you have made this observation, other people have as well, and you should be able to dig up higher figures in a reputable (that is, not political or nationalist) source.
I'm a native English speaker, but that doesn't mean my opinion as to the number of English speakers counts for anything, even in my own country. kwami 17:49, 15 September 2005 (UTC)

Families

On my screen the rightmost column for Chinese spans a page and a half (!). At the same time, we have a second column about the language family, and it's as wide. I suggest removing that column. If I were looking for statistical info about numbers of speakers, most likely I wouldn't give a buck about the family of the language. And if I did, I would just click the link for the language and see. Currently the family column takes up valuable space. If we remove it, the list will become about half as long (at 1024x768). Does anybody disagree? (I'm User:logixoul, I'm having problems with my cookies at the moment) --85.130.99.211 20:26, 19 September 2005 (UTC)

It looks fine on my display. Anyway, this only affects the first dozen languages. kwami 09:33, 20 September 2005 (UTC)

Portuguese

Recalculated Portuguese. The main discrepancy is that no one has given a source for the claim that 60% of Angolans are native Portuguese speakers. Since 99.5% of Angolans speak some other language as their native tongue, I find the figure doubtful. Here's my calc: Angola 52k, Cabo Verde 15k, Mozambique 30k, Sao Tome 2.5k, South Africa 617k, India 250k, Macao 2k, Paraguay 636k, Luxembourg 100k, France 750k, Switzerland 86k, Andorra 2k. That's just under 2.6 million, which just offsets the 2.6 million Brazilians who do not speak Portuguese as their native tongue. So, Brazil 186.1M, no adjustment; Portugal 10.6M, less 0.5M non-native, and you get 196M. This agrees with the WA 2005 figure of 195M (which is rounded off to the nearest 5M).

Bengali also is listed with 196M. However, the bulk of that figure is now ten years old. Given that Bangladesh has a high growth rate, it should be well over 196M by now, and therefore I've ranked Bengali ahead of Portuguese. kwami 09:33, 20 September 2005 (UTC)

  • Sorry, that is completely inaccurate. The data for Angola are official data, not just a claim.

In 1983: native Portuguese speakers: 60% (of the total population); native Portuguese speakers also capable of speaking an African language: 50%, which gives a maximum of 70% speaking African languages. In the capital of Angola, Luanda, 75% are native speakers of Portuguese.

You really don't know the country if you believe that 99.5% of Angolans speak an African language. So check before reverting anything or saying that you don't know of any source. And you asked me once about the data, but you simply ignored my answer.

Here's a link: http://www.linguaportuguesa.ufrn.br/pt_3.4.a.php -Pedro 23:04, 26 September 2005 (UTC)

Yes, I've asked you before, but you never responded adequately that I can recall. Here's a record of the relevant portion of Pedro's source for future reference:
O português é a língua oficial de Angola. Em 1983, 60% dos moradores declararam que o português é sua língua materna, embora estimativas indiquem que 70% da população fale uma das línguas nativas como primeira ou segunda língua.
"Portuguese is the official language of Angola. In 1983, 60% of the inhabitants declared that Portuguese was their mother tongue; even so estimates indicate that 70% of the population speak one of the native languages as a first or second language." [pardon my pathetic Portuguese; please correct this if it needs it]
I don't know how good this source is, but the numbers for other countries seem entirely reasonable. Mozambique, for example, is listed at < 1% native Portuguese speaking. So I'm inclined to accept this, and that Ethnologue is completely wrong for Angola, and am bumping Portuguese above Bengali in the rankings. Any other input, anyone?
However, although the Angolan figure may have grown beyond ~60% in the last 22 years, especially considering the ethnic mixing due to refugees from the civil war, we only have this percentage to go on, and that adds ~6.6M to the Portuguese population, not 12M, which is more than the 11M population of Angola. So I'm revising Pedro's total figure downward. If you have other refs, Pedro, please document them here so we have a record of why our figures deviate so much from Ethnologue/World Almanac. kwami 00:24, 27 September 2005 (UTC)
  • That's because Ethnologue uses old numbers or rough estimates. But who said that the 208 million figure comes only from Angola? (That link was only to show that your numbers were completely incorrect for that country.) I've said there is a listing in the Portuguese language article, and unfortunately many origins were not added. In Mozambique there are also official numbers, the best of which can be seen at the Instituto Nacional de Estatística de Moçambique (National Statistics Institute of Mozambique): www.ine.mz.gov, but most people there use Portuguese as a second language, although the Portuguese-speaking population is growing due to tribal mixing in the urban areas. In 2003 or 2004, the number of speakers (first/second language) had already reached 47%, from what I've seen from the Ministry of Education of Mozambique. But that's an estimate, and there are census numbers from 1997 (so better numbers).
  • You've translated the Portuguese correctly (I'm impressed; do you know Spanish or Portuguese, or did you simply use Babelfish? If so, you were lucky :P). 60% is also a fine number for today's Angola: the rural population speaks a Portuguese pidgin, which is useful for speaking with neighbouring villages, and the war only stopped very recently, so 60% is fine. It is a nationwide language in Angola, and that can be seen in Angolan music, for instance. I don't mind the 203 million figure, but I really think a number below 200 million is completely incorrect and only seen in English sources. So the current number is good enough, although off by 5 million (not important), IMO. BTW, I think you are doing a great job on this article; I hope someone would do the same in the list of countries by date of independence/nationhood. -Pedro 14:05, 27 September 2005 (UTC)
I know a little Spanish, so I just had to look up moradore and embora. Wasn't sure I got embora right. Anyway, glad you're fine with it. I'd be happy to work with you on further revisions, but I've noticed that many of the individual language articles are not credible, as they tend to exaggerate the population. Turkish, for example, was listed in its article at 150M! (I've changed that, though there will probably now be an edit war.) The Portuguese page lists 10% of Moçambique as native speakers, but the ref you gave me had < 1%. Seems that we don't have very good data, as is true with most languages, I suppose. kwami 18:36, 27 September 2005 (UTC)
  • Read the rest of the article about Mozambique, but it's better to look at INE for the reason why there's 9% in the article. In that link (a Brazilian university) there's 6% for native speakers in Mozambique. That is true, but it is mostly based on ethnicity. The 9% is the number of people who use Portuguese at home. Both numbers are figures from the 1997 census in Mozambique. The 1%-3% figure was from 1983.

I'm having a problem connecting to INE; the site is http://www.ine.gov.mz/ (maybe it's down). You can click on "censo de 1997" (1997 census) and search for "língua" (language), or something similar...

Embora means several things; in this case it means "meanwhile". "Moradores" (plural), "morador" (singular), is a person who lives in a given place, in this case Angola. The most common word for a country's residents is in fact "habitantes" (inhabitants).--Pedro 20:36, 27 September 2005 (UTC)

1% rule enforced

Okay, I verified all languages for all countries per Ethnologue 15 and took out anything under 1%, with a few exceptions: languages native to one country are listed there regardless; languages slightly under 1% in a second country may be listed if that number is a third or more of the total speakers; and there are a couple of judgment calls, like leaving Korean in Japan, since Koreans are the most significant minority in mainland Japan even if somewhat under 1%, and Portuguese in Namibia, even though I don't have figures, because there are significant numbers in both Angola and South Africa. I suggest reverting any attempts to add further countries (like Mongolia for Russian, or the US for Panjabi) unless the additions are supported.

However, sometimes Ethnologue just states that an immigrant language is found in a country without giving figures. Usually this means the numbers are small, but not always, so I may have deleted a few countries I shouldn't have. In other cases, as with Filipino emigrants and Ivory Coast immigrants, numbers are given for nationality but not for language. In a couple of these cases I left a country in with a question mark. kwami 13:19, 22 September 2005 (UTC)

Akan

Should the Akan languages be unified? Currently Baoule, Anyi, and Brong have separate entries. kwami

I think that the internal diversification of Akan is comparable to that of Yoruba. The Yoruba dialect cluster is taken as one unit in this list and so could Akan, too.
I don't feel very strong about it, though — I think issues like this mainly serve to show how difficult it is to keep politics (or language policies and language attitudes) out of this list. Yoruba, for one, is mainly here because of the existence of Standard Yoruba, the written form and the prestige dialect. Standard Yoruba surely is a unifying factor in the sociolinguistic context of Yoruba-speaking areas, but is that a reason to lump all Yoruba lects together to get to some 18 million native speakers? I know that Ethnologue does it, too, but at the same time I know of at least one dialectological study (Adetugbo 1982) which says that, given the internal diversity, it's a bit awkward to speak of 'one' language.
I am not trying to say that Yoruba should be removed from the list, but I'm interested in other editors' opinions about this issue. — mark 11:05, 23 September 2005 (UTC)
Thanks for your answer, but we aren't trying to keep language attitudes out of this list - precisely the opposite. Based on the degree of intelligibility you're speaking of, Slavic should be a single language, and Chinese should be several languages. But we're going on speaker identification: Chinese is one language because that's how they see it, and the Slavic national standards are separate languages because that's how they see it (although I know one Ukrainian who says he speaks the Ukrainian dialect of Russian). The Mokole in Benin say that they speak Yoruba, as far as I know, so by our standards Yoruba is a single language regardless of degree of intelligibility. So do the Baoule etc. consider their language to be Akan? kwami 23:54, 26 September 2005 (UTC)
Aha, that's something I misunderstood about this list. In that case, no, I believe the Baule don't consider their language to be Akan. I.e. unlike the case of Yoruba, where there is a common Yoruba identity despite linguistic discontinuities, there is no common Akan identity which would justify listing it here as one language. — mark 07:30, 27 September 2005 (UTC)
Thank you! That should settle it, except: do the Akan groups in Ivory Coast (Baoule, Anyi, Brong, etc.) also consider their languages to be distinct from each other?
(I had tried converting this list over to one based on mutual intelligibility, and it was a real mess!) kwami 07:36, 27 September 2005 (UTC)