Talk:List of languages by number of native speakers/Archive 5

From Wikipedia, the free encyclopedia

Farce?

The ranking looks like a farce. If you take the Ethnologue column, you get a ranking that doesn't match the "Ranking" column. If you use the other total speakers column (CIA estimates), you get another ranking which also doesn't match the "ranking" column. This MUST be sorted out ASAP. --Ragib 06:02, 14 February 2007 (UTC)

I agree. I'm not aware of any decision on how to rank, but it would seem best to do it according to one source, to avoid constant changes. In this case the Ethnologue (2005)[1] seems to be the best way to do it. I'd also propose only putting sourced figures in the other column, and having some principled way of dealing with outliers (a more complicated issue, which need not be sorted out before the other 2 principles). Let's get some consensus on this, so we can have several editors editing and reverting according to the same principles. Drmaik 06:11, 14 February 2007 (UTC)
I agree using Ethnologue's ranking/numbers for this page. Otherwise, this has become a daily battleground for various language-advocates. I noticed an attempt today at mass scale change of most language rankings by one particular editor. Such unsourced and arbitrary changes are regrettable. --Ragib 06:14, 14 February 2007 (UTC)
It seems the Ethnologue column was only added recently (within the last week), but no one got around to reordering the rankings. Diego Lee 06:55, 14 February 2007 (UTC)
You missed this. --Ragib 07:23, 14 February 2007 (UTC)

I agree that the ranking shown should reflect the Ethnologue counts. The SIL (the organization that maintains and publishes the Ethnologue) is a highly respected organization among linguists - I can go ahead and reorder accordingly. What I'm confused about is why so many people are citing the Joshua Project which (like the SIL) collects information on languages around the world as part of a missionary effort, but unlike the SIL, does not perform surveys on the number of language speakers. Am I wrong about the Joshua Project? Their site doesn't seem to offer much information on numbers. --SameerKhan 07:46, 14 February 2007 (UTC)

To follow up - I just reordered the ranking to reflect the Ethnologue statistics. As I mentioned before, the SIL is highly respected in the linguistic community, and the statistics provided by them in the Ethnologue are the most widely used in linguistic journals for references on numbers of native speakers. I haven't gone through and verified the statistics provided here on the Ethnologue itself (if someone would like to do that and confirm that these are the correct numbers, that would be really helpful), so they may be inaccurate if someone tampered with the numbers earlier. Anyhow, please let me know if I've made a mistake. --SameerKhan 07:56, 14 February 2007 (UTC)
Yet another follow-up - I just saw that the Ethnologue list of most spoken languages includes statistics that are vastly different from that which is shown here. Can someone verify the real situation? —The preceding unsigned comment was added by SameerKhan (talkcontribs) 08:02, 14 February 2007 (UTC).
First, thanks for doing the re-ordering. I don't think Joshua Project do as much direct data collection as SIL, though I think the latter do their direct research mainly for languages they work in, which tend to be smaller. It seems that various advocates of different languages find the biggest figure they can come up with, so will use whichever source gives the biggest number: I think this is why the Joshua Project data is being used. As for what the Ethnologue actually says, I think Ethnologue list of most spoken languages does accurately reflect it (that has been my aim), but by all means check online at [2]. As for the real situation, that's what we're all struggling with! Drmaik 09:27, 14 February 2007 (UTC)
Ethnologue does things like separate Punjabi and Farsi into multiple separate languages. I'd want to be careful about using it as our basis. john k 18:03, 16 February 2007 (UTC)
Although Punjabi and Farsi are a bad example, since they're widely accepted as diverse languages, it's not true at all that SIL is "highly respected" among linguists. Actually quite the contrary. The SIL data can quite often be proved inaccurate, sometimes even wrong. Although it's a very large and comprehensive list, every linguist knows that one should always be sceptical about the data from that site. To name a few examples: Their naming conventions are sometimes intriguing, numbers of speakers can sometimes be off, clear dialects (especially those from Germany, that no one, be it linguist or housewife, would consider a "language") are declared "languages", while in other cases a distinction is not made. It's a good source for starting research, though. But: Be careful with its data! Crosscheck twice! — N-true 13:34, 21 March 2007 (UTC)

Seems like after all the discussions above and below, we are back to square one as people add more and more "estimates", and pick one of them arbitrarily to suit their preferred ranking from whatever language group they belong to. It might be better to have a consensus on what data source to use for ranking, as the 3 different sources provide widely varying estimates for a given language. --Ragib 05:56, 17 February 2007 (UTC)

It does seem that adding the Ethnologue column was a consensus decision, but I'm not sure if adding the Encarta column was. S. Lodovico 10:08, 17 February 2007 (UTC)

Encarta column

I added a column that shows data from Encarta 2006. Maybe this could finally provide an accurate ranking system... Jerse 16:34, 14 February 2007 (UTC)

I think "other estimates" should be more respected.--220.217.87.84 18:50, 14 February 2007 (UTC)

Encarta is a copyrighted source. While "facts" are outside of the copyrighted domain, it's giving numbers that are far too specific to be meaningful. The number of Arabic speakers is given as: "422,039,637". How do they know it's not 422,039,638, or 422,039,636? Such specificity is improper at that scale. These numbers should all have no more than 3 or 4 significant digits, and the ones digit should be reported as significant only in the case of hundreds of speakers, not in the millions of speakers. I will ask you please to make the proper changes:
  • Do not reference Encarta 2006, as this is a copyrighted work, rather find out where they got their facts/information from, and use that.
  • Do not include unnecessary and inaccurate specificity in the numbers. Arabic has about 422 million speakers, not 422,039,637.
I will reference to SIL who provided the information to Encarta instead —The preceding unsigned comment was added by Jerse (talkcontribs) 19:53, 14 February 2007 (UTC).
That sounds much better! Thanks. :) --Puellanivis 20:49, 14 February 2007 (UTC)
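For anyone applying the rounding suggested above, here is a minimal Python sketch of cutting a raw count down to three significant digits. The function names `round_sig` and `in_millions` are made up for illustration; the only figure used is the Encarta number quoted in this thread.

```python
def round_sig(n: int, sig: int = 3) -> int:
    """Round a positive integer to `sig` significant digits (half rounds up)."""
    if n <= 0:
        raise ValueError("expected a positive speaker count")
    exp = len(str(n)) - sig          # number of trailing digits to drop
    if exp <= 0:
        return n                     # already at or below the requested precision
    factor = 10 ** exp
    return (n + factor // 2) // factor * factor


def in_millions(n: int) -> str:
    """Format a rounded count the way the comment above suggests."""
    return f"about {round_sig(n) // 1_000_000} million"


# Encarta figure for Arabic as quoted in this thread.
print(in_millions(422_039_637))
```

Integer arithmetic is used throughout so that no floating-point error creeps into the rounded value.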

Since SIL provides Ethnologue with their language information, isn't it safe to assume that SIL has the same credibility as Ethnologue? In other words, can we finally arrange the chart by information that's up-to-date by using the SIL column instead of the Ethnologue column? Jerse 21:18, 14 February 2007 (UTC)

Well, the Ethnologue is the mouthpiece of SIL, so providing two columns, one Ethnologue, one SIL, doesn't really make sense. The Encarta data for Arabic is so different from the Ethnologue that one has to question where they really got the data from: it is an extreme outlier: all other data I've seen gives Arabic between 170-225 million [3]. Even another Encarta site [4] gives 206 million, evidently from the Ethnologue. CIA has a figure of 323 million for population of all Arab countries (and there are big non-Arabic speaking minorities in Morocco, Algeria, Sudan, Iraq), so a 422 million figure is, well, ridiculous. So my proposal is, change the column back to Encarta, rather than SIL, and mark the Arabic figure as an extreme outlier. Arabic is the main problem with the Encarta data, though the Ethnologue data also seems to be out of date in some places. Drmaik 06:50, 15 February 2007 (UTC)
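The sanity check described in the comment above can be written down as a small Python sketch. The helper name `classify_estimate` and the labels it returns are assumptions made for illustration; the figures (in millions) are the ones cited in the comment.

```python
def classify_estimate(estimate_m: float, usual_range_m: tuple, ceiling_m: float) -> str:
    """Label an estimate relative to other sources and a population ceiling.

    `usual_range_m` is the (low, high) span reported by other sources and
    `ceiling_m` is a population total that native speakers cannot plausibly
    exceed. All values are in millions.
    """
    low, high = usual_range_m
    if estimate_m > ceiling_m:
        return "extreme outlier"      # more speakers than people
    if estimate_m < low or estimate_m > high:
        return "outside usual range"
    return "consistent"


# Figures quoted in the comment above, in millions.
encarta_arabic = 422
other_sources = (170, 225)
arab_countries_population = 323      # CIA total as cited in the comment

print(classify_estimate(encarta_arabic, other_sources, arab_countries_population))
```

The population ceiling is a deliberately crude bound: it ignores speakers outside the Arab countries and the non-Arabic-speaking minorities inside them, which the comment itself points out.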
Well, if Arabic is the main problem as you say, here are a number of different factors that can contribute to the increase in recent numbers. 1) Islam is the religion of about 1 billion people if not more. The Holy Quran is written in Arabic and it is very good to know how to understand Arabic in order to read the Quran. 2) Even though the events of 9/11 were tragic, they have been a milestone in the increase of Arabic speakers around the world. More people are studying Arabic today than at any other time in the history of the world; I myself am a part of this group. 3) Who are you to say that the estimate of a well-respected linguistic corporation such as SIL is incorrect? 4) The 2006 SIL estimate is only 1 year old. All other sources in this article are older, most of which date to the last millennium. The list goes on... —The preceding unsigned comment was added by Jerse (talkcontribs) 03:28, 16 February 2007 (UTC).
Also the Ethnologue list already has its own webpage. Why would Wikipedia need two charts of the same information? Jerse 03:33, 16 February 2007 (UTC)


Number 1 point is quite wrong. A lot of Muslims can read Arabic, without understanding it at all. With translations available in most languages, it is not necessary to understand Arabic to read The Quran. Number two point is original research without supporting stats. As for latest SIL estimates, any referenced information is quite welcome. But we need to be consistent, we can't use a 1999 stat to compare with a 2006 stat when making a ranking. --Ragib 03:36, 16 February 2007 (UTC)
So are you saying we should use the SIL column? Jerse 23:47, 16 February 2007 (UTC)
No. There is no consensus, so first try to achieve that. --Ragib 10:00, 17 February 2007 (UTC)

There is no reference to the SIL source. The reference (1) points to encarta, not to SIL. --Ragib 10:00, 17 February 2007 (UTC)

If you look at the source at the bottom of that link it says "Source: Summer Institute of Linguistics". Jerse 16:48, 17 February 2007 (UTC)
Then link to THAT directly. It is misleading to link to Encarta and claim SIL 2006 as a source. Thanks. --Ragib 16:50, 17 February 2007 (UTC)
Why is this so difficult? The source is clearly stated.Jerse 16:52, 17 February 2007 (UTC)
Well, that's because we can't really see the source to verify it. Right now, we only see that Encarta has this info, and cited SIL as the source. At most, you can claim Encarta as the source of the info and have the column named as such. But you can't link to Encarta and claim SIL as the source. --Ragib 16:55, 17 February 2007 (UTC)
Well at first I did name it the Encarta column but there was a problem because it's a copyrighted source. So instead I changed it to Encarta's actual source, SIL. But now there is a problem with that as well. The SIL website hasn't been updated to show the current data, or if it is I can't find it (and trust me I looked for it). Encarta seems to have correct sources being an encyclopedia and all, I don't understand the problem. Jerse 17:04, 17 February 2007 (UTC)
In that case (if you yourself haven't seen SIL/06), you should name the column and the source as encarta. --Ragib 17:10, 17 February 2007 (UTC)
So can the chart be ranked by Encarta 2006 then? Jerse 17:12, 17 February 2007 (UTC)

(resetting indent) That's a completely different issue. As mentioned above, the consensus seems to be of using Ethnologue data. If you want, you can start an RFC to gain a consensus on what data source to use. Thanks. --Ragib 17:15, 17 February 2007 (UTC)

I just took a look into your "Encarta" link. Actually, it is about "languages spoken by more than 10 million people". It doesn't specify *at all* whether they are considering native speakers. Also, you continuously mention SIL 2006. However, the Encarta page *only* refers to "Source: Summer Institute of Linguistics.". That is, there is no mention to the 2006 SIL report you keep mentioning (even though you haven't seen it yourself). Please clarify this. Thank you. --Ragib 18:51, 17 February 2007 (UTC)

If you scroll down it says, in red, *Data are for first language speakers only*. And I'm still working on the SIL/2006. Anyway, why would Encarta use obsolete information? It's not like it's wikipedia... Jerse 21:07, 17 February 2007 (UTC)
When Jerse changed the source to SIL, I had been thinking that he was changing the cited source as SIL, not continue to cite Encarta. We should cite SIL, and use whatever information they have released, out-dated or not, and then once SIL/06 information is released publicly, we can then update the information. --Puellanivis 21:09, 17 February 2007 (UTC)
Arabic is not the only problem. Ethnologue has also strange numbers on Persian. For example, according to CIA, the number of Persian-speakers in Iran alone is more than 30m. According to Ethnologue, the number of Persian-speakers world-wide (including those in Afghanistan, Tajikistan, etc) is only 31m - not mentioning the large Persian-speaking minority in Uzbekistan. According to experts, the number of Tajiks in Uzbekistan is up to 10m (see: D. Carlson, "Uzbekistan: Ethnic Composition and Discriminations", Harvard University, August 2003). According to Ethnologue, the number is 0! I have sent many E-Mail to Ethnologue and asked for their sources. Either they simply ignored the mails, or gave a simple answer: "We are not really sure". Tājik 17:26, 23 February 2007 (UTC)

Removed influences in language family category

I removed the influences from other families in the language family column. For Norwegian, Swedish and Danish they were mostly wrong (these languages are much more influenced by the Romance languages, especially Latin and French, than by Slavic or Finno-Ugric). I don't know much about Finnish, Lithuanian, Slovak or Afrikaans, and the claimed influences might well be correct, but since this isn't mentioned for other languages, I see no reason to mention it for these languages.

213.225.127.188 02:57, 15 February 2007 (UTC)

I agree, it's information that is not appropriate for this list. Such information is more appropriate for the distinct articles for the languages themselves, as such there they can give the issue the proper treatment that it deserves. (Influences of a language are a very complicated subject.) --Puellanivis 02:59, 15 February 2007 (UTC)
Perhaps the language families should be reduced to three, as some are rather specific. PioKuz4 20:40, 21 February 2007 (UTC)

Hindi again

Hindi being listed as only 182 million native speakers, because that is the number of native speakers of Khariboli, is problematic. We don't list any of the other dialects of Hindi separately, nor the Bihari, etc., languages that are sometimes considered dialects of Hindi. We should either count all the "dialects" of Hindi together when giving the totals for Hindi, or we should list them separately, or some combination of the two (counting, say, Awadhi and Haryanvi as dialects of Hindi, but Maithili and Bhojpuri as separate languages). Whatever solution is agreed upon, though, the current set-up is unacceptable. Either Bhojpuri is its own language, with 25 million odd speakers, in which case it should be listed here as its own language, or else it is a dialect of Hindi, in which case those 25 million odd speakers ought to be counted as Hindi speakers. As it stands, they are not counted as anything. The same goes for Awadhi and Maithili and Haryanvi and Kanauji and so forth. john k 19:11, 18 February 2007 (UTC)

To expand on this, our article on Hindi lists five groups of dialects/languages which are considered to be "Hindi" - Western Hindi, spoken around Delhi, in Haryana, and in Western Uttar Pradesh and Madhya Pradesh, including standard Hindi; Eastern Hindi, spoken in eastern Uttar Pradesh and Madhya Pradesh, and in Chhattisgarh; Rajasthani, spoken in Rajasthan; Pahari, spoken in Uttarakhand and Himachal Pradesh; and Bihari, spoken in Bihar and Jharkhand. Obviously, these languages are often quite different from one another, and aren't always mutually comprehensible. Western Hindi is closer to Urdu than it is to the other Hindi dialects, and the Pahari dialects are closer to Nepali, also considered a separate language. But I think that we have to employ political definitions of languages in this article, because those are more or less the only definitions that exist. Linguists can tell us that standard Hindi is closer to Urdu than it is to Bhojpuri, but they can't say that Bhojpuri is a language and not a dialect, because that's not what linguistics is concerned with. At any rate, I would suggest alternately a) counting all speakers of Western and Eastern Hindi (other than Urdu speakers) as Hindi speakers, and counting speakers of Rajasthani, Pahari, and Bihari languages separately; or b) counting all speakers of all five, save Urdu and Nepali speakers, as Hindi speakers. john k 19:58, 18 February 2007 (UTC)

Actually John, 182m is the Ethnologue quote for all of Hindi. I believe Drmaik can confirm this. Bhojpuri and Maithili appear to be the only two listed separately. It seems probable that someone thought the figure was too low and added Khariboli dialect. Ryan Leigh 20:23, 18 February 2007 (UTC)
My dear friend Ryan, just to give you a little insight into what you are claiming, look at the population of the state of Uttar Pradesh, which is believed to be the home state of Hindi speakers. It is somewhere around 165 million. Go to any Government of India site and it will tell you that the only language spoken at home in UP is Hindi (maybe in different dialects). Now there are at least 5 other states (Bihar, Madhya Pradesh, Jharkhand, Chhattisgarh, Haryana) with populations of more than 30 million where Hindi is the majority language (though again in different dialects). Cities like Mumbai, Delhi, etc., with populations near 10 million (not a joke), are predominantly Hindi speaking. I do not know what you are talking about when you say that there are just 182 million speakers of Hindi. The studies quoting figures above 300 million seem more reasonable to me. -zombie_neal 21:22, 18 February 2007 (UTC)
No, I never claimed any figure. Actually, I was merely answering John's question about how Ethnologue classifies Hindi. I don't have an opinion on Hindi. Ryan Leigh 23:35, 18 February 2007 (UTC)
Ryan, Ethnologue lists the following languages separately, which are normally considered dialects of Hindi - Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli, and some others. These languages/dialects account for about 50 million additional speakers to the 180 million Hindi speakers they give. If you include the Bihari languages (65 million or so for Bhojpuri, Maithili, and Magahi), and the Rajasthani (another 35 million), it gets even worse. One way or the other, these languages aren't being counted either in our count for Hindi or on their own. They should be counted one way or the other. john k 16:43, 21 February 2007 (UTC)
John, the 181m number is listed as simply Hindi [5]. I'm only concerned with what Ethnologue meant when they used the word Hindi. Looking at Arabic as an example, it is important to note that Ethnologue sometimes groups all varieties, other times separates them; but when they list Arabic it means all varieties of Arabic. As for how Hindi should be classified here, that's up to you and the others. Ryan Leigh 17:25, 21 February 2007 (UTC)
I don't understand your point. The Ethnologue number for "Hindi" clearly excludes Awadhi, Kanauji, etc., because that's how ethnologue works - it specifically tells you if it's double-counting, as it does with Arabic. It doesn't do that with Hindi. At any rate, any kind of review of the population of India makes it fairly clear that 181 million is too low if it is meant to include dialects. The combined population of Uttar Pradesh, Madhya Pradesh, Haryana, Delhi, and Chhattisgarh is something like 260 million, and even if we assume about 10% Urdu speakers that still leaves us with a lot of Hindi speakers unaccounted for. In fact, it leaves us with approximately the 50 million Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli, and so forth speakers. If one counts the Bihari and Rajasthani languages as dialects of Hindi, as they often are, the numbers change even more. At any rate, I'm not really sure what your point is. Don't you think that the 25 million or so native speakers of Bhojpuri should either be counted in the Hindi totals, or listed on their own? john k 19:13, 21 February 2007 (UTC)
I've never disagreed with anything you've said. I just wasn't sure what is or isn't included under Hindi, that's all. I would've assumed it would be listed as Hindi, Standard or Hindi, Khariboli instead if they meant that. But you and apurv1980 seem passionate about the subject, so perhaps any further discussion should be with each other, not with me. Ryan Leigh 19:50, 21 February 2007 (UTC)
Guys, I think each one of us is saying the same thing, and that is that the status of Hindi is not reported correctly on this page. Two possible solutions are 1) Report all dialects of Hindi as separate languages with accurate figures 2) Report all dialects as one language 'Hindi' with a total figure. Now it is up to other editors which way they want to report it, but the page in its current form is factually incorrect and misleading. apurv1980 20:17, 21 February 2007 (UTC)
Yes, or perhaps one could report it as one language, but give figures for each division; in the way Persian is listed now. Ryan Leigh 22:09, 21 February 2007 (UTC)
I agree totally with your idea Ryan, let's report them as Persian is reported. But the thing is there is a lot of edit warring going on here; how do we reach a consensus? apurv1980 17:32, 22 February 2007 (UTC)
Hindi - 182m, these guys make me laugh. Figure cannot be far from truth unless you consider various dialects of Hindi as separate languages. If they are not considered then this figure is a joke. -apurv1980 21:17, 18 February 2007 (UTC)
John's question wasn't about the number of speakers. Ryan Leigh 23:35, 18 February 2007 (UTC)
First, if we're using the Ethnologue as the source for ranking, then here's what it says: '180,000,000 in India (1991 UBS). Population total all countries: 180,764,791. Ethnic population: 363,839,000 (1997 IMA).' It also says that 'Alternate names Khari Boli, Khadi Boli'. Ethnic population means the number of people who self identify as Hindi speakers, but evidently the Ethnologue feels that mutual intelligibility is not assured. And if you look at [6], you will find varieties such as Bhojpuri, Haryanvi and Bundeli listed separately, with their own figures. Indian census data 1991 for Hindi was 337m, I believe: I did put that in amid the edit warring, but it seemed to get lost. It should be there somewhere. Drmaik 05:49, 22 February 2007 (UTC)
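The arithmetic in john k's comments in this thread can be laid out explicitly. The following is only a sketch in Python; the variable names are invented here, and every number is one of the round figures (in millions) quoted in those comments, not an independent estimate.

```python
# Round figures quoted by john k in this thread, in millions.
ethnologue_hindi = 180        # Ethnologue's entry for "Hindi"
other_hindi_varieties = 50    # Haryanvi, Kanauji, Awadhi, Chhattisgarhi, Bagheli, Bundeli, ...
bihari = 65                   # Bhojpuri, Maithili, Magahi
rajasthani = 35

# Option (a): Western and Eastern Hindi counted together.
hindi_with_varieties = ethnologue_hindi + other_hindi_varieties

# Option (b): all five groups counted as Hindi.
hindi_broad = hindi_with_varieties + bihari + rajasthani

# Cross-check against state populations (Uttar Pradesh, Madhya Pradesh,
# Haryana, Delhi and Chhattisgarh combined), assuming about 10% Urdu speakers.
combined_population = 260
urdu_percent = 10
expected_hindi = combined_population * (100 - urdu_percent) // 100
unaccounted = expected_hindi - 181   # 181 is the Ethnologue total cited above

print(hindi_with_varieties, hindi_broad, expected_hindi, unaccounted)
```

The unaccounted remainder comes out close to the roughly 50 million speakers of the separately listed varieties, which is the point made in the comment dated 19:13, 21 February 2007.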

Update on Encarta information

I e-mailed SIL at info-sil@sil.org to ask if the information on Encarta was correct and they confirmed. Here is the original message:

"Dear Mike, Yes, we sent them the data. Conrad

Info-SIL/IntlAdmin/WCT Sent by: Jane Pappenhagen 02/19/2007 09:23 AM To Editor Ethnologue/IntlAdmin/WCT@SIL cc Subject Re: Encarta 2006 information

Jane Pappenhagen SIL Information

To <info-sil@sil.org> cc Subject Encarta 2006 information

Hi, I was just wondering whether or not the SIL information on Encarta 2006 is correct?

Here is the link: http://encarta.msn.com/media_701500404/Languages_Spoken_by_More_Than_10_Million_People.html

SIL is cited on the bottom of this page."

It's not much but it's a start. Just thought I would add this to this thread.Jerse 05:28, 21 February 2007 (UTC)

Ranking

the precise ranking is flawed. Maybe we should try tiers or something. The problem is that there is no central authoritative source. Ethnologue is the best bet, but their data are from rather different periods, and with population doubling every 30 years in some countries, data from 1991 just doesn't cut it (e.g. Tajikistan). We still have to go by ethnologue for the moment, there is no obvious alternative, but maybe we should review the whole idea of "ranking" "languages" by number of speakers. dab (𒁳) 21:10, 25 February 2007 (UTC)

I don't consider Ethnologue the best bet. Indeed, in many cases, it is the worst source for this kind of statistical data. CIA factbook is a much better source. Jahangard 02:29, 26 February 2007 (UTC)


CIA factbook would be monstrously difficult to use for ranking the languages as the data is divided amongst every country in the world. It is also inconsistent in the manner in which it reports linguistic information for each country, sometimes giving a very detailed linguistic profile of a country and other times only stating official languages with a list of some select minority languages with no way to determine raw numbers of speakers. As a result the biggest obstacle to using the world factbook for ranking is that the effort to derive the ranking from their data renders it essentially original research by wiki standards. Zebulin 03:34, 26 February 2007 (UTC)
For major languages, using data from CIA factbook is very straightforward, and adding a couple of numbers which are given for different countries is not original research (because there is no room for different interpretations). For other languages (especially those with less than 10 million speakers), there is no reliable source which can be used for all of them, and therefore, it's better to just forget the idea of ranking them. Jahangard 05:17, 26 February 2007 (UTC)
It becomes original research for some languages because it is so difficult to uniquely identify all of the country entries which will be consulted to derive a total for a given language. The only way to avoid that sort of judgement call would seem to be to consult *all* of the countries' data for *all* of the languages so as to be sure that no minority population is missed in the total. It might even be less work to script it somehow in that case. Zebulin 22:42, 26 February 2007 (UTC)

Ranking by comparing information from various sources, from various time periods, is inherently problematic. So, Dab's idea of a tier based listing sounds good. Also, ethnologue itself is sometimes contradictory, especially in separating languages and dialects (e.g. Chinese, Hindi). --Ragib 04:20, 26 February 2007 (UTC)

The problem of Ethnologue is much more than that. Ethnologue uses different statistical data from different sources, and from different dates. The main problem is that in many cases, Ethnologue mixes these data in the most stupid way, and sometimes generates pure statistical nonsense. Jahangard 05:16, 26 February 2007 (UTC)
The problem is that there is no source which uses precisely the same criteria for each language, as there is not one organisation that collects all the data. I think before changing the ranking, may be try to have another column for CIA data. But I have not seen any good CIA data for Arabic, for example. (If I've missed it, let me know). Stating the population of the Arab world is irrelevant. So, faute de mieux I'd recommend leaving the criteria as they are: getting rid of ranking would make the article much less clear, and useful, and without established criteria for where a language would be in the tiers, we'd be back at edit warring with proponents of a particular language. Drmaik 06:02, 26 February 2007 (UTC)
I tend to agree with Drmaik. The ranking does already say that it's based on Ethnologue, so the reader will know the rank is an estimation and not perfect. Tiers may cause more disagreements. Lyc. 00:41, 27 February 2007 (UTC)

Ethnologue is the only organisation I am aware of that collects and makes readily available data on all (6000+) languages. If we're just going to rank the top 100 (and anything else is pointless anyway), we might find some more up-to-date source. The problem is the "rank" parameter in {{language}}. I think that should go, because the "rank" cannot be established with any certainty for any but a handful of languages. I think we could rank the top 25 or "above 50 M" or so with some confidence. After that, we should drop the ranking and just do tiers of 10-50 M, 5-10 M, 1-5 M. For the top 25, I think we can also manage to compare various sources and look for a consensus estimate. To do this for the entire list would be a nightmare. Regarding Persian, what is the source for the "70-80 million" figure? dab (𒁳) 07:57, 26 February 2007 (UTC)
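To make the tier proposal above concrete, here is a rough Python sketch. The tier boundaries are the ones named in the comment; everything else is an assumption made for illustration: the names `tier_label` and `group_by_tier`, the treatment of a boundary value as belonging to the higher tier, the "below 1 M" catch-all, and the alphabetical order within each tier. The sample figures (in millions) are ones quoted elsewhere on this talk page and come from different sources.

```python
# (lower bound in millions, label), checked from the top tier downwards.
TIERS = [(50, "above 50 M"), (10, "10-50 M"), (5, "5-10 M"), (1, "1-5 M")]


def tier_label(speakers_m: float) -> str:
    """Return the tier for a speaker estimate given in millions."""
    for lower_bound, label in TIERS:
        if speakers_m >= lower_bound:
            return label
    return "below 1 M"


def group_by_tier(estimates_m: dict) -> dict:
    """Group language names by tier, alphabetised within each tier."""
    grouped = {}
    for name in sorted(estimates_m):
        grouped.setdefault(tier_label(estimates_m[name]), []).append(name)
    return grouped


# Figures quoted on this talk page, in millions (illustrative only).
sample = {"Hindi": 182, "Persian": 62, "Bhojpuri": 25, "Tetum": 0.8}
print(group_by_tier(sample))
```

A language sitting exactly on a boundary still needs an agreed rule; the sketch simply places it in the higher tier.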

It seems the other estimates column is mostly from 2004-2005, so I doubt you will receive a response. The 50-100 ranks seems to have remained relatively unchanged for years. Perhaps no one looks at them. Lyc. 00:41, 27 February 2007 (UTC)
About the whole idea of ranking languages, I think it should be limited to languages with more than 20-30 million speakers (for others, it's not feasible). About Persian language, are you asking me? I've changed the numbers to 62 million (for native speakers) and the source is the CIA factbook (its estimate of the percentages might be old, but it's still the most reliable source that we have). Jahangard 08:10, 26 February 2007 (UTC)
I just noticed that. 62 million sounds like a reasonable estimate for Persian. This means that the actual ranking of Persian would be closer to 20 than 27. But we cannot deviate from the stated "ordered by SIL" just for Persian. Somebody would have to make the effort to check all the top 30 or so against the CIA factbook, and then order by that. Ranking above 30 or so is increasingly pointless. Already above 12 it is becoming difficult, what with French vs. Wu, Javanese vs. Korean, Cantonese vs. Marathi vs. Tamil. We should give the same "rank" for estimates within 2% or so of one another. I hereby suggest we remove the "ranking" number for the entries below 10 M. We can state how many languages we list in each tier, but maintaining a strict ranking is going nowhere (we'll never get rid of the "disputed" tag). dab (𒁳) 08:37, 26 February 2007 (UTC)
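The idea of giving the same "rank" to estimates within 2% of one another could work roughly as in the Python sketch below. Several details are assumptions not stated in the comment: the function name `rank_with_ties`, measuring the 2% against the largest estimate in each tied group, and skipping rank numbers after a tie (1, 1, 3 rather than 1, 1, 2). The sample values are placeholders, not real speaker counts.

```python
def rank_with_ties(estimates: dict, tolerance: float = 0.02) -> dict:
    """Rank names by descending estimate, sharing a rank within `tolerance`.

    An estimate joins the current tied group if it is within `tolerance`
    (as a fraction) of that group's largest estimate.
    """
    ordered = sorted(estimates.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    leader_value = None
    current_rank = 0
    for position, (name, value) in enumerate(ordered, start=1):
        if leader_value is None or leader_value - value > tolerance * leader_value:
            leader_value = value      # start a new group
            current_rank = position
        ranks[name] = current_rank
    return ranks


# Placeholder figures in millions, purely for illustration.
sample = {"A": 100.0, "B": 99.0, "C": 90.0, "D": 89.0, "E": 50.0}
print(rank_with_ties(sample))
```

Comparing against the group leader rather than the previous entry keeps a long chain of near-equal estimates from all collapsing into one rank.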
Using CIA factbook even for just the top 30 will certainly be harder than most people seem to be giving it credit for. I remember trying to use CIA factbook to just get a rough ballpark estimate for English using their data for just the 4 countries that I assumed would have the most native English speakers. Straight away I found an oddball in the data for the United Kingdom which seemed to suggest that everybody in the country either spoke English or Welsh as their native language. I had expected to make a rough estimate in maybe 5 minutes from US, UK, Canada and Australia CIA world factbook data but in fact it took something like 10 minutes due in large part to time pondering the obviously odd UK data. Are we going to want to rank the top 30 languages by such half-arsed methodology? Furthermore English is probably not the most difficult case. Deriving a total from world factbook data for Spanish, Arabic, French, and Russian for instance will surely be much more frustrating. Any number thus derived will surely be continually tweaked up and down due to minor arithmetic errors by the original editor and by would-be revisionist editors attempting to check their work. I'm tentatively backing the tiered ranking idea. We can alphabetize the languages within each tier. The only sensitive points would be those languages that happen to fall on the borderline of a tier but skillful selection of tier ranges may minimize the ambiguity of which languages belong to each tier. Zebulin 22:24, 26 February 2007 (UTC)


If in fact we do go ahead and use totals derived from the CIA World Factbook, we ought to include a listing of all the country entries from which each total is derived, to facilitate proper checking of the math. A wrong assumption about which countries were considered in deriving the total number of speakers for a language will make the total seem either grossly in error or doctored/made up. Zebulin 22:34, 26 February 2007 (UTC)
So far, our options appear to be CIA or SIL. Maybe there are other sources we are missing? If we're going to rely on the CIA, we need to create a clean table of the CIA data first, i.e. download all 200 or so pages and parse them into an HTML table sorted by language. Once we have that, addition will be comparatively simple. In cases where SIL and CIA are close, the number is probably reliable, and we'll just have to look further into those cases where the two sources are in significant disagreement. dab (𒁳) 11:44, 28 February 2007 (UTC)
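To make the tiered-ranking suggestion in this thread concrete, here is a rough Python sketch, not an agreed procedure. The tier cut-offs (10, 30 and 100 million) are an assumption based on the tier ranges mentioned on this page, and the function names `tier_of`, `tiered_listing` and `same_rank` are made up for illustration. It groups languages into tiers, alphabetizes them within each tier, and treats two estimates within 2% of one another as sharing a rank.

```python
from bisect import bisect_right

# Assumed tier cut-offs (in native speakers); the thread proposes tiers
# but does not fix these exact boundaries.
TIER_BOUNDS = [10_000_000, 30_000_000, 100_000_000]
TIER_LABELS = [
    "under 10 million",
    "10-30 million",
    "30-100 million",
    "over 100 million",
]


def tier_of(speakers):
    """Return the label of the tier that a speaker count falls into."""
    return TIER_LABELS[bisect_right(TIER_BOUNDS, speakers)]


def tiered_listing(estimates):
    """Group languages by tier (largest tier first), alphabetical within a tier."""
    groups = {label: [] for label in TIER_LABELS}
    for language, speakers in estimates.items():
        groups[tier_of(speakers)].append(language)
    return [
        (label, sorted(groups[label]))
        for label in reversed(TIER_LABELS)
        if groups[label]
    ]


def same_rank(a, b, tolerance=0.02):
    """True when two estimates differ by at most `tolerance` of the larger one."""
    return abs(a - b) <= tolerance * max(a, b)


if __name__ == "__main__":
    # Placeholder names and numbers, purely illustrative, not real estimates.
    sample = {"Language B": 50_000_000, "Language A": 60_000_000, "Language C": 5_000_000}
    for label, languages in tiered_listing(sample):
        print(label, languages)
```

With a listing like this, only languages near a tier boundary remain contentious, which is the point made above about choosing the tier ranges carefully.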

Official speakers

In addition to my previous comments in Archive 2, can we continue adding the following languages to the bottom table:

  • Tetum: 800k speakers
  • Venda: 750k speakers
  • Irish Gaelic: 380k speakers
  • Maltese: 371,900 speakers
  • Luxembourgish: 300k speakers
  • Dhivehi: 300k speakers
  • Maori: 165k speakers
  • Dzongkha: 130k speakers
  • Hiri Motu: 120k speakers
  • Romansh: 60k speakers
  • New Zealand Sign Language: 7.7k speakers
  • Bislama: 6.2k speakers, 200k as a second language

Thanks to those who made additions earlier. I would do it myself, but I am unable to, as I only have lunchtime access to a PC and lack experience editing tables! RAYMI 80.68.39.212 15:18, 5 March 2007 (UTC)

Navajo, with 178,000 speakers, would also be a good addition. john k 16:53, 5 March 2007 (UTC)

Old Proposal

I propose deleting the rank from the language template. I think everybody who reads the talk page of this article will appreciate it. --Pejman47 19:22, 6 March 2007 (UTC)

Continue to list the languages in the current order but just eliminate the rank column? Zebulin 01:33, 7 March 2007 (UTC)
I mean deleting it from the language template that is in all the language articles. --Pejman47 21:18, 8 March 2007 (UTC)
I can't believe I hadn't thought of how much trouble that is likely causing until just now.Zebulin 21:38, 8 March 2007 (UTC)
If you agree, we can vote for it here. --Pejman47 20:30, 11 March 2007 (UTC)

Indian languages

Has anybody else seen the Central Institute of Indian Languages site? It gives very detailed figures for number of speakers of Indian languages, and divides them up in a comprehensible way - it gives "Scheduled languages" and "Non-Scheduled languages" as broad groupings, with total numbers, and then lists "Mothertongues" within each of the broader groupings. It lists every scheduled and non-scheduled language and every mother tongue with more than 10,000 speakers. Thus, for instance, Hindi the scheduled language is listed with 337,272,114 speakers. Within that, it gives 233,432,285 speakers for Hindi as a mother tongue, and then goes on to the various other related languages - 23,102,050 for Bhojpuri, 10,595,199 for Chhattisgarhi, and so forth. It strikes me that it would make sense to use this source for our numbers of speakers of Indian languages. john k 20:31, 8 March 2007 (UTC)

Excellent. God knows we can always use more hard references for this. Zebulin 21:14, 8 March 2007 (UTC)
The question would be whether we should use the scheduled/non-scheduled language totals or the mother tongue totals for the purposes of this list. This makes a significant difference for Hindi (and also determines whether Bhojpuri, Chhattisgarhi, etc., are listed separately or not), and a considerable difference for some of the others, especially ones like Bhili, where there are lots of different dialects and no clear standard form. The other issue would be how to combine the numbers given with data for other countries: Bengali, Punjabi, Sindhi, Nepali, English, Tibetan, and Arabic are spoken primarily outside India, and there are significant numbers of Hindi, Tamil, Urdu, and perhaps Gujarati speakers outside of India as well. The numbers given also appear to exclude Jammu and Kashmir, or do something else that results in an abnormally low number of Kashmiri speakers (which it acknowledges, noting that figures for Kashmiri are partial, but failing to indicate what exactly they cover. Figures for Dogri are also low). john k 21:43, 8 March 2007 (UTC)
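To show how much the choice between the two kinds of total matters for Hindi, here is a small Python sketch using only the figures quoted in this thread from the Central Institute of Indian Languages site. The data layout and the name `count_for_list` are assumptions made for illustration; nothing here reflects how the site itself organizes its data.

```python
# Figures as quoted in this thread (Central Institute of Indian Languages).
HINDI_SCHEDULED_TOTAL = 337_272_114
HINDI_MOTHER_TONGUES = {
    "Hindi": 233_432_285,
    "Bhojpuri": 23_102_050,
    "Chhattisgarhi": 10_595_199,
}


def count_for_list(name, use_scheduled_total):
    """Return the figure the list would show for `name` under either convention.

    With scheduled-language totals, only "Hindi" gets an entry (the other
    mother tongues are folded into it, so they return None).
    With mother-tongue totals, each mother tongue is listed separately.
    """
    if use_scheduled_total:
        return HINDI_SCHEDULED_TOTAL if name == "Hindi" else None
    return HINDI_MOTHER_TONGUES.get(name)


# Speakers counted under scheduled Hindi but not under the three mother
# tongues quoted above ("and so forth" in the original comment).
other_mother_tongues = HINDI_SCHEDULED_TOTAL - sum(HINDI_MOTHER_TONGUES.values())

# Share of the scheduled-language total that reports Hindi as mother tongue.
hindi_share = HINDI_MOTHER_TONGUES["Hindi"] / HINDI_SCHEDULED_TOTAL

if __name__ == "__main__":
    print(count_for_list("Hindi", use_scheduled_total=True))
    print(count_for_list("Hindi", use_scheduled_total=False))
    print(count_for_list("Bhojpuri", use_scheduled_total=True))
    print(other_mother_tongues)
    print(f"{hindi_share:.1%}")
```

On these figures, the Hindi entry differs by over 100 million speakers depending on which convention the list adopts.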

Another issue is how to deal with it on the list, in terms of sourcing. john k 21:48, 8 March 2007 (UTC)

Persian

Persian should be moved up to the list of languages with more than 100 million speakers. There are approximately 110 million Persian speakers worldwide. The chart itself states that. Dariush4444 20:26, 11 March 2007 (UTC)

How do you get to 110 million? There's 70 million people in Iran, but a sizeable minority do not speak Persian as their first language (at least 30% - i.e. no more than 49 million native Persian speakers in Iran). Afghanistan, with about 30 million, has about 50% native Persian speakers - that gives us 64 million or so. This is all being rather generous, as most figures I have seen give no more than 60% for the Persian population of Iran, and there's certainly not a 55 million person Persian diaspora. john k 22:23, 11 March 2007 (UTC)
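For anyone checking the arithmetic in the comment above, here is a minimal Python sketch that keeps the per-country breakdown next to the total, as suggested earlier on this page. The populations and percentages are the ones quoted in that comment (upper bounds according to its author), not independent data, and the name `native_speaker_total` is made up for illustration.

```python
# (country, population in millions, percent native Persian speakers),
# all taken from the comment above rather than from any independent source.
BREAKDOWN = [
    ("Iran", 70, 70),         # "at least 30%" non-native, so at most 70%
    ("Afghanistan", 30, 50),  # "about 50% native Persian speakers"
]


def native_speaker_total(breakdown):
    """Return per-country rows and their sum, both in millions of speakers."""
    rows = [
        (country, population * percent / 100)
        for country, population, percent in breakdown
    ]
    return rows, sum(speakers for _, speakers in rows)


if __name__ == "__main__":
    rows, total = native_speaker_total(BREAKDOWN)
    for country, speakers in rows:
        print(f"{country}: {speakers:.0f} million")
    print(f"Total: {total:.0f} million")
```

Listing the country rows alongside the total makes it obvious which countries were and were not included when someone else tries to reproduce the number.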
Yes, I agree with you; Dariush exaggerates and I reverted him. But you forgot to add Tajikistan and Uzbekistan (via the CIA Factbook), and in Iran the situation is like that in Wales or Scotland or some parts of Spain: most of the population are at least bilingual from childhood. --Pejman47 22:47, 11 March 2007 (UTC)
The CIA numbers do not mention bilingualism either. Many people in Iran, Afghanistan, Tajikistan, Uzbekistan, etc. are bilingual and speak Persian as well as some other language (mostly Azeri, Pashto, and Uzbek) as a "first language". Thus, the number of native speakers is indeed much larger than the 60m mentioned in the text. But this also means that the number of Pashto, Azeri, and Uzbek speakers is larger. The CIA (as well as Ethnologue) counts as a "non-Persian-speaker" everyone who speaks another language in addition to Persian. Someone with mixed Azeri and Persian origins is automatically labeled "Azeri". That way, the number of Azeris in Iran reaches 20-30% of the total population, while Persian remains at 50%, although the "real" number of native Persian-speakers may be up to 90%. Tājik 00:15, 12 March 2007 (UTC)
Tajik, my understanding is that this page is generally based around the assumption that a person has only one first language, and first language is generally defined on the basis of the common census category of "language used at home." What the exact answers to this question in Iran are, I cannot say, but we certainly shouldn't be double-counting. john k 00:21, 12 March 2007 (UTC)
I know the problem. But the point is that Ethnologue is not a reliable source at all. Not only in the case of Iran, but generally. The numbers for Iran are simply invented by Ethnologue: they have no sources for them, and they have not carried out a census. Maybe you should write them a letter and ask them for information. Believe me: either they will ignore you or they will tell you that they have sources, of course without naming them. Ethnologue's numbers for Uzbekistan contradict all other sources, even that of the Uzbek government: [7] Tājik 21:00, 13 March 2007 (UTC)
The Uzbek government should hardly be treated as a trustworthy source. I agree, though, that Ethnologue is not particularly reliable. But what source would you suggest we use instead? john k 21:46, 13 March 2007 (UTC)
I suggest using either academic sources (for example the Encyclopaedia of Islam) or the CIA Factbook. The CIA Factbook is not really reliable either, but at least it is something, and it is official. Ethnologue is the mouthpiece of a religious organization and has certain "agendas". Tājik 23:53, 13 March 2007 (UTC)
Hi, I just wanted to let you know that Ethnologue is not reliable with regard to Iran. I have contacted them directly and they said they cannot locate their sources and will make an update in the next version. I have the e-mails on this matter if anyone is interested. The e-mails are from Ray Gordon, the major editor of Ethnologue. So the Ethnologue info should be removed altogether with regard to Iran. --alidoostzadeh 02:47, 14 March 2007 (UTC)
The World Factbook gives 58% Persian speakers in Iran, 50% in Afghanistan, 80% Tajik in Tajikistan (if we are counting Tajik as the same as Persian), and 4.4% in Uzbekistan. There are apparently an additional 33,200 in China. That would come to, er, 62,451,835. Presumably beyond that there's a considerable diaspora: according to various Wikipedia articles, there are 310,000 in the United States and 94,095 in Canada. I'd assume beyond that large communities in western Europe and the Gulf states, at least. But probably no more than, what, a million or so? So no more than 64,000,000, using the CIA numbers. john k
Yes, that sounds much better and much more realistic to me - except for the numbers in Uzbekistan. The 4.4% are directly copied from official Uzbek numbers and do not reflect the opinions of Western scholars and experts on Central Asia. The real number for Uzbekistan is - by estimate - somewhere between 20-50%. The 50% is too high since many Uzbek nationals are naturally bi- or multi-lingual and speak several languages at a native level (including Russian). The best guess for Uzbekistan's Persian-speakers is probably 30% (1/3 of the total population). Keeping in mind that almost all Uzbek cities - except Tashkent - are predominantly Persian-speaking (most of all Bukhara and Samarqand), the 4.4% are totally wrong. Conclusion: estimating the total number of native Persian-speakers at 70m sounds pretty good to me. Another 50-70m (estimate) speak it as a second language, most of all in Iran and Afghanistan where the language is spoken and understood by 90-99% of the population in each country. Tājik 10:46, 14 March 2007 (UTC)
Well, the list is ranked according to Ethnologue. I agree that their figure for Persian is almost certainly too low, but I think we need to keep it for the sake of consistency. Putting other referenced data in is fine, including adding up CIA figures. But coming up with new figures for Uzbekistan seems to me to be original research, which isn't what Wikipedia is about. Drmaik 13:31, 14 March 2007 (UTC)
I do not see why the ranking should be by ethnologue when ethnologue confirms their numbers with regards to Iran are wrong. --alidoostzadeh 01:17, 15 March 2007 (UTC)
I would concur with this. It seems deeply odd to present readers with information based on a source that itself acknowledges that information is incorrect. john k 04:37, 15 March 2007 (UTC)
Err, they said they couldn't find their sources, which is very different from saying/admitting it is wrong. It's just that once ranking is done by more than one criterion, everyone will want in, and come up with their own reasons for their own language to be ranked higher, and the page will become essentially worthless. This isn't a competition, people. By all means put criticism of the Ethnologue figure in, point to other sources etc., but deciding to change the basis of the ranking based on one case will disrupt the whole page. Drmaik 05:52, 15 March 2007 (UTC)
Fair enough on what Ethnologue actually said. Beyond that, I think we should change the basis of the ranking because Ethnologue is pretty bad on a rather wide front. Personally, I'd prefer to use official census type data for as many countries as we can find it for, and for ones where we can't to use the best sources available, and only use Ethnologue when we have no better option, but I fear this might count as "original research." But, at any rate, there's plenty of good reason not to use Ethnologue. Largely because it's shit. If we want to have a list of ethnologue's top languages, that's easy enough to do. This list is at least theoretically meant to be a list of the number of speakers of languages, not a list as determined by Ethnologue. If the general sense of those who should know is that Ethnologue is not terribly good, we shouldn't rely on it when we can avoid it. For instance, as I've noted before, for Indian languages there's a much better source available, in the form of the Central Institute of Indian Languages. For the US, the US Census of 2000 has detailed figures available on language use (although it sometimes groups together several related languages for the smaller groupings of immigrant languages). South Africa's census is online, as well, and includes linguistic data, at least for South Africa's eleven official languages (the rest seem to be grouped together as "other"). If a mishmash list can actually be sourced, I don't really think it counts as OR. Or, at least, if it does, that only shows how out of whack the OR policy has gotten. john k 06:03, 15 March 2007 (UTC)
BTW, I think this discussion probably needs to be under a different title now, but I didn't want to move anyone else's contributions without their permission. I wouldn't oppose a 'properly sourced mishmash list' (I don't think that would be WP:OR), but we'd need quite a discussion of the principles first, and state them clearly somewhere (I think it would need to be a little complicated), and have quite a few people on board. And it seems that most editors don't stick around here for very long, probably because of the constant edit warring, which, it seems to me, has been a lot simpler to deal with since this page has been ranked according to Ethnologue. So I think I'm mildly supportive of what you're thinking, more so in theory than in practice! Not sure how much I could contribute though... Drmaik 06:28, 15 March 2007 (UTC)
I agree that in practice this might be difficult to implement. I'd like to hear other opinions about it. john k 19:06, 16 March 2007 (UTC)
I wonder if Tajik might point us to some links as to the estimates of "western scholars" of the Tajik population of Uzbekistan. john k 14:45, 14 March 2007 (UTC)
Sources were already given, for example D. Carlson, "Uzbekistan: Ethnic Composition and Discriminations", Harvard University, August 2003, who estimates the total number of Tajiks in Uzbekistan to be somewhere around 11m (40% of the population), and the number of those who speak Persian at home at 30% (the total number also includes Tajiks who speak Uzbek or both languages at home). Some other sources [8] have also picked up this number, even going as far as 14m. Tājik 18:17, 14 March 2007 (UTC)
Citing other sources is fine. But for the sake of consistency, this list needs to be ordered by one central source, and Ethnologue is our best bet for that. Otherwise, ranking will become a function of the presence of linguistic activists on Wikipedia who go and collect the highest possible estimates: to do this for Persian but not for other languages that may also be under-estimated will lead to biased ranking. If Ethnologue is, as you say, aware their numbers for Persian are too low, it's no big deal; just wait for the next edition and they will give a higher estimate. dab (𒁳) 20:38, 16 April 2007 (UTC)

Na-Dene languages

I expected to see some languages from this family on the list, since they are shown on the map that is on this page. I was surprised not to find the Navajo/Diné language on this list, since it is shown on the map (SW United States). Maybe a different map would be more appropriate to illustrate this article? 71.213.139.166 08:42, 18 March 2007 (UTC)

Navajo could easily be added to the list. john k 00:22, 22 March 2007 (UTC)

Indonesian language and placing

The Indonesian language article ranks it 8th among the most spoken languages, but it doesn't appear on this list, and its place is taken by Russian. I'm not fully aware of the complexities of the Indonesian language; if someone could explain, that would help my understanding. Cheers. Aeryck89 17:19, 18 March 2007 (UTC)

You are quite correct. There are more than 200 million Indonesians, who learn the official language of the republic at school. There are hundreds of local languages, which in places are the language of the home and in some the language of business also. Encarta quotes Indonesian speakers at 17 million. I just cannot understand where they get that number!
I will do my best to research the issues that make quoting language statistics so difficult. Alastair Haines 02:02, 13 April 2007 (UTC)
I noticed this weird link from the Indonesian language page, too. I think 8th is roughly the correct ranking for Indonesian if you include all fluent (instead of native) speakers. From what I understand, only a few people in Indonesia learn Indonesian *first*, while almost everyone learns it fluently later on in school. So there are very few "native speakers" though there are over 200 million perfectly fluent speakers. That's why you have this absurdly low 17 million number - it's only capturing "true" native speakers, whatever that means.
So 8th appears to be roughly correct with rankings that take this into account. One at the bottom of this page "The Thirty Most Spoken Languages In the World" has Malay, Indonesian at 9th.
Obviously Indonesian is really missing out in most of these language rankings. It's spoken by almost everyone in the fourth most populous country in the world, plus almost everyone in neighboring Malaysia speaks a dialect of the same language (however you want to classify these things, Indonesian is a dialect of Malay or vice versa). Point being that this all seems to be a huge flaw in the ranking methodology, though I can see how it arose. Thewhiterabbit11 20:39, 19 April 2007 (UTC)

See my response in the upper part. Kembangraps 15:34, 3 May 2007 (UTC)


This entire article is fundamentally flawed

IMHO the information provided in this table is misleading and ill-defined. Throughout the above discussion, no real consensus has been reached on what constitutes a native language, and even if we fix on a particular definition (which would have to be merely arbitrary), the information will still be useless. Here are a few reasons why:

  • Millions of people grow up in one country, move to another, and eventually end up speaking their adopted tongue with far more ease than their "native" language.
  • A German child may be regularly visited by an Australian au pair but have only a rudimentary knowledge of English. But because he speaks the language "from childhood" and "in the home" he would be classed by some of the above contributors as a "native speaker". Conversely, someone from China may have learned English at university as a third language, and speak it far better.
  • People like Vladimir Nabokov and Joseph Conrad would not register in this list, which renders it meaningless.
No, it doesn't render it meaningless. Nabokov (who wrote novels in Russian, too) was a native speaker of Russian, and Conrad was a native speaker of Polish. That they both wrote well in English tells us nothing of what their native language is. Nor are extraordinary writers statistically significant in what is an effort to have some sense of what the most spoken languages are. john k 04:40, 10 April 2007 (UTC)

I could give many other illustrations of why, in compiling such a table ranking, half-baked attempts to define "native" are neither interesting, nor of any practical use.

I'm also not sure what "half-baked attempts to define 'native'" means. We have generally used the listings of Ethnologue which, although it is not the greatest source in the world in terms of being rather out of date, is a widely respected source which in fact records the number of native speakers of more or less all languages in the world. Many governments also make similar counts of their residents (the U.S. census and South African census figures on native language, for instance, are available online. There's an Indian institute, possibly government sponsored, which keeps similar information, also available online). john k 04:45, 10 April 2007 (UTC)

What would be far more useful would be simply a list of speakers of languages. How many people know English/Arabic/Mandarin/Russian well enough to be able to freely communicate? Admittedly you can't possibly obtain a precise figure for the number of speakers, you can only provide ball-park data. For example, we could provide a ranking of the number of people who live in territories where a given language has official status.

As it stands, this list will do nothing but confuse and irritate people. The global use of English vastly outstrips that of Spanish, so it's a patent absurdity to rank Spanish above English. I have no idea where the figure for Russia came from, but it's complete nonsense. I know from experience that most people in former Soviet states speak Russian fluently, very often without a foreign accent. Above the age of about 35, the majority of Kazakhs, Muscovites, Belarusians, Kiev Ukrainians and Riga Latvians are able to converse with one another in not only the same language, but pretty much in the same idiom and dialect - which is more than can be said for a Liverpudlian and a Texan.

In short, the upper part of this list bears no relation to any kind of reality, and says nothing useful. Palefire 19:55, 8 April 2007 (UTC)

It would be much, much less useful to have simply a list of speakers of languages. Among other things, as you say, this is very difficult to determine. At any rate, the list is not a list of the number of people fluent in a language (which is very hard to determine) but a list of languages by native speakers. You may not find such a list useful, but that doesn't really make it so. Lots of governments keep data on people based on "language used at home" or "native language." This is a useful criterion, because it more or less assigns everyone above a certain age to a single language. This also gives some sense of the size of ethnic groups, and so forth. Obviously, it does not do everything that one would wish a list of languages to do, but so what? No single list possibly could. There is nothing stopping you from making a List of languages by number of total speakers, if you can find sources for such a list. But this list is a List of languages by number of native speakers. It is rather unreasonable for people to get "confused and irritated" because this list does what it says it does. For all the difficulty you want to put around the concept of "native speaker," it really isn't that difficult an issue, and it's one that is used by many sources. john k 04:40, 10 April 2007 (UTC)
A further point, which is that you harp a lot on the supposed difficulties of defining "native language," but this is actually a fairly well-established concept. The idea of listing people based on whether they know a language well enough to "freely communicate" is a lot more difficult to figure out, and there's going to be far less data on this conception. john k 04:43, 10 April 2007 (UTC)
I feel sympathy for those who are working on this article. There are many issues and they are not easy issues. Reliable sources differ in substantial and significant ways. I do not think the idea of the article is flawed, however. It is very natural to ask, "how many languages are there?" and "how many people speak each one?" More people speak English than Sanskrit, for obvious reasons. I am curious to know, though, if Spanish is indeed more widely spoken in some way than English. For me these questions are merely interesting. For a young person considering which language or languages they might want to learn, there is a practical side to the questions. One thing that would be ridiculous is to suppress numbers of speakers or ranking of languages out of fear of offending smaller language groups. For one thing, near extinct languages cannot be protected unless we know facts. Alastair Haines 02:29, 13 April 2007 (UTC)

This article is redundant.

Over the past few years I've been watching this article, commenting in the discussions, et cetera. I have thus decided that this article will forever be redundant and utterly moot. Why? Because no consensus can be reached on how many NATIVE speakers there are of a language. This will never change. The number of English speakers is still hundreds of millions below the figures I painstakingly laid out almost a year ago. God only knows how far off the other languages are.

Thus, I put it to you that we create an article entitled List of languages by number of speakers, so that there can't be incessant niggling and arguing over figures with no forward progress, as there is with this article. I think this is the only real solution. It won't negate the fact that there will still be quite a bit of niggling, but the number of people who -speak- a language is much better measured than the number of 'native speakers'.

So, please, let's get a consensus going on this matter and we can try and lay the framework of the new article. Jachin 11:52, 18 April 2007 (UTC)

I don't think you've checked this page recently. In fact, the problem might not be that there's too many arguments, but that there's too few (or not any). There have been a few difficult-to-define languages, such as Hindi and Arabic, but other than that the article is extremely quiet. Perhaps too quiet, for such a large page. Sèryt 05:04, 2 May 2007 (UTC)

  • For: create the new article to take over this article's purpose, while retaining this article; citations that use this article as a source for 'speakers' would be redirected to the new article, leaving this one clearly limited to 'native speakers'. Jachin 11:52, 18 April 2007 (UTC)
Let's not vote yet. Your proposal is not worded very clearly, and some of the statements above are misleading at best. It is generally much harder to find clearly defined and consistent measures of all "speakers" than it is to find figures for native speakers, so it would be harder to arrive at consensus under your proposal. Your English figures were rebutted last year. Also, a good part of the "niggling and arguing" here is about which languages to use, which would still be just as much of an issue under your proposal. -- Avenue 14:24, 18 April 2007 (UTC)
As Avenue says, it is much harder to find figures for all speakers than it is for native speakers. And the issue of definitions will be just as strong, too, because there will be issues of how fluent one has to be to count as a speaker. And, again, this doesn't solve the issue of when to group and divide macro-languages, and similar business, which is much of the arguing. john k 14:26, 18 April 2007 (UTC)
This proposal amounts to doing original research. If it's hard to agree on what defines a native speaker, and to find consistent sources for that, it is doubly, nay thrice, so hard to do it for the definition of a speaker. Drmaik 15:08, 18 April 2007 (UTC)
Hi, I'm new to this article and working on it seems to be an incredibly difficult (but interesting) task. While the arguments about sticking to the number of "native" speakers make a lot of sense on the whole, they also lead to some patent absurdities. For example, Indonesian/Malay is spoken by almost everyone in the fourth most populous country in the world (Indonesia's population is 235 million and Malaysia's is 25 million), yet Indonesian doesn't even appear on this page. For the lay user of Wikipedia (I fall more in this category) this is more confusing than anything when coming onto this page. "Native speakers" may be an unbiased way of looking at it, but it also sacrifices a lot of richness. I wonder if there is some sort of third way? Like leaving a lot of stuff about native speakers, but at least attempting a rough ranking based on fluency? There seem to be some sources with first and second languages that could work. Barring that, perhaps some discussion of the difficulties/controversies surrounding ranking languages? Surely this issue is a hot topic in many academic areas other than just the Wikipedia talk page. Thewhiterabbit11 21:10, 19 April 2007 (UTC)
I don't think Indonesian has been consciously excluded; it's missing only because no one's added it yet. I've noticed that languages are still being added, so feel free to. It would have a low native speaker count (Ethnologue column) and a high fluent speaker count (other estimates column). Granted, it's a little awkward, but then it's probably the only language in that situation. Sèryt 04:28, 2 May 2007 (UTC)

Arabic

So Arabic is number 2 after Chinese. - The preceding unsigned comment was added by 89.139.225.36 (talk) 00:03, 5 May 2007 (UTC).

This puzzles me! The Tamilian population is 62M and the number of people speaking Tamil is 63M, while the population of Karnataka is roughly 53M and the number of people speaking Kannada is 33M. Come on, these are plainly wrong statistics! Anand.raichur 16:36, 14 May 2007 (UTC)

Really stupid... Literary Arabic is written in every Arab country, but each Arab country has its own dialect, which means that an Algerian does not understand a Palestinian when they speak together. So, because you are making a ranking by number of speakers, and because Arabic is only a literary language, you should remove the "Arabic language" from a ranking by number of speakers ASAP. --84.47.61.180 18:43, 26 May 2007 (UTC)
This is not true: an Algerian would understand a Palestinian and any Arab from any Arab country. All Arabs study only Standard Arabic in school and university, and they can use it to communicate instead of dialects if there is a problem. The Quran and Hadith are written in Classical Arabic, of which Standard Arabic is the contemporary version, and so religion keeps Standard Arabic preserved; it is the only language that has survived with very few changes in more than 1000 years, according to some academics (refer to some Arabic books). The only difficult dialect is from the Maghreb (Morocco, Algeria and Tunisia) when they speak fast. I am from the Gulf, and when I was in Morocco I did not have much of a problem understanding them unless they were old people from villages or were using French loan words; they could understand my Gulf dialect easily. I used to switch to Standard Arabic if there was any problem in communication.
Really, I'm a Tunisian but I have no problem understanding a Palestinian or an Iraqi. I am getting tired of explaining this. Moroccans and Algerians use some French loan words because of the colonial past, but this is disappearing with new generations. 138.48.213.186 (talk) 22:52, 8 December 2007 (UTC)

Afrikaans

Please note that since Namibia's independence in 1990, Afrikaans has not been an official language of Namibia. It is, however, probably the most widely spoken language there.

I agree about the estimate for English speakers being extremely low. I also agree with the part about English and Italian being referred to as dialects. I am a fluent English and Spanish speaker and I do not understand any Italian at all. US-UK-NOR-MEX 01:56, 26 May 2007 (UTC)

French language

Well, there is a list of languages by the TOTAL number of speakers, instead of native speakers. See "See Also" on this page! - Preceding unsigned comment added by Oscarch (talk • contribs) 13:33, 16 February 2008 (UTC)

Several detractors enjoy changing the data, in particular the rank of the French language, which is officially between 11th and 13th; francophobe detractors keep lowering it to 18th. They have to stop this hacking. French is spoken in France (63.5 million native French speakers), in Canada (8 million native French speakers), in Belgium (3-5 million native French speakers), in Switzerland (2 million French speakers), and more generally by various immigrants of French origin in the USA and Europe who speak French.

Err, no, you're changing the data. The reference I gave in reverting your edits previously [9] is what Ethnologue actually says, while you seem to be making up the figure in the Ethnologue column. Wikipedia is based on sources, not what you or I think. Also, as well as falsifying data, your edit includes some vandalism (changing 30-100 million to 10-30), so I'll have to leave a warning on your talk page. Drmaik 12:17, 28 May 2007 (UTC)

Désolé Drmaik, I don't understand your changes to the ranks. In all the books French is classified as the 11th language by number of native speakers; it's possible that in recent years languages such as Javanese and Wu, owing to demography, have pushed French down to 13th place. But I'm sure that 65 million native French speakers is an erroneous figure! It isn't chauvinism, it's realism: there are 8 million native French speakers in Québec, 4 million in Belgium, 62 million in France, etc. So don't you see a big error somewhere?

--Irrintzi 16:48, 28 May 2007 (UTC)

No problem. I have two main comments to make. 1) We rely on sources, and it was decided (not unanimously) to base the ranking on the ethnologue, for all its imperfections, as there seems to be nothing that is clearly better. The figures in wikipedia need to state what respected sources say, rather than original research. And you can check... 2) On that figure: yes, it's probably a little low, but let's check assumptions. We've had 2 figures quoted without source for the number of speakers of French in France: 62 and 63.5 million. These seem to be estimates of total population. But look at Languages of France, where research by INSEE (hardly a francophobe organisation) puts native French speakers at 86% of the population, which takes the number of French speakers in France down to around 50 million. Remember we're talking mother tongue speakers, not people who are fluent (and, yes, that is tricky, but this is the best data available).
On another note, where I do feel there is serious undercounting is in Africa, but this is unsourced: many people grow up with French as their first language in Abidjan, for example, but would report to the census their ethnic tongue, the tongue of their tribe. But I know of no respected study which makes reliable estimates for this, so we can't put such figures in wikipedia for the moment, as they would be original research. Thank you for your understanding. Drmaik 05:42, 29 May 2007 (UTC)
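The adjustment argued for above, scaling a total-population figure by the share of mother-tongue speakers, can be sketched in a few lines of Python. This is only an illustration: the 62 and 63.5 million totals and the 86% share are simply the figures quoted in this thread (not independently checked), and the function name is hypothetical.

```python
# Sketch only: the inputs are the figures quoted in this discussion, not verified data.
NATIVE_SHARE = 0.86  # share of the population of France quoted from INSEE as native French speakers


def native_speakers(total_population_millions: float, share: float = NATIVE_SHARE) -> float:
    """Scale a total-population figure (in millions) by the native-speaker share."""
    return round(total_population_millions * share, 1)


for total in (62.0, 63.5):  # the two unsourced population figures quoted above
    print(f"{total} million inhabitants -> about {native_speakers(total)} million native speakers")
```

Either input gives a result in the low-to-mid 50s of millions, well below the raw population totals, which is the point being made about mother-tongue counts.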

OK, that's clean and clear. I understand now. Even so, being a little pragmatic, I can't manage to find the Ethnologue's link for the number of native (mother-tongue) speakers... I'm lost on the main page, could you help me? Thanks for everything --Irrintzi 12:11, 30 May 2007 (UTC)

check [10]. And I'll copy something from the page...
French
A language of France
ISO 639-3: fra
Population 51,000,000 in France. Population total all countries: 64,858,311.
So in fact 64.9 would be a better rounding.
Hope that helps. Drmaik 12:28, 30 May 2007 (UTC)
I think this little discussion here illustrates the fact that ranking languages by native speakers is completely absurd. Henry Kissinger is not a native English speaker, and Shimon Peres is not a native Hebrew speaker, yet they are obviously English and Hebrew speakers now, so it really doesn't matter what language they spoke when they were 5 y/o if they have long discarded these languages. In the case of France, claiming that only 86% of the people are native French speakers is technically correct but practically meaningless. In real life, close to 99% of the inhabitants of France speak French only or almost only in their daily lives. Also note that if we were to use this strict native-tongue criterion, the number of standard German native speakers is probably not even half of the German population, yet the list here considers that all the inhabitants of Germany are native standard German speakers. Same with Italian. Strictly speaking, in Italy a large part of the population are native speakers of Sicilian, Neapolitan, Milanese, etc. All I can say is that this list here is deeply flawed. Godefroy 15:06, 18 June 2007 (UTC)
I wonder what your source would be for 99%? I know many British would say the same about the UK, but they'd be just as wrong. That's the whole problem with what you're saying here and below: sources. The majority always marginalises the minority. Have you been into Arab/Berber homes in France? in the quartiers nords of Marseille, Belleville, Barbes? Listened to conversations in cafes in Alsace? If so, I don't think you'd suggest 99%. But anyway, sources are the thing. Sorry you don't like the results. Drmaik 05:46, 19 June 2007 (UTC)
In the quartiers nord of Marseille, as in Alsace, everybody speaks French in their daily lives; even the people who speak Arabic or Alsatian at home speak French only or almost only outside the home. In fact, in 27 years spent in France, the only time I met someone who wasn't able to speak French was the Indian owner of an internet café in an immigrant neighborhood of Paris, who spoke next to no French. And that was really odd (everybody complained: how can you run a business in France and not speak French? Actually, the people who complained the most about this guy were the black immigrants from French-speaking Africa). That's the only time. So 99% is certainly not far-fetched. Besides, remember that 1% of the French population is still 640,000 people, which is a lot of people not being able to speak French. Godefroy 12:53, 19 June 2007 (UTC)
Hi. I wasn't saying these other people can't speak French, or even that they don't speak it very well, but was talking about their first language. And actually in the quartiers nord lots of people do speak more Arabic/Berber in their daily lives than French, even though that doesn't mean they don't speak good French. I myself have lived in France, and I had no trouble communicating with people there, but I never considered myself a francophone (i.e. a mother-tongue speaker). But there are a large number of other English people there, in Normandy and Brittany, who speak almost no French (my apologies!). In any case, I know we're not debating that, nor who exactly speaks what in the quartiers nord, but comparable definitions of speakers are much trickier than the already tricky issues concerning the mother tongue.

Hey there... Godefroy sent me a wiki-mail asking me to join this particular conversation. It makes me upset when I see a foreigner wikipedian who try to convince other ones, that in France, there are like in the US, people who are not able to speak the national language.
For my part, I have travelled a lot throughout my native Alsace, and on this specific point it brooks no argument: people in Alsace speak only FRENCH in their daily lives. Nevertheless, I have to add, and so acknowledge, that after a long working day, or during a family lunch at the weekend, some Alsatian people will speak Alsatian together in a restaurant. But this does not mean that they are only able to speak Alsatian: Alsatian is just a local dialect, a family one. To work, to fill in administrative paperwork, to go shopping or for anything else, French is THE language. Moreover, there are fewer and fewer people who are able to speak Alsatian correctly, certainly because it is outdated and not spoken daily. So to conclude, a person living in France has to speak French... there is no way he could live speaking only a dialect. Paris75000 14:12, 19 June 2007 (UTC)

Paris75000, you said of me: "I see a foreigner wikipedian who try to convince other ones, that in France, there are like in the US, people who are not able to speak the national language." Well, I did not say that. Please read comments before making such accusations. Thank you. Drmaik 05:22, 20 June 2007 (UTC)
Like I said before the list as it stands now is completely flawed because for France the Ethnologue counts only 51 million speakers of French, which is just 83% of the population of metropolitan France, whereas for Germany the Ethnologue counts 75.3 million standard German speakers which is 91.5% of the German population, and for Italy they count 55 million standard Italian speakers which is 94% of the Italian population. How can there be a larger proportion of standard German and Italian speakers in these two countries than the proportion of French speakers in France, when actually large parts of the population in Italy and Germany are dialect speakers and not standard German or Italian speakers? It just shows that the Ethnologue data are not credible and should not be used for this article. Godefroy 17:08, 19 June 2007 (UTC)
Godefroy makes a good point, these ethnologue figures are inaccurate. It would be reasonable to simply count the entire population of metropolitan France and the DOM-TOMs as 100% French speaking. These people all speak French, as will their children over the age of 24 months. I live in Alsace and I don't know a single Alsatian who doesn't speak French. Does anyone have sources to the contrary ? Metropolitan France + the DOM-TOMs (Guadeloupe, Guyane, Martinique, Réunion) + the territoires d'outre-mer = 64,5 million French speakers in France. As for the African nations, we'll have to be more careful because significant portions of their populations don't speak French. However, including Canadian francophones, Belgian and Swiss francophones, we are already at approximately 77 million. This is not individual research, it's just common sense. We must find more reliable statistics which better represent what is obvious to anyone who has studied francophonie. --84.101.142.131 09:11, 7 November 2007 (UTC)

Is Ethnologue a reliable source ?

The figures given in this page are not up to the standards of wikipedia. Several figures appear to be contentious. Why the choice of Ethnologue as a primary reference? Is it a politically motivated one? Ethnologue is quite partial and not based on accurate statistics. If we look at Wikipedia in Chinese, Spanish or other languages we can see quite different rankings that match each other. Obviously, there the sources come not only from Ethnologue, but also from the UN and several UN agencies, national statistics and the CIA factbook. Ethnologue could be cited, but not used as the primary reference. It just lowers the standard of quality and gives a partial point of view.


It is just a bunch of Christian people who like to throw numbers around. Worst page I have ever seen; I am better off checking out stub articles.

Emadd (talk) 01:22, 24 June 2008 (UTC)


I agree. Why is this article based around numbers from an obscure special interest group like Ethnologue? I am not an expert, but some of the numbers are obviously inaccurate. A source such as the UN or CIA would be much more appropriate. Udibi (talk) 02:39, 11 November 2008 (UTC)

Original research

There are many problems with this article, but I can't see how original research is one of them. It's mainly cited, and if you read WP:OR you'll see that we are meant to create a synthesis. Please quote something from the policy if you disagree. Otherwise I'll remove the tag. Drmaik 13:25, 4 June 2007 (UTC)

Spanish native speakers according to Encarta

The Spanish native speakers figure according to Encarta is 322 to 358 million people in the world, not only 322. The 322 figure is incorrect. See Encarta: [11], Encarta/Spanish Language, Encarta/VI.Languages of the World/The 10 most widely spoken languages

No, 322 is correct. Check the reference at the top of the column. For one column, we use one source, and that source is the one I've repeatedly referred to. I can also find a figure (referenced above) of 206 million for Arabic for Encarta, but it's not appropriate to put that in, as it's not from the same page. And we're using that link because it lists all languages above the top 10, rather than just the top 10. Drmaik 11:44, 7 June 2007 (UTC)

The list is only a summary; the correct figure is in Encarta's definition of Spanish, and it is still only one source, because it is Encarta's own figure, not another source. Spanish is underestimated, because 322 is not the same as 322 to 358.

The figure of 322 to 358 is more in line with Ethnologue [12] and with the CIA figures (see World figures). In both sources, Spanish is second in native speakers.

Yes, but that's the list referred to. I'm making the variable figure clearer in the article. Does that keep you happy? And in any case Spanish is listed second as is. Personally, I doubt that this is correct, but that's no justification for me to change it! The sources chosen say 2nd, so 2nd it is! (BTW, that's very old ethnologue data you refer to: it's from the 1996 edition). Oh, and please sign your contributions with four tildes. Drmaik 09:46, 8 June 2007 (UTC)

Has anyone actually read the references listed?

One of the references states that Spanish will be the fourth most spoken language in the world after Chinese, English and Indian! This reference is clearly not suitable, as Indian isn't even a language.

"...which places it as the fourth language in the world after Chinese, English and Indian." (translated from Spanish)

Another reference states that Spanish will be the number one language of the USA in 50 years' time. Where did you people get these references from? Also, according to the same reference, "Castilian ranks fourth after Mandarin, with more than 1,000 million; English, with 500 million; and Hindi, with 497 million" (translated from Spanish); in other words, Spanish occupies fourth place after Mandarin, English and Hindi!

These references seem to be saying that Spanish is the fourth most spoken language yet the people that have included these articles as references have it listed as second in this article. What's going on?

Link

strange behaviour: you question these sources (which are provided as other estimates, rather than as a basis for ranking), and then seem to use them to demote Spanish to 4th place, while the ranking is based on the ethnologue. And you add Spanglish later on... What am I to conclude? Drmaik 09:20, 8 June 2007 (UTC)

Why ranking languages by native tongue speakers?

Native speakers data are inherently misleading. See what I wrote a few sections above. It would make more sense to rank languages by number of total speakers today (i.e. people who use most the language in their daily lives today, whether they are native or 2nd language users is irrelevant). Or better still, let's rank languages in alphabetical order. Ranking languages by native speakers will always lead to feuds and disagreements. Godefroy 15:08, 18 June 2007 (UTC)

So will your proposal; you can't satisfy everyone. Enlil Ninlil 19:37, 8 July 2007 (UTC)

Proposal

Ok, this is my proposal. We shouldn't rank the languages by number of speakers. This can only lead to permanent feuds because data are just too poor and inconsistent. What we should do is we should list languages in alphabetical order, which is more neutral. Then for each language we should present speaker figures from various sources. We should not make the Ethnologue figures prominent, as there are serious credibility problems with the Ethnologue (read what I wrote a few sections above). What do people think about this? Please vote (support, oppose, or neutral), and state the reason for your choice. Godefroy 17:23, 19 June 2007 (UTC)

Well, there's already an article List of languages by name, please go and edit that if that's what you want to do. This is a different article. Drmaik 05:21, 20 June 2007 (UTC)

Support

  1. Support - I've already stated why. Godefroy 17:23, 19 June 2007 (UTC)
  2. Support, the best source for this list is Ethnologue, which is at best outdated. --Pejman47 18:32, 19 June 2007 (UTC)
  3. Support, this list is pretty messy, that's why I clearly support its deletion Paris75000 18:48, 19 June 2007 (UTC)
    1. Comment This is not the place to discuss deletion: that's AfD, which is not done on the talk page. (later) Interesting to note that the two supporters besides the proposer were contacted on their talk pages encouraging them to vote. I wonder on what basis they were chosen? Drmaik 05:21, 20 June 2007 (UTC)
  4. Support, as long as other sources besides Ethnologue are given--its data has been debated numerous times. Perhaps someone leaving a note to the effect of "We know Ethnologue is outdated, but it's the most complete source we have" on the talk page would help clarify the situation. -- Sectori 03:39, 23 June 2007 (UTC)
  5. Support Our standards have not fallen so low that we start comparing ourselves with trash like Encarta. The rankings here have severe problems. For example, there probably aren't more than a couple of hundred million native speakers of Mandarin Chinese. deeptrivia (talk) 04:43, 5 July 2007 (UTC)
  6. The ranking will always be debated regardless of the number and the credibility of the sources used. GizzaDiscuss © 02:56, 6 July 2007 (UTC)
  7. support I fully agree with the reasons stated by Godefroy. The Ethnologue source is completely flawed since the criteria vary according to the language. A serious encyclopedia cannot rely solely on this source.
  8. support Ethnologue is not a reliable source at all. For example, when considering Italy it states that a substantial part of the population of Italy is not a native speaker of Italian! FALSE and MISLEADING! I'm Italian and I can assure you that in Italy people, when speaking INFORMALLY, may use regional words and accents, but they are able to understand and speak standard Italian NATIVELY! Then, these data penalize FRENCH a lot: recent data show there are about 200 million 1st or 2nd language speakers of French (and 2nd language is a different thing from "foreign language"), francophone Africa counts more native French speakers every year, and if we count speakers of French as a foreign language this figure rises to 500 million (source: official site of the French Government). PS: anyway, the Italian Wikipedia solution of listing a number of different charts based on different sources is a very good idea. --Easyboy82 23:37, 10 July 2007 (UTC)
    yeah, but the most useful list is still the Ethnologue one. The others go to rank 17 at most. We cannot reproduce ten lists with 200 entries each, that's madness. You'll just have to accept that the nature of language is fractal, as it were: there is no way of defining it objectively, but linguistic communities are still a reality that can be charted in this way. The problem is that there is no better source than Ethnologue, for all its shortcomings. If there was a better source, we'd use it, but as it is, we have to be thankful we at least have the SIL data. dab (𒁳) 13:14, 31 July 2007 (UTC)
  9. support. Ethnologue's figures are not credible and are ridiculous! They destroy the "List of languages by number of native speakers" (Ethnologue is not a neutral source but only an American point of view). Please help save this article about the languages of the world. Oldealliance 06:57, 3 August 2007 (UTC)

oppose

  1. Oppose This is the kind of list that encyclopedias (e.g. Encarta) have. It's useful information. Personally, I quite like the way the one in Spanish wikipedia is arranged, but that system will only work for the top few languages, as there are very few sources for less-spoken languages. It's also quite a lot of work. The only list that I know that goes as low as 10 million is the Encarta one. Drmaik 05:21, 20 June 2007 (UTC)
    The ranking here is not based on Encarta. It is based on the Ethnologue, which has serious credibility problems. Godefroy 12:53, 20 June 2007 (UTC)
  2. Oppose - I oppose deletion wholeheartedly. How about a new proposal: let's put in all available estimates and list languages by the lowest credible estimate available? Cholga 00:19, 22 June 2007 (UTC)
  3. Oppose It's not that I don't think this article needs a serious change, but I don't think that the proposed replacement has much useful merit. It would be difficult to interpret, and little use to visitors who simply want to compare language statistics.
    1. Equally though, I don't like the page as it is. There is already a page for Ethnologue list of most spoken languages, so this page relying heavily on those statistics is unnecessary (not to mention the debatable merit of the Ethnologue itself). I would be all for a list of speakers of a language, period; native or not. If such figures could be found, it'd be a more favourable replacement. Patch86 15:52, 30 June 2007 (UTC)
  4. Oppose It seems over-optimistic to me that putting the list in alphabetical order would really deter the nationalists. Furthermore, the basic question "which languages are the biggest?" is a valid question which any self-respecting encyclopaedia SHOULD attempt to answer. The fact that the available sources are not as reliable as we would like doesn't invalidate the question. I personally think the native-speaker criterion is the most meaningful; I don't really understand why some people have a problem with it. Jameswilson 23:01, 5 July 2007 (UTC)
  5. Oppose If you list it that way you're still presenting the same data, so you have to ask yourself: in an article entitled "List of languages by number of native speakers", would it be more useful and logical to present it in descending order or alphabetically? Frankly, I think most people are coming to this page to get a general idea of the size order of the world's major languages, so alphabetical order would only be counterproductive in presenting the data in the clearest manner. As long as there is a disclaimer at the top saying that all numbers are necessarily rough estimates, there is nothing wrong or POV with presenting the data in such an order. Joshdboz 23:13, 7 July 2007 (UTC)
  6. Oppose - I'd like to echo what others say - this list is deeply problematic, but it's still basically worth doing, and the way to make it better isn't to give up on it. john k 15:50, 9 July 2007 (UTC)
  7. Oppose - I oppose for the same reason as everyone else. --Stefán Örvarr Sigmundsson 03:47, 17 July 2007 (UTC)
  8. Of course oppose - As per Jameswilson and my own personal reasons, I have referred to this page countless times for projects and so forth. Max Naylor 20:32, 17 July 2007 (UTC)
  9. oppose obviously. This list is useful, it just has to be very clear about its own shortcomings. But we shouldn't let the rank numbers run beyond the 10 M tier: that's supremely pointless. dab (𒁳) 13:10, 31 July 2007 (UTC)

neutral

Punjabi

Punjabi is ranked at 60 million, the "western Punjabi" statistic. There are 20 million "eastern punjabi" speakers, and that number is not included in the total punjabi native speakers count. Western and Eastern Punjabi is the same language, separated by a political border (Indo-Pakistan border). They may differ slightly, but that is to be expected with very similar dialects of the same language. To make an excellent analogy, this is like the variety in Spanish between, say, Spain and Mexico. I think Punjabi should be ranked by the 80 million because of the small difference (mostly religion/politics based) between western and eastern punjabi, and because of the precedent set by including all or most dialects of spanish under one umbrella figure. Thank you, user:Virsingh 68.38.87.65 12:10, 9 July 2007 (UTC)

I agree that Punjabi is generally considered to be a single language, whatever Ethnologue may say about it. john k 15:48, 9 July 2007 (UTC)

I'm a little new to wikipedia... how would I go about gaining legitimacy for moving Punjabi to the 87 mil mark? Do I wait a few days and see if it's unanimous, or do I make a heading saying Proposal and then see if people agree? Virsingh 21:24, 9 July 2007 (UTC)

in the interest of consistency, we should not rank "Hindi" according to the narrow Ethnologue definition, and then list Punjabi by a wider definition than Ethnologue. Delineating language boundaries will always be arbitrary, and there is no point in debating this here, we have to take some sort of scheme and stick to it. This article follows Ethnologue, which isn't a great source for this sort of thing, but the best we have readily available. By the same token, we would have to list German not at Ethnologue's "narrow" 95 M, but at the wider 123 M. We have to recognize that there simply is no way of doing this objectively, and that people will be unhappy with the list no matter what we do. dab (𒁳) 13:06, 31 July 2007 (UTC)

Ordering of Amharic and Hausa

In reviewing the information presented for Amharic and Hausa, it would seem that the order presented on the list is wrong any way you look at it. I am no expert on the subject, but the list says Amharic is ranked 36 with 17.4 million (Ethnologue), or 27 million native speakers (32.7% of Ethiopia [1994 census] plus 2.7 million emigrants), with 10% (7 million) speaking it as a second language, for a total of 34 million; and Hausa is ranked 42 with 24.2 million speakers, plus 15 million second-language speakers, for a total of 40 million. Hausa has more first-language speakers and more first+second-language speakers than Amharic. Perhaps we should go through all the rankings to make certain the rankings given match the numbers in the explanation. 70.189.39.195 04:09, 30 July 2007 (UTC)
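The consistency check suggested above (does the listed rank agree with the numbers?) can be sketched as follows. This is an illustrative sketch only: the figures are the ones quoted in the comment, the class and function names are hypothetical, and the comparison assumes the ranking is meant to follow the Ethnologue first-language column.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    listed_rank: int     # rank shown in the list (1 = most speakers)
    ethnologue: float    # Ethnologue first-language estimate, in millions
    total_quoted: float  # first + second language total as quoted above, in millions


# Figures as quoted in the comment above (not independently verified).
amharic = Entry("Amharic", listed_rank=36, ethnologue=17.4, total_quoted=34.0)
hausa = Entry("Hausa", listed_rank=42, ethnologue=24.2, total_quoted=40.0)


def rank_conflicts(a: Entry, b: Entry) -> bool:
    """Return True if the better-ranked entry has the smaller Ethnologue figure."""
    better, worse = sorted((a, b), key=lambda e: e.listed_rank)
    return better.ethnologue < worse.ethnologue


print(rank_conflicts(amharic, hausa))            # Amharic is ranked higher despite the lower figure
print(hausa.total_quoted > amharic.total_quoted)  # the quoted totals point the same way
```

Running the same pairwise check over neighbouring rows of the whole table would flag every place where the rank and the chosen estimate disagree.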

Follow up on the rankings on the Amharic/Hausa

I re-ranked the languages in the 10-30 million speaker range of this list based on the two different estimates given in the table. The first set is based on the SIL estimate, the second set is based on the other information box, and the third set is the ranking as originally presented, for comparison. For the second estimate I used the estimate of 'native' speakers only, if given in the 'total speakers' column. I used the upper value when a range of values was presented. Finally I give the original ranking. All columns are sorted in descending order based on the estimate.

Language | SIL estimate (millions) | Language | 2nd estimate (millions) | Language (ranking in list) | Ranking
Sundanese | 27.0 | Kurdish | 31.4 | Amharic | 36
Romanian | 26.3 | Sindhi | 28 | Sundanese | 37
Sindhi | 24.5 | Amharic | 27 | Romanian | 38
Hausa | 24.2 | Sundanese | 27 | Kurdish | 39
Malay | 23.6 | Romanian | 26 | Dutch | 40
Pashto | 22.8 | Dutch | 25 | Pashto | 41
Uzbek | 20.1 | Pashto | 25 | Hausa | 42
Dutch | 20.0 | Hausa | 24 | Indonesian | 43
Yoruba | 20.0 | Oromo | 24 | Oromo | 44
Igbo | 18.0 | Tagalog | 22 | Tagalog | 45
Amharic | 17.4 | Uzbek | 20 | Uzbek | 46
Oromo | 17.2 | Yoruba | 19 | Sindhi | 47
Indonesian | 17.1 | Lao | 19 | Yoruba | 48
Tagalog | 17.0 | Cebuano | 18.5 | Somali | 49
Assamese | 15.4 | Malay | 18 | Lao | 50
Nepali | 15.0 | Igbo | 18 | Cebuano | 51
Cebuano | 15.0 | Indonesian | 17.1 | Malay | 52
Hungarian | 14.5 | Serbo-Croatian | 17 | Igbo | 53
Shona | 14.0 | Malagasy | 17 | Serbo-Croatian | 54
Zhuang | 14.0 | Somali | 16 | Malagasy | 55
Madurese | 13.7 | Nepali | 16 | Nepali | 56
Sinhalese | 13.2 | Assamese | 15 | Assamese | 57
Greek | 12.0 | Shona | 15 | Shona | 58
Czech | 12.0 | Khmer | 14 | Khmer | 59
Fula | 11.4 | Zhuang | 14 | Zhuang | 60
Serbo-Croatian | 11.1 | Madurese | 14 | Madurese | 61
Malagasy | 10.5 | Hungarian | 14 | Hungarian | 62
Somali | 9.8 | Sinhalese | 13 | Sinhalese | 63
Quechua | 8.3 | Fula | 13 | Fula | 64
Khmer | 8.0 | Tamazight | 65 | Tamazight | 65
Kazakh | 8.0 | Haitian Creole | 66 | Haitian Creole | 66
Haitian Creole | 7.8 | Czech | 12 | Czech | 67
Kurdish | 6.0 | Greek | 12 | Greek | 68
Tamazight | 3.5 | Kazakh | 12 | Kazakh | 69
Lao | 3.2 | Quechua | 10.4 | Quechua | 70

Statistics

Do the statistics include languages of Arabic origin and other Arabic languages used among Arabs?

like

Literary Arabic language

Bathari Arabic language

Harsusi Arabic language

Maltese Arabic language

Tigrinya Arabic language

Jibbali Arabic language

Mehri Arabic language

Soqotri Arabic language

Darija Arabic language

Of the above, only Literary Arabic and Darija are Arabic; Maltese developed from Arabic but is not Arabic. Even Soqotri is South Arabian and not Arabic or even a descendant of Arabic. The rest are all Semitic but not Arabic. What is your point? --Maha Odeh (talk) 11:48, 20 July 2008 (UTC)

Where's Luxembourgish?

??? --MosheA 21:27, 3 August 2007 (UTC)


Isn't this list original research from beginning to end? People move languages up and down as it suits them, almost never giving any motivation or any sources to justify why one language has suddenly acquired a substantial number of new speakers or tragically lost a few million. And what's with Serbo-Croatian? According to this list, there are 11 million who speak Serbo-Croatian. That's the number of speakers for Serbian, so either the language should be called Serbian, with separate listings for Croatian and Bosnian, or the speakers of all three languages formerly known as Serbo-Croatian should be put in, resulting in well over 20 million. JdeJ 18:48, 10 August 2007 (UTC)

The logical explanation is that it was probably listed as Serbo-Croatian at one time, then a user decided they should be separate, changed it to Serbian and reduced it to 11 million speakers; then a second user switched it back to Serbo-Croatian again but forgot to rechange it to 20 million. You're probably the first person to notice the mistake. Lyreto 05:40, 11 August 2007 (UTC)
It seems as if for every language we have the Encarta estimate, the Ethnologue estimate and then something called "other estimates". It almost defies belief, but it seems like the ranking is done on these other estimates. So anybody can put in whatever number they want and then the ranking is done on that. Although both Encarta and Ethnologue are external sources, these are disregarded in favour of these mysterious "other estimates". I'll start deleting them in accordance with WP:OR unless sources are given for them. JdeJ 19:14, 10 August 2007 (UTC)
I think you are being a bit harsh there, JdeJ. The point is that the Encarta and Ethnologue figures are so often wrong. They are just a starting point awaiting improvement by those who have access to better info on a given language. It is right that the "other estimates" column should be treated as the best estimate.
Actually the ranks are supposed to be based on Ethnologue, but it appears no one's had the time to look at the lower-ranked languages. Lyreto 05:40, 11 August 2007 (UTC)

It is the net result of all the modifications which various contributors have suggested (based on national censuses or whatever) and others have debated on the talk page over a period of several years now. Until the recent clampdown on unsourced statements many of us often didn't bother to state the sources on the main page (if it was in a little-understood language, for example, or fully sourced on the wiki page for the language in question), but you shouldn't assume the figures weren't subject to scrutiny by those involved in the particular discussion at the time. Granted, sometimes people do come along and stick in any figure they want, but those are generally weeded out immediately. Jameswilson 23:18, 10 August 2007 (UTC)

JdeJ, there isn't a team of editors working on this page. There is basically only one editor, who is Drmaik and he can't keep up with every anonymous change. So you're basically arguing to an empty room! Also, I'm not sure what adding fact tags will accomplish, those figures probably were added years ago. Lyreto 05:40, 11 August 2007 (UTC)

Lack of consistency

Seems like there are many different ways to do the ranking here. For some languages, only native speakers are counted. For others, second-language speakers are included as well. Some consistency please, we can't have it both ways. Almost by definition, the versions including second-language speakers should be deleted. It's hard enough to arrive at a good estimate for native speakers; estimating the number of people who have learned a language well enough to be considered second-language speakers is completely impossible, and original research in each and every case. JdeJ 19:10, 10 August 2007 (UTC)

Well, I think I've checked the top 33 or so, which are based on the ethnologue ranking, which was a narrowly-carried decision some time ago. Even maintaining that takes quite a bit of work (well, for me anyway!), as there are several people out there who want to inflate the status of their language. Below 33 (maybe down to 50?) I'm not sure that there's much point in ranking languages. This list should have a lower limit: this is not list of languages. 10 million anyone? Top 50? And yes, unsourced 'other estimates' should be removed soon-ish. Drmaik 17:30, 12 August 2007 (UTC)
All of it sounds good. The further down the list goes, the harder it becomes to get any precise data. Top 100 might still be a good idea, but not any more than that. I must admit to being very sceptical about Ethnologue as a source. Being an academic within sociolinguistics myself, I know much too well how often they get things wrong. Everybody gets things wrong sometimes, and I can understand it with small languages, but they make such severe mistakes even with big languages at times that it almost defies belief. Of course, the alternatives are a bit limited... ;) Using the list provided by Encarta for languages between 10,000,000 and 50,000,000 might be a good idea. Above that, the Ethnologue figures are probably more up to date.
I'd propose to simply remove the column with other estimates. We'd also need to decide for how to do with some languages that are sometimes counted as dialects. It's pretty clear that almost every person living in Lombardy is counted both as a native speaker of Lombardian and Italian. I doubt any census has ever been made on the number of people who speak Lombardian, nor is it known to what degree the speakers actually use it in everyday life.
Another thing that would be urgently needed is to check the factboxes for at least major languages. Some of them are mindblowing. :) Looking at Italian language, we are informed that there are 120.000.000 native speakers of Italian and 200.000.000 speaker in total. Where are they hiding. The Italian diaspora is very large, but not that large given that many (the majority) are no longer native speakers of Italian. Same thing with Greek language, up to 25.000.000 native speakers. Most European languages (sorry for being more ignorant about languages outside Europe) do have sourced factboxes, but some feature these fantasy-figures. JdeJ 19:26, 12 August 2007 (UTC)
well, I think the 'other estimates' column has its place: for census data, for example. Doesn't the encarta data get more similar to the ethnologue the further down one goes? They state SIL as the source, which would explain that (though the Arabic figure in Encarta is just bizarre). In any case, I'd like to limit it to Top 100. I agree on factboxes, but there tend to be edit wars over such things. Drmaik 20:10, 12 August 2007 (UTC)

ok, the problem, as is recognized by everybody:

  • SIL Ethnologue is often outdated, its estimates are rather conservative, and it tends to classify as separate languages what are usually merely considered dialects
  • SIL Ethnologue is the only source we have that aims at providing statistics for all the world's languages.

The task of sorting we are facing here is of a different nature for large languages than it is for small ones. For smaller languages, say, below 5 million speakers, we'll have no choice but rely on Ethnologue. The question is, should we sort by other, more reliable sources for large languages? I have introduced a "Top 20" tier now. That's as arbitrary as any other tier structure, but it happens to nicely cover languages with 60 million native speakers or more, and it dovetails with our Ethnologue list of most spoken languages article. Now, for the people unhappy with SIL ordering, how about focussing on this "Top 20" tier exclusively for now? You can gather the most up-to-date statistics and estimates, and we could re-order it according to these. Once you have done that, you can attack the 30-50 M tier, and maybe even the 10-30 M one, at which point you should stop and leave the smaller languages to SIL. I think that if this is carried out cleanly for all languages in a tier, and not just for a single favourite, this will be uncontroversial. As such, this isn't a content dispute so much as a suggestion for improvement. Since the problems and possible solutions seem well defined, and admitted by the article itself, what remains is really mostly somebody sitting down and sacrificing the time necessary to do this properly. --dab (𒁳) 14:29, 14 August 2007 (UTC)

I think your idea is worth trying: we do have the Ethnologue list of most spoken languages already, so this wouldn't need to have the same ranking. The biggest problem is with the big languages, which tend to be spoken in lots of countries. So I would propose that we discuss changes of figures here, and once we've done a few, we could change the ranking basis. It does sound like a lot of work, but I don't mind working with others on it. But I think we need ideally 3 doing it.

Drmaik 14:46, 14 August 2007 (UTC)

(later) My only worry is that the current system is easy to administer, and seems to have reduced the amount of single-issue fact-changing. The more complex we go, the more open to abuse this whole thing is. Some criteria might help, e.g. census data being the most valued, then studies by serious research bodies (e.g. look at Languages of France for some well-grounded data for France, produced by a French research body, but contested by some French editors, basically because they want to inflate the figure). Drmaik 06:26, 15 August 2007 (UTC)
This sounds like a very good idea. Having spent a number of years studying the sociolinguistics of smaller languages in particular, I'd be happy to help out with those below 10M. Of course I'll be happy to help with the bigger ones as well :) JdeJ 17:14, 14 August 2007 (UTC)
Definitely. I like the idea of a top 50 or 25; or even by >50 million. I'd avoid using Ethnologue altogether. As someone who isn't a linguist, I don't know how respected it is, but the figures are absurdly out of date. 1984 population for the U.K.? 1970 census for American Samoa? 1999 for world English figures? It's crap, pure and simple. I think a list of total speakers might be more useful; you could have one column for native, one for second language, and one for total. I don't know how difficult that would be. It seems we only have either 'official language' or 'native speakers', which doesn't actually tell you how many people speak which language. Iorek85 00:56, 15 August 2007 (UTC)

I've just worked on linguistic demography, which should be the article discussing the difficulties involved here. We have to recognize, that for major languages, it will not be possible to get an estimate that is better than to perhaps 10%. This isn't so bad, but of course makes it impossible to give a reliable ranking. The only thing we can do is give "rowspan" rank ranges. E.g., there is no way to be positive on whether Russian or Portuguese have more speakers, but it is clear that together they rank as 7th and 8th, after Bengali and Arabic, and before Japanese:

Native speakers in millions:

Rank    Language          Comrie (1998)  Weber (1997)  SIL
1.      Mandarin Chinese  836            1,100         873 (1990s)
2.-4.   Hindi+Urdu        333            250           364 (1997)[1]
        Spanish           332            300           322 (1995)
        English           322            330           309 (1984)
5.-6.   Bengali           189            185           171 (1994)
        Arabic            186            200           206 (1998)
7.-8.   Russian           170            160           145 (2000)
        Portuguese        170            160           178 (1995)
9.      Japanese          125            125           122 (1985)
10.     German            100            100           114 (1990s)

--dab (𒁳) 12:20, 15 August 2007 (UTC)
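The rank ranges in the table above could be produced mechanically. What follows is only a rough Python sketch, not the method actually used for the table: the function name `rank_ranges` and the 5% `tolerance` are assumptions made for illustration. Languages are sorted by a single estimate (here the Comrie 1998 column), and neighbours whose figures differ by no more than the tolerance share a rank range. With 5% the Comrie column happens to give the same groups as the table; with 10% the 5.-6. and 7.-8. groups merge into 5.-8.

```python
# Comrie (1998) column from the table above, native speakers in millions.
COMRIE_1998 = {
    "Mandarin Chinese": 836,
    "Hindi+Urdu": 333,
    "Spanish": 332,
    "English": 322,
    "Bengali": 189,
    "Arabic": 186,
    "Russian": 170,
    "Portuguese": 170,
    "Japanese": 125,
    "German": 100,
}


def rank_ranges(estimates, tolerance=0.05):
    """Group languages into shared rank ranges.

    Two neighbours in the sorted list share a range when the smaller
    figure is within `tolerance` (relative) of the larger one.
    The tolerance value is an assumption, not a figure from the sources.
    """
    ordered = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    groups = []
    for name, value in ordered:
        if groups:
            previous_value = groups[-1][-1][1]
            if previous_value - value <= tolerance * previous_value:
                groups[-1].append((name, value))
                continue
        groups.append([(name, value)])

    # Turn the groups into (first_rank, last_rank, [languages]) tuples.
    result = []
    rank = 1
    for group in groups:
        last = rank + len(group) - 1
        result.append((rank, last, [name for name, _ in group]))
        rank = last + 1
    return result


for first, last, names in rank_ranges(COMRIE_1998):
    label = f"{first}." if first == last else f"{first}.-{last}."
    print(label, ", ".join(names))
```

A single column is used only to keep the sketch short; combining several sources would need a decision on how to weigh them, which is exactly what is being discussed here.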


Rank 2 of Hindi

Hi All,

I have calculated the number of native speakers of all the dialects of Hindi as given in Ethnologue tree for Hindi [13] and have come up with this 2nd rank.

Please feel free to recompute again.

Previously some prejudiced and unknowledgeable people had put the number of only one dialect of Hindi (Khariboli), which was only 181 million.

I hope this ends the speculation about the rank of Hindi, as the Ethnologue data clearly shows Hindi as No. 2 among the languages of the world.

thanks and regards,

Bdebbarma —The preceding signed but undated comment was added at 17:27, August 24, 2007 (UTC).

Somebody switched it back. The page Ethnologue list of most spoken languages shows that the Ethnologue figure for Hindi only includes the Khariboli dialect. Also, there are other discussions on this talk page which are about the Hindi figure. Apparently, Ethnologue says that Khariboli is the same thing as Hindi, and gives figures for two other dialects. Whether Khariboli should be called Hindi or not, these dialects are not mutually intelligible, and are listed separately. Someone the Person 21:33, 27 August 2007 (UTC)

indeed. It's a matter of definition, read the Hindi article, and see the disambiguation notice right on top. Hindustani is a dialect continuum, and there is no objective way of saying how many languages there are. Your "Ethnologue tree" lists all of the Indo-Aryan languages. --dab (𒁳) 17:40, 31 August 2007 (UTC)

English language figure too low

English is spoken by way more people than this article suggests. I don't speak English at home with my parents but consider myself a native English speaker, as I use it all the time when I leave my house. I think that it's incorrect to think that bilingual children are not native English speakers just because they use another language at home with their parents. —Preceding unsigned comment added by 210.49.197.7 (talk) 01:28, 21 October 2007 (UTC)

The main problem is that Ethnologue is using ancient census data for their numbers, even in their 2005 version. For example they cite a 1984 estimate of U.S. English-speaking population, which is now about 30 million too low. --Delirium 18:45, 28 October 2007 (UTC)

Improving the page

Why doesn't someone add an extra column that would contain the total number of people who live in countries where the language in that row is official, regardless of whether they are native speakers or not? For example, if you add up the number of people who live in all the countries where English is official, you get a figure that gives you a ranking that actually represents the status of the language today. —Preceding unsigned comment added by 210.49.197.7 (talk) 01:34, 21 October 2007 (UTC)

Excellent idea!!!

Robledo —Preceding unsigned comment added by 201.73.79.20 (talk) 12:17, 22 November 2007 (UTC)

Spanish number too high

People that speak Mixtec, Quechua, Guarani, Aymara and so on as a native language should not be counted as Spanish speakers. —Preceding unsigned comment added by 210.49.197.7 (talk) 21:42, 31 October 2007 (UTC)

That's the problem with using Ethnologue as a primary source. They tend to lump all these dialects together under one language, in this case... Spanish. This whole article has become a joke. The English numbers are absurdly low, and the Spanish numbers are incredibly inflated. It's almost like some politically motivated *****-fest to see that Spanish ranks higher than English on here. As stupid as it sounds... and the motivation for that is beyond me. Abalu (talk) 09:44, 14 January 2008 (UTC)

Arabic too high (please comment)

Of course, the Arab people number almost 400 million, but that is their ethnicity, and in my opinion we cannot count for the Arabic language a number of speakers similar to the population of the Arab world. I know that it is very difficult to define a language, but we must consider here what is mutually intelligible. We count as Arabic both a speaker from Morocco and a speaker from Iemen, but if these two meet they can't understand each other. If we count them together, then we must count all the Latin languages together, or at least Portuguese and Spanish (Galician is the same as Portuguese): they share 90% of their vocabulary and are fully comprehensible to both sets of speakers, closer than Moroccan Arabic and Iemenite Arabic. We must also count together Norwegian, Swedish and Danish, as well as Indonesian and Malay! I'm very interested in Semitic languages, but, as a matter of fact, Arabic is not the same language everywhere, and if we consider MSA, then we must consider that Portuguese, Spanish and Italian are very close when written; the same goes for Norwegian, Swedish and Danish, as well as Indonesian and Malay. Post your opinion!

Thanks Robledo —Preceding unsigned comment added by 201.73.79.20 (talk) 12:15, 22 November 2007 (UTC)

What you suggest sounds rather like original research. The issues are messy, but are discussed in the article. Drmaik (talk) 13:01, 22 November 2007 (UTC)
First, as said, your claim is original research. I am wondering why we find hundreds of pan-Arab media outlets. Do we find Italian media targeting Portuguese speakers, or a Brazilian channel speaking Portuguese targeting both Brazil and Argentina? Italian and Spanish are languages; they have similarities, as other languages do, but what you are describing here are dialects. According to you we should change German too (a German cannot understand a Swiss German). By the way it's Yemen not Imen. Bestofmed (talk) 19:01, 7 December 2007 (UTC)

Look, I'm not trying to discuss what it means to be an Arab. In my opinion, if all the Arab world is one and united, that's great; you have your beautiful traditions and customs, which I myself am very fond of. I'm just proposing to separate the varieties of Arabic that are not mutually intelligible, as someone did with the Chinese language in this very same article; Chinese, like Arabic, is a macrolanguage. I believe that when it comes to language the least important thing is how many people speak it. I've seen people here trying only to enlarge the number of speakers of their own language just to get a higher position in the ranking, as if it really mattered at all! I believe we must spend some effort to make the article as good as we can. What I pointed out is this: if we count all the varieties and dialects or creoles and pidgins or whatever an Arab speaks, it's not the same language if it's not mutually intelligible (my opinion). IF we consider them all together (something like "total Arabic"), THEN we must also consider a "total Latin", because Latin people have many things in common too, like cuisine, dance and much more, plus languages that are very close. That's all! PS: I'm not trying to discuss anything related to ethnicity or the language itself, just proposing to count the numbers in a different manner.

Robledo

Is Cantonese a dialect?

Is Cantonese supposed to be a dialect? If not, you should add more languages to it. In my hometown, at least half a million people speak Fuzhou dialect. There are also thousands of dialects in China... —Preceding unsigned comment added by 141.155.107.9 (talk) 23:51, 7 December 2007 (UTC)

No, it is classed as a separate language from Mandarin. If they were dialects, there would be no reason to learn the other. Just because the languages of China use the same script says nothing of their relationship. Enlil Ninlil (talk) 10:44, 31 December 2007 (UTC)
There IS a special relationship among Chinese languages that does not exist among Romance languages or Germanic languages. To understand this one must differentiate between spoken languages and written languages. Mandarin and Cantonese are clearly two different spoken languages; however, they are the same written language, specifically they are both Chinese. That is because their "script" is based on ideograms, not phonetics. Although a Mandarin speaker cannot understand the spoken words of a Cantonese speaker, they can understand each other's written word (they can write letters to each other; they both understand each other's newspapers). The written Chinese symbol for a given word is the same whether written by a Mandarin speaker or a Cantonese speaker, yet the spoken word in Mandarin is different from the spoken word in Cantonese. On the other hand, Romance languages and Germanic languages also all use the same "script" (abc), but because their scripts are based on phonetics (sounds combined to make spoken words), Spanish, Italian, English and German are all different spoken languages AND different written languages. Hplwas (talk) 19:02, 24 January 2009 (UTC)

Izon language

The largest of Ijoid languages is missing. —Preceding unsigned comment added by 88.195.46.112 (talk) 10:38, 31 December 2007 (UTC)

Someone needs to create a cited article or improve the current ones. Enlil Ninlil (talk) 10:47, 31 December 2007 (UTC)

Altaic?

KOREAN LANGUAGE is related to Altaic. TOTAL KOREAN LANGUAGE SPEAKER IS 80-88 MILLION. PLEASE UPDATE THE NUMBERS. —Preceding unsigned comment added by Americanprofessor (talk • contribs) 13:57, 25 April 2008 (UTC)


If the table row for Korean says "considered either language isolate or Altaic," then why doesn't the table row for Japanese say anything about how Japanese-Ryukyuan is sometimes considered a subfamily of Altaic? Someone the Person (talk) 21:18, 5 January 2008 (UTC)

Indonesian and Malay

The respective articles for the Indonesian language and the Malay language both describe the former as a variety of the latter. It seems contradictory and inaccurate to keep them separate in this list. I do not believe there is any political controversy about their being the same language; they are simply given different names for convenience when describing the separate official dialects. GSTQ (talk) 00:34, 18 January 2008 (UTC)

Estimates of total speakers

Added the high estimates of the first 39 languages and got 7.03 × 10^9, so there are at least about 500 million people who are counted under a second language as well as their first one. 88.195.46.112 (talk) 06:37, 21 February 2008 (UTC)

Tajikistan????

Why does it say that 300 million people speak Tajik? It only lists it as the official language of Tajikistan, whose population is given as only 6 million. The edit was added today. --Whiteknox (talk) 21:52, 7 March 2008 (UTC)

English speakers data from 1984?

http://en.wikipedia.org/wiki/Ethnologue_list_of_most_spoken_languages

It says the English stats (the same 309m) are from 1984. THAT makes sense, but having that here is insane. It casts doubt over the whole article. Stats from that long ago are useless; it's only slightly more than the current population of the US alone! Can someone use the data from the OFFICIAL censuses of the countries concerned? I mean, look at India alone http://en.wikipedia.org/wiki/List_of_Indian_languages_by_number_of_native_speakers. Ethnologue thinks there are 188m Hindi speakers; the OFFICIAL 2001 CENSUS OF INDIA (which I think may have a tad more credibility) lists 422m. —Preceding unsigned comment added by 60.234.210.9 (talk) 11:58, 2 April 2008 (UTC)

Behold! The "most wrong" article in Wikipedia!

What a discredit

1- The sources are totally out of date.

2- There are macrolanguages in the list rather than the mutually understandable varieties.

3- There are languages that have been counted within a different language already on the list.

4- There are people editing this article who think that the most important thing about a language is its number of speakers, without considering what really matters, that is, the criteria used to create the list.

5- As a matter of fact, there are no clear criteria for organizing the languages.

6- This list is totally biased.

If I count the number of changes made in this article every time I visit it, it would be a number greater than the number of speakers of Chinese.

Anybody willing to study just a bit can create a better and not biased list.

Sorting by columns

Sorting the table by «Other estimates» column does not work correctly and needs to be fixed. Svmich (talk) 02:23, 3 May 2008 (UTC)

Source

Why is Ethnologue used instead of Encarta or the CIA factbook? Which prick decided that an Evangelical Christian organization was more reliable than the CIA or Encarta? —Preceding unsigned comment added by 63.216.117.85 (talk) 04:12, 7 May 2008 (UTC)


Use of another source is fine, if referenced, and if you want to go to the effort of adding it to the table.

This article is SOO messed up right now it's not even funny. The existing sources aren't even reported accurately, some joker put French up to #4, and I was almost about to track it down to report it as vandalism, but just fixed the numbers and haven't really taken time to figure out how many other complete screw ups there are and re-rank the table correctly.

Getting the existing sources quoted correctly has higher priority than adding another source, IMHO, and THAT is a mess.

Badlandz (talk) 03:57, 31 May 2008 (UTC)

Messy page

This page is so messy, and changes every day! E.g. French: first it said 500 million speakers with some knowledge, and now 600?! Are there 100 million new speakers of French every day, or what? And it's the same with many other languages, like Spanish and all the others. SIMPLY DELETE THIS PAGE! It's so unrealistic and lies to the people who read it. 71.196.102.141 (talk) 00:28, 11 June 2008 (UTC)

Yes, this page should be deleted. Some people just seem to want to promote their language no matter what the cost. —Preceding unsigned comment added by ItemSeven (talk • contribs) 13:44, 16 June 2008 (UTC)

This is so wrong

Even an 8-year-old can see how wrong this SIL thing is....

The subject is important but the article is Totally WRONG. —Preceding unsigned comment added by Emadd (talk • contribs) 01:16, 24 June 2008 (UTC)


Dutch

The Dutch number is way too low; it has to be 27 million instead of 20 million.


Yes this is right —Preceding unsigned comment added by 85.144.100.44 (talk) 16:47, 12 September 2008 (UTC)

Afrikaans

There are 16 million people in the world who speak Afrikaans, not the 6,1 million as this page says.


Suggestion

Would it be possible to include how many people spoke each of the languages, say, ten or fifty years ago? People would be able to see which languages are growing fastest, which ones are declining, which ones are stable and so on. —Preceding unsigned comment added by 58.161.69.75 (talk) 14:48, 28 June 2008 (UTC)

Where is the Ethnologue list

Could the great administrator of this mess please show me the Ethnologue 2005 list? I just visited the page and was not able to find any 2005 list. The numbers on the selected language profiles I checked were different from the numbers listed here. Thank you. DanishWolf (talk) 23:14, 30 June 2008 (UTC)

The problem is that Ethnologue is used at all. Ethnologue is many things, some good and some bad, but it is not a reliable source for the number of speakers of any given language and definitely not for the classification of languages. Much of it is done by happy amateurs who take sources that they interpret themselves, much like Wikipedia. To make matters worse, much of Ethnologue still uses population figures from the early 80s. I don't always put too high a trust in Wikipedia, but in comparison with Ethnologue this is a prime academic source. JdeJ (talk) 19:10, 1 July 2008 (UTC)
There are unarguably a lot of problems with Ethnologue. For one, not all these numbers are 2005 estimates, but rather the most recent estimate as of 2005. For some smaller languages this could be the early nineties. However, as far as I know no one has a better estimate for most of these languages than Ethnologue, so we are out of luck unless you can find a more complete source. Chris Quackenbush (talk) 16:38, 30 July 2008 (UTC)
Ethnologue is reliable, but just as the previous editor stated, it is outdated for many of the languages. Unfortunately, for some that's all we have for now. Kman543210 (talk) 23:30, 30 July 2008 (UTC)

Turkish

Turkish the 5th language in the world? You got to be kidding! This article really needs to be protected. Aaker (talk) 23:39, 3 August 2008 (UTC)

By the order of native speakers, the ranking from 1 to 10 should be Chinese, English, Hindustani, Spanish, Russian, Bengali, Portuguese, Serbo-Croatian (disputed, right?), Arabic, and Japanese. Turkish is around 22nd.-- Hello World! 07:26, 5 August 2008 (UTC)

German

I don't believe the claim made by some sources (for example http://www.vistawide.com/languages/top_30_languages.htm) that there are 200 million German users. Let me take Austria as an example. Austria is a country of 8.3 million people, of whom about 7.5 million speak Bavarian (considered a German dialect) and 7 million speak Standard German. Does that make 7.5M + 7M = 14.5 million German users? The number of first-language users and the number of second-language users should not simply be added together. Summing over all German-speaking countries, I doubt that the number of German users is nearly doubled. -- Hello World! 07:35, 5 August 2008 (UTC)
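The double-counting point in the Austria example can be checked with a few lines of arithmetic. The following is a minimal Python sketch using only the figures quoted in the comment above; the function names are made up for illustration. It bounds the number of people who speak at least one of two varieties, given that nobody can be counted more than once and the total cannot exceed the population.

```python
def combined_speaker_bounds(population, group_a, group_b):
    """Return (lowest, highest) possible count of people in at least one group."""
    lowest = max(group_a, group_b)                   # one group entirely inside the other
    highest = min(population, group_a + group_b)     # cannot exceed the population
    return lowest, highest


def minimum_overlap(population, group_a, group_b):
    """Smallest number of people who must belong to both groups."""
    return max(0, group_a + group_b - population)


# Figures for Austria as quoted in the comment above.
population = 8_300_000
bavarian = 7_500_000
standard_german = 7_000_000

low, high = combined_speaker_bounds(population, bavarian, standard_german)
print(low, high)                                                 # 7500000 8300000
print(minimum_overlap(population, bavarian, standard_german))    # 6200000
```

So on these figures the number of German users in Austria lies between 7.5 and 8.3 million, and at least 6.2 million people are in both groups, which is why adding the two columns overstates the total.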

Turkish 250,000,000... which Turkish nationalist wrote this ****?

...... —Preceding unsigned comment added by Feta (talk • contribs) 22:22, 7 August 2008 (UTC)

Just revert and ignore this figure; it's vandalism. There has been constant vandalism to the Turkish language article as well as this one from someone or some people changing it to 250 million. There is one source that's in Turkish that gives this number of Turkish speakers, but it is not reliable, so that's probably where this made up number comes from. Again, just revert. Kman543210 (talk) 23:51, 7 August 2008 (UTC)

Ethnologue (most recent data)

There has been an IP editor who continually changes the numbers for several top 20 languages back to the 14th edition of Ethnologue. The Ethnologue 15th edition had changes in some of the total native speaker figures from the 14th edition, and in some cases, the number decreased where people may have thought it should increase. It's not our job to judge the new figures and decide we're going to display old ones because we don't like the new ones. English, French, Spanish, and Russian I've noticed were decreased. We need to keep the most updated version in the Ethnologue column even if it doesn't make sense to us.
There is also a source from a high school that people keep citing as the SIL that should not be used: Most Widely spoken languages. It is a 3rd-party source of original sources, so the original sources need to be used and cited instead. Also for French, it adds creole languages to the total which are not classified as French, so they should not be added.
I don't have a grasp of how Arabic and Hindi/Hindustani are being calculated. Does anyone know how these are being calculated? Both of them are obviously fusions of several different entries. The different dialects should be given in the info box, or at least in a footnote, to indicate this so it can be verified. Kman543210 (talk) 10:03, 12 August 2008 (UTC)

French language

The number of French speakers given by Ethnologue is ridiculous, because France alone has 64 million inhabitants. It is necessary to add the millions of French speakers living all over the world (notably in about fifty countries of the Francophonie, the French-speaking community). The figure given by Ethnologue (2005) dates from 1999 and has not been updated since. Wikipedia has to remain a reliable encyclopedia; to put the French language in 19th position is false, stupid and ridiculous.
Don't be afraid! The French language does not wish to dominate the world, only to be placed in its right place on this list! Busway (talk) 06:17, 31 August 2008 (UTC)

Many of the numbers from Ethnologue are out of date, but that is why this page has additional columns for MSN Encarta estimates as well as any additionally sourced estimates in the far right column. English is given as 309 million world wide, and that is obviously incorrect; however, we cannot just make up numbers for the Ethnologue column; it's just one estimate. One of the sources that people keep trying to include for French combines Haitian Creole as well which is not categorized as French and should not be included. Remember that Wikipedia is about verifiability and not truth. Any Ethnologue (SIL) estimates must use the most current, 15th edition source. Kman543210 (talk) 06:27, 31 August 2008 (UTC)
  • The reference "author=Gordon, Raymond G., work=Ethnologue: Languages of the World, publisher=SIL International 2005" gives statistics from 1968! Read Raymond G. and you discover France with 51,000,000 inhabitants (the situation of France in 1968). Now (2008) it has 64,000,000 inhabitants, plus millions of French-speaking people worldwide.
  • Now a better reference: 79,572,000 French mother-tongue speakers: Ethnologue (SIL), 1999, as cited in http://www2.ignatius.edu/faculty/turner/worldlang.htm ; figure includes Haitian Creole French
Busway (talk) 18:51, 2 September 2008 (UTC)

I'm not sure where the 1968 date is coming from, but that is not where the Ethnologue number is from (nowhere on that page does it say the numbers are coming from 1968; some are as recent as 2004). The number was decreased between the 14th and 15th edition for both English and French, and I'm not sure why. Maybe it's a different method of calculating "native speakers". Not everyone in France speaks French as their first language; even if someone speaks French fluently, he may not consider it his native language (there are other native languages in France, like Occitan). The Ethnologue column is for the most recent figures from the most recent version of the source, which is the 15th edition located at [14], and it is not for original research or other sources. Other sources can be used, but not for this particular column. The http://www2.ignatius.edu/faculty/turner/worldlang.htm source cannot be used for this column. It uses Ethnologue as a source, but it is a synthesis, and it also uses an older version. Haitian Creole is not French and cannot be included in the figures, just as Spanish-, Portuguese-, and English-based creole languages are not included in those figures; they are classified under different languages. Kman543210 (talk) 01:30, 3 September 2008 (UTC)

Today, nobody in France has Occitan as a mother tongue. Occitan is a French dialect spoken in the Middle Ages in the southern part of France. At present everybody speaks French and can learn Occitan as a second or third language, in France and everywhere else...
About 1968: France had 51,000,000 inhabitants in 1968, yet when you read Ethnologue, it gives 51,000,000 inhabitants for 2004! But France had 63,000,000 inhabitants in 2004, and today, in 2008, it has 64,000,000.
About Haitian Creole: Haitian Creole is a French dialect. Even if it cannot be included in the French language, the number of French mother-tongue speakers in the world (Belgium, Switzerland, Canada and Québec, and other people around the world) is still more than the stupid and ridiculous 64... given by Ethnologue!
When we read the position of French, we notice an obvious discrimination against this language. 19th is pathetic; the true place is around the 11th or 12th position! Other sources give about 100,000,000 French speakers (as mother tongue) worldwide, 200,000,000 as a second language, and about 500,000,000 with some knowledge of French.
O.K. I change the last column to inform the reader on other sources with a more neutral point of view. Busway (talk) 15:34, 6 September 2008 (UTC)

South America ranking

Someone should reconsider the data that was used to provide the ranking of most spoken languages in South America. Brazil's population accounts for about 50,3% of the total population of South America (191,8 million out of 382,4 million, according to Wikipedia), and then you must count about 400,000 speakers in Paraguay and the thousands of Brazilians living in other South American countries. By contrast, the Spanish-speaking population is very small in Brazil, as the main minority languages here are German and Italian dialects and perhaps Japanese, not Spanish at all.

Besides, I am pretty sure Aymará and Guaraní have more speakers in South America than French and English, which are used in small countries that account for very little of the population among the 382 million South Americans. Guaraní has about 7 million speakers and Aymará 2 or 3 million native speakers. 189.13.30.247 (talk) 04:18, 7 September 2008 (UTC)

Automatic update of families ranking

Is it possible to create a second ranking, dynamically linked with the current one, which shows the families? For example:

- "Romance" = Spanish + Portuguese + French + Italian + Romanian = 652,46 million.

- "Germanic" = English + German + Dutch + Danish + Swedish + Norwegian + Icelandic = 444,31 million...

JackPotte (talk) 22:34, 18 October 2008 (UTC)

Given the amount of subdivision in language families, I think a "rowspan" table would be a good idea, e.g.
Indo-European    1,096,770,000
    Romance        652,460,000
    Germanic       444,310,000
and all the major families are sorted by population, and the minor families are sorted within their major families by population. I'll test out the idea in a user subpage. Someone the Person (talk) 20:39, 10 December 2008 (UTC)
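The grouping and sorting described above could be sketched roughly as follows in Python. This is only an illustration, not the user-subpage table itself: the row layout and the name `family_table` are assumptions, and the only figures used are the two subfamily totals quoted in this thread (no per-language numbers are given here, so subfamily totals stand in for them).

```python
from collections import defaultdict

# (major family, subfamily, native speakers); figures as quoted in this thread.
ROWS = [
    ("Indo-European", "Germanic", 444_310_000),
    ("Indo-European", "Romance", 652_460_000),
]


def family_table(rows):
    """Sum speakers per subfamily and per major family.

    Returns a list of (major_family, total, [(subfamily, subtotal), ...]),
    with major families and their subfamilies both sorted by population,
    largest first.
    """
    subtotals = defaultdict(dict)
    for major, minor, speakers in rows:
        subtotals[major][minor] = subtotals[major].get(minor, 0) + speakers

    table = []
    for major, minors in subtotals.items():
        ranked = sorted(minors.items(), key=lambda kv: kv[1], reverse=True)
        table.append((major, sum(minors.values()), ranked))
    table.sort(key=lambda entry: entry[1], reverse=True)
    return table


for major, total, minors in family_table(ROWS):
    print(f"{major}  {total:,}")
    for minor, subtotal in minors:
        print(f"    {minor}  {subtotal:,}")
```

Fed with one row per language instead of one row per subfamily, the same function would recompute the family totals whenever a language figure changes, which is the "dynamically linked" behaviour asked for.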

Recommend Archiving

Hello all. Would anybody be opposed to me archiving most of this talk page? It looks like we should keep the language section and sections created in 2009. Any others you would like me to keep visible? Let me know. – Novem Lingvae (talk) 13:39, 3 February 2009 (UTC)

Done. – Novem Lingvae (talk) 13:42, 11 February 2009 (UTC)
  1. ^ "ethnic population"; SIL divides what is considered "Hindi" by other sources into numerous sub-languages. SIL's "Hindi" is Khariboli only.