User:Ziko/Handbook-General

Handbook of Multilingual Wikipedia



Part A. General Issues

Part B. Language Editions:
B 1 Indo-European Languages
B 2 Other Languages

Part C. Auxiliaries


This part of the HMW deals with general issues concerning the language editions of Wikipedia, not with individual language editions as such.



Conditions under which a Wikipedia language version exists

The free encyclopedia Wikipedia is available in over 250 language editions, although the term "edition" may be misleading. It is not a fixed body of text translated into other languages, but over 250 individual Wikipedias. Although the basic concept is always the same, differing rules and practices gradually lead to significant differences, both in the content of articles and in the choice of topics that receive articles.

A language edition comes about when interested people submit a proposal to the Wikimedia Foundation. A committee examines whether the language is, at least in principle, "eligible". Then, on a separate website, the Incubator, the interested parties must demonstrate that they are actually able to create encyclopedic content. Only then is the language edition given the green light; the final decision on its establishment is made by the Board of the Wikimedia Foundation.

Wikipedia is something of a sociolinguistic experiment: Provide a linguistic community with free web space and the concept for an encyclopedia, and you see the results. The strength of a language version may also say something about a language community's text production capability.

The language community of an incipient Wikipedia

The first condition for the existence of a language version is the existence of a language community. The following factors support the growth of a Wikipedia:

  • a large number of speakers
  • many speakers with a high educational level
  • favorable economic conditions
  • leisure time
  • widespread access to the Internet, preferably at home via a flat-rate pricing model
  • an encyclopaedic tradition, which means the existence of role models and familiarity with this means of presenting information; the particular language is already being used for purposes similar to those of Wikipedia
  • freedom of speech

By itself, a language community's total number of speakers means little. From the total must be subtracted those people who cannot contribute to a Wikipedia at all: minors, some sick or disabled people (for example, dementia patients), prison inmates and the illiterate. Furthermore, contributing requires a range of skills, such as confidence in one's own command of the language, computer literacy and some social skills. Moreover, not everyone is equally eager to contribute, unpaid, to a free encyclopedia. All this considerably reduces the share of people who might participate in the project. This group with "Wikipedia potential" should be the target group of promotional activities that try to recruit new authors.
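The subtraction described above can be written as a simple chain of reductions. The following Python sketch is only an illustration: the parameter names and the three shares are assumptions introduced here, and the handbook gives no actual figures for them.

```python
def wikipedia_potential(total_speakers: int,
                        unable_share: float,
                        skilled_share: float,
                        willing_share: float) -> float:
    """Roughly estimate how many speakers have "Wikipedia potential".

    All shares are fractions between 0 and 1 supplied by the caller;
    they are illustrative parameters, not measured values.
      unable_share:  speakers who cannot contribute at all
                     (minors, the illiterate, prison inmates, ...)
      skilled_share: of the remaining speakers, those with enough
                     language, computer and social skills
      willing_share: of those, the people eager to contribute unpaid
    """
    for share in (unable_share, skilled_share, willing_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie between 0 and 1")
    able = total_speakers * (1.0 - unable_share)  # subtract those who cannot contribute
    return able * skilled_share * willing_share   # keep only the skilled and willing


# Hypothetical figures only: one million speakers, 40 % unable to contribute,
# 25 % of the rest sufficiently skilled, 1 % of those willing.
print(round(wikipedia_potential(1_000_000, 0.40, 0.25, 0.01)))  # prints 1500
```

Multiplying the shares makes the point of the paragraph visible: even generous assumptions at each step leave only a small fraction of the total number of speakers.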

The character and resources of a language community determine its "Wikipedia potential". If we take as a reference point a language such as German, with a large, well-educated population of native speakers who use a standardized form of the language, then the Wikipedia potential of the following types of language community is proportionally smaller or larger:

  • With Third World languages, the Wikipedia potential is very small, because many people who might like to participate lack, among other things, an Internet connection and the necessary leisure time.
  • Dialects, for example those spoken in dialect regions of Germany where many speakers have less formal education than average, also have a smaller potential. Moreover, dialect speakers are not used to the formal task of writing an encyclopedia in their dialect.
  • Planned or constructed languages and Latin: self-selected communities, which one joins voluntarily out of linguistic interest, are generally more committed to education than ethnic communities; accordingly, their Wikipedia potential is great. These Wikipedia editions are therefore often more advanced than their rather small language communities would lead one to expect.

Law and technology

One of the Wikimedia servers in Paris

The operator of all language versions of Wikipedia is the Wikimedia Foundation, a charitable organisation headquartered in the United States. It collects the donations needed to run the servers. This relieves the emerging Wikipedia communities of a great deal of work and many concerns.

Principles of Wikipedia

A language edition is expected to conform to the principles of the first (English-language) Wikipedia, although occasional violations apparently go unpunished. The main principles relevant here are:

  • Wikipedia is about creating an encyclopedia, nothing else; it is a collection of certain content in a certain text type.
  • The contents of Wikipedia (texts, pictures, etc.) must be "free", which requires a "free license". Traditional encyclopedias, by contrast, claim copyright.
  • The ban on "original research" should in fact already be covered by the characteristics expected of an encyclopedia.
  • The same applies to the principle of neutrality, though it is good to see it stated explicitly.

Linguistic delimitation

When applying to the Wikimedia Foundation for a language edition, it is important to define exactly which language the new edition will cover. Especially for languages without adequate language planning, the name chosen for the edition or the adoption of specific standards can exclude or include varieties of the language. For example, nds.WP is meant to cover all of Northern Germany where Low German is (or historically was) spoken. The adoption of a particular orthography, however, excluded a dialect widespread in Westphalia.

Wikipedia community

Israeli Wikimedia meeting, Herzliya, 2006

Wikipedians use the term "Wikipedia community" for those who are involved in a given language edition; one can also speak, in the plural, of the communities of the different language editions or of the sister projects. There is no precise definition.

In theory, one could equate joining the community with registering an account. This would exclude unregistered users, justifiably, because the vast majority of them make only a few edits. Belonging to the community of a language edition also requires some regularity of participation, for example, looking at one's own user talk page at least once a week. A more precise definition of a regular user is still pending. One suggestion:

  • Someone who has made at least one edit within the past week;
  • whose first edit was at least six months earlier;
  • who has made at least ten edits in total (based on Erik Zachte's definition of "Wikipedians" in the Wikimedia statistics);
  • and who is in fact able to speak that language, as indicated by their edits, by information on their user page or by their contributions to discussions.
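The four criteria of this suggested definition can be checked mechanically once each account's activity has been summarized. The sketch below is a hypothetical illustration: the `UserActivity` record is invented here, "six months" is approximated as 183 days, and the language criterion is reduced to a yes/no flag that a human would have to set.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class UserActivity:
    """Hypothetical summary of one account's activity in a language edition."""
    first_edit: date
    last_edit: date
    total_edits: int
    speaks_language: bool  # judged from edits, user page or discussion contributions


def is_regular_user(user: UserActivity, today: date) -> bool:
    """Apply the four criteria of the suggested definition of a regular user."""
    edited_within_week = today - user.last_edit <= timedelta(days=7)
    # "At least six months" is approximated here as 183 days (an assumption).
    active_half_year = today - user.first_edit >= timedelta(days=183)
    enough_edits = user.total_edits >= 10
    return (edited_within_week and active_half_year
            and enough_edits and user.speaks_language)


veteran = UserActivity(date(2007, 6, 1), date(2008, 3, 28), 25, True)
newcomer = UserActivity(date(2008, 1, 15), date(2008, 3, 30), 40, True)
print(is_regular_user(veteran, date(2008, 4, 1)))   # True
print(is_regular_user(newcomer, date(2008, 4, 1)))  # False: first edit too recent
```

All criteria are combined with "and", so failing any single one (for instance, no edit in the past week) excludes the account.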

It is also necessary to discuss the nature of participation: whether only edits count (and if so, which edits) or whether activities outside Wikipedia count too (for example, lobbying and promotional work). Tasks of Wikipedians or Wikimedians include:

  • The actual composition of articles,
  • The proofreading and incremental improvements,
  • Formatting and organization,
  • Choice of the most specific categories appropriate to the subject matter,
  • Insertion of suitable images found in a search of Wikimedia Commons or after uploading their own photographs or vector illustrations to the Commons,
  • The creation or improvement of help pages, etc.,
  • Assistance to new Wikipedians through the mentoring program and answering other questions (support team),
  • Promoting and enforcing "law and order", for example by blocking vandals or through referral to arbitration,
  • Public relations, in the sense of making Wikipedia known and influencing public opinion or important interest groups,
  • The recruitment of new Wikipedians, and
  • Other activities to improve Wikipedia or to further the free growth of knowledge.

At the other end is exit from the community, through death, exclusion (being banned for rule violations) or withdrawal, whether by choice or prompted by a change in life circumstances. Apart from a core of permanent users, considerable fluctuation in the Wikipedia community can be expected.

Internal differentiation of the community

A more detailed study should pay attention to the internal differentiation of a community. For language versions, and especially smaller ones, mastery of the language is of great importance. This suggests a division into Members of the Linguistic Community on the one hand and Foreign Helpers on the other.

Members of the Linguistic Community master the language concerned well enough to contribute encyclopedic content, which is the core business of the community. If necessary, one can further distinguish between native, second-language and foreign-language speakers.

Among second- and foreign-language speakers, one can ask whether an idealistic motive plays a role, such as support for a Wikipedia in a less-resourced language. This is the difference between a German living in East Africa who shows a special commitment to the Swahili Wikipedia and a German living in the Netherlands who occasionally contributes to the Dutch edition. Foreign-language speakers are basically members of the linguistic community, even if they make more linguistic errors than native speakers. They are also able to take part in discussions.

Users GerardM (left) and Siebrand, WCN 2008 in Utrecht

Foreign Helpers, by contrast, speak the language concerned barely or not at all; the borderline lies between language levels 2 and 3. Some appear only sporadically, mainly to contribute content that does not require language skills, such as web links or photos (often taken by themselves). Other Foreign Helpers are active in a particular area, which leads them to edit in many language editions. Well-known Foreign Helpers are Siebrand, whose Siebot automatically sets interwiki links, and GerardM, who among other things works on translating MediaWiki interface messages.

Comparability of language editions

Comparisons of Wikipedia language editions mostly rely on the statistics compiled by Erik Zachte (cited here as of January 2008, unless otherwise indicated), which can be regarded as the official statistics of the Wikimedia Foundation.

Erik Zachte on the WCN in Utrecht, 2008

These statistics, however, are not readily suited to comparisons, because it is hard to draw conclusions from them. In general, it is difficult to infer the encyclopedic value of a text from purely informetric data.

The number of articles is the figure most often used when language editions are ranked; this is why some Wikipedians are eager to raise it by creating articles artificially. It would therefore be more than questionable to use such data uncritically for comparisons.

So-called bots make it easy to raise the number of articles. A bot is a small computer program that carries out edits to a Wikipedia automatically. A bot can, for example, help with technical formatting or the correction of typos. One typical task of bots is linking an article with its counterpart in another language edition, such as "Elephant" in the English Wikipedia with "Olifant" in the Dutch one. These links are called interwiki links. Bots can be used not only for secondary cleanup work, however, but also to create articles schematically from database information ("bot-generated articles"). Such serial articles are well suited to raising the number of articles.
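The interwiki task mentioned above amounts to writing, into each article of a group of counterparts, one link to every other member of the group. The following sketch is a simplified, hypothetical illustration: it only builds the link lines in the `[[code:Title]]` wiki syntax and sorts them by language code, whereas real bots may order links differently and also read and save the pages.

```python
def interwiki_links(counterparts: dict[str, str], own_language: str) -> str:
    """Return interwiki link lines for one article.

    counterparts maps language codes to the titles of the corresponding
    articles, e.g. {"en": "Elephant", "nl": "Olifant"}. The page's own
    language is skipped; the remaining links are sorted by language code
    (a simplification chosen for this sketch).
    """
    return "\n".join(
        f"[[{code}:{title}]]"
        for code, title in sorted(counterparts.items())
        if code != own_language
    )


elephant = {"en": "Elephant", "nl": "Olifant"}
print(interwiki_links(elephant, "en"))  # [[nl:Olifant]]
print(interwiki_links(elephant, "nl"))  # [[en:Elephant]]
```

Because the same set of links must be added to every edition's copy of the article, each new language version of a topic triggers edits in all the others. This helps explain why such work is left to bots.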

The statistics include a feature showing the share of bot edits in the total number of edits per Wikipedia. For small Wikipedias this share tends to be quite high, often forty or fifty percent or more, possibly as a consequence of interwiki linking. In general, however, bot edits are counted together with human edits in the statistics, so one cannot tell which results come from human activity and which from bots. This is a general problem of the Wikimedia statistics (the Poplar Bluff syndrome, see below).
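Separating human from bot activity, as the paragraph above calls for, presupposes that every edit carries a bot marker. The sketch below assumes a hypothetical `Edit` record with an `is_bot` flag (for example, taken from the account's bot flag) and computes the bot share and the human edits separately.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical edit record; is_bot would come from the account's bot flag."""
    user: str
    is_bot: bool


def bot_edit_share(edits: list[Edit]) -> float:
    """Return the share of bot edits among all edits (0.0 to 1.0)."""
    if not edits:
        return 0.0
    bot_edits = sum(1 for edit in edits if edit.is_bot)
    return bot_edits / len(edits)


def human_edits(edits: list[Edit]) -> list[Edit]:
    """Keep only human edits, so they can be counted and analysed separately."""
    return [edit for edit in edits if not edit.is_bot]


sample = [Edit("Alice", False), Edit("ExampleBot", True),
          Edit("ExampleBot", True), Edit("Bob", False)]
print(bot_edit_share(sample))    # 0.5
print(len(human_edits(sample)))  # 2
```

The sketch has an important limitation. A single bot flag cannot reveal articles that a person created with an unflagged bot or from a template, which is the Poplar Bluff problem.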

Articles

A Wikipedia article is a technical unit that can be counted fairly easily. The number of articles is used as the main or sole criterion for ranking purposes.[1] On his statistics pages, Erik Zachte stresses that this number is not so important, because some Wikipedias have only very short articles while others have fewer but longer ones. He also mentions the problem of "bot-generated articles". That is why he also ranks editions by the number of internal links,[2] which does not actually solve the latter problem, because pseudo articles contain internal links, too.

The "bot-generated articles" are created automatically from database information. Typically, the users behind these bots create articles about cities or countries. Such articles contain only very basic, schematic information about the town, mostly a single sentence and an infobox.

Because they are so short, they are also called "geographical stubs", although a "geographical stub" can also be a hand-written, encyclopedic article. Besides, the idea of a stub implies that the article can and eventually will be expanded. This usually serves as an excuse, because the user must be aware that expanding such a bot-created stub is at best a matter of the very distant future. These articles are therefore called mini articles here rather than stubs.

"Poplar Bluff" and the discussion about the Volapük Wikipedia

A variant of the geographical mini article is much longer and at first glance looks like a natural article, but it too was created automatically by a bot. Once again, entirely schematic information is processed into whole sentences: "[name] is a city in [County], [state]. Since the census of [year], the total population includes [number] inhabitants." There are also sections with basic geographic or demographic data.

Many such articles can be found in the Wikipedia in Volapük, a planned (or "constructed") language understood by at most thirty people worldwide. Named after the article Poplar Bluff (about a small town in Missouri), they can be called articles of the "Poplar Bluff" type.
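The template pattern quoted above shows how little effort a Poplar Bluff-type article takes once the data is available. The sketch below uses Python's `string.Template` with a hypothetical template modelled on that pattern and invented example rows; it is not the code of any actual Volapük or Low German bot.

```python
from string import Template

# Hypothetical sentence template modelled on the Poplar Bluff pattern.
CITY_TEMPLATE = Template(
    "$name is a city in $county, $state. "
    "Since the census of $year, the total population includes $population inhabitants."
)


def generate_articles(records: list[dict[str, str]]) -> dict[str, str]:
    """Turn database rows into article texts, one per row, keyed by title."""
    return {row["name"]: CITY_TEMPLATE.substitute(row) for row in records}


# Invented example rows; a real bot would read thousands of census records.
rows = [
    {"name": "Exampleville", "county": "Sample County", "state": "Missouri",
     "year": "2000", "population": "1234"},
    {"name": "Testtown", "county": "Demo County", "state": "Missouri",
     "year": "2000", "population": "567"},
]
articles = generate_articles(rows)
print(len(articles))  # 2
print(articles["Exampleville"])
```

Every additional database row yields one more "article", so the article count grows with the size of the data set rather than with the work of the community.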

Geographical mini articles and Poplar Bluffs were discussed especially in autumn 2007 on Meta Wiki, the cross-project Wikimedia site. On September 21, someone called for the closure of the Volapük Wikipedia because a Volapük Wikipedian had created one hundred thousand articles of that type. A second proposal, dated December 25, requested the deletion of these articles.

Both times, many Wikipedians supported the Volapük Wikipedia, including the Low German Wikipedian Slomox, who claimed that vo.WP also contains many good articles. Slomox himself had created geographical mini articles and Poplar Bluffs en masse in the Low German Wikipedia.[3] The same applies to the discussants Maksim (Esperanto), Chabi (Catalan) and Eukesh (Nepali).

On January 1, 2008, Wikipedia founder Jimmy Wales commented that vo.WP does not support Wikipedia's mission of providing the sum of all human knowledge to everybody. After all, nobody speaks Volapük fluently, and creating articles by bot does not help people who are interested in Volapük either. Evidently vo.WP serves its writers rather than its readers. He said he would not welcome closing the language edition, but would welcome deleting these articles. In any case, vo.WP should not appear in the ranking of the largest Wikipedias.[4]

Pseudo articles and their dangers

Besides geographical mini articles and Poplar Bluffs, there are other types of articles whose encyclopedic character is highly questionable. Such pseudo articles include:

  • One-sentence articles that would be more appropriate for a dictionary. Example: "En Leicht iss en Funeral", the complete content of the article Leicht in the Pennsylvania Dutch Wikipedia (pdc.WP).
  • Articles resembling database entries, e.g. "De 02363 eß de Tellefoon Vüürwaal fun Datteln, en Wëßfal," ("02363 is the telephone area code of Datteln, in Westphalia") in the article 02363 of the Ripuarian Wikipedia (ksh.WP), Ripuarian being the dialects around the German city of Cologne. Other examples would be series of articles on chemical elements or small astronomical objects.
  • Articles in the wrong language. Example: the text of the article Varsseveld (a village in the Netherlands) in ksh.WP was simply copied from the Dutch Low Saxon Wikipedia (nds-nl.WP). More common, in some Wikipedias in languages of the Third World, are texts copied from the English Wikipedia. These articles were probably created in the hope that someone would translate them into the language concerned.
An example of vandalism at Wikipedia

Bot-generated serial articles, whether about geographical or other subjects, may make sense in the initial construction phase of a Wikipedia, because they free users from monotonous work. This is particularly true for articles about cities and municipalities within the language area of the edition concerned. It would therefore be wrong to suspect manipulative intentions too hastily. The mass creation of articles about cities outside that language area, however, is probably driven by a misplaced concern for prestige on the part of some Wikipedians.

The danger of pseudo articles is that they make a poor impression on outsiders and can deter potential authors. Moreover, not only their creation costs work, but also their aftercare, such as expanding them and removing vandalism.

"Real" articles

Articles with encyclopedic content that meet certain minimum requirements are called "real" articles here. Using the Random article feature, one can draw a sample and calculate the percentage of "real" articles per language edition.

In a small study of more than fifty language editions carried out by the author in March and April 2008, it appeared that by this measure most language editions lose at least ten percent of their articles. For large Wikipedias this loss is rather small, but for smaller ones it is often quite high; in the Corsican Wikipedia it even reached ninety percent.
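The sampling procedure described above can be sketched as follows. This is a hypothetical illustration, not the author's actual method. Drawing with replacement mimics pressing Random article repeatedly. The `is_real` callable stands in for the manual judgement of each sampled article. The margin is a simple normal-approximation 95 % interval, chosen here for brevity; it becomes unreliable for small samples or for shares near 0 or 1.

```python
import math
import random
from typing import Callable, Optional, Sequence


def estimate_real_share(titles: Sequence[str],
                        is_real: Callable[[str], bool],
                        sample_size: int,
                        seed: Optional[int] = None) -> tuple[float, float]:
    """Estimate the share of "real" articles from a random sample.

    titles:      all article titles of one language edition
    is_real:     hypothetical stand-in for the manual judgement of a sampled
                 article (encyclopedic content, minimum requirements)
    sample_size: number of random draws, with replacement like Random article
    Returns the sample share and the half-width of an approximate 95 %
    confidence interval (normal approximation).
    """
    if not titles or sample_size < 1:
        raise ValueError("need at least one title and a positive sample size")
    rng = random.Random(seed)
    sample = rng.choices(list(titles), k=sample_size)
    share = sum(1 for title in sample if is_real(title)) / sample_size
    margin = 1.96 * math.sqrt(share * (1 - share) / sample_size)
    return share, margin


# Invented toy edition: 1,000 titles, of which (by construction) 300 are "real".
titles = [f"Article {i}" for i in range(1000)]
real = set(titles[:300])
share, margin = estimate_real_share(titles, lambda t: t in real, 200, seed=1)
print(f"{share:.0%} real (± {margin:.0%})")
```

The width of the interval shrinks only with the square root of the sample size. A sample of a few dozen random articles therefore separates a Corsican-like edition from a large one, but it cannot measure small differences precisely.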

Notes

  1. Wikimedia Statistics, last seen 2008-03-30.
  2. Wikimedia Statistics, last seen 2008-10-17 (quote below the chart).
  3. meta:Proposals for closing projects/Closure of Volapük Wikipedia, last seen 2008-05-15. In January 2007, Slomox had announced on nds.WP that he would create articles by bot, and he met no resistance.
  4. meta:Proposals for closing projects/Closure of Volapük Wikipedia, last seen 2008-05-15.