Wikipedia:Two alphabets


This page is for discussing automatic support for two alphabet versions (Cyrillic and Latin) of our Wikipedia.

Because people who do not know Belarusian play a significant role in building such support into MediaWiki, the discussion should preferably be held in English.

First of all, sorry for my bad English; I'm not a linguist :)

This page is about a feature that would let two automatically converted alphabet versions of be.wikipedia coexist, so that articles can be read and edited in both the Cyrillic and the Latin alphabet. It could also become useful for many other languages written in two alphabets (such as Serbian, Tatar, Ukrainian, etc.).

A test site that shows some basics of the two-way conversion is already online, thanks to User:Zhengzhu. If you see any obvious mistakes there, don't hesitate to mention them.

Some useful points.

  1. Belarusian has several writing systems, and all of them are at least nominally allowed in be:. There is a difference, though: while taraszkevica and narkamauka cannot easily be converted into each other, lacinka is fully convertible. So I thought it would be good to have a Lacinka version of the whole Wikipedia for those who prefer this script, and to promote Lacinka itself (I myself prefer Lacinka to the other systems, but unfortunately it is underused).
  2. A feature like the one I want already exists: zh.wikipedia has conversion between the Simplified and Traditional scripts. So I thought it should not be difficult to implement one for be:. After I posted my suggestion to the wikitech-l mailing list, User:Zhengzhu (the creator of the zh: conversion system) responded, and we began to discuss the issue. The first result is the test site, with some conversion rules applied.
  3. So the problem of reading in Lacinka seems to be (mostly) solved; the most interesting remaining problem is implementing editing. It is not so simple and needs to be discussed.

Here are the points about this conversion feature that I have thought of and want to stress; if anyone disagrees, please discuss.

  1. First of all, nothing should change for users who prefer Cyrillic. Everything everywhere on the site should stay as it was unless they switch to Latin. The other points are built on this one.
  2. Nothing should be changed in the database.
  3. Those who prefer the Latin script should be able to switch to it on each page and in their user preferences. After switching they should see no Cyrillic at all anywhere on the site; everything should be converted.
  4. When someone clicks "edit" on a Latin-converted page, they should get the text converted to Lacinka in the edit box and be able to write in Lacinka; after they click "save", the text should be converted back to Cyrillic and saved to the database, so that for everyone else it makes no difference which script the page was edited in. The main problem here is strings in articles that were Latin to begin with, but it is surely solvable.
    1. How could Belarusian text be distinguished from non-Belarusian text in a generic way? Sure, we could use templates to limit the scope of the conversion algorithm, but what about the current situation? The same goes for Cyrillic text in other languages (like Russian or Ukrainian): it must remain Cyrillic during editing. --EugeneZelenko 14:45, 12.04.2005 (UTC)
This is only my speculation: most cases should be distinguishable using a Belarusian word list. Those that cannot be distinguished this way (hopefully only a small number) can be handled by special markup in the source text.
For editing, the Chinese Wikipedia currently stores the text in mixed form. If this model is copied directly to Belarusian, people can edit in Cyrillic, in Latin, or in both, and no conversion happens when writing to the database. Conversion happens only when the page is rendered. But Monk has a different opinion on this: he thinks the text should all be converted to Cyrillic before it is written to the database. I have no knowledge of the language itself, so I don't know which approach is more appropriate. -Zhengzhu 16:21, 12.04.2005 (UTC)
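
To make the word-list idea above a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the word list `BELARUSIAN_WORDS` and the function `classify` are invented for this example, and a real word list would have to be far larger and cover inflected forms.

```python
import re

# Hypothetical, tiny word list in Lacinka (assumption for illustration only).
BELARUSIAN_WORDS = {"mova", "chata", "horad"}

# One or more Unicode letters (no digits, no underscore).
TOKEN = re.compile(r"[^\W\d_]+")


def classify(text):
    """Return (token, is_belarusian) pairs for every word in `text`.

    A token counts as Belarusian only if its lowercase form is in the
    word list; everything else would be left unconverted on save.
    """
    return [(tok, tok.lower() in BELARUSIAN_WORDS) for tok in TOKEN.findall(text)]


print(classify("IBM chata"))  # [('IBM', False), ('chata', True)]
```

Tokens that are not in the list are the cases that would still need special markup in the source text.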

I hope these points are enough that no one is against the feature altogether; if you don't want it, you need not even notice it.

Here are some suggestions on how to implement it. They are quotes from my e-mail correspondence with User:Zhengzhu (he agreed to make it public). These suggestions are to be discussed in order to reach consensus.

  • About how everything should look. I think all data should be stored in Cyrillic (in the database, and the messages in the language file, so no separate language-be-cyrillics and language-be-latin files will be needed). If the user asks for the Cyrillic page (the default), the page is shown without any conversion. If they ask for the Latin page, everything visible on the page is converted to Latin (including the title, content, system messages and link names, but not the links themselves, template names and so on). If a user on Latin edits a page, they should be shown the version converted to Latin (everything, including links, templates and so on; imagine they have no Cyrillic support at all). When they press "Save", the page goes through a Latin-to-Cyrillic conversion (which does not convert interwiki links, initially Latin parts and so on; how this could be done is an open problem) and is saved in the database. -- Monk
  • At http://zh.wikipedia.org/ you'll notice three additional tabs at the right-hand end of the top tab bar: "不转换" for unconverted display, "简体" for simplified conversion, and "繁体" for traditional conversion. These conversions don't change the source text or alter editing; they apply to the rendered page display and the user interface only. The implementation also supports markup for manually specifying the conversion of a particular word or phrase in situations where the mapping table is not sufficient. For example, -{zh-cn:foo; zh-tw:bar}- will show up as "foo" when "简体" (Simplified Chinese) is selected, and as "bar" when "繁体" (Traditional Chinese) is selected; -{foo}- will show up as "foo" no matter which language variant is selected (this should be sufficient for things like quotes, math, etc.). There is also a way of customizing the mapping table through a page in the MediaWiki: namespace. -- Zhengzhu
  • My suggestion (to repeat and clarify): when a user is on Latin and clicks "edit", they see "cyrillics-to-latin-for-edit" converted text in the edit box. This conversion differs from the plain cyrillics-to-latin one in one major respect: it marks every initially Latin string with special marks (for example " [[<lat>en:Interwiki</lat>]] "). At save time the parser does a "latin-to-cyrillics" conversion, which converts everything except the characters between the marks, removes the marks and saves the Cyrillic text in the database. -- Monk
  • Say I am editing in Latin, and I write something about IBM, the company, which was not mentioned in the article before. When saving the article, there has to be some way to tell the system that "IBM" does not need to be converted to Cyrillic, maybe using the mark that you suggested. There may be other problems like this, and I want to make sure most if not all of them are considered before we commit to a specific solution. -- Zhengzhu
  • At ZH we have the same concern about "harmful" changes, which is one of the reasons why we chose to store the source of the article in mixed form. The concern was this: if some conversion is made before writing to the database, the conversion may be wrong (either there are cases not considered when building the conversion table, or the code is buggy). So what is written to the database may not be what the contributor intended. -- Zhengzhu
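
The marked round trip discussed in the suggestions above can be sketched roughly as follows in Python. This is only a sketch under loud assumptions: the conversion table is a toy lowercase subset that ignores softness and other context-dependent Lacinka rules, and the names `to_latin_for_edit` and `to_cyrillic_on_save` and the pattern used to detect "initially Latin" strings are invented for this example. Only the `<lat>` mark itself comes from the suggestion above.

```python
import re

# Toy table (assumption): a small lowercase subset that ignores softness
# and other context-dependent rules of real Lacinka.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "h", "д": "d", "ж": "ž", "з": "z",
    "к": "k", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ў": "ŭ", "ф": "f", "х": "ch", "ц": "c", "ч": "č",
    "ш": "š", "ы": "y", "э": "e",
}
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}

# Hypothetical detector for "initially Latin" strings in the stored text.
LATIN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9:]*")
# Protected spans in the edit box.
MARKED = re.compile(r"<lat>(.*?)</lat>", re.S)
# Longest keys first, so that "ch" is tried before "c".
LAT_TOKEN = re.compile(
    "|".join(re.escape(k) for k in sorted(LAT_TO_CYR, key=len, reverse=True))
)


def to_latin_for_edit(cyr_text):
    """Mark initially Latin strings, then convert Cyrillic letters to Latin."""
    marked = LATIN_RUN.sub(lambda m: "<lat>" + m.group(0) + "</lat>", cyr_text)
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in marked)


def _lat_to_cyr(fragment):
    """Convert an unprotected Latin fragment back to Cyrillic."""
    return LAT_TOKEN.sub(lambda m: LAT_TO_CYR[m.group(0)], fragment)


def to_cyrillic_on_save(edit_text):
    """Convert everything outside <lat>...</lat> and drop the marks."""
    out, pos = [], 0
    for m in MARKED.finditer(edit_text):
        out.append(_lat_to_cyr(edit_text[pos:m.start()]))
        out.append(m.group(1))  # protected text is kept verbatim
        pos = m.end()
    out.append(_lat_to_cyr(edit_text[pos:]))
    return "".join(out)


stored = "[[en:Interwiki]] хата"
in_edit_box = to_latin_for_edit(stored)
print(in_edit_box)                       # [[<lat>en:Interwiki</lat>]] chata
print(to_cyrillic_on_save(in_edit_box))  # [[en:Interwiki]] хата
print(to_cyrillic_on_save("<lat>IBM</lat> mova"))  # IBM мова
```

The sketch also shows the concern raised above: any Latin string the editor forgets to mark is converted on save, so a wrong table or a missing mark changes what ends up in the database.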

--Monk 16:30, 11.04.2005 (UTC)