Request for comments/Help for starting project terminologies
I've noticed some languages have started to build their own wordlists. That is good and fine, but how it is done is not in the spirit of translatewiki.net. Like message documentation it should be a collaborative work where anybody can enjoy the results.
For this reason I'd like to start a project whose aim is to build a monolingual wordlist/terminology in English for (some of) our projects. This kind of terminology would greatly help translators in choosing correct translations. It is also much easier to translate a premade terminology than to build one from scratch, since more than half of the work is already done (collecting the terms and defining them, maybe even adding some relationship trees).
We don't currently have software to work with this kind of data, but I think we can start with simple word lists. Having actual data here is a good driving force for creating applications that use it.
I think this is a great idea. Some sort of glossary is often much needed, and it's especially so now that people here can translate multiple projects and don't know all of them well. What do you mean by "simple word list", though?
Some projects don't always use a consistent terminology and this doesn't help; we should probably try to fix this as well, and perhaps some simpler process is needed.
I think that it would be even more useful and important to define the "official translations" for each project and language, so that people know how to translate technical words and use a consistent translation. This could perhaps be done by choosing the most used words or groups of words in the source language and then finding their most common translations with our internal "translation memory", leaving some room for manual correction and discussion.
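As a rough illustration of the "most common translation" idea, here is a minimal Python sketch. It assumes the translation memory can be exported as (source, translation) pairs and that a few candidate translations are already known; the function name `most_common_translations` and the sample English/German pairs are invented for illustration, and plain substring matching would not cope with inflected forms.

```python
from collections import Counter


def most_common_translations(term, candidates, memory):
    """Count how often each candidate appears in the translations
    of source messages that contain `term` (case-insensitive)."""
    counts = Counter()
    term_lower = term.lower()
    for source, target in memory:
        if term_lower not in source.lower():
            continue
        target_lower = target.lower()
        for candidate in candidates:
            if candidate.lower() in target_lower:
                counts[candidate] += 1
    return counts.most_common()


# Invented sample data, not taken from any real translation memory.
memory = [
    ("Watch this page", "Diese Seite beobachten"),
    ("Delete page", "Seite löschen"),
    ("Page not found", "Seite nicht gefunden"),
    ("Edit this page", "Dieses Blatt bearbeiten"),
]

ranking = most_common_translations("page", ["Seite", "Blatt"], memory)
print(ranking)  # [('Seite', 3), ('Blatt', 1)]
```

The ranking would only be a starting point; the manual correction and discussion step would still decide the agreed term.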
This is a very good idea, as it can sometimes be very hard to maintain a consistent terminology in a project, especially when there are several translators working on it.
As for how it could be done, as I see it the ideal way would be for each project to have a terminology list in English that could then be translated into the different languages, and then be used as a kind of glossary when translating.
And what would be really nice is if the software could automatically detect words from the glossary and show them together with their translations in the translation window.
Automatically listing glossary items in translation pages would require a complete list of grammar forms of each glossary item of the source language. We start with English, but there are translators using translations other than English as their source languages.
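To make the point about grammar forms concrete, here is a minimal Python sketch of glossary detection. It assumes each glossary item comes with an explicit list of its forms; the `GLOSSARY_FORMS` table and the function name `find_glossary_terms` are hypothetical, and nothing here reflects how the Translate extension actually works.

```python
import re

# Hypothetical glossary: base term -> every grammatical form we know of.
GLOSSARY_FORMS = {
    "page": ["page", "pages"],
    "watchlist": ["watchlist", "watchlists"],
    "revert": ["revert", "reverts", "reverted", "reverting"],
}


def find_glossary_terms(message, glossary_forms=GLOSSARY_FORMS):
    """Return the base terms whose listed forms occur as whole words."""
    words = set(re.findall(r"\w+", message.lower()))
    return sorted(
        term
        for term, forms in glossary_forms.items()
        if words.intersection(forms)
    )


print(find_glossary_terms("Reverted edits to these pages"))  # ['page', 'revert']
```

A form that is missing from the list is simply not found, which is why the list would have to be complete for each source language.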
I was bold and started Terminology page. We should probably start by collecting existing word lists in this wiki (or elsewhere).
Sure. We could then see if there's some overlap and they can be combined in some way.
With regard to Terminology/mediawiki, it's really a shame that there's no MediaWiki glossary (apparently, only w:en:Wikipedia:Glossary and Wikibooks:Help:Glossary), and we should create one, but probably not here, rather on mw:, where it would be of general use; on the contrary, I don't think that creating a complete MediaWiki glossary in all languages is really feasible.
The glossaries we need here should include only those terms whose translation is not obvious: terms with a very technical meaning, or words without a true equivalent, which are hard to translate or end up being translated in a number of different ways; such terms are a subset of the complete glossary + the list of other used words (which may have a clear meaning in English but still be hard to translate).
We could pick up from the general English glossary and word list the words we think are hard to translate and start to make a huge table with a row per term and a column per language; perhaps not all languages will need to establish an "official translation" for all/the same terms, but if there's some overlap it will be useful (and should address Purodha's concern).
A glossary entry must list at least these things:
- a word, phrase or expression,
- a context (usually a software product or an extension or group of those, but may be a logical area in a program, or "unspecified", where the more specific context overrides the less specific in search queries),
- a language plus possibly an area and a script.
Do we need more?
Definition. Without a definition it is not easy to spot overlapping and ambiguous terms, or to actually make the messages clearer. I also think we should start with a monolingual terminology.
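The fields proposed so far (expression, context, language with optional area and script, plus a definition) could be sketched as a record like the one below. This is only an assumption about how such data might be laid out: the class name `GlossaryEntry`, the `lookup` helper, the context name "example-extension" and the sample definitions are all hypothetical. The lookup shows one way the more specific context could override "unspecified".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryEntry:
    expression: str                # a word, phrase or expression
    context: str                   # product, extension, area, or "unspecified"
    language: str                  # language code, e.g. "en"
    definition: str
    area: Optional[str] = None     # optional regional variant
    script: Optional[str] = None   # optional writing system


def lookup(entries, expression, context):
    """Prefer the entry for the given context, else fall back to "unspecified"."""
    matches = [e for e in entries if e.expression == expression]
    for wanted in (context, "unspecified"):
        for entry in matches:
            if entry.context == wanted:
                return entry
    return None


# Invented sample entries.
entries = [
    GlossaryEntry("check", "unspecified", "en", "to verify"),
    GlossaryEntry("check", "example-extension", "en", "to mark a checkbox"),
]

print(lookup(entries, "check", "example-extension").definition)  # to mark a checkbox
print(lookup(entries, "check", "other-product").definition)      # to verify
```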
I agree that a terminology would be useful and I also agree that we should start with a monolingual English (or in future any other source language of a project) glossary for each project.
Eventually we will need to devise a way for each language to build a bilingual glossary out of the main monolingual glossary efficiently. Having the monolingual glossary to build on is going to cut out a lot of work in setting up bilingual glossaries and it will mean that language teams are not doubling-up on the work of defining terms. It will also mean that all glossaries will be compatible.
I think it will enhance the projects themselves too, to have a glossary for its users. If we can produce a good glossary here, then we should be able to return this to the project as a service to their users. Our task is to make or improve on the English glossary already made at the project, with advice from the project developers and being reviewed by them. Then, we can create bilingual glossaries of the terms themselves, for our own use. Optionally, we could also translate the definitions and supply the project with translated glossaries, if our translators are willing to translate the definitions also. Translated definitions are going to be useful here in the future in the languages which are used as fallback by other languages, if we acquire translators who work primarily with the fallback instead of English.
I amended Terminology with aims and guidance what we are trying to do and how.
I agree with having definitions, at least where needed for disambiguation; and of course, they would not hurt otherwise. I agree to start with English. I do not mind adding other languages as soon as an original glossary entry is stable, but we should carefully design a method to expand glossary entries
- without confronting users with an endless list of all possible language translations,
- allowing users to view translations in selected languages in parallel,
- allowing several equivalent translations, or synonyms,
- allowing definitions or explanations to be translated, too, so as to support translators who understand less or no English.
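One possible shape for an entry that meets the four requirements above is sketched below in Python. The layout, the function name `view` and every value in `entry` are placeholders invented for illustration, not real translations or an agreed format.

```python
# Hypothetical entry: per-language lists allow several equivalent
# translations, and definitions can be translated as well.
entry = {
    "term": "watchlist",
    "definitions": {
        "en": "English definition goes here",
        "de": "German definition goes here",
    },
    "translations": {
        "de": ["de-translation-1", "de-translation-2"],
        "fi": ["fi-translation-1"],
        "nl": ["nl-translation-1"],
    },
}


def view(entry, languages):
    """Return only the translations and definitions for the selected languages."""
    return {
        "term": entry["term"],
        "translations": {
            lang: entry["translations"][lang]
            for lang in languages
            if lang in entry["translations"]
        },
        "definitions": {
            lang: entry["definitions"][lang]
            for lang in languages
            if lang in entry["definitions"]
        },
    }


selected = view(entry, ["de", "fi"])
print(sorted(selected["translations"]))  # ['de', 'fi']
```

Filtering by a chosen set of languages keeps the display short while the stored entry can still hold every language.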
My first idea for creating a glossary had been to set up a namespace "Glossary:" with translatable pages holding the expressions, and probably /qqq pages holding explanations, but that no longer seems sufficient.
Just so we're all clear, because I really don't see it mentioned too clearly: the aim is not only that we have a list of the English terms, but that every English term has a translation into all the different languages, which the translators can work from!
Yes, but a good monolingual terminology is a prerequisite for that. There is no point translating the terms before that.
The problem with a monolingual wordlist is that it doesn't immediately encourage everyone's participation wiki-style. Perfecting a monolingual wordlist takes forever and will never succeed completely.
Defining everything in terms of a single language (English) also creates avoidable problems. Many English terms actually describe several different concepts depending on context. For instance, "to check" can have three meanings: 1. to mark, select; 2. to verify; 3. to examine, look up. In most (or all) other languages these are all translated with different words. So they should be three entries rather than one (or perhaps more; I might have missed something).
I don't think it's actually necessary to start off with a monolingual glossary and I disagree that translating is pointless before the monolingual glossary is complete. On the contrary, starting off multilingual would make problems such as the aforementioned example immediately obvious so they don't need to be fixed later.
I think this would be a good moment to mention OmegaWiki, which is basically a large-scale implementation of the above idea: each concept ("DefinedMeaning") has its own entry in the database with its translations ("expressions") in as many languages as possible. The key is that the central data unit is the concept a.k.a. DefinedMeaning, not a word in any particular language.
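To illustrate what "concept as the central unit" means for the "to check" example, here is a small Python sketch. It is not OmegaWiki's actual schema or API; the `Concept` class and the `concepts_for` helper are hypothetical names, and only the three English meanings from the comment above are used as data.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    definition: str
    # language code -> list of expressions for this concept
    expressions: dict = field(default_factory=dict)


# One entry per meaning, rather than one entry per English word.
concepts = [
    Concept("to mark or select", {"en": ["check"]}),
    Concept("to verify", {"en": ["check"]}),
    Concept("to examine or look up", {"en": ["check"]}),
]


def concepts_for(word, language, concepts):
    """Return every concept that `word` can express in `language`."""
    return [c for c in concepts if word in c.expressions.get(language, [])]


ambiguous = concepts_for("check", "en", concepts)
print(len(ambiguous))  # 3
```

A word that maps to more than one concept is flagged as ambiguous straight away, and each concept can later receive its own expressions in other languages.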
In order to make our central glossary we could either simply use OmegaWiki which is ready and functioning now, and already contains many computer terms -- or the admins could decide to install the Wikidata extension, which it uses to do its magic, here on this site so we can roll our own.
Even without using them or their software, I think OmegaWiki is a powerful example of how a truly multilingual glossary can work. If they can do it on that scale, surely a relatively minor thing like a software translation glossary is trivial by comparison.
Defining everything in terms of a single language (English) also creates avoidable problems.
That's why the terms should be defined.
I don't think it's actually necessary to start off with a monolingual glossary
We have to start somewhere, and I'm afraid that unless we direct attention we will never get anything into a useful state.
I think this would be a good moment to mention OmegaWiki,
I know about OmegaWiki, I even wrote my candidate's thesis about OmegaWiki and translatewiki.net. The conclusion was that it is not suitable for us. Terminology is different from what OmegaWiki is doing. And we are building a terminology, not a (multilingual) glossary. There are some ideas we could borrow, I don't deny that. Currently I'm exploring if we can take advantage of Semantic MediaWiki.
The problem with a monolingual wordlist is that it doesn't immediately encourage everyone's participation wiki-style. Perfecting a monolingual wordlist takes forever and will never succeed completely.
I also think that this is a bit of a worry. Would it help to break the job of writing the monolingual glossary, particularly the glossary for MediaWiki core, into sections of say 100 terms? After a monolingual section was done, it would be prepared and added to the glossary formatted for translation, in whatever format is decided upon. Once translation work is begun on the first section, we will also gain experience on how many revisions are needed to the definitions, and how often we can expect to have to split a term into two carrying different definitions.
Very good idea! Japanese has a translation guideline (draft) that includes an en-ja wordlist for MediaWiki and other projects:
We need easier and simpler ways (a MediaWiki extension, etc.) to build the language glossary.
I cannot see how glossary data could be held other than in database tables if it is to be useful for automated lookups; thus we shall need another extension (or an additional part of Extension:Translate) for its management.
Also, to make a glossary useful, we shall have to deal with grammatical forms of words for those languages that have them. For example, in English, we have both "page" and "pages", which should technically share a glossary entry. In other languages, such as Finnish, we likely have a dozen, and in Turkish several dozen such forms per word. As long as we translate from English only, this is not a huge burden, since there are not really many grammatical forms for most words, and the vast majority follow a few simple rules. On the target side, at least for my work, it would be feasible and useful to have a list of all words translated via the glossary with all their potential grammar forms that could appear in our contexts, which one could just click in the order needed to combine them into a translated sentence. --Purodha Blissenbach 04:08, 2 August 2011 (UTC)
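For English as the source language, the "few simple rules" could be sketched roughly as follows in Python. The function names `english_forms` and `build_form_index` are hypothetical, only regular noun plurals are handled, and irregular nouns or verb forms would need explicit entries.

```python
def english_forms(noun):
    """Return the noun and its regular plural, using only the simplest rules."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        plural = noun + "es"
    elif noun.endswith("y") and noun[-2:-1] not in "aeiou":
        plural = noun[:-1] + "ies"
    else:
        plural = noun + "s"
    return [noun, plural]


def build_form_index(base_terms):
    """Map every generated form back to the glossary entry it shares."""
    return {form: base for base in base_terms for form in english_forms(base)}


index = build_form_index(["page", "category", "namespace"])
print(index["pages"])       # page
print(index["categories"])  # category
```

Languages with many more forms per word would need far richer rules or explicit form lists, so this approach is only plausible for the English source side.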
Each noun in Finnish can have over 2000 different forms.
Purodha, if I understand you correctly, you are suggesting expanding the use of a glossary to partially automate the translation process. I don't think that this would be worth the effort to set up, and in the case of Welsh is not practical, owing to the complexity of the rules for initial mutation. The human translator can choose a verb form or noun form in a split second. So I don't think that we should develop our glossary/terminology much further than is already planned.
However, designing an easy way to access the English definitions and the agreed translations (the root forms of nouns and verbs) during translation work would be useful. I think that is what you are discussing in your first paragraph above.
Whilst discussion is still continuing on the design of suitable software for the glossary/terminology, I will continue to contribute to the building of the English glossary in its present table form, on the assumption that all the raw material, when edited and agreed by consensus, can be converted to its final format when that has been set up.
Well, yes, you are right, I was looking at supporting the translation process with some options to click instead of type, similar to what we already have with TM (translation memory).
This would hardly be helpful when the number of possible choices becomes too big. For my work, they're usually limited to three to five for nouns, which is easy to handle. My intention was not to ask for so much at this time, but rather to keep something in mind as a future potential that should not be sacrificed unnecessarily.