Portal talk:Ku/LiquidThreads
Contents
Thread title | Replies | Last modified |
---|---|---|
Grammar demo | 0 | 15:13, 9 January 2023 |
GRAMMAR | 13 | 19:33, 26 December 2022 |
Portal talk:Ku-latn | 1 | 23:04, 30 November 2022 |
Terminology tool | 1 | 23:30, 26 November 2022 |
Discussions on Guhartin/Guhertin | 0 | 10:47, 21 November 2022 |
MagicWords | 0 | 01:57, 21 November 2022 |
Mobile | 1 | 02:15, 26 December 2020 |
On the translations of "device and tool" | 0 | 13:41, 16 June 2018 |
OpenStreetMap | 0 | 15:43, 15 January 2017 |
Toolserver Tools - Kolossos Kml On Openlayers | 0 | 20:09, 27 May 2012 |
Hi @User:Ghybu. Jon Harald Søby has built a demo on patchdemo [1]. What do you think? Is there anything that needs to be changed?
P.S.: Only the definite/singular and feminine forms have been done. We did nothing for the masculine ones.
@Balyozxane: Yes, I meant to do it but I forgot: see [1]
- Verbs
For verbs, something like the following is needed, similar to Template:Gender:
{{lastletter:$1|Text for consonant letter|Text for vowel letter|Optional text}}.
Example: Ji bîr neke ku ev tenê pêşdîtineke [[:$1]] {{lastletter:$1|e|ye|(y)e}}. (In English: "Remember that this is only a preview of $1.")
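To make the proposal above concrete, here is a minimal Python sketch of what a {{lastletter:}} selector could do. It is not an existing MediaWiki function: the name `lastletter`, the argument order and the fallback behaviour for the "optional text" are assumptions taken from the template call above. The vowel set is the eight vowels of the Kurmanji Latin alphabet.

```python
import unicodedata

# The eight vowels of the Kurmanji Latin alphabet.
KURMANJI_VOWELS = frozenset("aeêiîouû")


def lastletter(word, consonant_text, vowel_text, optional_text=""):
    """Pick a text depending on the last letter of `word`.

    Hypothetical helper mirroring {{lastletter:$1|consonant|vowel|optional}}.
    Assumption: the optional text is used when the last character is not a
    letter (empty value, digit, punctuation), since no rule can be applied.
    """
    # Normalise so that "i" + combining circumflex is treated as "î".
    normalized = unicodedata.normalize("NFC", word).strip()
    if not normalized or not normalized[-1].isalpha():
        return optional_text
    if normalized[-1].lower() in KURMANJI_VOWELS:
        return vowel_text
    return consonant_text


# The copula from the example message: "e" after a consonant, "ye" after a vowel.
print(lastletter("Destpêk", "e", "ye", "(y)e"))  # e
print(lastletter("Azadî", "e", "ye", "(y)e"))    # ye
print(lastletter("2023", "e", "ye", "(y)e"))     # (y)e
```

This only looks at plain text; a parameter that carries markup or comes from a template would need extra handling before the test.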
- Nouns
For nouns, both the "last letter" and the "gender" need to be known (see the Wiktionary module):
See [2].
Lîsta $1(y)ê hat jêbirin. (In English: "The list of $1 was deleted.")
: here the gender of parameter $1 is needed, because for a masculine noun the form is "$1yî",
as in Lîsta $1{{Gender:$1:|(y)î|(y)ê}} hat jêbirin.
For nouns, $wgGrammarForms (Grammar) has to be created or fixed on phab:, see:
- https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/languages/data/grammarTransformations/
- https://doc.wikimedia.org/mediawiki-core/master/php/LanguageOs_8php_source.html
- Conclusion
A template like the following is needed:
{{Grammar:def/indef-case|$1}}
def: definite (binavkirî); indef: indefinite (nebinavkirî)
case (rewş): nominative (navkî), construct (îzafe), oblique (çemandî), vocative (bangkirin)
Examples:
- ($1: feminine, singular) {{Grammar:indef-construct|sêv}} --> sêveke
- ($1: masculine, singular) {{Grammar:def-oblique|gund}} --> gundî
- ($1: masculine, plural) {{Grammar:def-oblique|gund}} --> gundan
- ($1: masculine, singular) {{Grammar:indef-oblique|gund}} --> gundekî
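The examples above show why the form name alone is not enough: the same call {{Grammar:def-oblique|gund}} gives "gundî" or "gundan" depending on number. A rough Python sketch, assuming a hypothetical `grammar` helper that receives gender and number explicitly, could look like this. The suffix table contains only the four forms listed in this thread; every other combination is left out on purpose rather than guessed, and words ending in a vowel (which need a buffer letter) are not handled.

```python
# Suffixes taken only from the four examples in this thread.
# Keys are (definiteness, case, gender, number); "f"/"m" and "sg"/"pl"
# are illustrative labels, not existing MediaWiki identifiers.
SUFFIXES = {
    ("indef", "construct", "f", "sg"): "eke",
    ("def", "oblique", "m", "sg"): "î",
    ("def", "oblique", "m", "pl"): "an",
    ("indef", "oblique", "m", "sg"): "ekî",
}


def grammar(form, word, gender, number):
    """Hypothetical equivalent of {{Grammar:<form>|<word>}} with extra inputs.

    `form` is "def-<case>" or "indef-<case>". Gender and number have to be
    supplied by the caller because they cannot be derived from the word.
    """
    definiteness, case = form.split("-", 1)
    key = (definiteness, case, gender, number)
    if key not in SUFFIXES:
        raise KeyError(f"no rule recorded for {key}")
    return word + SUFFIXES[key]


print(grammar("indef-construct", "sêv", "f", "sg"))  # sêveke
print(grammar("def-oblique", "gund", "m", "sg"))     # gundî
print(grammar("def-oblique", "gund", "m", "pl"))     # gundan
print(grammar("indef-oblique", "gund", "m", "sg"))   # gundekî
```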
- Problem
Are the gender and number of the parameters ($1, $2, ...) known?
In translation it is common that the grammatical gender and grammatical number of translated items are not provided along with their translations (and the grammatical number may differ from that of the original English source).
When those items are later reused as variables inside other messages, none of that information is transmitted. (This does not occur with users' gender, which has specific support, provided that the users have registered a local account with stored preferences.)
A similar problem occurs when translated items have particular beginnings or endings: when they are used to compose sentences in other messages, we may need to contract the surrounding words, possibly removing spaces when the contraction uses an apostrophe (e.g. in French or Italian, and sometimes in English), or even to mutate part of the term given as the value of a parameter.
More generally, we have no way to translate a term not just into another term but into a term with additional properties (gender, plural, case, initials/finals treated specially), or into a set of contextually selectable variants (which the "GRAMMAR:" parser function could recognize in order to select or generate the appropriate form). Those additional properties (or specially tagged translations with several contextual alternatives) are not transmitted: all translated items are currently "opaque". We have no way to add the variants or properties that a true linguistic engine would need.
The assumption that a translation of any term from language A to language B is necessarily unique is false. For now we have to ask each project to add more translation units to be translated separately, hoping that the target software will manage them correctly (this is already the case even for users' gender, and even when using numbers with plural rules).
A generic solution would be to allow, for each translation unit and each translation language, the definition of additional properties or variants, tagged properly so that they become selectable (the current support with "PLURAL:" and "GENDER:" is too limited, and with "GRAMMAR:" it is often impossible to implement correctly without making false guesses or relying on a possibly large dictionary that will be difficult to maintain).
That is a problem that will be interesting to solve with the future multilingual Wikipedia (with the help of new modules in Wikifunctions, or with Lua modules for Scribunto in specific wikis), which will offer such a possibility. But it should also be integrated into the "Translate" extension of MediaWiki. For this to work, we need to be able to define suitable "tags" in each language as selectors of variants. The set of tags may be open (not the same from one language to another); such tags exist and are used in some Wiktionaries to identify their entries. Something similar has been attempted separately in Wikidata for tagging and identifying the various "forms" that a "lexeme" can take.
We should also exchange comments and solutions about this frequent problem with the Unicode CLDR project for internationalization libraries (CLDR has exactly the same problem but proposes no standard solution at all for now; this would require updating some LDML specifications and extending its data model). For now only Wikidata proposes something that could be integrated (but for deployment in applications, the "tags" they use must be stable, because applications won't query Wikidata each time: we need a set of stable tags for each language that can be used to select variants, whereas translated message bundles will possibly contain multiple selectable variants for the same message).
Some projects may also need to provide their own subset of tags, but they will need to document them so that translators know how to tag the needed variants, and whether variants depending on some tags are required, or optional and inferable from other defined variants (this is already the case for plural forms, which are just a more specific subproblem, and one that is only partially solved!). Each variant should also be able to return not just the text but also some additional tags, to allow safe reuse.
Note that I have already submitted such concepts to other projects, including the future multilingual Wikipedia and Wikifunctions, CLDR, Wikidata with its new lexemes, and Wiktionary. Defining a common set of tags (and maintaining a registry of these tags to preserve their reusability) is something important that should be considered in the future for internationalization and translation.
These linguistic "tags" may also have uses other than purely grammatical ones. They could carry contextual information as well as user preferences (such as a "formal" or "informal" language register, or "scientific" vs. "vernacular", "simplified", "slang", or "obsolescent/old", with more specific and possibly ordered preferences depending on the targeted audience or age), or other stylistic information (e.g. to avoid the repetition of some words or long expressions by allowing them to be replaced by pronouns, shorter expressions or abbreviations, or to allow alternative but equivalent terms, for example to avoid alliteration and get a more "fluid" and "natural" language that a human would prefer to use and read). In fact, these tags would also provide additional documentation for the source messages about their intended use (e.g. as a verb for an action to be taken, or as a noun) and about other technical restrictions (such as maximum length, format, HTML allowed/disallowed, restricted subsets of characters, for example in identifiers)... All these use cases require being able to add variants and to select them contextually and coherently.
We humans do that naturally and constantly in our languages, without thinking much about it, but the choices follow a language-specific logic that CAN be computed using productive rules (if the "tags" are standardized and maintained in some registry like Wikidata). In fact, if you have followed the proposals for the future multilingual Wikipedia, the proposed syntax will require a large set of "functions" that will need to be named, or to use such named tags in parameters, and that will be able to handle and reduce possibly large sets of possible variants (including automatically derived terms and exceptions): the source text will not be English, but written in a metalanguage entirely made of such metadata tags, and the first thing a translation engine usually does is to try guessing that metadata form. Human linguists do the same thing when they analyse a text and "tag" it with semantic or grammatical meta-information.
And we all do that kind of analysis (most often unconsciously in our "native" languages) to judge the "quality" or "beauty" of any spoken or written text (poets, famous writers, humorists, singers and other artists are experts at doing this analysis consciously, with hard work, when trying to improve their texts and to carry their intent, emotion, or perception of a topic; politicians, merchants, advertisers, and recruiters are experts at doing this for their own goals).
Yes, that's what I thought: gender and number are not always given.
On the other hand, I think it is possible to create a model like "Gender" (a new magic word?) that tells you whether the last letter of a word is a consonant or a vowel, and to act accordingly.
- The magic keyword "GENDER:" is already taken, but for a limited usage: its first parameter is a username on the local wiki (and by default it is the user reading the page, whose human gender may be unknown or voluntarily set to neutral by that user). It is completely unrelated to grammatical gender (which may even follow other rules, e.g. for non-humans or inanimate concepts, or for reasons of style, politeness, respect or disrespect, and which sometimes changes depending on plurals or uncountability).
- We have a keyword "GRAMMAR:" whose first parameter is already a "tag", but it does not accept a list of tags (some required, others optional to alter the rendered variant).
But we have NO way to return BOTH the text of the translated variant, AND classification tags on output; only one input tag is accepted.
- The magic keyword "PLURAL:" has no input tags, it only takes a numerical value on input.
Returning tags only on output will be inefficient: it requires returning all possible variants, each with its own tags for external selection, or returning some functional object that can perform the selection without having to enumerate all possible variants. For common cases, like derived terms in the conjugation of verbs, or grammatical cases, and for more complex cases like German verbs with their "detachable" particles, or pronominal verbs, we need input tags. Output tags should be used for the generated sets of variants. Then we will need some orchestrator to reduce the sets of variants and generate the function calls that make the selections. But for translators this is a complex task: we need to let them simply add variants and complement each output with one or more tags that they can easily select from a set of known tags; and these tags must make sense to human translators (so tag names must follow some known and documented convention). That is not something easy to design in a single basic "magic keyword" in MediaWiki, except for very limited use cases.
IMHO, using tags on input rather than output will be more powerful and will offer a simpler interface. Putting all tags on input requires a "functional" syntax, but with possibly unordered parameters (the input would just be an unordered set of tags). Adding variants for a given message to translate just requires each added variant to have a distinct set of tags, including the empty set, which would be the default translation (it should always be usable in isolation, e.g. as an item in a bulleted list or in an index, but not as a "title" or section header, since that adds a specific usage context requiring its own generative tag, for example for capitalization).
If we accept that "tags" cannot be named with any space in them (only alphanumerics or hyphens), then such a solution is implementable in MediaWiki using the existing "GRAMMAR:" magic keyword as:
{{GRAMMAR: |tag1 tag2 = variant1 |tag1 tag3 = variant2 | ...}}
It would be important to remember that the order of those tags is not significant (so "tag1 tag2" and "tag2 tag1" are equivalent). Optionally some wildcard could be allowed, but it would complicate the task for translators. Tags should be well known and documented, not invented at will, so that each one can be validated.
But this new syntax (only to be used in the middle of a translated message) may as well use a separate keyword like "SELECT:", "VARIANTS:" and so on.
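As an illustration of the unordered-tag idea, here is a small Python sketch that parses the body of such a call into a mapping from tag sets to variant texts and then selects a variant. Nothing here is existing MediaWiki behaviour: the function names `parse_variants` and `select`, the fallback to the untagged default, and the sample tags are assumptions made for the example.

```python
import re

# Tags may contain only alphanumerics or hyphens, as suggested above.
TAG_RE = re.compile(r"[A-Za-z0-9-]+")


def parse_variants(spec):
    """Parse "tags = text | tags = text | default" into {frozenset: text}.

    Assumption: a part without "=" is the default variant (empty tag set).
    A default text that itself contains "=" is not supported by this sketch.
    """
    variants = {}
    for part in spec.split("|"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            tags_text, text = part.split("=", 1)
        else:
            tags_text, text = "", part
        tags = frozenset(tags_text.split())  # order of tags is irrelevant
        for tag in tags:
            if not TAG_RE.fullmatch(tag):
                raise ValueError(f"invalid tag: {tag!r}")
        if tags in variants:
            raise ValueError(f"duplicate tag set: {sorted(tags)}")
        variants[tags] = text.strip()
    return variants


def select(variants, tags):
    """Return the variant for exactly this tag set, else the default (or None)."""
    return variants.get(frozenset(tags), variants.get(frozenset()))


forms = parse_variants("gund | def oblique sg = gundî | def oblique pl = gundan")
print(select(forms, ["sg", "oblique", "def"]))  # gundî
print(select(forms, ["pl", "def", "oblique"]))  # gundan
print(select(forms, ["vocative"]))              # gund
```

Storing each tag set as a frozenset gives the "canonical order" for free, and the duplicate check is the validation that two variants never share the same set of tags.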
For variants that make up the whole content of a message, we don't necessarily need to expose any syntax to translators: they may just have a "+" button to add variants and an extra field where they can add suitable tags (the validator would only have to check that the sets of tags are distinct for each added variant of the message, and would store them in some "canonical" order). The UI may also offer facilities to easily pick "known" tags from a repository appropriate for each language, so that translators have less difficulty selecting and using them. The syntax itself would be hidden.
Any translated message (even one with a single variant) could have one or more tags added to it (including on the single default variant): those tags would then provide the semantics that also allow generators (like conjugators for verbs) to work with them. Internally, each stored variant would be used as a pure functional object, capable of using the tags given on input and the defined text of the variant to generate the simple text output. Basic i18n libraries or applications would ignore the input tags and work only with the defined text (but would lose the semantics of that text if they use it to generate other texts).
The main difference is that stored translations would no longer be opaque texts.
@Ghybu: Yes, that is possible; I have done so for the genitive GRAMMAR in Norwegian. When words are put in the genitive in Norwegian, we just add an "s", unless the word ends in s, x or z, in which case we add an apostrophe instead. It should be easy to do something similar for Kurdish, where the output depends on whether the final letter is a vowel or not.
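The Norwegian rule described above fits in a few lines. This Python sketch only restates the rule as given in the comment; it is not the actual MediaWiki implementation, and the function name `genitive_nb` is made up for the example.

```python
def genitive_nb(word):
    """Norwegian genitive as described above: add "s", or an apostrophe
    when the word already ends in s, x or z. Illustrative only."""
    if not word:
        return word
    if word[-1].lower() in "sxz":
        return word + "'"
    return word + "s"


print(genitive_nb("Wikipedia"))  # Wikipedias
print(genitive_nb("Commons"))    # Commons'
```

A Kurdish rule would have the same shape, with the test on the final letter being "vowel or consonant" instead of "s, x or z".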
There are some complexities: the content you test for a final s may be formatted (so at its end you may find HTML tags around the text, an image, or the apostrophe-quotes of the MediaWiki syntax for bold/italic styles). The last displayed character in the content may also be some punctuation, or it may be hidden because the content was generated or formatted by a template transclusion or a function call in a module (and that template or function may not provide the metadata for the grammatical or lexical semantics of what it returns: the template or function would have to make the change itself, otherwise the parsing will be complex and may be faulty).
Here again we run into the assumption that a single (wiki)text result is sufficient. But then where do we store and return the metadata needed to do further processing correctly? It could be in the same (wiki)text, but that requires defining an encoding syntax for it (it could be some hidden tags that get stripped at the end of processing, or some JSON or XML syntax, or a magic keyword, along with some escaping mechanism for safer encapsulation... as long as further processes can handle it).
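To show the simplest part of that complexity, here is a Python sketch that looks for the last visible letter after dropping HTML tags and bold/italic quotes. It is only a rough approximation under stated assumptions: the regular expressions are naive, the helper name `last_visible_letter` is invented, trailing punctuation is simply skipped (which may or may not be what a grammar rule wants), and templates, images and module output are not handled at all.

```python
import re


def last_visible_letter(wikitext):
    """Return the last letter a reader would see, or None if there is none.

    Naive sketch: strips HTML-like tags and ''/''' quote runs, then skips
    trailing non-letters. Template calls and file links are NOT expanded.
    """
    text = re.sub(r"<[^>]*>", "", wikitext)  # drop HTML-like tags
    text = re.sub(r"'{2,}", "", text)        # drop italic/bold quote runs
    for ch in reversed(text):
        if ch.isalpha():
            return ch
    return None


print(last_visible_letter("'''Commons'''"))     # s
print(last_visible_letter("<b>Wikidata</b>."))  # a
print(last_visible_letter("2023"))              # None
```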
Hi User:Ghybu, when you start a discussion with the Project:Terminology gadget tool, the discussion goes to Portal talk:Ku-latn. Should we use the ku-latn page, or ask someone to have this page used for ku-latn?
Hi @Balyozxane:, the requests page is here: Support. Honestly, the language code for Kurmanji gives me a headache; it has become a mess on the internet: ku, ku-latn, kmr. Sometimes "ku" stands for Kurmanji and sometimes for Sorani (Arabic script). For example, the Google Chrome browser renders "lang=ku" according to Sorani (Arabic font), which is why Kurmanji letters look ugly in Chrome.
Other problems:
- Also, what is the purpose of the code "ku-Arab"? Kurmanji is written only in Latin letters...
In my view, should the Kurmanji language code be used only as "ku", or as "kmr"? I don't know what "ku-latn" is: Kurmanji? Or Sorani in Latin letters?!?
Hi, a tool called Terminology Gadget has been created; building a terminology list will be very useful for new users. So in my opinion you should use it too. It can be enabled here.
Ping: User:Ghybu, User:Bikarhêner, User:Guherto
Discussions about this: Wikipedia, Wiktionary
@Balyozxane and Bikarhêner: Hi! Special words (such as MagicWords) are fixed on Phabricator. See [1]
@Bikarhêner, Balyozxane, and Guherto: As far as I can see, the mobile translations do not show up, do they? Compare [1] / [2]
Exceptionally, I made a translation with "ku" and not with "ku-latn" because OpenStreetMap uses "ku", see: http://toolserver.org/~kolossos/openlayers/kml-on-ol.php?lang=ku