OK, what a mess.
just a warning: my post will be long.
read it in as much detail as you can to learn about the Chinese language and its romanisation, if you want to get involved in this.
A. Important things about Chinese Romanisation
I. "ISO 7098:2015"
first of all, learn as much as you can about ISO 7098:2015.
ISO 7098:2015 explains the principles of the Romanization of Modern Chinese Putonghua (Mandarin Chinese), the official language of the People's Republic of China as defined in the Directives for the Promotion of Putonghua, promulgated on 1956-02-06 by the State Council of China. This International Standard can be applied in documentation of bibliographies, catalogues, indices, toponymic lists, etc.
everything in this document is important, and you may know some of it already. there are two parts i want to point out specifically:
1. in automatic romanisation, there are two approaches to Chinese romanisation:
a. semi-automatic romanisation from Chinese words, which are separated by following proper rules.
b. automatic romanisation from Chinese characters one by one.
2. for the time being, most countries other than the PRC can't fully accept romanising Chinese characters into separated words based on how the characters combine, because finding and handling Chinese words is complex, and the grammar of a Chinese sentence can blur word boundaries even further.
after a lot of thought, they decided to romanise Chinese characters one by one.
↑ this is my opening, just keep it in mind and go on.
II. How special Chinese is as a language
according to how characters make up words, languages can be divided into
alphabetic languages and
ideographic languages, which use alphabets and ideograms respectively as their characters.
a. alphabetic languages are simple, and most of you will easily get the concept. most languages that exist today are alphabetic, each written with its own alphabet. as far as i know:
- Cyrillic alphabet (eg. Russian)
- Hebrew characters (eg. Hebrew)
- Arabic alphabet (eg. Arabic)
- Armenian character (eg. Armenian)
- Georgian character (eg. Georgian)
- Old Geez abjad (eg. Old Geez) ←already dead
- Devanagari script (eg. Sanskrit)
- Tamil alphabet (eg. Tamil)
- Kana script (eg. Japanese)
- Hangul script (eg. Korean)
- Thai script (eg. Thai)
- Tibetan script (eg. Tibetan)
- Mongolian script (eg. Mongolian)
... and
tons of other alphabetic languages which may not be widely used or are already dead.
b. an ideographic language is one where every single character was born from some exact thing or matter, which is very different from an alphabetic language.
as far as i know, the ideographic languages are:
- Egyptian hieroglyphs (eg. Ancient Egyptian) ←already dead
- Cuneiform script (eg. Ancient Sumerian) ←already dead
- Seal hieroglyphs (eg. Ancient Indian) ←already dead
- Maya hieroglyphs (eg. Ancient Mayan) ←already dead
- Chinese characters (eg. Chinese)
and
NO MORE.
if you want to know why language systems turned out like that, that's a long story, and i won't start telling it here.
the reason i bring up the facts above is that
i want you guys to know what is specific about the Chinese language, which leads to how differently romanisation is done for alphabetic and ideographic languages.
III. “Transliteration” and “Transcription”
1. first, let's see the 12 most important international transliteration standards other than the Chinese one:
- ISO 9-1995: Information and Documentation: Transliteration of Cyrillic characters into Latin characters – Slavic and non-Slavic languages
- ISO 233-1984: Documentation: Transliteration of Arabic characters into Latin Characters
- ISO 233-2-1993: Information and Documentation: Transliteration of Arabic characters into Latin characters – Part 2: Arabic language – Simplified transliteration
- ISO 233-3-1999: Information and Documentation: Transliteration of Arabic characters into Latin characters – Part 3: Persian language – Simplified transliteration
- ISO 259-1984: Information and Documentation: Transliteration of Hebrew characters into Latin characters
- ISO 259-2-1994: Information and Documentation: Transliteration of Hebrew characters into Latin characters--Part 2: Simplified transliteration
- ISO 3602-1989: Documentation: Romanization of Japanese (kana script)
- ISO 9984-1996: Information and Documentation: Transliteration of Georgian character into Latin characters
- ISO 9985-1996: Information and Documentation: Transliteration of Armenian characters into Latin characters
- ISO 11940-1998: Information and Documentation: Transliteration of Thai
- ISO 15919-2001: Information and Documentation: Transliteration of Devanagari and related Indic characters into Latin characters
- ISO TR 11941-1996: Information and Documentation: Transliteration of Korean scripts into Latin characters
see? these cover nearly all the alphabetic languages i mentioned before.
and because of that,
transliteration between their scripts and Latin characters can be done easily, no matter which character set is larger.
and again, because of that,
retransliteration can be done easily as well. this is also the basic rule of transliteration.
for example: the Cyrillic phrase "окружающая среда" (meaning "environment") can be directly converted into "okruzhayushchaya sreda" with its transliteration rules:
о→o
к→k
р→r
у→u
ж→zh
а→a
ю→yu
щ→shch (it even uses four Latin characters to make sure there's no ambiguity)
а→a
я→ya
с→s
р→r
е→e
д→d
а→a
↑ really simple, right? just do automatic transliteration and everything works perfectly.
conversely, if you see "okruzhayushchaya sreda" in Latin characters, you can transliterate it back into "окружающая среда" with no trouble.
the transliteration is reversible.
follow the rules and i can do this even if i know nothing about Cyrillic characters or Russian.
for instance, if i see "обстановка" i can transliterate it into "obstanovka" directly, even though i have no idea what that word means.
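a minimal Python sketch of the table-driven round trip described above. the table only covers the letters that appear in the two example words of this post (the entries for б, т, н, в are filled in from the "obstanovka" example), and the function names `to_latin` and `to_cyrillic` are made up for illustration. the reverse direction assumes a greedy longest-match rule, which works for this small table; with a full alphabet a digraph table like this can become ambiguous in reverse, so treat it as a sketch rather than a real standard.

```python
# Letter table limited to the letters used in the examples of this post.
CYR_TO_LAT = {
    "о": "o", "к": "k", "р": "r", "у": "u", "ж": "zh", "а": "a",
    "ю": "yu", "щ": "shch", "я": "ya", "с": "s", "е": "e", "д": "d",
    "б": "b", "т": "t", "н": "n", "в": "v",
}
# Reverse table: Latin letter group -> Cyrillic letter.
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}
MAX_LEN = max(len(lat) for lat in LAT_TO_CYR)


def to_latin(text):
    """Replace each Cyrillic letter by its Latin group; keep anything else."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


def to_cyrillic(text):
    """Reverse mapping: at each position, try the longest Latin group first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in LAT_TO_CYR:
                out.append(LAT_TO_CYR[chunk])
                i += len(chunk)
                break
        else:
            # Not in the table (for example a space): copy it unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_latin("окружающая среда"))
print(to_cyrillic("okruzhayushchaya sreda"))
print(to_latin("обстановка"))
```

the point is that neither function needs to know what the words mean: both directions are plain lookups in a small table.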
the same holds for all transliteration between Latin characters and other alphabetic languages.
2. now, let's look at Chinese.
remember the word "environment"?
in Chinese, it's "环境".
now tell me, how can you transliterate it into Latin characters, even if you know the transliteration rules and the Pinyin system very well?
the deeper reason transliteration is easy between Latin characters and other alphabetic languages is that their character sets are really small.
there are 26 Latin letters in total.
and there are 33 letters in the modern Russian Cyrillic alphabet.
it's easy to build a mathematical mapping between them (even using 4 letters like "shch" for "щ") and a simple rule set for a transliteration system.
how many Chinese characters are there?
- at least 80 thousand, and still as many as 8 thousand frequently used ones.
that's because it's a kind of, or rather,
the only living ideographic language.
so what?
so, that affects romanisation very much.
building a romanisation rule for it is on a completely different level from what alphabetic languages need.
transliteration won't do, we need to do "transcription".
when we transcribe Chinese characters into Latin characters, we need the Pinyin system to help us.
there are 405 syllables, so yeah, we can finally do it, with rules similar to those of alphabetic languages:
it's easy to transcribe the pinyin of Chinese characters into Latin characters.
for example, "环境" reads "Huan Jing" (i'll leave out the tone marks of pinyin for now, before things become more complex), and that's the exact Latin version of that Chinese word.
however, this is not reversible.
for example, if i see "Huan Jing" in pinyin, i can't transcribe it back into Chinese.
i don't know if it is "环境" or "幻境" or "幻镜" or whatever other possible words.
this happens with every pinyin syllable in Latin characters, which means all of them are ambiguous.
in a sentence, the situation gets worse.
for example: "Wu Huan Jing Ran Zhe Me Yuan", which has "Huan Jing" in it.
but the Chinese is "五环竟然这么远", which means "the Fifth Ring Road is unexpectedly far from here"... which has zero connection to "环境" (environment).
even we Chinese can't quickly understand what such a text says if it's all written in pinyin character by character.
and this just messes the whole system up.
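the one-way nature of pinyin transcription can be sketched in Python like this. the character table is a tiny hand-written one built only from the examples in this post (a real table would have thousands of entries), and the names `to_pinyin` and `candidates` are made up for illustration.

```python
from itertools import product

# Hand-written toy table (character -> toneless pinyin), taken from the
# examples in this post. It is an assumption, not a real dictionary.
CHAR_TO_PINYIN = {
    "环": "huan", "幻": "huan",
    "境": "jing", "镜": "jing", "竟": "jing",
    "五": "wu", "然": "ran", "这": "zhe", "么": "me", "远": "yuan",
}

# Invert the table: one syllable maps to a LIST of characters.
PINYIN_TO_CHARS = {}
for char, syllable in CHAR_TO_PINYIN.items():
    PINYIN_TO_CHARS.setdefault(syllable, []).append(char)


def to_pinyin(text):
    """Character-by-character transcription (the easy direction)."""
    return " ".join(CHAR_TO_PINYIN[ch].capitalize() for ch in text)


def candidates(pinyin):
    """Every character string the toy table allows for this pinyin."""
    options = [PINYIN_TO_CHARS[s.lower()] for s in pinyin.split()]
    return ["".join(combo) for combo in product(*options)]


print(to_pinyin("环境"))
print(to_pinyin("五环竟然这么远"))
print(candidates("Huan Jing"))
```

even with only ten characters in the table, "Huan Jing" already comes back as six candidate strings (not all of them real words), while going from characters to pinyin is always a single lookup per character.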
lots of Chinese language specialists have already noticed this, and everything i've written above is an old discussion.
they have already given a solution: romanise from Chinese words, separated by following proper rules.
for example, if i meet "环境", i transcribe it as "huanjing".
though there are still lots of possible meanings, words can be clearly recognised in a line of Latin characters.
we can easily pick out the two or three pinyin syllables that combine into one Chinese word, which helps a lot in reading the sentence.
wonderful, right?
not really...
basic Chinese grammar is simple, maybe just like English.
but because Chinese is an ideographic language, every character, every combination of characters into words, and every interchange or flexible use among them can change the meaning.
what's more, that creates a lot of work in deciding "what is a proper Chinese word".
examples to show how hard the separated-words transcription method can be:
a. "他好说话" in Chinese; its pinyin written one by one is "Ta Hao Shuo Hua".
one reading: "他 好说话" means "he is an easygoing person", and the separated transcription is "Ta Haoshuohua".
another reading: "他 好 说话" means "he loves to talk", and the separated transcription is "Ta Hao Shuohua".
b. "他谁都赢不了" in Chinese; its pinyin written one by one is "Ta Shui Dou Ying Bu Liao".
one reading: "他 谁都赢不了" means "he can beat nobody", and another reading: "他 谁 都赢不了" means "nobody can beat him". sadly, i don't even know how to transcribe that sentence into correct pinyin without studying
The Basic Rules of the Chinese Phonetic Alphabet Orthography (汉语拼音正词法基本规则) or other rules like it in depth.
and by the way, not all of those rules are fixed. the pinyin rules are revised from time to time to fit more special cases.
this shows how complex dealing with Chinese words is going to be:
it's already complex enough to understand those sentences, and it will drive people crazy if you ask them to romanise them with words separated.
for example, if i see "五环境内": i have to set aside my intuition about the obvious word "环境" (environment) and analyse the sentence; then i know it should be read as "五环 境内" (within the Fifth Ring Road); then finally i output the result "Wuhuan Jingnei". this is already painful, even with 4 simple Chinese characters.
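here is a small Python sketch of why "五环境内" is hard for a machine too. it assumes a hypothetical five-entry lexicon written by hand for this example (the names `LEXICON`, `segmentations` and `romanise` are made up), and it only lists the possible word splits. choosing the right one still needs someone who understands the sentence.

```python
# Hypothetical mini-lexicon: word -> toneless pinyin (hand-written for this
# example only, not taken from any real dictionary).
LEXICON = {
    "五": "wu", "内": "nei",
    "五环": "wuhuan", "环境": "huanjing", "境内": "jingnei",
}


def segmentations(text):
    """Return every way to split text into words found in LEXICON."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in LEXICON:
            # Keep this word and split the remainder recursively.
            for rest in segmentations(text[end:]):
                results.append([word] + rest)
    return results


def romanise(words):
    """Join the pinyin of each word, one capitalised word per segment."""
    return " ".join(LEXICON[w].capitalize() for w in words)


for words in segmentations("五环境内"):
    print(words, "->", romanise(words))
```

with this lexicon the sketch finds two splits, "五 环境 内" and "五环 境内", so even a dictionary-based tool has to guess between "Wu Huanjing Nei" and "Wuhuan Jingnei".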
if i see things like "阴晴圆缺", "七里香", "非常道", whose concepts are vague across the various Chinese language systems (i'll mention this in detail later), i'll easily go mad if someone asks me to romanise them.
i'm sure most Chinese CAN'T do this very well, and i think foreigners will surely be worse at it.
although transcribing from separated Chinese words makes Chinese sentences easier to read, doing that transcription is really, really tough work.
besides, there's no official standard for this transcription yet. the orthography rules just help people do it, but they can't do it automatically.
(what's more discouraging is that Chinese words sometimes don't have an exact meaning.
that gets even more complex, so i'll just stop here.)
all the facts above show that:
transcribing Chinese characters into Latin characters is really tiring and tough work. it costs lots of dedication and time, and requires a rich knowledge of the Chinese language, which not many people have.
it's all because Chinese is an ideographic language: you need to know the exact meaning of every morpheme by analysing the whole sentence before you separate the characters into exact words, if you really want separated Latin words after transcription.
B. Relation to osu community nomination system
CrystilonZ wrote:
Other languages that use the Chinese script are irrelevant to this proposal.
We are only talking about Standard Mandarin here and Mandarin is not equivalent to Chinese.
We only use 'Chinese' in the draft for simplicity. The wording will be changed if this is implemented.
↑ i don't know if CrystilonZ knows the whole Chinese language family well enough, so i'll add some basic background knowledge here.
ISO 639 code sets
Documentation for ISO 639 identifier: zho
Identifier: zho
Name: Chinese
Status: Active
Code sets: 639-2/T and 639-3
Equivalents: 639-1: zh
639-2/B: chi
Scope: Macrolanguage
Type: Living
Denotation: See corresponding entry in Ethnologue.
The individual languages within this macrolanguage are
- Gan Chinese [gan] → 赣语
- Hakka Chinese [hak] → 客家话
- Huizhou Chinese [czh] → 徽州话 (徽语)
- Jinyu Chinese [cjy] → 晋语
- Literary Chinese [lzh] → 文言文
- Mandarin Chinese [cmn] → 官话(普通话)
- Min Bei Chinese [mnp] → 闽北话
- Min Dong Chinese [cdo] → 闽东话
- Min Nan Chinese [nan] → 闽南话
- Min Zhong Chinese [czo] → 闽中话
- Pu-Xian Chinese [cpx] → 莆仙话
- Wu Chinese [wuu] → 吴语
- Xiang Chinese [hsn] → 湘语
- Yue Chinese [yue] → 粤语
ok, so, the list above is just the one used in computing. there are still lots of other native languages in the PRC.
and i'm not posting the PRC's official native language list here, so as not to make things more complex.
since people like CrystilonZ may insist that Mandarin Chinese is the main target and that other Chinese systems have nothing to do with it, let's start from the concept of a "macrolanguage":
it actually has the property of a "same standard pronunciation and style of writing".
and for Chinese as a macrolanguage, that standard is just Mandarin Chinese.
so the truth is, all Chinese language families DO have a common standard, and also hundreds and thousands of connections to it. when you talk about some other member of the Chinese family, it is always affected by the Mandarin system, which is the exact centre of the whole topic.
if you want to leave out every other Chinese language family, then you need to give another complete romanisation rule to solve the problems that may come up in the transcription process. otherwise, Mandarin Chinese's rule automatically becomes the official solution. in that case, we should care about its effect on the other Chinese language families.
also, the so-called "Cantonese" is actually the concept of "languages spoken in Guangdong Province", which contains "Min Zhong Chinese", "Hakka Chinese" and "Yue Chinese". people usually just use it in its narrow sense and more or less regard "Cantonese" as "Yue Chinese".
what's more, the native language spoken in Taiwan is a kind of Min Nan Chinese, in case some ignorant person jumps in.
with all that knowledge, we can move on:
I. How to deal with Mandarin Chinese transcription of words that come from other Chinese language families but have already become a part of Mandarin?
1. Chinese archaism
it's a part of Literary Chinese, but it has also become a part of Mandarin Chinese.
some of these words have even changed meaning, and it's hard to tell them apart.
if Literary Chinese is regarded as an individual language separate from Mandarin Chinese, then how do we deal with words like "空穴来风", "闭门造车", "人尽可夫", etc.?
2. multi-Chinese based songs
for example, there's a Chinese song called "好心分手", one version of which is sung in both Yue Chinese and Mandarin Chinese.
so the Yue Chinese romanised version is "Hou Sam Fan Sau/Housam Fansau" (actually this is Jyutping, a special kind of pinyin)
and the Mandarin Chinese romanised version is "Hao Xin Fen Shou/Haoxin Fenshou".
both of them are spoken exactly correctly, so how do we deal with these?
3. Chinese families with no supported romanisation rules
for example, there's a Chinese song called "外滩18号", which is sung in three kinds of Chinese: Mandarin Chinese, Wu Chinese and "Southwestern Hakka" (an official native Chinese language of the PRC).
so it can be romanized like:
Mandarin Chinese: "Wai Tan Shi Ba Hao/Waitan Shibahao"
Wu Chinese: "Nga Thae Tze Ba O/Ngathae Tzebao"
Southwestern Hakka Chinese: "Vai Tan Si Ba Hao/Vaitan Sibahao"
i'm not sure whether those are correct apart from the Mandarin ones (i just typed them here after searching dictionaries of native romanisation), but each part can still have a romanisation of its own, right?
then how do we deal with these?
II. Even if we do transcribe Mandarin Chinese from separated words into Latin characters, who is going to help mappers who map a Chinese song?
there are several parts to this:
- is this a Mandarin Chinese song?
- maybe you can tell from official settings or sites, not a big deal. but that won't work if you map some cult song.
- how to get the right romanised characters?
- ask some Chinese staff/mapper/player? i doubt any of them has the time/ability to do it.
- how to make sure what i got is correct?
- much the same as the one above: if such a person exists and can do this job endlessly, he will be really welcome in this system.
you may think most Chinese words are not as complex as that, but if you want to build a reasonable system of rules, it should be strict.
and unless you are the person who does this kind of work, you can hardly imagine how hard it is.
C. Summary
I. Opinions
1. even international-level groups can't do much romanisation for Mandarin-Latin transcription from separated words.
it's feasible, that's true, but its efficiency is really, really low. Chinese staff will be worn out to death if they really do this, because, as you can see from what i've explained, it's tough work with a tough process.
i can even predict that someone will search for the correct Mandarin romanisation for a month, and still get dqed after finding that the answer he got is wrong. that may stop people from mapping Chinese songs, and personally i think that's really bad news.
2. Mandarin Chinese and Cantonese have standard romanisation rules, but the other Chinese families don't. it's hard to complete one if you don't care about all of them, because every single one of them shares a common standard pronunciation and style of writing: Mandarin Chinese.
in that case, rebuilding the Mandarin Chinese romanisation system into a better and complete one would be really hard work, and it's surely out of the osu community's range.
3. the Chinese osu community already argued about this several times long ago, and the result is still: keep the current state.
II. Conclusion
romanising Mandarin Chinese characters one by one is the best way SO FAR, until some genius invents a dictionary for Mandarin-Chinese-characters-to-Latin-characters romanisation and raises the efficiency far above the current level.
also, this is exactly what international groups do right now. (they only combine proper nouns like names of people or places, etc.)
--------------
simple extra p.s. here:
to CrystilonZ, and other people who know little about Chinese:
i think you have some wrong ideas about Chinese characters, because i've seen this written:
CrystilonZ wrote:
Similar to Japanese, one Chinese character does represent one single syllable. However, a word is not necessarily comprised of one syllable (like Japanese, Chinese is a polysyllabic language). For example 图书馆 (túshūguǎn) as a whole means library, and writing 'li bra ry' would defeat the purpose of Romanisation by not resembling the structure of languages using the Roman alphabet.
Chinese is far different from Japanese. the syllable thing you are talking about may apply to Japanese Hiragana or Katakana, but it's not that true for the Kanji part.
(btw, you may already know that a part of the Japanese language system is just Chinese itself.)
and now after reading everything i wrote above, you may know that Chinese is not only a polysyllabic language, but also the only living ideographic language.
"图书馆" reads "tú shū guǎn" and means "library", true.
However, "图书" reads "tú shū" and means "library book" or just "(picture) book", did you know that?
this is far different from English, where you can't split a word in most cases: you CAN split a Chinese word, because every single Chinese character can be a word.
eg.
图→graph, graphic, or lots of other meanings;
书→book, writing, letter, or lots of other meanings;
馆→shop, embassy, gallery, or any building that exhibits something.
so, the one-character-one-word method is a solid, reasonable method for Chinese romanisation.
knowing this, i hope you can rebuild your idea of Chinese, which will help you understand the earlier romanisation part.
--------------
hope all of this helps you know more about Chinese romanisation.
also, if you are confused about anything above, you are always welcome to ask.