Hacker News

I agree, but it seems like it would be tricky to strike exactly the right balance between unifying too much and too little.

The arguments for Han unification could have just as well been applied to unifying the Nordic languages – the Swedish Ä is really exactly the same letter as Danish/Norwegian Æ (except that Swedish words never ever use the latter and presumably Danish words never use the former), so it could be argued that it should have the same codepoint, forcing us to use country-specific fonts and making it impossible to use both languages in a simple text file.

So, for example, running "git log" on a project with an international contributor base would render either Swedish or Danish names incorrectly, depending on which font the user has selected. That would look extremely strange to me, so I imagine Chinese/Japanese/Korean users feel similarly.

(Luckily for us Nordic users, our letters were already separate characters in ISO-8859-1, which Unicode is a superset of.)
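To make the superset point concrete, here is a minimal Python sketch. It assumes Python's built-in "latin-1" codec as the stand-in for ISO-8859-1, and the variable names are just for illustration. It checks that every ISO-8859-1 byte decodes to the Unicode code point with the same number, and that Ä and Æ sit at separate code points.

```python
import unicodedata

# Each of the 256 byte values in ISO-8859-1 (Python codec "latin-1")
# decodes to the Unicode code point with the same numeric value.
latin1_is_prefix = all(
    ord(bytes([value]).decode("latin-1")) == value for value in range(256)
)

swedish_letter = "Ä"
danish_letter = "Æ"

# Collect code point, Latin-1 byte, and Unicode name for both letters.
letters = {
    ch: (f"U+{ord(ch):04X}", ch.encode("latin-1"), unicodedata.name(ch))
    for ch in (swedish_letter, danish_letter)
}

print(latin1_is_prefix)
for ch, info in letters.items():
    print(ch, *info)
```

Since the two letters were already distinct bytes in the older encoding, Unicode inherited them as distinct code points, so a plain text file can mix both without any font switching.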



Ä and Æ are more different from each other than the characters in Chinese and Japanese which have been merged. They are used for the same thing, they share etymology, but they are not the same letter.

By the way, Unicode is about scripts, not languages. If we started distinguishing by language, we might need to start remembering that China doesn't have a single language. Duplicating all those characters again to cover Mandarin and Cantonese and Wu and Hakka and Hokkien?

Should we have a different letter for Æ in Bokmål and Nynorsk? How about Riksmål?

The equivalents of Æ and Ä in Han scripts have not been unified. The equivalents of a in Futura and a in Verdana have. There are tons of Chinese characters, so a few of them are bound to be borderline and maybe contentious. But overall, Han unification was unavoidable for a project like Unicode.


> Ä and Æ are more different from each other than the characters in Chinese and Japanese which have been merged.

> They are used for the same thing, they share etymology, but they are not the same letter.

That second quote applies equally to Ä and Æ.


It applies to Ä and Æ... which is what the parent said. It doesn't apply to 気 and 气 and 氣. Those are all the same thing.


気, 气 and 氣 are actually not merged by Han unification, and could be described as similar to Ä and Æ: shared etymology, same meaning, some languages decided to use a simpler character because the old one was too complicated to write.
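This is easy to check for yourself. A small Python sketch, using only the standard unicodedata module (the variable names are mine), lists the code point of each variant and confirms that even NFKC compatibility normalization leaves them distinct:

```python
import unicodedata

# The three variants mentioned above: Japanese shinjitai, simplified
# Chinese, and the traditional form.
variants = ["気", "气", "氣"]

# Map each character to its code point in U+XXXX notation.
codepoints = {ch: f"U+{ord(ch):04X}" for ch in variants}

# NFKC is the most aggressive standard normalization form; if it does not
# fold these characters together, Unicode treats them as separate.
unchanged_by_nfkc = all(
    unicodedata.normalize("NFKC", ch) == ch for ch in variants
)

for ch in variants:
    print(ch, codepoints[ch], unicodedata.name(ch))
print(unchanged_by_nfkc)
```

All three get their own code point, so a single text file can hold all of them regardless of font.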

The variations on characters which have been merged are usually even closer than that. More like the single or double storey "a", or the single or double loop "g".


Well, maybe if the double-story "a" were used exclusively in one country but considered incorrect in another.

I think what you're missing is that Han unification never particularly concerned itself with whether characters are "the same" or not. Somebody basically just decided that some differences were important and others weren't. I mean, the difference between 語 and 语 is analogous to printing and cursive, but for whatever reason it made the cut. Meanwhile the SC and TC versions of 骨 are approximately mirror images, but to show you I'd need two different fonts.

Anyway the merged differences are not just analogous to different fonts.


> I agree, but it seems like it would be tricky to strike exactly the right balance between unifying too much and too little.

That may be true, but not unifying at all must certainly fall under unifying too little.



