Brian Krebs writes about how browsers choose to display internationalized domain names (IDNs). The issue, of course, is spoofing legitimate URLs with visually similar letters. You would probably notice the lame attempt in the department line, but some international characters look very similar, or indeed identical, to their Latin counterparts. Depending on your personal preferences, it might be a good idea to have your browser display punycode instead. It could save you a headache later.
https://krebsonsecurity.com/2018/03/look-alike-domains-and-visual-confusion/
Here are some of the applicable RFCs:
(Score: 4, Interesting) by requerdanos on Sunday March 11 2018, @02:58PM (4 children)
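Pues, la parte en que en el idioma español, y en otros idiomas también, hay más de veintiséis letras, por ejemplo. Y eso solo se refiere a idiomas escritos en letras latinas; idiomas como el ruso y el griego tienen letras--sí, son letras--que están fuera de tu idea de lo que son las letras. Hay miles de millones de personas que no usan el alfabeto que usas. No se puede decir 'de alfa a omega' sin letras que tú no reconoces. [English: Well, there is the part where Spanish, and other languages too, have more than twenty-six letters, for example. And that only covers languages written in Latin letters; languages like Russian and Greek have letters--yes, they are letters--that fall outside your idea of what letters are. There are billions of people who do not use the alphabet you use. You cannot say 'from alpha to omega' without letters you do not recognize.]
Ясно, что алфавит - это не то, что вы думаете. [English: Clearly, the alphabet is not what you think it is.]
有些語言甚至不使用字母。 [English: Some languages do not even use letters.]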
Even given the depth of your inability to define the word "letter" in a broadly useful way, however, Unicode is a very poor answer because of all the characters which are frankly visual duplicates, and many more that are inexact duplicates but duplicates nonetheless. A good answer would allow visiting something like española.com or 那些愚蠢的西方人.net or избирательные-хакеры.org with each glyph being not only unique, but mapped to a unique code point.
Unicode isn't anything like that, and using it is a gaping security hole that enables sophisticated-seeming but dead-simple spoofing. Unicode-enabled fake domain + letsencrypt would have an undetectability factor of something like 90%. It's vulnerability by design.
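To make the "visual duplicates" complaint concrete, here is a minimal Python sketch, assuming only the standard library's unicodedata module. The helper name describe is made up for illustration. It shows that Latin "a" and Cyrillic "а" render alike in most fonts but are distinct code points, and that Unicode normalization does not fold one into the other.

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, drawn like Latin "a" in most fonts


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)  # False

# Compatibility normalization leaves the Cyrillic letter alone,
# so it cannot be used to collapse this pair of look-alikes.
print(unicodedata.normalize("NFKC", cyrillic_a) == cyrillic_a)  # True
```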
(Score: 3, Insightful) by coolgopher on Monday March 12 2018, @12:50AM (3 children)
Unicode lost its way a long time ago, unfortunately. It's still better than having to deal with the old "code page" approach, though, I think. Or is it? At least I didn't get emojis in my code pages...
(Score: 0) by Anonymous Coward on Monday March 12 2018, @03:54AM
Dude. Please, stop. Some sort of warning before posting that stuff.
Too many years of dealing with "cockup pages"
(Score: 0) by Anonymous Coward on Monday March 12 2018, @09:42AM (1 child)
Actually, I think in *this* case - DNS - the code page idea is not that far off. But instead of the overlapping code pages we used to have, define code pages as non-overlapping ranges of the Unicode space, and let the user select which of them are displayed as Unicode letters, with everything else shown as punycode. The defaults would, of course, depend on the system language.
That way, a Russian user would see domain names in Cyrillic and a Chinese user would see domain names in Chinese, but the Russian user would see Chinese domain names (which he can't read anyway) as punycode.
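A rough Python sketch of that display policy might look like the following. Everything here is an assumption for illustration: the function names (script_of, display_label, display_domain) are hypothetical, the "code page" is approximated by the first word of each character's Unicode name because the standard library has no script property, and the built-in punycode codec stands in for a full IDNA implementation.

```python
import unicodedata


def script_of(ch):
    """Rough script guess: first word of the Unicode name (an approximation)."""
    if ch.isascii():
        return "LATIN" if ch.isalpha() else "COMMON"  # digits and hyphens
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def display_label(label, allowed_scripts):
    """Show a label as Unicode only if all its scripts are user-selected."""
    scripts = {script_of(ch) for ch in label} - {"COMMON"}
    if label.isascii() or scripts <= allowed_scripts:
        return label
    # Anything else falls back to its ASCII-compatible punycode form.
    return "xn--" + label.encode("punycode").decode("ascii")


def display_domain(domain, allowed_scripts):
    """Apply the per-label policy to a whole dotted domain name."""
    return ".".join(display_label(part, allowed_scripts)
                    for part in domain.split("."))


russian_user = {"LATIN", "CYRILLIC"}  # hypothetical default for a Russian locale
print(display_domain("\u0434\u0430.example", russian_user))  # да.example
print(display_domain("\u4eba.example", russian_user))        # punycode form
print(display_domain("\u0434\u0430.example", {"LATIN"}))     # xn--80ah.example
```

Note that under this policy a label mixing two allowed scripts (say, Cyrillic and Latin for the Russian user) is still shown as Unicode, since each character individually passes the check.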
(Score: 2) by requerdanos on Monday March 12 2018, @02:38PM
Believe it or not, reading more than one alphabet is really not uncommon. (Heck, I am from the linguistically and geographically ignorant United States of America, and I can read more than one alphabet--Latin, Cyrillic, some Greek--so imagine how much more true that is for someone from a country where "literate" implies at least "bilingual", and that's most of them...)
Take your Russian friend in your example. China is his neighbor to the south. Even if he isn't fluent in Chinese, he can still pick out some Chinese here and there. He can tell the difference between the similar "人" and "入", for example.
Which punycode kills, stone cold dead. If you start obfuscating domains with punycode, then suddenly all the Chinese domains--all the non-Latin, non-Cyrillic domains--look alike and he can't readily tell one from another, even though he could before. That makes things arguably worse, not better.
Plus, punycode leaves plain Latin (ASCII) characters untouched, and many of them are dead ringers for his native Cyrillic ones. "да.example" and "дa.example" might look the same, but one is all Cyrillic and the other is a mixed Cyrillic-and-Latin phishing site. The only way punycode would show this difference is if he also views Cyrillic-alphabet sites in punycode ("xn--80ah.example" vs. "xn--a-gtb.example"), which would be insane (he would like to be able to read the address bar) and, additionally, unhelpful, because neither of those is legible, so both register as "X N dash dash gibberish." So this approach makes him feel warm, fuzzy, and protected while leaving him wide open to Latin-vs.-Cyrillic alphabet attacks. That makes things arguably worse, not better.
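For anyone who wants to check those two labels, here is a small Python sketch using the standard library's punycode codec. The helpers to_ace and scripts_in are hypothetical names, and the script detection is only a heuristic based on the first word of each character's Unicode name. It reproduces both punycode forms and shows that a simple script check, rather than punycode display, is what actually tells the mixed label apart.

```python
import unicodedata


def to_ace(label):
    """ASCII-compatible form of one label: 'xn--' + punycode if non-ASCII."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def scripts_in(label):
    """Rough set of scripts used by the letters of a label (name heuristic)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


all_cyrillic = "\u0434\u0430"  # "да": CYRILLIC DE + CYRILLIC A
mixed = "\u0434a"              # "дa": CYRILLIC DE + LATIN a

print(to_ace(all_cyrillic))              # xn--80ah
print(to_ace(mixed))                     # xn--a-gtb
print(sorted(scripts_in(all_cyrillic)))  # ['CYRILLIC']
print(sorted(scripts_in(mixed)))         # ['CYRILLIC', 'LATIN']
```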
Only knowing, recognizing, or speaking one language or alphabet is not a condition most people are in, even if you and/or most of your neighbors may be.