SoylentNews is people

posted by martyb on Sunday March 11 2018, @10:39AM
from the söylêntnéws.org dept.

Brian Krebs writes about how browsers choose to display internationalized domain names (IDNs). The issue here is, of course, spoofing valid URLs with visually similar letters. You would probably notice the lame attempt in the department line, but some of the international characters are very similar, or indeed identical, to their Latin counterparts. Depending on your personal preferences, it might be a good idea to have your browser display punycode instead. It could save you a headache later.

https://krebsonsecurity.com/2018/03/look-alike-domains-and-visual-confusion/
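
For readers who have not seen punycode, here is a minimal Python sketch of what the conversion looks like. It assumes Python's built-in "idna" codec, which follows the older IDNA 2003 rules (RFC 3490) rather than RFC 5891; the helper names to_ascii and to_unicode are made up for this illustration.

```python
def to_ascii(domain: str) -> str:
    """Return the ASCII (punycode) form of a domain name."""
    # The built-in "idna" codec applies nameprep and punycode to each label.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII (punycode) domain name."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    spoof = "söylêntnéws.org"  # the look-alike from the department line
    print(to_ascii(spoof))              # an "xn--" label followed by ".org"
    print(to_ascii("soylentnews.org"))  # plain ASCII passes through unchanged
    print(to_unicode(to_ascii(spoof)))  # round-trips back to the Unicode form
```

A browser set to show punycode is effectively displaying the to_ascii form in the address bar, so the spoofed name stops looking like the real one.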

Here are some of the applicable RFCs:

  • RFC 3490 - Internationalizing Domain Names in Applications (IDNA)
  • RFC 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
  • RFC 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
  • RFC 4690 - Review and Recommendations for Internationalized Domain Names (IDNs)
  • RFC 5890 - Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
  • RFC 5891 - Internationalized Domain Names in Applications (IDNA): Protocol
  • RFC 5892 - The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
  • RFC 5893 - Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)
  • RFC 5894 - Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale

  • (Score: 4, Interesting) by requerdanos (5997) on Sunday March 11 2018, @02:58PM (#650927)

    which bit of "<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case" do you fail to understand?

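    Pues, la parte en que en la idioma española, y en otras idiomas tambien, hay mas que veintiseis letras, por ejemplo. En eso solo refiere a idiomas escritas en letras latinas; idiomas como ruso y griego tienes letras--si, son letras--que son fuera du tu idea de lo que son las letras. Hay miles de milliones de gente que no usan el alfabeto que usas. No se puede decir 'de alfa á omega' sin letras que tu no reconoces. [Spanish: "Well, the part where the Spanish language, and other languages too, have more than twenty-six letters, for example. And that only covers languages written in Latin letters; languages like Russian and Greek have letters--yes, they are letters--that fall outside your idea of what letters are. There are billions of people who do not use the alphabet you use. You cannot say 'from alpha to omega' without letters that you do not recognize."]

    ясно, что алфавит - это не то, что вы думаете. [Russian: "Clearly, the alphabet is not what you think it is."]

    有些語言甚至不使用字母。 [Chinese: "Some languages do not even use letters."]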

    Even given the depth of your inability to define the word "letter" in a broadly useful way, however, Unicode is a very poor answer because of all the characters which are frankly visual duplicates, and many more that are inexact duplicates but duplicates nonetheless. A good answer would allow visiting something like española.com or 那些愚蠢的西方人.net or избирательные-хакеры.org with each glyph being not only unique, but mapped to a unique code point.

    Unicode isn't anything like that, and using it is a gaping security hole that enables sophisticated-seeming but dead-simple spoofing. A Unicode-enabled fake domain plus a Let's Encrypt certificate would have an undetectability factor of something like 90%. It's vulnerability by design.
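
To make the "visual duplicates" complaint concrete, here is a small Python sketch using only the standard unicodedata module. The three characters were picked for illustration (they are not taken from the comment), and the helper name describe is made up.

```python
import unicodedata


def describe(text: str) -> list:
    """List each character's code point and official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]


# Three letters that many fonts draw almost identically. Escapes are used
# so the difference stays visible in the source code.
LOOKALIKES = "\u006f\u043e\u03bf"

if __name__ == "__main__":
    for line in describe(LOOKALIKES):
        print(line)
    # U+006F LATIN SMALL LETTER O
    # U+043E CYRILLIC SMALL LETTER O
    # U+03BF GREEK SMALL LETTER OMICRON
```

Each code point is unique, but the glyphs are not, which is the gap between what Unicode provides and what the comment is asking for.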

  • (Score: 3, Insightful) by coolgopher (1157) on Monday March 12 2018, @12:50AM (#651140)

    Unicode lost its way a long time ago. Unfortunately. It's still better than having to deal with the old "code page" approach, though, I think. Or is it? I didn't get emojis in my code pages at least...

    • (Score: 0) by Anonymous Coward on Monday March 12 2018, @03:54AM (#651191)

      Dude. Please, stop. Some sort of warning before posting that stuff.
      Too many years of dealing with "cockup pages"

    • (Score: 0) by Anonymous Coward on Monday March 12 2018, @09:42AM (#651268)

      Actually, I think in *this* case - DNS - the code page idea is not that far off. But instead of the overlapping code pages we used to have, let the code pages map onto the Unicode space, and let the user select which code pages are shown as Unicode letters, with everything else shown as punycode. Of course, with sensible defaults depending on the system language.

      That way, a Russian user would see domain names in Cyrillic, and a Chinese user would see domain names in Chinese, but the Russian would see Chinese domain names (which he can't read anyway) as punycode.
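
A rough Python sketch of that display policy follows. Two things here are assumptions: the "code page" is approximated by the first word of each character's Unicode name (the standard library has no real script property), and the names script_of, display_label and display_domain are made up for this illustration.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess: the first word of the Unicode character name."""
    if ch.isascii():
        return "LATIN" if ch.isalpha() else "COMMON"  # digits, hyphen
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def display_label(label: str, readable_scripts: set) -> str:
    """Show a label as Unicode only if the user enabled all its scripts."""
    if label.isascii():
        return label  # nothing to encode
    scripts = {script_of(ch) for ch in label} - {"COMMON"}
    if scripts <= readable_scripts:
        return label
    # Otherwise fall back to punycode via the built-in IDNA 2003 codec.
    return label.encode("idna").decode("ascii")


def display_domain(domain: str, readable_scripts: set) -> str:
    """Apply the per-label policy to a whole domain name."""
    return ".".join(display_label(part, readable_scripts)
                    for part in domain.split("."))


if __name__ == "__main__":
    russian_defaults = {"LATIN", "CYRILLIC"}
    print(display_domain("\u0434\u0430.example", russian_defaults))  # stays Cyrillic
    print(display_domain("\u4eba.example", russian_defaults))        # shown as xn--...
```

Note that a label mixing two enabled scripts would still be shown as Unicode under this policy.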

      • (Score: 2) by requerdanos (5997) on Monday March 12 2018, @02:38PM (#651343)

        the Russian would see Chinese domain names (which he can't read anyway) as punycode

        Believe it or not, reading more than one alphabet is really not uncommon. (Heck, I am from the linguistically and geographically ignorant United States of America, and I can read more than one alphabet--Latin, Cyrillic, some Greek--imagine how much more so someone from countries where "literate" implies at least "bilingual", and that's most of them...)

        Take your Russian friend in your example. China is his neighbor to the south. Even if he isn't fluent in Chinese, he can still pick out some Chinese here and there. He can tell the difference between the similar "人" and "入", for example.

        Which punycode kills, stone cold dead. If you start obfuscating domains with punycode, then suddenly all the Chinese domains--all the non-Latin, non-Cyrillic domains--look alike and he can't readily tell one from another, even though he could before. That makes things arguably worse, not better.

        Plus, plain (ASCII) Latin letters have no punycode form, and many of them are dead ringers for his native Cyrillic ones. "да.example" and "дa.example" might look the same, but one is all Cyrillic and the other is a mixed Cyrillic-and-Latin phishing site. The only way punycode would show this difference is if he views Cyrillic-alphabet sites in punycode ("xn--80ah.example" vs. "xn--a-gtb.example"), which would be insane (he would like to be able to read the address bar) and also unhelpful, because neither of those is legible, so both register as "X N dash dash gibberish." So this approach makes him feel warm, fuzzy, and protected, while leaving him wide open to Latin-vs.-Cyrillic look-alike attacks. That makes things arguably worse, not better.
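
The two example domains can be checked with a short Python sketch. It assumes the built-in "idna" codec (IDNA 2003) and uses the first word of each character's Unicode name as a crude stand-in for its script. The helper names are hypothetical, and the mixed-script test is only a heuristic: it would also flag legitimate mixes such as Japanese kanji with kana.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Collect a rough script name for every letter in the label."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1


def to_punycode(domain: str) -> str:
    """ASCII form of the domain via Python's IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    honest = "\u0434\u0430.example"  # Cyrillic de + Cyrillic a
    spoof = "\u0434a.example"        # Cyrillic de + Latin a
    for domain in (honest, spoof):
        first = domain.split(".")[0]
        print(to_punycode(domain), sorted(scripts_in(first)),
              is_mixed_script(first))
    # xn--80ah.example ['CYRILLIC'] False
    # xn--a-gtb.example ['CYRILLIC', 'LATIN'] True
```

A mixed-script check of this kind flags the spoof while leaving the all-Cyrillic name readable, which is the distinction the punycode display alone does not make legible.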

        Only knowing, recognizing, or speaking one language or alphabet is not a condition most people are in, even if you and/or most of your neighbors may be.