SoylentNews is people

posted by martyb on Sunday March 11 2018, @10:39AM
from the söylêntnéws.org dept.

Brian Krebs writes about how browsers choose to display internationalized domain names (IDNs). The issue here is, of course, the spoofing of valid URLs with visually similar letters. You would probably notice the lame attempt in the department line, but some international characters are very similar, or indeed identical, to their Latin counterparts. Depending on your personal preferences, it might be a good idea to have your browser display punycode instead. It could save you a headache later.

https://krebsonsecurity.com/2018/03/look-alike-domains-and-visual-confusion/
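
For anyone curious what the punycode form of a name actually looks like, here is a minimal Python sketch. It uses the standard library's `punycode` codec and adds the `xn--` prefix by hand. The helper names `to_punycode_host` and `from_punycode_host` are made up for illustration, and the sketch deliberately skips the mapping and validity checks that the IDNA RFCs require, so treat it as a way to look at names, not as a resolver-grade implementation.

```python
def to_punycode_host(host: str) -> str:
    """Return host with every non-ASCII label in its xn-- (punycode) form.

    Illustration only: no Nameprep/IDNA mapping or validation is done.
    """
    labels = []
    for label in host.split("."):
        if label.isascii():
            labels.append(label)  # plain ASCII labels pass through unchanged
        else:
            encoded = label.encode("punycode").decode("ascii")
            labels.append("xn--" + encoded)
    return ".".join(labels)


def from_punycode_host(host: str) -> str:
    """Inverse of to_punycode_host: reveal what an xn-- name stands for."""
    labels = []
    for label in host.split("."):
        if label.startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


if __name__ == "__main__":
    spoof = "söylêntnéws.org"  # the department line of this story
    ascii_form = to_punycode_host(spoof)
    print(spoof, "->", ascii_form)
    print(ascii_form, "->", from_punycode_host(ascii_form))
```

The ASCII form is ugly on purpose: a spoofed name that would pass a casual glance in Unicode no longer resembles the real one once it is shown this way.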

Here are some of the applicable RFCs:

  • RFC 3490 - Internationalizing Domain Names in Applications (IDNA)
  • RFC 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
  • RFC 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
  • RFC 4690 - Review and Recommendations for Internationalized Domain Names (IDNs)
  • RFC 5890 - Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
  • RFC 5891 - Internationalized Domain Names in Applications (IDNA): Protocol
  • RFC 5892 - The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
  • RFC 5893 - Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)
  • RFC 5894 - Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale

  • (Score: 3, Insightful) by coolgopher on Monday March 12 2018, @12:50AM (3 children)

    Unicode lost its way a long time ago. Unfortunately. It's still better than having to deal with the old "code page" approach, though, I think. Or is it? I didn't get emojis in my code pages at least...

  • (Score: 0) by Anonymous Coward on Monday March 12 2018, @03:54AM

    Dude. Please, stop. Give some sort of warning before posting that stuff.
    Too many years of dealing with "cockup pages".

  • (Score: 0) by Anonymous Coward on Monday March 12 2018, @09:42AM (1 child)

    Actually, I think in *this* case (DNS) the code page idea is not that far off. But instead of the overlapping code pages we used to have, let the code pages map onto disjoint parts of the Unicode space, and let the user select which code pages are shown as Unicode letters, with the rest being shown as punycode. There would, of course, be sensible defaults depending on the system language.

    That way, a Russian user would be able to see domain names in Cyrillic, and a Chinese user would be able to see domain names in Chinese, but the Russian would see Chinese domain names (which he can't read anyway) as punycode.
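
A rough Python sketch of that display policy, under some loud assumptions: the function names and the `shown_scripts` setting are hypothetical, and "script" is guessed from the first word of each character's Unicode name (e.g. `CYRILLIC`, `CJK`), which is only a crude stand-in for the real Unicode Script property that the standard library does not expose.

```python
import unicodedata


def scripts_of(label: str) -> set:
    """Guess the scripts used in a label from Unicode character names.

    ASCII digits and hyphens are treated as script-neutral.
    """
    scripts = set()
    for ch in label:
        if ch.isascii() and not ch.isalpha():
            continue  # digits and '-' belong to no particular script
        # First word of the name, e.g. "CYRILLIC SMALL LETTER DE" -> "CYRILLIC"
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def display_label(label: str, shown_scripts: set) -> str:
    """Show a label as Unicode only if all its scripts were selected."""
    if label.isascii() or scripts_of(label) <= shown_scripts:
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def display_host(host: str, shown_scripts: set) -> str:
    """Apply the per-label display policy to a whole host name."""
    return ".".join(display_label(l, shown_scripts) for l in host.split("."))


if __name__ == "__main__":
    russian_defaults = {"LATIN", "CYRILLIC"}  # hypothetical system default
    for host in ("\u0434\u0430.example", "\u4eba.example"):
        print(display_host(host, russian_defaults))
```

With those defaults the Cyrillic name is printed as-is and the Chinese one as an `xn--` label. Note that a label mixing two selected scripts is still shown in Unicode under this policy.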

    • (Score: 2) by requerdanos on Monday March 12 2018, @02:38PM

      the Russian would see Chinese domain names (which he can't read anyway) as punycode

      Believe it or not, reading more than one alphabet is really not uncommon. (Heck, I am from the linguistically and geographically ignorant United States of America, and I can read more than one alphabet--Latin, Cyrillic, some Greek--so imagine how much more that holds for someone from a country where "literate" implies at least "bilingual", and that's most of them...)

      Take your Russian friend in your example. China is his neighbor to the south. Even if he isn't fluent in Chinese, he can still pick out some Chinese here and there. He can tell the difference between the similar "人" and "入", for example.

      Which punycode kills, stone cold dead. If you start obfuscating domains with punycode, then suddenly all the Chinese domains--all the non-Latin, non-Cyrillic domains--look alike and he can't readily tell one from another, even though he could before. That makes things arguably worse, not better.

      Plus, punycode leaves plain Latin (ASCII) characters as they are, and many of them are dead ringers for his native Cyrillic ones. "да.example" and "дa.example" might look the same, but one is all Cyrillic and the other is a mixed Cyrillic and Latin phishing site. The only way punycode would show this difference would be if he were looking at Cyrillic-alphabet sites in punycode ("xn--80ah.example" vs. "xn--a-gtb.example"), which would be insane (he would like to be able to read the address bar) and also unhelpful, because neither of those is legible and so both register as "X N dash dash gibberish." So this approach makes him feel warm, fuzzy, and protected, while leaving him wide open to Latin vs. Cyrillic lookalike attacks. That makes things arguably worse, not better.
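
To make the two example labels concrete, here is a small Python sketch that prints their punycode forms and flags the mixed one. The helper names are invented for illustration, and guessing the script from the first word of a character's Unicode name is a crude heuristic of my own choosing, not how any particular browser decides; real mixed-script checks are considerably more elaborate.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Guess which scripts a label uses, from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isascii() and not ch.isalpha():
            continue  # ASCII digits and '-' are script-neutral
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label draws letters from more than one script."""
    return len(scripts_in(label)) > 1


def xn(label: str) -> str:
    """Punycode form of a single non-ASCII label (no IDNA validation)."""
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    all_cyrillic = "\u0434\u0430"  # Cyrillic de + Cyrillic a
    mixed = "\u0434a"              # Cyrillic de + Latin a
    for label in (all_cyrillic, mixed):
        print(label, xn(label), sorted(scripts_in(label)),
              "MIXED" if is_mixed_script(label) else "single script")
```

The two `xn--` strings do differ, but nobody can tell which is the honest one by looking at them, whereas the script check names the problem directly.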

      Only knowing, recognizing, or speaking one language or alphabet is not a condition most people are in, even if you and/or most of your neighbors may be.