SoylentNews is people

posted by martyb on Sunday March 11 2018, @10:39AM
from the söylêntnéws.org dept.

Brian Krebs writes about how browsers choose to display internationalized domain names (IDNs). The issue, of course, is spoofing of valid URLs with visually similar letters. You would probably notice the lame attempt in the department line, but some international characters are very similar or indeed identical to their Latin counterparts. Depending on your preferences, it might be a good idea to have your browser display Punycode instead. It could save you a headache later.

https://krebsonsecurity.com/2018/03/look-alike-domains-and-visual-confusion/
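
For anyone who wants to see what the Punycode form actually looks like, here is a minimal Python sketch. It assumes the standard library's built-in `idna` codec, which follows the older IDNA 2003 rules (RFC 3490) rather than the newer RFC 5891 ones. The `to_ascii` helper name is made up for illustration; the Cyrillic test name is the one from the Krebs article.

```python
def to_ascii(hostname: str) -> str:
    """Return the ASCII (Punycode) form of a hostname.

    Relies on Python's built-in "idna" codec (IDNA 2003, RFC 3490).
    """
    return hostname.encode("idna").decode("ascii")


# U+0441 U+0430 are CYRILLIC SMALL LETTER ES and A; they render like Latin "ca".
cyrillic = "\u0441\u0430.com"
latin = "ca.com"

print(to_ascii(cyrillic))  # xn--80a7a.com
print(to_ascii(latin))     # ca.com (pure ASCII is left alone)
print(cyrillic == latin)   # False: different code points, same appearance
```

The two names look the same in most fonts, but only the ASCII form makes the difference visible.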

Here are some of the applicable RFCs:

  • RFC 3490 - Internationalizing Domain Names in Applications (IDNA)
  • RFC 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
  • RFC 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
  • RFC 4690 - Review and Recommendations for Internationalized Domain Names (IDNs)
  • RFC 5890 - Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
  • RFC 5891 - Internationalized Domain Names in Applications (IDNA): Protocol
  • RFC 5892 - The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
  • RFC 5893 - Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)
  • RFC 5894 - Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale

This discussion has been archived. No new comments can be posted.
  • (Score: 2) by isj on Sunday March 11 2018, @03:56PM (7 children)

    Perfect example of idiots bending over backward to appease the politically correct crowd - and exposing people to exploits.

    It's a perfect example of Firefox considering the larger implications.

    Your comment seems to indicate that you haven't understood the purpose of Punycode and why people would want to use their non-English script.

  • (Score: 2) by Runaway1956 on Sunday March 11 2018, @04:45PM (6 children)

    At this point in time, English is the leading language used on the internet, with Chinese a respectable second, and Spanish a very distant third. https://www.internetworldstats.com/stats7.htm [internetworldstats.com]

    It would seem reasonable to use punycode by default, especially when packaged for use in primarily English-speaking countries. If you've read TFA, then you already know that other browsers will render the address into meaningful gibberish by default. Why do IE, Edge, Opera, Safari, and Chrome all get it right, but Mozilla does not?

    You and requerdanos both make cases for using a different default configuration in countries most affected by punycode.

    I can't speak for eastern European users, and certainly not for Asians - but relatively few Americans are savvy enough to understand this language issue. Default configuration for downloads in the US really should offer this protection.

    Funny thing about Pale Moon, though. When I went to the test page, my address bar looked odd. Before the little green padlock, I had xn--80a7a.com/ clearly visible, and after the lock it showed the address the same as in TFA. Firefox, however, displays only the green lock and https://www.са.com/ [www.са.com]

    It appears that Pale Moon has it right. I get my warning that maybe ca.com isn't really ca.com, but it will render the script so that you, who use other languages and scripts, can see what it really is.
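
A rough idea of how a "maybe ca.com isn't really ca.com" warning could be derived is sketched below. This is not Pale Moon's actual code. The `LOOKALIKES` table is a tiny hand-picked assumption, and the `skeleton` and `looks_like` helpers are hypothetical names; a real check would need Unicode's full confusables data.

```python
# Tiny hand-picked table of Cyrillic letters that resemble Latin ones.
# Illustrative only; far from complete.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def skeleton(hostname: str) -> str:
    """Lowercase the name and swap known lookalikes for their Latin twins."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in hostname.lower())


def looks_like(hostname: str, trusted: str) -> bool:
    """True if hostname is a different name that renders like trusted."""
    different = hostname.lower() != trusted.lower()
    return different and skeleton(hostname) == skeleton(trusted)


print(looks_like("\u0441\u0430.com", "ca.com"))  # True: Cyrillic spoof
print(looks_like("ca.com", "ca.com"))            # False: the real thing
```

A browser could show both forms, as Pale Moon does, whenever a check like this fires against a site the user already knows.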

    • (Score: 2) by requerdanos on Sunday March 11 2018, @05:51PM

      It would seem reasonable to use punycode by default

      Only, as you mention, if your native language doesn't use non-ASCII characters; otherwise punycode turns readable domains into machine-readable but human-unreadable gibberish, effectively removing the feature of "The domain of the current site appears in the address bar" for all sites whose domains are in your native language.

      The gibberish may be "meaningful," but a string of "meaningful numbers and letters" in lieu of a human-readable name does not go easy on the eyes. It could arguably make spoofing easier; if someone is checking to make sure that the site is displayed as punycode gibberish, then they are not going to recognize a spoof site which also shows up as gibberish unless they compare character-for-character the punycode itself. That isn't really a part of anyone's workflow, and a kludge that requires that is a workaround, but no solution.

    • (Score: 3, Interesting) by isj on Sunday March 11 2018, @05:51PM (2 children)

      It appears that Mozilla changed their algorithm since I last checked (which admittedly was a decade ago): https://wiki.mozilla.org/IDN_Display_Algorithm [mozilla.org]
      Mozilla's updated algorithm detects mixed scripts, but that doesn't help for са.com, where each component does not mix scripts. I don't know which algorithm Chrome/Safari/Opera/... use. A quick test shows that Chrome does not show punycode for some ccTLDs that allow non-English characters. So my guess is that Mozilla allows non-English characters in .com while the other browsers do not.
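
To make the limitation concrete, here is a minimal sketch of a per-label mixed-script check. It is not Mozilla's actual algorithm. Python's standard library has no script property, so this assumes the first word of each letter's Unicode name ("LATIN", "CYRILLIC", ...) is a good enough stand-in; the function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts in a label from each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()  # ignore digits and hyphens
    }


def mixed_script_labels(hostname: str) -> list:
    """Return the dot-separated labels that contain more than one script."""
    return [
        label
        for label in hostname.split(".")
        if len(scripts_in_label(label)) > 1
    ]


# Latin "c" followed by Cyrillic "a": caught, the label mixes scripts.
print(mixed_script_labels("c\u0430.com"))       # ['cа']
# All-Cyrillic label plus all-Latin "com": nothing is flagged.
print(mixed_script_labels("\u0441\u0430.com"))  # []
```

The second case is the са.com problem: each label is internally consistent, so a per-label check passes it.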

      I think that defaulting to raw punycode for non-English letters is an arrogant attitude. That ignores the approximately 85% of the world population that natively uses non-English letters. If you disagree then please stop using those fake letters Y, W, X, J - they are not part of the Latin alphabet :-)

      I think the browsers should consider not only the TLD but also the user's capabilities. Limiting the non-punycode display characters for US downloads to A-Z (and possibly Ñ and Ç too - there are quite a lot of Spanish speakers in the US) sounds like a reasonable idea. Or use the language preferences to infer which characters the user knows (although setting language preferences in the browser is quite rare for normal users). Combine that with IDN whitelists for ccTLD.
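
A minimal sketch of that idea, under stated assumptions: `US_ALLOWED` is a made-up character set for a US download, `display_form` is a hypothetical helper, and the Punycode conversion uses Python's built-in `idna` codec (IDNA 2003). No browser is known to work exactly this way.

```python
import string

# Assumed policy for a US download: ASCII letters, digits, hyphen,
# plus n-tilde and c-cedilla for Spanish speakers.
US_ALLOWED = set(string.ascii_lowercase + string.digits + "-" + "\u00f1\u00e7")


def display_form(hostname: str, allowed: set = US_ALLOWED) -> str:
    """Show each label as typed if the user can read it, else as Punycode."""
    shown = []
    for label in hostname.lower().split("."):
        if all(ch in allowed for ch in label):
            shown.append(label)  # every character is familiar to the user
        else:
            shown.append(label.encode("idna").decode("ascii"))
    return ".".join(shown)


print(display_form("espa\u00f1a.com"))    # españa.com
print(display_form("\u0441\u0430.com"))   # xn--80a7a.com
```

A per-locale or per-language-preference set could be passed as `allowed` instead of the US default.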

      Then there is the problem that the generic TLDs are being used for multiple scripts. That is where the main problem is. If the generic TLDs didn't allow mixing scripts with similar-looking glyphs then there would not be a problem. But they do, so we have a mess. I'm in favor of abandoning the generic TLDs and using only ccTLDs - but that is not going to happen.

      • (Score: 0) by Anonymous Coward on Sunday March 11 2018, @11:19PM (1 child)

        Make it keyboard-dependent (as TFA hints at).
        If the URL could be typed without using the compose key on the current keyboard, display as characters. Otherwise, display as punycode.

        • (Score: 2) by requerdanos on Monday March 12 2018, @03:06PM

          Make it keyboard-dependent (as TFA hints at).... "the current keyboard"...

          I have two keyboards connected, one US layout (It's a Microsoft Natural keyboard, the only Microsoft product that I am fond of), one Russian. All the time.

          Your solution makes no sense. Even if we say "the current keyboard layout" instead of "the keyboard", typists generally only change layouts in preparation for typing something, not reading something, not for following a link. Most web browsing doesn't use keyboard input.

          On a deeper level, there is no way to define "THE language" someone speaks, because people speak/read many languages to a certain degree. Even most Americans can pick out a handful of foreign words, which by definition is "reading another language."

          In the same way, it's very poor engineering indeed to design an information system that assumes "the keyboard" or "the monitor" or "the mouse" or "the printer" for the simple reason that any given system may have zero or more of those things. In past decades, these lessons were learned and applied, and now all your major operating systems properly allow for more than one of each with no special hoops to jump through. Deliberately un-learning such a lesson doesn't seem like progress.

          My system I'm typing on right now has the two (very different) keyboards, three monitors, a three-button mouse, a five-button mouse, and a Wacom pad also providing mouse input. The machines to its immediate right and left are headless no-keyboard no-mouse machines that I use via ssh. All of them use the same two laser printers, one color, one black and white.

    • (Score: 2, Funny) by Anonymous Coward on Sunday March 11 2018, @06:43PM

      At this point in time, English is the leading language used on the internet,

      Ich spreche Deutsch, du insensible Klumpen! [I speak German, you insensitive clod!]

    • (Score: 0) by Anonymous Coward on Monday March 12 2018, @04:05AM

      WTF
      On a mobile browser that is www.ca.com.
      Highlight the URL and it shows what looks like encoding.
      Why doesn't it detect that the local language is not the URL language and provide a warning?
      The opportunities for spoofing are surreal.