
SoylentNews is people

posted by martyb on Sunday March 11 2018, @10:39AM
from the söylêntnéws.org dept.

Brian Krebs writes about how browsers choose to display internationalized domain names (IDNs). The issue here is, of course, spoofing valid URLs with visually similar letters. You would probably notice the lame attempt in the department line, but some international characters are very similar or even identical. Depending on your personal preferences, it might be a good idea to have your browser show punycode instead. It could save you a headache later.

https://krebsonsecurity.com/2018/03/look-alike-domains-and-visual-confusion/

Here are some of the applicable RFCs:

  • RFC 3490 - Internationalizing Domain Names in Applications (IDNA)
  • RFC 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
  • RFC 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
  • RFC 4690 - Review and Recommendations for Internationalized Domain Names (IDNs)
  • RFC 5890 - Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
  • RFC 5891 - Internationalized Domain Names in Applications (IDNA): Protocol
  • RFC 5892 - The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
  • RFC 5893 - Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)
  • RFC 5894 - Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale
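For readers who want to see the two core encodings from this list in action: Python's standard library ships an `idna` codec (RFC 3490 ToASCII/ToUnicode with RFC 3491 Nameprep, i.e. the older IDNA 2003 rules, not the IDNA 2008 rules of RFC 5890-5894) and a `punycode` codec (the bare RFC 3492 algorithm). The helper names `to_ascii` and `to_unicode` are made up for this sketch.

```python
# Python's built-in "idna" codec follows IDNA 2003 (RFC 3490 + RFC 3491
# Nameprep); it does not implement the IDNA 2008 rules of RFC 5890-5894.

def to_ascii(host: str) -> str:
    """Unicode hostname -> ASCII form, with non-ASCII labels as xn--<punycode>."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """ASCII hostname with xn-- labels -> Unicode form."""
    return host.encode("ascii").decode("idna")


# The "punycode" codec is the bare RFC 3492 algorithm: no "xn--" prefix,
# no splitting into labels, no Nameprep.
raw = "bücher".encode("punycode").decode("ascii")

print(to_ascii("bücher.de"))            # xn--bcher-kva.de
print(to_ascii("привет.com"))           # xn--b1agh1afp.com
print(to_unicode("xn--b1agh1afp.com"))  # привет.com
print(raw)                              # bcher-kva
```

The `xn--` prefix (the ACE prefix) is what the IDNA layer adds on top of raw punycode so that resolvers can tell encoded labels apart from ordinary ASCII ones.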

  • (Score: 4, Interesting) by opinionated_science on Sunday March 11 2018, @02:19PM (12 children)

    by opinionated_science (4031) on Sunday March 11 2018, @02:19PM (#650918)

    How about simply coloring the non-ASCII characters red, say?

    I'm willing to wager most of the "fool you with similar character" URLs are directed specifically at the "I expect ASCII" crowd (aka me!).

    Logically this must be the case: everyone else gets ASCII by default...

    Does this sound too hard? Display the URL with colors per character class, etc.
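A rough sketch of the coloring idea, assuming a terminal that understands ANSI escape codes rather than a browser address bar. `highlight_non_ascii` is a hypothetical helper, and the spoofed host is an invented example with a Cyrillic "а" (U+0430) in place of the Latin "a":

```python
RED = "\033[31m"   # ANSI escape: red foreground
RESET = "\033[0m"  # ANSI escape: reset attributes


def highlight_non_ascii(host: str, start: str = RED, end: str = RESET) -> str:
    """Wrap every non-ASCII character of host in start/end markers."""
    return "".join(f"{start}{ch}{end}" if ord(ch) > 127 else ch for ch in host)


spoof = "\u0430mazon.com"  # Cyrillic "а" followed by Latin "mazon.com"
print(highlight_non_ascii(spoof))            # the "а" appears red on an ANSI terminal
print(highlight_non_ascii(spoof, "[", "]"))  # [а]mazon.com
```

Passing plain brackets as markers makes the output testable and also works where colors are unavailable.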

  • (Score: 2) by requerdanos on Sunday March 11 2018, @03:35PM

    by requerdanos (5997) on Sunday March 11 2018, @03:35PM (#650932)

    I'm willing to wager most of the "fool you with similar character" URLs are directed specifically at the "I expect ASCII" crowd (aka me!).

    Let's imagine a site in the Russian-language world called привет.com (привет ~= "privyet" ~= "hi"). (This exists, with only a parking page, at http://привет.com/.)

    If we replace the Cyrillic "в" with the Latin "B", or worse, replace the Cyrillic "е" and "р" with the Latin "e" and "p", we get lots of variants that look identical, or almost identical, to our hypothetical original.
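To make the substitution concrete, here is a small sketch that enumerates look-alike variants of a label. The two-entry table `LOOKALIKES` is a hypothetical stand-in for real confusable data (such as Unicode's confusables list); it leaves out "в"/"B" because hostnames are case-insensitive and IDNA lowercases them, so a Latin "B" would end up as "b".

```python
from itertools import product

# Hypothetical, deliberately tiny look-alike table (Cyrillic -> Latin).
LOOKALIKES = {
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
}


def variants(label: str) -> list[str]:
    """All strings obtained by swapping any subset of letters for look-alikes."""
    choices = [(ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


original = "привет"
for v in variants(original):
    # Each line looks like "привет", but only one is the same string.
    print(v, v == original)
```

With two substitutable letters the sketch yields four strings: the original plus three spoofs that are different domain names despite looking the same.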

    So perhaps it's not the sophisticated world directing "fool you with similar" attacks at the "I Expect ASCIIs" but the "criminal element" directing "fool with similar" attacks at all-and-sundry.

    Changing color based on which Unicode block a character comes from would admittedly reveal this just as well, but Ivan Pa-Russki and many others would have to put up with their address bar showing an ugly error red to mean "normal", while friendly ordinary black would mean "someone is trying to fool you."

    Punycode probably isn't a universal answer either: the friendly "привет.com" becomes "xn--b1agh1afp.com" in punycode (Blag one a fop? Blog one a fip?). That would be sort of like "google.com" always showing up as "qz--jkl2h298398j.com" for us ASCII folks, i.e. a similar problem to the red-coding, but worse, because instead of turning letters red, it renders them unreadable.

  • (Score: 3, Interesting) by isj on Sunday March 11 2018, @03:44PM (10 children)

    by isj (5249) on Sunday March 11 2018, @03:44PM (#650934)

    What about domains in .ru and .ua? Why should Russians and Ukrainians be punished for using their own script?

    ASCII is English-centric. The only languages I know of that can be correctly represented in ASCII are English, Dutch and Greenlandic.
    Even among languages that use the Latin script (and extensions thereof), most cannot be represented correctly in ASCII: French, German, Norwegian, Spanish, Italian, Polish, …

    The underlying problems are:
    1: that non-ccTLDs exist at all; e.g. .com should have been turned into .com.us a long time ago.
    2: that some ccTLD registrars have inadequate rules, or none, for which scripts can be used.

    Some ccTLD registrars have the rule that you can only use scripts and characters that are commonly known in the languages used in that country.

    With the non-ccTLDs (.com and all the new generic TLDs such as .xyz, .guru, ...) we have the problem that multiple scripts can be used in domain names. The article doesn't cover the details of Firefox's IDN checks: Firefox has a list of TLDs whose registrars have sensible rules. For example, .dk is whitelisted because the rules there only allow the Latin script plus the extensions æ/ø/å.
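The whitelist approach might look roughly like this. The function name `display_host` and the TLD set are hypothetical and only illustrate the idea; they are not Firefox's actual list or logic, and the comparison is done on the Unicode form of the TLD.

```python
# Hypothetical whitelist of TLDs whose registries are assumed to enforce
# script rules. Illustrative entries only, not Firefox's list.
SCRIPT_RESTRICTED_TLDS = {"dk", "рф"}


def display_host(host: str) -> str:
    """Show Unicode for whitelisted TLDs, the punycode (ASCII) form elsewhere."""
    tld = host.rsplit(".", 1)[-1].lower()
    if tld in SCRIPT_RESTRICTED_TLDS:
        return host
    # Python's "idna" codec (IDNA 2003) converts non-ASCII labels to xn--...
    return host.encode("idna").decode("ascii")


print(display_host("blåbær.dk"))     # blåbær.dk (whitelisted TLD)
print(display_host("привет.com"))    # xn--b1agh1afp.com (.com is not whitelisted)
print(display_host("example.com"))   # example.com (ASCII is unchanged anyway)
```

The weakness is that the decision is only as good as the registry's rules: everything under a non-whitelisted TLD such as .com gets punycode, including perfectly legitimate names.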

    The mixed-script checks that the article talks about can only be applied to each domain component separately, so they don't help with e.g. асе.com, where no mixing occurs within any single component.
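A per-label mixed-script check, sketched under the assumption that the first word of a character's Unicode name (LATIN, CYRILLIC, GREEK, ...) is a good-enough proxy for its script. Python's standard library has no direct Script property lookup, and a real check would use one. The helper names are hypothetical. The sketch reproduces the blind spot: the all-Cyrillic first label of асе.com passes.

```python
import unicodedata


def char_script(ch: str) -> str:
    """Approximate the script of ch by the first word of its Unicode name.

    Non-letters (digits, '-', '.') are treated as script-neutral ("COMMON").
    """
    if not ch.isalpha():
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_labels(host: str) -> list[str]:
    """Return the dot-separated labels of host that mix more than one script."""
    flagged = []
    for label in host.split("."):
        scripts = {char_script(ch) for ch in label} - {"COMMON"}
        if len(scripts) > 1:
            flagged.append(label)
    return flagged


print(mixed_script_labels("\u0430mazon.com"))         # ['аmazon']: Cyrillic "а" + Latin
print(mixed_script_labels("\u0430\u0441\u0435.com"))  # []: "асе" is all Cyrillic, "com" all Latin
```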

    I'm working on related problems (e.g. a blog post on diacritics at privacore.com), but there we have sort of the opposite problem: users type something simple and we have to extend the meaning to include other glyphs/codepoints (but we don't mix Latin with Cyrillic, because that wouldn't make sense).
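For that opposite problem (a user types plain "bucher" and should also match "bücher"), one common technique is to decompose text with Unicode NFD and drop the combining marks. This is a generic sketch with hypothetical function names, not necessarily what the linked blog post does. Note that letters such as "æ" and "ø" have no canonical decomposition and survive unchanged.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_match(query: str, candidate: str) -> bool:
    """Compare two strings ignoring case and decomposable diacritics."""
    return fold_diacritics(query).casefold() == fold_diacritics(candidate).casefold()


print(fold_diacritics("bücher"))        # bucher
print(fold_diacritics("blåbær"))        # blabær ("æ" does not decompose)
print(loose_match("bucher", "Bücher"))  # True
```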

    • (Score: 4, Interesting) by opinionated_science on Sunday March 11 2018, @06:36PM (6 children)

      by opinionated_science (4031) on Sunday March 11 2018, @06:36PM (#651001)

      That's not punishment; they simply have a different language.

      I speak multiple languages, and not all of their characters are ASCII.

      But guess what: if I go to "www.amazon.com", I know it should be ASCII, so *any* URL that has a non-standard character set should show this, based on my LANG preference.

      I am sitting here with declare -x LANG="en_US.UTF-8"

      Someone in Russia would have (I imagine, someone correct me!) "LANG=ru_RU.utf8"

      Hence the hierarchy of languages is clear. I'm not saying it should be mandated, but an option to "show non-standard chars" would go a long way towards combating click-jacking, as the majority of languages are non-ASCII.

      Is this so unreasonable?
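Reading the locale from LANG, as suggested, could look something like this. The language-to-script table `EXPECTED_SCRIPTS` is a hypothetical placeholder (in particular, treating Latin as familiar to Russian-locale users is an assumption), unknown languages fall back to Latin, and a character's script is approximated by the first word of its Unicode name.

```python
import os
import unicodedata

# Hypothetical table: scripts a user of each locale language is assumed
# to read comfortably. Not taken from any browser or standard.
EXPECTED_SCRIPTS = {
    "en": {"LATIN"},
    "ru": {"CYRILLIC", "LATIN"},
}


def locale_language(lang: str) -> str:
    """Extract the language code: 'en_US.UTF-8' -> 'en', 'ru_RU.utf8' -> 'ru'."""
    return lang.split("@")[0].split(".")[0].split("_")[0].lower()


def unexpected_chars(host: str, lang: str) -> list[str]:
    """Letters in host whose script is not expected for the given LANG value."""
    expected = EXPECTED_SCRIPTS.get(locale_language(lang), {"LATIN"})
    flagged = []
    for ch in host:
        if not ch.isalpha():
            continue  # digits, '-' and '.' are script-neutral
        script = unicodedata.name(ch, "UNKNOWN").split()[0]  # e.g. "CYRILLIC"
        if script not in expected:
            flagged.append(ch)
    return flagged


spoof = "\u0430mazon.com"  # Cyrillic "а" (U+0430) + Latin "mazon.com"
print(unexpected_chars(spoof, os.environ.get("LANG", "C")))  # depends on your LANG
print(unexpected_chars(spoof, "en_US.UTF-8"))  # ['а']
print(unexpected_chars(spoof, "ru_RU.utf8"))   # []
```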

      • (Score: 3, Interesting) by isj on Sunday March 11 2018, @07:29PM (5 children)

        by isj (5249) on Sunday March 11 2018, @07:29PM (#651025)

        I mostly agree with you.
        Note: My use of the .ru TLD was a bad choice. Domains in .ru use Russian transliterated into Latin letters, while the Cyrillic Russian TLD is .рф.

        I think it is reasonable to require that if an average Russian goes to the президент.рф site, which has Cyrillic letters in it, then the Cyrillic letters are shown as they should be, not as raw punycode.
        And if an average German goes to www.bücher.de, which has Latin-1 letters in it, then the Latin-1 letters are shown as they should be, not as raw punycode.

        Now, if an average American goes to the президент.рф site? Well, since the TLD has a strict script policy (only Cyrillic is allowed), it would be okay to show the Cyrillic letters. Or the raw punycode. Either would be fine IMHO.

        What about са.com (or any other TLD with a loose script policy)? This is where the idea of showing what the user should be familiar with as proper glyphs, and the unfamiliar stuff as punycode, seems like a good idea. As you put it, it would go a long way against click-jacking. The average American would see xn--80a7a.com while the average Russian would see са.com.
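The "familiar glyphs, unfamiliar punycode" rule can be sketched per label. The `familiar` script sets passed in are assumptions standing in for whatever a browser would derive from the user's locale, the function names are hypothetical, and a character's script is approximated by the first word of its Unicode name.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate scripts used by the letters of a label (first word of Unicode name)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def display_for_user(host: str, familiar: set[str]) -> str:
    """Show labels in familiar scripts as glyphs, all other labels as punycode."""
    shown = []
    for label in host.split("."):
        if scripts_in(label) <= familiar:
            shown.append(label)
        else:
            # Python's "idna" codec applies ToASCII, producing the xn-- form.
            shown.append(label.encode("idna").decode("ascii"))
    return ".".join(shown)


host = "\u0441\u0430.com"  # Cyrillic "с" (U+0441) + "а" (U+0430)
print(display_for_user(host, {"LATIN"}))              # xn--80a7a.com
print(display_for_user(host, {"CYRILLIC", "LATIN"}))  # са.com
```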

        But then you have a nasty problem: in the opposite case (plain ASCII ca.com), the average Russian would see, uhm... (you can't punycode-encode plain a-z) some clear indication that it is not Cyrillic. But that would be silly, because plain ASCII names are quite common. Are Russians tricked by Cyrillic-looking glyphs, or are they just more aware of them? Inquiring minds want to know...
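The point about plain a-z can be checked directly: IDNA's ToASCII step only rewrites labels that contain non-ASCII characters, so an all-ASCII name has no alternative "xn--" spelling to fall back on. A minimal sketch using Python's built-in codecs:

```python
cyrillic_host = "\u0441\u0430.com"  # Cyrillic "с" (U+0441) + "а" (U+0430)
latin_host = "ca.com"               # plain ASCII "c" + "a"

# ToASCII only rewrites labels containing non-ASCII characters...
print(cyrillic_host.encode("idna"))  # b'xn--80a7a.com'
# ...so an all-ASCII host comes back unchanged: there is no xn-- form to show.
print(latin_host.encode("idna"))     # b'ca.com'
# The bare RFC 3492 algorithm just copies basic code points and appends the
# "-" delimiter, which would not make the name look any different either.
print("ca".encode("punycode"))       # b'ca-'
```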

        • (Score: 3, Touché) by FatPhil on Sunday March 11 2018, @08:36PM

          by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Sunday March 11 2018, @08:36PM (#651054)
          Fortunately nothing looks like Cyrillic, so B or Β will never be thought of as В, nor Η or H as Н, nor Τ or T as Т.
          --
          Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
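Behind the sarcasm: each trio of look-alike capitals is three distinct code points from three scripts, which a quick listing shows. Their lowercase forms (b, β, в) are far easier to tell apart, which matters for domain names because IDNA lowercases them. A minimal sketch:

```python
import unicodedata

# Latin, Greek and Cyrillic capitals that render (almost) identically.
for group in ("B\u0392\u0412", "H\u0397\u041d", "T\u03a4\u0422"):
    for ch in group:
        print(f"U+{ord(ch):04X} {ch} -> {ch.lower()}  {unicodedata.name(ch)}")
```

The first group prints LATIN CAPITAL LETTER B, GREEK CAPITAL LETTER BETA and CYRILLIC CAPITAL LETTER VE, each with a different lowercase form.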
        • (Score: 2) by requerdanos on Monday March 12 2018, @12:41AM (1 child)

          by requerdanos (5997) on Monday March 12 2018, @12:41AM (#651134)

          Either would be fine IMHO.

          Looking at one, I see "president dot R F", and looking at the other, I would see "X N dash dash meaningless gibberish".

          Sure, I know tastes vary, but I can read one of those and can't read the other, regardless of what the machine might be able to read.

          • (Score: 2) by isj on Monday March 12 2018, @12:58AM

            by isj (5249) on Monday March 12 2018, @12:58AM (#651147)

            Now, if an average American goes to the президент.рф site? Well, since the TLD has a strict script policy (only Cyrillic is allowed), it would be okay to show the Cyrillic letters. Or the raw punycode. Either would be fine IMHO.

            My imperfect phrasing. What I meant was that I can see pros and cons of each approach in this particular unusual case and I don't have a strong opinion on that.

        • (Score: 2) by requerdanos on Monday March 12 2018, @12:47AM (1 child)

          by requerdanos (5997) on Monday March 12 2018, @12:47AM (#651136)

          Are Russians tricked by cyrillic-looking glyphs, or are they just more aware of it?

          I do know that I've made the odd Russian-language post on this very site to make a point (a sad tendency I have that sometimes casts my maturity in doubt), and been rebuffed by the lameness filter *unless* I substituted Latin characters for a certain percentage of the Cyrillic ones. They look the same, read the same, and though I am no Russian, they would sure fool me.

          As a side note, it is amazing to me how much more slowly I type while using a Russian keyboard/keyboard layout than I do while using US-International layout. Is it just me?

          • (Score: 2) by isj on Monday March 12 2018, @01:22AM

            by isj (5249) on Monday March 12 2018, @01:22AM (#651160)

            I'm hoping that some Russians will chime in. I have no idea whether there are reverse phishing attacks using Latin letters against Cyrillic users.

            Regarding keyboard layout: I imagine that it depends on what you type and how familiar you are with the keyboard layout. If you have been programming for a while, then I imagine using any non-Latin keyboard would be much slower due to lack of muscle memory. Typing on a French keyboard is no fun either if it is not your primary keyboard layout. It once took me 8 tries to type my password correctly on that abomination.

    • (Score: 0) by Anonymous Coward on Sunday March 11 2018, @11:24PM (2 children)

      by Anonymous Coward on Sunday March 11 2018, @11:24PM (#651113)

      As I wrote above: match the current keyboard.
      If all characters in the URL map to keys of the current keyboard, then just show the characters.
      This assumes that the user can tell the keys on his keyboard apart, which is quite reasonable IMHO.

      Otherwise: punycode.
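The keyboard-matching rule can be sketched as follows. The per-layout character sets in `LAYOUT_CHARS` are hypothetical and simplified (lowercase letters, digits, hyphen and dot only), the function name is made up, and a real implementation would have to ask the OS for the active layout. The last case shows the limitation raised in the replies: an all-ASCII name cannot be turned into punycode at all, so it is shown as-is even on a Cyrillic layout.

```python
# Hypothetical, simplified sets of characters each layout can type directly.
LAYOUT_CHARS = {
    "us": set("abcdefghijklmnopqrstuvwxyz0123456789-."),
    "ru": set("йцукенгшщзхъфывапролджэячсмитьбюё0123456789-."),
}


def display_by_keyboard(host: str, layout: str) -> str:
    """Show host as-is if the layout can type every character, else punycode."""
    typeable = LAYOUT_CHARS[layout]
    if set(host.lower()) <= typeable:
        return host
    # Python's "idna" codec only rewrites labels that contain non-ASCII.
    return host.encode("idna").decode("ascii")


print(display_by_keyboard("привет.com", "us"))       # xn--b1agh1afp.com
print(display_by_keyboard("привет.рф", "ru"))        # привет.рф
print(display_by_keyboard("www.example.com", "ru"))  # www.example.com (nothing to encode)
```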

      • (Score: 2) by requerdanos on Monday March 12 2018, @01:12AM

        by requerdanos (5997) on Monday March 12 2018, @01:12AM (#651155)

        As I wrote above: match the current keyboard.

        On my desk is an en-us keyboard. In the keyboard drawer is an ru-ru keyboard. Both connected by the magic of USB. I switch among keyboard layouts with the scroll lock key--and the "current keyboard layout" is on a per-window basis, with the layout often switched when I want to type something, not "as soon as I start reading something."

        I only have two different keyboard layouts plugged in, but I have lots of free USB ports.

        No matter what keyboard I plug in or use, punycode is readable by no one.

      • (Score: 2) by isj on Monday March 12 2018, @01:12AM

        by isj (5249) on Monday March 12 2018, @01:12AM (#651156)

        If the user's keyboard uses a different script (Latin, Cyrillic, Arabic, Hebrew, Thai, ...) than the URL, then that could work for some cases. However, you are overlooking the reverse scenario: a Russian with a Cyrillic keyboard visiting www.example.com. In that case the whole URL should be shown as punycode, but that sounds a bit silly.