
SoylentNews is people

posted by martyb on Sunday March 11 2018, @10:39AM
from the söylêntnéws.org dept.

Brian Krebs writes about how browsers choose to display internationalized domain names (IDNs). The issue here is, of course, the spoofing of legitimate URLs with visually similar letters. You would probably notice the lame attempt in the department line, but some international characters are very similar, or indeed identical, to their Latin counterparts. Depending on your personal preferences, it might be a good idea to have your browser display Punycode instead. It could save you a headache later.

https://krebsonsecurity.com/2018/03/look-alike-domains-and-visual-confusion/

Here are some of the applicable RFCs:

  • RFC 3490 - Internationalizing Domain Names in Applications (IDNA)
  • RFC 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
  • RFC 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
  • RFC 4690 - Review and Recommendations for Internationalized Domain Names (IDNs)
  • RFC 5890 - Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
  • RFC 5891 - Internationalized Domain Names in Applications (IDNA): Protocol
  • RFC 5892 - The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
  • RFC 5893 - Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)
  • RFC 5894 - Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale
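
The Punycode step from RFC 3492 can be tried directly from Python. This is a minimal sketch, assuming the standard library's built-in `idna` codec (which follows the older RFC 3490 rules rather than the RFC 5890 series) and using the department line's look-alike domain as input. The helper names `to_ascii` and `to_unicode` are illustrative, not part of any library.

```python
import unicodedata


def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# Normalize first so the round-trip comparison is not upset by
# precomposed versus decomposed accents in the source text.
spoof = unicodedata.normalize("NFC", "söylêntnéws.org")
ace = to_ascii(spoof)

print(ace)                       # the form a browser would show in Punycode mode
print(to_unicode(ace) == spoof)  # the conversion is reversible
```

A plain ASCII name such as `soylentnews.org` passes through `to_ascii` unchanged, which is why the Punycode view makes the look-alike stand out.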


 
  • (Score: 3, Interesting) by martyb on Sunday March 11 2018, @11:13AM (31 children)

    by martyb (76) Subscriber Badge on Sunday March 11 2018, @11:13AM (#650884) Journal

    As the person who performed the testing of the implementation of UTF-8 and Unicode support on SoylentNews, I am curious what experiences other Soylentils may have in this area.

    How did you perform your testing?

    What tools did you find helpful?

    What test data or even test suites did you use?

    Besides the RFCs, what other documents did you find helpful or instructive?

    --
    Wit is intellect, dancing.
  • (Score: 3, Troll) by FatPhil on Sunday March 11 2018, @11:44AM (25 children)

    by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Sunday March 11 2018, @11:44AM (#650889) Homepage
    Fuck everyone. ASCII everywhere, all the time. Don't like it? Invent your own internet!
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 2, Troll) by FatPhil on Sunday March 11 2018, @12:11PM (11 children)

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Sunday March 11 2018, @12:11PM (#650896) Homepage
      If you disagree, mod me with a "disagree", or present a counter-argument.

      Modding me "troll" for simply stating my *entirely justifiable* opinion is cowardly.

      The fact that punycode exists is all the proof you need that DNS was never intended to support non-ASCII. The second someone mentioned the idea of expanding the alphabet the wiser thinkers said "you'll get spoofing if you do that" - that was decades ago. We, or they, didn't listen, and now we've got an unresolvable mess, just because some PC types wanted to be "inclusive". Fuck inclusivity - which bit of "<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case" do you fail to understand?
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 4, Touché) by c0lo on Sunday March 11 2018, @01:04PM (2 children)

        by c0lo (156) Subscriber Badge on Sunday March 11 2018, @01:04PM (#650907) Journal

        Modding me "troll" for simply stating my *entirely justifiable* opinion is cowardly.

        "Fuck everyone" at the beginning of what is supposed to be an argument that one can agree or disagree with?
        Well, fuck you.

        If you want a discussion, be civil. Show a minimum level of respect necessary to have a discussion - otherwise all I can get is "If you disagree with me, fuck you. Mind you, I'll fuck you even if you agree with me".

        There. It is spelled clear enough for you to get it?

        --
        https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
        • (Score: 1, Insightful) by khallow on Sunday March 11 2018, @03:01PM

          by khallow (3766) Subscriber Badge on Sunday March 11 2018, @03:01PM (#650929) Journal
          You know what I consider rude? Discounting [soylentnews.org] an idea and insulting those who hold it as being "delusional" without actually having thought about it. I tend to respond appropriately [soylentnews.org] to such things.

          If you want a discussion, be civil. Show a minimum level of respect necessary to have a discussion - otherwise all I can get is "If you disagree with me, fuck you. Mind you, I'll fuck you even if you agree with me".

          There. It is spelled clear enough for you to get it?

          What an interesting idea. I think it's a bit crazy for the internets though.

        • (Score: 2, Troll) by FatPhil on Sunday March 11 2018, @03:59PM

          by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Sunday March 11 2018, @03:59PM (#650939) Homepage
          "Fuck everyone" was simply shorthand for "Everyone who doesn't use the character set that DNS, and the internet in general, was designed around can go fuck themself" I apologise for making it too fucking condensed.
          --
          Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 4, Interesting) by requerdanos on Sunday March 11 2018, @02:58PM (4 children)

        by requerdanos (5997) Subscriber Badge on Sunday March 11 2018, @02:58PM (#650927) Journal

        which bit of "<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case" do you fail to understand?

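        Pues, la parte en que en la idioma española, y en otras idiomas tambien, hay mas que veintiseis letras, por ejemplo. En eso solo refiere a idiomas escritas en letras latinas; idiomas como ruso y griego tienes letras--si, son letras--que son fuera du tu idea de lo que son las letras. Hay miles de milliones de gente que no usan el alfabeto que usas. No se puede decir 'de alfa á omega' sin letras que tu no reconoces. [English: Well, the part where the Spanish language, and other languages too, have more than twenty-six letters, for example. And that only covers languages written in Latin letters; languages like Russian and Greek have letters (yes, they are letters) that fall outside your idea of what letters are. There are billions of people who do not use the alphabet you use. You cannot say 'from alpha to omega' without letters that you do not recognize.]

        ясно, что алфавит - это не то, что вы думаете. [English: Clearly, the alphabet is not what you think it is.]

        有些語言甚至不使用字母。 [English: Some languages do not even use letters.]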

        Even given the depth of your inability to define the word "letter" in a broadly useful way, however, Unicode is a very poor answer because of all the characters which are frankly visual duplicates, and many more that are inexact duplicates but duplicates nonetheless. A good answer would allow visiting something like española.com or 那些愚蠢的西方人.net or избирательные-хакеры.org with each glyph being not only unique, but mapped to a unique code point.

        Unicode isn't anything like that, and using it is a gaping security hole that enables sophisticated-seeming but dead-simple spoofing. Unicode-enabled fake domain + letsencrypt would have an undetectability factor of something like 90%. It's vulnerability by design.

        • (Score: 3, Insightful) by coolgopher on Monday March 12 2018, @12:50AM (3 children)

          by coolgopher (1157) on Monday March 12 2018, @12:50AM (#651140)

          Unicode lost its way a long time ago. Unfortunately. It's still better than having to deal with the old "code page" approach though I think. Or is it? I didn't get emojis in my code pages at least...

          • (Score: 0) by Anonymous Coward on Monday March 12 2018, @03:54AM

            by Anonymous Coward on Monday March 12 2018, @03:54AM (#651191)

            Dude. Please, stop. Some sort of warning before posting that stuff.
            Too many years of dealing with "cockup pages"

          • (Score: 0) by Anonymous Coward on Monday March 12 2018, @09:42AM (1 child)

            by Anonymous Coward on Monday March 12 2018, @09:42AM (#651268)

            Actually, I think in *this* case - DNS - the code page idea is not that far off. But instead of the overlapping code pages we used to have, have code pages map onto the unicode space, and let the user select which code pages to show as unicode letters, with the rest being shown as punycode. Of course with sensible defaults depending on the system language.

            That way, a Russian would be able to see domain names in Cyrillic, and a Chinese would be able to see domain names in Chinese, but the Russian would see Chinese domain names (which he can't read anyway) as punycode.
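
The per-user display rule described above can be sketched in a few lines. This is only an illustration under loud assumptions: the script of a character is guessed from the first word of its Unicode name (a rough heuristic, not the real Unicode Script property), and the names `script_of`, `display_label`, and `russian_user` are hypothetical.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess: the first word of the Unicode character name."""
    if ch.isascii():
        return "LATIN" if ch.isalpha() else "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def display_label(label: str, readable_scripts: set) -> str:
    """Show a label as Unicode only if the user reads every script in it."""
    if label.isascii():
        return label
    scripts = {script_of(ch) for ch in label} - {"COMMON"}
    if scripts <= readable_scripts:
        return label
    # Otherwise fall back to the ASCII-compatible form.
    return "xn--" + label.encode("punycode").decode("ascii")


# Assumed default for a Russian-language system.
russian_user = {"LATIN", "CYRILLIC"}

print(display_label("привет", russian_user))        # stays in Cyrillic
print(display_label("\u4eba\u6c11", russian_user))  # CJK label, shown as xn--...
```

One limit of this rule is that a label mixing Latin and Cyrillic letters still passes for this user, because both scripts are in the readable set.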

            • (Score: 2) by requerdanos on Monday March 12 2018, @02:38PM

              by requerdanos (5997) Subscriber Badge on Monday March 12 2018, @02:38PM (#651343) Journal

              the Russian would see Chinese domain names (which he can't read anyway) as punycode

              Believe it or not, reading more than one alphabet is really not uncommon. (Heck, I am from the linguistically and geographically ignorant United States of America, and I can read more than one alphabet--Latin, Cyrillic, some Greek--imagine how much more so someone from countries where "literate" implies at least "bilingual", and that's most of them...)

              Take your Russian friend in your example. China is his neighbor to the south. Even if he isn't fluent in Chinese, he can still pick out some Chinese here and there. He can tell the difference between the similar "人" and "入", for example.

              Which punycode kills, stone cold dead. If you start obfuscating domains with punycode, then suddenly all the Chinese domains--all the non-Latin, non-Cyrillic domains--look alike and he can't readily tell one from another, even though he could before. That makes things arguably worse, not better.

              Plus, there is no punycode for Latin characters, many of which are dead ringers for his native Cyrillic ones. "да.example" and "дa.example" might look the same, but one's all Cyrillic and the other is a mixed Cyrillic and Latin phishing site. The only way punycode would show this difference would be if he's looking at Cyrillic-alphabet sites in punycode ("xn--80ah.example" vs. "xn--a-gtb.example"), which would be insane (he would like to be able to read the address bar) and additionally, unhelpful, because neither of those is legible and so both register as "X N dash dash gibberish." So this approach makes him feel warm, fuzzy, and protected, while leaving him wide open to alphabet attacks Latin vs. Cyrillic. That makes things arguably worse, not better.

              Only knowing, recognizing, or speaking one language or alphabet is not a condition most people are in, even if you and/or most of your neighbors may be.
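
The two look-alike names from the example above can be checked with a short sketch. It assumes Python's built-in `idna` codec for the conversion, and it guesses scripts from the first word of each character's Unicode name, which is a heuristic rather than the real Unicode Script property. The helper names `to_ace` and `scripts_in` are illustrative.

```python
import unicodedata


def to_ace(domain: str) -> str:
    """Convert a domain to its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


def scripts_in(label: str) -> set:
    """Rough set of scripts in a label, from Unicode character names."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


# Escapes make the difference explicit: U+0430 is Cyrillic 'a',
# while the plain "a" in the second name is Latin U+0061.
all_cyrillic = "\u0434\u0430.example"
mixed = "\u0434a.example"

print(to_ace(all_cyrillic))              # xn--80ah.example
print(to_ace(mixed))                     # xn--a-gtb.example
print(scripts_in(all_cyrillic.split(".")[0]))  # only CYRILLIC
print(scripts_in(mixed.split(".")[0]))         # CYRILLIC and LATIN together
```

The script check flags the mixed label without forcing anyone to read the xn-- form, which is closer to what a reader of the address bar needs.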

      • (Score: 2, Troll) by realDonaldTrump on Monday March 12 2018, @09:00AM (1 child)

        by realDonaldTrump (6614) on Monday March 12 2018, @09:00AM (#651260) Homepage Journal

        We only use 26 letters for Internet, for the Web addresses in Internet, the domain names. I put big letters all the time. But Browser always changes them to small. Very difficult & confusing if DONALDJTRUMP.COM was a different site from DonaldJTrump.com! And there would be A LOT of Fake & Hoax sites!

        • (Score: 1, Troll) by realDonaldTrump on Monday March 12 2018, @04:13PM

          by realDonaldTrump (6614) on Monday March 12 2018, @04:13PM (#651390) Homepage Journal

          One of the Moderators doesn't know Internet. Doesn't know the alphabet. And doesn't want anyone else to learn. That's OK, I love poorly educated people.

      • (Score: 2, Informative) by realDonaldTrump on Monday March 12 2018, @04:25PM

        by realDonaldTrump (6614) on Monday March 12 2018, @04:25PM (#651400) Homepage Journal

        WRONG! 26 letters in the Internet alphabet. Not 52.

    • (Score: 2) by maxwell demon on Sunday March 11 2018, @12:16PM (1 child)

      by maxwell demon (1608) on Sunday March 11 2018, @12:16PM (#650897) Journal

      Well, the biggest problem with inventing your own internet is getting other people to build and use it. I think I'll finish my work on a time machine first. ;-)

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 3, Informative) by requerdanos on Monday March 12 2018, @12:32AM

        by requerdanos (5997) Subscriber Badge on Monday March 12 2018, @12:32AM (#651131) Journal

        the biggest problem with inventing your own internet is getting other people to build and use it.

        Use it, sure; adoption would be a problem until a critical mass was reached.

        But build it? Most devices capable of operating on an internetwork have the hardware (ethernet, wifi, etc.) and software (networking stack that can perform tcp/ip) built right in or easily available.

        Instead of connecting your devices to the Internet, connect them instead to your internet. The infrastructure (the links between nodes, not the routing, dns, etc.) is going to be largely the same; besides leased lines between locations, you could even tunnel links across another network (such as the Internet).

        I would hope that a person starting his or her own internetwork would start with something like ipv6 (and not, not, not ipv4) but I expect the opposite.

    • (Score: 3, Interesting) by Anonymous Coward on Sunday March 11 2018, @02:17PM (9 children)

      by Anonymous Coward on Sunday March 11 2018, @02:17PM (#650917)

      If you disagree, mod me with a "disagree", or present a counter-argument.

      Modding me "troll" for simply stating my *entirely justifiable* opinion is cowardly.

      Ok.

      Fuck everyone. ASCII everywhere, all the time. Don't like it? Invent your own internet!

      Uh... gimme a second.

      The fucking fact that you're too fucking stupid to learn a fucking foreign language shouldn't fucking doom the rest of the fucking world to keep fucking using fucking ancient, US-only fucking standards. If you don't fucking like fucking non-ASCII parts of the fucking Internet, you're more than fucking welcome to keep the fuck away from them. Fuckity fuck.

      ... There. I think that was the level of discourse you wanted. If you're willing to participate in a polite discussion instead, please say so. Thank you.

      • (Score: 2) by requerdanos on Sunday March 11 2018, @03:15PM (1 child)

        by requerdanos (5997) Subscriber Badge on Sunday March 11 2018, @03:15PM (#650931) Journal

        There. I think that was the level of discourse you wanted. If you're willing to participate in a polite discussion instead, please say so. Thank you.

        As kind as that was of you to adapt to circumstances and offer your flexibility and patience, I have never found such an effort, nor such an offer, to be properly appreciated. Know that at least one person appreciates it.

      • (Score: 2) by FatPhil on Sunday March 11 2018, @04:03PM

        by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Sunday March 11 2018, @04:03PM (#650940) Homepage
        What made you think I don't know any foreign language, or that I'm from the US?

        You're a presumptive arse - I'm simply being *practical*.
        --
        Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 3, Interesting) by HiThere on Sunday March 11 2018, @07:52PM (5 children)

        by HiThere (866) Subscriber Badge on Sunday March 11 2018, @07:52PM (#651034) Journal

        I think his, poorly stated, point was that URLs should only contain ASCII-7 characters. I'm not, however, certain. If I'm correct as to what he meant, then there are valid arguments in favor of it. E.g., it not only avoids ambiguities, it allows significant URL compression when compared to the alternatives. And ambiguous URLs are dangerous.

        That said, an alternative that answers some of the objections would be to specify a font in which there were no ambiguous URLs to be used for the display of URLs. Unfortunately, the only ones I've encountered do something like display a numeric code for many valid URL codes. Also that would negate the possibility of compression, though admittedly URLs are generally short enough that this wouldn't be very significant in most circumstances. But if you knew that the codes were ASCII-7 alphanumerics you could use a byte for each character, with one bit for parity. And there would be several unused characters that could be used for control codes. This gives almost-optimal compression.

        So there is a clear case for the requirement that URLs should contain only ASCII-7 characters, and mainly alphanumerics. And there are arguments against allowing a fuller Unicode implementation, as that would, at minimum, mean you could no longer specify parity. It would also provide techniques to allow spoofing.

        N.B.: This is not a claim that there is not a valid counter-argument, but rather that I haven't encountered one that impressed me.
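
The "one byte per character, with one bit for parity" layout mentioned above can be sketched as follows. This is a minimal illustration, assuming even parity stored in the high bit of each byte; the function names are hypothetical and nothing here is part of any URL or DNS standard.

```python
def add_even_parity(text: str) -> bytes:
    """Pack 7-bit ASCII into bytes, using bit 7 as an even-parity bit."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"not 7-bit ASCII: {ch!r}")
        parity = bin(code).count("1") % 2  # 1 if the 7 data bits have odd weight
        out.append(code | (parity << 7))
    return bytes(out)


def parity_ok(data: bytes) -> bool:
    """True if every byte has an even number of set bits."""
    return all(bin(b).count("1") % 2 == 0 for b in data)


def strip_parity(data: bytes) -> str:
    """Drop the parity bit and recover the ASCII text."""
    return bytes(b & 0x7F for b in data).decode("ascii")


encoded = add_even_parity("soylentnews.org")
print(parity_ok(encoded))      # True
print(strip_parity(encoded))   # soylentnews.org

# A single flipped bit is detected.
damaged = bytes([encoded[0] ^ 0x01]) + encoded[1:]
print(parity_ok(damaged))      # False
```

Any character above U+007F is rejected outright, which is the point being made: the spare eighth bit is only available while the alphabet stays within seven bits.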

        --
        Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 2) by FatPhil on Monday March 12 2018, @09:16AM (4 children)

          by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Monday March 12 2018, @09:16AM (#651262) Homepage
          URLs are a related issue, DNS was the matter in hand, but in general my opinions are similar.
          The internet was internetting first, if the rest of the world wants to play, it should adapt to the internet, not have the internet adapt to it.
          --
          Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
          • (Score: 2) by massa on Monday March 12 2018, @07:28PM (3 children)

            by massa (5547) on Monday March 12 2018, @07:28PM (#651484)

            You do realize the "rest of the world" internet is far bigger than the USofA internet, don't you?

            • (Score: 2) by FatPhil on Tuesday March 13 2018, @07:37AM (2 children)

              by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Tuesday March 13 2018, @07:37AM (#651718) Homepage
              Since when has "more populous" meant "better"?

              As I said initially - let them invent their own internet.
              --
              Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
              • (Score: 2) by massa on Tuesday March 20 2018, @02:21PM (1 child)

                by massa (5547) on Tuesday March 20 2018, @02:21PM (#655391)

                We did. And we even let you USofAns in :-)

                • (Score: 2) by FatPhil on Tuesday March 20 2018, @04:34PM

                  by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Tuesday March 20 2018, @04:34PM (#655465) Homepage
                  Don't taint me with that association. I'll welcome with open arms any USian who wants to get the fuck out of the shithole they were cursed to be born in, but apart from that, the US can disappear up its own septic arsehole for all I care.

                  You see there's no hypocrisy in my statement - I happily promote the American Standard Code for Information Interchange as being what the internet was built around despite not being American.
                  --
                  Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 2, Troll) by realDonaldTrump on Monday March 12 2018, @08:44AM

      by realDonaldTrump (6614) on Monday March 12 2018, @08:44AM (#651255) Homepage Journal

      So true! China, VERY WEAK country when Internet was invented. And in China they have VERY SPECIAL writing, they don't use letters, they use little pictures. And they use our numbers, those are the same. We use our letters & numbers for Internet because the Internet is American. Invented in America, belonged to America for a long time. Until Obama VERY STUPIDLY decided to turn over our Internet to foreigners.

      China is becoming very powerful. While America has become VERY WEAK. Before, the Chinese Internet used numbers, because they couldn't use their picture writing. 163.com for example, very big site in China, right? But now they want to use their picture writing for the domain names -- people don't know this, a domain name is just the name of an Internet site. Very important part of the Web address. In America we don't use the picture writing, very hard for us to do that. We try to look at the Chinese Internet, very difficult for us, maybe, probably we have to get a Chinese typewriter. But Chinese typewriters are always made in........China!!!! Money leaving our Country, JOBS leaving our Country, bigger trade deficit.

      Folks, we need to TAKE BACK OUR INTERNET. The U.S. should not turn control of the Internet over to the United Nations and the international community. America First!

  • (Score: 2) by c0lo on Sunday March 11 2018, @11:56AM (2 children)

    by c0lo (156) Subscriber Badge on Sunday March 11 2018, @11:56AM (#650893) Journal

    How did you perform your testing?

    You asking for means other than dick niggers?

    --
    https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 0) by Anonymous Coward on Monday March 12 2018, @03:59AM (1 child)

      by Anonymous Coward on Monday March 12 2018, @03:59AM (#651192)

      Psst! YOU FORGOT TO TICK THE POST ANON BOX
      Now everyone knows you are one of the ducks who posts about black people using unPC terminology

      • (Score: 2) by requerdanos on Monday March 12 2018, @02:49PM

        by requerdanos (5997) Subscriber Badge on Monday March 12 2018, @02:49PM (#651347) Journal

        posts about black people

        That's not a post about people of any particular color--that's the self-given name of a troll who made several racially charged, obscenity-laden almost-spam* troll posts here. Once the messages started to get filtered administratively, the troll used various alternate spellings and alternate characters to post the same message for a while in evasion of the filters. The admins won, the troll lost, and the episode was a learning experience similar to how the international domain name problem is also a learning experience.

        You don't need to be anonymous to know that any of this happened; it doesn't help in any way. Remembering a troll's tactics does not make you that troll.

        -----
        * I say almost-spam because the troll would often devote a few words of an otherwise invariant troll post to the topic of the article being trolled. It was odd. The posts were frankly more bizarre than offensive, despite their inflammatory language.

  • (Score: 4, Insightful) by requerdanos on Sunday March 11 2018, @03:48PM

    by requerdanos (5997) Subscriber Badge on Sunday March 11 2018, @03:48PM (#650937) Journal

    Well, comments like this one [soylentnews.org], below, could be incorporated into a test suite. Something's definitely very broken there.

    Let's imagine a site in the Russian-language world called привет.com (привет ~= "privyet" ~= "hi"). (This exists, with only a parking page at http://привет.com/.) [привет.com] [привет.com]

    What, for example, is "привет" and what does it have to do with "привет" or "privyet" or "xn--b1agh1afp"?
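
As a partial answer to how the Cyrillic name relates to the xn-- form, here is a minimal sketch, assuming Python's built-in `idna` and `punycode` codecs. It decodes the ASCII-compatible name back to Unicode, lists the code points involved, and re-encodes the label; the variable names are illustrative.

```python
ace = "xn--b1agh1afp.com"

# Decode the ASCII-compatible form back to the Unicode domain name.
name = ace.encode("ascii").decode("idna")
print(name)

# The label is six Cyrillic code points, none of them in ASCII.
label = name.split(".")[0]
print([f"U+{ord(ch):04X}" for ch in label])

# Re-encoding the label with Punycode gives the part after "xn--".
print(label.encode("punycode").decode("ascii"))  # b1agh1afp
```

The Latin transliteration "privyet" has no mechanical link to either form; only the Unicode label and the xn-- label are two encodings of the same name.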

  • (Score: 2) by driverless on Sunday March 11 2018, @10:47PM

    by driverless (4770) on Sunday March 11 2018, @10:47PM (#651098)

    In security terms it's not actually that bad: none of the major browsers is vulnerable, and the only one that is vulnerable is a minor also-ran clone of Chrome.