Zero-width characters are invisible, ‘non-printing’ characters that are not displayed by the majority of applications. For example, I’ve inserted 10 zero-width spaces into this sentence, can you tell? (Hint: paste the sentence into Diff Checker to see the locations of the characters!). These characters can be used to ‘fingerprint’ text for certain users.
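You can confirm that invisible characters are present by checking a string's length or searching it for the relevant code points. The JavaScript below is a minimal sketch. It assumes a hand-picked set of zero-width code points (U+200B, U+200C, U+200D, U+2060, U+FEFF), which is not an exhaustive list, and findZeroWidth and stripZeroWidth are hypothetical helper names.

```javascript
// Assumed set of zero-width code points (not exhaustive):
// U+200B ZERO WIDTH SPACE, U+200C ZERO WIDTH NON-JOINER, U+200D ZERO WIDTH JOINER,
// U+2060 WORD JOINER, U+FEFF ZERO WIDTH NO-BREAK SPACE.
// A function returns a fresh global regex each call, so no lastIndex state is shared.
const zeroWidthPattern = () => /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Return the UTF-16 index of every zero-width character in the text.
function findZeroWidth(text) {
  return [...text.matchAll(zeroWidthPattern())].map((m) => m.index);
}

// Remove every zero-width character; the visible text is unchanged.
function stripZeroWidth(text) {
  return text.replace(zeroWidthPattern(), "");
}

const sample = "F\u200Bor exam\u200Bple";
console.log(sample.length);          // 13, although only 11 characters are visible
console.log(findZeroWidth(sample));  // [ 1, 9 ]
console.log(stripZeroWidth(sample)); // For example
```

Stripping U+200D also breaks emoji sequences and scripts that legitimately rely on joiners, so a real filter may need to be more selective.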
The original reason I used them isn’t too exciting. A few years ago I was a member of a team that competed in tournaments across a variety of video games. The team had a private message board used, amongst other things, to post important announcements. Eventually these announcements began appearing elsewhere on the web, posted to mock the team and, more significantly, making the message board useless for sharing confidential information and tactics.
The security of the site seemed pretty tight, so the theory was that a logged-in user was simply copying the announcements and posting them elsewhere. I created a script that allowed the team to invisibly fingerprint each announcement with the username of the user it was displayed to.
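The exact encoding isn’t given here, so the following is only a sketch under an assumed scheme. Each character of the username is written in binary. A 1 bit becomes U+200B (zero-width space), a 0 bit becomes U+200C (zero-width non-joiner), and characters are separated by U+200D (zero-width joiner). The names encodeUsername, decodeUsername and fingerprint are hypothetical. Placing the mark after the first visible character is an arbitrary choice.

```javascript
// Assumed mapping for this sketch (not necessarily the original script's):
const ZERO = "\u200C"; // bit 0 -> ZERO WIDTH NON-JOINER
const ONE = "\u200B";  // bit 1 -> ZERO WIDTH SPACE
const SEP = "\u200D";  // separator between characters -> ZERO WIDTH JOINER

// Turn a username into an invisible run of zero-width characters.
function encodeUsername(username) {
  return [...username]
    .map((ch) => ch.codePointAt(0).toString(2)) // e.g. "a" -> "1100001"
    .map((bits) => [...bits].map((b) => (b === "1" ? ONE : ZERO)).join(""))
    .join(SEP);
}

// Recover the username from text that contains an encoded mark, or null if none.
function decodeUsername(text) {
  const hidden = [...text].filter((ch) => ch === ZERO || ch === ONE || ch === SEP).join("");
  if (hidden === "") return null;
  return hidden
    .split(SEP)
    .map((group) => [...group].map((ch) => (ch === ONE ? "1" : "0")).join(""))
    .map((bits) => String.fromCodePoint(parseInt(bits, 2)))
    .join("");
}

// Hypothetical helper: hide the mark right after the first visible character.
function fingerprint(announcement, username) {
  const chars = [...announcement];
  if (chars.length === 0) return encodeUsername(username);
  return chars[0] + encodeUsername(username) + chars.slice(1).join("");
}

const marked = fingerprint("Practice is at 8pm tonight.", "alice");
console.log(marked === "Practice is at 8pm tonight."); // false, yet both render identically
console.log(decodeUsername(marked));                  // alice
```

On a message board, fingerprint would run whenever a page is rendered for a logged-in user, so a copied announcement carries the viewer’s username. The mark only survives if the destination keeps zero-width characters. Stripping them, or retyping the text, removes it.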
I saw a lot of interest in zero-width characters following a recent post by Zach Aysan, so I thought I’d publish this method here, along with an interactive demo, to share with everyone. The code examples have been updated to use modern JavaScript, but the overall logic is the same.
(Score: 3, Interesting) by maxwell demon on Friday April 06 2018, @06:55AM
Actually, you can just look at it with the less pager. Then you even get the Unicode code points in a readable form:
(Note that the Unicode code points are shown in reverse video, so you can distinguish them from an ASCII character sequence of the same form.)
The Tao of math: The numbers you can count are not the real numbers.
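Here is a rough JavaScript approximation of the view maxwell demon describes. Every character in Unicode’s Cf (format) category is replaced with a visible <U+XXXX> marker. The Cf category includes the zero-width space, non-joiner and joiner. This is an illustration, not less’s own implementation, and a plain string cannot reproduce the reverse-video highlighting. revealFormatChars is a hypothetical name, and the \p{Cf} property escape requires an ES2018-capable engine.

```javascript
// Replace each Unicode "format" character (general category Cf, which includes
// U+200B..U+200D, U+2060 and U+FEFF) with a visible <U+XXXX> marker.
function revealFormatChars(text) {
  return text.replace(/\p{Cf}/gu, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return `<U+${hex}>`;
  });
}

console.log(revealFormatChars("F\u200Bor exam\u200Bple"));
// F<U+200B>or exam<U+200B>ple
```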
(Score: 3, Interesting) by FatPhil on Friday April 06 2018, @08:00AM
At least less implements the escaping correctly and in a locale-aware way; when you unset LANG, you get:
F<E2><80><8B>or exam<E2><80><8B>ple, I<E2><80><99>ve ins<E2><80><8B>erted 10 ze<E2><80><8B>ro-width spa<E2><80><8B>ces in<E2><80><8B>to thi<E2><80><8B>s sentence, c<E2><80><8B>an you tel<E2><80><8B><E2><80><8B>l?
Which turns Unicode into moar garbage, which I think is fitting.
Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
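The byte escapes in FatPhil’s output are the UTF-8 encoding of each character. The hypothetical helper escapeNonAsciiBytes reproduces that view. It encodes the text with TextEncoder and prints every byte of 0x80 or above as <XX>. It illustrates the idea and is not taken from less, which also escapes ASCII control characters differently.

```javascript
// Encode text as UTF-8 and show every non-ASCII byte (0x80 and above) as <XX>.
function escapeNonAsciiBytes(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    out += b < 0x80
      ? String.fromCharCode(b) // plain ASCII byte, printed as-is
      : `<${b.toString(16).toUpperCase().padStart(2, "0")}>`;
  }
  return out;
}

console.log(escapeNonAsciiBytes("F\u200Bor, I\u2019ve"));
// F<E2><80><8B>or, I<E2><80><99>ve
```

E2 80 8B is the UTF-8 encoding of U+200B, so each zero-width space becomes a three-byte run. E2 80 99 is U+2019, the curly apostrophe in “I’ve”.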