Stories
Slash Boxes
Comments

SoylentNews is people

posted by janrinok on Wednesday May 16 2018, @11:51PM   Printer-friendly
from the as-easy-as-that dept.

Most strings found on the Internet are encoded using a particular unicode format called UTF-8. However, not all strings of bytes are valid UTF-8. The rules as to what constitute a valid UTF-8 string are somewhat arcane. Yet it seems important to quickly validate these strings before you consume them.

In a previous post, I pointed out that it takes about 8 cycles per byte to validate them using a fast finite-state machine. After hacking code found online, I showed that using SIMD instructions, we could bring this down to about 3 cycles per input byte.

Is that the best one can do? Not even close.

Many strings are just ASCII, which is a subset of UTF-8. They are easily recognized because they use just 7 bits per byte, the remaining bit is set to zero. Yet if you check each and every byte with silly scalar code, it is going to take over a cycle per byte to verify that a string is ASCII. For much better speed, you can vectorize the problem in this manner:

Essentially, we are loading up a vector register, comparing each entry with zero and turning on a flag (using a logical OR) whenever a character outside the allowed range is found. We continue until the very end no matter what, and only then do we examine our flags.

We can use the same general idea to validate UTF-8 strings. My code is available.

If you are almost certain that most of your strings are ASCII, then it makes sense to first test whether the string is ASCII, and only then fall back on the more expensive UTF-8 test.

So we are ten times faster than a reasonable scalar implementation. I doubt this scalar implementation is as fast as it can be… but it is not naive… And my own code is not nearly optimal. It is not using AVX to say nothing of AVX-512. Furthermore, it was written in a few hours. I would not be surprised if one could double the speed using clever optimizations.

The exact results will depend on your machine and its configuration. But you can try the code.

The counter-rolling can actually be done logarithmically by shifting 1,2,4, etc., eg:

[4,0,0,0] + ([0,4,0,0]-[1,1,1,1]) = [4,3,0,0]

[4,3,0,0] + ([0,0,4,3]-[2,2,2,2]) = [4,3,2,1]

but in this case the distances didn’t seem big enough to beat the linear method.

The distances can even be larger than the register size I believe if the last value in the register is carried over to the first element of the next. It’s a good way to delineate inline variable-length encodings.


Original Submission

 
This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by wonkey_monkey on Friday May 18 2018, @08:12PM (2 children)

    by wonkey_monkey (279) on Friday May 18 2018, @08:12PM (#681354) Homepage

    You seem like someone who might have some interesting answers to a question I have:

    I've written a compiler which takes a somewhat esoteric RPN-based language and turns the output into video. It can output RGB values based on pixel coordinates and frame number, which means it can output zooming Mandelbrots [youtube.com] (not realtime speed) and things like this [youtube.com].

    It can also refer back to its last output frame, which makes it ideal for simple cellular automata. I've implemented Life (~2000fps on a 360x360 grid) and made a nice fire effect [imgur.com].

    You seem like someone who might have some good ideas of other effects I could try to implement. So, any suggestions?

    --
    systemd is Roko's Basilisk
    Starting Score:    1  point
    Karma-Bonus Modifier   +1  

    Total Score:   2  
  • (Score: 2) by MichaelDavidCrawford on Friday May 18 2018, @10:09PM

    Warp Life's was _highly_ nonlinear at first. After lots of thought and discussion I have a mostly-linear slide that uses a decaying average to determine what the actual speed of generation updates is.

    --
    Yes I Have No Bananas. [gofundme.com]
  • (Score: 2) by cafebabe on Monday May 21 2018, @07:05PM

    by cafebabe (894) on Monday May 21 2018, @07:05PM (#682324) Journal

    I recommend rendering star-fields at warp. I also recommend using a variation of Warnock's algorithm to minimize processing overhead.

    --
    1702845791×2