SoylentNews Comments | Abusing A CPU’s Adders To Optimize Bit Counting

Abusing A CPU’s Adders To Optimize Bit Counting

posted by Fnord666 on Thursday June 20 2019, @09:43AM

from the abusing-bits-for-fun-and...profit? dept.

Submitted via IRC for Bytram


If you like nitpicking around C code and generated assembly language — and we’ll admit that we do — you shouldn’t miss [Scaramanga’s] analysis of what’s known as Kernighan’s trick. If you haven’t heard of the trick, it’s a pretty efficient way of counting bits.
Like the Wheatstone bridge and a lot of other things, the Kernighan trick is misnamed. Brian Kernighan made it famous, but it was actually first published in 1960 and again in 1964 before he wrote about it in 1988. The naive way to count bits would be to scan through each bit position noting how many one bits you encounter, but the problem is that this takes one loop iteration for each bit. A 64-bit word, then, takes 64 iterations no matter what it contains. You can do slightly better by removing each bit as you test it and stopping when the word goes to zero, but that could still take 64 iterations if the last bit you test is set.
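The two naive approaches described above can be sketched roughly as follows. This is a minimal illustration, not code from the linked analysis; the function names `popcount_scan` and `popcount_shift` are made up for this example.

```c
#include <stdint.h>

/* Fixed-cost version: looks at all 64 bit positions, whatever the value. */
unsigned popcount_scan(uint64_t x)
{
    unsigned count = 0;
    for (int i = 0; i < 64; i++) {
        count += (unsigned)((x >> i) & 1u);
    }
    return count;
}

/* Early-exit version: shifts tested bits out and stops once the word is zero.
   It still needs 64 iterations when the most significant bit is set. */
unsigned popcount_shift(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        count += (unsigned)(x & 1u);
        x >>= 1;
    }
    return count;
}
```

Both return the same count; the second only saves work when the high bits of the word are clear.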
Using the trick, you make the observation that, for any nonzero X, X & (X-1) will always clear the least significant set bit of a word. Try a few examples:

X     X-1   X&(X-1)
0001  0000  0000
0010  0001  0000
0011  0010  0010
1010  1001  1000
1100  1011  1000
1111  1110  1110

You can probably see where this is going. By computing X&(X-1) you clear a bit on each loop iteration and you only have to go through the number of bits that are actually set.
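A minimal sketch of that loop in C, assuming a 64-bit unsigned word; the name `popcount_kernighan` is just a label for this example, not something taken from the linked analysis.

```c
#include <stdint.h>

/* Counts set bits by repeatedly clearing the lowest set bit.
   The loop body runs once per set bit, so a word with three
   bits set takes three iterations, wherever those bits are. */
unsigned popcount_kernighan(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the least significant set bit */
        count++;
    }
    return count;
}
```

For 1010, for instance, the loop goes 1010, 1000, 0000 and returns 2. The worst case is still 64 iterations, but only for a word with every bit set.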
[...] If you like this sort of thing, be sure to check out [Sean Anderson’s] extensive list of bit hacks. It shows several different ways to count bits and do other common and uncommon tasks with different tradeoffs. For example, you could dedicate a 256-entry lookup table and do the whole thing with one table lookup per byte, which is fast but costs memory. Always a trade-off.
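One way the 256-entry table approach might look in C is sketched below. The names `bits_in_byte`, `init_bits_in_byte` and `popcount_lut8` are invented for this example, and filling the table at run time is only one option (it could also be a precomputed constant array).

```c
#include <stdint.h>

/* One entry per possible byte value: 256 bytes of table. */
static uint8_t bits_in_byte[256];

/* Fills the table; must be called once before popcount_lut8().
   Each entry is the low bit of i plus the count for i shifted right by one. */
void init_bits_in_byte(void)
{
    bits_in_byte[0] = 0;
    for (int i = 1; i < 256; i++) {
        bits_in_byte[i] = (uint8_t)((i & 1) + bits_in_byte[i / 2]);
    }
}

/* Eight iterations for a 64-bit word: one table lookup per byte. */
unsigned popcount_lut8(uint64_t x)
{
    unsigned count = 0;
    for (int i = 0; i < 8; i++) {
        count += bits_in_byte[x & 0xFF];
        x >>= 8;
    }
    return count;
}
```

The iteration count is fixed at eight regardless of the data, so whether this beats the bit-clearing loop depends on how many bits are typically set and on whether the table stays in cache.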

Original Submission

This discussion has been archived. No new comments can be posted.


The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

LUT compromise (Score: 3, Interesting) by istartedi (123) on Thursday June 20 2019, @04:45PM (#858044)

If doing this on a byte, I might go for a shift and a nibble lookup, so you'd have a 16-entry LUT and much better memory use.
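Something like the following is one reading of that idea: split the byte into two nibbles with a mask and a shift, and look each one up. The names `bits_in_nibble` and `popcount_byte_lut4` are hypothetical.

```c
#include <stdint.h>

/* Number of set bits in each 4-bit value, 0 through 15. */
static const uint8_t bits_in_nibble[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Two lookups per byte: low nibble via mask, high nibble via shift. */
unsigned popcount_byte_lut4(uint8_t b)
{
    return (unsigned)(bits_in_nibble[b & 0x0F] + bits_in_nibble[b >> 4]);
}
```

The table is 16 bytes instead of 256, at the cost of twice as many lookups per byte.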

As always, code and test if it really matters. I used to do this myself and got some interestingly counter-intuitive results. I definitely wouldn't even consider anything that looped over all the bits if it mattered, and a lot of times it doesn't.
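A rough sketch of what "code and test" could look like for two of the candidates. Everything here is an assumption made for illustration: the helper names, the xorshift generator used for reproducible inputs, and the use of `clock()`, which measures CPU time at coarse resolution and here also includes the cost of generating the inputs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Candidate 1: clear the lowest set bit until the word is zero. */
static unsigned count_clear_lowest(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Candidate 2: 16-entry table, one lookup per nibble. */
static unsigned count_nibble_lut(uint64_t x)
{
    static const uint8_t lut[16] = {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
    };
    unsigned count = 0;
    for (int i = 0; i < 16; i++) {
        count += lut[x & 0xF];
        x >>= 4;
    }
    return count;
}

/* Xorshift generator (hypothetical helper) so every run sees the same inputs. */
static uint64_t next_input(uint64_t *state)
{
    uint64_t s = *state;
    s ^= s << 13;
    s ^= s >> 7;
    s ^= s << 17;
    *state = s;
    return s;
}

/* Runs one candidate over n inputs; returns the sum of all counts,
   which also keeps the compiler from discarding the loop. */
static uint64_t run_candidate(unsigned (*fn)(uint64_t), size_t n, double *seconds)
{
    uint64_t state = 0x9E3779B97F4A7C15ULL;
    uint64_t total = 0;
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        total += fn(next_input(&state));
    }
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return total;
}

/* Prints rough timings; returns 1 if both candidates produced the same total. */
int compare_candidates(size_t n)
{
    double t1, t2;
    uint64_t a = run_candidate(count_clear_lowest, n, &t1);
    uint64_t b = run_candidate(count_nibble_lut, n, &t2);
    printf("n=%zu clear-lowest: %.3fs nibble-lut: %.3fs\n", n, t1, t2);
    return a == b;
}
```

Calling `compare_candidates` with a few million inputs, at the optimization level you actually ship with, gives a first impression; the input distribution matters a lot, since the bit-clearing loop does less work on sparse words.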

It's been said that premature optimization is the root of all evil. Of course you generally prefer faster algorithms, but don't obsess. Make sure your code works as designed and is secure. Then profile and optimize the parts that are really holding things back if necessary.

The last thing you want is to have a perfectly polished subroutine that ends up being factored out of the final product.
