posted by hubie on Friday June 07, @01:04AM   Printer-friendly
from the programming-ain't-for-man-children dept.

Shuffling a set means randomly choosing an ordered sequence of its elements.

For example, shuffling {A,B,C} means choosing with equal probability one of 3! = 3 × 2 × 1 = 6 permutations: ABC, ACB, BAC, BCA, CAB, or CBA.

Easy-peasy, no?

Which programming problems did you encounter which looked easy, but were really a front for a Gordian Knot of subtle details -- and their consequences?


Original Submission

This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Informative) by krishnoid on Friday June 07, @01:16AM (4 children)

    by krishnoid (1156) on Friday June 07, @01:16AM (#1359629)

    True-random numbers are cheap and plentiful on many computers today. Many CPUs support the unprivileged RDSEED instruction, which extracts entropy from an on-silicon, crypto-grade, NIST-compliant, non-deterministic thermal noise source. On Linux, check /proc/cpuinfo to see if your CPU supports it.

    And before on-chip thermal randomness, there was lavarand [cloudflare.com].
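      The /proc/cpuinfo check mentioned above is a one-liner; a minimal sketch (the helper name is mine, and this is Linux-only -- on other platforms it simply reports False):

      ```python
      def cpu_has_rdseed(cpuinfo="/proc/cpuinfo"):
          # RDSEED support shows up as a flag in each processor's "flags:" line
          try:
              with open(cpuinfo) as f:
                  return "rdseed" in f.read()
          except OSError:
              # no /proc/cpuinfo (non-Linux system) -- assume unsupported
              return False
      ```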

    • (Score: 4, Funny) by ls671 on Friday June 07, @02:01AM

      by ls671 (891) on Friday June 07, @02:01AM (#1359635) Homepage

      Are those in the picture you posted butt plugs? You pervert!

      --
      Everything I write is lies, including this sentence.
    • (Score: 1, Interesting) by Anonymous Coward on Friday June 07, @07:20AM (1 child)

      by Anonymous Coward on Friday June 07, @07:20AM (#1359674)

      The advent of pointer-based inputs was a godsend in this regard also. The least significant bits of the mouse position as you move it are a good source of entropy, just not a lot of it. I think there was also some kind of virtual random device on Linux that was using that. Perhaps /dev/random is still there but I haven't thought about it in... longer than I care to admit... because who wants to be reminded they're old?

    • (Score: 5, Interesting) by PhilSalkie on Friday June 07, @11:12PM

      by PhilSalkie (3571) Subscriber Badge on Friday June 07, @11:12PM (#1359747)

      I know someone at a university in New Jersey who set up a physical randomness generator using a flowing water stream and various light sources and photodetectors to generate an output of ones and zeros from the perturbations along the side of the downflowing water.

      His attempts to verify that the generator was truly nulled (didn't generate more 1 bits than 0 bits, didn't make any patterns over short or long terms) found it was a rather sensitive seismometer - it detected footsteps throughout the building, doors opening and closing, and a longer term periodic signal (like a minute or so) which varied its time base over even longer periods (hours and days and months) but was never exactly rhythmic. (They had the detector mounted on a heavy base, but apparently nowhere near heavy enough.)

      Quite a bit of sleuthing later, they decided that it was most likely detecting the impact of waves on the Jersey shore, with the frequency and deviation amplitude varying with the tides, weather, and seasons.

  • (Score: 0) by Anonymous Coward on Friday June 07, @01:17AM (5 children)

    by Anonymous Coward on Friday June 07, @01:17AM (#1359630)
    AD authentication using LDAP.

    Somehow the same binary works on earlier Windows but not on more recent versions - the binary still runs but it gets a COM error - wrong username or password (even though it's the correct one and works on older Windows).

    Consequences - if I can't figure out what is wrong I might have to stop supporting this feature on newer Windows. Maybe if I figured out how to MITM the TLS using Wireshark I might get more clues.
    • (Score: 2) by ls671 on Friday June 07, @02:03AM (1 child)

      by ls671 (891) on Friday June 07, @02:03AM (#1359638) Homepage

      Install TLS cert in wireshark!

      --
      Everything I write is lies, including this sentence.
      • (Score: 0) by Anonymous Coward on Friday June 07, @06:57AM

        by Anonymous Coward on Friday June 07, @06:57AM (#1359673)
        I might have to get the private key for the TLS cert though. I don't have that at the moment...
    • (Score: 2) by krishnoid on Friday June 07, @02:29AM (2 children)

      by krishnoid (1156) on Friday June 07, @02:29AM (#1359642)

      You could look into how Samba does it [samba.org]. It can act as a full Active Directory server, so they might have some info on how to interoperate with LDAP or act as a proxy.

      • (Score: 1, Informative) by Anonymous Coward on Friday June 07, @06:55AM (1 child)

        by Anonymous Coward on Friday June 07, @06:55AM (#1359672)

        They might work like Windows Server 2008 R2:

        Samba operates at the forest functional level of Windows Server 2008 R2

        But my stuff works fine on Windows Server 2008 R2. Just doesn't seem to work on Windows Server 2022.

        So maybe the newer Windows do certain things differently?

        • (Score: 1, Redundant) by Ox0000 on Friday June 07, @01:34PM

          by Ox0000 (5111) on Friday June 07, @01:34PM (#1359698)

          WS2k8R2 was released 14 years ago (July 2009), WS2k22 was released 2 years ago (August 2021). That's a 12 year span in which you expect a piece of kit that does authentication to not have changed at all. On top of that, WS2k8R2 stopped being supported in 2015 for free with paid support ceasing in 2020, well before the release of WS2k22.

          sources:
          - https://en.wikipedia.org/wiki/Windows_Server_2008_R2#Support_lifecycle [wikipedia.org]
          - https://en.wikipedia.org/wiki/Windows_Server_2022 [wikipedia.org]

          Reread what you're asking for, please, this time more carefully...

          I know the whole mentality of "if it ain't broken, don't fix it" and "why do you keep changing the software so often", but this is Windows we're talking about, so it kinda is broken and it has bugs. So there's a need to change the software... because of crappy QA by the free testers in their "insiders ring".

  • (Score: 5, Insightful) by PhilSalkie on Friday June 07, @01:41AM (1 child)

    by PhilSalkie (3571) Subscriber Badge on Friday June 07, @01:41AM (#1359633)

    I've lost count of the number of systems that I've had to parachute in and fix or that have s**t the bed because they have a tick counter that tracks milliseconds from boot (that's 49.7 days in a uint32) or fifths-of-a-second since some time in 1990 or whatever.

    There are well-known methods of dealing with this (don't compare tick counts, always subtract them and look at the result - so long as your longest possible time interval is less than half of the wrap-around duration) but it's mind-boggling the number of programmers who (just like Y2K - I was receiving Y2K-broken commercial software in March 1999) seem to think that their epoch counter will mystically go on forever and wrap-around won't be a massive failure mode IN UNDER TWO MONTHS. It's not just a rookie mistake, either - see "Boeing 787" for a high-value real-world example.

    This can affect time delay routines, display of time and date, sleep/wake timing, debouncing, and on and on - for things like delays, the failure might show up randomly like a ghost because there's a limited time window around the wrap-point that the effect will occur, so there's no problem, right? Until someone presses the button at just the right moment...

    The overall cost of defining the tick counter as a uint64 is generally so small, even on old 16 bit embedded CPUs, that it boggles the mind how many glitches are out there waiting to be triggered because of a dumb choice to save a couple of bytes.
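    The subtract-don't-compare rule is easy to demonstrate; a sketch with a hypothetical 32-bit millisecond tick counter (names are mine):

    ```python
    MASK32 = 0xFFFFFFFF  # a uint32 tick counter wraps after 2**32 ms, about 49.7 days

    def ticks_elapsed(start, now):
        # subtract and mask instead of comparing; modular arithmetic gives the
        # right answer as long as the true interval is less than 2**32 ticks
        return (now - start) & MASK32

    # an interval of 0x20 ticks that straddles the wrap-around point
    print(ticks_elapsed(0xFFFFFFF0, 0x00000010))  # 32, even though now < start numerically
    ```

    A naive `if now > start` comparison fails at exactly the wrap point, which is why the bug shows up like a ghost once every 49.7 days.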

    • (Score: 1, Funny) by Anonymous Coward on Friday June 07, @10:13AM

      by Anonymous Coward on Friday June 07, @10:13AM (#1359681)

      Tick counters

      I've lost count ...

      But you haven't lost the ticks, right?

  • (Score: 3, Insightful) by PhilSalkie on Friday June 07, @01:56AM (1 child)

    by PhilSalkie (3571) Subscriber Badge on Friday June 07, @01:56AM (#1359634)

    The thing about shuffling a deck of cards (or a list of a million items) is that you'd like an algorithm that does it in a reasonable time frame.

    It's fine to say "I have a crypto-quality RNG" - but what are you going to DO with that? Easy - just generate a number from 1-52, then grab that card from the source list and put it first in the shuffled list. Rinse, repeat - but what about when you hit a spot with no card? If you just walk down the list until you find a card, maybe you've messed up the randomness. Do you just generate another random (will take unknown time as the source list gets sparser and sparser), or re-pack the list so there are no empties (costly like a bubble sort as you're re-arranging cards in memory all the time)?

    A not terrible answer is to do a software riffle-shuffle - split the source list in half, then randomly take a card from one half or the other, walking your way down each list until you've moved all the cards to the shuffled list. Then repeat in the other direction - do that a few times and you've got a decently shuffled deck. (You can use more than two parts, of course - split the pack into quarters if you like, or Nths if it's a big big list.)

    A line we use in the controls industry is "Just because it works doesn't mean it's right" - there's lots of terrible ways to accomplish a task, you can often find someone out there who's analyzed the problem to death and back, use that expertise. Nothing wrong with standing on the shoulders of giants when you can.

    • (Score: 4, Informative) by Anonymous Coward on Friday June 07, @03:43AM

      by Anonymous Coward on Friday June 07, @03:43AM (#1359652)

      It's fine to say "I have a crypto-quality RNG" - but what are you going to DO with that? Easy - just generate a number from 1-52, then grab that card from the source list and put it first in the shuffled list. Rinse, repeat - but what about when you hit a spot with no card? If you just walk down the list until you find a card, maybe you've messed up the randomness. Do you just generate another random (will take unknown time as the source list gets sparser and sparser), or re-pack the list so there are no empties (costly like a bubble sort as you're re-arranging cards in memory all the time)?

      A good way to do it is roughly described in the article and does not involve any "repacking". The method is often known as the "Knuth Shuffle" because of its description in The Art of Computer Programming although he was not the first to write about it. The article credits the method to Durstenfeld but he was also not the first to write about it (although he might have been the first to write about it in the context of computer programs).

      Basically:
      1. generate a number x (uniformly at random) between 0 and 51 (inclusive). If x is not 0, swap the cards in position 0 and position x.
      2. generate a number x (uniformly at random) between 1 and 51 (inclusive). If x is not 1, swap the cards in position 1 and position x.
      ...
      51. generate a number x (uniformly at random) between 50 and 51 (inclusive). If x is not 50, swap the cards in position 50 and position 51.

      Now the deck is shuffled. If all the selections had equal probability of picking any particular value from their respective ranges, then all possible permutations of cards have equal probability of occurring. The article then discusses how the challenge now is to actually do that.
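      The numbered steps above are only a few lines of code; a sketch (function name mine), using Python's random module for illustration rather than a crypto-grade source:

      ```python
      import random

      def knuth_shuffle(cards, rng=random):
          # Fisher-Yates / Durstenfeld: step i swaps position i with a
          # uniformly chosen position in [i, n-1]
          deck = list(cards)
          n = len(deck)
          for i in range(n - 1):
              j = rng.randrange(i, n)
              deck[i], deck[j] = deck[j], deck[i]
          return deck

      deck = knuth_shuffle(range(52))  # a permutation of 0..51
      ```

      Note the loop stops at n − 1, exactly as in the steps above: the last card has nowhere left to go.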

      Usually we can start with a generator that produces random values uniformly between 0 and 2^n − 1 (inclusive), for some fixed n (basically all generators worth using on a binary computer have this property). We can assume n is big enough for the purpose (if it is not, then we can construct a wider generator by just concatenating the output of multiple cranks of the original -- the result will still be uniformly distributed).

      Then it is simply a matter of defining a function which takes values in the interval [0, 2^n − 1] and returns a value in the range [i, 51], for i = 0 through 50. Usually you'll find people writing something like x % (52-i) + i, but as the article describes, this is biased (i is more likely to be output than 51) except when (52-i) is an exact power of 2. The only way to fully eliminate the bias is by rejecting one or more input values and rerolling the generator in some cases.

      It is the same basic problem as dividing, say, 8 candies amongst 5 children: Either one child is getting more candy than another or you eat three yourself and give them one each. Eating the candy corresponds to rerolling the generator and so in that case we'd expect to reroll the RNG 3 in every 8 tries, a bit less than half the time.

      But we can make the problem irrelevant because we can make arbitrarily large numbers of candies. Dividing, say, 4,194,304 candies amongst 5 children doesn't change the fundamental problem, but in terms of fairness it probably doesn't matter much if one child got only 838,860 candies while the rest got 838,861. Or if it truly does matter, then you only need to eat four to give them all equal amounts, corresponding to rerolling the RNG about 1 in every million tries, approximately never.
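      The reroll-the-tail idea translates directly to code; a sketch (the helper name is mine) where rejecting a raw value plays the role of eating the extra candies:

      ```python
      import random

      def uniform_below(n, bits=32, rand=random.getrandbits):
          # values in [limit, 2**bits) form the biased tail ("the extra candies"):
          # reject and reroll them, after which x % n is exactly uniform on [0, n)
          span = 1 << bits
          limit = span - (span % n)
          while True:
              x = rand(bits)
              if x < limit:
                  return x % n
      ```

      With bits=32 and n=52 the rejected tail is a vanishingly small fraction of the range, so rerolls are rare, just as with the millions of candies.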

  • (Score: 3, Interesting) by UncleBen on Friday June 07, @02:02AM (1 child)

    by UncleBen (8563) on Friday June 07, @02:02AM (#1359637)

    It just kept getting more complicated the more I tried to clean up. Turns out there are a lot more ways to misspell or abbreviate “university” than simple combinatorial math would indicate.

    • (Score: 2) by ls671 on Friday June 07, @02:14AM

      by ls671 (891) on Friday June 07, @02:14AM (#1359639) Homepage

      I'd assume only the ones matching another existing word would be hard to clean.

      --
      Everything I write is lies, including this sentence.
  • (Score: 4, Insightful) by dwilson98052 on Friday June 07, @02:17AM (4 children)

    by dwilson98052 (17613) on Friday June 07, @02:17AM (#1359640)

    ...but people like to make it sound a lot harder than it is by showing you the dumbest possible examples.

    • (Score: 3, Informative) by crm114 on Friday June 07, @02:46AM (2 children)

      by crm114 (8238) Subscriber Badge on Friday June 07, @02:46AM (#1359647)

      I was thinking the same.... I went over the hardest problems I had to solve - and ... when looking back it came to 1) my own stupidity, or 2) lack of knowledge.

      In a few cases I was allowed to go fix my own stupidity with increased knowledge - with vastly better results.

      • (Score: 4, Interesting) by dwilson98052 on Friday June 07, @03:29AM (1 child)

        by dwilson98052 (17613) on Friday June 07, @03:29AM (#1359650)

        Funny how that works.... each and every time I have to update any of my older scripts I end up rewriting huge sections to streamline things with what I've learned over the years.

        I have one script that used to ingest a single file type and generate a plain text report in about a minute.... over the years I've added several more file types, all formatted differently but I've managed to get the report time down to an average of under 2 seconds with graphs despite the fact that I'm processing about 10x the data as I originally did.

        Most of the improvements came from not writing the files to disk unless absolutely necessary, running individual sections of the report in parallel, and converting all the data types and importing them into sqlite.

        But I also took a close look at some things like less than optimal regexs and ran a bunch of benchmarks on things like sed/awk/grep/printf/echo to find the optimal way of dealing with the data. Sometimes I'd save 100+ms, other times it was less than 1ms on average... the other folks on the team think I'm obsessive and crazy but the end users love how fast it is.

        • (Score: 4, Interesting) by PhilSalkie on Friday June 07, @11:20PM

          by PhilSalkie (3571) Subscriber Badge on Friday June 07, @11:20PM (#1359749)

          I once spent quite a bit of time trying to speed up some software (written in perl, of all things). Lots of obvious places to improve, but didn't get the results I'd hoped for. Eventually I installed a profiler to see what the heck I was missing, and found that the program was spending 100-epsilon% of its time in a .CSV file parsing module. Since I was generating the .CSV files, I didn't need the full functionality of the parsing module, which could handle every conceivable edge case ("OK, now what if you have a quoted string that contains commas, spaces, null characters, unicode, and two kinds of quote marks?"), so I switched to a more basic-functionality parsing module. OMG, the difference in the software was astonishing, and I never ever would have guessed that was where all the CPU time was going.

    • (Score: 4, Insightful) by aafcac on Friday June 07, @04:30AM

      by aafcac (17646) on Friday June 07, @04:30AM (#1359656)

      The issue tends to be a lack of clarity about what the problem is and a lack of understanding of the operating conditions. If you haven't got those things, then you're either going to get a simple solution that outright fails or you're never actually going to be done because there's a ton of stuff that didn't occur early enough to be properly dealt with.

  • (Score: 5, Interesting) by DrkShadow on Friday June 07, @03:21AM (1 child)

    by DrkShadow (1404) on Friday June 07, @03:21AM (#1359649)

    A filesystem seems easy. Directories and files. Easy!

    - what is the parent inode of the mountpoint root?
    - is the given "thing", if you haven't recorded whether it's a directory or not, a leaf object or does it have descendents?
    - How can you best get descendents, if you're storing things in a database? (full paths, for each file? that's horribly space inefficient)
    - What happens when you cross into a sub-mount, you have to check and ensure that you're referencing a parent on the same device

    and etc. Filesystems are terribly mucky, when what you're trying to do is maintain a list of objects on them, without implementing a whole filesystem. Handling multiple links, handling device roots, things mounted within a filesystem, how to get the parent (just store the parent), how to get things at a given file path (well, split on slashes and..), how to represent this in databases (apparently there's a WITH RECURSIVE), how to check for the base-case in your recursive SQL query (selecting from a recursive table gives you all intermediate rows as well), and on and on and on.

    Seems simple.

    • (Score: 4, Informative) by krishnoid on Friday June 07, @03:36AM

      by krishnoid (1156) on Friday June 07, @03:36AM (#1359651)

      Everyone who writes a filesystem in FUSE [github.com] gets a chance to consider these problems and review existing implementations, so while it may not make things easy, it does provide references to consult.

      You can also do some interesting things [sqlite.org] using WITH RECURSIVE -- again, with some reference documentation but it is definitely tough to wrap your mind around.
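      As a small illustration of WITH RECURSIVE on the store-the-parent layout discussed above (the schema and names here are mine, not from the linked docs):

      ```python
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE fs (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
          INSERT INTO fs VALUES (1, NULL, ''),       -- the root directory
                                (2, 1, 'home'),
                                (3, 2, 'alice'),
                                (4, 3, 'notes.txt');
      """)

      # rebuild a full path by walking down from the root, one level per recursion
      full_path = con.execute("""
          WITH RECURSIVE path(id, p) AS (
              SELECT id, name FROM fs WHERE parent IS NULL
              UNION ALL
              SELECT fs.id, path.p || '/' || fs.name
              FROM fs JOIN path ON fs.parent = path.id
          )
          SELECT p FROM path WHERE id = 4
      """).fetchone()[0]
      print(full_path)  # /home/alice/notes.txt
      ```

      Storing only the parent id keeps the table compact; the recursive query pays the cost of reassembling full paths only when you actually need them.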

  • (Score: 3, Interesting) by owl on Friday June 07, @03:43AM

    by owl (15206) on Friday June 07, @03:43AM (#1359653)

    When reading the summary, my first thought was "yes", just use a Fisher-Yates shuffle and a good randomness source.

    Then I started reading through the article, and indeed, that is what the article is also suggesting (although the article is referencing Durstenfeld's in place variant of the Fisher-Yates shuffle).

  • (Score: 4, Interesting) by krishnoid on Friday June 07, @03:51AM (4 children)

    by krishnoid (1156) on Friday June 07, @03:51AM (#1359654)

    Say you want to create database tables that describe all buildings on your corporate campus. If you're a developer and don't ask anyone, you can just make something up. Bring actual stakeholders in, and:

    • Facilities: power, water, electrical, gas requirements, addresses
    • Finance: depreciation, property taxes, square footage
    • IT: Wi-Fi router locations, network cable routing and closets
    • HR: breakrooms, restrooms, occupancy, other compliance issues
    • Management: conference room locations, number of hardwall offices

    I'm sure I'm missing other things. I knew someone hired into IT who was called in to review their international site-to-site networking, and in one of his first meetings with the heads of various departments, he asked for the list of all the buildings owned/occupied by the company.

    They chuckled at each other, then told him that nobody had a complete list of their buildings. He had to put together a master list by emailing employees and asking them what building they were in and its street address. I wouldn't be surprised if this happens at a lot of companies.

    • (Score: 3, Insightful) by maxwell demon on Friday June 07, @04:12AM (1 child)

      by maxwell demon (1608) on Friday June 07, @04:12AM (#1359655) Journal

      That's no programming problem, though. If you were given all the relevant information up front, writing the database would be easy.

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 3, Insightful) by krishnoid on Friday June 07, @04:34AM

        by krishnoid (1156) on Friday June 07, @04:34AM (#1359658)

        Still could be a Gordian knot of subtle problems, though. Like representing international addresses, phone numbers, currencies, validation, maybe even international legal classifications of buildings or leases.

    • (Score: 4, Insightful) by PhilSalkie on Friday June 07, @11:25PM

      by PhilSalkie (3571) Subscriber Badge on Friday June 07, @11:25PM (#1359750)

      I had a professor once who advised us to first look at the blank paper forms that we'd be asked to create, then insist on getting the hand-filled-out forms from the users.

      What we were looking for were all the blank fields which never got anything written in them, and all the marginal and inter-line notations that were added because the printed forms weren't sufficient.

      Then (back in the days of printouts generated by the MIS department) we should find the janitor to ask them which reports got thrown out each day without ever being opened, and stake out the photocopiers to find out what reports were getting 100 copies made each night because not enough originals were printed.

      The difference between "What the customer thinks they need" and "What the customer actually needs" can often be extremely large.

    • (Score: 3, Insightful) by evilcam on Thursday June 13, @11:08AM

      by evilcam (3239) Subscriber Badge on Thursday June 13, @11:08AM (#1360343)

      Once upon a time, working for a State Government agency, trying to rationalise their CMDB, I discovered 55 separate entries for the location of users in a 5-floor building in the CBD.
      The different permutations of Street/st, Level/Lvl/L, whether a suburb and postcode were included, a country... Best part was it was the Department of Housing's head office.

  • (Score: 2, Interesting) by Anonymous Coward on Friday June 07, @05:20AM (1 child)

    by Anonymous Coward on Friday June 07, @05:20AM (#1359665)

    Generating valid random floating point numbers is a lot more difficult than it looks up front. Parsing and printing numbers in arbitrary bases is similar - the naive implementation is relatively straightforward but the waters get deep when you want to do things efficiently or in quantity. I've also thought that it was interesting that there's only really one algorithm for shuffling, whereas sorting has too many algorithms to count. Printing properly formatted and justified tables is a fun problem with a lot of edge cases.

    • (Score: 3, Insightful) by PhilSalkie on Friday June 07, @12:53PM

      by PhilSalkie (3571) Subscriber Badge on Friday June 07, @12:53PM (#1359693)

      I hate it when developers default to quicksort (which is not an order-preserving sort) in situations where the ability to perform multiple consecutive sorts would be useful and intuitive. Order-preserving sorts have gotten the reputation of being inefficient, so they don't get used even when their inefficiency doesn't matter - for instance, in a grocery list app. I'd much rather that app used bubble sort, so I could sort by name, then sort by aisle, and wind up with a list of aisles with what's in each aisle listed alphabetically.
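      For what it's worth, Python's built-in sort (Timsort) is stable, so the sort-by-name-then-by-aisle workflow works there without reaching for bubble sort; a sketch with a made-up grocery list:

      ```python
      # hypothetical grocery list as (name, aisle) pairs
      items = [("milk", 4), ("apples", 1), ("bread", 4), ("candy", 1)]

      by_name = sorted(items, key=lambda t: t[0])
      # a second, stable sort by aisle keeps the alphabetical order within each aisle
      by_aisle = sorted(by_name, key=lambda t: t[1])
      print(by_aisle)  # [('apples', 1), ('candy', 1), ('bread', 4), ('milk', 4)]
      ```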

  • (Score: 3, Insightful) by day of the dalek on Friday June 07, @05:55AM

    by day of the dalek (45994) on Friday June 07, @05:55AM (#1359667) Journal

    The VIC-II [wikipedia.org] chip did hardware-based collision detection, which made things easy. It wouldn't surprise me if there are other hardware implementations of collision detection, too.

    Software-based collision detection and resolution seems simple, but doing it well and doing it efficiently aren't as easy as it seems. I actually posted a journal [soylentnews.org] asking about this topic from my previous account.

    The separating axis theorem is used to detect collisions between two convex polygons. It's not complicated, just some math involving vectors. If you have two polygons, move them around, and detect collisions at discrete timesteps, it's rather simple. If you want more accurate and realistic collisions between objects, you can use continuous collision detection. But that's considerably more complex and computationally intensive. It's also challenging to do efficient collision detection if you have a lot of objects in the world and want to detect if any of them are colliding. If you want to test every object in the world against all the other objects, that's O(n²) complexity. There are ways to avoid testing every object against every other object in the world, and quadtrees are one way to do this.

    It seems simple at first to detect and resolve collisions, just a bit of vector math. But it quickly becomes more complicated if you want to increase the accuracy of the collision detection and do it efficiently with a lot of objects.
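    The discrete-timestep separating axis test really is just a little vector math; a sketch for convex polygons given as vertex lists (function names are mine):

    ```python
    def edge_normals(poly):
        # one perpendicular per edge; the length doesn't matter for this test
        n = len(poly)
        for i in range(n):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
            yield (y1 - y2, x2 - x1)

    def sat_collide(p, q):
        # if any edge normal of either polygon cleanly separates the two
        # projections, the convex polygons do not intersect
        for ax, ay in list(edge_normals(p)) + list(edge_normals(q)):
            proj_p = [ax * x + ay * y for x, y in p]
            proj_q = [ax * x + ay * y for x, y in q]
            if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
                return False
        return True

    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    print(sat_collide(square, [(1, 1), (3, 1), (3, 3), (1, 3)]))  # True (overlap)
    print(sat_collide(square, [(5, 5), (6, 5), (6, 6), (5, 6)]))  # False (separated)
    ```

    All the complexity mentioned above (continuous detection, broad-phase structures like quadtrees) sits on top of this small core.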

  • (Score: 1) by shrewdsheep on Friday June 07, @09:44AM (7 children)

    by shrewdsheep (5215) on Friday June 07, @09:44AM (#1359678)

    The problem becomes trivial if you think about it the right way. If there are N permutations you want to draw uniformly from, you draw a number uniformly between 1 and N and convert it to a permutation. The latter step is left as an exercise to the reader. If the number of permutations becomes large, you may have to move to BigInts.
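    For the record, one standard way to do that "exercise" is the factorial number system (the Lehmer code); a sketch (function name mine, using a 0-based rank):

    ```python
    from math import factorial

    def unrank_permutation(x, items):
        # decode x in [0, n!-1] into a unique permutation: each factorial-base
        # digit of x selects the next element from the shrinking pool
        pool = list(items)
        out = []
        for k in range(len(pool), 0, -1):
            i, x = divmod(x, factorial(k - 1))
            out.append(pool.pop(i))
        return out

    print(unrank_permutation(0, "ABC"))  # ['A', 'B', 'C']
    print(unrank_permutation(5, "ABC"))  # ['C', 'B', 'A']
    ```

    Ranks 0 through n! − 1 map to the n! permutations one-to-one, so a uniform rank gives a uniform permutation.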

    • (Score: 2, Insightful) by khallow on Friday June 07, @12:41PM (5 children)

      by khallow (3766) Subscriber Badge on Friday June 07, @12:41PM (#1359692) Journal
      Drawing uniformly from [1, ..., N] where N is enormous is a tricky problem in itself. For the exercise left to the reader, one way would be to take x as your number selected via the uniform distribution on N, and k = 2, ..., m where N = m! is "large". Then perform the following operations:

      x_1 = x, s_k = x_{k-1} mod k, x_k = (x_{k-1} - s_k)/k

      For example, if I pick 81 at random from 120:

      81 % 2 = 1
      40 % 3 = 1
      13 % 4 = 1
      3 % 5 = 3

      If I use those numbers as guidance to pick from the list of k items (with 0 being in first place), I'm picking the second item at each of the first three steps and the fourth item at the last. Stop after k=m. I think there are common sorts of bias in random number generators that would be exaggerated by this algorithm though. How about?

      x_1 = x, s_k = Trunc(x_{k-1}/(n!/k!)), x_k = x_{k-1} - s_k*(n!/k!)

      Picking the same totally random 81 (I think this one has to be from the range 0 ... 119 actually):

      81/60 = 1
      21/20 = 1
      1/5 = 0
      1/1 = 1

      The digit sequence comes out slightly differently this time. Resulting permutation is 2, 1, 4, 5, 3.

      • (Score: 2, Informative) by shrewdsheep on Friday June 07, @01:01PM (4 children)

        by shrewdsheep (5215) on Friday June 07, @01:01PM (#1359695)

        Here is how you can draw large uniform integers. Draw M independent bits with M = ceiling(log2(N)), where N is the range you want to draw from (0, ..., N − 1), and interpret the bits b_i as an integer b = (b_1, ..., b_M). Then calculate floor(N * b / 2^M).

        • (Score: 1) by shrewdsheep on Friday June 07, @01:12PM (1 child)

          by shrewdsheep (5215) on Friday June 07, @01:12PM (#1359696)

          Seems like the algorithm is complete. Further exercise: feed the descriptions above to your favorite LLM and have it produce code in your favorite language. For good measure, I suggest brainf*ck, as it will produce large code in line with the large numbers we are talking about.

          • (Score: 3, Funny) by khallow on Friday June 07, @02:31PM

            by khallow (3766) Subscriber Badge on Friday June 07, @02:31PM (#1359702) Journal
            Whether doing your homework or programming critical systems, SN has the answers you desperately need!
        • (Score: 2, Disagree) by maxwell demon on Friday June 07, @07:12PM (1 child)

          by maxwell demon (1608) on Friday June 07, @07:12PM (#1359725) Journal

          That's not uniform, though. Try with N = 2^k+1 for some k.

          The simplest (but most wasteful) way to get a uniform distribution is to compare your integer b to N, and if it is not smaller than N, throw it away and start over. Otherwise, take it as your result.

          --
          The Tao of math: The numbers you can count are not the real numbers.
          • (Score: 1) by shrewdsheep on Saturday June 08, @08:55PM

            by shrewdsheep (5215) on Saturday June 08, @08:55PM (#1359873)

            I agree that due to rounding, uniformity is not achieved precisely. The approximation can be made as accurate as desired by increasing M (drawing more bits). In practice, using 8 bits in excess of what was suggested would be quite accurate. Your solution (called rejection sampling) would actually not be that inefficient, as it would require at most double the number of sampling steps on average. Using standardization (as suggested above) would be deterministic in terms of the number of instructions, however.
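            The size of the rounding error is easy to tabulate exhaustively for small M; a sketch counting how often each output of floor(N·b/2^M) occurs over every possible b:

            ```python
            def bucket_counts(n, m):
                # how many of the 2**m equally likely m-bit inputs land on each output
                counts = [0] * n
                for b in range(1 << m):
                    counts[(n * b) >> m] += 1
                return counts

            # minimal M = ceiling(log2(5)) = 3: visibly non-uniform
            print(bucket_counts(5, 3))          # [2, 2, 1, 2, 1]
            # 8 excess bits: every count is within 1 of the others
            c = bucket_counts(5, 11)
            print(max(c) - min(c))              # 1
            ```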

    • (Score: 1, Touché) by Anonymous Coward on Saturday June 08, @12:37AM

      by Anonymous Coward on Saturday June 08, @12:37AM (#1359758)

      If you think about it the right way, the trivial right way is:

      use List::Util 'shuffle';
      @shuffled = shuffle(@list);

      😉

      This way I don't even need to do documentation or long term support for that code. Or type as much.

  • (Score: 4, Informative) by DadaDoofy on Friday June 07, @12:39PM (2 children)

    by DadaDoofy (23827) on Friday June 07, @12:39PM (#1359691)

    Date and time calculations. Anyone who's ever tried knows.
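    A tiny Python illustration of why it bites (the month-add convention below is one arbitrary choice I've picked for the sketch, not the answer):

    ```python
    import calendar
    from datetime import date, timedelta

    # even "add one day" depends on the calendar, not just counting
    assert date(2024, 2, 28) + timedelta(days=1) == date(2024, 2, 29)  # leap year
    assert date(2023, 2, 28) + timedelta(days=1) == date(2023, 3, 1)

    # "add one month" has no single right answer: what is Jan 31 plus a month?
    def add_month_clamped(d):
        # one convention among several: clamp to the last day of the next month
        y, m = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
        return date(y, m, min(d.day, calendar.monthrange(y, m)[1]))

    print(add_month_clamped(date(2024, 1, 31)))  # 2024-02-29
    ```

    And that's before time zones, DST transitions, and leap seconds enter the picture.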

    • (Score: 3, Informative) by donkeyhotay on Friday June 07, @10:17PM (1 child)

      by donkeyhotay (2540) on Friday June 07, @10:17PM (#1359741)

      I was going to say this as well. Some of the more difficult problems I've run into over the years involve date-time "math".

      • (Score: 2) by evilcam on Thursday June 13, @11:03AM

        by evilcam (3239) Subscriber Badge on Thursday June 13, @11:03AM (#1360341)

        Straight up fuck datetime.
        The number of Power Automate things I've given up on over the years because of DateTime...

  • (Score: 3, Interesting) by pdfernhout on Friday June 07, @02:07PM

    by pdfernhout (5984) on Friday June 07, @02:07PM (#1359700) Homepage

    from the late 1970s: (1st or 2nd editions; 3rd was butchered)
    https://www.bkent.net/catalogsource.htm [bkent.net]
    "For some time now my work has concerned the representation of information in computers. The work has involved such things as file organizations, indexes, hierarchical structures, network structures, relational models, and so on. After a while it dawned on me that these are all just maps, being poor artificial approximations of some real underlying terrain.
        These structures give us useful ways to deal with information, but they don't always fit naturally, and sometimes not at all. Like different kinds of maps, each kind of structure has its strengths and weaknesses, serving different purposes, and appealing to different people in different situations. Data structures are artificial formalisms. They differ from information in the same sense that grammars don't describe the language we really use, and formal logical systems don't describe the way we think. "The map is not the territory" [Hayakawa].
        What is the territory really like? How can I describe it to you? Any description I give you is just another map. But we do need some language (and I mean natural language) in order to discuss this subject, and to articulate concepts. Such constructs as "entities", "categories", "names", "relationships", and "attributes" seem to be useful. They give us at least one way to organize our perceptions and discussions of information. In a sense, such terms represent the basis of my "data structure", or "model", for perceiving real information. Later chapters discuss these constructs and their central characteristics -- especially the difficulties involved in trying to define or apply them precisely.
        Along the way, we implicitly suggest a hypothesis (by sheer weight of examples, rather than any kind of proof -- such a hypothesis is beyond proof): there is probably no adequate formal modelling system. Information in its "real" essence is probably too amorphous, too ambiguous, too subjective, too slippery and elusive, to ever be pinned down precisely by the objective and deterministic processes embodied in a computer. (At least in the conventional uses of computers as we see them today; future developments in artificial intelligence may endow these machines with more of our capacity to cope.) This follows a path pointed out by Zemanek, connecting data processing with certain philosophical observations about the real world, especially the aspects of human judgement on which semantics ultimately depend [Zemanek 72]."

    --
    The biggest challenge of the 21st century: the irony of technologies of abundance used by scarcity-minded people.
  • (Score: 2) by VanessaE on Friday June 07, @03:21PM (4 children)

    "For example, shuffling {A,B,C} means choosing with equal probability one of 3! = 3 × 2 × 1 = 6 permutations"

    Wat... Are programmers getting stupid here? This has to be one of the most basic programming tasks ever... Here's how I'd do it:

    1) create a new, empty array
    2) generate a random number to pick an item in the original array
    3) copy that item from the original array to the end of the new array
    4) in the original array, overwrite that now-redundant item with the one at the end of the array
    5) reduce the length of the original array by 1
    6) repeat from step (2) until the original array is exhausted.
    7) Use the new array in place of the original.

    Steps (4) and (5) save CPU by avoiding rippling all of the later items up the array as entries are used up. This probably only helps if the array uses fixed-length fields, where a simple overwrite of an item won't force everything else to move around to accommodate it, and only on slower platforms or where the array is especially large.

    Memory usage and time to shuffle scale linearly with array size. Surely I'm not the first to come up with this method. Come on.
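    The steps above can be sketched in Python (this is a hypothetical helper, with `random.randrange` standing in for whatever RNG you prefer):

```python
import random

def shuffle_copy(items):
    src = list(items)                    # working copy we shrink from the end
    out = []                             # (1) new, empty array
    while src:                           # (6) repeat until src is exhausted
        i = random.randrange(len(src))   # (2) pick a random index
        out.append(src[i])               # (3) copy that item to the new array
        src[i] = src[-1]                 # (4) overwrite it with the last item
        src.pop()                        # (5) shrink the original by one
    return out                           # (7) use the new array instead
```

    Each iteration does O(1) work, so both memory and time are linear in the array size, as described.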

    • (Score: 2) by owl on Friday June 07, @06:15PM (1 child)

      by owl (15206) on Friday June 07, @06:15PM (#1359722)

      You've described, in a somewhat different way than is typical, the Fisher-Yates shuffle [wikipedia.org]. TFA ultimately discusses a particular implementation of Fisher-Yates created by Durstenfeld, and your description above is similar to the Durstenfeld variation.

    • (Score: 2) by maxwell demon on Friday June 07, @06:57PM

      by maxwell demon (1608) on Friday June 07, @06:57PM (#1359724) Journal

      There is an easy optimization of that algorithm: Instead of writing to a new array, write the numbers into the free spots you're making at the end of the array, which conveniently grows at the same speed as the new array you're building. Although it's probably more common to use the beginning of the array.
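      A sketch of that in-place variant in Python (this is essentially the Durstenfeld form of Fisher-Yates, filling the freed slot at the end on each pass):

```python
import random

def shuffle_in_place(a):
    # Walk from the last slot down; each freed slot at the end receives
    # an element drawn from the still-unshuffled prefix.
    for end in range(len(a) - 1, 0, -1):
        i = random.randrange(end + 1)    # pick from positions 0..end
        a[end], a[i] = a[i], a[end]      # move it into the freed slot
```

      No second array is needed, so the extra memory cost drops to O(1).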

      --
      The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 1, Funny) by Anonymous Coward on Saturday June 08, @12:48AM

      by Anonymous Coward on Saturday June 08, @12:48AM (#1359761)

      Here's how I'd do it:

      use List::Util 'shuffle';
      @shuffled = shuffle(@list);

      Advantages:
      1) Less typing
      2) Less documentation needed to be done by me
      3) Less support needed to be done by me

      Disadvantages:
      1) You need to use Perl

      😉

      Of course in a different language there might be a similar library/function that does a similar thing.

      e.g. https://learn.microsoft.com/en-us/dotnet/api/system.random.shuffle?view=net-8.0#system-random-shuffle-1(-0()) [microsoft.com]
