Stories
Slash Boxes
Comments

SoylentNews is people

posted by Woods on Wednesday August 06 2014, @03:32PM   Printer-friendly
from the other-spreadsheet-programs-can-do-it-too dept.

Glyn Moody reports

A few years back, people were rather disturbed to find out about the famous Excel bug, whereby the multiplication of two numbers in Microsoft's spreadsheet gave the wrong number. It turns out there are other circumstances in which Excel (and, to be fair, presumably other spreadsheets) can give incorrect results, but they are unlikely to be encountered in typical everyday tasks. However, in the specialized world of bioinformatics, which uses computers to analyze data about genes and related areas, careless use of spreadsheets can throw up a significant numbers of errors, as this paper in BMC Bioinformatics explains:

Use of one of the research community's most valuable and extensively applied tools for manipulation of genomic data can introduce erroneous names. A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] was being converted to '1-DEC.'

Here we have the interesting interaction of two very different fields, where the name of a gene involved in esophageal cancer, DEC1, was interpreted by Excel to mean the date, 1 December. As the paper points out, these kinds of substitution errors are already to be found in key public databases:

DEC1, a possible target for cancer therapy, was incorrectly rendered, and it could potentially be missed in downstream data analysis. The same type of error can infect, and propagate through, the major public data resources. For example, this type of error occurs several times in even the immaculately curated LocusLink database.

As that notes, a gene that might be relevant for treating cancer could well be missed because of this incorrect conversion to a date by Excel. Although it is unlikely that any serious harm has been caused by this yet it's a useful reminder of the dangers of depending a little too heavily on the results of software without checking for corruption of this kind.

This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Wednesday August 06 2014, @03:39PM

    by Anonymous Coward on Wednesday August 06 2014, @03:39PM (#78056)

    And it can be a little fucker to deal with, especially if you don't happen to notice what it has done. I guess autocomplete is a kinda good idea but it should make sure to notify the user about any changes.

    • (Score: 4, Insightful) by DNied on Wednesday August 06 2014, @04:05PM

      by DNied (3409) on Wednesday August 06 2014, @04:05PM (#78070)

      Whenever software tries to be "too smart", it just ends up being stupid.

    • (Score: 0) by Anonymous Coward on Wednesday August 06 2014, @05:03PM

      by Anonymous Coward on Wednesday August 06 2014, @05:03PM (#78101)

      The first thing I do when I install OO is turn off all the autocomplete, set the file type back to MS defaults, and reset ctrl-d and crtl-l to their correct functions.

    • (Score: 2) by bob_super on Wednesday August 06 2014, @07:05PM

      by bob_super (1357) on Wednesday August 06 2014, @07:05PM (#78164)

      That's my main problem with smartphone autocompletes.
      Try mixing marketing BS, technical terms and two or three languages, and you quickly disable everything.

      So I still use a phone with a physical keyboard to avoid being super-slow without the assistance.

      • (Score: 3, Informative) by Marand on Wednesday August 06 2014, @08:49PM

        by Marand (1081) on Wednesday August 06 2014, @08:49PM (#78207) Journal

        That's my main problem with smartphone autocompletes.
        Try mixing marketing BS, technical terms and two or three languages, and you quickly disable everything.

        So I still use a phone with a physical keyboard to avoid being super-slow without the assistance.

        I found a happy medium on android with a setting tweak for MultiLing Keyboard [google.com]. It has separate prediction (autocomplete) settings for manual input versus swipe/gesture input, so I turned prediction off for manual input and left it on for the swiping. Swipe away, autocomplete the common words, and then when I have to input something I know the prediction will screw up, I tap it in manually.

        That one change fixed nearly all my autocomplete annoyances.

  • (Score: 4, Insightful) by velex on Wednesday August 06 2014, @03:41PM

    by velex (2068) on Wednesday August 06 2014, @03:41PM (#78057) Journal

    Why the hell is anyone using Excel for this? Of course Excel regularly mangles things. Take New England zipcodes for instance or phone numbers. Good god, people. Computers aren't magick. Know what the fuck the program you're using does. It might not be designed or configured for the purpose you're using it for out of the box and it sure as fuck isn't going to read your mind!

    • (Score: 3, Insightful) by doublerot13 on Wednesday August 06 2014, @03:43PM

      by doublerot13 (4497) on Wednesday August 06 2014, @03:43PM (#78058)

      Excel, just like any other tool, can either be very effective or dangerous.

      Put the blame where it belongs.[PEBKAC]

      • (Score: 3, Insightful) by velex on Wednesday August 06 2014, @03:47PM

        by velex (2068) on Wednesday August 06 2014, @03:47PM (#78060) Journal

        Very correct. They forgot to change the data type to general or text, and they didn't realize that Excel will make wild guesses about the data type you're putting in it. All to be user friendly I suppose. Of course, user-friendly depends on the user.

        • (Score: 3, Informative) by mrider on Wednesday August 06 2014, @04:42PM

          by mrider (3252) on Wednesday August 06 2014, @04:42PM (#78086)

          What's irksome is how frequently "user friendly" means "ignore your instructions and do what I think is correct" (where "I" means "the software").

          --

          Doctor: "Do you hear voices?"

          Me: "Only when my bluetooth is charged."

    • (Score: 2) by Dunbal on Wednesday August 06 2014, @03:49PM

      by Dunbal (3515) on Wednesday August 06 2014, @03:49PM (#78063)

      "now what the fuck the program you're using does."

      But then we can't hire Amritsar for $1 an hour to do it for us.

    • (Score: 5, Funny) by Nerdfest on Wednesday August 06 2014, @04:05PM

      by Nerdfest (80) on Wednesday August 06 2014, @04:05PM (#78069)

      I'm mainly of the opinion that Excel can be put into the same classification as regular expressions frequently are. Some people, when confronted with a problem, think "I know, I'll use Excel." Now they have two problems.

      • (Score: 3, Interesting) by mrider on Wednesday August 06 2014, @04:44PM

        by mrider (3252) on Wednesday August 06 2014, @04:44PM (#78089)

        There needs to be an "Amusingly Insightful" moderation category. :)

        --

        Doctor: "Do you hear voices?"

        Me: "Only when my bluetooth is charged."

    • (Score: 1) by Wootery on Thursday August 07 2014, @11:09AM

      by Wootery (2341) on Thursday August 07 2014, @11:09AM (#78373)

      it sure as fuck isn't going to read your mind

      But, as we're seeing, it'll damn well try.

      [Rambling about proper relational databases with proper handling of types]

  • (Score: 1, Troll) by ThG on Wednesday August 06 2014, @03:49PM

    by ThG (4568) on Wednesday August 06 2014, @03:49PM (#78061)

    Using a Microsoft product for something important.

    • (Score: 5, Interesting) by Nth_man on Wednesday August 06 2014, @04:35PM

      by Nth_man (4444) on Wednesday August 06 2014, @04:35PM (#78083)

      > Using a Microsoft product for something important.

      To reproduce a typical Excel bug:

      - Go to Excel.
      - In A1, type
              0.1
      - In B1, type
              =1.95-1.85
      - In A2, type
              =IF(A1=0.1;"Ok";"THE HORROR")
      - Go back to A2, press Ctrl+c.
      - Go to B2, press Ctrl+v.
      - See "THE HORROR".

      DO GOVERNMENTS MAKE US PAY REGULARLY SO THAT THEY CAN USE THIS????

      I tried it in LibreOffice Calc (and in Calligra Sheets) and the result is "Ok" because they do those operations correctly.

      • (Score: 2, Informative) by Nth_man on Wednesday August 06 2014, @05:12PM

        by Nth_man (4444) on Wednesday August 06 2014, @05:12PM (#78104)

        Another classical Excel bug:

                =IF(1.333-1.225=0.108,"Ok","THE HORROR")

                =IF(1.333-1.225-0.108=0,"Ok","THE HORROR")

        In the first case, Excel shows "Ok", and in the second one, "THE HORROR".

        I tried it in LibreOffice Calc (and in Calligra Sheets) and the result is "Ok" because they do those operations correctly.

        • (Score: 4, Informative) by bob_super on Wednesday August 06 2014, @07:10PM

          by bob_super (1357) on Wednesday August 06 2014, @07:10PM (#78168)

          "it's not a bug, it's floating point. Talk to the algo PhD about his definition of '=' " - The engineer.

      • (Score: 2) by wonkey_monkey on Friday August 08 2014, @09:00AM

        by wonkey_monkey (279) on Friday August 08 2014, @09:00AM (#78766) Homepage

        =IF(A1=0.1;"Ok";"THE HORROR")

        "The formula you typed contains an error." (should be commas, not semi-colons)

        Go back to A2, press Ctrl+c.
        - Go to B2, press Ctrl+v.

        You mean A1 and B1 :)

        Also try copying from B1 and doing a Paste Special... of "Values" into A1. Have a look at what gets displayed in the cell, and what's really in it on the entry bar (or whatever that's called).

        Where computers are concerned (and not just when it comes to Excel), decimals are only ever an approximation.

        A client of mine fell into this trap a few years back when we told him about a 21.99 monetary limit. He tested for =21.99 instead of 22.00, and fell foul of decimal approximation.

        --
        systemd is Roko's Basilisk
        • (Score: 1) by Nth_man on Friday August 08 2014, @01:34PM

          by Nth_man (4444) on Friday August 08 2014, @01:34PM (#78846)

          Thank you for reporting!

          > "The formula you typed contains an error." (should be commas, not semi-colons)

          It really depends on the "locale configuration" (or however they name it :-) ) of Excel. For example, if in your country your decimal separator was "," then instead of writing "3.1416" you wrote "3,1416" and you couldn't use use commas in some places of Excel formulas, but you had to use semi-colons. Anyway, I should have adapted it to the USA version of Excel :-(

          > You mean A1 and B1 :)

          No :), I meant what I wrote (A2 and B2) :) but thanks for caring!

          > Also try copying from B1 and doing a Paste Special... of "Values" into A1. Have a look at what gets displayed in the cell, and what's really in it on the entry bar (or whatever that's called).

          Yes! That doesn't happen when using other programs, but Excel is...
          and people trust it. They end up paying for it, with failures, sometimes without realizing it.

          > Where computers are concerned (and not just when it comes to Excel), decimals are only ever an approximation.

          Mmm... the last time that I checked.... math programs (like Mathematica or Maple) could work with decimal numbers without binary approximations; the "bc" program (which comes with (K)Ubuntu) also achieves it, although "bc" is very small, and "supports arbitrary precision numbers" like it says e.g. in "man bc".

          There are some libraries that I've used and I made programs with those libraries, and decimal numbers were also stored as decimal numbers, instead of using binary approximations. E.g. IBM explained it, with a lot of information, in http://www.eecs.yorku.ca/course_archive/2007-08/W/3002/YorkClassW08IanMcIntoshDecimal%20Floating%20Point-20080206Slides.pdf [yorku.ca]

          > A client of mine fell into this trap a few years back when we told him about a 21.99 monetary limit. He tested for =21.99 instead of 22.00, and fell foul of decimal approximation.

          One should choose good tools :-( At least he realized it :-)

          Greetings!

  • (Score: 2) by Snow on Wednesday August 06 2014, @04:02PM

    by Snow (1601) on Wednesday August 06 2014, @04:02PM (#78066) Journal

    Ever tried to import a .csv file into Excel that has a date and/or time section? Excel will, without fail, mess it up in some way... It's super annoying... TI even correctly recognises it as a date/time, but still screws it up.

    • (Score: 2) by VLM on Wednesday August 06 2014, @05:50PM

      by VLM (445) on Wednesday August 06 2014, @05:50PM (#78122)

      For other fun, try importing as text or whatever any technical-ish document involving part numbers or locators or whatever of the form "number letter E number" or in regex form something like "\d+e\d+" or "\d+E\d+" or "\d+.\dE\d+" you get the idea.

      What a steaming pile of software.

    • (Score: 2) by jackb_guppy on Wednesday August 06 2014, @10:09PM

      by jackb_guppy (3560) on Wednesday August 06 2014, @10:09PM (#78226)

      Format using international standard (ISO) of 2014-12-01 Always import correctly.

      • (Score: 2) by monster on Thursday August 07 2014, @12:49PM

        by monster (1260) on Thursday August 07 2014, @12:49PM (#78386) Journal

        But the problem is not "importing a date correctly", it's "importing some data not intended to be taken as a date but taken as one anyway". Quite different

        • (Score: 2) by jackb_guppy on Thursday August 07 2014, @11:13PM

          by jackb_guppy (3560) on Thursday August 07 2014, @11:13PM (#78636)

          Yes. I have the same problem with a list of *OUTQ on the main frame. Ones are call DEC10 for month end reports for the month of Dec 2010. When we pull that list directly in to excel it shows up as 10-DEC (the 10th day of DEC. We know it does that have know for years that it does that. If you manually entered it you have seen what it has done. You would place ' in front all will be well.

          Do not complain, if you did not do the research to understand your tool.

          Part Number: st4ring 12345E8 will show up as 1.2345+E12 if you do not place ' in front of the string. Because the user is lasy does not make the software. That is expected functionally of the software.

          PS: I am FOSS person. So excel is bad. MS is bad. The tool works as advertised. Oh, LibreOffice Calc works the same way. I expect Lotus123 if someone has copy will show it too.

          Software is not mind reading... close though.

  • (Score: 2, Informative) by b on Wednesday August 06 2014, @04:02PM

    by b (2121) on Wednesday August 06 2014, @04:02PM (#78067)

    This paper came out ten years ago.

    But in any case, any bioinformatician worth their salt shouldn't be using Excel for their analysis. Also, any pipeline that uses gene names should test for names that don't match the reference list at the end. This is pretty elementary stuff, but so is not using Excel.

    • (Score: 2) by edIII on Wednesday August 06 2014, @10:55PM

      by edIII (791) on Wednesday August 06 2014, @10:55PM (#78243)

      This is pretty elementary stuff, but so is not using Excel.

      That's not fair to Excel though.

      It's an excellent tool. I've used it for almost 20 years now as a CSV editor and for a lightweight DB tool since I can apply filters, formulas, rules, and outright visual basic code. It allows me to very quickly process large amounts of data and make it ready for import into a so-called "real DB engine/platform".

      Some may say it's stupid, but when I can shave off hours of work and look like a genius processing a few thousand records with some MID, replace, and lookups I'm gonna use the tool best suited for it.

      As a tool? Excel is a Swiss Army knife. As long as you're not an idiot or a novice. The true idiots are the ones who don't know to specify a data type for a selection or an entire column.

      This isn't the first time people have lamented that scientists need more basic computer courses including attention paid to programming and working with data structures (kinda of like DB engines).

      Bashing Excel is not as appropriate as bashing the scientists here. Garbage-In-Garbage-Out....

      --
      Technically, lunchtime is at any moment. It's just a wave function.
      • (Score: 1, Funny) by Anonymous Coward on Wednesday August 06 2014, @11:53PM

        by Anonymous Coward on Wednesday August 06 2014, @11:53PM (#78253)

        I can shave off hours of work and look like a genius

        ...then your little hack becomes the company's "standard" and you end up supporting it eternally.

        -- gewg_

        • (Score: 2) by edIII on Thursday August 07 2014, @04:19PM

          by edIII (791) on Thursday August 07 2014, @04:19PM (#78486)

          *sigh*

          It's a TOOL , not a platform. I don't create "hacks" in Excel that become standards. My point is that it's pretty damn good at data entry and prep.

          If you have a need for a very lightweight tool to prepare data to be passed to a larger platform (that also apparently lacks its own front end) you can use Excel since it just gets the job done.

          Bashing Excel here is laughably stupid and willfully ignorant. Fixing the problem in the article is as simple as selecting cells, right clicking for properties, and specifying a data type appropriate for the field.

          Could I spend a few hours creating a custom front end for that data entry that does everything we need? Sure. Why do that when I can spend literally 5-10 minutes prepping data? Don't get me started on how many one-off jobs I've had in 20 years where it's not required to write entire modules of code for one little job.

          I just don't get all the hate heaped on Excel here. It's a tool in a toolbox. Use it appropriately...

          --
          Technically, lunchtime is at any moment. It's just a wave function.
      • (Score: 1) by b on Thursday August 07 2014, @12:32AM

        by b (2121) on Thursday August 07 2014, @12:32AM (#78265)

        (Caveat: I am a bioinformatician.) It's incredibly frustrating seeing authors rely on bioinformatics for a substantial proportion on their findings, and yet do not hold it to the same level of scrutiny that they would other methods. Similarly, if papers need bioinformatics (or statistics, or proteomics, or sequencing, or anything that is more complicated than "vanilla" molecular biology), scientists need to seek for appropriate assistance. I think we agree with the point that scientists need more education in the field!

        Having said that, AFAIK all that you say Excel can do can also be done with better, quicker tools, such as grep, awk and r. Of course, these are less accessible to lay people.

        • (Score: 2) by edIII on Thursday August 07 2014, @04:50PM

          by edIII (791) on Thursday August 07 2014, @04:50PM (#78499)

          Having said that, AFAIK all that you say Excel can do can also be done with better, quicker tools, such as grep, awk and r. Of course, these are less accessible to lay people.

          I have to strongly disagree here on the "speed" and "better". Am I on a headless server?

          Only if I am on a headless server will I spend all that time running those command line tools you've mentioned. Most of the time I have my laptop, which is Win7 Professional. Now, since I have Win7, I also have my copy of Office 2003 still up and running.

          Having done this for 20 years I can tell you that I would much rather have a GUI and I can work with the data records far faster and easier with a mouse and effecting the same changes against the data as those command line tools.

          I *can* use all those tools straight from a headless server to get the job done. I *can* also use Excel and get it done *easier* and *faster*. After 20 years, I go for speed and ease of use. I don't need to be making a statement against the man (or MS) by making my job harder. Writing out all those really complex command lines (you should see ImageMagick on the command line) is a pain in the ass, and then I have to pipe it all back out the console to see my changes.

          Honestly, Excel seems far easier with the GUI than running all those commands by hand.

          If it's MS we don't like here... what about LibreOffice? It has quite a number of features (all the important ones) of Excel.

          My only point, which I'm being treated like a "noob" here for, is to use Excel for what it's good for. It's just a tool. Those researchers used the tool incorrectly. It's not even a stupid crappy feature that MS shoved into Excel that's causing the problem at all. Auto complete is extremely useful at times during fast data entry. If you don't want it to run against the cell, you just *tell* it to not run against the cell. You don't write articles about how stupid MS is and how their tool (which you can't figure out) is mangling your project :)

          Obviously I'm rushing to the defense of Excel here (not a popular position apparently) since I've used it as a supporting tool for DB work (against much larger platforms) for as long as I can remember. It's a pretty good reason why everyone came to me in the last 20 years to get data into the databases.

          Quite often I created forms (also in Open/Libre office) to get data entry ready and allow the employees to work with something easily before sending it to me to get processed. Eventually I switched us over to an internal website with forms, but until that was ready those spreadsheets were a suitable stop-gap measure.

          With Excel, or LibreOffice, I can create spreadsheets that *also* make calls and post the data to the backend platform. I did that once or twice to make things easier too.

          I love Excel (or the LibreOffice equivalent), and that's from somebody that uses headless servers all day long :)

          --
          Technically, lunchtime is at any moment. It's just a wave function.
          • (Score: 1) by b on Thursday August 07 2014, @11:56PM

            by b (2121) on Thursday August 07 2014, @11:56PM (#78659)

            I guess it comes down to experience with the tools, and the specifics of the work. I certainly find awk et al faster for my work. Also, traditional spreadsheets don't tend to deal well with large files. For example, I regularly process sam files. I just checked one, and it's 29,779,077 lines long. I can't imagine any GUI program dealing elegantly with that.

            Also, for high-throughput work, it's much easier to process a thousand files with (e.g.) `for i in *; do my_script > $i_processed; done`

            FWIW even on a server you can X forward and run Libreoffice, but I imagine this would be quite slow depending on your bandwidth.

  • (Score: 4, Interesting) by jackb_guppy on Wednesday August 06 2014, @04:15PM

    by jackb_guppy (3560) on Wednesday August 06 2014, @04:15PM (#78075)

    If they are researchers... they need to do research to understand the tool they are using.

    All cells in Excel are type General. If you just enter your data, or open CSV/TAB file. General looks at the data and makes it best guess on what you want. CSV, even if you quote the string, Excel still tries to use general processing rules / translation. CSV/TAB files do have a chance to map the column type (same as Access, or about any DB engine)

    DEC1 will change to 1-DEC (based on your local config). So does 2014-12-01 will be reformatted. If a zipcode is entered like 01001, will show up as 1001. This is all normal. Remember Zip Codes (in general are known as Postal Codes) are ALPHA / TEXT not a number and so are Phone Numbers. Since Excel does not understand that unless you TELL IT by putting ' in front of the string it will do the best it can.

    We are in the process to upgrade to new ERP and one of the test we did to see if the export to Excel would just "work". Instead of exporting a CSV, the ERP exports XML, with "CSS" mapping to identify, define and format the column type so dates, postal codes and part numbers (12345E5) are passed correctly.

    It is up to user to use the given tool correctly.

    • (Score: 2, Informative) by datapharmer on Wednesday August 06 2014, @05:38PM

      by datapharmer (2702) on Wednesday August 06 2014, @05:38PM (#78113)

      Seriously.... File > Options > Autocorrect > uncheck. Now to double check format the cells for the rows/columns as you wish them to appear. Was that so hard?

  • (Score: 4, Interesting) by martyb on Wednesday August 06 2014, @05:20PM

    by martyb (76) Subscriber Badge on Wednesday August 06 2014, @05:20PM (#78108) Journal

    Back in the early 80's, I was working at a company that had just moved our department to a new building. Everything was rush-rush, so there were still people in the new location who had not been moved out. We were just shoe-horned in around several wings in the building until their move completed. At that point, we would then get consolidated into our new, permanent locations. Through lack of foresight, there was no single directory of where each employee in our department was now located. We were constantly wandering the halls trying to find each other. of course we were also in the midst of a major rollout so pressure was high to meet very tight deadlines. This compounded the frustration.

    To their credit, the room numbers followed a simple scheme; a five-character code comprised of: {FloorNumber}{Wing}{RoomNumber}, where floor_number was an integer from {1|2|3}, wing was a letter matching the directions on a compass: {N|E|S|W}, and room number was a two-digit number [00..25]. For example, room location "2W23" was room number "23" in the "W"est wing on the "2"nd floor.

    I got sufficiently frustrated at being unable to easily locate my co-workers, that I coded up a quick REXX [wikipedia.org] program that read in a text file, and generated several reports. One was sorted by {LastName, FirstName}. Another was sorted by room location. For completeness sake, another one was sorted by phone number.

    Everything was working wonderfully until I noticed a sorting error in the room location report.

    I wanted them sorted in order of FloorNumber, then Wing, and then RoomNumber. But, it was occasionally getting the order incorrect. I put in lots of print statements. Pulled out a bunch of hair. Cursed. It was still doing it wrong. I finally noticed that the sort order was correct for 3 out of the 4 wings: N, S, and W wings were all sorted correctly.

    It was only rooms in the east wing that were incorrect.

    Have you figured it out?

    I started walking through the statements one by one. And then it hit me, the sort comparison no longer compared as strings, and instead sorted as if they were in scientific notation, so "3E12" would compare as being less than "1E23". And I could find no way to tell REXX to do it otherwise.

    So, prior to sorting, I replaced each of "E", "N", "S", and "W" with "1", "2", "3", and "4", respectively. Then I did the sort. And then changed them back when writing them to the report.

    That's just an anecdote, but I can create the same error in AWK unless I force a string comparison by catenating each value with a null string in the comparison.

    I'm curious if anyone knows of other languages where it would, by default, make the same kind of conversion prior to comparison?

    --
    Wit is intellect, dancing.
    • (Score: 2) by Common Joe on Thursday August 07 2014, @04:44AM

      by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Thursday August 07 2014, @04:44AM (#78315) Journal

      If I had mod points, I'd give them to you.

      A lot of people on here are complaining about this being an Excel or a PEBCAK issue, but the reality is that this kind of stuff can bite any of us at any time just because we deal with computers. As a matter of fact, I can say with 100% certainty that it has bitten every single one of us on this site multiple times. Yes, Excel has its issues and I learned and chuckled at the guy Nth_man's "The Horror" [soylentnews.org].

      I've dealt with large databases working with a lot of money and I sometimes sweated when working with certain queries. Some of those mistakes are invisible and when I put something in production, I always wondered if I got everything correct. I always tried to double check and verify in a different way, but the reality is that I was never sure and could never be. (We did have an independent verification in another language for our main calculation if you're wondering.)

    • (Score: 2) by monster on Thursday August 07 2014, @01:00PM

      by monster (1260) on Thursday August 07 2014, @01:00PM (#78388) Journal

      I started walking through the statements one by one. And then it hit me, the sort comparison no longer compared as strings, and instead sorted as if they were in scientific notation, so "3E12" would compare as being less than "1E23". And I could find no way to tell REXX to do it otherwise.

      And that is a pet peeve of mine against dynamic typing. In C, Pascal and others once you specify that something holds a string, anything you put in it keeps being a string. If you want to convert it to something else, ok, but you do it explicitly. With dynamic typing language developers are too keen to add "intelligence" to their type inference routines and, as other have already said in this thread, it usually ends up backfiring.

  • (Score: 2) by kaszz on Wednesday August 06 2014, @05:36PM

    by kaszz (4211) on Wednesday August 06 2014, @05:36PM (#78112) Journal

    There's several problems:
      * Know your tools!
      * Researchers should know better than using Microsoft for anything where accuracy or reliability matters
      * Software trying to be smart often fails in subtle but fatal ways
      * Automated default guessing is a mine field
      * Use scripts, CSV and SQL

  • (Score: 2, Informative) by elias on Wednesday August 06 2014, @05:40PM

    by elias (666) on Wednesday August 06 2014, @05:40PM (#78116)

    is the tag line that stuck with me, as one researcher [utk.edu] summed it up many years ago. There were many other problems back then and I guess nobody in their right mind expects that these will all be solved. It requires people to have a look at the code and fix it (e.g. R), or software to be specialized and expensive (SAS) to go into that level of detail.

    Excel is not meant for data analysis, never was, never will be.

    • (Score: 2) by kaszz on Wednesday August 06 2014, @06:20PM

      by kaszz (4211) on Wednesday August 06 2014, @06:20PM (#78141) Journal

      The point of MS-Excel is to make sure no one buys Visicalc or Lotus123. It's a morass of bugs and limits for scalability with a proprietary file format to boot. That only works with an operating system that needs constant patching and virus infestations.

  • (Score: 4, Informative) by zeigerpuppy on Wednesday August 06 2014, @07:38PM

    by zeigerpuppy (1298) on Wednesday August 06 2014, @07:38PM (#78176)

    Read this:
    http://christophergandrud.github.io/RepResR-RStudio/ [github.io]

    Install this:
    http://www.rstudio.com/products/rstudio/download-server/ [rstudio.com]

    Excel is a terrible tool for statistics,
    It has no audit trail, dodgy algorithms and is inefficient compared to statistical languages.
    R takes about a week to learn the basics but will save hundreds of hours of tedium and reanalysis for any moderately large statistical task.
    Combine it with knitr and LaTeX and the whole pipeline from data to publication quality report can be scripted.