
SoylentNews is people

posted by janrinok on Friday August 26 2016, @08:13AM
from the not-so-bright-scientists dept.

Scientific literature often mis-names genes and boffins say Microsoft Excel is partly to blame.

"Automatic conversion of gene symbols to dates and floating-point numbers is a problematic feature of Excel software," write Mark Ziemann, Yotam Eren and Assam El-Osta of the Baker IDI Heart & Diabetes Institute in Australia in a paper titled Gene name errors are widespread in the scientific literature.

The things Excel does to gene names include changing "SEPT2", the name of a gene thought to have a role in proper formation of cell structure, to the date "2-Sep". The "MARCH1" gene becomes "1-Mar".

The paper notes that this is a problem that's been known for over a decade, but one which remains pervasive. The trio studied 35,175 Excel tables attached to 3,597 scientific papers published between 2005 and 2015 and found errors in "987 supplementary files from 704 published articles. Of the selected journals, the proportion of published articles with Excel files containing gene lists that are affected by gene name errors is 19.6 per cent."

It's not hard to change the default format of an Excel cell to avoid changes of this sort: you can get it done in a click or three. Much of the problem in these papers is therefore between scientists' ears, rather than within Excel itself. The paper's silent on why genetic scientists, who The Register will assume are not short of intelligence, have been making Excel errors for years.
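
For anyone who wants to check their own gene lists, here is a minimal sketch of how converted values could be flagged. It is not the script the paper's authors used. The function name `find_suspect_symbols` and both patterns are assumptions made up for illustration: the first matches date-like values such as "2-Sep", the second matches numbers in scientific notation such as "2.31E+13".

```python
import re

# Month abbreviations as they appear in converted values like "2-Sep" or "1-Mar".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Hypothetical pattern: one or two digits, a hyphen, then a month abbreviation.
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)

# Hypothetical pattern: a number in scientific notation, e.g. "2.31E+13".
SCI_LIKE = re.compile(r"^\d+(\.\d+)?E\+\d+$")


def find_suspect_symbols(values):
    """Return the entries that look like a date or float rather than a gene symbol."""
    suspects = []
    for value in values:
        cleaned = value.strip()
        if DATE_LIKE.match(cleaned) or SCI_LIKE.match(cleaned):
            suspects.append(cleaned)
    return suspects


if __name__ == "__main__":
    column = ["SEPT2", "2-Sep", "1-Mar", "TP53", "2.31E+13"]
    print(find_suspect_symbols(column))
```

A check like this only finds values that have already been converted; it cannot recover the original symbol, since "2-Sep" might have started life as "SEPT2" or as something else.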

This article focuses on errors resulting from auto-correction of gene names; certainly other subject areas have suffered from similarly 'helpful' software. What hilarious and/or cringe-worthy 'corrections' have YOU seen?



 
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Insightful) by bzipitidoo on Friday August 26 2016, @03:19PM

    by bzipitidoo (4388) on Friday August 26 2016, @03:19PM (#393521) Journal

    > separators that are so likely to appear in data

    That's always the problem. Every time a new way to separate information was created, it became embedded, already used so that another method of separation had to be created to avoid confusion. Sometimes that leads to monstrosities such as HTML. Originally, writing did not have punctuation-- no commas, periods, semicolons and colons, and not even spaces. Yes, the space is considered punctuation.

    Heck, much ancient writing didn't even have vowels. Vowels are formally part of the alphabet, have been for centuries, but punctuation still is not, and children are not taught a list of punctuation symbols in the same manner as taught the alphabet and numerical digits. English speakers all know that the English alphabet has 26 letters, but can anyone say how many standard punctuation symbols there are off the top of their heads? Most people would overlook the space, despite that being by far the largest key on the typical keyboard. Or, more like wouldn't think of it as punctuation. Punctuation is just quietly slipped in as something of an afterthought while teaching writing. Much early writing uses a mid level dot · to separate most words but not all, as it was only a tool to reduce ambiguity and fairly often there was no ambiguity and so no need for a word separator.

    We have an ongoing debate between fixed and variable width fonts. I have yet to see a programming language that is written in a proportional font. Think what a mess that would make of Python especially. For separation schemes that depend on spacing, placing symbols on a grid is essential. Can get away with a proportional font for CSV. But for more sophisticated schemes for separating data elements, it sure is nice to be able to use spacing.

  • (Score: 2) by digitalaudiorock on Friday August 26 2016, @05:29PM

    by digitalaudiorock (688) on Friday August 26 2016, @05:29PM (#393587) Journal

    > We have an ongoing debate between fixed and variable width fonts. I have yet to see a programming language that is written in a proportional font. Think what a mess that would make of Python especially. For separation schemes that depend on spacing, placing symbols on a grid is essential. Can get away with a proportional font for CSV. But for more sophisticated schemes for separating data elements, it sure is nice to be able to use spacing.

    Wow...I'm not sure I've ever read anything that's confused me more. A delimited text format is intended for data transport purposes and simply must be parsable by whatever uses it. What on earth do fonts have to do with any of that in any way?
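
    To make the "must be parsable" point concrete, here is a small sketch using Python's standard `csv` module. The helper name `round_trip` and the sample rows are made up for illustration. It shows a field containing both the separator and a quote character surviving a write and a read unchanged, with no font or column alignment involved.

```python
import csv
import io


def round_trip(rows):
    """Write rows as CSV text, then parse that text back into a list of rows."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)  # fields containing commas or quotes get quoted
    buf.seek(0)
    return list(csv.reader(buf))


if __name__ == "__main__":
    rows = [
        ["gene", "note"],
        ["SEPT2", 'has a comma, and a "quote"'],
    ]
    print(round_trip(rows) == rows)
```

    The separator problem is handled by the quoting rules of the format, so a naive split on commas would break the second row while a real CSV parser would not.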

    • (Score: 0) by Anonymous Coward on Friday August 26 2016, @08:52PM

      by Anonymous Coward on Friday August 26 2016, @08:52PM (#393679)

      I have no clue what fonts have to do with it but I was waiting for the GP to complain that RPG columns don't line up with variable width fonts. Yes, RPG - I went there because I'm ooooold school.