SoylentNews is people

Submission Preview

Excel Hell Messes Up ~20 Per Cent of Genetic Science Papers

Accepted submission by Arthur T Knackerbracket at 2016-08-25 08:25:40
/dev/random

Story automatically generated by StoryBot Version 0.2.0a (Development).

Note: This is the complete story and will need further editing. It may also be covered by Copyright and thus should be acknowledged and quoted rather than printed in its entirety.

FeedSource: [TheRegister] collected from rss-bot logs

Time: 2016-08-25 06:32:12 UTC

Original URL: http://www.theregister.co.uk/2016/08/25/excel_hell_messes_up_20_per_cent_of_genetic_science_papers/ [theregister.co.uk] using UTF-8 encoding.

Title: Excel hell messes up ~20 per cent of genetic science papers

--- --- --- --- --- --- --- Entire Story Below --- --- --- --- --- --- ---

Arthur T Knackerbracket has found the following story [theregister.co.uk]:

Scientific literature often mis-names genes and boffins say Microsoft Excel is partly to blame.

"Automatic conversion of gene symbols to dates and floating-point numbers is a problematic feature of Excel software," write Mark Ziemann, Yotam Eren and Assam El-Osta of the Baker IDI Heart & Diabetes Institute in Australia in a paper titled Gene name errors are widespread in the scientific literature [biomedcentral.com].

Among the things Excel does to gene names is changing "SEPT2", the name of a gene thought to have a role in proper formation of cell structure, to the date "2-Sep". The "MARCH1" gene becomes "1-Mar".

The paper notes that this is a problem that's been known for over a decade, but one which remains pervasive. The trio studied 35,175 Excel tables attached to 3,597 scientific papers published between 2005 and 2015 and found errors in "987 supplementary files from 704 published articles."

"Of the selected journals, the proportion of published articles with Excel files containing gene lists that are affected by gene name errors is 19.6 per cent."

It's not hard to change the default format of an Excel cell to avoid changes of this sort: you can get it done in a click or three. Much of the problem in these papers is therefore between scientists' ears, rather than within Excel itself. The paper's silent on why genetic scientists, who The Register will assume are not short of intelligence, have been making Excel errors for years.

The paper offers two workarounds. One is to use Google Sheets, which in the authors' tests "did not convert any gene names to dates or numbers when typed or pasted; notably, when these sheets were later reopened with Excel, LibreOffice Calc or OpenOffice Calc, gene symbols such as SEPT1 and MARCH1 were protected from date conversion."

The authors also cooked up scripts to find Excel errors and have made them available to download on Sourceforge [sourceforge.net].
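For readers wondering what such a check might look like, here is a minimal Python sketch. It is not the authors' code: the function name find_suspect_symbols, the two patterns and the sample column are all assumptions made for illustration. It simply flags entries that look like the dates ("2-Sep", "1-Mar") or floating-point numbers that Excel leaves behind where a gene symbol used to be. The value "2.31E+13" is a made-up example of scientific notation, not one taken from the article.

```python
import re

# Month abbreviations as they appear in converted cells such as "2-Sep".
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Assumed pattern: day-month strings such as "2-Sep" or "1-Mar".
DATE_LIKE = re.compile(r"^\d{1,2}-(%s)$" % "|".join(MONTHS), re.IGNORECASE)

# Assumed pattern: scientific notation such as "2.31E+13".
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def find_suspect_symbols(symbols):
    """Return entries that look like dates or floats rather than gene symbols."""
    suspects = []
    for symbol in symbols:
        text = symbol.strip()
        if DATE_LIKE.match(text) or FLOAT_LIKE.match(text):
            suspects.append(text)
    return suspects


# Hypothetical gene-list column, as it might read after a round trip through Excel.
column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2.31E+13", "SEPT2"]
print(find_suspect_symbols(column))  # ['2-Sep', '1-Mar', '2.31E+13']
```

A check like this can only say that something went wrong. It cannot recover the original symbol, since "2-Sep" no longer records which gene name it started out as.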

