posted by martyb on Sunday July 30 2017, @02:05PM
from the how-do-you-know-how-much-you-don't-know? dept.

The feat made headlines around the world: "Scientists Say Human Genome is Complete," The New York Times announced in 2003. "The Human Genome," the journals Science and Nature said in identical ta-dah cover lines unveiling the historic achievement.

There was one little problem.

"As a matter of truth in advertising, the 'finished' sequence isn't finished," said Eric Lander, who led the lab at the Whitehead Institute that deciphered more of the genome for the government-funded Human Genome Project than any other. "I always say 'finished' is a term of art."

"It's very fair to say the human genome was never fully sequenced," Craig Venter, another genomics luminary, told STAT.

"The human genome has not been completely sequenced and neither has any other mammalian genome as far as I'm aware," said Harvard Medical School bioengineer George Church, who made key early advances in sequencing technology.

[...] FAQs from the National Institutes of Health refer to the sequence's "essential completion," and to the question, "Is the human genome completely sequenced?" they answer, "Yes," with the caveat that it's "as complete as it can be" given available technology.

[...] Church estimates 4 percent to 9 percent of the human genome hasn't been sequenced. Miga thinks it's 8 percent.

https://www.statnews.com/2017/06/20/human-genome-not-fully-sequenced/

I'm glad this is finally getting some coverage. A few years ago I looked into the human genome to prove to myself it didn't contain a certain sequence, and found this was impossible since ~10% of it was missing. When they talk about "sequencing a genome" it is total false advertising.


Original Submission

 
  • (Score: 3, Informative) by Anonymous Coward on Sunday July 30 2017, @02:53PM (5 children)

    by Anonymous Coward on Sunday July 30 2017, @02:53PM (#546688)

    DNA sequencing only reads small chunks at a time; TFA says typically 1,000 base pairs in the Human Genome Project, but mentions new capabilities of up to 60,000 base pairs. You get a ton of these short sequences, then turn a computer loose, looking for overlaps and trying to build one complete sequence from them.

    The remaining gaps are where sequences couldn't be spliced because of ambiguity created by repeating sequences longer than the maximum read length.
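    A rough sketch of that idea in Python (purely illustrative; the toy genome, read length, and greedy merging below are my own simplifications, not the Human Genome Project's actual assembler):

        # Chop a known sequence into overlapping reads, then greedily merge reads
        # by their longest suffix/prefix overlap: shotgun assembly in miniature.

        def overlap(a, b, min_len=3):
            """Length of the longest suffix of a that matches a prefix of b."""
            for k in range(min(len(a), len(b)), min_len - 1, -1):
                if a.endswith(b[:k]):
                    return k
            return 0

        def greedy_assemble(reads):
            reads = list(reads)
            while len(reads) > 1:
                best_k, best_i, best_j = 0, None, None
                for i, a in enumerate(reads):
                    for j, b in enumerate(reads):
                        if i != j:
                            k = overlap(a, b)
                            if k > best_k:
                                best_k, best_i, best_j = k, i, j
                if best_k == 0:              # no overlaps left: a gap remains
                    break
                merged = reads[best_i] + reads[best_j][best_k:]
                reads = [r for n, r in enumerate(reads)
                         if n not in (best_i, best_j)] + [merged]
            return reads

        genome = "ACGTACGTTTGCAAGGCTTACGGATC"
        read_len, step = 10, 4
        reads = [genome[i:i + read_len]
                 for i in range(0, len(genome) - read_len + 1, step)]
        print(greedy_assemble(reads))        # recovers the toy genome in one piece

        # If the genome contained a repeat longer than read_len (say 15 identical
        # bases), every read inside the repeat would look the same, the overlaps
        # would be ambiguous, and the assembler would either collapse the repeat
        # or break into separate contigs: that is where the remaining gaps live.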

  • (Score: 2) by kaszz on Sunday July 30 2017, @03:21PM (1 child)

    by kaszz (4211) on Sunday July 30 2017, @03:21PM (#546697) Journal

    Seems the sequencing method is incomplete. Just imagine a hard disk that returned random blocks and left the filesystem to figure out which blocks belonged to which file from the block contents alone, without any LBA reference. A better method, where the DNA strand is pulled out and read in one long sequence, is needed.

    Reminds me of CD rippers where the block reads would not align properly, so the ripper program had to align them on the fly.

    • (Score: 0) by Anonymous Coward on Monday July 31 2017, @05:25PM

      by Anonymous Coward on Monday July 31 2017, @05:25PM (#547256)

      This method is what they were using before shotgun sequencing came about.

      In several years, they managed to sequence what shotgun sequencing covered in a matter of months.

      The linear read method is really hard, though they should be able to use it to target these few ambiguous regions.

  • (Score: 2) by takyon on Sunday July 30 2017, @03:37PM (2 children)

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Sunday July 30 2017, @03:37PM (#546705) Journal

    If newer machines can read fragments that are 60x longer than before, could some random joe's $1,500 sequencing yield a more complete sequence than the HGP reference genome?

    • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @08:37PM (1 child)

      by Anonymous Coward on Sunday July 30 2017, @08:37PM (#546813)

      60x longer is a far stretch; there would be quite a few engineering problems (both physical and biochemical) to overcome, and you would need a fundamentally new technique. But sequencing is just part of the problem. The genome you want to sequence is often first put into a library: the genome is cut up and the parts are randomly placed in "stable" vectors, which are transformed into bacteria/yeast. Some genes could be expressed from these vectors and be toxic to the bacteria/yeast (hence the quote marks around stable), leaving a hole in your sequence. Even if the vector survives, the fragments in the library are random, which means that, statistically, you'll never have 100% certainty of obtaining 100% of the sequence.

      Newer sequencing techniques actually use shorter reads, but the problem stays the same: you're always looking at a pool of fragments sampled from the large complete sequence.
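      The "statistically never 100%" point has a classic back-of-the-envelope form (Lander-Waterman coverage statistics): if fragments land at random, the expected fraction of bases that are never sampled is roughly exp(-coverage). A quick sketch, using a ballpark genome size, the HGP-era read length quoted above, and made-up read counts:

          import math

          G = 3.1e9       # human genome size in base pairs (approximate)
          L = 1000        # read length, roughly the HGP-era figure
          for N in (3e6, 9e6, 30e6):      # numbers of random reads (assumed values)
              c = N * L / G               # mean coverage
              missed = math.exp(-c)       # expected fraction of bases never sampled
              print(f"{N:.0e} reads -> {c:.1f}x coverage, "
                    f"~{missed:.2%} of bases never sampled")

      Even around 10x coverage, a small fraction of bases is still expected to be missed by pure chance, before the repeat and cloning problems above even enter the picture.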

      • (Score: 3, Insightful) by gringer on Sunday July 30 2017, @11:41PM

        by gringer (962) on Sunday July 30 2017, @11:41PM (#546889)

        "60x longer is a far stretch"

        No, it's not. Nanopore can get quite a lot of sequences over 60kb, and Nick Loman has managed an end-to-end-mappable sequence that is over 700kb in length.

        The challenge is now in sample prep, not sequencing. It's quite difficult to keep DNA intact enough for really long reads with most library preparation methods.
