
SoylentNews is people

posted by martyb on Sunday July 30 2017, @02:05PM
from the how-do-you-know-how-much-you-don't-know? dept.

The feat made headlines around the world: "Scientists Say Human Genome is Complete," The New York Times announced in 2003. "The Human Genome," the journals Science and Nature said in identical ta-dah cover lines unveiling the historic achievement.

There was one little problem.

"As a matter of truth in advertising, the 'finished' sequence isn't finished," said Eric Lander, who led the lab at the Whitehead Institute that deciphered more of the genome for the government-funded Human Genome Project than any other. "I always say 'finished' is a term of art."

"It's very fair to say the human genome was never fully sequenced," Craig Venter, another genomics luminary, told STAT.

"The human genome has not been completely sequenced and neither has any other mammalian genome as far as I'm aware," said Harvard Medical School bioengineer George Church, who made key early advances in sequencing technology.

[...] FAQs from the National Institutes of Health refer to the sequence's "essential completion," and to the question, "Is the human genome completely sequenced?" they answer, "Yes," with the caveat — that it's "as complete as it can be" given available technology.

[...] Church estimates 4 percent to 9 percent of the human genome hasn't been sequenced. Miga thinks it's 8 percent.

https://www.statnews.com/2017/06/20/human-genome-not-fully-sequenced/

I'm glad this is finally getting some coverage. A few years ago I looked into the human genome to prove to myself it didn't contain a certain sequence, and found this was impossible since ~10% of it was missing. When they talk about "sequencing a genome" it is total false advertising.



 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Interesting) by kaszz on Sunday July 30 2017, @02:14PM (17 children)

    by kaszz (4211) on Sunday July 30 2017, @02:14PM (#546680) Journal

    So what makes the last few percent so hard to sequence?

    • (Score: 3, Informative) by Anonymous Coward on Sunday July 30 2017, @02:53PM (5 children)

      by Anonymous Coward on Sunday July 30 2017, @02:53PM (#546688)

      DNA sequencing only reads small chunks at a time; TFA says typically 1000 base pairs in the Human Genome Project, but mentions new capabilities of up to 60000 base pairs. You get a ton of these short sequences, then turn a computer loose, looking for overlaps and trying to build one complete sequence from them.

      The remaining gaps are where sequences couldn't be spliced because of ambiguity created by repeating sequences longer than the maximum read length.
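A minimal Python sketch of the ambiguity described above, under idealised assumptions (error-free reads, one read starting at every position). The 7-base repeat and both toy genomes are hypothetical. The two genomes differ only in the order of the segments between copies of the repeat, and they yield exactly the same set of 6-base reads, so no assembler could tell them apart; 12-base reads, which span the repeat, do distinguish them.

```python
from collections import Counter


def reads(genome: str, k: int) -> Counter:
    """Multiset of every length-k substring (idealised, error-free reads)."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


REPEAT = "GATTACA"  # hypothetical 7-base repeat, for illustration only

# Two toy genomes that differ only in the order of the "TT" and "GG" segments.
genome_a = "CC" + REPEAT + "TT" + REPEAT + "GG" + REPEAT + "AA"
genome_b = "CC" + REPEAT + "GG" + REPEAT + "TT" + REPEAT + "AA"

# Reads shorter than the repeat: both genomes give identical read sets.
short_reads_identical = reads(genome_a, 6) == reads(genome_b, 6)

# Reads long enough to span the repeat plus flanking bases: the sets differ.
long_reads_identical = reads(genome_a, 12) == reads(genome_b, 12)

print(short_reads_identical, long_reads_identical)
```

Real assemblers also have to cope with read errors and uneven coverage, which this sketch ignores.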

      • (Score: 2) by kaszz on Sunday July 30 2017, @03:21PM (1 child)

        by kaszz (4211) on Sunday July 30 2017, @03:21PM (#546697) Journal

        Seems the sequencing method is incomplete. Just imagine a harddisc that would read random blocks and let the filesystem try to figure out which blocks belonged to what file from the block contents alone without any LBA reference or anything. A better method where the DNA string is pulled out and read in one long sequence is needed.

        Reminds me of CD rippers, where block reads would not align properly, so the ripper program had to align them on the fly.

        • (Score: 0) by Anonymous Coward on Monday July 31 2017, @05:25PM

          by Anonymous Coward on Monday July 31 2017, @05:25PM (#547256)

          This method is what they were using before shotgun sequencing came about.

          What they had managed to sequence in several years, shotgun sequencing covered in a matter of months.

          The linear read method is really hard, though they should be able to use it to target these few ambiguous regions.

      • (Score: 2) by takyon on Sunday July 30 2017, @03:37PM (2 children)

        by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Sunday July 30 2017, @03:37PM (#546705) Journal

        If newer machines can read fragments that are 60x longer than before, could some random joe's $1,500 sequencing yield a more complete sequence than the HGP reference genome?

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
        • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @08:37PM (1 child)

          by Anonymous Coward on Sunday July 30 2017, @08:37PM (#546813)

          60x longer is a far stretch; there would be quite some engineering problems (both physical and biochemical) to overcome, and you would require a fundamentally new technique. But sequencing is just part of the problem. The genome you want to sequence is often first added to a library: the genome is cut up and the parts are randomly placed in "stable" vectors, which are transformed into bacteria/yeast. Some genes could be expressed from these vectors and be toxic to the bacteria/yeast (hence the quote marks around "stable"), leaving a hole in your sequence. Even if the vector survives, the sequences in the library are random, which means that, statistically, you can never be 100% certain of obtaining 100% of the sequences.

          Newer sequencing techniques actually use shorter reads, but the problem stays the same: you're always looking at a pool of fragments of the large complete sequence.
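The statistical point about random libraries can be sketched in Python. This assumes the idealised model in which reads land uniformly at random on the genome, so the chance that a given base is missed at average coverage c is roughly e^(-c). That model is an assumption brought in for illustration, not something the comment states, and the function name is hypothetical.

```python
import math


def expected_uncovered_bases(genome_length: float, coverage: float) -> float:
    """Expected number of bases hit by no read, assuming uniformly random reads.

    coverage is the average number of reads overlapping each base.
    """
    return genome_length * math.exp(-coverage)


# Illustration with a 3-billion-base genome (a round figure, assumed here).
for depth in (1, 5, 10, 30):
    missed = expected_uncovered_bases(3e9, depth)
    print(f"{depth:>2}x coverage: about {missed:.3g} bases expected to be missed")
```

The expected number of missed bases shrinks quickly with depth but never reaches exactly zero, which is the sense in which a random library cannot guarantee 100%. Toxic inserts and repeats make real libraries worse than this model.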

          • (Score: 3, Insightful) by gringer on Sunday July 30 2017, @11:41PM

            by gringer (962) on Sunday July 30 2017, @11:41PM (#546889)

            60x longer is a far stretch

            No, it's not. Nanopore can get quite a lot of sequences over 60kb, and Nick Loman has managed an end-to-end-mappable sequence that is over 700kb in length.

            The challenge is now in sample prep, not sequencing. It's quite difficult to keep DNA intact enough for really long reads with most library preparation methods.

            --
            Ask me about Sequencing DNA in front of Linus Torvalds [youtube.com]
    • (Score: 1, Interesting) by Anonymous Coward on Sunday July 30 2017, @03:57PM (2 children)

      by Anonymous Coward on Sunday July 30 2017, @03:57PM (#546710)

      Progressively worse funding since 2001 is the problem.

      Americans don't care about science. Since 9/11 the singular purpose of America has been the targeted murder of towelheads. The year is 2017 and there isn't any money left for anything else.

      The last time the human genome was almost completely sequenced, a grad student sacrificed the use of his wrists to finish the project in a month before funding was cut.

      Today you would have to find someone naive enough to risk death for the privilege of working on the project.

      • (Score: 1, Insightful) by Anonymous Coward on Sunday July 30 2017, @06:12PM (1 child)

        by Anonymous Coward on Sunday July 30 2017, @06:12PM (#546749)

        Not enough funding is why geneticists act like weasels and say they completed the sequencing when they actually were unable to achieve their goals? Sorry, but no.

        • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @07:44PM

          by Anonymous Coward on Sunday July 30 2017, @07:44PM (#546793)

          ... wait, what?

          kaszz: "What's the problem with sequencing?"
          AC1: "Lack of funding."
          AC2: "That doesn't excuse their lying!"

          AC2, please stop responding to nonexistent arguments. Thank you.

    • (Score: 2) by gringer on Sunday July 30 2017, @11:37PM (6 children)

      by gringer (962) on Sunday July 30 2017, @11:37PM (#546887)

      Highly repetitive sequences that are longer than the template length of Sanger sequencing and Illumina sequencing:

      http://i.imgur.com/p5U1YCz.jpg [imgur.com]

      https://twitter.com/gringene_bio/status/887822844280709120 [twitter.com]

      https://twitter.com/gringene_bio/status/882006992268648449 [twitter.com]

      Long non-tandem repeats also cause problems.

      --
      Ask me about Sequencing DNA in front of Linus Torvalds [youtube.com]
      • (Score: 2) by kaszz on Monday July 31 2017, @02:13PM (1 child)

        by kaszz (4211) on Monday July 31 2017, @02:13PM (#547149) Journal

        The last tweet is intriguing .. "The missing bits of code in the human genome could be key for understanding cancer.".

        • (Score: 0) by Anonymous Coward on Monday July 31 2017, @03:13PM

          by Anonymous Coward on Monday July 31 2017, @03:13PM (#547179)

          The tweet links to an article that links back to the article in the OP.

      • (Score: 2) by kaszz on Monday July 31 2017, @02:17PM (3 children)

        by kaszz (4211) on Monday July 31 2017, @02:17PM (#547152) Journal

        If repetitive sequences are the problem, then the solution seems to be to stabilize long enough sequences and read them. The question is then, how long sequences do these need to be? and how long is the total?

        Is it possible to read the DNA code in place using say laser interferometry etc?
        Or other complete out-of-the-box methods.

        • (Score: 2) by gringer on Monday July 31 2017, @08:23PM (2 children)

          by gringer (962) on Monday July 31 2017, @08:23PM (#547346)

          The question is then, how long sequences do these need to be?

          To capture everything, about 3 million bases in length, according to Karen Miga at the London Calling conference this year.

          how long is the total?

          There are a lot of "totals" that I can pull out of human genome statistics; you need to be more specific than that. Concentrating just on centromeric regions, she said that the average centromeric length is about 3 Mb, so that suggests there's a bit over 60Mb of highly-repetitive sequence in the human genome.

          Is it possible to read the DNA code in place using say laser interferometry etc?

          Lasers and/or SEM are not yet able to scan at the 0.33 nm resolution required to resolve single bases of DNA. There have been a couple of papers in the last month or so discussing methods of visually resolving the 3D structure of DNA, but nothing about an adapter-free resolution of the sequence.

          Or other complete out-of-the-box methods.

          Nanopore sequencing is pretty good; it's what I used to make those images. Nick Loman has managed to get a read of a bit over 750 kb, and I expect that if ONT can get their accuracy for most 100bp sub-sequences up to Q30, or quadruple the read length, reads that span (or unambiguously cover) centromeres will be possible.
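For readers unfamiliar with the "Q30" figure, here is a small Python sketch of the standard Phred convention, in which a quality score Q corresponds to an error probability of 10^(-Q/10). The function name is hypothetical, and the 100-base window is simply the sub-sequence length mentioned above.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


q30_error = phred_to_error_probability(30)

# Expected number of wrong bases in a 100-base window at uniform Q30.
expected_errors_per_100bp = 100 * q30_error

print(q30_error, expected_errors_per_100bp)
```

Under this convention Q30 means roughly one wrong base per thousand, so a 100-base stretch at Q30 would usually contain no errors at all.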

          --
          Ask me about Sequencing DNA in front of Linus Torvalds [youtube.com]
          • (Score: 2) by kaszz on Tuesday August 01 2017, @12:52AM

            by kaszz (4211) on Tuesday August 01 2017, @12:52AM (#547455) Journal

            Lasers and/or SEM are not yet able to scan at the 0.33 nm resolution required to resolve single bases of DNA.

            Just a thought.. 8 kV would accelerate electrons with the energy for a x-ray photon at 0.1 nm wavelength. Possible solution?
            I'm thinking the x-ray photon could be shot at different angles and the output angle measured. A problem could be if the sample is not still, though.
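A quick check of the voltage figure, as a Python sketch. It assumes the usual relation between photon energy and wavelength, wavelength = hc / E, with hc rounded to 1239.84 eV·nm, and that an electron accelerated through V volts can emit a photon of at most V electronvolts. The function names are hypothetical.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm (rounded)


def min_wavelength_nm(voltage_v: float) -> float:
    """Shortest photon wavelength obtainable from electrons accelerated by voltage_v."""
    return HC_EV_NM / voltage_v


def voltage_for_wavelength_v(wavelength_nm: float) -> float:
    """Accelerating voltage needed for a photon of the given wavelength."""
    return HC_EV_NM / wavelength_nm


print(f"8 kV gives a minimum wavelength of about {min_wavelength_nm(8000):.3f} nm")
print(f"0.1 nm needs about {voltage_for_wavelength_v(0.1) / 1000:.1f} kV")
```

On these assumptions 8 kV corresponds to roughly 0.155 nm rather than 0.1 nm, which would need about 12.4 kV; both are still shorter than the 0.33 nm base spacing quoted above.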

          • (Score: 2) by gringer on Tuesday August 01 2017, @06:11AM

            by gringer (962) on Tuesday August 01 2017, @06:11AM (#547533)

            Karen Miga has just put a preprint paper onto BioRXiv about resolving the centromere of the Y chromosome using nanopore reads:

            http://www.biorxiv.org/content/early/2017/07/31/170373 [biorxiv.org]

            --
            Ask me about Sequencing DNA in front of Linus Torvalds [youtube.com]
    • (Score: 0) by Anonymous Coward on Monday July 31 2017, @03:15AM

      by Anonymous Coward on Monday July 31 2017, @03:15AM (#546954)

      Well, as everybody knows the first 90% of the work takes 90% of the time. The remaining 10% takes 90% more of the time.

  • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @03:02PM (4 children)

    by Anonymous Coward on Sunday July 30 2017, @03:02PM (#546693)

    Paragraph 1 summary: the human genome sequence is not completed

    Paragraph 3 summary: the human genome sequence is not completed

    Paragraph 4 summary: the human genome sequence is not completed

    Paragraph 5 summary: the human genome sequence is not completed

    Paragraph 6 summary: the human genome sequence is not completed

    Paragraph 7 summary: the human genome sequence is not completed

    I think a much shorter summary could have made the point equally well, without a large block quote from the article that spent 16 lines of text merely expressing the same single idea as the headline, repeated an additional 6 times.

    • (Score: 2) by maxwell demon on Sunday July 30 2017, @03:10PM (2 children)

      by maxwell demon (1608) on Sunday July 30 2017, @03:10PM (#546694) Journal

      Wrong.

      Paragraph 1 summary: 2003 headlines declared the human genome sequencing to be completed.

      Paragraph 2 summary: Small problem here.

      Paragraph 3 summary: Eric Lander says it isn't.

      Paragraph 4 summary: So does Craig Venter.

      Paragraph 5 summary: And George Church says so, too.

      Paragraph 6 summary: And if you look closely, even the NIH admits it.

      Paragraph 7 summary: Estimates of how much is missing range from 4 to 9 percent.

      So while you are right that there's a lot of redundancy, it's not quite as bad as you claim.

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 0, Offtopic) by JNCF on Sunday July 30 2017, @03:29PM (1 child)

        by JNCF (4317) on Sunday July 30 2017, @03:29PM (#546703) Journal

        I'm not skipping straight to the comments so that I can read a summary of the summary, guise; get back to trolling and pedantism!

        • (Score: 2) by maxwell demon on Sunday July 30 2017, @04:12PM

          by maxwell demon (1608) on Sunday July 30 2017, @04:12PM (#546716) Journal

          And I guess you've also not yet figured out what the "Parent" link does, right?

          --
          The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @06:18PM

      by Anonymous Coward on Sunday July 30 2017, @06:18PM (#546752)

      It is especially interesting how you fail to distinguish between qualitative and quantitative information. Are you involved in healthcare by any chance?

  • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @04:15PM

    by Anonymous Coward on Sunday July 30 2017, @04:15PM (#546719)

    It is amazing what they have done, but the NIH web site is definitely spinning the results on this.
    https://www.genome.gov/11006943/human-genome-project-completion-frequently-asked-questions/ [genome.gov]

    "Is the human genome completely sequenced?
    Yes - within the limits of today's technology, the human genome is as complete as it can be"

    No spin version might be a simple "NO, not completely, but close".

    They really need a better strategy than shred, sequence the pieces, and put Humpty Dumpty back together as best they can.
    The replication machinery in a cell has to be able to read the whole sequence when it makes a new cell.

    I wonder if there is some way to watch it doing that and get the sequence?

  • (Score: 2, Insightful) by Anonymous Coward on Sunday July 30 2017, @05:56PM (5 children)

    by Anonymous Coward on Sunday July 30 2017, @05:56PM (#546743)

    Any molecular biology student who has had their first classes on how sequencing is done should know that 100% isn't feasible with current technology (highly repetitive regions are just difficult to sequence). Even if it were possible, the question would arise what 100% human (or for almost any other organism) would mean, given the variability between individuals that would be allowed. Then we would move the boundary for the same question to hybrids, sub-species and such.

    • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @06:23PM (3 children)

      by Anonymous Coward on Sunday July 30 2017, @06:23PM (#546757)

      It is like the "average man", such an entity does not exist and optimizing for him is a mistake. Likewise, it is extremely unlikely that anyone contains a cell with a sequence that matches what they call "the human genome" exactly.

      • (Score: 2) by HiThere on Sunday July 30 2017, @07:04PM (2 children)

        by HiThere (866) on Sunday July 30 2017, @07:04PM (#546774) Journal

        I thought Craig Venter had loads of cells that matched the sequence he described. (Yeah, it's not 100%, but I'm not convinced the areas with lots of repeats are that important...certainly not the precise number of repeats.)

        P.S.: Someone earlier said "the human genetic machinery can copy those regions", but they were a bit wrong. Copies of those regions tend to have lots of errors in the number of copies...but nobody's found any real link between that and any actual effect. Perhaps there's a link between the number of telomeres and aging, but that's not a long repeat. Neither is the one involved in Huntington's disease. Those are short repeats, and they frequently *are* significant.

        --
        Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 1, Insightful) by Anonymous Coward on Sunday July 30 2017, @08:22PM

          by Anonymous Coward on Sunday July 30 2017, @08:22PM (#546810)

          If a non-repetitive region is flanked by regions which are repetitive, you'll have to go through quite some hoops to obtain the sequence of that region. It takes time and money. And, you might obtain let's say 99.5% of all the sequences, but connecting them into the full sequence is another problem.

          As for the telomeres, they are not the only repetitive sequences. 10 times an 'A' in an intron can really mess up your sequencing progress in a single gene (own experience).
          The statement "the human genetic machinery can copy those regions" might be true, but many of the polymerases in an organism are not used in sequencing.

        • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @08:23PM

          by Anonymous Coward on Sunday July 30 2017, @08:23PM (#546811)

          If you have 6 billion basepairs with 1 in 100 million indel and 1 in 100 chromosomal missegregation rates it is extremely unlikely any two cells will have exactly the same sequence.
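The arithmetic behind that claim can be sketched in Python. This reads the quoted rates as per base pair per cell division (indels) and per cell division (missegregation), and treats every event as independent; both readings are assumptions made for the sketch.

```python
import math

GENOME_BP = 6_000_000_000             # diploid genome size quoted above
INDEL_RATE_PER_BP = 1 / 100_000_000   # assumed: per base pair, per division
MISSEGREGATION_RATE = 1 / 100         # assumed: per cell division

# Average number of new indels introduced each time the genome is copied.
expected_indels = GENOME_BP * INDEL_RATE_PER_BP

# Probability that one copy picks up no indel at all: (1 - rate) ** GENOME_BP,
# computed through logarithms to keep floating-point precision.
p_no_indel = math.exp(GENOME_BP * math.log1p(-INDEL_RATE_PER_BP))

# Probability of a copy with no indel and no missegregated chromosome.
p_perfect_copy = p_no_indel * (1 - MISSEGREGATION_RATE)

print(f"expected indels per division: {expected_indels:.0f}")
print(f"chance of a perfect copy: {p_perfect_copy:.1e}")
```

With about 60 expected indels per division, the chance of an exact copy is vanishingly small under these assumptions, which is the point being made.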

    • (Score: 2) by Immerman on Monday July 31 2017, @01:39PM

      by Immerman (3985) on Monday July 31 2017, @01:39PM (#547135)

      >the question would arise what 100% human (or for almost any other organism) would mean

      Quite. If, for example, we were to assume that 95% of DNA were shared identically by 100% of humans, then you might think you could perfectly sequence one person's DNA and get 95% of the human genome. But that ignores the fact that 95% shared DNA doesn't mean that you have 95% of the human genome: to know that percentage you need to already know how much variation there is within the remaining 5%. Assume an average of only 10 variants per gene, and the variations account for a third of the total genome. And even that assumes that all differences are variations of an existing gene, rather than entirely new genes that have arisen over the millennia.
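The "a third" figure can be reproduced with a short Python sketch. It assumes every variable stretch is the same size and that each of its variants counts as separate sequence in the full "human genome"; the function name is hypothetical.

```python
def variable_share(shared_pct: float, variants_per_site: int) -> float:
    """Fraction of the pooled genome made up of variant sequence.

    shared_pct is the percentage of one person's DNA that is identical in
    everyone; the remainder is assumed to exist in variants_per_site versions.
    """
    variable_pct = 100.0 - shared_pct
    variant_total = variable_pct * variants_per_site
    return variant_total / (shared_pct + variant_total)


share = variable_share(95.0, 10)
print(f"{share:.1%}")
```

With 95 units shared and 5 units times 10 variants, the variants make up 50 out of 145 units, or a little over a third, so one perfectly sequenced person would cover well under 95% of the pooled total.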

  • (Score: 0) by Anonymous Coward on Sunday July 30 2017, @06:09PM

    by Anonymous Coward on Sunday July 30 2017, @06:09PM (#546747)

    Please never again a summary like this! No, I'm not going to click.

  • (Score: 0) by Anonymous Coward on Monday July 31 2017, @09:24AM

    by Anonymous Coward on Monday July 31 2017, @09:24AM (#547061)

    "It's very fair to say the human genome was never fully sequenced," Craig Venter, another genomics luminary, told STAT.

    So did Craig Venter lie or even commit fraud? Or did Wired get it wrong?

    https://www.wired.com/2000/04/celera-wins-genome-race/ [wired.com]

    Celera Genomics has finished sequencing the entire human genome.

    The private company made the surprise announcement Thursday morning at a House hearing that had been scheduled to discuss the future of the Human Genome Project.

    "We've finished the sequencing phase," Celera president Craig Venter said at the hearing.

    With those words, Celera officially beat the public Human Genome Project in a long, closely watched race that ended several months ahead of Celera's own schedule as well as the public project's.
