
SoylentNews is people

posted by janrinok on Thursday August 29 2019, @10:53AM

Submitted via IRC for SoyCow4408

Douglas Adams was right – knowledge without understanding is meaningless

Fans of Douglas Adams's Hitchhiker's Guide to the Galaxy treasure the bit where a group of hyper-dimensional beings demand that a supercomputer tells them the secret to life, the universe and everything. The machine, which has been constructed specifically for this purpose, takes 7.5m years to compute the answer, which famously comes out as 42. The computer helpfully points out that the answer seems meaningless because the beings who instructed it never knew what the question was. And the name of the supercomputer? Why, Deep Thought, of course.

It's years since I read Adams's wonderful novel, but an article published in Nature last month brought it vividly to mind. The article was about the contemporary search for the secret to life and the role of a supercomputer in helping to answer it. The question is how to predict the three-dimensional structures of proteins from their amino-acid sequences. The computer is a machine called AlphaFold. And the company that created it? You guessed it – DeepMind.

Proteins are large biomolecules constructed from amino acid residues and are fundamental to all animal life. They are, says one expert, "the most spectacular machines ever created for moving atoms at the nanoscale and often do chemistry orders of magnitude more efficiently than anything that we've built".

But these vital biomachines are also inscrutable because they assemble themselves into structures of astonishing complexity and beauty. (Illustrations of them make one think of what can go wrong when trying to wrap Christmas presents with those nice ribbons that only shop assistants can manage.) Understanding this "folding" process is one of the key challenges in biochemistry, partly because proteins are necessary for virtually every cell in a body and partly because it's suspected that mis-folding may help to explain diseases such as diabetes, Alzheimer's and Parkinson's.

[...] Two years ago, DeepMind, having conquered the board game Go, decided to take on the challenge, using the deep-learning technology it had developed for Go. The resulting machine was, predictably, named AlphaFold. At the CASP meeting last December, it unveiled the results. Its machine was, on average, more accurate than the other teams and by some criteria it was significantly ahead of the others. For protein sequences modelled from scratch – 43 of the 90 – AlphaFold made the most accurate prediction for 25 proteins. Its nearest rival only managed three.

[...] It's conceivable that a machine-learning approach will soon enable us to make accurate predictions of how a protein will fold and this may be very useful to know. But it won't be scientific knowledge. After all, AlphaFold knows nothing about biochemistry. We're heading into uncharted territory.


  • (Score: 0) by Anonymous Coward on Thursday August 29 2019, @11:27AM (10 children)

    by Anonymous Coward on Thursday August 29 2019, @11:27AM (#887236)

    ... is giving possibilities of how it could fold within the set parameters. To know if it is correct it needs to be backed by experimental experiments, which, as far as my experience with them goes, are currently still limited.

  • (Score: 0) by Anonymous Coward on Thursday August 29 2019, @12:35PM (9 children)

    by Anonymous Coward on Thursday August 29 2019, @12:35PM (#887247)

    "To know if it is correct it needs to be backed by experimental experiments"

    Perhaps, but folding must be predictable else biology would not work.
    An understanding of this physics plus the ability to model it should provide pretty good certainty without experiment.

    Machine learning sidesteps the need for understanding.
    It is hard to know if the machine is keying off the underlying physics, which works, or just a telltale which only probably/sometimes/maybe works.
    Kind of like picking your morning cereal based on the package color instead of the contents.
    This prejudging based on a telltale might work because the same sort of cereal is sold in similar packages, but it does not provide the understanding to jump to something else you might like in a different package.
    Similar story with protein folding: it might pick out things like stuff it has seen before, but not get something novel right.

    On the other hand, if it does get things right often, then there is something interesting hidden in the model's training.
    Understanding what the training is picking out might lead to an understanding of the underlying physics?

     

    • (Score: 5, Interesting) by Anonymous Coward on Thursday August 29 2019, @01:30PM

      by Anonymous Coward on Thursday August 29 2019, @01:30PM (#887259)

      I've worked in protein chemistry before. Our group used NMR to analyse protein structures/dynamics, but another commonly used method is crystallography. And I think in silico protein folding is another way to look at protein structures (and dynamics). The reason we used NMR was that in our opinion it resembles in vitro and in vivo situations more than crystallography does (proteins in a liquid or membrane instead of a crystal). Often the structures from the two methods resemble each other, but sometimes they don't. My guess is that this will also be the case with in silico folding. Therefore, you'll always need other (experimental) methods to validate (and strengthen) the model.

    • (Score: 2) by Hartree on Thursday August 29 2019, @02:27PM (7 children)

      by Hartree (195) on Thursday August 29 2019, @02:27PM (#887288)

      "Perhaps, but folding must be predictable else biology would not work."

      Why?

      I'm not being flippant. In fact, I strongly suspect that it's predictable from the physics. But does it have to be?

      I'd say it just has to be consistent in the sense that a given sequence will usually fold to a given final shape (obviously, it's not perfect: if the final form were uniquely specified by the sequence, we wouldn't have so much trouble from mis-folded proteins).

      Evolution is largely a blind search based on mutation and then seeing if the result is usable. Even if there was no feasible way to predict final structure from sequence it should, given enough time, still work just by trying and checking.

      In truth, the observation that, within limits, similar sequences give rise to similar secondary structure and small changes have a limited impact on that indeed indicates it is predictable. That also makes for faster evolution as minor mutations are more common than radical ones.

      We see common motifs that lead us to believe we can predict final folded form from sequences with less effort than actually making/analyzing the protein or doing the computationally prohibitive route of full detail dynamical simulations.
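
      The "trying and checking" argument above can be made concrete with a toy sketch. Everything in it is an assumption for illustration: the `fitness` function simply counts matches against a made-up target string, standing in for "is the result usable", and has nothing to do with real protein function. The point is only that the loop never predicts anything; it mutates one position and keeps the change if the check does not get worse.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def fitness(sequence, target):
    """Toy usability check (hypothetical): number of positions matching a target."""
    return sum(a == b for a, b in zip(sequence, target))


def evolve(start, target, generations=2000, seed=0):
    """Blind search: mutate one random position, keep it only if fitness is no worse."""
    rng = random.Random(seed)
    current = list(start)
    score = fitness(current, target)
    for _ in range(generations):
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)  # random point mutation
        new_score = fitness(current, target)
        if new_score >= score:
            score = new_score      # usable or neutral: keep the mutation
        else:
            current[pos] = old     # worse: discard the mutation
    return "".join(current), score
```

      Calling `evolve("AAAAAAAA", "MKTAYIAK")` returns a sequence that scores at least as well as the starting one, without the search ever modelling why one sequence is better than another.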

      • (Score: 4, Interesting) by hendrikboom on Thursday August 29 2019, @02:40PM (2 children)

        by hendrikboom (1125) Subscriber Badge on Thursday August 29 2019, @02:40PM (#887292) Homepage Journal

        It's not predictable from the base-pair sequences alone.

        Protein folding is also affected by the so-called chaperones [wikipedia.org].

        Nothing in biology is done in a simple, straightforward way.

        -- hendrik

        • (Score: 2) by Hartree on Thursday August 29 2019, @02:51PM

          by Hartree (195) on Thursday August 29 2019, @02:51PM (#887298)

          I'm familiar with them. That's one part of why I put in that qualification about it not being unique in folding. There's also the problem that, if the surrounding conditions aren't right, it can get folded into a local minimum rather than the expected final state.

          Even so, most of chemistry is statistical in nature and much of what's going on in our bodies is based on "usually happens" and then some error checking or damage control after the fact (DNA replication is an excellent example).

          Back in the early 90s, I was part of a group doing protein folding simulations and other biology related chemical physics. I was struck by how "good enough" works amazingly well in living things when it's combined with some error checking or error mitigation system. That the whole house of cards of complex living things works at all is a by our lady miracle, IMHO. :)
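
          The local-minimum point above can be illustrated with a one-dimensional toy. This is a sketch under loud assumptions: the "energy" is an arbitrary double-well polynomial chosen for illustration, not a protein energy function, and plain gradient descent stands in for the folding dynamics. Which well the search ends in depends only on where it starts.

```python
def energy(x):
    """Arbitrary double-well curve (hypothetical): one deep well, one shallow well."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Derivative of energy(x) with respect to x."""
    return 4 * x**3 - 6 * x + 1


def descend(x, step=0.01, iterations=2000):
    """Plain gradient descent: always move downhill from the current position."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


shallow = descend(2.0)   # starts on the right, settles in the shallow well
deep = descend(-2.0)     # starts on the left, settles in the deep well
```

          Both runs stop where the gradient vanishes, but `energy(shallow)` is higher than `energy(deep)`: the right-hand run is stuck in a state that is stable yet not the lowest one available.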

        • (Score: 2) by All Your Lawn Are Belong To Us on Thursday August 29 2019, @08:50PM

          by All Your Lawn Are Belong To Us (6553) on Thursday August 29 2019, @08:50PM (#887485) Journal

          "Nothing in biology is done in a simple, straightforward way."

          Need +1 deep mod.

          --
          This sig for rent.
      • (Score: 0) by Anonymous Coward on Thursday August 29 2019, @02:46PM

        by Anonymous Coward on Thursday August 29 2019, @02:46PM (#887296)

        You bring up a good point.
        I'm having a hard time seeing a philosophical difference between consistent and predictable, but there is a practical difference involving computation effort.

        What bothers me with the similar structures leading to similar results is that it seems a bit like cave man logic.

        A cave man knows how to rub two sticks together to make fire. This is very useful for keeping the lions at bay, but not at all the same thing as an understanding of chemistry, which permits making things like gunpowder and puts him in a whole different state. AI seems more like the caveman than the chemist.

        For protein folding, perhaps there are no underlying rules and evolution has found a working set of random things that work. Aside from modeling the basic physics, I'd like to think that there are still some things to discover in these similar, working structures.

      • (Score: 1) by shrewdsheep on Thursday August 29 2019, @03:09PM

        by shrewdsheep (5215) on Thursday August 29 2019, @03:09PM (#887304)

        Protein folding is actually a stochastic, error-prone process. There is a big apparatus in the cell to recycle mis-folded proteins: the proteasomes, which degrade ubiquitin-tagged proteins (these are the mis-folded ones). I am not enough of an expert to know any numbers (maybe they do not exist), but my intuition would be that at least 10-20% of proteins are mis-folded and directly recycled. Ubiquitin has its name for a reason (the protein that is everywhere).
        With respect to the modeling, I believe a successful model has to model the folding process as well, instead of predicting the folded state from the sequence only. This can actually be built into deep networks quite easily by forcing intermediate layers to predict intermediate states of the folding process (as defined by structural similarity and energy levels, say). TLDR, maybe some elements of this are already used.
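
        One way to read "forcing intermediate layers to predict intermediate states" is as an auxiliary loss on each hidden layer, added to the usual loss on the final output. The sketch below only shows how such a combined loss could be computed. It makes strong assumptions: states are plain lists of numbers, the names `supervised_loss` and `aux_weight` are hypothetical, and nothing here is claimed to be what AlphaFold does.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length lists of numbers."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def supervised_loss(layer_outputs, intermediate_targets, final_target, aux_weight=0.5):
    """Final-output loss plus a weighted loss on every hidden layer.

    layer_outputs: one list per layer, last entry is the final prediction.
    intermediate_targets: one target per hidden layer (assumed folding snapshots).
    """
    *hidden, final = layer_outputs
    if len(hidden) != len(intermediate_targets):
        raise ValueError("need exactly one intermediate target per hidden layer")
    final_loss = mse(final, final_target)
    aux_loss = sum(mse(h, t) for h, t in zip(hidden, intermediate_targets))
    return final_loss + aux_weight * aux_loss
```

        With `aux_weight` set to zero this reduces to ordinary sequence-to-structure training; raising it pushes the hidden layers towards the supplied intermediate states.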

      • (Score: 1) by khallow on Friday August 30 2019, @05:08AM (1 child)

        by khallow (3766) Subscriber Badge on Friday August 30 2019, @05:08AM (#887660) Journal

        I'd say it just has to be consistent

        That's the predictability - something that is consistent in the future.

        • (Score: 2) by Hartree on Friday August 30 2019, @01:45PM

          by Hartree (195) on Friday August 30 2019, @01:45PM (#887743)

          That's not quite what they're meaning by predictable in this case.

          They already know that a given sequence will usually fold to a given final form. Predictable here means: given a sequence whose final form you don't know, can you predict that final form without actually making enough of the protein to do NMR studies, or crystallizing it and doing x-ray diffraction studies, to figure out experimentally what the final form is?