SoylentNews is people

posted by takyon on Monday May 16 2016, @09:20PM
from the better-late-than-never dept.

You may recall that I did a TEDxWellington talk about two months ago. My talk was about sequencing on the Oxford Nanopore MinION. The video of this talk has now been edited and is available on the TEDx YouTube Channel.

Although I didn't explicitly say it in the talk, this was a live demonstration of DNA sequencing, and possibly the first such demonstration outside those done by Oxford Nanopore Technologies. I spent the first half of the talk stalling for time while the initial QC finished, and a bit of time after the data analysis (i.e. a BLAST search) discussing where we could be heading.

To give a bit of an idea of the challenges involved in doing this, all my equipment for sequencing (excluding laptop) was brought to the venue the day before (for the dress rehearsal) in a 30cm polystyrene cube.

On arrival at the venue, I stored the ONT reagents in a freezer in the nearby kitchenette, and prepared the flow cell about half an hour before my talk. In my lab I had prepared two tubes with pre-mixed reagents (one with library + water + running buffer; one with fuel mix), so I was able to use a fine-nozzle Pasteur pipette to do the final mixing and loading onto the flow cell.

I had a slightly flaky USB connection on the MinION, so I couldn't start the run off-stage (it was very sensitive to bumping). Despite starting the run during the video prior to my talk, I still had a bit less run time than the 5 minutes I had planned for, so I had to tweak my presentation a bit to fit the end of the QC step into my talk.

The sequencing run was carried out using a laptop I had purchased for $900 NZD and set up a couple of weeks prior to the conference. Sequencing was done on battery power only, using the WiFi connection because the wired connection was being used for the conference live stream; this might be why the called sequences took a little longer than a couple of minutes to download onto the laptop. The dress rehearsal the day before was the first time I'd carried out a sequencing run on that particular laptop, and it made me aware that the screen resolution was below ONT's recommended minimum.

Despite everything that happened, I don't think any of the audience were aware that I had any problems with my run (apart from needing to use my dress-rehearsal backup sequences), which aligns very nicely with the themes of trust and secrecy for this year's conference.

For those interested in looking at the actual reads from that run, I've put the "pass" reads into a dropbox folder.

If you want a little low-hanging-fruit programming project to work on, then you can have a look at improving the recently published open source base callers:

  • DeepNano (Neural network basecaller): Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads; Vladimír Boža et al., Comenius University
  • Nanocall (HMM basecaller): An Open Source Basecaller for Oxford Nanopore Sequencing Data; Matei David et al., Ontario Institute for Cancer Research and University of Toronto

And now, the answers to the Q&A:

takyon (881) asked:

The Nanopore website mentions connecting to a PC or laptop using a USB port. How about a smartphone? What software is used on the PC/laptop/phone to receive data?

Software called MinKNOW operates the MinION from the laptop or PC. It's not smartphone-enabled yet, but the company has previously expressed a desire to make it easier to take the MinION into the field, and this is one of the obvious ways to do that.

As well as MinKNOW, Oxford Nanopore is starting to offer other kinds of real-time analysis solutions to people who might not have bioinformatics skills but want to perform their own experiments. These are available from Metrichor; one example is a workflow that allows people to identify species in their sample, in real time, against a reference dataset. [response from ONT based on public sources]

Does the MinION dump the raw data to a PC, which will then compress it?

When/where/how should the sequenced genome be compressed?

The electric current across 512 electrical sensing channels is sampled at [currently] 3 kHz and transferred to the PC. By modifying the sequencing recipe source code, users have options in software to retain or discard that raw signal data. That signal is then processed into 'event' data, which describes contiguous parts of the signal that fall within a similar range of current, generally attributed to the movement of a single base through the nanopore. These data are stored in an HDF5 file, which I believe has a bit of compression applied to it.
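
A minimal sketch of the signal-to-event step, assuming a deliberately simple rule: a new event starts whenever a sample strays more than a fixed threshold from the running mean of the current event. This is not ONT's event-detection algorithm; the `segment_events` function, the `Event` fields, the 5 pA threshold and the toy trace are all hypothetical, and real traces are far noisier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    start: int    # index of the first sample in the event
    length: int   # number of samples in the event
    mean: float   # mean current over the event (pA)


def segment_events(signal: List[float], threshold: float = 5.0) -> List[Event]:
    """Group consecutive samples whose current stays within `threshold`
    of the running mean of the event they belong to (hypothetical rule)."""
    events: List[Event] = []
    if not signal:
        return events
    start, total, count = 0, signal[0], 1
    for i in range(1, len(signal)):
        mean = total / count
        if abs(signal[i] - mean) > threshold:
            # Level shift: close the current event and open a new one here.
            events.append(Event(start, count, mean))
            start, total, count = i, signal[i], 1
        else:
            total += signal[i]
            count += 1
    events.append(Event(start, count, total / count))
    return events


# Toy trace: three current plateaus with a little noise.
trace = [80.1, 79.8, 80.3, 95.2, 94.9, 95.4, 95.0, 70.2, 69.7]
for ev in segment_events(trace):
    print(ev.start, ev.length, round(ev.mean, 1))
```

A single-sample threshold is only the simplest way to show the idea; a method robust to noise would compare windows of samples statistically rather than individual points.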

There's more information in the event data (and raw signal) than just the 4-base model of a DNA sequence, so I'm not convinced that it's a good idea to only store the final sequence information (and especially not now, given that the base caller needs a lot of improvement). Some time in the not-so-distant future we'll have to think of a better model than what we've got, and stop sweeping "out-of-model" genetic features under the carpet by calling them fancy names like epigenetics. Unfortunately, that also means that some changes need to be made with regards to reference genomes and associated compression schemes. I have no answers for how this might (or should) be done. Given that we're still surviving in the ultra-high-throughput phase of DNA sequencing with text files and gzip compression, I'm not yet convinced that any fancier compression is necessary.
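
For a sense of what that "text files and gzip" baseline looks like in practice, here is a minimal sketch using only Python's standard `gzip` and `io` modules to write and read FASTQ, a common four-line text format for called reads with per-base quality strings. The function names and example reads are hypothetical, and it says nothing about how MinKNOW or the base caller store their own output.

```python
import gzip
import io
from typing import Iterator, List, Tuple

# (read name, bases, Phred+33 quality string); the records below are made up.
Record = Tuple[str, str, str]


def write_fastq_gz(records: List[Record]) -> bytes:
    """Serialise records as four-line FASTQ text, gzip-compressed in memory."""
    buf = io.BytesIO()
    with gzip.open(buf, "wt") as fh:
        for name, seq, qual in records:
            fh.write(f"@{name}\n{seq}\n+\n{qual}\n")
    return buf.getvalue()


def read_fastq_gz(data: bytes) -> Iterator[Record]:
    """Decompress gzip bytes and yield (name, seq, qual) FASTQ records."""
    with gzip.open(io.BytesIO(data), "rt") as fh:
        while True:
            header = fh.readline().rstrip("\n")
            if not header:
                return
            seq = fh.readline().rstrip("\n")
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip("\n")
            yield header[1:], seq, qual


reads = [("read1", "ACGTACGTAC" * 20, "I" * 200),
         ("read2", "TTGACCA" * 30, "5" * 210)]
packed = write_fastq_gz(reads)
raw_size = sum(len(f"@{n}\n{s}\n+\n{q}\n") for n, s, q in reads)
print(raw_size, "bytes of FASTQ text ->", len(packed), "bytes gzipped")
```

The toy reads are highly repetitive, so they compress far better than real reads would; the point is only how little machinery the text-plus-gzip approach needs.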

Where do you acquire the graphene for nanopores (assuming that is what is used in current generations of the MinION), and have the costs fallen?

Graphene isn't yet used in the MinION, although this is in the pipeline. The current structure is a synthetic membrane into which biological nanopores are embedded. [response from ONT based on public sources]

Is Oxford Nanopore Technologies involved with the 100,000 Genomes Project or any other emerging population-scale sequencing efforts?

I expect not. ONT tends to do their own internal research in the lab and leave the users of the MinION to do the interesting stuff.

How are competing sequencing products better or worse than MinION, on factors other than portability/lack of portability?

I see sequencing as split into three different generations:

  1. Sequencing by Amplification -- copying entire sequences (e.g. Sanger)
  2. Sequencing by Synthesis -- copying single bases (e.g. Illumina)
  3. Sequencing by Observation -- no copying involved (e.g. nanopore)

As far as I'm aware, no other companies are trying to do sequencing without some form of synthesis, so it's hard to define a "competing" sequencing product. Some people will argue that PacBio is a direct competitor; here's my response, which I felt I needed to say in the previous discussion because I disagreed with the other answer that was given:

Both PacBio and ONT are producing sequencers that produce long reads off single molecules, but the technology and approach of sequencing is very different. In other words, while the output is [currently] similar, the way you get there is different.

The biggest difference, from my point of view, is that PacBio sequencers carry out sequencing by synthesis, while ONT sequencers (e.g. the MinION) carry out sequencing by observation. This has two fairly big advantages:

  1. Once samples are loaded onto the MinION device, no further reagent loading is necessary to get sequencing working.
  2. The MinION does something very close to model-free sequencing, by observing changes in electric current as the template passes through the pore.

We don't yet know how to properly use the output from the MinION, but a basecaller has been created by ONT to convert signal output into a base sequence. The assumed models are all in software (in the base-caller), and that can be updated and improved later in-silico.

Considering purely what is useful right now, Sequel is a bit cheaper per run ($700 vs $1200) for about five times the theoretical maximum yield (10 Gb vs 2 Gb), as long as you ignore the cost of purchasing (and maintaining) a Sequel.
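
Normalising the quoted figures to cost per gigabase of theoretical maximum yield makes the gap clearer. This back-of-envelope sketch uses only the numbers above and, like them, ignores instrument purchase and maintenance; the `cost_per_gb` helper is hypothetical.

```python
def cost_per_gb(run_cost: float, max_yield_gb: float) -> float:
    """Run cost divided by theoretical maximum yield (ignores instrument cost)."""
    return run_cost / max_yield_gb


sequel = cost_per_gb(700, 10)   # 70.0 per Gb
minion = cost_per_gb(1200, 2)   # 600.0 per Gb
print(f"Sequel: {sequel:.0f}/Gb, MinION: {minion:.0f}/Gb, ratio {minion / sequel:.1f}x")
```

On these figures the per-gigabase difference (roughly 8.6x) is larger than the per-run difference, because the yield gap and the run-cost gap compound.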

However, that's not a particularly helpful answer. Most people who are using Sequencing By Synthesis (SBS) machines are probably not going to like (or change to) the MinION any time soon. People don't like change, and will cling onto whatever disadvantages they can find as an excuse to resist the change. These disadvantages for the MinION are fading away as the technology improves, but I doubt they'll ever disappear entirely:

  • The MinION doesn't generate enough data
  • The data it **does** generate is too low quality
  • The sequences it generates can't be processed in the same way as other sequence data

It's probably worth pointing out that the available MinION technology is (for the foreseeable future) always going to be better than what can be seen in research papers, and the technology in development (by ONT) is always going to be better than what is available to users. The MinION technology is disruptive, and changes almost everything about how sequencing is carried out. Here's my initial attempt at generating a list of what's the **same** between the MinION and other sequencers:

  • In sample preparation, adapters are added to sequences to provide anchor points for the sequencing process to begin

And here's my list of MinION things that are different:

  • Sample preparation requires no amplification
  • Read length and output quality depend on the quality of the sample preparation
  • A pipette is the only fluidics required for sequencing
  • Can sequence unknown DNA structures / variants
  • Can sequence hybrid DNA/RNA
  • Can sequence RNA directly
  • Fits into your pocket
  • Powered by USB 2.0
  • Can sequence in real-time using a laptop running on a battery

I'm sure I've missed things off both of those lists; please feel free to add your own contributions.

What achievement is the company using to immediately promote the MinION? For example, are you or partners going out into the field and rapidly sequencing undiscovered bacteria, a certain taxonomical group of plants/animals/fungi, or an endangered population of big cats to preserve genetic diversity?

I need to once again clarify that I'm not an employee of ONT, I'm just a fanboy.

ONT prefers advertising the community achievements rather than what they've done internally in the lab, although they did produce a few posters on upcoming kits for their US meeting a few months ago. The nanopore sequencing community is fairly good at informing ONT about their publications, so the biggest effort on ONT's part in promotion is writing short stories about the community achievements.

The stories are many and various. I used two for my talk (tracking the Ebola virus through Africa, sequencing on the slopes of a volcano in the rainforests of Tanzania), and had to cut out another (about NASA sequencing in microgravity) due to time constraints. Most of the applications have been around things that could be done with other sequencers if samples were taken away and sequenced at another location, but I expect people will eventually get bored of that and explore the technology a bit more.

martyb (76) asked:

There has been much discussion on this site about the-powers-that-be vacuuming up all the information that they can. Not just government entities (such as the NSA and GCHQ), but also corporate entities (data brokers, insurance companies, Google, etc.)

I have a two-part question:

  1. What do you see as the greatest opportunities in the use of a tool such as MinION?
  2. How can we protect ourselves from having that information used against us?

  1. Making sequencing affordable, useful, and accessible for everyone. As a start, sequencers that are as cheap and as available as a high-end household appliance. Will you upgrade your phone this year, or buy a DNA sequencer?
  2. I don't know. DNA is everywhere. We certainly can't keep people from collecting our DNA, any more than we can stop them from recording our car license plate. With accessible sequencing, the same loss of control will extend to storing DNA sequences in private databases. This is not going to be a universal problem -- there are not many people who would be interested in keeping a record of every DNA sequence they come across -- but it is a problem that exists. Laws and acceptable ethical practice will prevent some bad uses, but not everything.

bitstream (6144) asked:

Is there any hindrance to providing software with the hardware that runs on free Unixes like FreeBSD and Linux?

The software we're using for interfacing with the hardware is mostly Python, and a Mac client is in the works, so I don't think there are any technical issues. ONT seems to be spread far too thinly on the software development front, and isn't particularly keen on releasing specifications for interfacing with its devices using free and open source software. The company appears to be interested in making money out of its software ideas as well as its hardware, and that worries me slightly.

VLM (445) asked:

How do you handle the HIPAA problems of the volume of data?

I guess to expand on what I'm talking about, it's one thing to "lock down" and "keep secret" my O+ blood type (err, I think that's what it is, anyway). How do you handle "large amounts" of genetic data?

Sorry, I'm not familiar enough with HIPAA to make informed comments on that. Nanopore data is unlikely to overtake Illumina data in the next few years in terms of the amount of data produced, but the locking up of genetic data for privacy reasons is basically a moot point. Who needs DNA sequences to be stored, when you can just pick more DNA off the street, or out of a rubbish bin? I can store genetic data on a hard drive and keep it in a locked cabinet (or embedded in micro flash storage under my skin somewhere), but there's not really anything I can do to stop my body from discarding dead skin cells, or to stop one of my relatives agreeing to have their DNA sequenced instead. We need to think about a world where the information we can store is no different from the physical things that we have access to.

sbgen (1302) asked:

Is this gadget useful for testing of the food supply? I'm not talking about paranoid GMO stuff, but is it reasonably sensitive such that you could grind up wheat into flour, mix thoroughly, and test for bug DNA to verify bug contamination of the original wheat? Most of the stuff in the flour would be gluten protein and "wheat parts" but could you search for bug DNA specifically instead?

Yes, it is. And the more it is used for that purpose, the more useful it will become, because the public databases of discovered sequences will increase in diversity. For my TEDx demonstration I did live sequencing of a tomato that my wife bought at the local market, which I extracted DNA from at home using a mortar and pestle, salt, detergent, a sieve, and some meths (methylated spirits). I needed to do a bit of purification and sample preparation in the lab prior to sequencing [ONT is working on fixing that issue with something they call VolTRAX], but about 15 minutes after it was loaded onto the MinION I had sequence that could be BLASTed against a public database of sequences.

I did previously try bread and butter, but my lab skills weren't good enough to reach the high DNA concentration that was required (1.5 μg in 50 μl). There's plenty of DNA there, but it was mixed in with a lot of liquid as well. Given a bit more time and money, I think I could probably work out a reasonable protocol for bread and butter.

If you just want to look for bacterial sequences in your food, that can be done as well. However, it tends to be the case that 99% of the sample DNA is host sequence and a waste of sequenced reads, so it can take a bit longer to get enough sequence to properly establish the microbial fraction of your sample. I don't think I could achieve that in a 15-minute demonstration.
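
A rough read-budget sketch for the host-DNA problem, assuming each read is independently microbial with probability equal to the microbial fraction of the DNA (this ignores differences in fragment length and library bias). The helper names and the 1,000-read target are hypothetical.

```python
import math


def reads_needed(target_microbial: int, microbial_fraction: float) -> int:
    """Expected total reads to collect `target_microbial` microbial reads,
    assuming each read is microbial with probability `microbial_fraction`."""
    return math.ceil(target_microbial / microbial_fraction)


def p_at_least_one(n_reads: int, microbial_fraction: float) -> float:
    """Probability that at least one of `n_reads` reads is microbial."""
    return 1.0 - (1.0 - microbial_fraction) ** n_reads


# With 99% host DNA, only about 1% of reads are expected to be microbial.
print(reads_needed(1000, 0.01))
print(round(p_at_least_one(100, 0.01), 3))
```

Under this model, 1,000 microbial reads cost about 100,000 reads in total, and even 100 reads give only around a 63% chance of seeing a single microbial one, which is why a 15-minute run is too short for this.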

Anonymous Coward asked:

What are your feelings on the current unreliability of polymorphisms at producing meaningful health outcomes? Besides obvious hereditary diseases and tumor profiling, there doesn't really seem to be much predictive power of having a genome sequence for a patient.

On the subject of polymorphisms, I think we put too much trust in our four-base model of DNA, and also don't put enough consideration into local genomic structure. I spent a couple of years doing research on haplotypes (i.e. combining multiple adjacent genetic variants), and was able to show that results and associations could change depending on whether or not haplotypes were taken into consideration. I also talk frequently with someone who's discovering interesting things about methylation, and have attended talks that discuss how the 3-dimensional structure of DNA can influence gene expression.

With regards to genome sequencing, it's very effective for conditions that have an obvious genetic basis, but also very expensive. Unfortunately there are a whole bunch of unique disease-causing variants floating around (especially in cancer), so it's frequently the case that finding one cause for one person doesn't translate to finding the ultimate cause for everyone.

devlux (6151) asked:

I worked in a lab for years. Paid for part of my college that way. Contamination and cross contamination of samples is always A HUGE problem. I see nothing here that addresses demunging of results or even that there is an attempt.

Consider a sample taken in a hotel room. Even after cleaning, there are literally thousands of people's skin & hair flakes lying about all over the room. There is no way to distinguish one from another and it's unlikely that normal room swabbing would be able to distinguish one person from another.

So how does this system purport to differentiate or at least make some control against contamination?

The biggest contamination issue with the way sequencing is done at the moment is that it requires an amplification step prior to the actual sequencing. This means that any contaminants are also amplified, and possibly preferentially amplified. This is such a big problem that labs are frequently split into pre-amplification (or pre-PCR) areas and post-amplification areas. The MinION has no requirement for amplification, so the reads that come out of the machine are a closer representation of the samples that go in.

Longer reads also help in this regard. The longer a read is, the more likely it is to contain enough distinctive sequence to tell whether it came from a different sample.
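
One rough way to see why length helps: under a simplifying model where bases are independent and uniformly distributed and there are no sequencing errors (neither is true of real genomes or real nanopore reads), the expected number of chance exact matches of a length-L sequence against both strands of a G-base reference is about 2(G - L + 1) / 4^L. The function name and the roughly human-sized 3-billion-base reference are illustrative assumptions.

```python
def expected_chance_matches(read_length: int, genome_size: int) -> float:
    """Expected exact chance matches of a random read against both strands of
    a random reference, assuming independent, uniform bases and no errors."""
    positions = max(genome_size - read_length + 1, 0)
    return 2 * positions / 4 ** read_length


for length in (10, 15, 20, 30, 50):
    print(length, f"{expected_chance_matches(length, 3_000_000_000):.3g}")
```

Under these assumptions a 10-base fragment matches thousands of places by chance, while anything beyond about 20 bases is expected to match almost nowhere unless it truly came from that reference.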

But at the end of the day, what you get out is only as good as what you put in. If you're putting the contents of someone's garbage into the MinION, don't be surprised if garbage comes out of it.

Anonymous Coward asked:

Is there any relationship between trust/secrecy and the talk topic?

The TEDx speakers for this year weren't **required** to involve trust in their talk, but I'd say that every speaker/performer did have the theme of trust running through their talk at some level.

There were a few trust themes in my talk:

  • I talked in jeans and a T-shirt; not the expected attire from a professional who works on genes
  • False trust in models vs observing the world
  • DNA extraction in front of my kids
  • Anxiety over the capabilities of the device (prerecorded video of a sequencing run)
  • Waiting for a sound clip to play
  • Trust needed prior to technology development
  • Trust needed prior to funding for research

But for me, trust was all over the place. Perhaps it wasn't obvious in the talk as presented, but it was definitely there:

  • My work didn't know in advance about my talk
  • It was my second conference as a speaker
  • I was talking to an audience of ~120, but livestreaming to an audience of thousands
  • I did a live technology demonstration
  • As far as I know it was the first MinION sequencing demonstration carried out by a non-ONT person
  • The device I was using had a flaky USB connection that disconnected when the laptop was moved
  • I didn't know in advance how much time I would get before my talk to start the sequencing run -- I had planned for 5 minutes, but only ended up with about 2 minutes. This meant I had to shift my talk around and fill in time until the run was ready to produce DNA sequences
  • I loaded the samples onto the sequencer using a Pasteur pipette
  • I needed to use my backup sequences for demonstrating BLAST searches, because the ONT servers were a bit slower than they were in the dress rehearsal

So, yeah. There was a bit of a relationship between trust and my talk idea.


  • (Score: 2) by bitstream on Tuesday May 17 2016, @07:04AM

    by bitstream (6144) on Tuesday May 17 2016, @07:04AM (#347173) Journal

    Actually, AFAIK, there is no substantial understanding of the DNA code. It's more like a firmware dump where people find some ASCII strings and experiment with blanking parts or moving them around. There are results, but not much understanding in the sense that people can actually read the sequence and understand its meaning.

  • (Score: 4, Interesting) by gringer on Tuesday May 17 2016, @10:23AM

    by gringer (962) on Tuesday May 17 2016, @10:23AM (#347225)

    We understand a lot about the DNA code, but there's still a lot to learn. We understand that DNA is converted to RNA, which is converted to amino acid sequences (proteins, with other modifications). We know how to use and modify DNA to modify pre-existing coded protein sequences (which was an essential part of creating the nanopore that DNA moves through on the MinION sequencing device), and know how to generate novel protein subsequences from DNA.

    We also know that there's a particular set of genes [nature.com] that creates and maintains a living organism and can be created from scratch, even though a substantial portion of those genes have unknown function.

    We understand a little bit about how DNA packs together, but there's a lot more to learn about DNA modifications even at the linear sequence level that can't be discovered using current sequencing-by-synthesis methods (e.g. methylation, abasic regions, non-standard bases). We don't know all that much about long-range DNA effects (e.g. why does a particular sequence 1 Mbp away affect the expression of a gene). We have a [mostly] one-dimensional view of DNA that is a side-effect of the sequencing that we've been doing up till now. Hopefully by the development of additional observational sequencing technologies, our understanding of how DNA works will expand into the other dimensions.

    --
    Ask me about Sequencing DNA in front of Linus Torvalds [youtube.com]
    • (Score: 2) by bitstream on Wednesday May 18 2016, @12:46AM

      by bitstream (6144) on Wednesday May 18 2016, @12:46AM (#347580) Journal

      But does the research community understand what specific protein structure a specific DNA sequence will result in? Three bases will result in one amino acid, AFAIR. The question then becomes: how does one know which of those belong together?
      (bonus points for figuring out what all the proteins actually do..)

      And can one decode a protein into DNA sequence and thus find the spots which codes for it?

      • (Score: 3, Informative) by gringer on Wednesday May 18 2016, @01:51AM

        by gringer (962) on Wednesday May 18 2016, @01:51AM (#347612)

        But does the research community understand what specific protein structure a specific DNA sequence will result in?

        Not the 3D Structure. The amino acid sequence is well defined (i.e. the one-dimensional structure), but the 3D structure of proteins depends on a lot of environmental things that are not encoded in the DNA (e.g. pH, solvent concentration, post-translational modification, folding proteins). There are a few patterns of amino acids that tend to generate particular three-dimensional structures, but it's not possible to take a given sequence and know precisely what structure it will form in all conditions.

        And can one decode a protein into DNA sequence and thus find the spots which codes for it?

        This is a little harder to do due to redundancy in the translation. It's much easier to work at this in reverse: carry out six different translations of the genome into amino acid sequences and match the amino acid sequences to the protein. The NCBI tool tBLASTn [nih.gov] will do such a reverse search.
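
        A toy sketch of both points, assuming the standard genetic code (NCBI translation table 1) and exact matching only: it translates all six reading frames, finds a peptide in any of them, and counts how many DNA sequences could encode a peptide to show the redundancy. The function names and frame labels are conventions of this sketch; tBLASTn itself does scored, gapped local alignment, which this does not attempt.

```python
from collections import Counter
from math import prod
from typing import Dict, List, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE: Dict[str, str] = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> Dict[str, str]:
    """Translations of the three forward and three reverse-complement frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc[k:]) for k in range(3)})
    return frames


def find_peptide(genome: str, peptide: str) -> List[Tuple[str, int]]:
    """Exact matches of `peptide` in every frame, as (frame, amino-acid offset)."""
    hits = []
    for frame, protein in six_frames(genome).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode `peptide`."""
    synonyms = Counter(CODON_TABLE.values())
    return prod(synonyms[aa] for aa in peptide)


print(six_frames("ATGAAACTGTAA"))
print(find_peptide("GCAGTTTCAT", "MKL"))  # [('-1', 0)]: found on the reverse strand
print(count_encodings("MKL"))             # 12 = 1 (M) * 2 (K) * 6 (L)
```

        Counting encodings grows multiplicatively with peptide length, which is why searching translated DNA for a protein is far more practical than enumerating every DNA sequence a protein could have come from.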

        • (Score: 2) by bitstream on Wednesday May 18 2016, @02:00AM

          by bitstream (6144) on Wednesday May 18 2016, @02:00AM (#347616) Journal

          Not the 3D Structure. The amino acid sequence is well defined (i.e. the one-dimensional structure), but the 3D structure of proteins depends on a lot of environmental things that are not encoded in the DNA (e.g. pH, solvent concentration, post-translational modification, folding proteins). There are a few patterns of amino acids that tend to generate particular three-dimensional structures, but it's not possible to take a given sequence and know precisely what structure it will form in all conditions.

          But aren't even these environmental factors a consequence of other coding in the DNA, at least internally?

          • (Score: 2) by gringer on Wednesday May 18 2016, @08:46AM

            by gringer (962) on Wednesday May 18 2016, @08:46AM (#347727)

            But aren't even these environmental factors a consequence of other coding in the DNA, at least internally?

            That depends on how many turtles you want to count. Protein construction processes can be affected by chemical signals sent through the blood, and also by metal ions moving around in the cells. There are steroids that can be applied with a topical cream, and burrow down through the cell membrane and nuclear membrane, and then bind to receptors attached to DNA [oxfordjournals.org]. If a cell is heated up at the focal point of a lens in direct sunlight, any created proteins will be denatured ("cooked"), and form a structure that is usually more predictable, but quite far from its native structure in normal physiological conditions.

            I suppose you could argue that our movements and actions are all a result of the programs coded into our DNA, but traveling down that path ends up in big philosophical debates about the nature of decisions, choices, and consciousness.

        • (Score: 1) by anubi on Wednesday May 18 2016, @03:09AM

          by anubi (2828) on Wednesday May 18 2016, @03:09AM (#347640) Journal

          There were a couple of guys out in cyberspace whom I respected enormously: +Fravia and +ORC (Old Red Cracker).

          They seemed to have a really uncanny sense of how to reverse software given no glimpse of the source.

          I sure wish those two were still around and took an interest in cracking the genome. If I wasn't so damn old, I would love to get into it myself.

          This is one puzzle I could really enjoy because the results of solving it will be very meaningful. I am not so fond of "guessing-games", where I feel I am just wasting my time - those are more like filling out tax forms. No sense of accomplishment whatsoever for doing anything meaningful, but a relief to have the little box of things people expect from me ticked. However, cracking the genome - and conversely learning how to reassemble it in various ways to do specific things... WOW!

          I can already see the replacement of damned near every petrochemical plant on the planet with biological equivalents - all powered by photosynthesis. Custom trees, if you will, that manufacture any desired chemical structure via photosynthesis.

          --
          "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]