Wednesday, March 23, 2011

What might a PGM2 look like?

The New England chapter of the Laboratory Robot Interest Group (NE-LRIG) had a nice meeting tonight over at AstraZeneca's beautiful Waltham campus (woodsy borders, with a view of the nearby reservoir).  The meeting was sponsored by Ion Torrent & they gave one of the three talks.  All three talks were quite good, with my friend & former Millennium colleague Sunita Badola from Amgen leading off with 454 amplicon sequencing in oncology clinical trials, followed by Mark DePristo from the Broad Institute talking about the 1000 Genomes effort, and finally Jason Affourtit from Ion Torrent (he oversees all their field applications scientists) on the Ion platform.  The Ion talk gave a bit of the standard overview, followed by some slides summarizing the talks at Marco Island.

Now, when I've run previous items on Ion Torrent they have garnered a lot of comments.  Some of these are positive, others (and some of my posts) not so much, with some of the commenters most charitably described as downright cynical.  I won't have any data of my own for at least a month (and a machine of my own is still no more than a great desire), but I must say that if Ion Torrent is all smoke and mirrors, as some comments have insinuated, they have an awful lot of good people in on the ruse.  A friend of mine who is a very experienced genomics lab head was raving about her new machine, and I happened to talk to another site today whose first four runs have all come in with greater than 2X the number of reads in the specification and very good quality.  About the only thing I've heard that is less than raving is that the quality drops near the end of the reads, such that the effective read length is sometimes more like 80-90 rather than 100.  Still, if you know this going in you can adapt to it.

A star before and after the talks was a PGM which was available for viewing, along with 314 and 316 chips being passed around (which I was sorely tempted to have disappear into my shirt pocket, but morals prevailed).  These, for example, brought home a fact I hadn't appreciated before: the 316 has significantly more actual surface area than the 314 chip, though it fits in the same carrier.  The difference between the chips is quite visible in your hand.  The 316 would appear to essentially max out the form factor, so the 318 can't keep up this trend. [corrected 3/25 per Rick's catching the typo]

Something that was emphasized tonight that I hadn't appreciated before is that there are no pumps in the PGM.  Reagents are propelled by argon gas pressure, managed by valves which are themselves actuated by the argon (some electrical widgets ultimately control the valves).  Also, the case apparently encompasses a lot of empty space (the Ion folks were open about this, though the machine was not open to view the innards).  Presumably some of this was a conservative design leaving space for late additions (or perhaps the server), but some had to do with wanting a visually striking design.

Since the PGM box itself has little to do with sequencing performance, there isn't a need to redesign it to improve that performance.  However, there might be other reasons.  While it is a relatively small benchtop instrument, space is often at a premium.  A group of us from Infinity visited a nearby academic site today, and their lab made ours look spacious -- every square inch of bench was crammed with machines.
Furthermore, seeing the machine in person made it clear that in placing it, a lab must leave some space on the right side to access the wash and waste bottles along that side.  Hence, if you really wanted to cram a lab, the effective footprint of the instrument is a bit larger than its physical one.
So, to engage in rank speculation utterly uninformed by any hard facts, I might imagine that a focus for a PGM2 would be an even smaller footprint.  The four tubes in the front, which hold the nucleotides for sequencing, could be rearranged in a manner still artful (perhaps a diamond?) yet far more compact, enabling the side bottles to move to the front.  Perhaps the screen could move down below the instrument -- or become an off-machine accessory capable of driving multiple instruments.  Alternatively, perhaps the screen would be mounted behind the flowcell access hatch.  This hatch on top (dark grey) for placing the flowcell also seems larger than necessary.  So, if you really could combine all these, it could yield an instrument with an effective footprint about one half as wide or maybe better.

A question I didn't think to ask tonight is how sensitive the instrument is to the flowcell being level.  I'm guessing (but certainly not a confident guess) that such devices mostly don't care; at these scales gravity isn't a dominant force.  In that case, the hatch might be turned to open outwards, enabling a redesign to improve the ability to stack the instruments vertically.

Another obvious dimension would be a multi-flowcell instrument à la the HiSeq 2000.  Could most of the mechanical simplicity of the system be retained while enabling multiple flowcells to be run in parallel?  That would be the key question.

Of course, the key driver for many of these would be if groups wanted multiple machines to drive very high throughput.  I think there will be a market for this, but it is premature to think it has developed.  And it will be critical for such operations to have the promised emulsion PCR improvements (or a replacement for emulsion PCR altogether).

Tuesday, March 22, 2011

What's On Your Cheat Sheet?

After years of scribbling on a motley collection of pads, during my time at Infinity I've been nearly rigorous about using a single notebook for my notes -- seminar notes, phone numbers, to do reminders, random thoughts (even blog ideas).  The book itself is a cheap permanently bound notebook from the local drugstore; I think they are less than $1 each.
The inside back cover of my notebook is titled "Useful Information", but I don't find very much there.  Mostly it is a lot of conversion factors, but primarily for Imperial units that nobody ever uses: gills, hogsheads, penny-weights, rods, scruples and such, along with such routinely accessed information as the weight of a bushel of potatoes (60 pounds) or a barrel of flour (196 pounds).  Other information includes a 12x12 multiplication table, which was drilled into me over 30 years ago.  For the metric system, about a third of the page is taken up giving the same series of prefixes with each unit.  Another section has some Imperial to metric conversions.
It's interesting to think about what is curiously absent from the page.  For example, the common measurements for kitchen work, tablespoons and teaspoons, are absent.  Nowhere does a carat appear, nor conversion factors for the three different kinds of ounces (avoirdupois, troy and apothecary, for any European readers blissfully unaware of Imperial units).  Also missing is the "stone", a unit of weight that shows up in historical novels -- perhaps it doesn't have a precise definition.  The weight of water is given in terms of a cubic foot being 7.48 gallons and weighing 62.425 pounds, rather than the usual "a pint's a pound the world around".
There are two odd values on that page given all this, but that's because I wrote them there.  It is a handy place to stash info, so I have written down that 1 human genome = 6 pg of DNA (checking that in Wikipedia, apparently it is really closer to 7: 6.95 & 6.8 pg for female and male respectively).  The other odd value is 1 bp = 660 daltons.
Now, if I'm going to scribble in a few, why not add a bunch?  Indeed, while Google will happily tell me there are 8 furlongs in a mile, it won't directly answer how much a human genome weighs.  Nor will WolframAlpha -- it gave me information on human body weight in pounds.  So, what else could I need there -- and if I were printing up a bunch, what would I put on it?
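Since neither search engine will do the arithmetic for me, here is a quick back-of-the-envelope sketch of my own showing how the 660 daltons per base pair figure gets you to roughly 7 pg per genome; the diploid genome size below is a rounded assumption, not a precise value.

```python
# Rough mass of a diploid human genome from bp count and 660 Da per bp.
# All constants are my own round-number assumptions, not measured values.
DALTON_IN_GRAMS = 1.66054e-24   # mass of 1 dalton in grams
DALTONS_PER_BP = 660.0          # average mass of one base pair
DIPLOID_BP = 6.4e9              # ~2 x 3.2 Gb haploid genome, rounded

mass_pg = DIPLOID_BP * DALTONS_PER_BP * DALTON_IN_GRAMS * 1e12
print(f"Diploid human genome ~ {mass_pg:.1f} pg")   # ~7.0 pg, consistent with the figures above
```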
Some of the more useful molecular biology reagent catalogs have whole sections of such information.  That is one challenge in designing such an information table: to be really useful it must be packed with information, but at a density allowing high readability.  Plus, while the catalogs use many pages, I'm trying to cram it into one page or maybe a few (the inside front cover has an equally useless class schedule grid -- useless to me, that is).  Should I only put in what I truly can't remember, or also the things I don't have nailed down well enough to reproduce quickly and confidently?
So, here are my current candidates, some for me and some if I were going to try to make a generally useful one.  Of course, a lot of what is valuable for ready reference depends on what you are doing.  At Codon I had a sheet taped by my desk with the sites for the restriction enzymes I used the most.  If you have a favorite vector, the polylinker map is a useful reference.  On the other hand, Planck's constant is a really important number, but one I've never needed to use in biology.  So I wouldn't bother using space on it.

  • IUPAC ambiguity codes for nucleotides.  Most I know by heart (or figure out quickly; the codes for 3 nucleotides are near the one letter they leave out), but M & K have always been a challenge.  As part of cramming for this post, I now have a mnemonic that works for me: M is Methyl, for A and C, which are capable of being methylated (I think the official mnemonic is based on the native structure, but I don't know that well enough).  K is now the other two.
  • Amino acid single letter codes.  I don't need this, but for a mass produced one it would make sense.
  • The genetic code -- without trying, I have actually memorized this, but I'm not very fast working purely from memory nor am I always confident (which is why I'm not fast)
  • SI prefixes in order.  Again, I know most of these until you get to the two extremes, but usually have to rattle them off in order (milli=-3, micro=-6, nano=-9, pico=-12, etc).  
  • Powers of 2.  For up to 2^12, I can rattle these out.  Higher sometimes comes in handy.
  • Tm estimation using G+C and A+T counts.  I don't use this often & don't really trust it, but for ballparking a Tm it might be worth having around.
  • 1 mm^3 = 1 uL and 1000 um^3 = 1 pL.  Useful little conversions I found when I was exploring emPCR stuff (should I also put the formula for volume of a sphere in there, since I initially wrote it out incorrectly in that post?).  A quick sketch of a few of these list entries appears just below.
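For what it's worth, here is a minimal sketch (in Python, with names of my own invention) of how a few of these entries could live in code rather than on paper: the IUPAC ambiguity codes, the simple 2-and-4-degree Tm estimate, and the sphere volume formula I mangled in that earlier post.

```python
import math

# IUPAC nucleotide ambiguity codes; M (A+C) and K (G+T) are the two I always forget.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def rough_tm(seq):
    """Ballpark Tm for a short oligo: 2 degrees per A/T plus 4 degrees per G/C."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

def sphere_volume_um3(diameter_um):
    """Volume of a sphere, (4/3)*pi*r^3, from its diameter in microns."""
    r = diameter_um / 2.0
    return (4.0 / 3.0) * math.pi * r ** 3

# Handy equivalences: 1 mm^3 = 1 uL and 1000 um^3 = 1 pL, so a 1 um bead
# occupies ~0.52 um^3, i.e. about 0.0005 pL.
print(rough_tm("ACGTACGTACGTACGTACGT"))   # 60 for this 50% GC 20-mer
print(sphere_volume_um3(1.0))             # ~0.5236
```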
That doesn't seem like nearly enough to fill up the page, but perhaps that's a good thing.  I probably don't know what else it would be useful to have there, so blank space to scribble more isn't a bad idea.

Friday, March 18, 2011

My Noisy Neighbors

My neighbors are up to their loud antics again; a really wild singles party.  Fact of the matter is, I checked very carefully one evening before we bought the place to make sure the sound levels were what I was looking for.

Luckily, I'm not talking about a frat house or a heavy metal bar. A neighboring property has a vernal pool (a body of water which dries out in early summer, and hence cannot support fish) and the spring peepers (aka chorus frogs) tuned up for the first time of the season. Along with visiting a sugar house to see (and smell!) maple sugar being made, it's my favorite part of spring in New England.

Tuesday, March 08, 2011

What will be the last Sanger genome?

Even when I was finishing up as a graduate student, and only a few bacterial genomes had been published, one would periodically hear open speculation as to when the top journals would quit accepting genome sequencing papers. The thought was that once the novelty wore off, a genome would need to be increasingly difficult or have some very odd biology to keep getting in Science or Nature or such.

Happily, that still hasn't happened and genome sequencing papers still show up in the whole range of journals. I don't claim I scan every one, but I do try to poke around in a lot of the eukaryotic papers (I long since gave up on bacterial; happily they have become essentially uncountable). Two recent genomes in major journals, Daphnia (water flea) in Science and Pongo (orangutan, not dalmatian!) in Nature, show that the limit has not yet been reached. These papers share another thread: both genomes were sequenced using fluorescent capillary Sanger sequencing.

Sanger, of course, was the backbone of genome projects until only very recently. Even in the last few years, only a few large genomes have been initially published using second generation technologies.

Wednesday, March 02, 2011

Emulsion PCR: First Notes

One theme in some of the comments on my Ion Torrent commentary has been around the limitations of emulsion PCR. Some have made rather bold (and negative) predictions, such as Ion Torrent dooming themselves to short read lengths or users being unable to process many samples in parallel without cross-contamination.

Reading these really drove home to me that I didn't understand emulsion PCR. I've done regular PCR (once in a hotel ballroom, of all places!) but not emulsions. It seems simple in theory, but what goes on in practice? My main reasoning was based on the fact that emPCR is the prep method for both 454 and SOLiD; 454 clearly demonstrates the ability to execute long reads (occasionally hitting nearly a kilobase) and SOLiD the ability to generate enormous numbers of beads through emPCR. I also have a passing familiarity with RainDance's technology (we participated in the RainDance & Expression Analysis Cancer Grant program). But, I've also seen a 454 experiment go awry in a manner which was blamed on emPCR -- a small fraction of primer dimers in our input material became a painful fraction of the sequencing reads. Plus, there is that temptation to enter the Life Tech grand challenge on sample prep, or attempt to goad some friends into entering. So it is really past time to get more serious about understanding the technology.

So, off to the electronic library. Maddeningly, many of the authors in the field tend to publish in journals that I have less than facile access to, but between library visits, PubMed Central and those that are in more accessible journals, I've found a decent start.

Tuesday, March 01, 2011

When Will Life Technologies Get Serious About Their Grand Challenges?

My recent run of posts on Ion Torrent certainly garnered a lot of comments, and it would be much less than honest to deny that many of the comments were far less favorable to Ion Torrent than what I have written. Indeed, many were not terribly favorable to me given what I had written about Ion Torrent -- one even asked if I "felt used" as part of a publicity stunt. (BTW, I don't -- if I can't ask the hard questions I have nobody to blame but myself.)

One of Ion's other very public moves around the Ion Torrent has been to announce a series of three challenges to improve the performance of the instrument system (a fourth has been announced centered around SOLiD, and three others have yet to be unveiled). The winner of each challenge can get $1M in prize money.

Now, contests along these lines have been successfully used by companies and organizations to drive technologies forwards. Netflix successfully crowdsourced better prediction of a user's movie tastes. The most spectacular success for such a contest was the winning entry for the Ansari X-Prize, SpaceShip One. Google is currently sponsoring a contest to land a rover on the moon and transmit HDTV images, which I look forward to eagerly.

Unfortunately, so far the Life Technologies & Ion Torrent contests seem to be all hat and no cattle. While the three goals have been announced (double the output per run, halve the sample prep time and double the accuracy), nothing else is in place. Each competition is apparently separate; there's no prize for halfway success on two of the axes. If they are serious about attracting competitors, they need to get down to brass tacks.

Now, I can't say I'm surprised. Not only has Ion shown a penchant for loudly trumpeting their progress prior to demonstrating it, but their previous contest showed a certain degree of haste and a few punch-list items. In the first contest, submissions for how to use the instrument were judged to yield two U.S. winners (followed recently by two European winners). Each submission consisted of two parts; in the original rules it wasn't clearly stated what the distinction was between the two parts (perhaps it should have been obvious, but I don't routinely write grants), other than that the rules stated a word limit for one of them. Once you tried to submit, however, the word limit on the second section became apparent. Ion also ended up extending the deadline for submissions, which can be seen as either generous or irritating -- the latter especially if you've burned midnight oil & spent part of a vacation chopping down an overlong second section to get your entry in on time. Importantly, that contest had a tiny fraction of the complexity of any one of these new challenges.

Start with: what are the rules? One key question will be around cost. For example, can a winning entry for sample prep use an instrument that costs much more than the PGM? That's not an absurd concept. Can the double-the-output prize be won by a sample prep process that takes a long time? For example, can I assay to find only DNA-bearing beads & then use a micromanipulator to position them? That is obviously a deliberately absurd proposal. But, unless the rules are carefully crafted, someone will attempt a silly entry, and Ion will have a real mess if they are forced to put the laurels on a silly one.

A key & challenging area is intellectual property (IP). The first obvious issue in this department is how much IP you can retain when submitting an entry. Obviously Ion isn't interested in paying out $1M for something they can't use -- so is the $1M in effect a license fee (with no royalties)? On the other side of the IP coin, how much IP can a winning submission use to which the submitter does not have rights? For example, some wag might submit a sample prep protocol that is bridge PCR using, in part, Illumina reagents. More complicated would be methods that only an IP lawyer could decide either infringe on or build from some prior patent. If it's Life's patent, presumably they wouldn't care -- but an Illumina or Affy patent would be an entirely different kettle of fish.

Materials are going to be another critical issue for the yield and sample prep challenges. Any reasonable scheme for attacking these is going to get very expensive if complete kits must be purchased each time. For example, you may want to hammer on the beads without ever actually putting them on a chip. Will Ion give at-cost access to the specialized reagents (such as beads)? Furthermore, how much information are they willing to give out on the precise specs? For example, suppose a concept requires attaching something different to the beads than standard -- will specifications be provided to create appropriate beads?

Another key question: which samples? Will Life Tech supply the samples to use for improving yield, or does a group get to define them? A devious approach to winning the prize would be to develop a sample which preps very badly with the standard prep. An attempt could be made to legislate this possibility away, but there would be significant advantage to standardized samples. But should these be real-world samples, idealized samples (such as a pure population of a single fragment) or deliberately hard real-world situations (e.g. an amplicon sample with a high fraction of primer dimers)? In a similar vein, what dataset will be made available for the accuracy challenge?

Now, Life is promising more information this spring, though that is still a few weeks away (or do they go by Punxsutawney Phil?). I really hope in the future they try to hold back their announcements until they're really ready to go. It doesn't help that the Twitter rehash scrolling on their screen is full of links that might provide more information, but none of them work. They really need to rethink their strategy of piecemeal delivery, which can do nothing but frustrate possible entrants in the contest.

Part of my frustration is I can't help but ponder throwing my hat in the ring. It's not hard to think of ideas for the sample prep problem, and while I couldn't do the experiments myself, I do have friends who could (time to get the core Codon team back in action!). Of course, working out the IP headache would be an issue (unless the work was done at work, which is sadly too far afield of what we do to be a responsible course). I can also imagine a number of academic groups and even a few companies which might seriously consider entering some of these challenges. I'd love to talk up the accuracy challenge with computer geeks I know. The problems are of a very attractive sort for me -- you can very quickly generate very large and rich datasets, enabling quantitative approaches (such as Design of Experiments) to optimization. A lot of data can be generated without actually running chips, using even lower-cost methods (such as microscopy or flow cytometry) to measure key aspects. But with nothing concrete to point at, it seems rather pointless to start scheming.

But, while I can't actually move forward on any of these, I can do a bit of homework on emulsion PCR. I'll try to write up that homework later this week, as it's been informative to me and I believe puts me in a better position to handicap Ion Torrent's claims on sample prep -- and various comments on emPCR from the previous posts.