Thursday, July 31, 2008

Farewell to some bits of olde Cambridge

We sent off one of our departing colleagues in style yesterday, taking him to the finest cuisine in Cambridge: the MIT Food Trucks. These institutions are various privately run trucks serving hot foods from around the world to long lines of students. While private, the trucks are sanctioned: not only do they have specially reserved parking spots, but they are also listed on the MIT Food Service website.

However, first we had to find them. Their previous locale is now a major hole in the ground, to be filled in with the new Koch Cancer Institute (or some such name). With the MIT web site's help, we were able to find the new location.

During the year I (and others) have discovered two other institutions which were not so lucky.

It was a bit of a shock one day to discover the Quantum Books location cleared out, though on reflection not much of one. Quantum was a bookstore specializing in technical books -- particularly computing books. It was a handy place to browse such books before investing; I've spent far too much on books that looked good but were awful. The shock faded on thinking about it: not only was Quantum getting hammered by the usual Amazon internet tide, but they were in a perfectly awful location. While there might be a lot of commuter traffic, they were otherwise in a nearly retail-free zone that is one of the many crimes against urban design inflicted on Kendall Square in the 60's/70's. They tried a children's section and other experiments, but it was hard to see much hope of success. Quantum isn't kaput, but has gone to a nearly all-Internet model; unless their fans are super-loyal, it's hard to see that lasting long.

Cambridge was once a center of conventional industry. For example, a huge fraction (I forget the amount; it's on a plaque in the park on Sidney Street) of the undersea telegraph cable used in WW2 was made in Cambridge. But fewer and fewer such businesses remain. Even in my short tenure at least 2 candy factories have closed, leaving only one (Tootsie Rolls!). A prominent paint company moved out a few years ago. Sometimes it's hard to tell what's still active & what is only an empty shell. But not in this case.

There will be no more "goo goo g'joob" in Cambridge; Siegal egg company has not only cleared out but been cleared out -- the building is gone. A distributor of eggs, they were across the street from one MLNM building and adjacent to Alkermes. Indeed, it was that proximity to MLNM that forced me to notice them: their egg trucks would sometimes block Albany Street while backing into the loading dock, trapping the MLNM shuttle van (always with me late for a meeting!). I think the demolition was part of the adjacent MIT dorm construction, but perhaps a new biotech building will go in. By chance, the Google street view catches the building being prepared for demolition.

Will some future writer remark wistfully on the disappearance of biotech buildings from Cambridge? It's difficult to imagine -- but who a century ago could have imagined Cambridge getting out of the business of supplying everyday things?

Wednesday, July 30, 2008

Paring pair frequencies pares virus aggressiveness

Okay, a bit late with this as it came out in Science about a month ago, but it's a cool paper & illustrates a number of issues I've dealt with at my current shop. Also, in the small-world department, one of the co-authors was my eldest brother's roommate at one point.

The genetic code uses 64 codons to code for 21 different symbols -- 20 amino acids plus stop. Early on this was recognized as implying that either (a) some codons are simply not used or (b) many symbols have multiple, synonymous codons, which turns out to be the case (except in a few species, such as Micrococcus luteus, which have lost the ability to translate certain codons).

Early in the sequencing era (certainly before I jumped in) it was noted that not all synonymous codons are used equally. These patterns, or codon bias, are specific to particular taxa. The codon usage of Escherichia is different from that of Streptomyces. Furthermore, it was noted that there is a signal in pairs of successive codons; that is, the frequency of a given codon pair is often not simply the product of the two codons' individual frequencies. This was (and is) one of the key signals which gene-finding programs use to hunt for coding regions in novel DNA sequences.
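
To make the pair signal concrete, here is a minimal Python sketch of the calculation -- everything about it is illustrative, not any published method: count each adjacent codon pair across a set of coding sequences and compare the observed count to what independent codon usage would predict.

    from collections import Counter

    def codon_pair_bias(cds_seqs):
        """Return {(codonA, codonB): observed/expected} over in-frame CDSes."""
        codons, pairs = Counter(), Counter()
        for seq in cds_seqs:
            cdn = [seq[i:i+3] for i in range(0, len(seq) - 2, 3)]
            codons.update(cdn)
            pairs.update(zip(cdn, cdn[1:]))
        n_codons, n_pairs = sum(codons.values()), sum(pairs.values())
        bias = {}
        for (a, b), observed in pairs.items():
            # Expected count if codons paired independently of each other
            expected = (codons[a] / n_codons) * (codons[b] / n_codons) * n_pairs
            bias[(a, b)] = observed / expected
        return bias

Ratios far from 1 are the signal that gene finders (and, further down, the polio recoders) exploit.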

Codon bias can be mild or it can be severe. Earlier this year I found myself staring at a starkly simple codon usage pattern: C or G in the 3rd position. In many cases the C+G codons for an amino acid had >95% of the usage. For both building & sequencing genes this has a nasty side-effect: the genes are very GC rich, which is not good (higher melting temp, all sorts of secondary structure options, etc).
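
Quantifying that pattern is nearly a one-liner; a minimal sketch, assuming a clean in-frame CDS string:

    def gc3_fraction(cds):
        """Fraction of codons ending in G or C (GC3); assumes an in-frame CDS."""
        third_positions = cds[2::3]
        return sum(base in "GC" for base in third_positions) / len(third_positions)

    print(gc3_fraction("ATGGCATTAAAGGGC"))  # 0.6 for this made-up 5-codon ORF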

Another key discovery is that codon usage often correlates with protein abundance; the most abundant proteins show the greatest hewing to the species-specific codon bias pattern. It further turned out that the tRNAs matching highly used codons tend to be most abundant in the cell, suggesting that frequent codons optimize expression. Furthermore, it could be shown that in many cases rare codons could interfere with translation. Hence, if you take a gene from organism X and try to express it in E.coli, it will frequently translate poorly unless you recode the rare codons out of it. Alternatively, expressing additional copies of the tRNAs matching rare codons can also boost expression.

Now, in the highly competitive world of gene synthesis this was (and is) viewed as a selling point: building a gene is better than copying it as it can be optimized for expression. Various algorithms for optimization exist. For example, one company optimizes for dicodons. Many favor the most common codons and use the remainder only to avoid undesired sequences. Locally we use codons with a probability proportional to their usage (after zeroing out the 'rare' codons). Which algorithm is best? Of course, I'm not impartial, but the real truth is there isn't any systematic comparison out there, nor is there likely to be one given the difficulty of doing the experiment well and the lack of excitement in the subject.
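
Our scheme, at its core, is just weighted sampling. Something like this Python sketch captures the idea (the leucine usage numbers are invented for illustration; a real table is organism-specific):

    import random

    # Hypothetical leucine codon usage fractions -- illustrative only
    LEU_USAGE = {"CTG": 0.40, "CTC": 0.20, "CTT": 0.13, "TTG": 0.13,
                 "CTA": 0.07, "TTA": 0.07}

    def pick_codon(usage, rare_cutoff=0.10):
        """Sample a codon proportional to usage, after zeroing rare codons."""
        kept = {c: f for c, f in usage.items() if f >= rare_cutoff}
        codons = list(kept)
        return random.choices(codons, weights=[kept[c] for c in codons])[0]

    print([pick_codon(LEU_USAGE) for _ in range(5)])  # e.g. ['CTG', 'CTC', ...]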

Besides the rarity of codons affecting translation levels, how else might synonymous codons not be synonymous? The most obvious is that synonymous codons may sometimes have other signals layered on them -- that 'free' nucleotide may be fixed for some other reason. A more striking example, oft postulated but difficult to prove, is that rare codons (especially clusters of them) may be important for slowing the ribosome down and giving the protein a chance to fold. In one striking example, changing a synonymous codon can change the substrate specificity of a protein.

What came out in Science is codon rewriting, enabled by synthetic biology, on a grand scale. Live virus vaccines are just that: live, but attenuated, versions of the real thing. They have a number of advantages (such as being able to jump from one vaccinated person to an unvaccinated one), but the catch is that attenuation is due to a small number of mutations. Should these mutations revert, pathogenicity is restored. So, if there were a way to make a large number of mutations of small effect in a virus, then the probability of reversion would be low, but the sum of all those small changes would still be attenuation of the virus. And that's what the Science authors have done.

Taking poliovirus, they recoded the protein-coding regions to emphasize codon pairs that are rare in human genes (PV-Min). They did this while preserving certain other known key features, such as secondary structures and overall folding energy. A second mutant was made that emphasized very common codon pairs (PV-Max). In both cases, more than 500 synonymous mutations were made relative to wild polio. Two further viruses were built by subcloning pieces of the synthetic viruses into a wildtype background.

Did this really do anything? Well, their PV-Max had similar in vitro characteristics to wild virus, whereas PV-Min was quite docile, failing to make plaques or kill cells. Indeed, it couldn't be cultured in cells.

The part-Min, part-wt chimaeras also showed severe defects, and some also couldn't be propagated as viruses. However, one containing two segments of engineered low-frequency codon pairs, called PV-MinXY, could be propagated but was greatly attenuated. While its ability to make virions was slightly attenuated (perhaps one tenth the number), more strikingly about 100X the number of virions was required for a successful infection. Repeated passaging of PV-MinXY and another chimaera failed to alter the infectivity of the viruses; the strategy of stabilizing attenuation through a plethora of small mutations appears to work.

When my company was trying to sell customers on the value of codon optimization, one frustration for me as a scientist was the paucity of really good studies showing how big an effect it could have. Most studies in the field are poorly done, with too few controls and only a protein or two. Clearly there is a signal, but it was always hard to really say "yes, it can have huge effects". In this study of codon optimization writ large, codon choice clearly has enormous effects.

Tuesday, July 29, 2008

The youngest DNA author?

Earlier this year an interesting opportunity presented itself at the DNA foundry where I am employed. For an internal project we needed to design 4 stuffers. Stuffers are the stuff of creative opportunity!

A stuffer is a segment of DNA whose only purpose is to take up space. Most commonly, some sort of vector is to be prepared by digesting with two restriction enzymes, with the correct piece then purified by gel electrophoresis and manual cutting from the gel. If you really need a double digestion, then the stuffer is important so that single-digestion products are resolvable from the desired product; the size of the stuffer causes single digests to run at a discernibly different position.

Now, we could have made all 4 stuffers nearly the same, but there wasn't any significant cost advantage and where's the fun in that? We did need to make sure this particular stuffer contained stop codons guarding its frontiers (to prevent any expression of or through the stuffer), possessed the key restriction sites, and lacked a host of other sites possibly used in manipulating the vector. It also needed to be easily synthesizable and verifiable by Sanger sequencing -- no runs of 100 As, for example. But beyond that, it really didn't matter what went in.

So I whipped together some code to translate short messages written in the amino acid code (obeying the restriction site constraints) and wrap that message into the scaffold framework. And I started cooking up messages or words to embed. One stuffer contains a fragment of my post last year which obeyed the amino acid code (the first blog-in-DNA?); another celebrates the "Dark Lady of DNA". Yet another has the beginning of the Gettysburg Address, with 'illegal' letters simply dropped. Some other candidates were considered and parked for future use: the opening phrase of a great work of literature ("That Sam I am, That Sam I am" -- the title works too!) and a paean to my wagging companion.
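
The trick itself fits in a few lines. A toy Python version (the codon choices and the sites screened for are arbitrary stand-ins, not our production vector's constraints):

    # One arbitrary codon per amino acid letter; 'illegal' letters are dropped
    CODON_FOR = {
        "A": "GCT", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTC",
        "G": "GGT", "H": "CAC", "I": "ATC", "K": "AAA", "L": "CTG",
        "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
        "S": "AGC", "T": "ACC", "V": "GTT", "W": "TGG", "Y": "TAC",
    }
    FORBIDDEN_SITES = ["GAATTC", "GGATCC"]  # e.g. EcoRI, BamHI -- illustrative

    def encode_message(message):
        """Encode a message as DNA via amino acid letters, stop-guarded."""
        aa = [ch for ch in message.upper() if ch in CODON_FOR]
        dna = "TAA" + "".join(CODON_FOR[ch] for ch in aa) + "TAA"
        for site in FORBIDDEN_SITES:
            if site in dna:
                raise ValueError("forbidden site %s; pick other codons" % site)
        return dna

    print(encode_message("That Sam I am, That Sam I am"))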

But the real excitement came when I realized I could subcontract the work out. My code did all the hard work, and another layer of code by someone else would apply another round of checks. The stuffer would never leave the lab, so there was no real safety concern. So I offered the challenge to The Next Generation and he accepted.

He quickly adapted to the 'drop illegal letters' strategy and wrote his own short ode to his favorite cartoon character, a certain glum tentacled cashier. I would have let him do more, but creative writing's not really his preferred activity & the novelty wore off. But, his one design was captured and was soon spun into oligonucleotides, which were in turn woven into the final construct.

So, at the tender age of 8 and a few months, the fruit of my chromosomes has inscribed a message in nucleotides. For a moment, I will claim he is the youngest to do so. Of course, making such a claim publicly is a sure recipe for destroying it, as someone will either come forward with a tale of their toddler flipping the valves on their DNA synthesizer or just be inspired to have their offspring design a genome (we didn't have the budget!).

And yes, at some future date we'll sit down and discuss the ethics of the whole affair: how his father made sure that his DNA would be inert (my son, the pseudogene engineer!) and what would need to be considered if this DNA were ever contemplated for environmental release. We might even get into stickier topics, such as the propriety of wheedling your child into providing free consulting work!

Monday, July 28, 2008

The challenge of promulgating bear facts.

I had an opportunity this evening to briefly review the impact of DNA research on the taxonomy and conservation of Ailuropoda melanoleuca, which also made me reflect on the frustrating struggle of scientific fact to reach the public forefront. Put more simply, we had a bedtime discussion of pandas, DNA, relatedness & poop.

As I've mentioned before, some innocent parental actions resulted in the strong imprinting of pandas on my greatest genetics project, so that our house is now filled with various likenesses of the great bicolor Chinese icons. That I can see only 3 from where I am sitting now is surprising -- and partly reflects the fact that it is dark outside. We have numerous books on giant pandas, and the school & public libraries have supplied more; tonight a new little book from Scholastic arrived mysteriously on TNG's pillow. He was eagerly reading it when he came to the fateful passage: "It says they're not bears!". But 'The Boy' knows better, and he knows why.

This is a recurring theme in panda books. For a long time the taxonomic placement of pandas was a matter of great dispute, with some assigning them bearhood, some placing them with raccoons, and some allotting pandas a unique clade. A related question concerned the affinity of giant pandas for red pandas, and of red pandas for the other carnivores. Finally, in the late 1980's the problem yielded to molecular methods, with the clear answer that pandas are bears, albeit at the root of the ursine tree.

What's surprising is how slowly this information has moved into the world of children's books. Of course, the public & school libraries often have books which predate the great resolution, so they are forgiven. Some explain that pandas are bears, but fail to give the evidence. And a few have caught up. But this Scholastic book wasn't one of them, despite having an original copyright solidly after the molecular studies AND a bunch of professors listed as advisors.

Given that TNG is so fond of pandas, and it is no secret, there are those (often adults) who will attempt to dissuade him of their bearness. So I've tried to coach him to go beyond simply asserting that they are bears, to explaining why science classes them so. And for an eight-year-old, he can give a pretty good 1-2 sentence summary.

Which leads us to scat. He merges the two a bit, certainly because of the affinity of his age group for matters excretory (which, of course, his cunning father considered in introducing this topic!). A key question in panda conservation is how many are in the wild. Between their secretive habits and dense bamboo forest habitat, it is difficult to spot a panda in the wild, let alone make a census (never mind those questionnaires!). So, as with many wild animals, DNA from panda scat is a convenient way to track individuals, and with this tracking the estimate of the number of pandas has shot up -- from the really depressing (to panda fans) ~1500ish to perhaps about a thousand more -- still in grave peril as a wild species, but a thousand more pandas napping in the woods is something to cheer. Unfortunately, the items on pandas in kids' magazines & kids' sections of newspapers still often quote the older figure.
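
For the curious, the arithmetic behind a scat census is genuinely eight-year-old friendly. One classic flavor is the two-survey mark-recapture estimate, sketched here in Python with invented numbers:

    def lincoln_petersen(n1, n2, m2):
        """Chapman's version of the Lincoln-Petersen population estimate.

        n1: individuals genotyped in the first survey
        n2: individuals genotyped in the second survey
        m2: individuals 'recaptured' -- seen in both surveys, by their DNA
        """
        return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

    print(round(lincoln_petersen(120, 110, 50)))  # ~262 individuals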

A similar sort of experiment came up as an item of controversy earlier this year. There are many things I find admirable about John McCain (which is not to say I'm voting for him -- I haven't decided & I won't tell once I do!), but his pandering on a bear issue earlier this year wasn't one of them. In his fight against congressional earmarks (a good thing), he had singled out a study in Yellowstone National Park which was sampling DNA from grizzly scat. Amongst his assaults on this study was the loudly asked question of what good this would do beyond setting up a bear dating service. Now, on the one hand I think scientists should be carefully thinking about why this important study is apparently being funded by earmark and not peer review. But it is truly sad when you can explain population sampling to an eight-year-old, but not to someone older than his father who wishes to run the country. Yes, bear counting isn't quite on the same scale as some of the other great scientific issues being discussed this election year. But, given that the source behavior of that study is often cited as a benchmark for veracity ("Does a bear..."), it wouldn't be a bad one to get right.

Sunday, July 27, 2008

Do we know how Velcade doesn't work?

In my recent piece on the proteasome inhibitor Argyrin A, a commenter (okay, so far THE commenter) noted something I can't argue with, that there is not a well nailed-down understanding of why proteasome inhibition is lethal to tumor cells. I probably should write up a further exploration of how they might work, but I really need to skim the literature for any new findings (so far, nothing stunning).

As Yogi Berra might say, Velcade (bortezomib) is effective in cancer except when it isn't. Indeed, in cell lines in culture the stuff is devastating, but that certainly isn't what's seen in the clinic. In that setting, there is this obnoxious tease of a signal in Phase I (remember, in oncology Phase I is tried in patients with the disease, not in healthy volunteers as in most indications) followed by a cruel let-down in Phase II. Even in diseases where the drug works, such as myeloma, it doesn't work in all patients, and some patients become resistant. Perhaps that resistance is the key to the puzzle: understand how tumors stop being sensitive and you'd understand the ones which are never sensitive to start with.

Three recent papers (in Blood, J Pharmacol Exp Ther & Exp Hematol) have found the same mechanism for this transition. Alas, none are free & I've only read the abstracts (one of many reasons to swing by the MIT library soon). All point to overexpression and/or mutation of PSMB5, the proteasome subunit which binds Velcade. Two of the papers report different point mutations, but both in the Velcade binding pocket, and in at least one case a reduced affinity for Velcade was demonstrated. Game, set & match?

Well, perhaps not. First of all, all three studies are in cell lines, two in closely related ones. As noted above, cell lines are highly imperfect for exploring proteasome inhibition in particular (and not uniformly reliable for oncotherapeutic pharmacology in general). Judging from the abstracts, none of them went fishing around in patient samples, or if they did they came up dry. Given that PSMB5 is an obvious candidate gene for bortezomib resistance, I'm pretty sure this one's been hammered on hard by my former colleagues. Nobody likes to publish in the Journal of Negative Results, which I'm pretty sure is where such a study would end up. Almost certainly some patients will be found who went from sensitivity to resistance due to mutations in PSMB5, but at the moment it's not the long-awaited (and much desired/needed) central hypothesis of why proteasome inhibition works and which patients it should be used in.

Thursday, July 24, 2008

Another missed Nobel

The newswires carried the story of Dr. Victor McKusick's passing today. McKusick was the first to catalog human mutations (as Mendelian Inheritance in Man, now better known as OMIM in its Online version), and can be truly seen as one of the founders of genomics. I won't claim to know his full biography, but compiling lists of human mutations way back when probably seemed like a bit of an odd task to a lot of his contemporaries.

This follows the sudden passing of Judah Folkman earlier this year; each death stole from us a great light in biology whom the Nobel Committee failed to recognize.

Of course, there are only three Medicine awardees a year (sometimes the biologists sneak in on the Chemistry prize, but clearly McKusick & Folkman would have been in consideration for the Medicine prize). Nobel picking is a strange and unfathomable world. I'm not complaining about anyone unworthy getting it (though the Nobels have some serious closeted skeletons from the early days -- prefrontal lobotomies for all!), but it's too bad so many miss out who would deserve it.

Monday, July 21, 2008

The curious case of the proteasome inhibitor Argyrin A

A burning set of questions at my old shop when I was there (and I have every reason to think it is still aflame) is why Velcade works in some tumors but not others, and how you could predict which tumors it will work in. Does the sensitivity to proteasome inhibition of myelomas & certain lymphomas generally (and of a seemingly random scatter of solid tumor examples) follow a pattern? And is this pattern a reflection of the inner workings of these cells, or more of how the drug is distributed throughout the body?

An even broader burning question is whether any other proteasome inhibitor would behave differently at either level. Would a more potent inhibitor of the proteasome have a different spectrum of tumors which it hit?

Now, while Velcade (bortezomib, fka PS) is the only proteasome inhibitor on the market, it will probably not always be so. Indeed, since Velcade has proven the therapeutic utility of proteasome inhibition, other companies and academics have been exploring proteasome inhibitors. The most advanced that I am aware of is a natural product being developed by Nereus Pharmaceuticals, which I will freely confess to not really following.

The featured (and therefore free!) article in July's Cancer Cell describes a new proteasome inhibitor, another natural product. Argyrin A was identified in a screen for compounds which stabilize p27Kip1, an important negative regulator of the cell cycle. Kip1 is one of a host of proteins reported to be importantly stabilized by proteasome inhibition (one of my duties back on Landsdowne Street was to catalog the literature on such candidates). While there are probably many ways to stabilize p27Kip1, what they reported on is this novel proteasome inhibitor.

By straightforward proteasome assays Argyrin A shows a very similar profile to Velcade. That is, the proteasome has multiple protease activities which can be chemically distinguished, and the pattern of inhibition by the two compounds is very similar. However, by a number of approaches they make the case that there are significant biological differences in the response to Velcade & Argyrin A.

Now there is a whole lot of data in this paper & I won't go into detail on most of it. But I will point out something a bit curious -- very curious. They performed transcriptional profiling (using Affymetrix chips) on samples treated with Velcade, Argyrin A, and siRNA vs an ensemble of proteasome subunits, each at different timepoints. In their analysis they saw lots of genes perturbed by Velcade but a very small set perturbed by Argyrin A and the siRNA. Specifically, they claim 10,500(!) "genes" (probably probesets) for Velcade vs 500 for Argyrin A. That's a huge fraction of the array moving!

Now, I'll confess things are a bit murky. Back at MLNM I would have had the right tools at my disposal & could quickly verify things; now I have to rely on my visual cortex & decaying memory. But when I browse through their lists of genes for Argyrin A in the supplementary data, I don't see a bunch of genes which are a distinct part of the proteasome inhibition signature. At MLNM, huge numbers of proteasome inhibition experiments were done & profiled on arrays, using a number of structurally unrelated proteasome inhibitors in many different cell lines. Not only does a consistent signal emerge, but when an independent group published a signature for proteasome inhibition in Drosophila there was a lot of overlap in their signature & our signature once you mapped the orthologs.

What's the explanation? Well, it could be that I'm not recognizing what is there due to poor memory, though I'm pretty sure I would. One thing that is worrisome is that the Argyrin A group's data are based on a single profile per drug x timepoint; there are no biological replicates. That's not uncommon, given the expense and challenge of microarray studies, but it leaves the conclusions on shaky ground. Nor was there any follow-up by another technology (e.g. RT-PCR) to show the effects across biological replicates or other cell lines. Given that these are tissue culture cells, which can behave screwy if you stare at them the wrong way, that's very unfortunate. Even small differences in the culturing of the cells -- such as edge effects on plates or humidity differences -- can lead to huge artifacts.

Another possible explanation is that the bortezomib-treated cells were sampled too late; the first Velcade timepoint is at 14 hours. After 14 hours, the cells are decidedly unhealthy and heading for death. The right times to sample were always a point of contention, but one suggestion that there is an issue is the lack of correlation between the different timepoints for Velcade vs the strong correlation for the other treatments (Figure 7). That works (in my head at least) in reverse too -- it's downright odd that the other treatments are so auto-correlated between 14 and 48 hours; if cells treated with Argyrin A are not yet dead at 14 hours but committed to die, one would expect some sort of movement away from the original profile.

One other curiosity. They do report looking for the Unfolded Protein Response (UPR) and report seeing it in the Velcade treated cells but not Argyrin A treated ones. The UPR is the cell's response to misfolded proteins -- and since disposal of misfolded proteins is a role of the proteasome, it has never surprised anyone that the UPR is induced by proteasome inhibitors. Can you really have a proteasome inhibitor that doesn't induce the UPR? If this is truly the case, it is very striking and deserves its own study.

Is the paper wrong? Obviously I can't say, but I really wonder about it. I also wonder if the referees brought up the same questions. Hopefully we'll see some more papers in the future which explore this compound in a wider range of cell lines and with more biological replicates.

Nickeleit et al.
Argyrin A reveals a critical role for the tumor suppressor protein p27(kip1) in mediating antitumor activities in response to proteasome inhibition.
Cancer Cell. 2008 Jul 8;14(1):23-35.

Wednesday, July 16, 2008

Forging into the gap

Gaps are important. There is a major brand by that name. Controversy over a perceived "missile gap" was a major issue in the Nixon-Kennedy election of 1960. Budget gaps cause governments to trim services. About a half an hour's drive west of where I grew up is the town of Gap, and a bunch of generations ago my ancestors probably passed through the Cumberland Gap.

Gaps occupy a special place in computational biology, specifically in the alignment of sequences and structures. As sequences evolve, they can acquire new residues (insertions) or lose residues (deletions), and so if we wish to align a pair of sequences we must put gaps in. Pairwise algorithms such as Needleman-Wunsch-Sellers and Smith-Waterman insert the optimal gaps -- given certain assumptions which include, but are not limited to, the match, mismatch, gap opening and gap extension penalties. Some pairwise alignment problems have been addressed by even more complicated gapping schemes. For example, if I am aligning a cDNA to a genomic sequence I may wish to have separate consideration of introns (a special case of gaps), gaps that would insert or remove multiples of three (codons), or gaps which don't fall into either of those categories.
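
The core of Needleman-Wunsch fits in a screenful. Here is a bare-bones Python sketch with a single linear gap penalty and arbitrary scores (real aligners typically use affine gap opening/extension costs):

    def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
        """Globally align strings a and b; return the two aligned strings."""
        n, m = len(a), len(b)
        # score[i][j] = best score aligning a[:i] with b[:j]
        score = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = match if a[i - 1] == b[j - 1] else mismatch
                score[i][j] = max(score[i - 1][j - 1] + sub,
                                  score[i - 1][j] + gap,   # gap in b
                                  score[i][j - 1] + gap)   # gap in a
        # Trace back from the corner to recover one optimal alignment
        out_a, out_b, i, j = [], [], n, m
        while i > 0 or j > 0:
            if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + \
                    (match if a[i - 1] == b[j - 1] else mismatch):
                out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
            elif i > 0 and score[i][j] == score[i - 1][j] + gap:
                out_a.append(a[i - 1]); out_b.append("-"); i -= 1
            else:
                out_a.append("-"); out_b.append(b[j - 1]); j -= 1
        return "".join(reversed(out_a)), "".join(reversed(out_b))

    print(needleman_wunsch("GATTACA", "GCATGCT"))

Every choice of those penalties bakes in an assumption about how sequences evolve, which is exactly the issue the paper below takes up.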

Multiple sequence alignment gets even harder. There are no exact algorithms to compute a guaranteed best alignment, so all methods have some degree of heuristics to them. Many algorithms are progressive, first aligning two sequences and then aligning another to that alignment and then another and so on, or perhaps aligning pairs of sequences and then aligning the aligned pairs and so on. Placement of gaps becomes especially tricky, as their placement in early alignments greatly influences the placement in later alignments, which could well be a bad thing.

Protein alignments in particular have the problem of trying to serve three masters, who are often but not always in agreement. An alignment can be a hypothesis of which parts of a protein serve the same role, a hypothesis as to which amino acids occupy similar positions in space, or a hypothesis as to which amino acids derive from codons with a shared ancestry. Particularly in the strongly conserved core of proteins these three are likely to be in agreement, but in the hinterlands of structural loops in proteins or disordered regions it's not so clear. There is also a bit of aesthetics that comes in; alignments just look neater and simpler when there are fewer gaps. Perhaps not quite Occam's Razor in action, but simplicity is appealing.

The June 20th issue of Science (yep, Science & Nature have been piling up) has a paper that addresses this issue and builds an algorithm unapologetically aligned to just the one goal: find the most plausible evolutionary history. They point out that while insertions and deletions are treated symmetrically by pairwise programs, they are quite asymmetric for progressive multiple alignment. The alignment gets to pay once for deleting something, but insertions (like overdue credit cards) incur a penalty with each successive alignment. It seems unlikely that nature works the same way, so this is undesirable.

One solution to this has been to have site-specific insertion penalties. Löytynoja & Goldman point out that this compensation often doesn't work and causes insertions to be aligned which are not homologous, in the sense that they each arose from a different event (indeed, such insertions should not be aligned with anything from an evolutionary point of view, though structurally or functionally an alignment is reasonable).

As an alternative, their method flags insertions made in early alignments so that they are treated specially in later alignments. The flagging scheme even allows insertions at the same position to be treated as independent -- they neither help nor penalize the alignment and are reported as separate entities.

Using synthetic data they tested their program against a number of other popular multiple aligners and found (surprise!) it did a better job of creating the correct alignment. They also simulated what getting additional, intermediate data does for the alignments -- and scarily, for the older alignment programs gap placement got worse (less reflective of the actual insertion/deletion history of the synthetic data).

The article closes with an interesting question: has our view of sequence evolution been shaped by incorrect algorithms? Is the dominant driver of sequence change in protein loops point mutations or small insertions/deletions?

Ari Löytynoja and Nick Goldman
Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis
Science 320(5883):1632
http://www.sciencemag.org/cgi/content/abstract/320/5883/1632

Tuesday, July 15, 2008

If life begins at conception, when does life start & when does it end?

Yesterday's Globe carried an item that Colorado is considering adopting a measure which would define a legal human life as beginning at conception. Questions around reproductive ethics and law raise strong emotions, and I won't attempt to argue either side here. However, law & ethics should be decided in the context of the correct scientific framework, and that is what I think is too often insufficiently explored.

Defining when life "begins" is often presented as a simple matter by proponents of the "life begins at conception" definition. However, to a biologist the definition of conception is not so simple. Conception involves a series of events -- at one end of these events are two haploid cells, and at the other is a mitotic division of a diploid cell. In between, a number of steps occur.

The question is not mere semantics. Many observers have commented that a number of contraceptive measures, such as IUDs and the "morning after" pill, would clearly be illegal under such a statute, as they work at least in part by preventing the implantation of a fertilized egg into the uterine wall. Anyone attempting to develop new female contraceptives might view the molecular events surrounding conception as opportunities for new pharmaceutical contraceptives. For example, a compound might prevent the sperm from homing in on the egg, binding to its surface, entering the egg, discharging its chromosomes, locking out other sperm from binding, or the pairing of the paternal chromosomes with maternal ones (there are probably more events; it's been a while since I read an overview). Which of these are no longer legal approaches under the Colorado proposal?

At the other end, if we define human life by a particular pairing of chromosomes and metabolic activity, then when does life end? Most current definitions are typically based on brain or heart activity -- neither of which is present in a fertilized zygote.

Again, the question is not academic. One question to resolve is when it is permissible to terminate a pregnancy which is clearly no longer viable. Rarer, but even more of a challenge for such a definition, are events such as hydatidiform moles and "absorbed twins".

In a hydatidiform mole, a conception results in abnormal development; the chromosome complement (karyotype) of these tissues is often grossly abnormal. Such tissues are often largely amorphous, but sometimes recognizable bits of tissue (such as hair or even teeth) can be found. Absorbed twins are the unusual, but real, phenomenon of one individual carrying a remnant of a twin within their body. Both of these conditions are rare (though according to Wikipedia, in some parts of the world 1% of pregnancies are hydatidiform moles!) but can be serious medical issues for the individual carrying the mole or absorbed twin.

Are any of these questions easy to answer? No, of course not. But they need to be considered.

Wednesday, July 09, 2008

Do-it-yourself genomics: bad advice is bad advice

GenomeWeb's frequently entertaining Daily Scan notes that Wired magazine has a wiki which gives instructions on how to explore your own genome, including how to do your own genetic testing by home-PCRing your DNA and sending it to a contract lab for sequencing.

It isn't a very good idea, but that doesn't mean people won't try it. Doing a simple PCR really is pretty easy; I've done it in a hotel ballroom (proctoring a high school science fair sponsored by Invitrogen). Instructions for homebrew thermocyclers are surely out there; a number were published in the early days of PCR. But that doesn't mean getting good results is easy. Sticking to a purely technical level, are Wired's instructions very good?

I'd say no. I suppose I should even register to edit the wiki, but at the moment I'll limit myself to pointing out some of the technical issues that are ignored or glossed over (the material I quote below may well change, since it is a wiki).

The first obvious area is primer design. Wired's instructions are pretty simple:
Designing them may be the hardest step. Look up the DNA sequence flanking your genetic marker of interest in a database like dbSNP. Pick a segment that is about 20 bases long and slightly ahead of the marker. That is your forward primer. Pick another 20ish base sequence that is behind the region of DNA that you want to study. Use a web app of your choice to find its reverse complement.


Alas, this will frequently be a recipe for disaster. As for my own qualifications for making that claim I will state that (a) I regularly design PCR amplicons in my professional life and (b) I have a much greater appreciation for my ignorance about how PCR can go awry than the average biologist. Leading the list of pitfalls is designing a primer with too low a Tm -- if those 20 nucleotides are mostly A & T, it won't work well. Second would be if the two primers will anneal to each other; you'll get lots of primer-dimer and little else. Equally bad would be a primer that can prime off itself. Third would be if the primers aren't specific to your targeted region of the genome. Prime off a conserved Alu piece and you are in real trouble.
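
To see how simple the basic screens are (and how little excuse there is for skipping them), here is a naive Python sketch; the Wallace 2+4 Tm rule and the 4-base complementarity run are crude rules of thumb, and a real tool such as Primer3 checks far more:

    def complement(seq):
        return seq.translate(str.maketrans("ACGT", "TGCA"))

    def wallace_tm(primer):
        """Rough melting temperature: 2 C per A/T, 4 C per G/C."""
        return 2 * sum(primer.count(b) for b in "AT") + \
               4 * sum(primer.count(b) for b in "GC")

    def dimer_risk(p1, p2, min_run=4):
        """Naive check: do the primers share a complementary run >= min_run?"""
        rc2 = complement(p2)[::-1]  # reverse complement of the second primer
        return any(p1[i:i + min_run] in rc2
                   for i in range(len(p1) - min_run + 1))

    bad = "ATATATATTTATATATATAT"  # a deliberately awful, AT-only 20-mer
    print(wallace_tm(bad))        # 40 -- far too low a Tm to prime reliably
    print(dimer_risk(bad, bad))   # True: it anneals to itself (primer-dimer)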

The really silly part about this advice is that there are free primer design programs all over the internet, and some of the sites will perform nearly all of the checks mentioned above.

The rules for placement are much trickier than suggested. If you are going to sequence (and you might be sequencing heterozygous DNA; see below), then you really need the primers to be at least 50 nucleotides away from what you care about -- there is a front of unincorporated dye which often drops the quality any closer than this.

Even more of a concern is the sequence data itself. Wired makes it sound easy:
Once that's done, you can buy sequencing equipment and do it yourself, or send the sample off to any one of many sequencing companies and they will do it for about five dollars.

If you are sequencing uncloned PCR products, then you are sequencing a population. If you are heterozygous for a single nucleotide, that means that nucleotide will read out as a mix -- two overlapping peaks of perhaps half height. A deletion or insertion ("indel") will make the trace "double peaked" from that spot on.

Those are the best-case scenarios. If you had poor-quality amplification (due to badly designed primers or just a miserable-to-amplify region), all those truncated PCR products will be in the sequencing mix as well, further degrading your signal. If your SNP is in a region expanded due to copy number variation, then life is even harder.

Which gets to another point: Wired seems to be ignorant of copy number variants. Their testing recipe certainly won't work there.

The idea of untrained, emotionally involved individuals trying to interpret good genetic data is scary enough (Wired's example of celiac disease, as pointed out over at DNA and You, is a particularly problematic one); scarier is to overlay lots of ambiguity and error due to sloppy amateur technique. Hopefully, few will have the energy & funds to try it.

Monday, July 07, 2008

History Forget: How not to explain the impact of Prozac

Having escaped the usual abode for the weekend, I had a pile of accumulated newspapers to digest on the train this morning. The Sunday Globe Ideas section caught my eye with an item by Jonah Lehrer titled "Head Fake: How Prozac sent the science of depression in the wrong direction". It's not an awful article -- once you get past that subtitle. But it isn't a great article either.

The article puts forth the thesis that Prozac led to a chemical theory of depression, which recent literature has seriously upended. Alas, that greatly distorts the history.

Prozac was not the first successful antidepressant, nor the real antecedent of a chemical theory of depression. Early antidepressants such as the tricyclics and monoamine oxidase inhibitors opened the path to thinking that depression was due to imbalances in specific neurotransmitters. Prozac itself, as a Selective Serotonin Reuptake Inhibitor (SSRI), was an outgrowth of that work -- given the previous success with psychoactive drugs which seemed to affect many neurotransmitters, and evidence that specific neurotransmitters might be more important for specific psychological diseases, it was natural to try to zoom in on one neurotransmitter. Prozac then is not a paradigm shifter (a la Kuhn) but an extension of the existing paradigm. The success of SSRIs, due partly to a significantly attenuated side effect profile, partly to a lot of popular press, and partly to marketing, merely pushed an existing theory up the ranks, particularly in the popular zeitgeist.

Lehrer does do a nice job of summarizing some recent work suggesting how antidepressants may really work, which is that they may help neurons heal (a new paradigm of depression as a neurodegenerative disease). In a recent conversation a clinician acquaintance noted some of the same key points to me (I'll confess to not having read the literature myself), so there's nothing wrong here. He also notes that it was the investigation of observations inconsistent with the predictions of the chemical imbalance theory, such as the frequently observed time lag between beginning antidepressant therapy and seeing results, which led to the new theory.

But getting back to that irksome subtitle, did Prozac steer "the science of depression in the wrong direction" or simply on a winding path? Yes, the chemical imbalance theory looks like it may be down for the count. However, it was that very same theory, via its shortcomings, that led to the new theory. This is how science works -- it's often indirect & messy. That's an important message that's lost (or nearly so) in the piece. SSRIs were perhaps a blunt tool, but they are the tool which has unlocked a new understanding of the topic.

Could we have gotten to the current understanding of depression without SSRIs and other chemical antidepressants? That's an exercise in alternative history best left to experts in the field, if anyone. Perhaps we might have, but perhaps not -- or would have via an even more tortuous path. It is important to get out the story of how pharmaceutical antidepressants do and do not work, but it is equally important to get out the story of how science really works.

Thursday, July 03, 2008

Myeloma unified?

Blogging on Peer-Reviewed Research
Multiple myeloma is a complex disease. Perhaps one metaphor is that of the mythical Hydra -- each time a new molecular tool is thrown at it, the number of vicious heads increases. For example, there are different chromosomal translocations which lead to myeloma. If you look at myeloma samples by transcriptional profiling, you can find distinct expression signatures for each translocation -- and just as easily find ways to split those signatures into further subtypes. For example, some translocations activate one gene disrupted by the translocation, whereas other instances of the same translocation will activate both deranged genes.

Another possible metaphor is the old fable of blind men examining an elephant -- each reports that the object is different, based on examining a different portion of the beast. In the case of myeloma, one examiner might focus on the subset with large portions of the genome amplified, others on specific deletions on chromosome 13, another on those cases where bone destruction is rampant. My own experience with palpating the pachyderm looked at the response to a specific drug.

Now the Staudt lab has come out with a paper in Nature which proposes lumping everything back together again. Initially using a retroviral RNAi screen they identified the transcription factor IRF4 as a unifying theme of myeloma. IRF4 is activated in one characteristic translocation and plays an important role in B-cell development, so it's not a total shock. But linking it across multiple types is surprising.

The screen achieved 2-8 fold knockdown of IRF4 in 3 different myeloma cell lines, each possessing a different hallmark translocation (one of which was an IRF4 translocation). This was later extended to additional myeloma lines with similar lethality, but the knockdown of IRF4 in lymphoma lines had little effect, save one line possessing a translocation of IRF4.

One interesting surprise is that with the exception of the known IRF4 translocation bearing line, none of the lines have amplifications or other obvious derangements of IRF4. Only one showed point mutations upon resequencing. Hence, somehow IRF4 is being activated but not via a painfully obvious mechanism.

RNAi approaches can suffer from off-target effects: genes not meant to be hit cause the phenotype being studied, rather than the believed target. The paper provides strong evidence that the effects really are driven by IRF4 knockdown -- not only were multiple shRNAs targeting IRF4 found to kill myeloma cells, but one of these targets the 3' untranslated region of IRF4 -- and the phenotype could be rescued by expressing IRF4 lacking the 3' UTR.

Transcriptional profiling of the knockdown lines in comparison with parental lines revealed a number of candidate IRF4 targets, and a large number of these were also identified by chromatin immunoprecipitation-chip (ChIP-chip) studies, confirming them as direct IRF4 targets. As noted, some direct targets may have been missed by ChIP-chip due to limitations with the arrays used. One other interesting aspect: the IRF4 target list in myeloma lines somewhat resembles a union of that in plasma cells (the normal cell myelomas are most kin to) with that of antigen-stimulated B-cells.

A particularly interesting direct IRF4 target identified in this study is the notorious oncogene MYC. A number of identified IRF4 targets are also known MYC targets, suggesting synergistic activation. They also found that both IRF4 and MYC bind upstream of IRF4 -- suggesting a complex web of positive feedback loops.

An interesting further bit of work targeted various identified IRF4 targets and showed these knockdowns to be lethal to myeloma cell lines. Hence it is suggested that IRF4 ablation in myeloma would lead to tumor cell death by many routes. Mice heterozygous for IRF4 deletion are viable, suggesting that IRF4 could be targeted safely.

The catch would be targeting IRF4 -- transcription factors are on nobody's list of favorite targets. The authors cite as points of optimism approaches targeting p53 & BCL6. However, the p53 targeting route is by inhibiting an enzyme which destabilizes p53, so an analogous approach to IRF4 would require first identifying key determinants of its stability. The BCL6 example they cite uses a peptide mimic, not something the medicinal chemists love much.

Other approaches to targeting IRF4 might focus on "druggable" (if any) genes in the IRF4 target lists, or perhaps something else. I'll try to put together a post next week on one of those candidate elses.

Now that Staudt's group has brought things together, it is tempting to contemplate slicing off some more Hydra heads. How do IRF4 target gene profiles differ across the chromosomal aberration subtypes of myeloma? Do IRF4 targets have any predictive value for determining the appropriate medication, or show differential response to different medications?