Monday, March 25, 2013

in Mozambique the sunny sky is aqua blue

I really suck at writing in my lab book. So I figure if I can channel my desire for self-promotion into a way to actually keep track of what I am doing in lab (especially my next generation sequencing work), I can kill two birds with one stone: have an actual record of what I have done, and achieve fame, fortune and maybe even notoriety in the process. Besides, it gets lonely rattling around here inside the computer. All those 0's and 1's, with only A, C, T and G to keep me company. And the occasional U, of course.

Now I know y'all are dying for a recap of my work, so here goes: this particular project started in Mozambique with the collection of marine specimens, namely Terebrids. Terebrids, much like the more renowned cone snails, are venomous marine snails that use a delicious cocktail of up to 200 toxins to snare their prey. My theory on the 200 toxins is that a slow-moving, not terribly bright snail needs all the help it can get to score lunch (usually worms). And such a cornucopia of toxins gives a biochemist with genetic leanings, such as myself, a veritable field day: extracting RNA from venom ducts, performing next generation sequencing (NGS), and analyzing the results.

Just a word or two for the lay person about NGS. Here is a simple recipe: extract some RNA. Copy it into cDNA (which has no introns, only exons, plus handy stuff like polyA tails). Blast it to bits (wheeee!). Take those bits and throw them willy-nilly on a massively parallel platform (e.g. Illumina) that will start sequencing away like there is no tomorrow. Millions of sequencing reactions take place simultaneously; the excitement is almost boundless. And then soon, one day very soon, you will find that you have in excess of 500 million "raw reads" (translation: your chopped-up cDNA, now sequenced in lengths of, say, 100 bp). All this data (after quality checks) is now your baby, and to think that in this case it all started with these itsy-bitsy venom ducts (think very small fingernail trimmings) from the species Terebra anilis.
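For the computationally curious, here is a toy sketch of the "blast it to bits and sequence it" step, plus the back-of-envelope arithmetic on how much data 500 million reads really is. It assumes uniformly random fragment positions and perfect, error-free reads; real Illumina libraries involve size selection, adapters, paired ends and sequencing errors, none of which are modeled here. The function name `simulate_reads` and the made-up sequence are purely illustrative, not real Terebra data or anyone's actual pipeline.

```python
import random


def simulate_reads(cdna: str, read_length: int = 100, n_reads: int = 10, seed: int = 0) -> list[str]:
    """Toy model of short-read sequencing: pick random start positions
    along a cDNA and return fixed-length substrings as 'reads'.
    Assumes uniform fragmentation and no sequencing errors."""
    if len(cdna) < read_length:
        raise ValueError("cDNA is shorter than the read length")
    rng = random.Random(seed)  # seeded so runs are reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(cdna) - read_length)  # random "shear" point
        reads.append(cdna[start:start + read_length])
    return reads


# A made-up 1,000 bp "cDNA" (hypothetical, random bases only)
seq_rng = random.Random(42)
toy_cdna = "".join(seq_rng.choice("ACGT") for _ in range(1000))

reads = simulate_reads(toy_cdna, read_length=100, n_reads=5)
print(len(reads), {len(r) for r in reads})  # 5 {100}

# Back-of-envelope data volume: 500 million reads x 100 bp each
total_bases = 500_000_000 * 100
print(f"{total_bases:,} bases")  # 50,000,000,000 bases
```

In other words, roughly 50 billion bases of sequence, all of which still has to be stitched back together before it means anything.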

You decide to throw a party. That's a lot of reads! Visions of a paper in Nature Genetics start dancing in your head. So you do indeed throw a party, and guzzle lots of alcohol. Maybe other substances are involved (I couldn't possibly comment my dear. But you might think so.)

And then.... the RECKONING. Not only is your head splitting in two, but you realize you have no freaking clue how you are going to approach this data in order to turn it into something meaningful. I mean, you have read all kinds of papers on the subject and have marveled at contig assembly and its associated statistics, pondered the myriad approaches to BLASTing the data against all those lovely NCBI databases, thrilled to the challenges of doing this all de novo (i.e. no reference genome or even EST database to map anything to), and you realize.... well, quite frankly, you realize you don't have a fucking clue.

Fortunately, your advisor comes to the rescue. She assures you that this is not a problem; it will simply be necessary to learn how to do it. No need for assistance from any quarter: that bioinformatics class I took in my first year of grad school should see me through. I start babbling about Linux and FTP sites and contig assembly programs and Perl and Python and bash, about which I most assuredly know nothing. As in nada, niente, rien, nichts.

And here, my friends, are the bare bones of the thing. Will our heroine have the moxie and smarts to triumph over these adverse circumstances? Or will she be flattened by a 2 terabyte computing cluster? Is there a knight in shining armor on the horizon, in the form of the deeply coveted computer geek? Oh Lancelot, where are you in my hour of need?

(stay tuned, but a little spoiler, Lancelot is going to be a she.)