BioTorrents part II

It's time to watch BioTorrents in action, to see whether it is a useful tool for my open laboratory notebook experiment. I have decided to create, upload and seed my torrent(s) from a remote computer (the Albiorix cluster in this case) to make sure the data is available at all times. Although the dataset I will upload in this test is quite small, I figure that if this method turns out to be useful, it will be a good way of distributing the various transcriptome datasets we are currently working on. Carrying them around on a laptop in order to seed them is not really feasible.

To do this I will use the script available for download here, and mktorrent v1.0, available here. Installing mktorrent is really simple: just decompress the tarball, cd into the newly created directory and type make install as root, and it will be installed in /usr/local/bin. The Perl script is automatically personalized upon download (it will contain a user id and two passkeys) and should therefore not be installed in a directory that is accessible to everyone. Instead, I will use my personal bin/ folder in my home/ directory.
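Because the personalized script holds a user id and passkeys, it is worth making sure nobody else on a shared machine can read it. The Python sketch below shows one way to do that on a POSIX system. The helper names install_private and is_private, and the script name in the usage note, are hypothetical and not part of BioTorrents.

```python
import os
import shutil
import stat
from pathlib import Path


def install_private(script: Path, bin_dir: Path) -> Path:
    """Copy a personalized script into a private bin directory.

    The script contains a user id and passkeys, so the copy is made
    readable, writable and executable by its owner only (mode 700).
    """
    bin_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    target = bin_dir / script.name
    shutil.copyfile(script, target)
    os.chmod(target, stat.S_IRWXU)  # rwx for the owner, nothing for group/others
    return target


def is_private(path: Path) -> bool:
    """Return True if neither group nor others have any permission bits set."""
    mode = path.stat().st_mode
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For example, install_private(Path("upload_script.pl"), Path.home() / "bin") would copy a (hypothetically named) upload_script.pl into ~/bin with mode 700, and is_private can be used afterwards to double-check the permissions.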

For the dataset to upload, I will start with a couple of files from a small study of the Tic20 gene family that was recently published in Plant Signaling & Behavior, as an addendum to this paper in The Plant Journal (my bad, neither of them is open access, but (EDIT) at least one of them can be downloaded from my publications page). In my torrent, I will include the aligned sequences in a nexus file that also contains the MrBayes block used, a file containing the published consensus tree, and the *.mcmc file with data on how the Metropolis-coupled Markov chains behaved during the analysis. Finally, I have also created a small text file with a short description of these three files, and stored all the files in a directory called Topel_Jarvis_2011_Tic20_PSB.
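What mktorrent does with a directory like Topel_Jarvis_2011_Tic20_PSB follows the BitTorrent metainfo format: the files are concatenated, split into fixed-size pieces, each piece is SHA-1 hashed, and the result is stored in a bencoded "info" dictionary whose own SHA-1 hash identifies the torrent. The Python sketch below illustrates that idea. It is not mktorrent's code; the sorted file order and the piece length are assumptions of this sketch, so its info hash will not necessarily match a torrent created by mktorrent.

```python
import hashlib
from pathlib import Path


def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts in bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted (raw byte) order.
        items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_dict(directory: Path, piece_length: int = 2 ** 18) -> dict:
    """Build a multi-file 'info' dictionary for every file under directory."""
    files, pieces, buffer = [], [], b""
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        data = path.read_bytes()  # whole file in memory: fine for small datasets
        files.append({"length": len(data),
                      "path": list(path.relative_to(directory).parts)})
        buffer += data
        # Hash every complete piece; a piece may span a file boundary.
        while len(buffer) >= piece_length:
            pieces.append(hashlib.sha1(buffer[:piece_length]).digest())
            buffer = buffer[piece_length:]
    if buffer:
        pieces.append(hashlib.sha1(buffer).digest())  # final, shorter piece
    return {"name": directory.name, "piece length": piece_length,
            "files": files, "pieces": b"".join(pieces)}


def info_hash(info: dict) -> str:
    """SHA-1 of the bencoded info dictionary, used to identify the torrent."""
    return hashlib.sha1(bencode(info)).hexdigest()
```

Calling info_hash(info_dict(Path("Topel_Jarvis_2011_Tic20_PSB"))) would return a 40-character hex string. Reading each file fully into memory keeps the sketch short, but a streaming version would be needed for large transcriptome files.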

The script contains brief instructions on its use (just open the file in a text editor). However, running the script with only the "-h" flag gives more detailed instructions. The help section explains that the script has two mandatory options: Category and License. The current dataset clearly falls under the "Phylogenetics" category, and after some hesitation, and googling, I decided to go for the Creative Commons Attribution license (EDIT: The "Public domain" license seems to be more frequently used for the datasets uploaded to BioTorrents). Perhaps it is considered common knowledge how to choose between the 10+ licenses that are presented in the help text. Still, I would appreciate it if the BioTorrents site could provide some guidelines, or link to some resource that could help users like me select a license for their data.
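Since both options are mandatory and are given as numeric codes, a small wrapper that maps readable names to codes can help avoid mix-ups. This is a hypothetical Python sketch: the script name is a placeholder, and the two codes are only inferred from the -c 5 -l 2 command used in this post (Phylogenetics and Creative Commons Attribution). The script's own -h output is the authoritative list.

```python
import shlex

# ASSUMPTION: codes inferred from the "-c 5 -l 2" command in this post;
# check the upload script's -h output for the full, authoritative list.
CATEGORY_CODES = {"Phylogenetics": 5}
LICENSE_CODES = {"Creative Commons Attribution": 2}


def upload_command(script: str, directory: str,
                   category: str, license_name: str) -> list:
    """Return the argument list for the (hypothetically named) upload script.

    Both options are mandatory, so an unknown name raises ValueError
    instead of silently leaving a flag out.
    """
    if category not in CATEGORY_CODES:
        raise ValueError(f"unknown category {category!r}; check the -h output")
    if license_name not in LICENSE_CODES:
        raise ValueError(f"unknown license {license_name!r}; check the -h output")
    return [script,
            "-c", str(CATEGORY_CODES[category]),
            "-l", str(LICENSE_CODES[license_name]),
            directory]


if __name__ == "__main__":
    cmd = upload_command("./upload_script.pl", "Topel_Jarvis_2011_Tic20_PSB/",
                         "Phylogenetics", "Creative Commons Attribution")
    # Printed for inspection; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Run as a script, this prints ./upload_script.pl -c 5 -l 2 Topel_Jarvis_2011_Tic20_PSB/. Keeping the mapping in one place means a wrong or missing code fails loudly before anything is uploaded.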

So, here we go!

[mtop@pc158250 torrents]$ -c 5 -l 2 Topel_Jarvis_2011_Tic20_PSB/
Upload Successful! Please start seeding the torrent now.
[mtop@pc158250 torrents]$ 

And there it is. The new torrent has been uploaded to the BioTorrents site, and is visible when "dead" torrents are included in the display options. The torrent is flagged as "dead" since I have not yet started seeding it. More about that later.

In conclusion, BioTorrents seems to provide everything I need for sharing data as part of my open laboratory notebook experiment. Compared to for example, with BioTorrents I can distribute data from a project that has not yet been published (or will never be "associated with a publication", as their FAQ puts it). I can handle uploads etc. from the command line, which makes everything so much easier (no web GUI needed, which can be quite confusing for someone with dyslexia, like myself), and I can rely on one of my own servers for seeding my data, instead of having my laptop do all the work.