ProteomeCommons.org Tranche Network

The ProteomeCommons.org Tranche Network is a secure peer-to-peer (P2P) network of computers used to share scientific data, primarily proteomics data. Tranche is sometimes referred to as "a BitTorrent for scientific data" because it works similarly to the popular BitTorrent tool; however, Tranche includes many security features that make it more appropriate for sharing scientific data sets. The Tranche tools are free to use, open-source, and may be found at http://tranche.proteomecommons.org.

History of Tranche
The ProteomeCommons.org Tranche project started as part of Jayson Falkner's PhD work at the University of Michigan, Ann Arbor, in 2005. The work was done as part of the National Resource for Proteomics and Pathways (Grant # P41 RR018627), directed by Philip C. Andrews. Early versions of Tranche were used to aid in the collection of data for the ABRF sPRG 2006 study, and the first version of the Tranche project was presented at the 2006 ASMS in Seattle, WA. Jayson Falkner graduated in 2008, and work on Tranche is continued by the Tranche group at the University of Michigan and by Single Organism Software Inc (SOSI), a company co-founded by Jayson.

Tranche quickly became a widely used solution to the data sharing problem in proteomics. Several proteomics journals have made recommendations that require data sharing; however, none of those recommendations actually proposed a method of sharing the data. Tranche provides an ideal, free-to-use solution, which is supported by a recommendation for use in a Nature Biotechnology editor's note. Tranche's growth and use can further be observed in the statistics collected and displayed on the tranche.proteomecommons.org web page. Currently more than 5,000 data sets are online, including several million files and multiple terabytes of data.

Tranche has been used by several other proteomics resources, including the obvious use by the ProteomeCommons.org data pages. The ProteinPedia project is another example; it annotates proteomics information and links to raw data stored in Tranche. Vanderbilt University Medical Center also uses Tranche to archive several data sets related to its bioinformatics tools, primarily work done by David Tabb. A more complete list of similar collaborations can be found at http://tranche.proteomecommons.org/examples. While use of Tranche may be completely hidden from users, several groups rely on the Tranche website to directly provide homepages for archives of data that have been collected. The National Cancer Institute (NCI) Mouse Models and CPTAC initiatives are two examples of such work, and both the ABRF and HUPO organizations use Tranche in a similar fashion.

Use of Tranche has spread from proteomics into other disciplines of science, including glycomics, metabolomics, and 2D gel data. However, the ProteomeCommons.org Tranche Network still primarily consists of tandem mass spectrometry proteomics data. Development and support of the Tranche codebase and tools continue to be provided by the University of Michigan. Commercial services related to storing data in Tranche and developing code for Tranche can be obtained from SOSI. The entire Tranche project is open-source, free to use, and anyone may participate in development of the code base.

Key Features of Tranche

Technical presentations and posters are archived on the Tranche download page; however, the key features of Tranche are summarized below.

* Multiple computers distribute the task of hosting data
** Data is typically compressed and encrypted
** Data is split into 1 MB chunks for efficient distribution
* All data is digitally signed by the user(s) that uploaded it
** Files can be blocked if they are uploaded by untrusted users
** Files can be revoked/deleted from the network if desired
** Downloads can be verified to ensure the content is as expected, e.g. not a virus
* All data is identified by a digital hash named the "Tranche Hash"
** MD5 + SHA-1 + SHA-256 + file length comprise a "Tranche Hash"
** Uploading the same file twice doesn't require more disk space
** Data can be looked up regardless of which server it is currently on
** The hashes enable an efficient, index-free method of looking up data on a Tranche network
** After download, the Tranche hash can be used to check that a file's contents haven't changed since publication
* Clear licensing terms when data is uploaded, including beta-support for
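
The hashing and chunking features listed above can be illustrated with a minimal Python sketch. It is not Tranche's implementation: the names CompositeHash, composite_hash, split_chunks, and ChunkStore are hypothetical, the way Tranche actually encodes or orders the parts of a Tranche Hash is not described here, and hashing each chunk separately is an assumption made for illustration. The sketch only shows how a key built from MD5 + SHA-1 + SHA-256 + length can give duplicate-free storage, lookup without an index of server locations, and verification after download.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

# 1 MB, matching the chunk size named in the feature list.
CHUNK_SIZE = 1024 * 1024


@dataclass(frozen=True)
class CompositeHash:
    """Hypothetical stand-in for a Tranche Hash: three digests plus the length."""
    md5: str
    sha1: str
    sha256: str
    length: int


def composite_hash(data: bytes) -> CompositeHash:
    """Compute MD5, SHA-1, and SHA-256 digests and record the byte length."""
    return CompositeHash(
        md5=hashlib.md5(data).hexdigest(),
        sha1=hashlib.sha1(data).hexdigest(),
        sha256=hashlib.sha256(data).hexdigest(),
        length=len(data),
    )


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """In-memory, content-addressed store (an assumption, not Tranche's server)."""

    def __init__(self) -> None:
        self._chunks: dict[CompositeHash, bytes] = {}

    def put(self, data: bytes) -> list[CompositeHash]:
        """Store data chunk by chunk and return the keys needed to rebuild it."""
        keys = []
        for chunk in split_chunks(data):
            key = composite_hash(chunk)
            # Identical content maps to the same key, so it is stored only once.
            self._chunks.setdefault(key, chunk)
            keys.append(key)
        return keys

    def get(self, keys: list[CompositeHash]) -> bytes:
        """Look chunks up by hash alone and verify each one before returning."""
        parts = []
        for key in keys:
            chunk = self._chunks[key]
            if composite_hash(chunk) != key:
                raise ValueError("chunk failed verification")
            parts.append(chunk)
        return b"".join(parts)

    def __len__(self) -> int:
        return len(self._chunks)
```

In this sketch, calling put twice with the same bytes leaves the number of stored chunks unchanged, and get needs only the list of hashes, not the location where each chunk was first stored.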


How is Tranche Different from BitTorrent?

While Tranche is a P2P tool similar to BitTorrent, significant changes were made so that Tranche would be appropriate for sharing scientific data sets or, in a broader sense, any data that requires tracking which users originally uploaded it. Here is a short list of how Tranche is similar to and different from BitTorrent.

Similarities compared to BitTorrent
*Uses P2P practices to efficiently share data and scale well with large volumes of users
*Free to use and open-source

Differences compared to BitTorrent
*Licensing terms for use of data must be selected prior to upload
*Users are not anonymous. All data have signatures from the uploading users.
*Data may be revoked from the network if needed
*Individual files may be downloaded from an upload that includes multiple files, e.g. there is no need to download the entire ZIP in order to get one file.
*Data may be passphrase protected (AES 256 encryption) for restricted access
*Primarily used for sharing scientific data. No known illicit content on the network.

Related Publications
*Archive of Tranche Presentations and Posters
*Andrews PC and Falkner JA, "Open Access to Proteomics Data: A Valuable Resource for Biology and Medicine", Journal of Proteome Research 6(6): pp 2047-2048, 2007
*Falkner JA, Hill JA, Andrews PC, "Proteomics FASTA Archive and Reference Resource", Proteomics (2008)
*Falkner JA, Andrews PC, "Publicly dissemination large amounts of proteomics data in a secure and scalable fashion", http://tranche.proteomecommons.org (in submission)
*Falkner JA, Ulintz PJ, and Andrews PC, "A Code and Data Archival and Dissemination Tool for the Proteomics Community", American Biotechnology Laboratory, Apr, 2006


 