user story

Globus enables multi-institutional data sharing and collaboration in breakthrough discovery

Telomere-To-Telomere Consortium


Globus helps the Telomere-to-Telomere (T2T) consortium fill the gaps and generate the first complete human genome assembly


Cover of Science magazine

The Human Genome Project, completed in 2003, covered about 92% of the total human genome sequence. The goal of the Telomere-to-Telomere (T2T) project was to fill the gaps and finish the first complete human genome assembly. The project was led by Adam Phillippy of NIH/NHGRI and Karen Miga of UC Santa Cruz, together with an international collaboration of over 50 institutions.

The Challenges

The technologies needed to decipher the remaining gaps did not exist at the time, but scientists knew that the last 8% likely contained information important for fundamental biological processes. Creating a genome assembly consists of sequencing segments of DNA and then computationally assembling them into full-length chromosomes. The missing regions consisted of highly repetitive sequences that could not be sequenced, or could not be assembled, with the technology then available. Additionally, highly similar, recently duplicated regions of the genome could not be distinguished from each other for proper placement in the assembly.

Technology Advances Fill the Gaps

The advent of long-read DNA sequencing technologies, which can capture continuous segments of DNA tens to hundreds of thousands of bases in length, gives context to repetitive DNA sequences and allows them to be assembled correctly. Two different long-read technologies, along with other sequencing techniques that help determine the order of DNA segments, provided the data foundation for the T2T genome. The new sequencing techniques are far faster and less costly, but not as accurate as the approaches used in the initial development of a human genome. This necessitated generating large amounts of data and developing new algorithms to build an assembly from diverse input sets.

Globus Enables Multi-institutional Collaboration and Sharing

As this was a distributed, collaborative project with open access data, exchanging large amounts of data in a timely manner was essential. Globus was used to submit and exchange data during the development of the project. The team created a Globus guest collection at the NIH to manage permissions for accessing data in subfolders within the collection. Using the federated identity and group-based access control mechanisms in Globus greatly simplified the process, and enabled collaboration and sharing for researchers at the more than 50 institutions involved in the project. Over the course of the project, 150 users shared over 100 TB of data and transferred 250 TB of data in 168,000 files.
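As a rough illustration of per-subfolder, group-based permissions on a guest collection, the sketch below builds access rule documents in the shape used by the Globus Transfer API. The folder names, group IDs, and the helper functions are hypothetical and are not taken from the T2T project; the sketch only constructs the rules and does not contact Globus.

```python
# Minimal sketch, assuming a hypothetical folder layout and placeholder group IDs.
# It only builds access rule documents; nothing is sent to Globus.

def make_group_acl_rule(group_id: str, path: str, permissions: str = "r") -> dict:
    """Build one access rule granting a Globus group access to a folder."""
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' (read) or 'rw' (read/write)")
    if not path.startswith("/"):
        raise ValueError("path must be absolute within the collection")
    if not path.endswith("/"):
        path += "/"  # directory paths in access rules end with a slash
    return {
        "DATA_TYPE": "access",
        "principal_type": "group",
        "principal": group_id,
        "path": path,
        "permissions": permissions,
    }


def build_rules(layout: dict) -> list:
    """Turn {folder: (group_id, permissions)} into a list of access rules."""
    return [
        make_group_acl_rule(group_id, folder, perms)
        for folder, (group_id, perms) in sorted(layout.items())
    ]


# Hypothetical layout: one group may upload raw reads, another may only read assemblies.
layout = {
    "/raw_reads": ("00000000-0000-0000-0000-000000000001", "rw"),
    "/assemblies": ("00000000-0000-0000-0000-000000000002", "r"),
}

rules = build_rules(layout)
for rule in rules:
    print(rule["path"], rule["principal"], rule["permissions"])

# With the globus-sdk package and an authorized TransferClient `tc`, each rule
# could then be submitted with tc.add_endpoint_acl_rule(collection_id, rule).
```

Keeping one rule per subfolder and per group means that membership changes are handled in the group itself, so the rules on the collection rarely need to change as collaborators join or leave.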

The Results

Figure: Sequences resolved by the T2T-CHM13v2.0 reference genome. Source: T2T consortium

The release of the T2T genome has provided new opportunities for exploring difficult regions of the genome. It is already providing new insights into human brain development and other biological mechanisms. While improvements in medical genomics move more slowly, the human T2T assembly is now an additional tool for understanding the impact of changes in a patient’s genome.

The most immediate result of the human T2T project has been to accelerate the creation of other complete human and non-human genome assemblies. The protocols developed by the T2T project are being used to assemble genomes from a range of human populations and non-human species. These advances now allow a talented graduate student to produce a T2T genome assembly. Higher-quality, less expensive assemblies across all species will have a massive impact on biology, evolution, and human health.

Quotes

  • "The T2T human genome assembly project was a large, collaborative project involving hundreds of individuals from multiple countries spread across many time zones. Being able to easily and quickly share large amounts of data was key to the success of the project. Globus was the essential tool for data sharing by the project. In my mind, Globus not only moves data, but drives collaborations."

    - Mark Diekhans, Technical Director, UC Santa Cruz Genomics Institute


