1,000 genomes pilot template for future projects
The 1,000 genomes project will grow in its next iteration, using several different sequencing approaches on the genomes of 2,500 people from five large regions of the globe.
The pilot phase of the project, reported in the current issue of the journal Nature, evaluated the extent to which three different methods of sequencing could contribute to the entire picture of human variation.
The 1,000 genomes project is an international effort involving researchers from around the globe to determine the extent of human variation as it exists in the world. Baylor College of Medicine's Human Genome Sequencing Center plays a major role in the design and work of the project.
The different methods included:
- One project, led by Baylor College of Medicine Human Genome Sequencing Center director Dr. Richard Gibbs, provided the most complete sequencing information on the exons (or coding regions) of 1,000 genes in 697 people from seven populations.
- A second used a variety of sequencing technologies to sequence the genomes of two nuclear families, each made up of two parents and a daughter, at high coverage (which means in exacting detail). Each sample was sequenced 20 to 60 times.
- The third project sequenced the genomes of 179 people from four populations at a lower coverage rate.
The project's fast pace was made possible only by next-generation sequencing technology, which can produce thousands or millions of sequences rapidly. The techniques involved allow researchers to evaluate all the rare variants found in areas of the genome known to be associated with human disease.
"On average, each person carries approximately 250 to 300 loss-of-function variants in annotated genes (genes for which the function is known) and 50 to 100 variants previously implicated in inherited disorders," the researchers said.
Data from the project are available through the 1,000 Genomes web site or by FTP from the National Center for Biotechnology Information or the European Bioinformatics Institute. Researchers with limited computing power will be able to access the data through Amazon Web Services, using the company's Elastic Compute Cloud (Amazon EC2). The database contains all forms of variation found in the genome, from single changes called single nucleotide polymorphisms (SNPs), to small insertions and deletions (of genetic material), to the large changes in the structure and number of copies of chromosomes called copy number variations.