In 2003, when an international consortium that included the Baylor College of Medicine Human Genome Sequencing Center released the completed sequence of the human genome, the work was just beginning.

Humans vary across populations, environments and gender, and scientists have been trying to define those differences ever since. In a report that appears online in the journal Nature, another international consortium, including the Baylor Human Genome Sequencing Center, sequenced and analyzed the genomes of more than 2,500 individuals from 26 populations. In their search for structural variants (large-scale variations in the genome that are at least 50 base pairs in size), they mapped 68,818 structural variants in the individuals, using new techniques including long-read, single-molecule sequencing. The work was part of the final phase of the 1000 Genomes Project, which was designed to identify low-frequency human genetic variation among the world's populations.

The authors wrote: “Our study emphasizes the population diversity of SVs (structural variants), quantifies their functional impact and highlights understudied SV classes, including inversions exhibiting marked sequence complexity.”

“Structural variants make up a critical class of genomic variation that likely produces larger functional effects,” said Dr. Fuli Yu, assistant professor in the Baylor Human Genome Sequencing Center.

The analysis of the genomes of more than 2,500 people across five continents showed that more than 200 genes were missing entirely in some people, said one of the project’s leaders, Dr. Jan Korbel, in a released statement. He led the work at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany. Some of the structural variations differ among populations, the group noted.

“This manuscript highlighted the challenges for reliably detecting structural variations using different technologies,” said Yu. “This paper provided important technological approaches that exploited an ensemble of different sequencing strategies and multiple analytical methods that allowed us to ‘call’ or identify structural variants with high quality.”

The most common structural variations studied included deletions, insertions, duplications and inversions in the genetic material. When these changes interrupt or add material to genes, genetic disease can result. The complexity of structural variations and their location in areas of the genome where genetic material is frequently repeated make their identification difficult. New technologies, including those pioneered at the Baylor center, have made such work easier, said Yu.

The work identified hot spots of structural variant mutation that cannot easily be explained.

The researchers wrote: “It remains difficult to fully disentangle the contributions of SV mutation rates and selective forces to the observed variant clustering.”

They concluded that individuals harbor a median (the middle value across the individuals studied) of 18.4 mega base pairs of structural variants per diploid genome (the two sets of chromosomes, one inherited from each parent), a total made up largely of copy number variants (11.3 mega base pairs) and biallelic deletions (5.6 mega base pairs). Biallelic refers to a variant site that has two alleles, or versions, in the population.

They noted that current technology does not optimally capture the diploid nature of the genome, but predicted that future technology will overcome this obstacle.

“Until this is realized, our SV set represents an invaluable resource for the construction and analysis of personalized genomes,” they wrote.

Other Baylor researchers who took part in this work include Richard Gibbs, Min Wang, and Donna Muzny.

Funding for this work came from the National Human Genome Research Institute (Grants U41HG007497, R01GM59290, R01HG002898, R01CA166661, P01HG007497, R01HG007068, RR19895 and RR029676-01). Other grants came from the Wellcome Trust, an Emmy Noether Grant from the German Research Foundation and the European Molecular Biology Laboratory.