By Branwyn Wagman, Center for Biomolecular Science & Engineering, UC Santa Cruz

Scientists have sequenced the genome of one of the iconic Galapagos finches first described by Charles Darwin. The genome of the medium ground finch (Geospiza fortis) is among the first of a planned 100 genomes of vertebrate species to be sequenced and released by an international collaboration between the Genome 10K project and Beijing Genomics Institute (BGI).

This finch genome, the first of the BGI-Genome 10K collaboration to be made available through the UCSC Genome Browser, represents both a scientific and a symbolic advancement, according to Duke University associate professor Erich Jarvis, who studies the neurobiology of vocal learning in songbirds.

“The scientific advancement,” Jarvis said, “is that it will allow us to investigate the genomes of a group of closely related species with a significant amount of diversity on an island population, allowing us to potentially better understand the genetics of trait evolution.”

Jarvis added, “It is symbolic because it was the diversity of phenotypes in these finches that contributed to Darwin’s theory of evolution.”

Endemic to the subtropical or tropical dry forests and shrublands of the Galapagos Islands, the medium ground finch evolves rapidly in response to environmental changes.

BGI’s associate director of research, Guojie Zhang, said, “These finches are of great historical significance, but when Darwin first studied these birds, he was unlikely to have envisioned how this species would become a perfect model to study evolution in action.”

Zhang said, “Having the reference genome of this species has opened the door for carrying out studies that can look at real-time evolutionary changes on a genomic level of all of these enigmatic species.”

Jarvis said this new genome will help us understand the evolution of vocal learning: “The availability of the Geospiza fortis [genome] will allow us to validate findings so far only found in the zebra finch genome.”

Jarvis said these findings include genes with positively selected mutations that are involved in vocal learning in finches and in behavior necessary for spoken language in humans.

Jarvis added that the medium ground finch has several song types, whereas the zebra finch “is a more stereotyped vocal learning species.” This difference is expected to be under genetic control.

Adding richness to the possibility of understanding the genomic components of vocal learning, researchers have been recording Geospiza songs over the last 40 years.

Jarvis said these recordings reveal dialect patterns that can now be linked to the genome by sequencing additional individuals from living and past populations. “Like human spoken language, Geospiza song dialects are stable over many generations, but can change with emigration.”

“Having the well assembled draft reference genome of one individual will now allow scientists to determine if this cultural evolution is partly affected by genetics or is all pure cultural transmission,” Jarvis added.

In addition to being useful for investigating speciation, the genomic data can help in conservation efforts, Zhang said. “They also serve as the base for population studies that will aid in the conservation of these renowned finches. BGI is looking forward to working with any collaborators interested in joining us to carry out this work.”

Zhang said the medium ground finch genome, which is nearly one-third the size of the human genome, was sequenced from an individual female. The high-quality draft was produced from 115X coverage data generated on the Illumina HiSeq sequencing system, which is considered a “next-generation” technology. With the aid of transcriptome data, BGI was able to annotate 16,286 protein-coding genes in this genome.

Professor Jun Wang, CEO of BGI, said the groups acted to give researchers the fastest possible access to these data and to enable and encourage their use before publication.

For that reason, in addition to releasing the medium ground finch genome on the UCSC Genome Browser, BGI has openly released the genome in GigaDB, the database of its GigaScience journal. GigaDB makes data available in a citable format and hosts it under a CC0 license, which places the fewest possible restrictions on data use.

According to Genome 10K co-founder Stephen O’Brien, “The genome sequence empowerment of Darwin’s finches will initiate the solving of evolutionary riddles that have puzzled biologists for a century.” O’Brien is now chief scientific officer and director of the Dobzhansky Center for Genome Bioinformatics at St. Petersburg State University, Russia.

Oliver Ryder, director of genetics at the San Diego Zoo Institute for Conservation Research, placed the new genome in a larger context: “The availability of this high quality genome assembly produced by BGI will facilitate the stewardship of earth’s biodiversity—a cherished goal of Genome10K.”


The Genome 10K project aims to assemble a genomic zoo—a collection of DNA sequences representing the genomes of 10,000 vertebrate species, approximately one for every vertebrate genus. The trajectory of cost reduction in DNA sequencing suggests that this project will be feasible within a few years. Capturing the genetic diversity of vertebrate species would create an unprecedented resource for the life sciences and for worldwide conservation efforts.

The growing Genome 10K Community of Scientists (G10KCOS), made up of leading scientists representing major zoos, museums, research centers, and universities around the world, is dedicated to coordinating efforts in tissue specimen collection that will lay the groundwork for a large-scale sequencing and analysis project.


The genomic data for the medium ground finch (Geospiza fortis) can be viewed on the UCSC Genome Browser and accessed from BGI’s GigaScience database. The data can be cited as follows:

Zhang G, et al. (2012): The genome of Darwin’s Finch (Geospiza fortis). GigaScience.
