April 24, 2012 by yamakashi
Visually exploring comparative genomic data is difficult. A very large number of genomes have been sequenced or are in the process of being sequenced, the genomes themselves are large, and the similarity data is sparse. There have been many efforts to generate visual representations of genome-to-genome relationships. Circos is one such project.
The difficulty in generating graphical representations of comparative data quickly becomes apparent when one explores the data itself. According to the UCSC Genome Browser's Table Browser, there are over 3,700,000 regions of sequence similarity between dog and human. These pairs of related regions provide more than 1-fold coverage of the dog and human genomes, which is possible because the coordinates of similarity pairs overlap. The vast majority of the pairs are small regions (90% of regions are <400 bp on dog and <330 bp on human). Groups of adjacent pairs are frequent, indicating that similarity is contiguous across large regions of the genomes. However, long-range runs of similarity are interrupted by gaps in similarity or by runs of similarity to other regions.
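To make the coverage point concrete, here is a minimal Python sketch on made-up intervals. It is not the analysis behind the numbers above. The function names, the toy coordinates, and the half-open `(start, end)` convention are all assumptions chosen for illustration. It shows how overlapping regions can sum to more than 1-fold coverage even though the bases they actually span never exceed the genome length, and how one might compute the share of regions below a length cutoff.

```python
from typing import List, Tuple

# Assumption: half-open intervals (start, end) on a single sequence.
Interval = Tuple[int, int]


def summed_length(intervals: List[Interval]) -> int:
    """Total length of all regions, counting overlapping bases repeatedly."""
    return sum(end - start for start, end in intervals)


def merged_length(intervals: List[Interval]) -> int:
    """Number of distinct bases covered, after merging overlapping regions."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this region: close the current run, start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching region: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def fraction_shorter_than(intervals: List[Interval], max_len: int) -> float:
    """Share of regions whose length is strictly below max_len."""
    if not intervals:
        return 0.0
    short = sum(1 for start, end in intervals if end - start < max_len)
    return short / len(intervals)


# Hypothetical toy data: four overlapping regions on a 1,000 bp "genome".
GENOME_LENGTH = 1000
regions = [(0, 400), (200, 700), (600, 1000), (100, 300)]

fold = summed_length(regions) / GENOME_LENGTH         # counts overlaps
distinct = merged_length(regions) / GENOME_LENGTH     # ignores overlaps
print(f"fold coverage: {fold:.2f}, distinct fraction covered: {distinct:.2f}")
print(f"share of regions < 400 bp: {fraction_shorter_than(regions, 400):.2f}")
```

On real data the same two quantities would be computed per chromosome, since coordinates are only comparable within one sequence.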
More information is available on circos.ca.