The following software lists all patterns found in an individual chromosome or in a pair of chromosomes given in FASTA format: findpat-1.1.0.tar.gz. The program was tested on the x86, amd64, and ppc architectures. It may work on other architectures as well.
The following refereed publications and presentations were derived from this project.
This graph shows, for each pair of human chromosomes, the longest common pattern present in both of them. The diagonal shows the longest pattern repeated inside a single chromosome.
Here you can find some results of our experiments, with several visualization options. Select a pair of chromosomes to retrieve the pattern results and some visualizations for them. Select a chromosome and “null” to get the results for that individual chromosome. Note that some pairs (namely, Homo1-HomoN and Homo(2N-1)-Homo(2N)) have additional results that have not yet been calculated for all pairs.
All downloadable pattern files (those whose names begin with full or full2) have the same format. They list all the patterns in the input file or files, each one on two lines. The first line gives the pattern, its number of occurrences, and its length, in a human-readable format. The second line lists the position of each occurrence, separated by spaces. All positions are relative to the beginning of the corresponding chromosome (0-based) and are preceded by < or >, indicating whether they correspond to the first or the second input file.
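As an illustration of the two-line record format described above, the following Python sketch reads such a file. It is not part of findpat. It assumes that each position token is written as the marker immediately followed by the number (for example <10 or >25), and it keeps the first line as an unparsed string because its exact layout is not specified here. The function names parse_positions and read_records are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_positions(line: str) -> List[Tuple[int, int]]:
    """Parse the second line of a pattern record.

    Assumption: each token looks like '<123' or '>456'.
    Returns (file_index, position) pairs, where file_index is 0 for '<'
    (first input file) and 1 for '>' (second input file).
    """
    occurrences = []
    for token in line.split():
        marker, digits = token[0], token[1:]
        if marker not in ("<", ">"):
            raise ValueError(f"unexpected token: {token!r}")
        occurrences.append((0 if marker == "<" else 1, int(digits)))
    return occurrences


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[Tuple[int, int]]]]:
    """Yield (description, occurrences) for each two-line record."""
    # Skip blank lines so that trailing newlines do not break the pairing.
    it = (line for line in lines if line.strip())
    for description in it:
        positions = next(it, "")
        yield description.rstrip("\n"), parse_positions(positions)
```

With an open file object f, records can be consumed with `for description, occurrences in read_records(f): ...`.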
This archive contains three text files showing, for some pairs of chromosomes (Homo 1 against every other chromosome, and Homo 2i-1 against Homo 2i) and for all individual chromosomes, how much of each chromosome is covered by patterns. For pairs of chromosomes, only patterns that are present at least once in each chromosome are counted.
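One way to compute such a coverage figure is to merge the intervals spanned by all pattern occurrences and count the covered positions. The Python sketch below shows this under stated assumptions: occurrences are given as (start, pattern_length) pairs with 0-based starts, and the function names are hypothetical. It is not necessarily how the published files were produced.

```python
def covered_bases(intervals):
    """Count positions covered by at least one half-open interval [start, end)."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # The previous run is finished: add its length and start a new run.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def coverage_fraction(occurrences, chromosome_length):
    """Fraction of a chromosome covered by pattern occurrences.

    occurrences: iterable of (start, pattern_length) pairs.
    """
    intervals = [(start, start + length) for start, length in occurrences]
    return covered_bases(intervals) / chromosome_length
```

Merging before counting ensures that positions covered by several overlapping occurrences are counted only once.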
coverage_chromosomes_by_patterns.tar.bz2
Database of all repeats in Ensembl, extended with dust regions (from BLAST) plus the elements obtained by the TRF program and RepeatMasker. It is ordered by chromosome number and by starting position of the biological instances. Compiled by Javier Herero at EMBL-EBI UK.
Database of all genes from Ensembl. Compiled by Javier Herero at EMBL-EBI UK.
This file contains a text file listing all patterns reported for Homo 1 that have some occurrences overlapping already known patterns and other occurrences that do not intersect them at all. The file reports only the non-overlapping (i.e., “new”) occurrences. The format is the same as that of the pattern files above.
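The selection rule described above can be sketched in Python as follows. This is only an illustration: it assumes that the known elements are available as a sorted list of non-overlapping, non-empty half-open intervals (start, end), and the name novel_occurrences is hypothetical.

```python
from bisect import bisect_right


def novel_occurrences(starts, length, known):
    """Return the occurrences of a pattern that intersect no known interval.

    starts: 0-based start positions of the pattern's occurrences.
    length: length of the pattern in bp.
    known:  sorted, non-overlapping half-open intervals (start, end).

    The result is empty unless the pattern has both overlapping and
    non-overlapping occurrences, mirroring the rule described in the text.
    """
    known_starts = [interval[0] for interval in known]

    def overlaps(start):
        # Last known interval that begins before this occurrence ends.
        i = bisect_right(known_starts, start + length - 1) - 1
        return i >= 0 and known[i][1] > start

    novel = [s for s in starts if not overlaps(s)]
    has_known = len(novel) < len(starts)
    return novel if novel and has_known else []
```

The binary search keeps each lookup logarithmic in the number of known intervals, which matters when a chromosome carries many annotated elements.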
This graph shows the coverage of only the novel patterns in chromosome 1 of Homo sapiens.
This is the coverage table of patterns in chromosome 1 of Homo sapiens covering known biological repeats. For each biological class, we computed the coverage of each instance by our patterns of length > 39 bp. The table shows the number of instances in each percentile of coverage. The last columns indicate, for each class, the average coverage percentage of its instances. The “total” row shows the data aggregated over all classes.
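A table of this kind could be computed per class along the following lines. The Python sketch below is a hedged illustration: the bin width of 10 percentage points, the half-open interval representation, and the function names are assumptions, not details taken from the published table.

```python
def instance_coverage(instance, pattern_intervals):
    """Percentage of the half-open instance [start, end) covered by patterns."""
    start, end = instance
    # Keep only the parts of the patterns that fall inside the instance.
    clipped = sorted(
        (max(a, start), min(b, end))
        for a, b in pattern_intervals
        if a < end and b > start
    )
    covered, reach = 0, start
    for a, b in clipped:
        if b > reach:
            covered += b - max(a, reach)
            reach = b
    return 100.0 * covered / (end - start)


def coverage_histogram(instances, pattern_intervals, bin_width=10, min_length=40):
    """Count instances per coverage bin and return (bins, mean coverage).

    Only patterns of at least min_length bp are used (length > 39 bp).
    The last bin holds instances that are covered exactly 100%.
    """
    patterns = [(a, b) for a, b in pattern_intervals if b - a >= min_length]
    bins = [0] * (100 // bin_width + 1)
    percentages = []
    for instance in instances:
        pct = instance_coverage(instance, patterns)
        percentages.append(pct)
        bins[int(pct // bin_width)] += 1
    mean = sum(percentages) / len(percentages) if percentages else 0.0
    return bins, mean
```

Running coverage_histogram once per biological class, and once over all instances together, would give the per-class rows and the “total” row respectively.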
covertrascro_all_repeats_homo1.zip
The following members of KAPOW are currently working on this: