BioPHP - Tandem Repeats finder

Original code submitted by joseba
The code below is covered by the GNU GPL v2 license.

Description

Last change: 2010/10/18 17:04
Searches for tandem repeats within a DNA sequence.
A tandem repeat is a sequence that contains a pattern repeated several
times in a row, as for example in AAAAAgtagtagtagtagtagtagtTTTTTTTTT,
which contains a tandem repeat (subsequence GTA repeated 6 times).
Tandem repeats may undergo insertions and deletions that change the
number of times the pattern is repeated. When the tandem repeat and the
sequence around it are amplified by PCR, different strains will probably
yield bands of different lengths.
Comparing these band patterns allows their use for epidemiological
purposes.
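
The effect of the repeat number on band length can be illustrated with simple arithmetic. The sketch below is not part of BioPHP: the helper name expected_amplicon_length and the flank sizes are hypothetical, and it assumes the flanking regions between the primers are identical in both strains.

```php
<?php
// Hypothetical helper (not part of BioPHP): expected PCR product length for a
// locus with fixed flanking regions and a variable number of pattern copies.
function expected_amplicon_length($flank_5, $flank_3, $pattern_length, $number_of_repeats) {
    return $flank_5 + $pattern_length * $number_of_repeats + $flank_3;
}

// Two made-up strains sharing 100 bp and 80 bp of flanking sequence
// around a 3-bp pattern, differing only in the number of copies.
$strain_a = expected_amplicon_length(100, 80, 3, 6); // 100 + 18 + 80 = 198
$strain_b = expected_amplicon_length(100, 80, 3, 9); // 100 + 27 + 80 = 207
echo "Strain A: $strain_a bp, strain B: $strain_b bp\n";
```

Under these assumptions, three extra copies of a 3-bp pattern shift the band by 9 bp.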

   $sequence is the DNA sequence to be searched
   $min_length is the minimum length of repeated pattern (must be >2)
   $max_length is the maximum length of repeated pattern
   $min_repeats is the minimum number of times the pattern must be repeated
   $min_length_of_TR is the minimum length of the tandem repeat

   Example:
   find_tandem_repeats("AAAAAgtagtagtagtagtagtagtTTTTTTTTT",2,6,3,15)
   Result: Array(
             [0] => Array(
                 [start_position] => 5
                 [pattern] => gta
                 [number_of_repeats] => 6
                 [tandem_repeat_sequence] => gtagtagtagtagtagtagt
             )
           )
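
As a rough illustration of how the result above can be read, the following sketch takes an array in the documented shape (values copied from the example) and prints one summary line per repeat. The helper describe_tandem_repeat is hypothetical and not part of BioPHP. It treats start_position as a 0-based offset, which matches the example (five A's occupy offsets 0 to 4).

```php
<?php
// Result in the shape documented above (values copied from the example output).
$results = array(
    array(
        "start_position" => 5,
        "pattern" => "gta",
        "number_of_repeats" => 6,
        "tandem_repeat_sequence" => "gtagtagtagtagtagtagt",
    ),
);

// Hypothetical helper: summary line with 0-based, inclusive coordinates.
function describe_tandem_repeat($hit) {
    $length = strlen($hit["tandem_repeat_sequence"]);
    $end = $hit["start_position"] + $length - 1;
    return sprintf("%s x%d at %d-%d (%d bp)",
        $hit["pattern"], $hit["number_of_repeats"],
        $hit["start_position"], $end, $length);
}

foreach ($results as $hit) {
    echo describe_tandem_repeat($hit), "\n";
}
// prints: gta x6 at 5-24 (20 bp)
```

The reported sequence is 20 bp rather than 18 bp (3 x 6) because it also includes the trailing partial copy "gt" of the pattern.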

This function does not allow errors within the repeated pattern
(imperfect repeats are detected by programs specifically designed to
search for tandem repeats)
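
The consequence of exact matching can be seen with a small stand-alone sketch. The helper count_exact_copies below is hypothetical; it only mirrors the copy-counting step of the function and is not the BioPHP code itself.

```php
<?php
// Hypothetical helper: number of consecutive exact copies of the pattern
// that starts at $start and is $pattern_length bases long.
function count_exact_copies($sequence, $start, $pattern_length) {
    // Guard against empty or out-of-range patterns.
    if ($pattern_length < 1 || $start < 0 || $start + $pattern_length > strlen($sequence)) {
        return 0;
    }
    $pattern = substr($sequence, $start, $pattern_length);
    $copies = 1;
    // Strict comparison: every base (and its case) must match.
    while ($pattern === substr($sequence, $start + $pattern_length * $copies, $pattern_length)) {
        $copies++;
    }
    return $copies;
}

echo count_exact_copies("gtagtagtagtagta", 0, 3), "\n"; // 5: perfect repeat
echo count_exact_copies("gtagtagtcgtagta", 0, 3), "\n"; // 2: stops at the "gtc" unit
echo count_exact_copies("GTAgtagta", 0, 3), "\n";       // 1: comparison is case-sensitive
```

A single substitution therefore splits one repeat into shorter runs that may each fall below the thresholds. Because matching is also case-sensitive, normalizing the input with strtoupper() before searching may be worth considering for mixed-case sequences, if case carries no meaning in the data.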

Code

Last change: 2010/10/18 17:04
function find_tandem_repeats($sequence,$min_length,$max_length,$min_repeats,$min_length_of_TR){
        $len_seq=strlen($sequence);
        $results=array();       // always return an array, even when nothing is found
        // <= so that a repeat reaching the very end of the sequence is also tested
        for ($i=0;$i<=$len_seq-$min_length_of_TR;$i++){
                for ($j=$min_length;$j<=$max_length;$j++){
                        if (($i+$j)>$len_seq){break;}
                        $sub_seq=substr($sequence,$i,$j);
                        // count consecutive exact copies of the pattern (case-sensitive)
                        $matches=1;
                        while ($sub_seq===substr($sequence,$i+$j*$matches,$j)){$matches++;}
                        // too few copies: try the next pattern length
                        if ($matches<$min_repeats){continue;}
                        // extend over a trailing partial copy of the pattern
                        $plus_extra=0;
                        for ($p=0;$p<$j;$p++){
                                if (substr($sub_seq,$p,1)===substr($sequence,$i+$j*$matches+$p,1)){$plus_extra++;}else{break;}
                        }
                        $len_TR=($j*$matches)+$plus_extra;
                        if ($len_TR>=$min_length_of_TR){
                                $results[]=array(
                                        "start_position"=>$i,
                                        "pattern"=>$sub_seq,
                                        "number_of_repeats"=>$matches,
                                        "tandem_repeat_sequence"=>substr($sequence,$i,$len_TR));
                                // resume at the first base after this repeat (the outer loop adds 1)
                                $i+=$len_TR-1;
                                break;
                        }
                }
        }
        return $results;
}