BioPHP - Tandem Repeats finder
Original code submitted by joseba
Code below is covered by the GNU GPL v2 license.
Description
Last change: 2010/10/18 17:04
Searches for tandem repeats within a DNA sequence.
A tandem repeat is a sequence that contains a pattern repeated several
times, as for example in AAAAAgtagtagtagtagtagtagtTTTTTTTTT, which
contains a tandem repeat (subsequence gta repeated 6 times). Tandem
repeats may undergo insertions and deletions affecting the number of
times the pattern is repeated. When the tandem repeat and the sequence
around it are amplified by PCR, different strains will probably yield
bands of different lengths.
Comparing these band patterns allows their use for epidemiological purposes.
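The link between copy number and band length can be sketched with a little arithmetic. This is only an illustration: the helper build_amplicon(), the flanking sequences and both "strains" are made up for the example and are not part of BioPHP.

```php
<?php
// Hypothetical helper (not part of BioPHP): builds an amplified region
// from a left flank, a pattern repeated $copies times, and a right flank.
function build_amplicon($left_flank, $pattern, $copies, $right_flank) {
    return $left_flank . str_repeat($pattern, $copies) . $right_flank;
}

// Two made-up strains with identical flanks but different copy numbers.
$strain_a = build_amplicon("AAAAA", "gta", 6, "TTTTTTTTT");
$strain_b = build_amplicon("AAAAA", "gta", 4, "TTTTTTTTT");

// The length difference equals pattern length times the copy difference.
$difference = strlen($strain_a) - strlen($strain_b);
echo strlen($strain_a), " ", strlen($strain_b), " ", $difference, "\n"; // prints "32 26 6"
```

Two copies fewer of a 3-base pattern shorten the product by 6 bases, which is the kind of difference that shows up as bands of different lengths.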
$sequence is the DNA sequence to be searched
$min_length is the minimum length of repeated pattern (must be >2)
$max_length is the maximum length of repeated pattern
$min_repeats is the minimum number of times the pattern must be repeated
$min_length_of_TR is the minimum length of the tandem repeat
Example:
find_tandem_repeats("AAAAAgtagtagtagtagtagtagtTTTTTTTTT",2,6,3,15)
Result: Array(
    [0] => Array(
        [start_position] => 5
        [pattern] => gta
        [number_of_repeats] => 6
        [tandem_repeat_sequence] => gtagtagtagtagtagtagt
    )
)
This function does not allow errors within the repeated pattern
(such errors are detected by programs specifically designed to search
for tandem repeats).
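A small sketch of what exact matching means in practice. The helper count_exact_copies() and the two test sequences are hypothetical and only mimic the exact-comparison idea; they are not part of BioPHP.

```php
<?php
// Hypothetical helper (not part of BioPHP): counts how many exact,
// back-to-back copies of $pattern start at position $start.
function count_exact_copies($sequence, $pattern, $start) {
    $len = strlen($pattern);
    $copies = 0;
    // Stop at the first window that is not identical to the pattern.
    while ($len > 0 && substr($sequence, $start + $copies * $len, $len) === $pattern) {
        $copies++;
    }
    return $copies;
}

$perfect   = "gtagtagtagtagtagta"; // six exact copies of gta
$one_error = "gtagtagtcgtagtagta"; // third copy mutated to gtc

echo count_exact_copies($perfect, "gta", 0), "\n";   // prints 6
echo count_exact_copies($one_error, "gta", 0), "\n"; // prints 2
echo count_exact_copies($one_error, "gta", 9), "\n"; // prints 3
```

A single substitution splits the six-copy repeat into runs of two and three copies. With a minimum of 3 repeats, only the second run would still qualify.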
Code
Last change: 2010/10/18 17:04
function find_tandem_repeats($sequence,$min_length,$max_length,$min_repeats,$min_length_of_TR){
    $results=array(); // returned empty when nothing is found
    $len_seq=strlen($sequence);
    // <= so that a repeat ending exactly at the end of the sequence is also tested
    for ($i=0;$i<=$len_seq-$min_length_of_TR;$i++){
        for ($j=$min_length;$j<=$max_length;$j++){
            if (($i+$j)>$len_seq){break;}
            $sub_seq=substr($sequence,$i,$j);
            // count consecutive exact copies of the candidate pattern
            $matches=1;
            while ($sub_seq===substr($sequence,$i+$j*$matches,$j)){$matches++;}
            // too few copies: try the next pattern length
            if ($matches<$min_repeats){continue;}
            // extend over a trailing partial copy of the pattern
            $plus_extra=0;
            for ($p=0;$p<$j;$p++){
                if (substr($sub_seq,$p,1)===substr($sequence,$i+$j*$matches+$p,1)){$plus_extra++;}else{break;}
            }
            $len_TR=($j*$matches)+$plus_extra;
            if ($len_TR>=$min_length_of_TR){
                $results[]=array(
                    "start_position"=>$i,
                    "pattern"=>$sub_seq,
                    "number_of_repeats"=>$matches,
                    "tandem_repeat_sequence"=>substr($sequence,$i,$len_TR)
                );
                // resume right after this repeat (the outer loop adds the final +1)
                $i+=$len_TR-1;
                break;
            }
        }
    }
    return $results;
}
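As a cross-check, perfect tandem repeats can also be located with a regular expression back-reference. This is a different technique from the loop above, not the original author's method. The function name regex_tandem_repeats() is hypothetical, and the sketch assumes $min_repeats is at least 2. It reports whole copies only, so a trailing partial copy is not included, and it has no minimum total length filter.

```php
<?php
// Hypothetical helper (not part of BioPHP): finds perfect tandem repeats
// with a PCRE back-reference. Matching is case-sensitive.
function regex_tandem_repeats($sequence, $min_length, $max_length, $min_repeats) {
    // ([A-Za-z]{min,max}?) lazily captures the shortest candidate pattern;
    // \1{n,} requires at least n further identical copies right after it.
    $regex = sprintf('/([A-Za-z]{%d,%d}?)\1{%d,}/', $min_length, $max_length, $min_repeats - 1);
    $hits = array();
    if (preg_match_all($regex, $sequence, $found, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
        foreach ($found as $match) {
            $pattern = $match[1][0];
            $hits[] = array(
                "start_position"    => $match[0][1],
                "pattern"           => $pattern,
                // whole match length divided by pattern length
                "number_of_repeats" => (int) (strlen($match[0][0]) / strlen($pattern)),
            );
        }
    }
    return $hits;
}

$hits = regex_tandem_repeats("AAAAAgtagtagtagtagtagtagtTTTTTTTTT", 2, 6, 3);
foreach ($hits as $hit) {
    echo $hit["start_position"], " ", $hit["pattern"], " ", $hit["number_of_repeats"], "\n";
}
// prints "5 gta 6" and then "25 TT 4"
```

The run of T is reported here because nothing filters on total length. With a minimum tandem repeat length of 15, as in the example call above, that 8-base hit would be discarded.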