BioPHP - Linear Correlation and Regression Curve

Original code submitted by joseba
The code below is covered by the GNU GPL v2 license.

Description

Last change: 2010/10/18 21:09
Calculates the linear relationship between x and y values by solving for
the a and b values in y=ax+b. The correlation coefficient (r) is also computed.
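
For reference, these are the closed-form least-squares expressions that the `correlation_regression` function in the code below evaluates. The notation is an assumption made here for illustration: n is the number of (x, y) pairs and every sum runs over i = 1..n.

```latex
\[
\begin{aligned}
a &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \\
b &= \frac{\sum y_i}{n} - a\,\frac{\sum x_i}{n} \\
r &= \frac{\sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i}
          {\sqrt{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2}\;
           \sqrt{\sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2}}
\end{aligned}
\]
```

The slope a and the coefficient r are undefined when all x values are equal, and r is also undefined when all y values are equal, because the corresponding denominator is zero.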

Code

Last change: 2013/10/20 14:10
<html>
<head>
<title>Linear Correlation and Regression</title>
<script type="text/javascript">
// this JavaScript function toggles the visibility of the info table
function show() {
        if(document.getElementById('tableinfo').style.display == 'block') {
                document.getElementById('tableinfo').style.display = 'none';
        }else{
                document.getElementById('tableinfo').style.display = 'block';
        }
};
</script>
<style>
#tableinfo {display: none;}
</style>
</head>
<body bgcolor=FFFFFF>
<center>
<h1>Linear Regression and Correlation</h1>
<table width=600>
<tr><td>
<div align=right><a href="javascript:show();">info</a></div>
        <table width=100% id=tableinfo>
        <tr><td>
        <hr width=600 size=3 color=blue>
        In statistics, a <a href=http://en.wikipedia.org/wiki/Linear_regression target=new>linear regression</a> is an approach to modelling the relationship between a scalar dependent
        variable <b>y</b> and one or more explanatory variables denoted <b>x</b>, and the <a href=http://en.wikipedia.org/wiki/Correlation target=new>correlation</a> is a statistical measurement
        that describes the dependence between both variables.
        <p>This tool computes the linear relationship between x and y values (the formula <b>y= ax+b</b>)
         and the <a href=http://en.wikipedia.org/wiki/Pearson_correlation target=new>Pearson correlation coefficient</a> (r), which describes the degree of that linear dependence.
        <p>To use this tool, enter the <b>x</b> values and the corresponding dependent <b>y</b> values in the form.
        Enter one value per line; the same number of values is required for x and y.
        <p>Often, non-linear relationships between two variables are linearized by applying a base-10 logarithm
        or a square to the x or y values. You may do so, when required, by checking the corresponding checkboxes.
        </td></tr>
        </table>
<hr width=600 size=3 color=blue>

<?php

// author    Joseba Bikandi
// license   GNU GPL v2
// biophp.org

error_reporting(0);


if (!$_POST){
   if ($_GET["show"]=="example"){
        // when nothing is posted, and an example is requested
        // example is included within the form
        print_form("10\n20\n30\n40\n50\n60\n70\n80\n90\n100","31\n58\n93\n125\n144\n177\n209\n249\n270\n303");
        // typical output for results
        print_results(3.03,-0.67,0.999);
        // example is explained
        print_example(3.03,-0.67,0.999);
    }else{
        // print out form
        print_form();

    }
}else{
        // when data is posted
        // get the data
        $vals_x=$_POST["vals_x"];
        $vals_x=preg_replace("/ |\r/","",$vals_x);   // removed spaces and returns

        $vals_y=$_POST["vals_y"];
        $vals_y=preg_replace("/ |\r/","",$vals_y);   // removed spaces and returns

        // parse data to an array
        $vals_x_array=preg_split("/\n/",$vals_x,-1,PREG_SPLIT_NO_EMPTY);
        $vals_y_array=preg_split("/\n/",$vals_y,-1,PREG_SPLIT_NO_EMPTY);

        // Check modifications to data
                if ($_POST["logx"]==1 and $_POST["x2"]==1){die("Applying both logarithm and square to x values is not allowed.");}
                if ($_POST["logy"]==1 and $_POST["y2"]==1){die("Applying both logarithm and square to y values is not allowed.");}

        // Apply modifications to data

                // logX
                if($_POST["logx"]==1){
                foreach($vals_x_array as $k => $v){
                        $vals_x_array[$k]=log10($v);
                }}
                // logy
                if($_POST["logy"]==1){
                foreach($vals_y_array as $k => $v){
                        $vals_y_array[$k]=log10($v);
                }}
                // x^2
                if($_POST["x2"]==1){
                foreach($vals_x_array as $k => $v){
                        $vals_x_array[$k]=$v*$v;
                }}
                // y^2
                if($_POST["y2"]==1){
                foreach($vals_y_array as $k => $v){
                        $vals_y_array[$k]=$v*$v;
                }}

        // compute correlation_regression
        $curve=correlation_regression ($vals_x_array,$vals_y_array);

        // print results
        print_form($vals_x,$vals_y);
        if ($curve){
                print_results($curve["a"],$curve["b"],$curve["r"]);
        }else{
                print "Error: input data is not correct";
        }

}

function correlation_regression ($vals_x,$vals_y){
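        // both series must have the same number of values, and at least two points are needed
        if (sizeof($vals_x)!= sizeof($vals_y) or sizeof($vals_x)<2){return;}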
        $sum_x=0;
        $sum_x2=0;
        $sum_y=0;
        $sum_y2=0;
        $sum_xy=0;
        $n=sizeof($vals_x);
        foreach($vals_x as $key => $val){
                $val_x=$val;
                $val_y=$vals_y[$key];
                $sum_x+=$val_x;
                $sum_x2+=$val_x*$val_x;
                $sum_y+=$val_y;
                $sum_y2+=$val_y*$val_y;
                $sum_xy+=$val_x*$val_y;
                //print "$val_x\t$val_y\n";
        }
        //print "<hr>sum_x\t$sum_x\nsum_y\t$sum_y\nsum_x2\t$sum_x2\nsum_y2\t$sum_y2\nsum_xy\t$sum_xy\n";

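        // y=ax+b
        // the fit is undefined when x or y values do not vary (the denominators below would be zero)
        $den_x=$n*$sum_x2-$sum_x*$sum_x;
        $den_y=$n*$sum_y2-$sum_y*$sum_y;
        if ($den_x<=0 or $den_y<=0){return;}
        // calculate a (slope)
        $curve["a"]=($n*$sum_xy-$sum_x*$sum_y)/$den_x;
        // calculate b (intercept)
        $curve["b"]=($sum_y/$n)-($curve["a"]*$sum_x/$n);
        // calculate the Pearson correlation coefficient (r)
        $curve["r"]=($sum_xy-(1/$n)*$sum_x*$sum_y)/((sqrt($sum_x2-(1/$n)*$sum_x*$sum_x)*(sqrt($sum_y2-(1/$n)*$sum_y*$sum_y))));
        return $curve;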
}

//########print form
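// both arguments are optional, since the form is also printed empty;
// the unused $a, $b and $r parameters were dropped
function print_form($vals_x="",$vals_y=""){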
?>
        <form method=post action="<?php print htmlspecialchars($_SERVER["PHP_SELF"]); ?>">
        <table width=100%>
        <tr><td align=center>
                Values for x:<br><textarea cols="5" rows="12" name="vals_x"><?php print htmlspecialchars($vals_x); ?></textarea>
                <br>Apply to x values
                <br><input type=checkbox value=1 name=logx<?php if($_POST["logx"]=="1"){print " checked";} ?>> Log x
                <br><input type=checkbox value=1 name=x2<?php if($_POST["x2"]=="1"){print " checked";} ?>> x<sup>2</sup>

        </td><td align=center>
                Values for y:<br><textarea cols="5" rows="12" name="vals_y"><?php print htmlspecialchars($vals_y); ?></textarea>
                <br>Apply to y values
                <br><input type=checkbox value=1 name=logy<?php if($_POST["logy"]=="1"){print " checked";} ?>> Log y
                <br><input type=checkbox value=1 name=y2<?php if($_POST["y2"]=="1"){print " checked";} ?>> y<sup>2</sup>

        </td><td align=center valign=bottom>
                <input type=submit value=compute>
                <br><a href=?show=example>example</a>
        </td></tr>
        </table>
        </form>
<?php

}

//########print results
function print_results($a,$b,$r){
        $x="x";
        if($_POST["logx"]==1){$x="logx";}
        if($_POST["x2"]==1){$x="x<sup>2</sup>";}
        $y="y";
        if($_POST["logy"]==1){$y="logy";}
        if($_POST["y2"]==1){$y="y<sup>2</sup>";}
        print "
                <hr size=3 color=blue>
                <table bgcolor=CCCCFF align=center>
                <tr><td>
                Values for curve <b>$y=a$x+b</b>
                <br> a = $a
                <br> b = $b
                <br> Correlation (r) = $r
                </td></tr>
                </table>";

}

//########print example
function print_example($a,$b,$r){
$apples=round($a*35+$b);
print "
        <hr size=3 color=blue>
        <table width=100%>
        <tr><td>
                <b>Example</b>: The weight in kilograms (x) of each box of apples arriving at the restaurant and the number
                of apples it contained (y) were recorded. The data are shown in the form above.
                <p>We want to estimate the number of apples in a new box when it arrives at the restaurant,
                so that we can decide how many menus with apples to offer our clients.
                <p>We computed the linear regression between both parameters and obtained the
                values a=$a and b=$b to be used in the formula y=ax+b.
                <p>When a 35 kilogram box arrives at the restaurant, the number of apples in the box
                is easily estimated by applying the formula:
                <center><p>y= $a *35 + $b =  $apples apples</center>
                <p>As the correlation coefficient is high (r = $r), the computed number of apples
                will be a good estimate.

        </td></tr>
        </table>
     ";
}

?>
<hr size=3 color=blue>
</td></tr>
</table>
Source code available at <a href=http://www.biophp.org/stats/linear_correlation_regression/>biophp.org</a>
</center>
</body>
</html>
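
The following standalone sketch shows how the same calculation could be run outside the web form, first on the example data used by the page and then on log-transformed data, as the "Log x" and "Log y" checkboxes would do. It is not part of the original script: the function name `fit_line`, the PHP 7.1+ type hints, and the power-law data set are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of the BioPHP script): least-squares fit of y = a*x + b.
// Returns null when the fit is undefined: size mismatch, fewer than two points,
// or no variation in the x or y values.
function fit_line(array $xs, array $ys): ?array
{
    $xs = array_values($xs);
    $ys = array_values($ys);
    $n = count($xs);
    if ($n < 2 || $n !== count($ys)) {
        return null;
    }
    $sum_x = $sum_y = $sum_x2 = $sum_y2 = $sum_xy = 0.0;
    foreach ($xs as $i => $x) {
        $y = $ys[$i];
        $sum_x  += $x;
        $sum_y  += $y;
        $sum_x2 += $x * $x;
        $sum_y2 += $y * $y;
        $sum_xy += $x * $y;
    }
    // Centered sums of squares and cross products
    $sxx = $sum_x2 - $sum_x * $sum_x / $n;
    $syy = $sum_y2 - $sum_y * $sum_y / $n;
    $sxy = $sum_xy - $sum_x * $sum_y / $n;
    if ($sxx <= 0.0 || $syy <= 0.0) {
        return null;
    }
    $a = $sxy / $sxx;                        // slope
    $b = $sum_y / $n - $a * $sum_x / $n;     // intercept
    $r = $sxy / sqrt($sxx * $syy);           // Pearson correlation coefficient
    return ["a" => $a, "b" => $b, "r" => $r];
}

// Example data from the page: box weight in kilograms (x), number of apples (y)
$xs = range(10, 100, 10);
$ys = [31, 58, 93, 125, 144, 177, 209, 249, 270, 303];
$fit = fit_line($xs, $ys);
printf("a = %.2f, b = %.2f, r = %.3f\n", $fit["a"], $fit["b"], $fit["r"]);
// prints: a = 3.03, b = -0.67, r = 0.999

// Estimate the number of apples in a 35 kilogram box
$estimate = round($fit["a"] * 35 + $fit["b"]);
printf("estimated y at x = 35: %d\n", $estimate);
// prints: estimated y at x = 35: 105

// Linearizing a power law y = c * x^k: taking log10 of both series gives
// log y = k * log x + log c, so the slope is k and 10^intercept is c.
$px = [1, 2, 3, 4, 5];
$py = [2, 16, 54, 128, 250];   // made-up data following y = 2 * x^3
$log_fit = fit_line(array_map('log10', $px), array_map('log10', $py));
printf("exponent = %.2f, factor = %.2f\n", $log_fit["a"], pow(10, $log_fit["b"]));
// prints: exponent = 3.00, factor = 2.00
```

Returning null for degenerate input is one possible design choice; it lets the caller show a message such as "input data is not correct" instead of dividing by zero.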