BioPHP - Linear Correlation and Regression Curve

Original code submitted by joseba
Code below is covered by the GNU GPL v2 license.


Last change: 2010/10/18 17:09 | Recent Changes | Original description
Calculates a linear relationship between x and y values by solving for the
a and b values in y=ax+b. The correlation coefficient (r) is also computed.
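
The calculation described above can be sketched with the standard least-squares formulas. This is a minimal illustration, not the original BioPHP function: the name `linear_fit_sketch`, its return keys, and the choice to return `null` for degenerate input are all assumptions made for this example.

```php
<?php
// Hypothetical helper (not the original BioPHP code): least-squares fit of
// y = a*x + b plus the Pearson correlation coefficient r.
function linear_fit_sketch(array $xs, array $ys): ?array
{
    $n = count($xs);
    // Require paired data with at least two points.
    if ($n !== count($ys) || $n < 2) {
        return null;
    }
    $xs = array_values($xs);
    $ys = array_values($ys);

    // Accumulate the sums used by the closed-form solution.
    $sum_x = $sum_y = $sum_x2 = $sum_y2 = $sum_xy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $x = (float) $xs[$i];
        $y = (float) $ys[$i];
        $sum_x  += $x;
        $sum_y  += $y;
        $sum_x2 += $x * $x;
        $sum_y2 += $y * $y;
        $sum_xy += $x * $y;
    }

    $den_x = $n * $sum_x2 - $sum_x * $sum_x;
    $den_y = $n * $sum_y2 - $sum_y * $sum_y;
    // Assumption: treat constant x or constant y as invalid input,
    // because r is undefined in that case.
    if ($den_x == 0.0 || $den_y == 0.0) {
        return null;
    }

    $cov = $n * $sum_xy - $sum_x * $sum_y;
    $a = $cov / $den_x;                 // slope
    $b = ($sum_y - $a * $sum_x) / $n;   // intercept
    $r = $cov / sqrt($den_x * $den_y);  // Pearson correlation coefficient

    return ['a' => $a, 'b' => $b, 'r' => $r];
}

// Usage: points that lie exactly on y = 2x + 1.
$fit = linear_fit_sketch([1, 2, 3, 4], [3, 5, 7, 9]);
printf("a = %.4f, b = %.4f, r = %.4f\n", $fit['a'], $fit['b'], $fit['r']);
// prints "a = 2.0000, b = 1.0000, r = 1.0000"
```

Returning `null` rather than dying keeps the sketch reusable outside a web form; the caller decides how to report bad input.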


Last change: 2013/10/20 10:10 | Recent Changes | Download | Original code and
<title>Linear Correlation and Regression</title>
<script type="text/javascript">
// the next JavaScript function toggles the visibility of the info table
function show() {
        if(document.getElementById('tableinfo').style.display == 'block') {
                document.getElementById('tableinfo').style.display = 'none';
        } else {
                document.getElementById('tableinfo').style.display = 'block';
        }
}
</script>
<style>
#tableinfo {display: none;}
</style>
<body bgcolor=FFFFFF>
<h1>Linear Regression and Correlation</h1>
<table width=600>
<div align=right><a href="javascript:show();">info</a></div>
        <table width=100% id=tableinfo>
        <hr width=600 size=3 color=blue>
        In statistics, <a href=http://en.wikipedia.org/wiki/Linear_regression target=new>linear regression</a> is an approach to modeling the relationship between a scalar dependent
        variable <b>y</b> and one or more explanatory variables denoted <b>x</b>, and <a href=http://en.wikipedia.org/wiki/Correlation target=new>correlation</a> is a statistical measure
        that describes the dependence between the two variables.
        <p>This tool computes the linear relationship between x and y values (the formula <b>y= ax+b</b>)
         and the <a href=http://en.wikipedia.org/wiki/Pearson_correlation target=new>Pearson correlation coefficient</a> (r), which describes the degree of that linear dependence.
        <p>To use this tool, enter the <b>x</b> values and the dependent <b>y</b> values in the form.
        Values must be separated by line breaks, and the same number of values is required for x and y.
        <p>Often, non-linear relationships between two variables are linearized by replacing the x or y values
        with their logarithms or squares. You may do so when required by checking the corresponding checkboxes.
<hr width=600 size=3 color=blue>


// author    Joseba Bikandi
// license   GNU GPL v2
// biophp.org


if (!$_POST){
   if ($_GET["show"]=="example"){
        // when nothing is posted, and an example is requested
        // example is included within the form
        // typical output for results
        // example is explained
        // print out form

        // when data is posted
        // get the data
        $vals_x=preg_replace("/ |\r/","",$vals_x);   // remove spaces and carriage returns

        $vals_y=preg_replace("/ |\r/","",$vals_y);   // remove spaces and carriage returns

        // parse data to an array

        // Check modifications to data
                if ($_POST["logx"]==1 and $_POST["x2"]==1){die("Applying both logarithm and square to x values is not allowed.");}
                if ($_POST["logy"]==1 and $_POST["y2"]==1){die("Applying both logarithm and square to y values is not allowed.");}

        // Apply modifications to data

                // logX
                foreach($vals_x_array as $k => $v){
                // logy
                foreach($vals_y_array as $k => $v){
                // x^2
                foreach($vals_x_array as $k => $v){
                // y^2
                foreach($vals_y_array as $k => $v){

        // compute correlation_regression
        $curve=correlation_regression ($vals_x_array,$vals_y_array);

        // print results
        if ($curve){
                print "Error: input data is not correct";


function correlation_regression ($vals_x,$vals_y){
        if (sizeof($vals_x)!= sizeof($vals_y)){return;}
        foreach($vals_x as $key => $val){
                //print "$val_x\t$val_y\n";
        //print "<hr>sum_x\t$sum_x\nsum_y\t$sum_y\nsum_x2\t$sum_x2\nsum_y2\t$sum_y2\nsum_xy\t$sum_xy\n";

        // y=ax+b
        // calculate a
        // calculate b
        // calculate correlation coefficient (r)
        return $curve;

//########print form
function print_form($vals_x,$vals_y,$a,$b,$r){
        <form method=post action="<?php print htmlspecialchars($_SERVER["PHP_SELF"]); ?>">
        <table width=100%>
        <tr><td align=center>
                Values for x:<br><textarea cols="5" rows="12" name="vals_x"><?php print $vals_x; ?></textarea>
                <br>Apply to x values
                <br><input type=checkbox value=1 name=logx<?php if($_POST["logx"]=="1"){print " checked";} ?>> Log x
                <br><input type=checkbox value=1 name=x2<?php if($_POST["x2"]=="1"){print " checked";} ?>> x<sup>2</sup>

        </td><td align=center>
                Values for y:<br><textarea cols="5" rows="12" name="vals_y"><?php print $vals_y; ?></textarea>
                <br>Apply to y values
                <br><input type=checkbox value=1 name=logy<?php if($_POST["logy"]=="1"){print " checked";} ?>> Log y
                <br><input type=checkbox value=1 name=y2<?php if($_POST["y2"]=="1"){print " checked";} ?>> y<sup>2</sup>

        </td><td align=center valign=bottom>
                <input type=submit value=compute>
                <br><a href=?show=example>example</a>


//########print results
function print_results($a,$b,$r){
        print "
                <hr size=3 color=blue>
                <table bgcolor=CCCCFF align=center>
                Values for curve <b>y=ax+b</b>
                <br> a = $a
                <br> b = $b
                <br> Correlation (r) = $r


//########print example
function print_example($a,$b,$r){
print "
        <hr size=3 color=blue>
        <table width=100%>
                <b>Example</b>: The number of apples in each box arriving at the restaurant and the weight of each box
                in kilograms were recorded. The data is shown in the form above.
                <p>We want to estimate the number of apples in a box when a new box arrives at the restaurant,
                so that we can decide how many menus with apples we may offer to our clients.
                <p>We have computed the linear regression between both parameters and obtained the
                values a=$a and b=$b to be used in the formula y=ax+b.
                <p>When a 35-kilogram box arrives at the restaurant, the number of apples in the box
                is easily estimated by applying the formula:
                <center><p>y= $a *35 + $b =  $apples apples</center>
                <p>As the correlation coefficient is high (r = $r), the computed number of apples
                should be a good estimate.


<hr size=3 color=blue>
Source code available at <a href=http://www.biophp.org/stats/linear_correlation_regression/>biophp.org</a>
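
The input handling described in the info text (one value per line, with an optional logarithm or square transform) might look roughly like the sketch below. The function names `parse_values_sketch` and `transform_values_sketch`, the `"log"` and `"square"` mode strings, and the use of base-10 logarithms are assumptions made for illustration; the listing above does not show which logarithm base the tool applies.

```php
<?php
// Hypothetical helper: turn textarea content into an array of floats.
function parse_values_sketch(string $text): ?array
{
    // Drop spaces and carriage returns (as the listing does), then split on line feeds.
    $clean = preg_replace("/ |\r/", "", $text);
    $values = [];
    foreach (explode("\n", $clean) as $line) {
        if ($line === "") {
            continue;               // skip blank lines
        }
        if (!is_numeric($line)) {
            return null;            // reject non-numeric input
        }
        $values[] = (float) $line;
    }
    return $values;
}

// Hypothetical helper: apply an optional transform before fitting.
function transform_values_sketch(array $values, string $mode): ?array
{
    $out = [];
    foreach ($values as $v) {
        if ($mode === "log") {
            if ($v <= 0) {
                return null;        // logarithm is undefined for v <= 0
            }
            $out[] = log10($v);     // assumption: base 10
        } elseif ($mode === "square") {
            $out[] = $v * $v;
        } else {
            $out[] = $v;            // any other mode leaves values unchanged
        }
    }
    return $out;
}

// Usage: parse form-style input, then take logarithms of the x values.
$xs = parse_values_sketch("1\r\n10\r\n100\r\n");
print_r(transform_values_sketch($xs, "log"));
```

Keeping parsing and transformation separate means the "log and square at once" check from the listing can stay a simple guard in the calling code.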