Mathematical formula to calculate spikes in line graph data?


D.Rattansingh
Solved by Psycho

Hope I can post this question here. 

 

So I've got some data I'm pushing into a line graph. Now I want to retrieve high 'spikes' in the graph e.g. 

 

A - 34

B - 23

C - 96

D - 54

E - 44

 

The value for C is much higher than the rest; this is what I want to retrieve. Is there a mathematical formula to do this, or anything that could help in developing a programming algorithm?


Not the max; the data was just a sample. It's dynamic data that is generated on the fly. So if the data is now:

 

A - 2

B - 87

C - 96

D - 9

E - 44

I want to retrieve that B and C are much higher than the rest, while E is medium. And if the data is:

 

A - 76

B - 81

C - 68

D - 74

E - 77

I want to retrieve that none had any major spikes compared to the others; they are all roughly around the same average.

 

I hope I'm being clear enough. There are about 15 data sets, the values in each set are dynamic, and I want to retrieve the high spikes, or the low or even the medium ones. If there is a way to determine this mathematically, then it can be coded.


  • Solution

Yes, this can be done, but there is no absolute solution. You will need to determine what constitutes a "spike". For example, what if you had the following data:

 

A: 20

B: 95

C: 25

D: 16

E: 88

F: 29

G: 18

H: 91

 

Would you consider B, E & H as spikes? There are so many of the "high" values that they could be considered normal. There are a few ways I could see achieving this:

 

Determine the average or median, then any point that is X% off from the average/median would be considered a spike. Or, get really geeky and calculate the standard deviation and use that as a measure of determining what a 'spike' is.
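As a rough sketch of both ideas (the function names and the default thresholds of 50% and one standard deviation are made up for illustration, so adjust them to whatever you decide a "spike" is):

```php
<?php
// Hypothetical helpers; names and default thresholds are illustrative only.

// Median of a list of numbers.
function median(array $values)
{
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 ? $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Approach 1: flag points more than $pct percent away from the median.
function spikes_by_percent(array $data, $pct = 50)
{
    $m = median(array_values($data));
    $out = array();
    foreach ($data as $label => $v) {
        // a zero median makes "percent off" meaningless, so nothing is flagged
        if ($m != 0 && abs($v - $m) / abs($m) * 100 > $pct) {
            $out[$label] = $v;
        }
    }
    return $out;
}

// Approach 2: flag points more than $k standard deviations from the mean.
function spikes_by_sd(array $data, $k = 1.0)
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $sumSq = 0;
    foreach ($data as $v) {
        $sumSq += ($v - $mean) ** 2;
    }
    $sd = sqrt($sumSq / $n);   // population standard deviation
    $out = array();
    foreach ($data as $label => $v) {
        if (abs($v - $mean) > $k * $sd) {
            $out[$label] = $v;
        }
    }
    return $out;
}

// The sample values from this post
$data = array('A'=>20, 'B'=>95, 'C'=>25, 'D'=>16, 'E'=>88, 'F'=>29, 'G'=>18, 'H'=>91);
print_r(spikes_by_sd($data));
print_r(spikes_by_percent($data));
```

With these particular numbers both approaches pick out B, E and H. That agreement is a property of this sample and the chosen thresholds, not something to rely on in general.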



OK, I was feeling geeky. Apparently you would consider them spikes, as they are outside the mean ± one standard deviation (see attached resulting image).

 

The code for anyone interested

 

 

 

stddev_form.php

<?php
if (isset($_GET['val'])) {
    // URL-encode the JSON so its quotes and brackets survive inside the query string
    $data = urlencode(json_encode($_GET['val']));
    echo "<img src='stddev_img.php?input=$data' /><br><hr>";
}
?>
<html>
<head>
<meta name="generator" content="PhpED Version 8.1 (Build 8115)">
<title>Standard Deviation</title>
<meta name="author" content="Barand">
<link rel="shortcut icon"  href="">
<meta name="creation-date" content="12/26/2013">
<style type='text/css'>
div.input {
    width: 18%;
    height: 25px;
    float: left;
    margin: 5px;
}
div#wrapper {
    padding: 5px;
    margin-bottom: 15px;
    border: 1px solid black;
    overflow: auto;
}
span {
    min-width: 30px;
    text-align: left;
}
</style>
</head>
<body>
<form>
<h3>Input values</h3>
<div id='wrapper'>
<?php
    $label = 'A';
    for ($i=0; $i<20; $i++) {
        $text = sprintf("%s: ", $label++);
        echo <<<EOT
        <div class='input'>
        <span>$text</span>
        <input type='text' name='val[]' size='3'>
        </div>
EOT;
    }
?>
</div>
<div style='clear:both'>
<input type='submit' name='btnSubmit' value='Process'>
</div>
</form>
</body>
</html>

stddev_img.php

<?php
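// Decode the JSON-encoded values passed in by stddev_form.php
$input = array();
if (isset($_GET['input'])) {
    $decoded = json_decode($_GET['input'], true);
    if (is_array($decoded)) {
        // keep numeric entries only; a bare array_filter() would also drop a genuine 0
        $input = array_map('floatval', array_filter($decoded, 'is_numeric'));
    }
}
if (empty($input))
    $input = array(10,15,20,25,30);    // sample data when nothing usable was submitted
// label the values A, B, C, ...
$data = array();
$k = 'A';
foreach ($input as $v) {
    $data[$k++] = $v;
}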
$numbars = count($data);
$labels = array_keys($data);
$values = array_values($data);
$stats = stats($data);
$barwidth = 20;
$minheight = 300;
$ymax = ceil(max($input)/10)*10;
$chartheight = $ymax + 50;
$yscale = $chartheight < $minheight ? $minheight/$chartheight : 1;
$chartheight = max($chartheight, $minheight);
$chartwidth = $numbars * 2 * $barwidth + 20;
$xorigin = 20;
$yorigin = $chartheight - 20;

$im = imagecreatetruecolor($chartwidth,$chartheight);
$bg = imagecolorallocate($im, 0xff, 0xff, 0xff);
$barcol1 = imagecolorallocatealpha($im, 0x00, 0xff, 0x00, 50);
$barcol2 = imagecolorallocatealpha($im, 0xff, 0x00, 0x00, 50);
$sdcol = imagecolorallocate($im, 0xff, 0x80, 0xff);
$blk = imagecolorallocate($im, 0x00, 0x00, 0x00);
$gry = imagecolorallocate($im, 0x66, 0x66, 0x66);
imagefill($im, 0,0,$bg);
//
// show mean value and std dev zone
//
$y1 = $yorigin - ($stats['mean'] - $stats['sdev'])*$yscale; 
$y2 = $yorigin - ($stats['mean'] + $stats['sdev'])*$yscale;
imagefilledrectangle($im, $xorigin, $y1, $chartwidth, $y2, $sdcol);
imageline($im, $xorigin, $yorigin-$yscale*$stats['mean'], $chartwidth, $yorigin-$yscale*$stats['mean'], $gry); 

//
// axes
//
imageline($im, $xorigin, $yorigin, $xorigin, 30, $blk);
imageline($im, $xorigin, $yorigin, $chartwidth, $yorigin, $blk);
for ($i=0; $i<=$ymax; $i+=10) {
    $y = ($yorigin - $i*$yscale);
    imageline($im, $xorigin-2, $y, $chartwidth, $y, $blk);
    imagestring($im,1,$xorigin-18, $y-3, $i, $blk);
}
foreach ($labels as $i => $k) {
    $y = $yorigin + 4;
    $x = $i * 2 * $barwidth + $barwidth + $xorigin;
    imagestring($im, 3, $x, $y, $k, $blk);
}
//
// draw the bars for each value
//
foreach ($values as $i=>$v) {
    $y1 = $yorigin-$yscale*$v;
    $color = abs($v - $stats['mean']) > $stats['sdev'] ? $barcol2 : $barcol1;
    $x = $i * 2 * $barwidth + $barwidth + $xorigin;
    imagefilledrectangle($im, $x - $barwidth/2, $y1, $x + $barwidth/2, $yorigin, $color);
    imagerectangle($im, $x - $barwidth/2, $y1, $x + $barwidth/2, $yorigin, $blk);
    imagestringup($im, 3, $x-4, $yorigin-4, $v, $blk);
}

//
// stats values
//
$t1 = sprintf("%-9s %6.2f", 'Mean:', $stats['mean']);
$t2 = sprintf("%-9s %6.2f", 'Std dev:', $stats['sdev']);
imagestring($im,2,2,2,$t1, $blk);
imagestring($im,2,2,12,$t2, $blk);

header("Content-Type: image/png");
imagepng($im);
imagedestroy($im);


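// Mean and population standard deviation of the values in $data
function stats(array $data)
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $diffs = array();
    foreach ($data as $x) {
        $diffs[] = ($x - $mean)*($x - $mean);    // squared deviation from the mean
    }
    $sd = sqrt(array_sum($diffs)/$n);            // divides by n (population), not n-1 (sample)
    return array('mean'=>$mean, 'sdev'=>$sd);
}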
?>

 

 

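[Attached image post-3105-0-64374800-1388073763_thumb.png: the chart produced by stddev_img.php, with the mean ± standard deviation band shaded and the bars outside it drawn in red]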



I just picked those numbers at random. Surprising that all three of the high numbers landed just above the mean plus one standard deviation while the low numbers stayed just inside the mean minus one standard deviation. But the question I asked is still valid; it would just require slightly different numbers. So, let's assume one of the high numbers was 80 and one of the low numbers was 10. The OP would need to determine if the 80 should be considered a spike and, by the same token, if the 10 should be considered a negative spike. But, if it were me, I would stick to standard deviation calculations until and unless there is a scenario that does not produce the 'expected' results.
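To make that concrete, here is a hedged sketch that tries both cases: the modified sample (which two points become 80 and 10 is my assumption) and the "all roughly the same" sample from the original question. `classify_points()`, the cut-off `$k` and the flatness guard `$minCv` are hypothetical names and defaults, not anything posted earlier in the thread.

```php
<?php
// Hypothetical helper: label each point 'high', 'low' or 'normal'.
// $k     = cut-off in standard deviations (assumed default 1.0)
// $minCv = minimum coefficient of variation (sd / mean) a data set needs
//          before anything counts as a spike (assumed default 0.15)
// Assumes $data is not empty.
function classify_points(array $data, $k = 1.0, $minCv = 0.15)
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $sumSq = 0;
    foreach ($data as $v) {
        $sumSq += ($v - $mean) ** 2;
    }
    $sd = sqrt($sumSq / $n);   // population standard deviation

    // Data that barely varies relative to its mean is treated as flat.
    $isFlat = ($mean == 0) ? ($sd == 0) : ($sd / abs($mean) < $minCv);

    $labels = array();
    foreach ($data as $key => $v) {
        if ($isFlat) {
            $labels[$key] = 'normal';
        } elseif ($v - $mean > $k * $sd) {
            $labels[$key] = 'high';
        } elseif ($mean - $v > $k * $sd) {
            $labels[$key] = 'low';
        } else {
            $labels[$key] = 'normal';
        }
    }
    return $labels;
}

// Earlier sample with D lowered to 10 and E lowered to 80 (assumed choice of points)
$varied = array('A'=>20, 'B'=>95, 'C'=>25, 'D'=>10, 'E'=>80, 'F'=>29, 'G'=>18, 'H'=>91);
// The "roughly the same average" sample from the original question
$steady = array('A'=>76, 'B'=>81, 'C'=>68, 'D'=>74, 'E'=>77);

print_r(classify_points($varied, 1.0));
print_r(classify_points($varied, 1.1));
print_r(classify_points($steady));
```

For the modified sample the mean is 46 and the standard deviation is about 33.7, so the 80 and the 10 are both only just past one standard deviation: they are flagged with a cut-off of 1.0 but not with 1.1. The steady sample is a scenario where a bare mean ± standard deviation test does not give the expected result: its standard deviation is only about 4.3, so 81 and 68 would be marked as spikes. The guard on sd / mean (about 0.06 here) is one possible way to suppress that.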
