D.Rattansingh Posted December 24, 2013 (edited)
Hope I can post this question here. I've got some data I'm pushing into a line graph, and I want to pick out the high 'spikes' in the graph, e.g.
A - 34, B - 23, C - 96, D - 54, E - 44
The value for C is much higher than the rest; this is what I want to retrieve. Is there a mathematical formula that does this, or that could help in developing a programming algorithm?
gristoi Posted December 24, 2013
$points = array(34, 23, 96, 54, 44);
$spike = max($points);
D.Rattansingh (Author) Posted December 24, 2013 (edited)
Not the max; the data was just a sample. It's dynamic data that is generated on the fly. So if the data is now:
A - 2, B - 87, C - 96, D - 9, E - 44
I want to detect that B and C are much higher than the rest, while E is medium.
A - 76, B - 81, C - 68, D - 74, E - 77
Here I want to detect that none of them has a major spike compared to the others; they are all roughly around the same average. I hope I'm being clear enough. There are about 15 data sets whose values are dynamic, and in each one I want to find the high spikes, the low ones, or even the medium values. If there is a way to determine this mathematically, then it can be coded.
Psycho Posted December 24, 2013 (Solution)
Yes, this can be done, but there is no absolute solution. You will need to decide what constitutes a "spike". For example, what if you had the following data:
A: 20, B: 95, C: 25, D: 16, E: 88, F: 29, G: 18, H: 91
Would you consider B, E & H as spikes? There are so many "high" values that they could be considered normal. I can see a few ways of achieving this:
- Determine the average or median; any point that is more than X% away from the average/median is then considered a spike.
- Or, get really geeky and calculate the standard deviation, and use that as the measure of what a 'spike' is.
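The first option above (X% off the median) might be sketched roughly as follows. This is only an illustration: the function name spikesByMedian() and the 50% default threshold are assumptions made for the example, not anything prescribed in this thread, and the right threshold depends on your data.

```php
<?php
// Hypothetical helper (name and default threshold are assumptions):
// returns the entries that differ from the median by more than $pct percent.
function spikesByMedian(array $data, float $pct = 50.0): array
{
    $sorted = array_values($data);
    sort($sorted);
    $n = count($sorted);
    if ($n === 0) {
        return array();
    }

    // Median: middle value, or the mean of the two middle values
    $mid = intdiv($n, 2);
    $median = ($n % 2) ? $sorted[$mid] : ($sorted[$mid - 1] + $sorted[$mid]) / 2;

    $spikes = array();
    foreach ($data as $label => $value) {
        // Skip the percentage test when the median is zero (division by zero)
        if ($median != 0 && abs($value - $median) / abs($median) * 100 > $pct) {
            $spikes[$label] = $value;
        }
    }
    return $spikes;
}

// The two sample sets from the question
$uneven = array('A' => 2, 'B' => 87, 'C' => 96, 'D' => 9, 'E' => 44);
$flat   = array('A' => 76, 'B' => 81, 'C' => 68, 'D' => 74, 'E' => 77);

print_r(spikesByMedian($uneven)); // flags both very high and very low values
print_r(spikesByMedian($flat));   // nothing is more than 50% off the median
```

Note that this flags deviations in both directions, so in the uneven set the low values A and D are reported along with B and C. If only high spikes matter, compare $value - $median without abs().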
D.Rattansingh (Author) Posted December 24, 2013
Yes, thanks! I just read up on standard deviation and it's exactly what I'm looking for.
Barand Posted December 26, 2013
Psycho wrote: "Would you consider B, E & H as spikes? There are so many of the "high" values that they could be considered normal."

OK, I was feeling geeky. Apparently you would consider them spikes, as they are outside the mean +- standard deviation (see attached resulting image).

The code, for anyone interested:

stddev_form.php

<?php
if (isset($_GET['val'])) {
    // URL-encode the JSON so its quotes and brackets survive in the query string
    $data = urlencode(json_encode($_GET['val']));
    echo "<img src='stddev_img.php?input=$data' /><br><hr>";
}
?>
<html>
<head>
<meta name="generator" content="PhpED Version 8.1 (Build 8115)">
<title>Standard Deviation</title>
<meta name="author" content="Barand">
<link rel="shortcut icon" href="">
<meta name="creation-date" content="12/26/2013">
<style type='text/css'>
div.input {
    width: 18%;
    height: 25px;
    float: left;
    margin: 5px;
}
div#wrapper {
    padding: 5px;
    margin-bottom: 15px;
    border: 1px solid black;
    overflow: auto;
}
span {
    min-width: 30px;
    text-align: left;
}
</style>
</head>
<body>
<form>
<h3>Input values</h3>
<div id='wrapper'>
<?php
$label = 'A';
for ($i=0; $i<20; $i++) {
    $text = sprintf("%s: ", $label++);
    echo <<<EOT
    <div class='input'>
        <span>$text</span> <input type='text' name='val[]' size='3'>
    </div>
EOT;
}
?>
</div>
<div style='clear:both'>
<input type='submit' name='btnSubmit' value='Process'>
</div>
</form>
</body>
</html>

stddev_img.php

<?php
if (isset($_GET['input'])) {
    $input = json_decode($_GET['input'], true);
    // keep numeric entries only: drops blank form fields but, unlike a bare
    // array_filter(), does not discard zeros
    $input = is_array($input) ? array_map('floatval', array_filter($input, 'is_numeric')) : array();
}
else $input = array();
if (empty($input)) $input = array(10,15,20,25,30);

// label the values A, B, C, ...
$data = array();
$k = 'A';
foreach ($input as $v) {
    $data[$k++] = $v;
}
$numbars = count($data);
$labels = array_keys($data);
$values = array_values($data);
$stats = stats($data);

$barwidth = 20;
$minheight = 300;
$ymax = ceil(max($input)/10)*10;
$chartheight = $ymax + 50;
$yscale = $chartheight < $minheight ? $minheight/$chartheight : 1;
$chartheight = max($chartheight, $minheight);
$chartwidth = $numbars * 2 * $barwidth + 20;
$xorigin = 20;
$yorigin = $chartheight - 20;

$im = imagecreatetruecolor($chartwidth,$chartheight);
$bg = imagecolorallocate($im, 0xff, 0xff, 0xff);
$barcol1 = imagecolorallocatealpha($im, 0x00, 0xff, 0x00, 50);   // within one std dev
$barcol2 = imagecolorallocatealpha($im, 0xff, 0x00, 0x00, 50);   // outside one std dev
$sdcol = imagecolorallocate($im, 0xff, 0x80, 0xff);
$blk = imagecolorallocate($im, 0x00, 0x00, 0x00);
$gry = imagecolorallocate($im, 0x66, 0x66, 0x66);
imagefill($im, 0,0,$bg);
//
// show mean value and std dev zone
// (coordinates are rounded to integers, as the GD functions expect)
//
$y1 = (int) round($yorigin - ($stats['mean'] - $stats['sdev'])*$yscale);
$y2 = (int) round($yorigin - ($stats['mean'] + $stats['sdev'])*$yscale);
$ymean = (int) round($yorigin - $yscale*$stats['mean']);
imagefilledrectangle($im, $xorigin, $y1, $chartwidth, $y2, $sdcol);
imageline($im, $xorigin, $ymean, $chartwidth, $ymean, $gry);
//
// axes
//
imageline($im, $xorigin, $yorigin, $xorigin, 30, $blk);
imageline($im, $xorigin, $yorigin, $chartwidth, $yorigin, $blk);
for ($i=0; $i<=$ymax; $i+=10) {
    $y = (int) round($yorigin - $i*$yscale);
    imageline($im, $xorigin-2, $y, $chartwidth, $y, $blk);
    imagestring($im,1,$xorigin-18, $y-3, $i, $blk);
}
foreach ($labels as $i => $k) {
    $y = $yorigin + 4;
    $x = $i * 2 * $barwidth + $barwidth + $xorigin;
    imagestring($im, 3, $x, $y, $k, $blk);
}
//
// draw the bars for each value
//
foreach ($values as $i=>$v) {
    $y1 = (int) round($yorigin-$yscale*$v);
    // red if the value is more than one std dev from the mean, else green
    $color = abs($v - $stats['mean']) > $stats['sdev'] ? $barcol2 : $barcol1;
    $x = $i * 2 * $barwidth + $barwidth + $xorigin;
    imagefilledrectangle($im, $x - $barwidth/2, $y1, $x + $barwidth/2, $yorigin, $color);
    imagerectangle($im, $x - $barwidth/2, $y1, $x + $barwidth/2, $yorigin, $blk);
    imagestringup($im, 3, $x-4, $yorigin-4, $v, $blk);
}
//
// stats values
//
$t1 = sprintf("%-9s %6.2f", 'Mean:', $stats['mean']);
$t2 = sprintf("%-9s %6.2f", 'Std dev:', $stats['sdev']);
imagestring($im,2,2,2,$t1, $blk);
imagestring($im,2,2,12,$t2, $blk);

header("Content-Type: image/png");
imagepng($im);
imagedestroy($im);

// mean and population standard deviation (divides by n, not n-1)
function stats(&$data)
{
    $n = count($data);
    $mean = array_sum($data) / $n;
    $diffs = array();
    foreach ($data as $x) {
        $diffs[] = ($x - $mean)*($x-$mean);
    }
    $sd = sqrt(array_sum($diffs)/$n);
    return array('mean'=>$mean, 'sdev'=>$sd);
}
?>
Psycho Posted December 26, 2013
Barand wrote: "OK, I was feeling geeky. Apparently you would consider them spikes, as they are outside the mean +- standard deviation (see attached resulting image)."

I just picked those numbers at random. It's surprising that all three of the high numbers fell just above mean + standard deviation while the low numbers fell just inside mean - standard deviation. The question I asked is still valid, though; it just needs slightly different numbers. So, let's assume one of the high numbers was 80 and one of the low numbers was 10. The OP would need to decide whether the 80 should be considered a spike and, by the same token, whether the 10 should be considered a negative spike. If it were me, I would stick to the standard deviation calculation unless and until a scenario comes up that does not produce the 'expected' results.
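One way to make that decision explicit is to treat the number of standard deviations as a tunable parameter. The sketch below is an assumption-laden illustration: classifySpikes() and its $k parameter are hypothetical names, the standard deviation is the population form used by stats() in the earlier post, and which high and low values get swapped for 80 and 10 is an arbitrary choice made for the example.

```php
<?php
// Hypothetical helper (not from the thread): labels each value 'high', 'low'
// or 'normal' depending on whether it lies more than $k standard deviations
// above or below the mean.
function classifySpikes(array $data, float $k = 1.0): array
{
    $n = count($data);
    if ($n === 0) {
        return array();
    }
    $mean = array_sum($data) / $n;

    // Population standard deviation (divide by n), matching stats() above
    $sumSq = 0.0;
    foreach ($data as $x) {
        $sumSq += ($x - $mean) * ($x - $mean);
    }
    $sd = sqrt($sumSq / $n);

    $result = array();
    foreach ($data as $label => $x) {
        if ($x > $mean + $k * $sd) {
            $result[$label] = 'high';
        } elseif ($x < $mean - $k * $sd) {
            $result[$label] = 'low';
        } else {
            $result[$label] = 'normal';
        }
    }
    return $result;
}

// Psycho's data with B changed to 80 and D changed to 10 (assumed choice)
$data = array('A' => 20, 'B' => 80, 'C' => 25, 'D' => 10,
              'E' => 88, 'F' => 29, 'G' => 18, 'H' => 91);

print_r(classifySpikes($data));       // k = 1: B, E, H are 'high', D is 'low'
print_r(classifySpikes($data, 1.5));  // k = 1.5: everything is 'normal'
```

With this data the mean is 45.125 and the standard deviation is about 32.45, so at one standard deviation the 80 counts as a spike and the 10 as a negative spike, while at 1.5 standard deviations nothing is flagged. Raising or lowering $k is how you would tune the result if the default does not match what you expect.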