rwallin Posted July 5, 2007

I'm such a newbie with arrays and need some help. :'(

I'm reading a text file into an array, and I'm trying to find the first instance where a line is duplicated and get its index. The problem is that the first 10 characters of each line are a timestamp, and I need to remove them before checking for duplicates.

I've not dealt with arrays much, so I'm kinda lost. I can echo out each line without the timestamp using the substr() string function, but I can't get that back into the array, or strip the first 10 characters before creating the array. I'm sure there is an easy way, but I've found myself stumped. Any help would be appreciated.

<?php
$lines = file("12345.txt");

foreach ($lines as $line_num => $line) {
    echo "{$line_num}: " . substr($line, 10) . "<br />\n";
}

echo "<hr>";

$uniqueArray = array_unique($lines);
$dupArray = array_diff_assoc($lines, $uniqueArray);

foreach ($dupArray as $value) {
    // do whatever you want here, like build an assoc array
    echo "duplicate value='$value', first occurrence at index=" . array_search($value, $lines) . "\n";
}
?>
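One way to get the stripped text "back into the array" is to build a second array of stripped lines first and run the same array_unique()/array_diff_assoc() check on that copy. This is a minimal sketch, assuming the timestamp is always exactly 10 characters; the sample lines are made up and stand in for the real file.

```php
<?php
// Sample data standing in for file("12345.txt"). With a real file,
// file("12345.txt", FILE_IGNORE_NEW_LINES) drops the trailing newlines,
// so a last line without "\n" still compares equal to earlier copies.
$lines = array('0123456789first line', '0123456789second line', '0123456789first line');

// Strip the (assumed) 10-character timestamp from every line.
// array_map() with a single array keeps the original keys.
$stripped = array_map(function ($line) {
    return substr($line, 10);
}, $lines);

$uniqueArray = array_unique($stripped);                // first occurrence of each text
$dupArray = array_diff_assoc($stripped, $uniqueArray); // every later repeat, keyed by line index

foreach ($dupArray as $index => $value) {
    // strict array_search() returns the first key whose value is identical to $value
    echo "duplicate value='$value' at index $index, first occurrence at index="
        . array_search($value, $stripped, true) . "\n";
}
```

With the sample data this prints a single line: duplicate value='first line' at index 2, first occurrence at index=0.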
GingerRobot Posted July 5, 2007

Given that you're wanting to get the index of the duplicates, I think your best bet is to cycle through the array and create a new array. Before you put data into this new array, check whether it is already in there:

<?php
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$new_array = array();  // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
    } else {
        $duplicates[] = $key;
    }
}
?>

So you end up with two arrays: $new_array, which is effectively what you would have got from using array_unique() on the stripped lines (each value is in the array only once), and $duplicates, which contains the keys from your original array of the duplicated line(s).
rwallin Posted July 5, 2007 Author

Nice! Thanks for the quick reply. I tried it using your test array and echoed out the value and key to see what I'm getting, and I have another question:

<?php
// $myarray = file("12345.txt");
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$new_array = array();  // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        echo "value= " . $value . "<br>";
    } else {
        $duplicates[] = $key;
        echo "key= " . $key . "<br>";
    }
}
?>

This results in displaying:

value= first line
value= second line
key= 2

So the key being displayed is that of the last copy of the duplicated line. How can I get the first key that is duplicated? Also, I only need the first key: if more than one key is duplicated, all I care about is the very first one.
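A minimal sketch of one reading of "the first key that is duplicated": scan forward, stop at the first line whose text (after the assumed 10-character timestamp) has already appeared, and report both that line's key and the key of its earlier copy. The helper name find_first_repeat() is hypothetical. Because it stops at the first repeat it meets, it does not always return the lowest index that has a later duplicate: for the texts a, b, c, b, a it reports keys 1 and 3, not 0.

```php
<?php
// find_first_repeat() is a hypothetical helper, not a PHP built-in.
// Assumes every line starts with a fixed-length timestamp prefix.
function find_first_repeat(array $lines, $prefixLength = 10)
{
    $firstSeen = array(); // stripped text => key where it first appeared
    foreach ($lines as $key => $line) {
        $text = substr($line, $prefixLength);
        if (isset($firstSeen[$text])) {
            // this line repeats an earlier one: report both keys and stop
            return array('first' => $firstSeen[$text], 'repeat' => $key, 'text' => $text);
        }
        $firstSeen[$text] = $key;
    }
    return null; // no line repeats an earlier one
}

$myarray = array('0123456789first line', '0123456789second line', '0123456789first line');
$result = find_first_repeat($myarray);
if ($result !== null) {
    echo "first key= " . $result['first'] . ", repeated at key= " . $result['repeat'] . "\n";
}
```

With the test array this prints first key= 0, repeated at key= 2. Keeping a map from text to key avoids the problem that $new_array is renumbered, so its indexes no longer match the original keys.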
GingerRobot Posted July 5, 2007

That depends slightly on what you are trying to do. If ALL you want is to return the first key that was a duplicate instead of the last, and nothing else, then just reverse the original array whilst preserving keys:

<?php
// $myarray = file("12345.txt");
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$myarray = array_reverse($myarray, true); // reverse array - second parameter as true keeps your original keys
$new_array = array();  // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        echo "value= " . $value . "<br>";
    } else {
        $duplicates[] = $key;
        echo "key= " . $key . "<br>";
    }
}
?>

If you are looking to get the keys of all of the lines which were duplicates (e.g. return keys 0 and 2 in our example), then it's going to be a little more complex.
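For the "a little more complex" case, a minimal sketch: group the original keys by their text (after the assumed 10-character timestamp), then keep only the groups holding more than one key. group_keys_by_text() is a hypothetical helper name.

```php
<?php
// group_keys_by_text() is a hypothetical helper, not a PHP built-in.
// Assumes every line starts with a fixed-length timestamp prefix.
function group_keys_by_text(array $lines, $prefixLength = 10)
{
    $groups = array(); // stripped text => list of original keys, in file order
    foreach ($lines as $key => $line) {
        $groups[substr($line, $prefixLength)][] = $key;
    }
    // keep only the texts that appear on more than one line
    return array_filter($groups, function ($keys) {
        return count($keys) > 1;
    });
}

$myarray = array('0123456789first line', '0123456789second line', '0123456789first line');
foreach (group_keys_by_text($myarray) as $text => $keys) {
    echo "'$text' appears at keys " . implode(', ', $keys) . "\n";
}
```

With the test array this prints 'first line' appears at keys 0, 2. Since keys are collected in file order, the first entry of each group is the earliest copy of that line.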
Barand Posted July 5, 2007

or

<?php
$myarray = array(
    '0123456789first line',
    '0123456789second line',
    '0123456789third line',
    '0123456789second line',
    '0123456789fourth line'
);
$k = count($myarray);
for ($i = 0; $i < $k - 1; $i++) {
    for ($j = $i + 1; $j < $k; $j++) {
        // compare the text after the 10-character timestamp
        if (($txt = substr($myarray[$i], 10)) == substr($myarray[$j], 10)) {
            echo "Duplicate : $i - $txt";
            break 2; // leave both loops at the first match
        }
    }
}
?>
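The nested loops above can be wrapped in a reusable function that returns both indexes instead of echoing. This sketch assumes the same 10-character timestamp, computes the outer line's stripped text once per pass rather than on every inner iteration, and compares with === so that numeric-looking text such as '1e3' and '1000' is not treated as equal (== compares numeric strings as numbers). first_duplicate_pair() is a hypothetical name.

```php
<?php
// first_duplicate_pair() is a hypothetical helper, not a PHP built-in.
function first_duplicate_pair(array $lines, $prefixLength = 10)
{
    // Renumber keys 0..n-1 so the for-loops can index the array directly
    // (file() already returns keys in this form).
    $lines = array_values($lines);
    $k = count($lines);
    for ($i = 0; $i < $k - 1; $i++) {
        $txt = substr($lines[$i], $prefixLength); // strip once per outer pass
        for ($j = $i + 1; $j < $k; $j++) {
            if ($txt === substr($lines[$j], $prefixLength)) {
                return array($i, $j); // same effect as "break 2": stop at the first match
            }
        }
    }
    return null; // no line has a later duplicate
}

$myarray = array(
    '0123456789first line',
    '0123456789second line',
    '0123456789third line',
    '0123456789second line',
    '0123456789fourth line'
);
$pair = first_duplicate_pair($myarray);
if ($pair !== null) {
    echo "Duplicate : {$pair[0]} (repeated at {$pair[1]})\n";
}
```

With this array it prints Duplicate : 1 (repeated at 3).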
GingerRobot Posted July 5, 2007

Barand, would I be right in thinking that if you didn't need the key of the first duplicate, my method would be more efficient?
Barand Posted July 5, 2007

This benchmarks the two sets of code:

<?php
/**
 * create test file with 1000 records (records 2 and 990, i.e. array keys 1 and 989, are duplicates)
 */
$fp = fopen('test.txt', 'w');
for ($i = 1; $i <= 1000; $i++) {
    if (($i == 2) || ($i == 990)) $str = 'duplicate';
    else $str = "line $i";
    fwrite($fp, "0123456789$str\n");
}
fclose($fp);
$myarray = file('test.txt');

/**
 * start the clock
 */
$t1 = microtime(true);

/**
 * GR code
 */
$myarray = array_reverse($myarray, true); // reverse array - second parameter as true keeps your original keys
$new_array = array();  // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        #echo "value= ".$value."<br>";
    } else {
        $duplicates[] = $key;
        echo "(GingerRobot)key= " . $key . "<br>";
    }
}

/**
 * intermediate clock reading
 */
$t2 = microtime(true);

/**
 * BA code
 * ($myarray is now reversed, but its keys were preserved, so $myarray[$i] is still line $i)
 */
$k = count($myarray);
for ($i = 0; $i < $k - 1; $i++) {
    for ($j = $i + 1; $j < $k; $j++) {
        if (($txt = substr($myarray[$i], 10)) == substr($myarray[$j], 10)) {
            echo "(Barand)Duplicate : $i - $txt<br>";
            break 2;
        }
    }
}

/**
 * final clock reading
 */
$t3 = microtime(true);

/**
 * compare times
 */
printf("GingerRobot time %0.6f<br>Barand time %0.6f", $t2 - $t1, $t3 - $t2);
?>
GingerRobot Posted July 5, 2007

Ah, right. I think I've just worked out why yours is quicker. I was thinking that mine only loops through the data once, but I suppose that by using in_array() you are effectively looping through the array inside the loop, just as you do in yours. Yours also stops as soon as it finds the first duplicate (break 2), whereas mine carries on through every line. It's interesting that although the idea is similar in both pieces of code, there is a (relatively) large difference in the timing of the scripts.
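Following on from the in_array() point, here is a sketch that returns the same index as the nested loops (the lowest key whose text appears again later) in a single pass. It stores each stripped text as an array key and tests it with isset(), which is a hash lookup rather than a scan of the whole array. It assumes the 10-character timestamp and keys in file order (as file() returns them); first_duplicated_index() is a hypothetical name. It cannot stop at the first repeat it sees, because a later repeat may belong to an earlier line: for the texts a, b, b, a the answer is 0, not 1.

```php
<?php
// first_duplicated_index() is a hypothetical helper, not a PHP built-in.
function first_duplicated_index(array $lines, $prefixLength = 10)
{
    $firstSeen = array(); // stripped text => key of its first occurrence
    $best = null;         // lowest first-occurrence key whose text repeats
    foreach ($lines as $key => $line) {
        $text = substr($line, $prefixLength);
        if (!isset($firstSeen[$text])) {
            $firstSeen[$text] = $key; // array keys are hashed, so lookups stay cheap
        } elseif ($best === null || $firstSeen[$text] < $best) {
            $best = $firstSeen[$text];
        }
    }
    return $best; // null when no line is duplicated
}

// Rebuild the benchmark data in memory: records 2 and 990 are duplicates.
$lines = array();
for ($i = 1; $i <= 1000; $i++) {
    $str = ($i == 2 || $i == 990) ? 'duplicate' : "line $i";
    $lines[] = "0123456789$str";
}

$t = microtime(true);
$index = first_duplicated_index($lines);
printf("first duplicated index %d, time %0.6f\n", $index, microtime(true) - $t);
```

For the benchmark data the function returns 1 (record 2); actual timings will vary by machine.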
rwallin Posted July 6, 2007 Author

You BOTH absolutely rock! I really appreciate the help.