[SOLVED] String Manipulation In Arrays

rwallin · July 5, 2007

I'm such a newbie with arrays and need some help. :'(

I'm reading a text file into an array and trying to find the first line that is duplicated and get its index.

The problem is that the first 10 characters of each line are a timestamp, and I need to remove them before checking for duplicates.

I haven't dealt with arrays much, so I'm kind of lost. I can echo each line without the timestamp using the substr() string function, but I can't get that text back into the array, or strip the first 10 characters before creating the array.

I'm sure there is an easy way, but I'm stumped. Any help would be appreciated.

<?php

$lines = file("12345.txt");

// Show each line with its 10-character timestamp removed
foreach ($lines as $line_num => $line) {
    echo "{$line_num}: " . substr($line, 10) . "<br />\n";
}

echo "<hr>";

// NOTE: this compares the full lines, timestamps included,
// so lines that differ only in their timestamp are not reported.
$uniqueArray = array_unique($lines);
$dupArray = array_diff_assoc($lines, $uniqueArray);
foreach ($dupArray as $value)
{
    // do whatever you want here, like build an assoc array
    echo "duplicate value='$value', first occurrence at index=" . array_search($value, $lines) . "\n";
}

?>
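A minimal sketch of getting the stripped text back into an array: `array_map()` applies `substr()` to every line and, when given a single array, keeps the original keys, so index N still refers to line N. It assumes, as in the question, that the timestamp is always exactly the first 10 characters; the helper name `strip_timestamps` is hypothetical. When reading a real file, passing `FILE_IGNORE_NEW_LINES` to `file()` drops the trailing newlines, so a last line without a newline still compares equal to the others.

```php
<?php
// Hypothetical helper: drop the fixed-width 10-character timestamp from every line.
// array_map() with a single array preserves keys, so $stripped[N] is line N of the input.
function strip_timestamps(array $lines)
{
    return array_map(function ($line) {
        return substr($line, 10);
    }, $lines);
}

// With a real file you might use:
// $lines = file('12345.txt', FILE_IGNORE_NEW_LINES);
$lines = array('0123456789first line', '0123456789second line', '0123456789first line');
$stripped = strip_timestamps($lines);

print_r($stripped);
```

Once the lines are stripped, the question's own `array_unique()` / `array_diff_assoc()` / `array_search()` code can be run on `$stripped` instead of `$lines`.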

GingerRobot · July 5, 2007

Given that you want the index of the duplicates, I think your best bet is to loop through the array and build a new array. Before adding each value to the new array, check whether it is already there:

<?php
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$new_array = array(); // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
    } else {
        $duplicates[] = $key;
    }
}
?>

So you end up with two arrays: $new_array, which is effectively what array_unique() would give you on the timestamp-stripped lines (each value appears only once, though the keys run from 0 rather than keeping the original ones), and $duplicates, which contains the original keys of the repeated line(s), i.e. the second and later copies.
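To check the `array_unique()` comparison, here is a small sketch that runs the loop above as a function (the name `split_unique_and_duplicates` is hypothetical) and compares its result with `array_unique()` applied to the stripped lines. `array_unique()` keeps the original keys, so `array_values()` re-indexes before comparing. The sketch passes `true` as the third argument to `in_array()` for strict comparison; with the default loose comparison, numeric-looking strings such as `'10'` and `'1e1'` would be treated as equal.

```php
<?php
// Hypothetical wrapper around the loop above: returns the unique stripped
// values and the original keys of every repeated line.
function split_unique_and_duplicates(array $lines)
{
    $new_array = array();
    $duplicates = array();
    foreach ($lines as $key => $value) {
        $value = substr($value, 10); // strip the 10-character timestamp
        if (!in_array($value, $new_array, true)) { // strict comparison
            $new_array[] = $value;
        } else {
            $duplicates[] = $key;
        }
    }
    return array($new_array, $duplicates);
}

$myarray = array('0123456789first line', '0123456789second line', '0123456789first line');
list($new_array, $duplicates) = split_unique_and_duplicates($myarray);

// array_unique() keeps the original keys; array_values() renumbers them 0, 1, 2, ...
$via_unique = array_values(array_unique(array_map(function ($line) {
    return substr($line, 10);
}, $myarray)));

var_dump($new_array === $via_unique);
```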

rwallin · July 5, 2007

Nice! Thanks for the quick reply.

I tried it with your test array and echoed the value and key to see what I'm getting, and I have another question:

<?php
// $myarray = file("12345.txt");
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$new_array = array(); // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        echo "value= " . $value . "<br>";
    } else {
        $duplicates[] = $key;
        echo "key= " . $key . "<br>";
    }
}
?>

This displays:

value= first line
value= second line

key= 2

So the key being displayed belongs to the last copy of the duplicated line. How can I get the key of the first copy instead?

Also, I only need the first key: if more than one line is duplicated, all I care about is the very first one.
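For "the very first key only", one alternative (a sketch, not code from this thread) makes two linear passes: it records the first index of each stripped line in an array keyed by the text, plus an occurrence count. PHP arrays keep insertion order, so walking `$first_index` afterwards visits values in order of first appearance, and the first one with a count above 1 is the earliest line that has a later copy. `first_duplicate_index()` is a hypothetical name; it returns `null` when no line repeats.

```php
<?php
// Hypothetical helper: index of the earliest line whose text (after the
// 10-character timestamp) appears again later in the array, or null if none.
function first_duplicate_index(array $lines)
{
    $first_index = array(); // stripped text => key where it first appeared
    $count = array();       // stripped text => number of occurrences
    foreach ($lines as $key => $line) {
        $text = substr($line, 10);
        if (!isset($first_index[$text])) {
            $first_index[$text] = $key;
            $count[$text] = 0;
        }
        $count[$text]++;
    }
    // $first_index is in order of first appearance, so the first entry
    // with a count above 1 is the earliest duplicated line.
    foreach ($first_index as $text => $key) {
        if ($count[$text] > 1) {
            return $key;
        }
    }
    return null;
}

$myarray = array('0123456789first line', '0123456789second line', '0123456789first line');
var_dump(first_duplicate_index($myarray));
```

Using the text as an array key turns each "have I seen this?" check into a hash lookup instead of a scan of everything seen so far.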

GingerRobot · July 5, 2007

That does depend slightly on what you are trying to do. If ALL you are wanting is to return the first key that was a duplicate instead of the last, and nothing else, then just reverse the original array whilst preserving keys:

<?php
// $myarray = file("12345.txt");
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$myarray = array_reverse($myarray, true); // reverse array - second parameter as true keeps your original keys
$new_array = array(); // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        echo "value= " . $value . "<br>";
    } else {
        $duplicates[] = $key;
        echo "key= " . $key . "<br>";
    }
}
// If several lines are duplicated, min($duplicates) is the earliest one.
?>

If you are looking to get the keys of all of the lines which were duplicates (e.g. returning keys 0 and 2 in our example), then it's going to be a little more complex.
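One way to handle that case (a sketch under the same 10-character-timestamp assumption; `duplicate_groups()` is a hypothetical name) is to group the original keys by stripped text and keep only the groups with more than one key. For the example array this yields keys 0 and 2 for "first line".

```php
<?php
// Hypothetical helper: map each repeated stripped line to all of its original keys.
function duplicate_groups(array $lines)
{
    $groups = array(); // stripped text => list of original keys
    foreach ($lines as $key => $line) {
        $groups[substr($line, 10)][] = $key;
    }
    // Keep only texts that occur more than once (array_filter preserves keys).
    return array_filter($groups, function ($keys) {
        return count($keys) > 1;
    });
}

$myarray = array('0123456789first line', '0123456789second line', '0123456789first line');
print_r(duplicate_groups($myarray));
```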

Barand · July 5, 2007

Or, comparing each line with every later line:

<?php
$myarray = array(
        '0123456789first line',
        '0123456789second line',
        '0123456789third line',
        '0123456789second line',
        '0123456789fourth line'
        );
$k = count($myarray);

for ($i=0; $i<$k-1; $i++)
{
    for ($j=$i+1; $j<$k; $j++)
    {
        if (($txt = substr($myarray[$i], 10)) == substr($myarray[$j], 10))
        {
            echo "Duplicate : $i - $txt";
            break 2;
        }
    }
}
?>
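If the index is needed as a value rather than echoed, the nested loop above can be wrapped in a function (the name `first_duplicate_nested` is hypothetical), with `return` taking the place of `break 2`. Like the original, it assumes sequential keys 0..n-1, as `file()` produces. This sketch also computes the outer line's text once per outer iteration and uses `===` so numeric-looking lines are compared as strings.

```php
<?php
// The nested-loop comparison above, wrapped as a function.
// Assumes keys 0..count-1 (as returned by file()); returns null if nothing repeats.
function first_duplicate_nested(array $lines)
{
    $k = count($lines);
    for ($i = 0; $i < $k - 1; $i++) {
        $txt = substr($lines[$i], 10); // compute once per outer iteration
        for ($j = $i + 1; $j < $k; $j++) {
            if ($txt === substr($lines[$j], 10)) {
                return $i; // earliest line that has a later copy
            }
        }
    }
    return null;
}

$myarray = array(
    '0123456789first line',
    '0123456789second line',
    '0123456789third line',
    '0123456789second line',
    '0123456789fourth line'
);
var_dump(first_duplicate_nested($myarray));
```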

GingerRobot · July 5, 2007

Barand - would I be right in thinking that if you didn't need the key of the first duplicate, my method would be more efficient?

Barand · July 5, 2007

This benchmarks the two pieces of code:

<?php
/**
* create test file with 1000 records (2 and 990 are duplicates)
*/
$fp = fopen('test.txt', 'w');
for ($i=1; $i<=1000; $i++)
{
    if (($i==2)||($i==990))
        $str = 'duplicate';
    else
        $str = "line $i";
        
    fwrite($fp, "0123456789$str\n");
}
fclose ($fp);

$myarray = file('test.txt');
/**
* start the clock  
*/
$t1 = microtime(true);

/**
* GR code
*/
$myarray = array_reverse($myarray,true);//reverse array - second parameter as true keeps your original keys
$new_array = array();//we're going to put new values into here
$duplicates = array();//and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        #echo "value= " . $value . "<br>";
    } else {
        $duplicates[] = $key;
        echo "(GingerRobot)key= " . $key . "<br>";
    }
}

/**
* intermediate clock reading
*/
$t2 = microtime(true);

/**
* BA code 
*/
$k = count($myarray);

for ($i=0; $i<$k-1; $i++)
{
    for ($j=$i+1; $j<$k; $j++)
    {
        if (($txt = substr($myarray[$i], 10)) == substr($myarray[$j], 10))
        {
            echo "(Barand)Duplicate : $i - $txt<br>";
            break 2;
        }
    }
}

/**
* final clock reading
*/
$t3 = microtime(true);

/**
* compare times
*/
printf ("GingerRobot time %0.6f<br>Barand time %0.6f", $t2-$t1, $t3-$t2);
?>

GingerRobot · July 5, 2007

Ah, right. I think I just worked out why yours is quicker. I thought mine only looped through the data once, but in_array() scans $new_array on every iteration, so it is effectively a loop inside the loop, just as in yours. Yours also stops (break 2) as soon as it finds the first duplicate, whereas mine always processes every line.

It's interesting that although the idea is similar in both pieces of code, there is a (relatively) large difference in timing.
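The in_array() point suggests a variant worth trying: store each stripped value as an array key in a hypothetical `$seen` array and test it with `isset()`, which is a hash lookup rather than a linear scan, so the loop really is a single pass. This is a sketch (the function name `split_unique_and_duplicates_fast` is also hypothetical); it returns the same two arrays as the in_array() loop, and actual speed-ups will depend on the data and PHP version.

```php
<?php
// In_array() loop rewritten to remember seen values as array keys, so each
// lookup is an isset() hash check instead of a linear in_array() scan.
function split_unique_and_duplicates_fast(array $lines)
{
    $seen = array();       // stripped text => true
    $unique = array();     // stripped values in first-seen order
    $duplicates = array(); // original keys of repeated lines
    foreach ($lines as $key => $line) {
        $value = substr($line, 10); // strip the 10-character timestamp
        if (!isset($seen[$value])) {
            $seen[$value] = true;
            $unique[] = $value; // kept separately: array_keys($seen) would turn '42' into int 42
        } else {
            $duplicates[] = $key;
        }
    }
    return array($unique, $duplicates);
}

$myarray = array('0123456789first line', '0123456789second line', '0123456789first line');
list($unique, $duplicates) = split_unique_and_duplicates_fast($myarray);
print_r($duplicates);
```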

rwallin · July 6, 2007

You BOTH absolutely rock! I really appreciate the help.
