
[SOLVED] String Manipulation In Arrays


rwallin


I'm such a newbie with arrays and need some help.  :'(

I'm reading a text file into an array, and I'm trying to find the first instance where a line is duplicated and get its index.

The problem is that the first 10 characters of each line are a time stamp, and I need to remove them before checking for duplicates.

I've not dealt with arrays much, so I'm kinda lost. I can echo out each line without the time stamp using the substr() string function, but I can't get that info back into the array or strip the first 10 characters before creating the array.

I'm sure there is an easy way, but I've found myself stumped. Any help would be appreciated.

 

<?php

$lines = file ("12345.txt");

foreach ($lines as $line_num => $line) {
    echo "{$line_num}: " . substr($line, 10) . "<br />\n";
}

echo "<hr>";

// Note: $lines still includes the time stamps at this point.
$uniqueArray = array_unique($lines);
$dupArray = array_diff_assoc($lines, $uniqueArray);
foreach ($dupArray as $value) {
    // do whatever you want here, like build an assoc array.
    echo "duplicate value='$value', first occurrence at index=" . array_search($value, $lines) . "\n";
}

?>


Given that you're wanting to get the index of the duplicates, I think your best bet is to cycle through the array and create a new array. Before you enter data into this new array, you check whether it is already in there:

 

<?php
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$new_array = array();  // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
    } else {
        $duplicates[] = $key;
    }
}
?>

 

So you end up with two arrays: $new_array, which is effectively what you would have got from using array_unique() on your original array with the time stamps stripped (each value is in the array only once), and $duplicates, which contains the keys from your original array of the duplicated line(s). Note that these are the keys of the repeats, not of the first occurrence.
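For the other half of the original question (getting the stripped values back into an array), here is a minimal sketch using array_map(). The sample data and the variable names $stripped and $repeats are made up for illustration. When reading from a real file you would probably also want file("12345.txt", FILE_IGNORE_NEW_LINES) so the trailing newlines don't get in the way.

```php
<?php
// Hypothetical sample data: a 10-character time stamp followed by the text.
$lines = array(
    '0123456789first line',
    '0123456789second line',
    '0123456789first line',
);

// Strip the first 10 characters from every line.
// With a single input array, array_map() keeps the original keys.
$stripped = array_map(function ($line) {
    return substr($line, 10);
}, $lines);

// array_unique() keeps the first occurrence of each value, so whatever
// array_diff_key() leaves behind are the later, repeated lines.
$repeats = array_diff_key($stripped, array_unique($stripped));

print_r($stripped);
print_r($repeats);
```

Here $repeats should hold the same keys as $duplicates in the loop version, just with the line text attached as the value.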


nice! .. thanks for quick reply.

 

Tried it using your test array and echoed out the value and key to see what I'm getting, and I have another question:

 

<?php
// $myarray = file ("12345.txt");
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$new_array = array();  // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        echo "value= " . $value . "<br>";
    } else {
        $duplicates[] = $key;
        echo "key= " . $key . "<br>";
    }
}
?>

 

Results in displaying this

value= first line

value= second line

key= 2

 

So the key being displayed is for the last line that is a duplicate. How can I get the first key that is duplicated?

Also, I only need the first key. If more than one key is duplicated, all I care about is the very first one.


That does depend slightly on what you are trying to do. If ALL you are wanting is to return the first key that was a duplicate instead of the last, and nothing else, then just reverse the original array whilst preserving keys:

 

<?php
// $myarray = file ("12345.txt");
$myarray = array('0123456789first line', '0123456789second line', '0123456789first line'); // just an example to test
$myarray = array_reverse($myarray, true); // reverse array - second parameter as true keeps your original keys
$new_array = array();  // we're going to put new values into here
$duplicates = array(); // and we'll put the keys of any duplicate values here
foreach ($myarray as $key => $value) {
    $value = substr($value, 10); // strip out first 10 characters
    if (!in_array($value, $new_array)) { // check if the value is in our new array
        $new_array[] = $value;
        echo "value= " . $value . "<br>";
    } else {
        $duplicates[] = $key;
        echo "key= " . $key . "<br>";
    }
}
?>

 

If you are looking to get the keys of all of the lines which were duplicates (e.g. return keys 0 and 2 in our example), then it's going to be a little more complex.
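One possible way to do that, as a rough sketch rather than the only approach: group the original keys by the stripped text, then keep only the groups with more than one key. The names $keys_by_text and $duplicates are just for illustration, and this assumes every line is at least 10 characters long.

```php
<?php
$myarray = array(
    '0123456789first line',
    '0123456789second line',
    '0123456789first line',
);

// Group the original keys by the text that follows the time stamp.
$keys_by_text = array();
foreach ($myarray as $key => $value) {
    $keys_by_text[substr($value, 10)][] = $key;
}

// Keep only the texts that occur more than once.
$duplicates = array();
foreach ($keys_by_text as $text => $keys) {
    if (count($keys) > 1) {
        $duplicates[$text] = $keys;
    }
}

print_r($duplicates);
```

For the test array this leaves one entry, 'first line', holding keys 0 and 2. One thing to watch: PHP turns array keys that look like integers (a line whose text is just "123", say) into real integers, which is harmless for the grouping but shows up if you print the keys.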


or

 

<?php
$myarray = array(
        '0123456789first line',
        '0123456789second line',
        '0123456789third line',
        '0123456789second line',
        '0123456789fourth line'
        );
$k = count($myarray);

for ($i=0; $i<$k-1; $i++)
{
    for ($j=$i+1; $j<$k; $j++)
    {
        if (($txt = substr($myarray[$i], 10)) == substr($myarray[$j], 10))
        {
            echo "Duplicate : $i - $txt";
            break 2;
        }
    }
}
?>


This benchmarks the two sets of code:

 

<?php
/**
* create test file with 1000 records (2 and 990 are duplicates)
*/
$fp = fopen('test.txt', 'w');
for ($i=1; $i<=1000; $i++)
{
    if (($i==2)||($i==990))
        $str = 'duplicate';
    else
        $str = "line $i";
        
    fwrite($fp, "0123456789$str\n");
}
fclose ($fp);

$myarray = file('test.txt');
/**
* start the clock  
*/
$t1 = microtime(true);

/**
* GR code
*/
$myarray = array_reverse($myarray,true);//reverse array - second parameter as true keeps your original keys
$new_array = array();//we're going to put new values into here
$duplicates = array();//and we'll put the keys of any duplicate values here
foreach($myarray as $key => $value){
$value = substr($value,10);//strip out first 10 characters
if(!in_array($value,$new_array)){//check if the value is in our new array
	$new_array[] = $value;
	#echo "value= ".$value."<br>";		
}else{
	$duplicates[] = $key ;
	echo "(GingerRobot)key= ".$key."<br>";
}

}

/**
* intermediate clock reading
*/
$t2 = microtime(true);

/**
* BA code 
*/
$k = count($myarray);

for ($i=0; $i<$k-1; $i++)
{
    for ($j=$i+1; $j<$k; $j++)
    {
        if (($txt = substr($myarray[$i], 10)) == substr($myarray[$j], 10))
        {
            echo "(Barand)Duplicate : $i - $txt<br>";
            break 2;
        }
    }
}

/**
* final clock reading
*/
$t3 = microtime(true);

/**
* compare times
*/
printf ("GingerRobot time %0.6f<br>Barand time %0.6f", $t2-$t1, $t3-$t2);
?>


Ah right. I think I just worked out why yours is quicker. I was thinking that mine only loops through the data once, but I suppose that by using in_array() you are effectively looping through the array inside the loop, just as you do in yours.

Seems interesting that although the idea is similar in both codes, there is a (relatively) large difference in timing for the script.
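Part of the gap is probably that the nested loop stops at break 2 as soon as it finds the first match, while the in_array() version always walks the whole file. If the inner scan is the concern, one way around it is to look values up as array keys instead, since key lookups don't have to search the array. Below is a minimal sketch of that idea; the function name first_duplicated_key and its $prefix_length parameter are made up for illustration, and I have not timed it against the two versions above.

```php
<?php
/**
 * Returns the key of the first line whose text (after the time stamp)
 * appears again somewhere else, or null if no line is repeated.
 */
function first_duplicated_key(array $lines, $prefix_length = 10)
{
    // Pass 1: count how often each stripped text occurs.
    $counts = array();
    foreach ($lines as $line) {
        $text = substr($line, $prefix_length);
        $counts[$text] = isset($counts[$text]) ? $counts[$text] + 1 : 1;
    }

    // Pass 2: the first line whose text occurs more than once wins.
    foreach ($lines as $key => $line) {
        if ($counts[substr($line, $prefix_length)] > 1) {
            return $key;
        }
    }

    return null;
}

$myarray = array(
    '0123456789first line',
    '0123456789second line',
    '0123456789third line',
    '0123456789second line',
    '0123456789fourth line',
);

var_dump(first_duplicated_key($myarray)); // int(1)
```

It makes two passes over the lines instead of one loop nested inside another, and it assumes every line is at least as long as the time stamp.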
