Finding duplicate substrings?


lfernando

Hi there!

 

I have a string filled with list items, and each list item has an id that looks like this:

<LI id=1773_tab2_[12]>Dice 
<LI id=1773_tab2_[13]>Dog Treats 
<LI id=1773_tab2_[14]>Easter Eggs 
<LI id=1773_tab2_[15]>Fire 
<LI id=1773_tab2_[16]>Flowers 
<LI id=1773_tab2_[17]>Foliage 
<LI id=1773_tab2_[16]>Grass 
<LI id=1773_tab2_[18]>Hearts 
<LI id=1773_tab2_[19]>Lightning 

 

The problem is that some list items have the same id (see "Flowers" and "Grass"). I need a script that finds the duplicates and gives the second one a new id (so in this case "Grass" should become "1773_tab2_[20]"). Any ideas?

 

Thanks for the help!!


haha wtf, why is this like this?

 

If you're the one outputting this list, then just fix the output function.

 

Otherwise, this preg_match_all line will fill $foo[1] with all the IDs, and $foo[2] with all the names of the list items:

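preg_match_all('/id=\d+_tab\d+_\[(\d+)\]>([^<\r\n]+)/', $string, $foo);

Using that, you can loop through and make sure you're not repeating an ID. You could also simply loop through $foo[2] and print out a new list with sequential IDs. (The name group is [^<\r\n]+ rather than \w+ so that multi-word names like "Dog Treats" are captured whole.)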
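To make the "loop through and check for repeats" idea concrete, here is a minimal sketch. The function name find_duplicate_ids is made up for illustration, and the name group is widened to [^<\r\n]+ (my assumption, not the pattern as posted) so that names containing spaces are captured in full. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the bracketed numbers that occur more than once.
function find_duplicate_ids(string $html): array
{
    // Group 1 is the bracketed number, group 2 the item name (up to the next tag or line break).
    preg_match_all('/id=\d+_tab\d+_\[(\d+)\]>([^<\r\n]+)/', $html, $foo);

    // Count how often each number appears, then keep only the repeated ones.
    $counts = array_count_values($foo[1]);
    $repeated = array_filter($counts, function ($n) { return $n > 1; });

    // Numeric-string keys are stored as integers by PHP.
    return array_keys($repeated);
}

$string = "<LI id=1773_tab2_[16]>Flowers\n<LI id=1773_tab2_[17]>Foliage\n<LI id=1773_tab2_[16]>Grass";
print_r(find_duplicate_ids($string));
```

This only reports the repeated numbers; it does not change the string.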

 

-Dan


<?php
$page =  '<LI id=1773_tab2_[12]>Dice
<LI id=1773_tab2_[13]>Dog Treats
<LI id=1773_tab2_[14]>Easter Eggs
<LI id=1773_tab2_[15]>Fire
<LI id=1773_tab2_[16]>Flowers
<LI id=1773_tab2_[17]>Foliage
<LI id=1773_tab2_[16]>Grass
<LI id=1773_tab2_[18]>Hearts
<LI id=1773_tab2_[19]>Lightning ';
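// Capture each bracketed number and the rest of its line. Note that .* stops
// at a newline, so this relies on every list item being on its own line.
preg_match_all('/\[(\d+)(\].*)/', $page, $matchesarray);
$used = array();
$last = max($matchesarray[1]); // highest ID currently in use
foreach ($matchesarray[1] as $k => $i) {
    if (in_array($i, $used)) {
        // Duplicate: give it the next unused ID
        $last++;
        $page = str_replace($matchesarray[0][$k], '['.$last.$matchesarray[2][$k], $page);
        $used[] = $last;
    } else {
        $used[] = $i;
    }
}
echo $page;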
?>


Hello!

 

Dan, to answer your question - my string is displayed in a WYSIWYG textarea, and sometimes the user enters a new list item by going to an existing one and hitting enter. In Firefox this gives the new list item the same id as the previous one.

 

Sasa, this looks like it should be working but it's not :( I tried your code and $page still shows Grass and Flowers with the same id (16).

 

Crayon - I don't want to overwrite all of them, as they each represent an entry in a database, just the duplicate ones. They don't need to be in order, they just need to be unique.

 

Thanks everyone for your help! I'll keep trying, though I'm pretty lost when it comes to regex :S
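One possible explanation for the snippet having no effect, assuming the real string has no line breaks between the list items (as in the one-line string posted later in this thread): the greedy .* in that pattern runs to the end of the line, so on a one-line string the first match swallows every later item and no duplicate is ever seen. The sketch below compares match counts; the variable names are only for illustration.

```php
<?php
$oneLine   = '<LI id=1773_tab2_[16]>Flowers<LI id=1773_tab2_[17]>Foliage<LI id=1773_tab2_[16]>Grass';
$multiLine = "<LI id=1773_tab2_[16]>Flowers\n<LI id=1773_tab2_[17]>Foliage\n<LI id=1773_tab2_[16]>Grass";

// Greedy .* stops only at a newline, so the one-line string gives a single match.
$greedyOne   = preg_match_all('/\[(\d+)(\].*)/', $oneLine, $m1);
$greedyMulti = preg_match_all('/\[(\d+)(\].*)/', $multiLine, $m2);

// [^<]* stops at the next tag instead, so line breaks no longer matter.
$boundedOne = preg_match_all('/\[(\d+)(\][^<]*)/', $oneLine, $m3);

echo "$greedyOne $greedyMulti $boundedOne\n";
```

If that is what happened, swapping .* for [^<]* in the pattern should be enough, provided the item names never contain a "<".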


I think I figured it out! Thanks everyone!!!!

$page =  '<LI id=1773_tab2_[12]>Dice<LI id=1773_tab2_[13]>Dog Treats<LI id=1773_tab2_[14]>Easter Eggs<LI id=1773_tab2_[15]>Fire<LI id=1773_tab2_[16]>Flowers<LI id=1773_tab2_[17]>Foliage<LI id=1773_tab2_[16]>Grass<LI id=1773_tab2_[18]>Hearts<LI id=1773_tab2_[19]>Lightning ';

preg_match_all("/_\[(\d+)\]>(\w+)/", $page, $matchesarray);
$used = array();
$last = max($matchesarray[1]);

foreach ($matchesarray[1] as $k => $i) {
    if (in_array($i, $used)) {
        $last++;
        $page = str_replace($matchesarray[0][$k], '_['.$last.']>'.$matchesarray[2][$k], $page);
        $used[] = $last;
    } else {
        $used[] = $i;
    }
}

echo "$page";


Crayon - I don't want to overwrite all of them, as they each represent an entry in a database, just the duplicate ones. They don't need to be in order, they just need to be unique.

 

hmm okay... well looks like you worked it out before I could post but here was my alternative anyway..

 

// Find the highest bracketed number currently in use
preg_match_all('~\[(\d+)\]~',$data,$max);
$max = max($max[1]);

// Callback: keep the first occurrence of each number, renumber any repeat
function renumber ($match) {
  global $max;
  static $ca = array(); // numbers seen so far
  $rep = (in_array($match[1],$ca)) ? "[".++$max."]" : "[".$match[1]."]";
  $ca[] = $match[1];
  return $rep;
}

$data = preg_replace_callback('~\[(\d+)\]~','renumber',$data);

echo $data;


  • 1 month later...

Thank you for your help Crayon!

I'm faced with a new challenge, so I'm using your code now.

The new issue is that this code is inside a loop for ($j=1; $j<=4; $j++) and I need to find and replace duplicates only in "1773_tab".$j."_[number_to_replace]".

Any idea how to put a variable in your code?

 

Thanks again!
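One way to get a variable into the callback approach is to build the pattern from the tab number and pass the state through a closure instead of global/static. This is only a sketch under a few assumptions: the function name renumber_tab is made up, the "1773" prefix is taken from the IDs shown in the thread, numbers only need to be unique within each tab, and PHP 7 or later is available.

```php
<?php
// Hypothetical helper: renumber duplicate IDs for one tab only.
function renumber_tab(string $data, string $prefix, int $tab): string
{
    // Only IDs of the form <prefix>_tab<tab>_[n] are touched.
    // preg_quote guards against regex metacharacters in the prefix.
    $pattern = '~(' . preg_quote($prefix, '~') . '_tab' . $tab . '_)\[(\d+)\]~';

    if (!preg_match_all($pattern, $data, $found)) {
        return $data; // no IDs for this tab
    }
    $max  = max(array_map('intval', $found[2])); // highest number in this tab
    $seen = array();                              // numbers already handed out

    return preg_replace_callback($pattern, function ($match) use (&$max, &$seen) {
        $id = (int) $match[2];
        if (in_array($id, $seen, true)) {
            $id = ++$max; // duplicate: take the next unused number
        }
        $seen[] = $id;
        return $match[1] . '[' . $id . ']';
    }, $data);
}

$data = '<LI id=1773_tab1_[5]>A<LI id=1773_tab1_[5]>B<LI id=1773_tab2_[5]>C<LI id=1773_tab2_[5]>D';
for ($j = 1; $j <= 4; $j++) {
    $data = renumber_tab($data, '1773', $j);
}
echo $data;
```

Because the seen list and the maximum live inside each call, every tab starts fresh, which the static array in a named callback would not do across loop iterations.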

