preg_replace to select a bunch of text


mac_gabe

Hi, I have a page of HTML and I'm trying to use preg_replace to extract one block of text from the whole page.

I've done this successfully once, to pull some other text off that page, but in this case I end up with everything (i.e. too much).

My source HTML page looks something like this:

a bunch of html
<a href="category-accentors.php" class="blog-category-link-enabled">Accentors (2)</a><br />
<a href="category-african-barbets.php" class="blog-category-link-enabled">African Barbets (3)</a><br />
...loads more links and a few divs...
<a href="category-xenops.php" class="blog-category-link-enabled">Xenops (2)</a><br />
more html

I want to keep everything from

<a href="category-accentors.php" ...
to
... Xenops (2)</a><br />

discard the rest, and place the result in a new PHP/HTML page. The (2) after Xenops is a number that varies.

This is the preg_replace pattern I'm using:

$pattern_eng_bird_cat= '/\<a href="category-accentors\.php"(.*?)Xenops \((\d+)\)\<\/a\>\<br \/\>/';
$replace_eng_bird_cat= '<a href="category-accentors.php"$1Xenops ($2)</a><br />';
$eng_bird_cat= preg_replace($pattern_eng_bird_cat, $replace_eng_bird_cat, $categories); // should return list of English bird names and links from Accentors to Xenops
echo $eng_bird_cat;

I'm new to this and have tried searching and following as many links as possible, but I just can't work out where I'm going wrong. Any help gratefully received.
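
A note on why the whole page comes back: preg_replace only rewrites the part of the subject that the pattern matches and returns everything else unchanged, so a pattern that matches just the Accentors-to-Xenops block and replaces it with itself returns the full page. For pulling a block out, preg_match is usually the more direct tool. Below is a minimal sketch, assuming the page is already in $categories (the sample string here is a made-up stand-in for it) and that the block may span several lines, which is why the s modifier is used so that . also matches newlines.

```php
<?php
// Made-up stand-in for the real page held in $categories.
$categories = <<<HTML
<p>a bunch of html</p>
<a href="category-accentors.php" class="blog-category-link-enabled">Accentors (2)</a><br />
<a href="category-african-barbets.php" class="blog-category-link-enabled">African Barbets (3)</a><br />
<a href="category-xenops.php" class="blog-category-link-enabled">Xenops (2)</a><br />
<p>more html</p>
HTML;

// ~ is used as the delimiter so the slashes in the HTML need no escaping.
// s lets . match newlines; .*? stops at the first "Xenops (n)</a><br />".
$pattern = '~<a href="category-accentors\.php".*?Xenops \(\d+\)</a><br />~s';

if (preg_match($pattern, $categories, $m) === 1) {
    $eng_bird_cat = $m[0]; // the matched block only, nothing before or after it
} else {
    $eng_bird_cat = '';    // no match: fall back to an empty string
}

echo $eng_bird_cat;
```

With preg_match there is nothing to "discard": $m[0] holds only the text the pattern matched.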

 


Oh wait - I think I've just seen a problem. I forgot to put something in the pattern to match the unwanted HTML.

Now I've got:

$pattern_eng_bird_cat= '/(.*?)\<a href="category-accentors\.php"(.*?)Xenops \((\d+)\)\<\/a\>\<br \/\>(.*?)/';
$replace_eng_bird_cat= '<a href="category-accentors.php"$2Xenops ($3)</a><br />';

which is slightly better: it drops the unwanted HTML before the list, but it still returns the unwanted HTML after it.
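
The trailing (.*?) is the likely reason the final HTML survives. A lazy group matches as little as it can, and with nothing after it in the pattern, "as little as it can" is the empty string. The match therefore ends right after the <br />, and preg_replace leaves the rest of the subject untouched. A small sketch on a made-up single-line string (not the real page) shows the difference:

```php
<?php
// Made-up single-line subject, standing in for the real page.
$html = 'before <a>Xenops</a> after';

// Lazy trailing group: it matches the empty string, so " after" is never consumed.
$lazy = preg_replace('~(.*?)<a>Xenops</a>(.*?)~', '<a>Xenops</a>', $html);

// Greedy trailing group: it runs to the end of the line, so " after" is replaced too.
$greedy = preg_replace('~(.*?)<a>Xenops</a>(.*)~', '<a>Xenops</a>', $html);

echo $lazy, "\n";   // <a>Xenops</a> after
echo $greedy, "\n"; // <a>Xenops</a>
```

Note that without the s modifier a . does not match a newline, so on a multi-line page even a greedy (.*) stops at the end of the current line.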

 


I've removed the variable number to simplify things. This still doesn't work: it returns everything after Xenops as well as everything from Accentors to Xenops.

$pattern_eng_bird_cat= '/(.*?)\<a href="category-accentors\.php"(.*?)Xenops(.*?)/';
$replace_eng_bird_cat= '<a href="category-accentors.php"$2Xenops</a><br />';

OK, I've finally worked it out. Trial and error is a wonderful thing! I removed the ? marks, so (.*) instead of (.*?), and it works like a dream :D I only started putting the ? marks in because they worked better in another search; I have no real idea what they do or why, other than that one form is "greedy" and the other isn't.
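
For anyone wondering what the ? does: .* is greedy and takes as much as it can while still letting the rest of the pattern match, whereas .*? is lazy and takes as little as it can. At the very end of a pattern that means a greedy (.*) swallows the rest of the text and a lazy (.*?) swallows nothing, which would explain why removing the ? fixed the leftover HTML. A minimal sketch on a made-up string:

```php
<?php
// Made-up subject with two candidate closing tags.
$text = '<b>one</b> and <b>two</b>';

// Greedy: .* runs as far as possible, so the match ends at the LAST </b>.
preg_match('~<b>.*</b>~', $text, $greedy);

// Lazy: .*? stops as soon as possible, so the match ends at the FIRST </b>.
preg_match('~<b>.*?</b>~', $text, $lazy);

echo $greedy[0], "\n"; // <b>one</b> and <b>two</b>
echo $lazy[0], "\n";   // <b>one</b>
```

One caveat, assuming the real page contains line breaks: . does not match a newline by default, so the s modifier (for example '/.../s') is needed for either form to cross lines.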
