Regex help

3 replies to this topic

#1 lowpitch

  • New Members
  • Pip
  • Newbie
  • 3 posts
  • Location: Brighton, UK

Posted 10 August 2006 - 03:37 AM

Hello there chaps,

Fairly experienced PHP user, but I must admit I wince whenever I have to do regex, and normally find an alternative way of solving it, mainly out of confusion/fear.

Anyway, this is my situation. I have a load of HTML content being spat out by my CMS. Inside this HTML, I've added some custom attributes to my <a> tags so that, at display time, I can rewrite the links dynamically depending on how the site is being viewed (HTML, Flash, PDA, etc.). So, an example link in the CMS-generated HTML might look like

<a linkType="internal" linkID="36" href="#" title="Link to this page" class="whatever">Some text</a>

If viewing the site as HTML, I might want to rewrite this to...

<a href="viewPage.php?page=36" title="Link to this page" class="whatever">Some text</a>

Now, I was planning on getting hold of all the <a>something</a> tags, manipulating them as XML nodes (so I can access the attributes), and creating a new XML node for my rewritten <a> tag and saving this new node in place of the old node before writing the HTML to the browser.

So, essentially, in pseudo code, what I'd like to achieve is the following

$links = getSomeKindOfArrayContainingAllTheATags ();

foreach ($links as $link)
{
  replaceOldHTMLwithNewHTML ();
}


However, I'm stuck on a couple of things here. Firstly, I'm very stuck trying to write the regex to isolate the <a> tags - note, I don't just want the href, or the content within the <a>sdfsd</a> tags, I want the whole thing. I'm also stuck on what methods to use in PHP to achieve what I want, how I'd go about replacing the old HTML with the new HTML, etc etc.

One way I assume I could do it is to get hold of all the <a> tags along with their offsets, or something, and do it that way.

Another way I can think of is to do it in three steps - firstly, get an array of all the a tags. Then iterate through them, creating an array of new tags which will act as replacements. Then finally, do some kind of uber regex search and replace, replacing old for new.

So what I'm hoping for is that someone can give me a few pointers on the actual regex I need for this, and point me in the right direction on how this kind of process would work, what methods to look up, etc. I'd very much appreciate any help on this.

I'm using PHP 5.1, and I apologise if the formatting of this post goes strange.

Many thanks,

#2 effigy

  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • Location: IL

Posted 10 August 2006 - 02:54 PM

Something like this will match all the tags and then get the attributes. preg_match_all's PREG_OFFSET_CAPTURE flag may be useful.
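The linked pattern is not preserved in this thread, so the following is only a rough sketch of the idea: one pass to find each whole <a> tag together with its offset, and a second pass to split the attribute string into name/value pairs. The sample HTML, the variable names and the assumption that every attribute value is double-quoted are all illustrative.

```php
<?php
// Sample input modelled on the link from the first post.
$html = '<p>See <a linkType="internal" linkID="36" href="#" title="Link to this page" class="whatever">Some text</a>.</p>';

// With PREG_OFFSET_CAPTURE every captured item becomes array(text, byteOffset).
preg_match_all('#<a\s([^>]*)>(.*?)</a>#is', $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

foreach ($matches as $match) {
    list($tag, $offset) = $match[0];   // the whole <a ...>...</a> and where it starts
    $attributeString = $match[1][0];   // everything between "<a " and ">"

    // Second pass: split the attribute string into name => value pairs.
    // Assumes double-quoted values, as in the example link.
    preg_match_all('#([\w:-]+)\s*=\s*"([^"]*)"#', $attributeString, $pairs, PREG_SET_ORDER);
    $attributes = array();
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }

    echo $offset, ': linkID=', $attributes['linkID'], "\n";
}
```

The offset, together with strlen($tag), would let you splice a replacement in with substr_replace, working from the last match to the first so that earlier offsets stay valid.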
Regexp | Unicode Article | Letter Database

#3 lowpitch

Posted 10 August 2006 - 03:50 PM

Thank you, that's very interesting and looks just what I need.

I've been looking at preg_replace a little. If I first get an array of all the matching tags, and then build up an array of tags to replace them with, would I then be able to call something like this?

$whatever = preg_replace ($theSamePatternUsedToFindThemBefore, $arrayOfReplacementStrings, $myContent);

It looks to me like that would work. Am I off course?
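For what it's worth, preg_replace only accepts an array of replacements when the pattern is an array too, and each replacement is then paired with a pattern rather than with a successive match, so the call above would not behave as hoped. A callback is the usual fit for "a different replacement per match". The sketch below uses preg_replace_callback with a named function; rewriteLink is a hypothetical helper, and the attribute handling assumes double-quoted values and a numeric linkID.

```php
<?php
// Hypothetical helper: builds the replacement for one matched <a> tag.
// $match[0] is the whole tag, $match[1] the attribute string, $match[2] the link text.
function rewriteLink($match)
{
    // Leave links without a linkID attribute untouched.
    if (!preg_match('#\blinkID="(\d+)"#i', $match[1], $id)) {
        return $match[0];
    }

    // Drop the custom attributes and the placeholder href, keep everything else.
    $rest = trim(preg_replace('#\s*\b(?:linkType|linkID|href)="[^"]*"#i', '', $match[1]));
    $rest = ($rest === '') ? '' : ' ' . $rest;

    return '<a href="viewPage.php?page=' . $id[1] . '"' . $rest . '>' . $match[2] . '</a>';
}

$myContent = '<a linkType="internal" linkID="36" href="#" title="Link to this page" class="whatever">Some text</a>';

// The callback is invoked once per match and its return value replaces that match.
$whatever = preg_replace_callback('#<a\s([^>]*)>(.*?)</a>#is', 'rewriteLink', $myContent);

echo $whatever, "\n";
```

This way there is no separate "find" and "replace" step to keep in sync, since each tag is rewritten at the moment it is matched.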

Thanks again,

#4 lowpitch

Posted 10 August 2006 - 04:02 PM

Actually, I have just achieved what I need.

preg_match_all ("#<a ([^>]*)>(.*?)</a>#is", $theSourceHTML, $arrayToStoreOutput, PREG_SET_ORDER);

Then I iterate through $arrayToStoreOutput, doing a str_replace on each one.

It works as I want it to, although I'm sure it could be more efficient.
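On the efficiency point, one possible variation (a sketch only, not necessarily what was used here) is to collect all the old/new pairs first and apply them in a single strtr call instead of running str_replace once per tag. The sample HTML and the simplified replacement, which keeps only the new href and the link text, are assumptions for illustration.

```php
<?php
$theSourceHTML = '<a linkType="internal" linkID="36" href="#">Some text</a> and <a href="http://example.com/">plain</a>';

preg_match_all("#<a ([^>]*)>(.*?)</a>#is", $theSourceHTML, $arrayToStoreOutput, PREG_SET_ORDER);

// Map of old tag => new tag, built from the matches.
$map = array();
foreach ($arrayToStoreOutput as $match) {
    // $match[0] = whole tag, $match[1] = attribute string, $match[2] = link text
    if (!preg_match('#\blinkID="(\d+)"#', $match[1], $id)) {
        continue; // not one of the custom links, leave it alone
    }
    $map[$match[0]] = '<a href="viewPage.php?page=' . $id[1] . '">' . $match[2] . '</a>';
}

// strtr applies all pairs in one pass; skip the call when there is nothing to replace.
$newHTML = $map ? strtr($theSourceHTML, $map) : $theSourceHTML;

echo $newHTML, "\n";
```

Identical tags end up as one map entry, so each distinct tag is only searched for once.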

Thanks for the link to the regex - very helpful.
