regex help - plz

3 replies to this topic

#1 arfa

  • Members
  • Member
  • 28 posts

Posted 09 June 2006 - 08:12 AM

I think this is pretty simple but my regex study is still very new.

I am walking through a flatfile database trying to pull two values out. A typical line reads:

<LI><!zx1><a class=cal href=events.php?act=tit&da=4&mo=6&yr=2006&href=1149763623&q=1>link text</a>

I am after the numeric value of &href= (e.g. 1149763623) and the *link text*. There are several variations on this standard line, so I need to work between the <LI> and the </a>.

I have (frustratingly) tried a huge range of permutations starting with something like...

preg_match_all("#<LI>?????>(.+?)????</a>#", $line, $matches);
preg_match_all("#<LI>?????>(.+?)</a>#", $line, $matches);

I hope to find the bits to replace the ???

ANY suggestions will be much appreciated.
thanks - arfa

#2 Fyorl

  • Members
  • Advanced Member
  • 273 posts
  • Location: UK

Posted 09 June 2006 - 03:33 PM

A regex for getting the number would be something like /[^<]+href=([^&]+)/. Just var_dump($matches) and you can see where your number will be. As for the link text: #<a[^>]*>(.*?)</a>#s (with # as the delimiter, so the / in </a> does not need escaping).
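
As a quick check, the two patterns above can be run against the sample line from the first post. This is only a sketch: the variable names ($line, $num, $txt) are made up for illustration, and # is used as the delimiter so the / in </a> needs no escaping.

```php
<?php
// Sample line from the original post.
$line = '<LI><!zx1><a class=cal href=events.php?act=tit&da=4&mo=6&yr=2006&href=1149763623&q=1>link text</a>';

// Greedy [^<]+ backtracks to the LAST "href=" before the next "<",
// so the capture is the &href= value, not "events.php?act=tit".
preg_match('#[^<]+href=([^&]+)#', $line, $num);

// Lazy (.*?) stops at the first </a>; the s modifier lets . match newlines.
preg_match('#<a[^>]*>(.*?)</a>#s', $line, $txt);

echo $num[1], "\n";
echo $txt[1], "\n";
```

Note that the first pattern only picks the right value because the number is the last href= inside the tag; if a line ever had another href= after it, the capture would move.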

Don't worry, the printer fairies will sort it out.

#3 poirot

  • Members
  • Advanced Member
  • 646 posts
  • Location: Austin, TX

Posted 09 June 2006 - 04:39 PM

You can always use a simple regex like:


$str = '<LI><!zx1><a class=cal href=events.php?act=tit&da=4&mo=6&yr=2006&href=1149763623&q=1>link text</a>';
// $m[1] = value of the &href= parameter, $m[2] = link text (everything up to the next tag)
if (preg_match('/&href=([^&>]+)[^>]*>([^<]*)<\/a>/', $str, $m)) {
    echo '<pre>';
    echo 'HREF: ' . $m[1] . "\n";
    echo 'LINK TEXT: ' . $m[2] . "\n\n";
    print_r($m);
    echo '</pre>';
}



Which generates:

HREF: 1149763623
LINK TEXT: link text

    [0] => &href=1149763623&q=1>link text</a>
    [1] => 1149763623
    [2] => link text
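
For anyone still learning the syntax, here is roughly the same pattern spelled out piece by piece. It is a sketch rather than a drop-in replacement: the x modifier and the group names id and text are additions for readability, not part of the pattern above.

```php
<?php
$str = '<LI><!zx1><a class=cal href=events.php?act=tit&da=4&mo=6&yr=2006&href=1149763623&q=1>link text</a>';

// The x modifier ignores whitespace in the pattern and allows # comments.
$pattern = '~
    &href=           # literal text that precedes the number
    (?<id>[^&>]+)    # capture up to the next & or >
    [^>]*            # skip any remaining parameters
    >                # end of the opening tag
    (?<text>[^<]*)   # link text: everything up to the next tag
    </a>             # closing tag
~x';

if (preg_match($pattern, $str, $m)) {
    echo 'HREF: ', $m['id'], "\n";
    echo 'LINK TEXT: ', $m['text'], "\n";
}
```

The ~ delimiter is used here because # starts a comment under the x modifier and / appears in </a>.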

~ D Kuang

#4 arfa

  • Members
  • Member
  • 28 posts

Posted 10 June 2006 - 08:44 AM

Many thanks for your replies.

Both solutions work, but...

the line provided is only typical, and there are variations.

There needs to be allowance for various other text/data/links before and after the <LI>....</a>

I am trying various . ? * + permutations, but this is all part of my learning curve, so further guidance would be much appreciated.

Progress is being made - many thanks
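
One possible way to allow for extra text before and after each <LI>....</a> is to anchor on <LI>, skip ahead to &href=, and collect every entry with preg_match_all. The sketch below assumes the number is always digits and always follows &href=; the surrounding text and the second entry in $data are invented purely for illustration.

```php
<?php
// Hypothetical input: the sample line from the first post, wrapped in
// made-up extra text, plus a second made-up entry.
$data = 'intro text <LI><!zx1><a class=cal href=events.php?act=tit&da=4&mo=6&yr=2006&href=1149763623&q=1>link text</a> trailing text'
      . "\n"
      . '<LI><b>note</b> <a href=events.php?act=tit&href=1149800000&q=2>second link</a><br>';

// <LI> ... &href=NUMBER ... > TEXT </a>
// (?:(?!</a>).)*? moves forward lazily but never crosses a closing </a>,
// so one match cannot run from one entry into the next.
$pattern = '#<LI>(?:(?!</a>).)*?&href=(\d+)[^>]*>(.*?)</a>#is';

// PREG_SET_ORDER groups the results per match: $m[1] = number, $m[2] = link text.
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

The i modifier also accepts <li> in lower case, and s lets an entry span more than one line. If the variations include links without &href=, those entries are simply skipped.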
