
[SOLVED] regular expression to get the details


I need a regular expression to extract some details from the HTML below.


<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>



My document contains only the two types of markup shown above, and I want to extract the following data from it.



exact url: http://www.westhost.com/package-compare.html

Title: $3.95 Web Hosting

Description : VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...

Domain: www.westhost.com


I can sketch the rough logic, but I can't work out the exact regular expression:

<a id=(an|pa)[0-9] href=/[^&q|&adurl] (&q|&adurl)=$exacturl%[^ ]> $title [/url] <span>$Domain </span>$description </font>



I need a regular expression to parse this data out of my HTML.

With a working expression I can use preg_match_all to get the data.


P.S. For reference, see http://www.google.com/search?hl=en&q...=Google+Search

That is where I got the HTML from. The exact URL ends at the % sign.


Thanks for any kind of help

I have not tested this but this "should" extract the URL:

regex: (https?://([-\w.]+)(:\d+)?(/([-\w/.]*(\?\S+)?)?)?)


This should extract the price:

regex: \$\d+(?:\.\d+)?
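
To try the two expressions out, here is a minimal sketch that runs hedged variants of them against the first ad block from the question. The variable names and the exact patterns are my own assumptions: the URL pattern uses https? so that both schemes match, allows hyphens in the path, and stops at the % sign; the price pattern treats the decimal part as optional.

```php
<?php
// First ad block from the question, as a single-quoted string so that
// "$3.95" is not treated as a PHP variable.
$html = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>';

// Assumed variant of the URL expression: scheme, host, optional port,
// optional path (hyphens allowed), optional query string.
$urlPattern = '#https?://[-\w.]+(?::\d+)?(?:/[-\w/.]*(?:\?\S+)?)?#';

// Assumed variant of the price expression: a dollar sign, digits,
// and an optional decimal part.
$pricePattern = '#\$\d+(?:\.\d+)?#';

$url = preg_match($urlPattern, $html, $u) ? $u[0] : null;
$price = preg_match($pricePattern, $html, $p) ? $p[0] : null;

echo $url, "\n";
echo $price, "\n";
```

Note that neither expression pulls out the description or the domain; those need a pattern that is anchored on the surrounding tags.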


I'm too lazy to do the other two right now, and these might be wrong: I've only been learning regex for about two hours.


Here's a quick and dirty way to do it; it should give you an idea of one way to approach the problem.


$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>';

$pattern = '#<a id=[a-z0-9]+ href=/pagead(?:.*)&adurl=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
preg_match($pattern, $data, $matches);
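
For reading the result, here is a small sketch of how the four captured groups could be mapped onto the fields asked for in the question. The $ad array and its key names are only illustrative, and the sample is cut down to the first ad block, since that is the only format this pattern targets.

```php
<?php
// First ad block only (the /pagead + &adurl format).
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>';

// U makes every quantifier lazy, s lets "." match newlines.
$pattern = '#<a id=[a-z0-9]+ href=/pagead(?:.*)&adurl=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

$ad = null;
if (preg_match($pattern, $data, $matches)) {
    $ad = [
        'url'         => $matches[1],                    // text between &adurl= and %
        'title'       => trim(strip_tags($matches[2])),  // drop the <b> tags
        'description' => trim($matches[3]),              // text between </font> and <span>
        'domain'      => $matches[4],
    ];
}

print_r($ad);
```

$matches[0] holds the whole matched block; the captured groups start at index 1.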

Thanks a lot, derwert.

Your expression matches perfectly for:

<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>


But the second block differs slightly:

it has href=/url instead of href=/pagead, and &q instead of &adurl.


<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>


I changed your expression to:

$pattern = '#<a id=[a-z0-9]+ href=/(pagead|url)(?:.*)&(adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

preg_match_all($pattern, $data, $matches);


But it's not working. Please help.

It should only need a slight change to handle this second block as well.


$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
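
One detail worth checking against the sample data: in the second block, </a> is followed by a newline and <font size=-1> rather than by </font>, so any pattern that requires the literal </a></font> cannot match it. The following is a hedged sketch of one way around that, not the only one: it drops the U flag in favour of explicit lazy quantifiers and makes </font> optional. The $ads array and its keys are illustrative names.

```php
<?php
// Both ad blocks from the question.
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>';

// Explicit lazy quantifiers (.*?) instead of the U flag, so that
// (?:</font>)? stays greedy and consumes </font> when it is present.
$pattern = '#<a id=[a-z0-9]+ href=\S+?&(?:adurl|q)=(.*?)%[^>]*>(.*?)</a>(?:</font>)?(.*?)<span class=a>(.*?)</span>#s';

preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

$ads = [];
foreach ($matches as $m) {
    $ads[] = [
        'url'         => $m[1],
        'title'       => trim(strip_tags($m[2])),
        // strip_tags removes the stray <font size=-1> in the second block
        'description' => trim(strip_tags($m[3])),
        'domain'      => $m[4],
    ];
}

print_r($ads);
```

With PREG_SET_ORDER each element of $matches is one complete ad, which is easier to loop over than the default grouping by capture index. The second ad has no description text, so that field comes back as an empty string.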


<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>


I am doing it this way:

$my_pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)(%|&usg)(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
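
A possible refinement, sketched here under my own assumptions: writing the terminator as a non-capturing group, (?:%|&usg), keeps the captured groups at the same indices as in the earlier patterns (1 = URL, 2 = title, 3 = description, 4 = domain). The $ad array and the <br> handling are illustrative additions.

```php
<?php
// The ad block posted above, where the URL ends at &usg instead of %.
$data = '<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>';

// Same pattern, but the terminator alternation is non-capturing.
$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)(?:%|&usg)(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

$ad = null;
if (preg_match($pattern, $data, $m)) {
    $ad = [
        'url'         => $m[1],
        'title'       => trim(strip_tags($m[2])),
        // turn <br> into spaces before stripping the remaining tags
        'description' => trim(strip_tags(str_replace('<br>', ' ', $m[3]))),
        'domain'      => $m[4],
    ];
}

print_r($ad);
```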

