
[SOLVED] regular expression to get the details


rishiraj


I need a regular expression to extract details from the following HTML:

 

<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>

 

 

My document contains only the two kinds of markup shown above, and I want to extract the following data from each of them.

 

Example:

exact url: http://www.westhost.com/package-compare.html

Title: $3.95 Web Hosting

Description : VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...

Domain: www.westhost.com

 

I can work out the rough logic, but I can't write the exact regular expression:

<a id=(an|pa)[0-9] href=/[^&q|&adurl] (&q|&adurl)=$exacturl%[^ ]> $title [/url] <span>$Domain </span>$description </font>

 

 

I need a regular expression to parse this data out of my HTML code, so that I can use preg_match_all to collect it.

 

P.S. For reference, see http://www.google.com/search?hl=en&q...=Google+Search

That is where I got the HTML code. The exact URL ends at the % sign.

 

Thanks for any kind of help

I have not tested this, but it "should" extract the URL:

regex: (https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)

 

this should extract the price:

regex: (\\$)([-+]?\\d+)(\\.)([-+]?\\d+)

 

I'm too lazy to do the other two right now, and these might be wrong; I've only been learning regex for about two hours.
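To make the two suggestions above easier to check, here is a minimal sketch that runs a URL pattern and a price pattern against the first ad block from the question. The patterns are simplified variants written for this example, not the exact ones posted above, and the variable names ($html, $url, $price) are only illustrative. It assumes the URL of interest is the first http:// or https:// occurrence in the markup and that it ends before the first % sign.

```php
<?php
// First ad block copied from the question.
$html = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>';

// URL: scheme, host, then path characters. "%" is not in the class,
// so the match stops right before the URL-encoded part.
preg_match('#https?://[-\w.]+(?:/[-\w./]*)?#', $html, $url);

// Price: a dollar sign, digits, and an optional decimal part.
preg_match('/\$\d+(?:\.\d+)?/', $html, $price);

echo $url[0], "\n";   // http://www.westhost.com/package-compare.html
echo $price[0], "\n"; // $3.95
```

Single-quoted PHP strings are used throughout so that $ and \w reach the regex engine unchanged.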

 

Here's a quick and dirty way to do it. This should give you an idea of one way to approach it.

 

<?php
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>';

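// Modifiers: U = quantifiers are ungreedy, m = multiline anchors, s = dot also matches newlines.
// Capture groups: 1 = exact URL (up to the % sign), 2 = title, 3 = description, 4 = domain.
$pattern = '#<a id=[a-z0-9]+ href=/pagead(?:.*)&adurl=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
preg_match($pattern, $data, $matches);
print_r($matches);
?>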
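The captured title and description still contain line breaks and the <b> tags, while the question asks for plain text such as "$3.95 Web Hosting". A minimal sketch of one way to clean them up, assuming the pattern above and using PHP's built-in strip_tags() and trim(); the $ad array and its keys are names invented for this example:

```php
<?php
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>';

$pattern = '#<a id=[a-z0-9]+ href=/pagead(?:.*)&adurl=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

$ad = [];
if (preg_match($pattern, $data, $m)) {
    $ad = [
        'url'         => $m[1],
        // Remove inline tags such as <b>, then surrounding whitespace.
        'title'       => trim(strip_tags($m[2])),
        'description' => trim(strip_tags($m[3])),
        'domain'      => trim($m[4]),
    ];
}
print_r($ad);
```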

Thanks a lot, derwert.

Your expression matches perfectly for:

<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>

 

But the second one differs slightly: it has href=/url instead of href=/pagead, and &q instead of &adurl:

 

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>

 

I changed your expression to:

$pattern = '#<a id=[a-z0-9]+ href=/(pagead|url)(?:.*)&(adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

preg_match_all($pattern, $data, $matches);
print_r($matches);

 

But it's not working. Please help.

It just needs a slight change to make it apply to this markup as well:

 

$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
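As a rough check, the sketch below runs a variant of this pattern over both snippets exactly as posted in the question. One assumption is worth flagging: in the second snippet as posted, </a> is not followed directly by </font>, so this variant drops the literal </font> from the pattern and lets strip_tags() remove any leftover tags instead. The real page markup may differ from the posted excerpt, and the $ads array is a name invented for this example.

```php
<?php
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene>
$3.95 <b>Web Hosting</b></a></font>
VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
<span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">
2GB <b>Web Hosting</b> $1/Rs.40</a>
<font size=-1><span class=a>www.3ix.in</span>';

// (?:adurl|q) is non-capturing, so the group numbers stay 1..4.
$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)%(?:.*)>(.*)</a>(.*)<span class=a>(.*)</span>#Ums';

$ads = [];
// PREG_SET_ORDER yields one array per matched ad block.
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    $ads[] = [
        'url'         => $m[1],
        'title'       => trim(strip_tags($m[2])),
        'description' => trim(strip_tags($m[3])), // empty for the second block
        'domain'      => trim($m[4]),
    ];
}
print_r($ads);
```

With PREG_SET_ORDER each element of $matches is one complete ad, which tends to be easier to loop over than the default grouping by capture number.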

 

<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>

 

I am doing it this way:

$my_pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)(%|&usg)(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
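One thing to watch with the pattern above: (%|&usg) is a capturing group, so the title, description and domain move to groups 3, 4 and 5. A minimal sketch of a variant that uses a non-capturing (?:%|&usg) to keep the original numbering, applied to the &usg sample from this post. Replacing <br> with a space before strip_tags() is an extra step assumed here so that the description lines do not run together; $ad is a name invented for this example.

```php
<?php
$data = '<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>';

// The URL ends at either "%" or "&usg"; the delimiter itself is not captured.
$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)(?:%|&usg)(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

$ad = [];
if (preg_match($pattern, $data, $m)) {
    // Turn <br> into spaces first so adjacent lines stay separated.
    $description = preg_replace('#<br\s*/?>#i', ' ', $m[3]);
    $ad = [
        'url'         => $m[1],
        'title'       => trim(strip_tags($m[2])),
        'description' => trim(strip_tags($description)),
        'domain'      => trim($m[4]),
    ];
}
print_r($ad);
```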

 

