rishiraj Posted October 20, 2007

I need a regular expression to extract details from the following HTML:

<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true"> 2GB <b>Web Hosting</b> $1/Rs.40</a> <font size=-1><span class=a>www.3ix.in</span>

My document contains only these two types of markup, and I want to extract the following data from it. Example:

Exact URL: http://www.westhost.com/package-compare.html
Title: $3.95 Web Hosting
Description: VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
Domain: www.westhost.com

I can work out the rough logic but can't write the exact regular expression:

<a id=(an|pa)[0-9] href=/[^&q|&adurl] (&q|&adurl)=$exacturl%[^ ]> $title </a> <span>$Domain </span>$description </font>

With a regular expression I can use preg_match_all to get the data.

P.S. For reference, the HTML code comes from http://www.google.com/search?hl=en&q...=Google+Search. The exact URL ends at the % sign.

Thanks for any kind of help.

Link to comment https://forums.phpfreaks.com/topic/74048-solved-regular-expression-to-get-the-details/
kellz Posted October 20, 2007

I have not tested this, but it "should" extract the URL:

regex: (http?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)

This should extract the price:

regex: (\\$)([-+]?\\d+)(\\.)([-+]?\\d+)

I'm too lazy to do the other two right now. These might be wrong, though; I've only been learning regex for about two hours.
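As a rough illustration of the price idea above, here is a minimal PHP sketch. The pattern is a simplified stand-in and not kellz's exact regex, and the variable names ($text, $pricePattern, $price) are assumptions made for this example. It only shows how a literal dollar sign followed by digits and an optional decimal part could be matched with preg_match.

```php
<?php
// Anchor text taken from the first ad block in the question.
$text = '$3.95 <b>Web Hosting</b>';

// Simplified price pattern (an assumption, not kellz's exact regex):
// a literal dollar sign, one or more digits, and an optional decimal part.
$pricePattern = '/\$\d+(?:\.\d+)?/';

$price = null;
if (preg_match($pricePattern, $text, $m)) {
    $price = $m[0]; // the whole matched price string
}

echo $price, PHP_EOL; // prints $3.95
```

Note that the doubled backslashes in the regex as posted only collapse to single ones if the pattern is written inside a string literal that treats \\ as an escaped backslash.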
derwert Posted October 21, 2007

Here's a quick and dirty way to do it; this should give you an idea of one way to approach it.

<?php
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span> <a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true"> 2GB <b>Web Hosting</b> $1/Rs.40</a> <font size=-1><span class=a>www.3ix.in</span>';
// Captures: 1 = target URL, 2 = title, 3 = description, 4 = domain.
// Modifiers: U = quantifiers are lazy by default, m = multiline, s = dot matches newlines.
$pattern = '#<a id=[a-z0-9]+ href=/pagead(?:.*)&adurl=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
preg_match($pattern, $data, $matches);
?>
rishiraj Posted October 22, 2007 Author

Thanks a lot derwert, your expression matches this block perfectly:

<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span>

But the second block differs slightly: it has href=/url instead of href=/pagead, and &q instead of &adurl.

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true"> 2GB <b>Web Hosting</b> $1/Rs.40</a> <font size=-1><span class=a>www.3ix.in</span>

I changed your expression to:

$pattern = '#<a id=[a-z0-9]+ href=/(pagead|url)(?:.*)&(adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
preg_match_all($pattern, $data, $matches);
print_r($matches);

But it's not working. Please help.
effigy Posted October 22, 2007

... href=\S+&(?:adurl|q) ...
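To make the fragment above concrete, here is a small sketch that applies it to just the two href values from the question. The trailing capture group, the # delimiters, the U modifier, and the variable names ($hrefs, $urls) are assumptions added for illustration. Because (?:adurl|q) is non-capturing, the target URL stays in group 1 for both link styles.

```php
<?php
// The two href values from the question's sample HTML.
$hrefs = [
    'href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene',
    'href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso',
];

// effigy's fragment, followed by a capture of everything up to the first "%".
// The U modifier makes quantifiers lazy, as in derwert's pattern.
$pattern = '#href=\S+&(?:adurl|q)=(.*)%#U';

$urls = [];
foreach ($hrefs as $href) {
    if (preg_match($pattern, $href, $m)) {
        $urls[] = $m[1]; // group 1 is the target URL in both cases
    }
}

print_r($urls);
```

By contrast, the capturing groups (pagead|url) and (adurl|q) in the previous attempt shift the URL to group 3, so any code that reads fixed indexes from $matches has to change as well.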
rishiraj Posted October 23, 2007 Author

This pattern now works for the first two cases:

$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

It just needs a slight change to handle this block as well, where the URL is followed by &usg instead of %:

<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>

I am doing it this way:

$my_pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)(%|&usg)(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
rishiraj Posted October 23, 2007 Author

It's solved now, thanks to my netpal jason.

$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)[%&](?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

I hope this is the right way.
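For readers who want to see the final pattern end to end, the following is a minimal sketch rather than the poster's actual script. It assumes the closing anchor in the pattern is </a> (the forum software appears to have rewritten it as [/url]), and the array keys, the use of PREG_SET_ORDER, and the strip_tags/trim cleanup are choices made for this example only.

```php
<?php
// Two of the ad blocks quoted in the thread, concatenated into one document.
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span>'
      . '<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>';

// Final pattern from the thread, with </a> assumed where the post shows [/url].
$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)[%&](?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

// PREG_SET_ORDER groups the captures per ad instead of per capture group.
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

$ads = [];
foreach ($matches as $m) {
    $ads[] = [
        'url'         => $m[1],
        'title'       => trim(strip_tags($m[2])), // drop <b> tags and padding
        'description' => trim(strip_tags($m[3])), // drop <br> tags and padding
        'domain'      => $m[4],
    ];
}

print_r($ads);
```

As quoted in the thread, the pa3 block contains no </a></font> sequence, so this pattern would not match that block unchanged.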