rishiraj Posted October 20, 2007

I need a regular expression to extract details from HTML like this:

<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span>

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true"> 2GB <b>Web Hosting</b> $1/Rs.40</a> <font size=-1><span class=a>www.3ix.in</span>

My document contains only these two types of markup, and I want to extract the following data from it. Example:

Exact URL: http://www.westhost.com/package-compare.html
Title: $3.95 Web Hosting
Description: VPS, Huge Disk Space and Bandwidth! Fall Special ends soon...
Domain: www.westhost.com

I can work out the rough logic, but I can't write the exact regular expression:

<a id=(an|pa)[0-9] href=/[^&q|&adurl] (&q|&adurl)=$exacturl%[^ ]> $title </a> <span>$Domain </span>$description </font>

I need a regular expression that parses this data out of my HTML, so that I can use preg_match_all to collect it.

P.S. For reference, the HTML comes from this page: http://www.google.com/search?hl=en&q...=Google+Search

The exact URL ends at the % sign.

Thanks for any kind of help.
kellz Posted October 20, 2007

I have not tested this, but it "should" extract the URL:

regex: (http?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)

This should extract the price:

regex: (\\$)([-+]?\\d+)(\\.)([-+]?\\d+)

I'm too lazy to do the other two ^^ and these might be wrong; I've only been learning regex for about two hours.
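Since the two patterns above were posted untested, here is a quick check against the opening of the first sample from the question. It is only a sketch: `$looseUrl` is a hypothetical alternative added for comparison and does not come from the thread.

```php
<?php
// First sample from the question (single quotes, so "$3.95" stays literal).
$html = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a>';

// URL pattern as posted. The path class [\w/_\.] contains no hyphen,
// so the match stops in the middle of "package-compare.html".
$postedUrl = '~(http?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~';
preg_match($postedUrl, $html, $m);
$urlPosted = $m[0];

// Hypothetical alternative: take everything up to "%", "&", whitespace or ">".
$looseUrl = '~https?://[^%&\s>]+~';
preg_match($looseUrl, $html, $m);
$urlLoose = $m[0];

// Price pattern as posted. In a single-quoted PHP string, '\\$' is the regex \$.
$postedPrice = '~(\\$)([-+]?\\d+)(\\.)([-+]?\\d+)~';
preg_match($postedPrice, $html, $m);
$price = $m[0];

echo $urlPosted, "\n"; // http://www.westhost.com/package
echo $urlLoose, "\n";  // http://www.westhost.com/package-compare.html
echo $price, "\n";     // $3.95
```

Note that the posted price pattern requires a decimal point, so it would not match the "$1/Rs.40" price in the second sample.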
derwert Posted October 21, 2007

Here's a quick and dirty way to do it; this should give you an idea of one way to approach it.

<?php
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span> <a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true"> 2GB <b>Web Hosting</b> $1/Rs.40</a> <font size=-1><span class=a>www.3ix.in</span>';

// Modifiers: U makes quantifiers lazy by default, m is multiline mode, s lets "." match newlines.
// Groups: 1 = URL (up to the % sign), 2 = title, 3 = description, 4 = domain.
$pattern = '#<a id=[a-z0-9]+ href=/pagead(?:.*)&adurl=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

preg_match($pattern, $data, $matches);
?>
rishiraj Posted October 22, 2007 Author

Thanks a lot derwert, your expression matches perfectly for:

<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span>

But the second one is slightly different: it has href=/url instead of href=/pagead, and &q instead of &adurl:

<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true"> 2GB <b>Web Hosting</b> $1/Rs.40</a> <font size=-1><span class=a>www.3ix.in</span>

So I changed your expression to:

$pattern = '#<a id=[a-z0-9]+ href=/(pagead|url)(?:.*)&(adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
preg_match_all($pattern, $data, $matches);
print_r($matches);

But it's not working. Please help.
effigy Posted October 22, 2007

... href=\S+&(?:adurl|q) ...
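One effect of the `(?:...)` form suggested above is that it does not add capture groups, so the URL stays in group 1. The sketch below compares the two styles on the opening tag of the second sample; the shortened patterns and the variable names are illustrative assumptions, not code from the thread.

```php
<?php
// Opening tag of the second sample from the thread.
$html = '<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true">';

// Capturing alternations: two extra groups appear in front of the URL.
$capturing = '#href=/(pagead|url)(?:.*)&(adurl|q)=(.*)%#U';

// Non-capturing alternation: the URL remains the first group.
$nonCapturing = '#href=\S+&(?:adurl|q)=(.*)%#U';

preg_match($capturing, $html, $a);
preg_match($nonCapturing, $html, $b);

echo $a[1], ' ', $a[2], ' ', $a[3], "\n"; // url q http://www.3ix.com/
echo $b[1], "\n";                         // http://www.3ix.com/
```

Code that reads the URL from `$matches[1]` keeps working with the non-capturing version, whereas the capturing version shifts every later group by two.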
rishiraj Posted October 23, 2007 Author

I just need a slight change so that this pattern also applies to the markup below:

$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)%(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>

Here the URL ends at &usg instead of a % sign. This is what I am trying:

$my_pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)(%|&usg)(?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';
rishiraj Posted October 23, 2007 Author

It's solved now, thanks to my net pal Jason.

$pattern = '#<a id=[a-z0-9]+ href=\S+&(?:adurl|q)=(.*)[%&](?:.*)>(.*)</a></font>(.*)<span class=a>(.*)</span>#Ums';

I hope this is the right way.
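For anyone who wants to try the approach end to end, here is a minimal sketch run against the three snippets posted in this thread. It is a variation rather than the exact pattern above: it assumes the URL ends at the first "%", "&", whitespace or ">", and it does not require `</font>` after `</a>`, because the second snippet as pasted here has `<font size=-1>` at that point. The helper `clean_text` and the array keys are hypothetical names.

```php
<?php
// The three sample snippets from this thread, joined into one document.
$data = '<a id=an5 href=/pagead/iclk?sa=l&ai=Bjt-Pnum=8&adurl=http://www.westhost.com/package-compare.html%3FDgoo-gene> $3.95 <b>Web Hosting</b></a></font> VPS, Huge Disk Space and Bandwidth! Fall Special ends soon... <span class=a>www.westhost.com</span>'
    . "\n" . '<a id=pa3 href=/url?sa=L&ai=B0MF0&q=http://www.3ix.com/%3Fso onmouseover="return true"> 2GB <b>Web Hosting</b> $1/Rs.40</a> <font size=-1><span class=a>www.3ix.in</span>'
    . "\n" . '<a id=an5 href=/url?sa=L&ai=BIPYDqoBJaA&num=5&q=http://www.inetdomain.eu/internet_marketing.html&usg=AFQjCNF4pAmujdfbWbPUuYeGoADPQ0kMnQ>Internet <b>Marketing</b></a></font><br>Promote your business online,<br>increase your visibility<br><span class=a>www.INetDomain.Eu</span>';

// No U modifier here: lazy quantifiers are written explicitly as +? and *?.
// Groups: 1 = URL, 2 = title HTML, 3 = description HTML, 4 = domain.
$pattern = '#<a id=[a-z0-9]+ href=\S+?&(?:adurl|q)=([^%&\s>]+)[^>]*>(.*?)</a>(.*?)<span class=a>(.*?)</span>#s';

// Hypothetical helper: turn <br> into spaces, drop tags, collapse whitespace.
function clean_text(string $html): string
{
    $text = preg_replace('#<br\s*/?>#i', ' ', $html);
    return trim(preg_replace('/\s+/', ' ', strip_tags($text)));
}

// PREG_SET_ORDER yields one array per ad instead of one array per group.
preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

$ads = [];
foreach ($matches as $m) {
    $ads[] = [
        'url'         => $m[1],
        'title'       => clean_text($m[2]),
        'description' => clean_text($m[3]),
        'domain'      => $m[4],
    ];
}

print_r($ads);
```

With these inputs the second ad ends up with an empty description, since there is no text between its link and its domain span. Real result pages may differ from the three snippets, so the pattern should be treated as a starting point.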