sloth456 Posted October 31, 2008

Hi, I hate to ask; I feel like such a leecher rather than a contributor. I was wondering if anyone would help with the following problem. This is my code:

preg_match('/<a href="(.*?)".*?<\/a>/',$googleresult,$matches);

I've scraped one page of listings from Google using file_get_contents and put it in $googleresult. I have stripped out all domains with "google" in them, which leaves me with just the URLs for the actual listings. I'm trying to pull JUST the first URL between anchor tags out. I'm really useless with regular expressions, so I used http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial for help. Trouble is, I seem to be pulling out more than just the URL:

<a href="http://java.sun.com/docs/books/tutorial/getStarted/application/index.html" class=l>Lesson: A Closer Look at the "<em>Hello World</em>!" Application (The Java <b>...</b></a>%20-site:<a href="http://java.sun.com/docs/books/tutorial/getStarted/application/index.html" class=l>Lesson: A Closer Look at the "<em>Hello World</em>!" Application (The Java <b>...</b></a>
bobbinsbro Posted October 31, 2008

You're actually getting exactly what you're asking for (I'm assuming the last code block is an example of what preg_match returned). Your regex describes the entire contents of the <a...></a> tags. If you only want the URL, try something like:

preg_match('/href="(.*?)"/',$googleresult,$matches);

I'm pretty useless with regex myself, so success is not guaranteed.
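A minimal sketch of how the suggested pattern behaves, assuming a shortened sample string in place of the real scraped page (the anchor text here is hypothetical; the URL is the one from the question). In PHP, preg_match puts the whole matched text in $matches[0] and the text captured by the first parenthesised group in $matches[1], so the bare URL should be readable from index 1.

```php
<?php
// Hypothetical sample standing in for the scraped $googleresult.
$googleresult = '<a href="http://java.sun.com/docs/books/tutorial/getStarted/application/index.html" class=l>Lesson</a>';

if (preg_match('/href="(.*?)"/', $googleresult, $matches)) {
    // $matches[0] is the whole matched text, including href=" and the closing quote.
    echo $matches[0], "\n";
    // $matches[1] is only what the first (...) group captured: the URL.
    echo $matches[1], "\n";
}
```

Printing $matches as a whole shows both entries, which may be why the result looks like more than just the URL.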
samshel Posted October 31, 2008

Try preg_match_all and print $matches.
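A rough sketch of that suggestion, assuming a made-up two-link sample in place of the real scraped page (the example.com and example.org URLs are placeholders, not from the thread). It reuses the pattern from the original question and relies on preg_match_all's default PREG_PATTERN_ORDER layout.

```php
<?php
// Hypothetical sample standing in for the scraped $googleresult.
$googleresult = '<a href="http://example.com/one" class=l>One</a> '
              . '<a href="http://example.org/two" class=l>Two</a>';

// With the default PREG_PATTERN_ORDER flag, $matches[0] lists every full
// match and $matches[1] lists what the first group captured in each match.
$count = preg_match_all('/<a href="(.*?)".*?<\/a>/', $googleresult, $matches);

// Dump the whole structure to see both lists side by side.
print_r($matches);

// The first listing's URL, or null if nothing matched.
$firstUrl = $count > 0 ? $matches[1][0] : null;
echo $firstUrl, "\n";
```

Under these assumptions, $matches[1][0] would be the first listing's URL without any surrounding markup.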
sloth456 Posted October 31, 2008

Thanks bobbinsbro, but that didn't quite work; I still pulled out the href=". @samshel: I think you found me the solution.
bobbinsbro Posted October 31, 2008

I know you pulled the "href=". You were supposed to cut that bit off once the results returned...