sloth456 Posted October 31, 2008

Hi, I hate to ask; I feel like such a leecher rather than a contributor. I was wondering if anyone could help with the following problem. This is my code:

    preg_match('/<a href="(.*?)".*?<\/a>/',$googleresult,$matches);

I've scraped one page of listings from Google using file_get_contents and put it in $googleresult. I have stripped out all domains with "google" in them, which leaves me with just the URLs for the actual listings. I'm trying to pull out JUST the first URL between anchor tags. I'm really useless with regular expressions, so I used http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial for help. Trouble is, I seem to be pulling out more than just the URL:

    <a href="http://java.sun.com/docs/books/tutorial/getStarted/application/index.html" class=l>Lesson: A Closer Look at the "<em>Hello World</em>!" Application (The Java <b>...</b></a>%20-site:<a href="http://java.sun.com/docs/books/tutorial/getStarted/application/index.html" class=l>Lesson: A Closer Look at the "<em>Hello World</em>!" Application (The Java <b>...</b></a>

Link to comment: https://forums.phpfreaks.com/topic/130860-some-help-with-scraping-and-regular-expression/
bobbinsbro Posted October 31, 2008

You're actually getting exactly what you're asking for (I'm assuming the last code block is an example of what preg_match returned). Your regex describes the entire contents of the <a ...></a> tags. If you only want the URL, try something like:

    preg_match('/href="(.*?)"/',$googleresult,$matches);

I'm pretty useless with regex myself, so success is not guaranteed.
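A minimal sketch of where the URL ends up after the call, for anyone following along. preg_match stores the full text matched by the pattern in index 0 of $matches and the text captured by the first parenthesised group in index 1, so with either pattern in this thread the bare URL should be in $matches[1]. The helper name first_href and the shortened $googleresult string are assumptions made up for illustration; they are not the real scraped markup.

```php
<?php
// Hypothetical helper (not from the thread): returns the first href value, or null.
function first_href(string $html): ?string
{
    $matches = [];
    // Same pattern as the original post: one capture group around the URL.
    if (preg_match('/<a href="(.*?)".*?<\/a>/', $html, $matches) === 1) {
        // $matches[0] is the whole <a ...>...</a> element;
        // $matches[1] is only what the (.*?) group captured.
        return $matches[1];
    }
    return null;
}

// Made-up, shortened stand-in for the scraped page (assumption).
$googleresult = '<a href="http://java.sun.com/docs/books/tutorial/getStarted/application/index.html" class=l>Lesson</a>';

$url = first_href($googleresult);
echo $url ?? '(no link found)', PHP_EOL;
```

Printing $matches as a whole, or $matches[0], shows the entire matched text, which would explain seeing more than the URL.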
samshel Posted October 31, 2008

Try preg_match_all and print $matches.
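A rough sketch of that suggestion, assuming the default PREG_PATTERN_ORDER flag: preg_match_all fills $matches[0] with every full match and $matches[1] with every value captured by the first group, so the first listing's URL would be $matches[1][0]. The helper name all_hrefs and the example.com sample string are hypothetical, chosen only to keep the example short.

```php
<?php
// Hypothetical helper (not from the thread): returns every href value in document order.
function all_hrefs(string $html): array
{
    $matches = [];
    // Default flag is PREG_PATTERN_ORDER: $matches[1] lists group 1 for each match.
    preg_match_all('/<a href="(.*?)"/', $html, $matches);
    return $matches[1];
}

// Made-up stand-in for the scraped page (assumption).
$googleresult = '<a href="http://example.com/one" class=l>One</a> '
              . '<a href="http://example.com/two" class=l>Two</a>';

$urls = all_hrefs($googleresult);
print_r($urls);                                  // dump every captured URL
echo $urls[0] ?? '(no links found)', PHP_EOL;    // just the first listing
```

Printing the whole array first makes it easy to see which index holds the full match and which holds the captured URL.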
sloth456 (Author) Posted October 31, 2008

Thanks bobbinsbro, but that didn't quite work; I still pulled out the href=". @samshel: I think you found me the solution.
bobbinsbro Posted October 31, 2008

I know you pulled the "href=". You were supposed to cut that bit off once the results were returned...