memoryproblems Posted June 15, 2011

First off, I'm pretty new at this, so please try not to laugh (too hard) at me. I'm trying to put together a script to scrape some data out of a page's source for me. This is for an online game, and I'm looking to pull out everything inside the code shown below (the DATA part):

```
title="Ruler: DATA">
```

I've looked around the web (again, I'm very new), found a few tutorials that looked interesting, and went about doing this with regex and preg_match_all. Here is my code:

```php
<?php
$data = file_get_contents('scrapedata.html');
$regex = '/title="Ruler: (.+?)">/';
preg_match_all($regex,$data,$match);
var_dump($match);
echo ($match);
?>
```

I've got two problems:

1) var_dump($match) spits out the entire array, but echo ($match) only prints "Array". If I change preg_match_all to plain preg_match, echo ($match) shows the first item I'm looking for, but obviously it doesn't go through the entire source to find all the instances. (Each page has roughly 20 items that I'm looking to collect.)

My main question here is: how do I take the results of preg_match_all (which is an array) and echo them one by one?

2) For what I'm doing, I need two different versions: one just like I coded above, and another with a modified $regex line. In the source code, there is a variable that can be listed among the data, and I want to skip over any listing that has that variable.
For example, I want to collect it if it's like this:

```html
<td>
<p align="center">
<a href="send_message.asp?Nation_ID=XXXXXX"><img border="0" src="assets/compose_message.png" width="16" height="16" title="Ruler: DATA"></a>
</td>
```

but if it's like this, I want to skip over it:

```html
<td>
<p align="center">
<a href="send_message.asp?Nation_ID=XXXXXX"><img border="0" src="assets/compose_message.png" width="16" height="16" title="Ruler: DATA"></a>
<a href="stats_alliance_stats_custom.asp?Alliance=Rapture"><img src="images/alliance_statistic.gif" border="0" title="Alliance: DATA"></a>
</td>
```

I figured the way to do this would be to change the $regex line to

```php
$regex = '/title="Ruler: (.+?)"></a></td>/';
```

but it returns a warning (shown below) and var_dump($match) shows null:

```
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier 'a' in /home/virtual/site80/fst/var/www/html/scraper/scraper.php on line 4
null
```

Is there some way to put the </a></td> into the $regex line and have that work?

Sorry if my questions are a little dumb; I've been trying to find answers to this all day (and fighting off the inevitable heart attack from all the frustration) with little luck. Thanks for any insight you might have.

mp
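One possible fix for question 2, offered as a sketch rather than a definitive answer: the warning appears because the pattern uses / as its delimiter, so the / in </a> ends the pattern early and PHP reads the leftover "a" as a modifier. Using a delimiter that never appears in the pattern (# below), or escaping each slash as \/, avoids that. The example HTML also has whitespace (likely a line break) between </a> and </td>, so the pattern allows \s* there. The sample HTML, the names Alice and Bob, and the variable names are hypothetical; [^"]+ is used instead of .+? so a capture cannot run past the closing quote.

```php
<?php
// Hypothetical sample: two cells shaped like the ones in the question.
// "Alice" has no alliance link; "Bob" has one and should be skipped in version 2.
$data = <<<'HTML'
<td>
<p align="center">
<a href="send_message.asp?Nation_ID=1"><img border="0" src="assets/compose_message.png" width="16" height="16" title="Ruler: Alice"></a>
</td>
<td>
<p align="center">
<a href="send_message.asp?Nation_ID=2"><img border="0" src="assets/compose_message.png" width="16" height="16" title="Ruler: Bob"></a>
<a href="stats_alliance_stats_custom.asp?Alliance=Rapture"><img src="images/alliance_statistic.gif" border="0" title="Alliance: Rapture"></a>
</td>
HTML;

// Version 1: every ruler. '#' is the delimiter, so any '/' inside is literal.
$all = '#title="Ruler: ([^"]+)">#';

// Version 2: only rulers whose </a> is followed (after optional whitespace)
// directly by </td>, i.e. cells without a second (alliance) link.
$noAlliance = '#title="Ruler: ([^"]+)"></a>\s*</td>#';

preg_match_all($all, $data, $allMatches);
preg_match_all($noAlliance, $data, $soloMatches);

echo implode(', ', $allMatches[1]), "\n";  // Alice, Bob
echo implode(', ', $soloMatches[1]), "\n"; // Alice
```

The escaped-slash form of version 2 would be '/title="Ruler: ([^"]+)"><\/a>\s*<\/td>/'; changing the delimiter just keeps the pattern easier to read.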
.josh Posted June 16, 2011

preg_match_all() fills $match with an array of matches (the function itself returns the number of matches). You need to iterate through that array with a loop: http://www.phpfreaks.com/tutorial/php-loops
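A minimal sketch of that loop, using the regex from the question. The sample string is a hypothetical stand-in for scrapedata.html (the names are made up) so the example runs on its own. With one capture group, $match[0] holds the full matched strings and $match[1] holds just the captured DATA parts, so $match[1] is the array to loop over.

```php
<?php
// Hypothetical stand-in for file_get_contents('scrapedata.html').
$data = '<img title="Ruler: Alice"></a> <img title="Ruler: Bob"></a>';

$regex = '/title="Ruler: (.+?)">/';

// preg_match_all() returns the match count and fills $match by reference:
// $match[0] = full matches, $match[1] = text captured by (.+?).
$count = preg_match_all($regex, $data, $match);

// Print each captured ruler name on its own line.
foreach ($match[1] as $ruler) {
    echo $ruler, "\n";
}
```

If you just want everything printed at once, echo implode("\n", $match[1]); does the same job without an explicit loop.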