EsOne, posted December 8, 2009

OK. I am trying to do a basic scrape and echo the results via PHP. I have the information I am trying to scrape, and I think I am using the correct code, but I am not sure, since it does not show the array of results when I try it.

Here is a section of the page source I am trying to scrape:

noire","user_id":"3237825","show_in_sig":"1","show_in_profile":1,"last_engine_run":"1260190557","tap_count":"11984","view_count":"10951","total_gold_won":2119743,"env_health":"32166","env_bg_id":null,"env_last_grant_time":"1260190557","inhab_retire":false,"game_info":{"1":{"type":1,"instance_id":"1190571832.1260284818.446690843","open_time":1260292832,"close_time":1260293672,"end_time":1260293732,"length":60,"results_time":1260293742,"state":"open","player_count":6}},"events":

The part I am trying to scrape is "state":"open". I want $match to show whether the state is "open". If the state is not "open", the whole line of code disappears. I am testing the code on open games, so I can see whether it comes back and tells me the "state" is "open".

Here is the code I am using to do this:

<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$regex = '/&quot;state&quot;:&quot;(.+?)&quot;,&quot;player_count&quot;/';
preg_match($regex, $data, $match);
var_dump($match);
echo $match;
?>

The result comes back as:

array(0) { } Array

It does not show the results at all. I tried scraping another section using the above code, and it did work, but the section that worked did not have any " around it. I am VERY new to PHP, so I figure it has something to do with my $regex and the " characters in the results I am looking for.

Also, if I use the if command to say

if ($match = "open") echo "glowing";

would this echo "Glowing" (minus the ") when the match variable is equal to "open"?

Link to comment: https://forums.phpfreaks.com/topic/184431-just-a-scraping-help-regex-problem/

rajivgonsalves, posted December 8, 2009

Your source does not have that string in it. You might also want to try json_decode, as the response is a JSON string:

<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$json = json_decode($data);
print_r($json);
?>

cags, posted December 8, 2009

Looking at your code, it's probably the &quot; that is causing you problems. The PCRE engine will take this as literal and look for an ampersand followed by the word quot followed by a semicolon. Just replace them with a normal " character:

'/"state":"(.+?)","player_count"/'

Edit: I was replying at the same time as rajivgonsalves. I know nothing of JSON, but if he is correct then that solution seems the better option.

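A minimal sketch of the point cags makes, assuming the pattern was HTML-escaped somewhere between the editor and the server. The `$data` literal is a shortened piece of the page source quoted in the first post, and the variable names are only illustrative. It runs both versions of the pattern against the same text and also shows that `html_entity_decode()` turns the escaped pattern back into the plain one.

```php
<?php
// Shortened fragment of the page source quoted in the first post.
$data = '"results_time":1260293742,"state":"open","player_count":6}}';

// Escaped pattern: PCRE looks for the literal characters & q u o t ;
$escaped = '/&quot;state&quot;:&quot;(.+?)&quot;,&quot;player_count&quot;/';
// The same pattern written with plain double quotes.
$plain = '/"state":"(.+?)","player_count"/';

// No literal "&quot;" occurs in $data, so this finds nothing.
$escapedHits = preg_match($escaped, $data, $m1);
// The plain pattern matches; $m2[1] holds the captured state.
$plainHits = preg_match($plain, $data, $m2);

// Decoding the entities recovers the plain pattern.
$decoded = html_entity_decode($escaped);

echo $escapedHits, "\n";
echo $plainHits, ' ', $m2[1], "\n";
```

If the escaped version keeps reappearing after saving, the editor or upload tool is likely re-encoding the file, which is worth checking before touching the pattern again.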
EsOne (author), posted December 8, 2009

Thanks for the quick reply! After using that, I get the following:

Fatal error: Call to undefined function: json_decode() in /home/content/e/s/o/esone/html/test/test.php on line 4

So you can see my entire PHP file, I will put it here:

<html>
<head>
<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$json = json_decode($data);
print_r($json);
?>
</head>
<body>
</body>
</html>

Thanks.

@cags - Thanks for the reply. I have also tried that, but again no luck. I will switch it back and reply with the results (as I do not have them now).

rajivgonsalves, posted December 8, 2009

json_decode is only built in from PHP 5.2 onwards. You have to download a wrapper if you want it to work with earlier versions. The following is a wrapper I use for my PHP 4 projects:

http://www.boutell.com/scripts/jsonwrapper.html

The whole idea of using json_decode is that the resulting output is an array/object, which makes it easier to extract data from.

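To make the "easier to extract data from" point concrete, here is a rough sketch, assuming `json_decode()` is available. The `$data` literal is a trimmed sample built from the fragment in the first post; the real response wraps it in more structure that the thread does not show, so the `game_info` / `1` / `state` path is an assumption to adjust after inspecting the `print_r()` output.

```php
<?php
// Trimmed sample built from the fragment in the first post. The real
// response has more layers around it, so the key path below is an assumption.
$data = '{"game_info":{"1":{"type":1,"length":60,"state":"open","player_count":6}}}';

// Passing true decodes into nested associative arrays instead of objects.
$json = json_decode($data, true);

// Read the value only if every level of the path exists.
$state = null;
if (is_array($json) && isset($json['game_info']['1']['state'])) {
    $state = $json['game_info']['1']['state'];
}

echo $state === 'open' ? 'Glowing' : 'Not glowing';
```

Compared with the regex, this does not depend on which key happens to follow "state" in the response, which is one reason it tends to be less brittle.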
EsOne (author), posted December 8, 2009

Quote (cags): Looking at your code, it's probably the &quot; that is causing you problems. The PCRE engine will take this as literal and look for an ampersand followed by the word quot followed by a semicolon. Just replace them with a normal " character. '/"state":"(.+?)","player_count"/'

You were correct. I changed them back to " and I got this:

array(2) {
  [0]=> string(29) ""state":"open","player_count""
  [1]=> string(4) "open"
}
Array

Now I see the wanted result is in [1], the string of length 4. How would I "if" this to say if [1] = "open" to echo the word "Glowing"?

Also (as I said, I am quite new to PHP), would I be able to use an "else" command to make it loop until it is "open"?

thebadbad, posted December 8, 2009

Quote: How would I "if" this to say if [1] = "open" to echo the word "Glowing"?

<?php
if ($match[1] == 'open') {
    echo 'Glowing';
}
?>

But note that $match[1] won't exist when the pattern doesn't match the source, which raises a notice in those cases.

Quote: Also (as I said, I am quite new to PHP), would I be able to use an "else" command to make it loop until it is "open"?

That's possible, yes, but problematic. Some servers ban your server's IP if they suspect you're automating a lot of requests. The script would also halt until the state changed to "open", and would probably time out, depending on your server settings. A more realistic approach would be to make a request every 5 minutes, for example.

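One way to avoid the notice mentioned above is to read `$match[1]` only after `preg_match()` reports a match. The sketch below assumes the regex from earlier in the thread; `extract_state()` is a hypothetical helper name, not something from the original code.

```php
<?php
/**
 * Hypothetical helper: returns the captured state, or null when the
 * pattern does not match, so $match[1] is never read when it is missing.
 */
function extract_state($data)
{
    $regex = '/"state":"(.+?)","player_count"/';
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($regex, $data, $match) === 1) {
        return $match[1];
    }
    return null;
}

$state = extract_state('"state":"open","player_count":6');

// == compares the values. A single = would assign "open" to $state,
// and the condition would then always be true.
if ($state == 'open') {
    echo 'Glowing';
} else {
    echo 'Not glowing';
}
```

This also answers the earlier question about `if ($match = "open")`: with a single `=` the condition is an assignment, so it would echo regardless of what was scraped.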
EsOne (author), posted December 8, 2009

Thank you all so much!!! If I may, I have one more set of questions.

1. Now, when it echoes "glowing" (I also used "elseif" to output "not glowing"), it still echoes the arrays. How do I make it echo only the result and not, for example, "array(0) { } Not Glowing"?

2. badbad, you spoke of a way to parse every 5 minutes or so. How would I be able to do that?

I want to thank you all a lot. I am learning PHP slowly, and a lot of the tutorials I have read go into "what makes it work" but not "why it works", which is how I learn.

thebadbad, posted December 8, 2009

Quote: 1. Now, when it echoes "glowing" (I also used "elseif" to output "not glowing"), it still echoes the arrays. How do I make it echo only the result and not, for example, "array(0) { } Not Glowing"?

Simply don't echo/print/var_dump()/print_r() the array.

Quote: 2. badbad, you spoke of a way to parse every 5 minutes or so. How would I be able to do that?

Via a cron job.

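A rough sketch of how the cron approach might look: the script performs one check and exits, and cron handles the repetition. Everything named here is an assumption for illustration, including the file name `check_state.php`, the helper `status_line()`, the log location, and a crontab entry along the lines of `*/5 * * * * php /path/to/check_state.php`. Whether cron is available at all depends on the hosting account.

```php
<?php
// check_state.php (hypothetical name): performs ONE check and exits.
// Scheduling is left to cron, so the script never loops or sleeps.

/** Hypothetical helper: turns a raw response body into a status line. */
function status_line($data, $timestamp)
{
    // $match[1] is only read if preg_match() found a match (short-circuit &&).
    $glowing = preg_match('/"state":"(.+?)","player_count"/', $data, $match) === 1
        && $match[1] === 'open';

    return date('Y-m-d H:i:s', $timestamp) . ' ' . ($glowing ? 'Glowing' : 'Not glowing');
}

// In a real run $data would come from file_get_contents() on the URL from
// the first post; a literal keeps this sketch runnable offline.
$data = '"state":"open","player_count":6';
$line = status_line($data, time());

// Append to a log file (the path is an assumption), since a cron run has
// no browser to echo to.
file_put_contents(sys_get_temp_dir() . '/glow_status.log', $line . "\n", FILE_APPEND);
```

Because nothing is printed with var_dump() or print_r(), only the status line is recorded, which also covers the first question above.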