
Just a scraping help. Regex problem?


EsOne


Ok. I am trying to do a basic scrape and echo results via PHP.

I have the information I am trying to scrape, and I think I am using the correct code, but I am not sure, since it is not showing the array of the results when I try it.

 

Here is a section of the page source I am trying to scrape

 

noire","user_id":"3237825","show_in_sig":"1","show_in_profile":1,"last_engine_run":"1260190557","tap_count":"11984","view_count":"10951","total_gold_won":2119743,"env_health":"32166","env_bg_id":null,"env_last_grant_time":"1260190557","inhab_retire":false,"game_info":{"1":{"type":1,"instance_id":"1190571832.1260284818.446690843","open_time":1260292832,"close_time":1260293672,"end_time":1260293732,"length":60,"results_time":1260293742,"state":"open","player_count":6}},"events":

 

The part I am trying to scrape is the "state":"open"

I am wanting the $match outcome to show if it is "open"

If the code is not "open", the whole line of code disappears. I am testing the code on open games, so I can see if it will come back and tell me the "state" is "open"

 

Here is the code I am using to do this...

 

<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$regex = '/&quot;state&quot;:&quot;(.+?)&quot;,&quot;player_count&quot;/';
preg_match($regex,$data,$match);
var_dump($match);
echo $match;
?>

 

Result is coming back:

array(0) { } Array

 

Completely not showing the results.

 

I tried scraping another section using the above code, and it did work, but the section that did work did not have any " around it. I am VERY new at PHP, so I am figuring it is something to do with my $regex, and the whole " in the results I am looking for.

 

 

Also, if I use the if command to say

if ($match = "open")
echo "glowing";

Would this echo "Glowing" (minus the ") when the match variable is equal to "open"


Your source does not have that string in it. You might also want to try json_decode(), since the response is a JSON string.

 

<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$json = json_decode($data);

print_r($json);
?>

Looking at your code, it's probably the &quot; entities that are causing you problems. The PCRE engine takes them literally and looks for an ampersand, followed by the word quot, followed by a semicolon. Just replace them with a normal " character.

 

'/"state":"(.+?)","player_count"/'
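To see the difference in isolation, here is a minimal sketch. The sample string is only modeled on the fragment posted above, and the variable names are made up for the illustration; it runs both versions of the pattern against the same text.

```php
<?php
// Sample text modeled on the response fragment quoted earlier in the thread.
$data = '"results_time":1260293742,"state":"open","player_count":6';

// Pattern with HTML entities: PCRE searches for the literal text &quot;
$escapedPattern = '/&quot;state&quot;:&quot;(.+?)&quot;,&quot;player_count&quot;/';

// Same pattern written with real double-quote characters.
$plainPattern = '/"state":"(.+?)","player_count"/';

// preg_match() returns the number of matches found (0 or 1).
$escapedHits = preg_match($escapedPattern, $data, $escapedMatch);
$plainHits = preg_match($plainPattern, $data, $plainMatch);

var_dump($escapedHits);    // int(0)
var_dump($plainHits);      // int(1)
echo $plainMatch[1], "\n"; // open
```

The entity version finds nothing because the fetched data contains plain " characters, not the six-character sequence &quot;.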

 

Edit: I was replying at the same time as rajivgonsalves. I know nothing of JSON, but if he is correct then that solution seems the better option.

Thanks for the quick reply!

 

After using that, I get the following:

 

Fatal error: Call to undefined function: json_decode() in /home/content/e/s/o/esone/html/test/test.php on line 4

 

So you can see my entire php file, I will put it here now.

 

<html>
<head><?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$json = json_decode($data);

print_r($json);
?></head>
<body>
</body>
</html>

 

Thanks ^_^

 

 

@cags - Thanks for the reply. I also have tried that, but again no luck. I will switch it back, and reply with the results of that (as I do not have them now)

json_decode() is only available in PHP 5.2.0 and later. If you want it to work with an earlier version you have to download a wrapper. The following is a wrapper I use for my projects that run on PHP 4:

 

http://www.boutell.com/scripts/jsonwrapper.html

 

The whole idea of using json_decode() is that the result is an array or object, which makes it easier to extract data from.
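As a rough sketch of what that looks like: the payload below is a hypothetical, trimmed-down string that reuses only keys from the sample posted above. The real response is larger and nests game_info somewhere inside it, so the path to "state" is an assumption you would need to adjust after looking at the print_r() output.

```php
<?php
// Hypothetical, trimmed-down payload built from keys in the thread's sample.
// The real response wraps this in a larger structure.
$data = '{"game_info":{"1":{"type":1,"length":60,"state":"open","player_count":6}}}';

// Passing true as the second argument returns nested arrays instead of objects.
$json = json_decode($data, true);

// json_decode() returns null on invalid JSON, so check before indexing.
$state = null;
if (is_array($json) && isset($json['game_info']['1']['state'])) {
    $state = $json['game_info']['1']['state'];
}

echo ($state === 'open') ? 'Glowing' : 'Not glowing';
```

With the decoded array there is no need to worry about quoting in a pattern at all; the value is read by key.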

@cags - You were correct. I changed the &quot; entities back to plain " characters and I got this:

 

array(2) { [0]=>  string(29) ""state":"open","player_count"" [1]=>  string(4) "open" } Array

 

Now, I see the wanted result is in [1], string 4.

 

How would I "if" this to say if [1] = "open" to echo the word "Glowing"?

Also, (as I said I was quite new with PHP), would I be able to do an "else" command to make it loop until it is "open"?

How would I "if" this to say if [1] = "open" to echo the word "Glowing"?

 

<?php
if ($match[1] == 'open') {
echo 'Glowing';
}
?>

 

But note that $match[1] won't exist when the pattern doesn't match the source, so PHP raises a notice (a warning as of PHP 8) in those cases.
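One way to avoid that notice is to check preg_match()'s return value before touching $match[1]. A minimal sketch, where extractState() is a hypothetical helper name and the sample strings are made up from the fragment in the first post:

```php
<?php
// Returns the captured state, or null when the pattern does not match.
// extractState() is a name invented for this example.
function extractState($data)
{
    $regex = '/"state":"(.+?)","player_count"/';
    // preg_match() returns 1 only when a match was found.
    if (preg_match($regex, $data, $match) === 1) {
        return $match[1];
    }
    return null;
}

$open = extractState('"results_time":1260293742,"state":"open","player_count":6');
$missing = extractState('"results_time":1260293742,"player_count":6');

echo ($open === 'open') ? 'Glowing' : 'Not glowing';
echo "\n";
echo ($missing === 'open') ? 'Glowing' : 'Not glowing';
echo "\n";
```

The second call never reads $match[1], so no notice is raised when the "state" line is missing from the source.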

 

Also, (as I said I was quite new with PHP), would I be able to do an "else" command to make it loop until it is "open"?

 

That's possible, yes, but problematic. Some servers ban your server's IP if they suspect you're automating a lot of requests. The script would also block until the state changed to "open", and would probably time out, depending on your server settings. A more realistic approach would be to make a request at a fixed interval, for example every 5 minutes.

 

Thank you all so much!!!

 

If I may, I have one more set of questions.

 

1. Now, when it echoes "glowing" (I also used "elseif" to make "not glowing"), it still echoes the arrays. How do I make it echo only the result and not, e.g., "array(0) { } Not Glowing"?

 

2. badbad, you spoke of a way to parse every 5 minutes or so. How would I be able to do that?

 

 

I want to thank you all a lot. I am learning slowly on PHP, and a lot of the tutorials I have read go into "What makes it work" but not "why it works", which is how I learn.

 

1. Now, when it echoes "glowing" (I also used "elseif" to make "not glowing"), it still echoes the arrays. How do I make it echo only the result and not, e.g., "array(0) { } Not Glowing"?

 

Simply don't echo/print/var_dump()/print_r() the array.

 

2. badbad, you spoke of a way to parse every 5 minutes or so. How would I be able to do that?

 

Via a cron job.
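Roughly, the script does a single check per run and cron takes care of the repetition. The sketch below is only an illustration: the file name, the crontab path, and the checkOnce() helper are all assumptions, and the fetcher is a stand-in so the example runs without hitting the site.

```php
<?php
// check_state.php (hypothetical file name). One check per run.
// Example crontab line, with an assumed path, for a run every 5 minutes:
//   */5 * * * * php /path/to/check_state.php

// $fetch is any callable that returns the response body, or false on failure.
function checkOnce($fetch)
{
    $data = call_user_func($fetch);
    if ($data === false) {
        return 'Fetch failed';
    }
    $found = preg_match('/"state":"(.+?)","player_count"/', $data, $match);
    return ($found === 1 && $match[1] === 'open') ? 'Glowing' : 'Not glowing';
}

// Stand-in fetcher so the sketch runs offline. A real script would
// return file_get_contents($url) here instead.
$sampleFetch = function () {
    return '"state":"open","player_count":6';
};

echo checkOnce($sampleFetch), "\n";
```

Because each run makes exactly one request and then exits, there is no long-running loop to time out, and the request rate stays fixed at whatever the crontab line says.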
