Just a scraping help. Regex problem?


EsOne

OK. I am trying to do a basic scrape and echo the results via PHP.

I know what information I want to scrape, and I think I am using the correct code, but I am not sure, because it does not show the array of results when I run it.

 

Here is a section of the page source I am trying to scrape

 

noire","user_id":"3237825","show_in_sig":"1","show_in_profile":1,"last_engine_run":"1260190557","tap_count":"11984","view_count":"10951","total_gold_won":2119743,"env_health":"32166","env_bg_id":null,"env_last_grant_time":"1260190557","inhab_retire":false,"game_info":{"1":{"type":1,"instance_id":"1190571832.1260284818.446690843","open_time":1260292832,"close_time":1260293672,"end_time":1260293732,"length":60,"results_time":1260293742,"state":"open","player_count":6}},"events":

 

The part I am trying to scrape is the "state":"open"

I want the $match outcome to show whether the state is "open".

If the state is not "open", that whole section disappears from the source. I am testing on open games, so I can see whether the code comes back and tells me the "state" is "open".

 

Here is the code I am using to do this...

 

<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$regex = '/&quot;state&quot;:&quot;(.+?)&quot;,&quot;player_count&quot;/';
preg_match($regex,$data,$match);
var_dump($match);
echo $match;
?>

 

The result comes back as:

array(0) { } Array

It does not show the results at all.

 

I tried scraping another section with the same code, and it worked, but that section did not have any " characters around it. I am VERY new to PHP, so I figure the problem is in my $regex and the " characters in the text I am looking for.

 

 

Also, if I use an if statement like this:

if ($match = "open")
echo "glowing";

would it echo "glowing" (without the quotes) when the match variable is equal to "open"?


Your source does not have that string in it. Also, you might want to try json_decode(), as the response is a JSON string.

 

<?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$json = json_decode($data);

print_r($json);
?>
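For anyone reading later, here is a minimal sketch of how the "state" value might be pulled out once the response is decoded. The $data string is a hypothetical, trimmed-down sample based on the fragment quoted in the question; the real response has more fields and may nest "game_info" differently, so the key path is an assumption.

```php
<?php
// Hypothetical sample: only the "game_info" part of the quoted fragment.
$data = '{"game_info":{"1":{"type":1,"state":"open","player_count":6}}}';

// Passing true as the second argument returns nested associative arrays.
$json = json_decode($data, true);

// isset() guards against malformed JSON (null result) and missing keys.
$state = isset($json['game_info']['1']['state'])
    ? $json['game_info']['1']['state']
    : null;

echo $state === 'open' ? 'Glowing' : 'Not glowing';
```

With arrays instead of a regex, a change in the order of the fields inside the response should not break the lookup.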


Looking at your code, it's probably the &quot; that is causing you problems. The PCRE engine will take this literally and look for an ampersand, followed by the word quot, followed by a semicolon. Just replace each one with a normal " character.

 

'/"state":"(.+?)","player_count"/'

 

Edit: I was replying at the same time as rajivgonsalves, I know nothing of json, but if he is correct then that solution seems the better option.
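To illustrate the point about HTML entities being matched literally, here is a small sketch that runs both patterns against a short piece of the source quoted in the question. The variable names are made up for the example.

```php
<?php
// Short piece of the page source quoted in the question.
$data = '"results_time":1260293742,"state":"open","player_count":6}}';

// Pattern written with HTML entities: PCRE searches for the text &quot; itself.
$entityRegex = '/&quot;state&quot;:&quot;(.+?)&quot;,&quot;player_count&quot;/';
// The same pattern written with plain double quotes.
$plainRegex = '/"state":"(.+?)","player_count"/';

// preg_match() returns 1 on a match and 0 when nothing matches.
$entityHit = preg_match($entityRegex, $data, $entityMatch);
$plainHit = preg_match($plainRegex, $data, $plainMatch);

echo $entityHit, ' ', $plainHit, ' ', $plainMatch[1]; // prints "0 1 open"
```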


Thanks for the quick reply!

 

After using that, I get the following:

 

Fatal error: Call to undefined function: json_decode() in /home/content/e/s/o/esone/html/test/test.php on line 4

 

So you can see my entire PHP file, here it is:

 

<html>
<head><?php
$data = file_get_contents('http://www.gaiaonline.com/chat/gsi/index.php?v=json&m=[[6500%2C[1]]%2C[6510%2C[%22789151%22%2C0%2C1]]%2C[6511%2C[%22789151%22%2C0]]%2C[6512%2C[%22789151%22%2C0]]%2C[107%2C[%22null%22]]]&X=1260293122');
$json = json_decode($data);

print_r($json);
?></head>
<body>
</body>
</html>

 

Thanks ^_^

 

 

@cags - Thanks for the reply. I have also tried that, but again no luck. I will switch it back and reply with the results (I do not have them at the moment).


json_decode() is only available in PHP 5.2.0 and later. For earlier versions you have to download a wrapper; the following is one I use for my PHP 4 projects:

 

http://www.boutell.com/scripts/jsonwrapper.html

 

The whole idea of using json_decode() is that the output is an array/object, which makes it easier to extract data from.
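If installing a wrapper is not an option, one possible approach is to check whether json_decode() exists and fall back to the regex otherwise. This is only a sketch: readState() is a hypothetical helper, and the 'game_info' / '1' key path is assumed from the fragment quoted in the question.

```php
<?php
// Hypothetical helper: returns the state string, or null if it cannot be found.
function readState($data)
{
    if (function_exists('json_decode')) {
        // PHP 5.2.0+: decode to arrays and read the value by key.
        $json = json_decode($data, true);
        return isset($json['game_info']['1']['state'])
            ? $json['game_info']['1']['state']
            : null;
    }

    // Older PHP without json_decode(): fall back to the regex from this thread.
    if (preg_match('/"state":"(.+?)","player_count"/', $data, $match) === 1) {
        return $match[1];
    }
    return null;
}

$sample = '{"game_info":{"1":{"state":"open","player_count":6}}}';
echo readState($sample);
```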


Looking at your code, it's probably the &quot; that is causing you problems. The PCRE engine will take this literally and look for an ampersand, followed by the word quot, followed by a semicolon. Just replace each one with a normal " character.

 

'/"state":"(.+?)","player_count"/'

 

Edit: I was replying at the same time as rajivgonsalves, I know nothing of json, but if he is correct then that solution seems the better option.

 

 

You were correct. I changed them back to " and I got this:

 

array(2) { [0]=>  string(29) ""state":"open","player_count"" [1]=>  string(4) "open" } Array

 

Now I see the result I want is in [1], a 4-character string.

 

How would I "if" this to say if [1] = "open" to echo the word "Glowing"?

Also, (as I said I was quite new with PHP), would I be able to do an "else" command to make it loop until it is "open"?


How would I "if" this to say if [1] = "open" to echo the word "Glowing"?

 

<?php
if ($match[1] == 'open') {
echo 'Glowing';
}
?>

 

But note that $match[1] won't exist when the pattern doesn't match the source, which raises a notice in those cases.
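One way to avoid that notice is to test the return value of preg_match() before touching $match[1]. A minimal sketch follows; stateLabel() is a hypothetical helper name. It also uses a comparison (===) rather than the single = from the original question, since a single = assigns "open" to the variable and the condition is then always true.

```php
<?php
// Hypothetical helper: maps the scraped state to the text to display.
function stateLabel($data)
{
    $regex = '/"state":"(.+?)","player_count"/';

    // preg_match() returns 1 only when the pattern matched, so $match[1]
    // is read only when it is guaranteed to exist.
    if (preg_match($regex, $data, $match) === 1 && $match[1] === 'open') {
        return 'Glowing';
    }
    return 'Not glowing';
}

echo stateLabel('"state":"open","player_count":6'); // prints "Glowing"
```

Because nothing here calls var_dump() or echoes the array itself, only the label is printed.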

 

Also, (as I said I was quite new with PHP), would I be able to do an "else" command to make it loop until it is "open"?

 

That's possible, yes, but problematic. Some servers ban your server's IP if they suspect you're automating a lot of requests. The script would also block until the state changed to "open", and would probably time out, depending on your server settings. A more realistic approach would be to make a request at a fixed interval, e.g. every 5 minutes.



Thank you all so much!!!

 

If I may, I have one more set of questions.

 

1. Now, when it echos "glowing" (I also used "elseif" to make "not glowing") It still echos the arrays. How do I make it to where it only echos the results and not, i.e. "array(0) { } Not Glowing "?

 

2. badbad, you spoke of a way to parse every 5 minutes or so. How would I be able to do that?

 

 

I want to thank you all a lot. I am slowly learning PHP, and a lot of the tutorials I have read go into "what makes it work" but not "why it works", which is how I learn.

 


1. Now, when it echos "glowing" (I also used "elseif" to make "not glowing") It still echos the arrays. How do I make it to where it only echos the results and not, i.e. "array(0) { } Not Glowing "?

 

Simply don't echo/print/var_dump()/print_r() the array.

 

2. badbad, you spoke of a way to parse every 5 minutes or so. How would I be able to do that?

 

Via a cron job.
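As a rough sketch of what that could look like: a script that performs a single check and exits, which cron then runs on a schedule. The file name, the path and the URL in the comment are placeholders, and extractState() is a hypothetical helper, so adjust all of them to your own setup.

```php
<?php
// check_state.php (hypothetical file name). A crontab entry such as
//   */5 * * * * php /path/to/check_state.php 'http://example.com/feed'
// would run it every 5 minutes; the path and URL are placeholders.

// Return the captured state, or null when the pattern does not match.
function extractState($data)
{
    $regex = '/"state":"(.+?)","player_count"/';
    return preg_match($regex, $data, $match) === 1 ? $match[1] : null;
}

// Only fetch when a URL is passed on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $data = file_get_contents($argv[1]);
    if ($data !== false && extractState($data) === 'open') {
        echo "Glowing\n";
    } else {
        echo "Not glowing\n";
    }
}
```

Each run makes one request and finishes, so there is no long-running loop to time out.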

