I'm trying to pull a stock quote's Beta from Yahoo Finance, since the Yahoo Query Language doesn't support it.

 

My code returns an empty array.  Any ideas why?

 

<?php

$content = file_get_contents('http://finance.yahoo.com/q?s=NFLX');
preg_match('#<tr><th width="48%" scope="row">Beta:</th><td class="yfnc_tabledata1">(.*)</td></tr>#', $content, $match);

print_array($match);

?>

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/
Share on other sites

file_get_contents does not work that way; it's only for files on your server. Try printing $content to the screen to see what you get.

 

What you're trying to do will need a function that my mind has completely blanked on right now, though I have used it in the past... and it's driving me nuts that I can't recall the name. Hopefully someone else knows the functionality I'm thinking of... grrr.

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331937
Share on other sites

Printing $content just prints the HTML of the page from Yahoo.  I'm open to learning how to get the result I want.

 

A little more background... This functionality will be dynamic, so if a user has 20 stocks entered in my app and hits calculate, my server would need to scrape 20 different pages.

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331940
Share on other sites

file_get_contents does not work that way, it's only for files on your server.

 

That's not true. If allow_url_fopen is enabled in php.ini, you can fetch remote URLs with it.
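For anyone who wants to rule out the configuration side first, here is a minimal sketch that checks allow_url_fopen before fetching. The function name fetch_page is made up for illustration and is not from the original posts.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL or local path,
// failing loudly instead of silently returning false.
function fetch_page(string $location): string
{
    $isRemote = preg_match('#^(https?|ftp)://#i', $location) === 1;

    // Remote wrappers are only usable when allow_url_fopen is enabled in php.ini.
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        throw new RuntimeException('allow_url_fopen is disabled; cannot fetch remote URLs');
    }

    // Suppress the warning and turn a failed read into an exception instead.
    $content = @file_get_contents($location);
    if ($content === false) {
        throw new RuntimeException("Could not read $location");
    }
    return $content;
}
```

Calling fetch_page('http://finance.yahoo.com/q?s=NFLX') would then either return the page's HTML or throw with a message saying which step failed.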

 

@OP: The problem is that you switched the order of the attributes on the <th>:

preg_match('#<tr><th scope="row" width="48%">Beta:</th><td class="yfnc_tabledata1">(.*)</td></tr>#', $content, $match);
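If you stick with a regex, one way to avoid this kind of breakage is to not hard-code the attributes at all. The sketch below assumes the row still has the th/td shape quoted in this thread; extract_beta is a made-up name and 1.23 is a made-up sample value.

```php
<?php
// Hypothetical helper (not from the thread): pull the Beta value out of the page HTML.
function extract_beta(string $html): ?string
{
    // [^>]* skips whatever attributes the tags carry, in any order.
    // [^<]* captures only the text up to the next tag.
    $pattern = '#<th[^>]*>\s*Beta:\s*</th>\s*<td[^>]*>([^<]*)</td>#i';
    return preg_match($pattern, $html, $match) === 1 ? trim($match[1]) : null;
}

// Made-up sample rows modelled on the markup quoted in this thread.
$scopeFirst = '<tr><th scope="row" width="48%">Beta:</th><td class="yfnc_tabledata1">1.23</td></tr>';
$widthFirst = '<tr><th width="48%" scope="row">Beta:</th><td class="yfnc_tabledata1">1.23</td></tr>';

var_dump(extract_beta($scopeFirst));
var_dump(extract_beta($widthFirst));
```

Both attribute orders match with this pattern, so it should survive the kind of reordering that caused the empty array here.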

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331944
Share on other sites

It looks like they are putting 'scope' before 'width', and you are putting 'width' before 'scope'.

 

Wahoo... good eyes...

 

I just grabbed the code from Firebug, but Firebug must have rearranged it.  It's working now. Thanks.

 

Will this be incredibly inefficient?  Any better way of doing this?

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331945
Share on other sites

@Jesirose,

file_get_contents() can read URIs and there is no need to use cURL: http://php.net/manual/en/function.file-get-contents.php

 

@unemployment,

Parse the HTML DOM instead of running regexps on a big HTML file.

Consider using my favorite library for this purpose:  http://simplehtmldom.sourceforge.net/

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331947
Share on other sites

Yeah, Firebug shows the browser's parsed DOM rather than the markup as it was sent, so attributes can come out in a different order. If you want to do something like this you'll need to view the raw source.

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331948
Share on other sites

@unemployment,

Using the Simple HTML DOM library I recommended (http://simplehtmldom.sourceforge.net/), you can get the needed content with this code (not tested, but should work):

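include 'simple_html_dom.php'; // ships with the Simple HTML DOM library
$html = file_get_html('http://finance.yahoo.com/q?s=NFLX');
// find() returns an array unless an index is given; 0 selects the first matching cell, which may not be the Beta row
echo $html->find('td[class=yfnc_tabledata1]', 0)->innertext;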
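If you would rather not add a third-party library, the same idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath), which is a different tool from Simple HTML DOM. This assumes the DOM extension is enabled and that the Beta row has the th/td shape quoted in this thread; extract_beta_dom is a made-up name.

```php
<?php
// Hypothetical helper (not from the thread), using PHP's bundled DOM extension.
function extract_beta_dom(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Find the header cell whose text is "Beta:" and take the data cell right after it.
    $cells = $xpath->query('//th[normalize-space(.)="Beta:"]/following-sibling::td[1]');
    if ($cells === false || $cells->length === 0) {
        return null;
    }
    return trim($cells->item(0)->textContent);
}
```

Because the lookup keys on the "Beta:" label rather than on a class name, it picks the right cell even when several cells share the yfnc_tabledata1 class, and attribute order stops mattering.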

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331954
Share on other sites

The project I was working on required the user to be logged in; perhaps that was why I had to use cURL. Thanks for the correction.

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331957
Share on other sites

You could probably do that with file_get_contents as well, but cURL is probably easier for that.
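As a rough illustration of the file_get_contents route, a stream context can carry extra request headers such as a session cookie. This is only a sketch: make_cookie_context is a made-up name, the cookie value and URL in the usage comment are placeholders, and a real login flow usually needs more than one header.

```php
<?php
// Hypothetical helper (not from the thread): build a stream context that sends a cookie.
function make_cookie_context(string $cookie)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "Cookie: $cookie\r\n", // extra request header(s)
            'timeout' => 10,                    // seconds before giving up
        ],
    ]);
}

// Usage (placeholder URL and cookie value):
// $html = file_get_contents('http://example.com/members', false, make_cookie_context('PHPSESSID=abc123'));
```

cURL tends to be more convenient once you need to post a login form and keep the cookies it sets, which is probably why it feels easier for that case.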

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331960
Share on other sites

I just realized that my array is wrong.  It's pulling in content from the entire table.  Anyone know why my array is pulling in all this additional data?

 

<?php

$content = file_get_contents('http://finance.yahoo.com/q?s=NFLX');
preg_match('#<tr><th scope="row" width="48%">Beta:</th><td class="yfnc_tabledata1">(.*)</td></tr>#', $content, $match);

print_array($match);

?>
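A likely cause is that (.*) is greedy: it matches as much as it can, so if the table sits on one line of the source it runs on to the last </td></tr> rather than the first. The sketch below shows the difference on a made-up two-row sample (the values are invented); adding ? makes the quantifier lazy.

```php
<?php
// Made-up single-line sample with two table rows, as scraped pages often arrive.
$content = '<tr><th scope="row" width="48%">Beta:</th><td class="yfnc_tabledata1">1.23</td></tr>'
         . '<tr><th scope="row" width="48%">Volume:</th><td class="yfnc_tabledata1">456</td></tr>';

$prefix = '#<tr><th scope="row" width="48%">Beta:</th><td class="yfnc_tabledata1">';

// Greedy: (.*) runs on to the LAST </td></tr> on the line.
preg_match($prefix . '(.*)</td></tr>#', $content, $greedy);

// Lazy: (.*?) stops at the FIRST </td></tr>.
preg_match($prefix . '(.*?)</td></tr>#', $content, $lazy);

echo $greedy[1], "\n"; // includes the markup of the following row
echo $lazy[1], "\n";   // only the Beta cell's text
```

Capturing with ([^<]*) instead of (.*?) is another option when the cell is known to contain plain text only.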

Link to comment
https://forums.phpfreaks.com/topic/259880-page-scraping-fails/#findComment-1331966
Share on other sites
