[SOLVED] getting info from another site

nutt318 · October 6, 2007

I'm not sure if this is the right area, but here is my question. There is a website with a section showing some price quotes and stocks. I want to capture that information and put it on my own website. I am not sure how to do this, or what to look for in the page source.

So basically I just want to copy a little section of another website and have that information in my website. Just let me know if that is possible.

Thanks,

Jake

Yesideez · October 6, 2007

You can do it, but you'd have to rely on them either keeping their website exactly the same or following some sort of standard, which is almost impossible.

If you know the format of their site doesn't change then you can parse the information but if they change their page layout even slightly then your script will fail.

Would be interesting to see what others make of this.

nutt318 · October 7, 2007

Cool, well if it can be done I sure would like to try. So basically I should look for some code on how to parse a website? If possible, do you know of any code, or have a rough idea of the code needed to do this?

Thanks,

Jake

Rithiur · October 7, 2007

Well, basically you should be looking into regular expressions. Parsing data out of websites with regex is relatively "easy" (as long as you know regex). Basically, you just download the page and then capture the data from it using preg_match.

As Yesideez said, however, you will have to update the script every time they change the layout, because your parsing will fail. This is not a particularly big problem, though. It just means that the script will need to be maintained, and you should try to make any parsing scripts as maintainable as possible (in one project of mine, all website parsers are in separate classes, making them easy to update if needed).
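To make the "one parser per site, in its own class" idea concrete, here is a minimal sketch. The class name, method name, and sample HTML are all made up for illustration; a real page will need its own pattern. The point is that the pattern lives in one place and the parser reports a missing match instead of failing silently.

```php
<?php
// Hypothetical parser class: the names and the sample markup below are
// illustrative only and are not taken from any real site.
class ExamplePriceParser
{
    // Returns the price as a string, or null if the expected markup is missing.
    public function parsePrice($html)
    {
        if (preg_match('#<td class="price">([^<]+)</td>#', $html, $match)) {
            return trim($match[1]);
        }
        return null; // layout probably changed; the caller should handle this
    }
}

$html = '<tr><td>Crude</td><td class="price"> 81.22 </td></tr>';
$parser = new ExamplePriceParser();

var_dump($parser->parsePrice($html));                // string(5) "81.22"
var_dump($parser->parsePrice('<p>new layout</p>'));  // NULL
```

When the remote site changes its layout, only the pattern inside this one class should need updating.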

tibberous · October 7, 2007

This is so god damn easy that you should be able to do it in 5 lines. You can do it with more lines using curl and thus make it better.

Here is an example:

  $data = file_get_contents("http://www.filefactory.com/upload/upload_flash_begin.php?files=1");

  if($data === FALSE)
   die("Couldn't load url.");
   
  preg_match_all("/<viewhash>([^<]*)/", $data, $viewhashes);
  
  return $viewhashes[1][0];

The first line reads in all the data at the site, in 1 line. Go PHP!

Lines two and three are error handling.

Line 4, that's your parsing. VERY important to learn regex, or pay someone to do your regex for you, like that one guy who is the only person I have seen take so much pride in not knowing something.

Last line - your data! If you are not good with regex, or even if you are, you can use print_r on $viewhashes to make sure you get the right thing.
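As a rough illustration of that print_r check, the sketch below runs the same pattern against a made-up string instead of a downloaded page (the sample data is an assumption, not the real response), so you can see how preg_match_all lays out its results.

```php
<?php
// Made-up sample standing in for a downloaded page.
$data = '<viewhash>abc123</viewhash><viewhash>def456</viewhash>';

// preg_match_all returns the number of full matches it found.
$count = preg_match_all('/<viewhash>([^<]*)/', $data, $viewhashes);

// $viewhashes[0] holds the full matches, $viewhashes[1] the first capture group.
print_r($viewhashes);

if ($count > 0) {
    echo $viewhashes[1][0], "\n"; // abc123
}
```

Checking the match count before indexing into the array avoids an undefined-offset notice when the pattern finds nothing.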

And that's it. That's all you need to do, and it's done.

tibberous · October 7, 2007

Also, if you don't use regex, you can use explode with pretty decent results. It takes longer, and there is more code. The same is true for strpos, and stripos if you are using PHP 5.
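A minimal sketch of the non-regex route, assuming the value sits between two known markers. The helper name text_between and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: returns the text between two markers, or null if
// either marker is missing.
function text_between($haystack, $start, $end)
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up sample standing in for a downloaded page.
$data = 'junk<viewhash>abc123</viewhash>more junk';
echo text_between($data, '<viewhash>', '</viewhash>'), "\n"; // abc123

// The same idea with explode: split on the opening marker, then the closing one.
$parts = explode('<viewhash>', $data);
if (count($parts) > 1) {
    $pieces = explode('</viewhash>', $parts[1]);
    echo $pieces[0], "\n"; // abc123
}
```

Note the strict === false comparisons: strpos returns 0 when the marker is at the very start of the string, which a loose comparison would treat as "not found".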

nutt318 · October 8, 2007

tibberous,

Ok, well I do not know a lot about regex, but I am a pretty quick learner. So if I want to use the type of code you posted, let me see if this is right. On the webpage below I want to get the Nymex Crude Future prices for that line, and possibly some others. It's just line 4 of the code I am having trouble understanding: how do I select just certain parts of the data on that page? Let me know how I would change the code in the viewhash area. Thanks,

$data = file_get_contents("http://www.bloomberg.com/markets/commodities/energyprices.html");

  if($data === FALSE)
   die("Couldn't load url.");
   
  preg_match_all("/<viewhash>([^<]*)/", $data, $viewhashes);
  
  return $viewhashes[1][0];

nutt318 · October 8, 2007

Well, I modified my code a little more, but now I am getting an error when my page loads that says:

Warning: file_get_contents(): URL file-access is disabled in the server configuration

Rithiur · October 8, 2007

Warning: file_get_contents(): URL file-access is disabled in the server configuration

It means the URL wrappers are disabled for file handling functions, so they can't access remote locations.

You need to get allow_url_fopen enabled in php.ini (which may not be possible on shared hosting, if that's what you have). Another way to download remote web pages is to use the cURL functions. You can find more information about cURL in the PHP manual:

http://docs.php.net/manual/en/ref.curl.php

However, here is a simple example of how to download a page using CURL:

<?php
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "http://rithiur.anthd.com/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);

$contents = curl_exec($ch);
curl_close($ch);

echo $contents;
?>
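If the same script has to run on servers with different settings, one option is to check allow_url_fopen at runtime and fall back to cURL. This is only a sketch: fetch_page is a hypothetical helper name, not a PHP built-in, and error handling is kept to a bare minimum.

```php
<?php
// Hypothetical helper: fetch_page() is not part of PHP.
// Returns the page body as a string, or false on failure.
function fetch_page($url)
{
    // Use the URL wrappers when the server configuration allows them.
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return @file_get_contents($url);
    }

    // Otherwise fall back to cURL, if the extension is loaded.
    if (function_exists('curl_init')) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        $contents = curl_exec($ch);
        curl_close($ch);
        return $contents;
    }

    // Neither method is available.
    return false;
}
```

Both branches return false on failure, so the caller can use a single `=== false` check regardless of which method was used.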

nutt318 · October 8, 2007

Rithiur, well, that worked great for displaying the page information in my web page. But if I want to display just a certain area of that page rather than the whole page, how would I go about doing that?

For example, if I want to select just 3 to 4 lines of text in the middle of the page, what else would I need in the code?

I have searched php.net and have been looking through the table of contents for the cURL info, but cannot find how to get just a certain area to display. Let me know if you have an idea.

thanks,

nutt318 · October 8, 2007

Thanks to Rithiur, who helped me with this code. The results are exactly what was needed. Thanks man.


<?php
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "http://www.bloomberg.com/markets/commodities/energyprices.html");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);

$contents = curl_exec($ch);
curl_close($ch);

// curl_exec returns false if the download failed
if ($contents === false)
    die("Couldn't load url.");

function find_values($string, $page)
{
    $string = preg_quote($string, '#');

    // Take everything from the given string to the end of its table row
    if (!preg_match("#$string(.*)</tr>#Us", $page, $match))
        return array(); // row not found, e.g. the page layout changed

    // Get the values from the row we found previously
    preg_match_all("#<span[^>]*>([^<]*)</span>#s", $match[1], $values);

    // Return the values
    return $values[1];
}

$find = find_values('Nymex Crude Future', $contents);
echo "Nymex Crude Future: Price = $find[0], Change = $find[1], % Change = $find[2], Time = $find[3]<br>";

$find = find_values('Nymex Heating Oil Future', $contents);
echo "Nymex Heating Oil Future: Price = $find[0], Change = $find[1], % Change = $find[2], Time = $find[3]<br>";

$find = find_values('Nymex RBOB Gasoline Future', $contents);
echo "Nymex RBOB Gasoline Future: Price = $find[0], Change = $find[1], % Change = $find[2], Time = $find[3]<br>";

?>
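To see what the two patterns capture without hitting the live site, here is a small sketch that runs them against a made-up table. The HTML is an assumption for illustration; the real page's markup may differ.

```php
<?php
// Made-up HTML; the real page's markup may differ.
$page = '<table>'
      . '<tr><td>Nymex Crude Future</td><td><span class="x">81.22</span></td>'
      . "\n" . '<td><span>-0.22</span></td></tr>'
      . '<tr><td>Other Row</td><td><span>1.00</span></td></tr>'
      . '</table>';

$label = preg_quote('Nymex Crude Future', '#');

// U makes (.*) lazy so it stops at the first </tr>; s lets . match newlines.
preg_match("#$label(.*)</tr>#Us", $page, $row);

// Pull the text out of every <span> inside that one row.
preg_match_all('#<span[^>]*>([^<]*)</span>#s', $row[1], $values);

print_r($values[1]); // contains 81.22 and -0.22, but not 1.00 from the other row
```

Without the U modifier, (.*) would run to the last </tr> in the page and the values from every following row would be picked up as well.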
