
[SOLVED] Screen Scraping and REGEX


JamieThompson90


Hello everyone, I hate to open a new account and ask a question straight off, but I've done it anyway and am duly ashamed of myself!

I'm trying to capture stock quotes from Yahoo Finance, so I came up with this code:

<?php
$symbol='NRK.L';
$theurl="http://uk.finance.yahoo.com/q?s=$symbol";
  if (!($contents = file_get_contents($theurl)))
  {    echo 'Invalid URL';  exit; }
$pattern = '
(((At\s([1-9]|1[0-2])[0-5][0-9])([AP]M))|(On (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s[0-9]+)\s</small><big><b>([0-9]+).([0-9]+)\sp))
';

  if (eregi($pattern, $contents, $quote))
  { 
    echo "<p>";
    echo "$quote[1] $symbol"; 
    echo '</p>';
  } 
  else 
  {
    echo '<p>No quote available</p>';
  };
  
  ?>

It is meant to match the page source of

http://uk.betastreaming.finance.yahoo.com/q?s=nrk.l

where it reads:

<small>At 4:35PM   </small><big><b>86.00 p

 

OR

 

<small>On Dec 14 </small><big><b>91.90 p 

This data is dynamic and changes regularly: the page shows either a time (At x:xxAM|PM) or a date (On MMM DD), and the price varies constantly.

When I check my regex against that specific string it works, but it will not match against the full page!

Any ideas where I may be going wrong?

I appreciate any help!

Thanks in advance.

Jamie


this works for me:

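<?php
// Test haystack containing both forms: a time ("at ...") and a date ("On ...")
$hay = '<small>at 4:35PM   </small><big><b>86.00 p lal
ala <small>On Dec 14 </small><big><b>91.90 p ';
// ~ is the delimiter; the trailing s lets . match newlines
$pat = '~<small>[a-zA-Z]{2} (.+?)</small><big><b>([0-9][0-9]\.[0-9][0-9]) p~s';
preg_match_all($pat, $hay, $out);
print_r($out);
?>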
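If it helps, here is a rough sketch of how you could pull the individual values out rather than just dumping the array. The slightly tighter pattern (`(?:At|On)` and `[0-9]+\.[0-9]+`) is my own variation, not something you need, and it assumes the label is capitalised as in the Yahoo source quoted above:

```php
<?php
$hay = '<small>At 4:35PM   </small><big><b>86.00 p lal
ala <small>On Dec 14 </small><big><b>91.90 p ';

// Assumed variation on the pattern: match "At" or "On", trim trailing spaces
// before </small>, and allow any number of digits either side of the dot.
$pat = '~<small>(?:At|On) (.+?)\s*</small><big><b>([0-9]+\.[0-9]+) p~s';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($pat, $hay, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the time or date, $m[2] is the price.
    echo $m[1] . ' => ' . $m[2] . "\n";
}
```

With `PREG_SET_ORDER` each element of `$matches` is one complete quote, which is usually easier to loop over than the default layout.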

 

 

You were missing a delimiter in the pattern (the preg functions need one around the regex, unlike eregi), and adding the s modifier makes . match newlines as well, so a match can span multiple lines.

Here ~ is the delimiter:

$pat = '~regex pattern~s';
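To see what the s modifier actually changes, here is a small sketch. The haystack is made up for illustration, with a line break placed between the time and the closing tag:

```php
<?php
// Made-up sample: note the newline before </small>.
$hay = "<small>At 4:35PM\n</small><big><b>86.00 p";

// Without the s modifier, "." will not cross the newline, so there is no match.
$withoutS = preg_match('~<small>At (.+?)</small>~', $hay, $m1);

// With the s modifier, "." matches the newline too, so the pattern matches.
$withS = preg_match('~<small>At (.+?)</small>~s', $hay, $m2);

var_dump($withoutS, $withS);
```

The first call returns 0 and the second returns 1, which is the kind of difference that can make a pattern work on a one-line test string but fail on a full page.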

 

 

You might also want to output the full page source that you're using as your haystack, to make sure it is what you think it is.
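One way to do that check, as a minimal sketch: look for a fixed piece of the markup and print what surrounds it. `showContext()` is a hypothetical helper name, and `$page` here is a stand-in string; in your script you would pass `$contents` instead.

```php
<?php
// Hypothetical helper: return the text surrounding the first occurrence of
// $needle (case-insensitive), or null if the needle is not in the haystack.
function showContext(string $haystack, string $needle, int $radius = 40): ?string
{
    $pos = stripos($haystack, $needle);
    if ($pos === false) {
        return null;
    }
    $start = max(0, $pos - $radius);
    return substr($haystack, $start, strlen($needle) + 2 * $radius);
}

// Stand-in for the fetched page; in the real script this would be $contents.
$page = "<html><body>\n<small>At 4:35PM   </small><big><b>86.00 p\n</body></html>";

$context = showContext($page, '</small><big><b>');

// htmlspecialchars() stops the browser rendering the tags, so you see raw markup.
echo $context === null ? 'Marker not found' : htmlspecialchars($context);
```

If this prints "Marker not found" for the real page, the markup you are getting back is not the markup you tested the regex against.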
