[SOLVED] Parsing from an external site that contains similar text

Someone789 · June 27, 2008

Hi,

Firstly, I'm completely new to Regex.

There is a page on another website which continually displays updated information, with each new update appearing below the previous one. For example:

There are now 1005 sales on Wednesday, June 18, 2008.
There are now 1023 sales on Thursday, June 19, 2008.

There are now 1095 sales on Friday, June 20, 2008.

There are now 1205 sales on Saturday, June 21, 2008.

I would like to have only the most recently updated line (the very last line) of text display on my site, but am unsure of exactly how this could be done. I currently have this code:

<?php

$subject = file_get_contents('http://[Website].com');
$regex = '%There are now ([^]]+) 2008%';

preg_match($regex, $subject, $match);

echo "$match[1]";
?>

However, this simply displays as:

1005 sales on Wednesday, June 18, 2008.
There are now 1023 sales on Thursday, June 19, 2008.

There are now 1095 sales on Friday, June 20, 2008.

There are now 1205 sales on Saturday, June 21,

But the line I am aiming to get is only the very last one, which when using my code, I want to display as:

1205 sales on Saturday, June 21,

Is there some way to make it so that I can match only the very last instance of the text between "There are now" and "2008"?

Thanks, any help would be greatly appreciated.

nashruddin · June 27, 2008

adding some tricks:

<?php
$text = "There are now 1005 sales on Wednesday, June 18, 2008.\n"
      . "There are now 1023 sales on Thursday, June 19, 2008.\n"
      . "There are now 1095 sales on Friday, June 20, 2008.\n"
      . "There are now 1205 sales on Saturday, June 21, 2008.\n";

// Split the text into lines and keep only the last one.
$lines = explode("\n", trim($text));
$line  = $lines[count($lines) - 1];

// The final dot is escaped so that it matches a literal period only.
preg_match('/^There are now (.+), 2008\.$/', $line, $matches);

echo $matches[1];  /* will print: 1205 sales on Saturday, June 21 */
?>
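
For the original question (matching only the last instance of the text between "There are now" and "2008"), a regex-only variant is also possible. The following is a minimal sketch rather than anything posted in the thread: `last_between()` is a hypothetical helper name, the sample string stands in for the fetched page, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// LAST "There are now " and the " 2008" that follows it, or null if absent.
function last_between(string $page): ?string
{
    // The leading .* is greedy and, with the s flag, also crosses newlines.
    // It consumes as much as possible, so the literal text that follows it
    // lands on the last occurrence. (.+?) is lazy and stops at the first " 2008".
    if (preg_match('/.*There are now (.+?) 2008/s', $page, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Sample data standing in for the external page.
$sample = "There are now 1005 sales on Wednesday, June 18, 2008.\n"
        . "There are now 1205 sales on Saturday, June 21, 2008.\n";

echo last_between($sample), "\n"; // 1205 sales on Saturday, June 21,
```

On a very large page the greedy prefix forces the regex engine to backtrack from the end of the input, so a plain string search may be the cheaper choice there.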

sasa · June 27, 2008

try

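<?php
$text = 'There are now 5 sales on Wednesday, June 18, 2008.
There are now 1023 sales on Thursday, June 19, 2008.
There are now 1205 sales on Friday, June 20, 2008.
There are now 1205 sales on Saturday, June 21, 2008.';
// Collect every entry; the lookahead ends each match just before " 2008."
preg_match_all('/There are now (\d+) .*?(?= 2008\.)/', $text, $out);
// Pick the entry (or entries) with the highest sales count.
// This is the newest line only as long as the count never decreases.
$max = max($out[1]);
$keys = array_keys($out[1], $max);
foreach ($keys as $key) echo $out[0][$key],"\n";
?>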

Someone789 · June 27, 2008

Thanks for the replies!

However, while those scripts work nicely, I need something that keeps itself up to date with the most recent version of the page I'm pulling from. The external site is updated roughly every day, so I need the script to pull the entire contents of the page on each load using something like the file_get_contents() function, without having to update the $text variable by hand each time.

I tried combining the first script posted with file_get_contents(), but I'm afraid that doesn't work either. Still, this is along the lines of what I'm looking for:

<?php
$text = file_get_contents('http://[Website].com');

$lines = explode("\n", trim($text));
$line  = $lines[count($lines) - 1];

preg_match("/^There are now (.+), 2008.$/", $line, $matches);
echo $matches[1]; 
?>

Further help would be greatly appreciated.

sasa · June 27, 2008

try

<?php
$text = file_get_contents('http://[Website].com');
preg_match_all('/There are now (\d+) .*?(?= 2008\.)/', $text, $out);
$max = max($out[1]);
$keys = array_keys($out[1], $max);
foreach ($keys as $key) echo $out[0][$key],"\n";
?>

You don't need to explode the text into lines.
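
Note that selecting by max() returns the newest line only while the sales count keeps rising. If the count could ever drop, taking the last match in page order is one alternative. The following is a minimal sketch under that assumption: `latest_sales()` is a hypothetical helper name, the sample figures are made up to show a falling count, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): returns the capture from the
// last matching line, or null when nothing matches.
function latest_sales(string $page): ?string
{
    // Without the s flag, . does not cross newlines, so each match stays on one line.
    $count = preg_match_all('/There are now (.+?), 2008\./', $page, $out);
    if (!$count) {              // 0 matches, or false on a regex error
        return null;
    }
    return $out[1][$count - 1]; // matches are stored in page order
}

// Made-up sample in which the newest line does NOT have the highest count.
$sample = "There are now 1300 sales on Friday, June 20, 2008.\n"
        . "There are now 1205 sales on Saturday, June 21, 2008.\n";

echo latest_sales($sample), "\n"; // 1205 sales on Saturday, June 21
```

With the same sample, the max() approach would print the Friday line instead.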

Someone789 · June 27, 2008

Good stuff, thanks, that did the trick.

Someone789 · June 29, 2008

Spoke too soon I'm afraid - looks like I'll need a bit more help.

I have the following section of text that, as before, is continually updated in descending order (newest updates appearing below the previous ones):

Statistics: March: Model 55: Slow

[some random text here..]

Statistics: April: Model 55: Medium

[some random text here..]

Statistics: May: Model 55: Fast

[some random text here..]

Statistics: June: Model 55: Medium

[some random text here..]

Using the word 'Statistics' as my starting point, I'd like to pull the entire last line of text beginning with 'Statistics' (the June line above) and display it on my site. As before, since that last line is constantly updating, I'd like the script to always pull the text from the line beginning with the very last instance of "Statistics". This will ensure that the most recently updated information is pulled.

For example, I would want my code, if properly working right now, to pull the text of "Statistics: June: Model 55: Medium"; however if the page was updated to say something like "Statistics: July: Model 55: Slow", then it would display that data instead as that line of text would then be farther down the page than the previous month's data.

Here is the code I have so far:

<?php
$text2 = file_get_contents('http://[Website].com');
$text = strtolower($text2);

preg_match_all('/statistics(.+) .*?(?=<br>)/', $text, $out);
$max = max($out[1]);
$keys = array_keys($out[1], $max);
foreach ($keys as $key) echo $out[0][$key],"\n";
?>

I'm almost positive that my error is in the preg_match_all() line, but I just can't figure it out. I used the (?=<br>) lookahead to show that the parsing should end after the text line has ended where there would be a break tag, but perhaps that doesn't work too well?

Thanks in advance for any help!

effigy · June 30, 2008

What does the data look like, code and all?

Perhaps this? /^statistics.+/mi

Someone789 · June 30, 2008

Thanks for the response. No luck with that code I'm afraid.

I would have posted the entire page source, but I don't think it would matter much: the text, tags, and placement of everything between the 'Statistics' lines differ from one section to the next (and could also change), with just a break tag preceding each Statistics line, like so:

<br>Statistics: March: Model 55: Slow
[Random stuff here..]
<br>Statistics: April: Model 55: Medium
[Random stuff here..]
<br>Statistics: May: Model 55: Fast
[Random stuff here..]
<br>Statistics: June: Model 55: Medium
[Random stuff here..]

To state my question in a bit different way - I want to specify a keyword, and display the next 35 characters that come after that keyword using regular expressions. The only catch is that there are multiple instances of this keyword on the page - I just want to use the very last instance of that keyword as my starting point.

sasa · July 1, 2008

try

<?php
$text = '<br>Statistics: March: Model 55: Slow
[Random stuff here..]
<br>Statistics: April: Model 55: Medium
[Random stuff here..]
<br>Statistics: May: Model 55: Fast
[Random stuff here..]
<br>Statistics: June: Model 55: Medium
[Random stuff here..]';
$key_word = 'Statistics:';
$start = strrpos($text, $key_word) + strlen($key_word);
$out = substr($text, $start, 35);
echo $out;
?>

Someone789 · July 1, 2008

The [Random stuff here..] literally meant just that - but thanks so much, that was a huge help and exactly what I was looking for!

Here's what worked:

<?php
$text = file_get_contents('http://[Website].com');
$key_word = 'Statistics';
$start = strrpos($text, $key_word) + strlen($key_word);
$out = substr($text, $start, 35);
echo $out;
?>
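
One caveat with this version: strrpos() returns false when the keyword is missing, and false + strlen($key_word) evaluates to a plain offset, so the script would silently print the wrong part of the page. file_get_contents() likewise returns false if the fetch fails. A slightly more defensive sketch is shown below; `text_after_last()` is a hypothetical helper name, the literal string stands in for the fetched page, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): returns up to $length characters
// following the LAST occurrence of $keyword, or null if the keyword is absent.
function text_after_last(string $page, string $keyword, int $length = 35): ?string
{
    $pos = strrpos($page, $keyword);
    if ($pos === false) {       // keyword not on the page
        return null;
    }
    return substr($page, $pos + strlen($keyword), $length);
}

// '[Website]' in the thread is a placeholder, so a literal stands in for the page here.
$page = "<br>Statistics: May: Model 55: Fast\n"
      . "<br>Statistics: June: Model 55: Medium\n";

$out = text_after_last($page, 'Statistics');
echo $out === null ? 'keyword not found' : trim($out), "\n"; // : June: Model 55: Medium
```

With a live page, the result of file_get_contents() would be checked against false before being passed to the helper.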
