
[SOLVED] Parsing from an external site that contains similar text


Someone789


Hi,

First of all, I'm completely new to regex.

There is a page on another website that continually displays updated information, with each new update appearing below the previous one. For example:

There are now 1005 sales on Wednesday, June 18, 2008.

There are now 1023 sales on Thursday, June 19, 2008.

There are now 1095 sales on Friday, June 20, 2008.

There are now 1205 sales on Saturday, June 21, 2008.

I would like to display only the most recent line (the very last one) on my site, but I'm not sure how to do this. I currently have this code:

<?php

$subject = file_get_contents('http://[Website].com');
$regex = '%There are now ([^]]+) 2008%';

preg_match($regex, $subject, $match);

echo "$match[1]";
?>

However, this simply displays as:

1005 sales on Wednesday, June 18, 2008.

There are now 1023 sales on Thursday, June 19, 2008.

There are now 1095 sales on Friday, June 20, 2008.

There are now 1205 sales on Saturday, June 21,

What I actually want is only the very last line, which my code should display as:

1205 sales on Saturday, June 21,

Is there a way to match only the very last instance of the text between "There are now" and "2008"?

Thanks, any help would be greatly appreciated.  :)
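Why the original pattern over-matches: the character class [^]]+ accepts any character except "]", newlines included, and it is greedy, so it runs from the first "There are now" all the way to the last " 2008". A minimal regex-only sketch, assuming the page text is already in a string: put a greedy .* (with the s modifier) in front, so the engine backtracks to the last "There are now", then capture lazily up to the next " 2008". The function name lastSalesLine is hypothetical.

```php
<?php
// Hypothetical helper: returns the text between the LAST "There are now "
// and the " 2008" that follows it, or null if there is no match.
function lastSalesLine(string $subject): ?string
{
    // The greedy ".*" (s modifier: "." also matches newlines) first runs to
    // the end of the string, then backtracks until the LAST "There are now "
    // can match. The lazy "(.+?)" then stops at the first " 2008" after it.
    if (preg_match('/.*There are now (.+?) 2008/s', $subject, $match)) {
        return $match[1];
    }
    return null;
}

$sample = "There are now 1005 sales on Wednesday, June 18, 2008.\n"
        . "There are now 1205 sales on Saturday, June 21, 2008.\n";
echo lastSalesLine($sample), "\n"; // prints: 1205 sales on Saturday, June 21,
```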

Here's a trick: take only the last line first, then run the regex on it:

<?php
$text = "There are now 1005 sales on Wednesday, June 18, 2008.\n"
      . "There are now 1023 sales on Thursday, June 19, 2008.\n"
      . "There are now 1095 sales on Friday, June 20, 2008.\n"
      . "There are now 1205 sales on Saturday, June 21, 2008.\n";

// Split into lines (trim() drops the trailing newline) and keep the last one
$lines = explode("\n", trim($text));
$line  = $lines[count($lines) - 1];

// Escape the final dot so it matches a literal "." rather than any character
preg_match('/^There are now (.+), 2008\.$/', $line, $matches);

echo $matches[1];  /* will print: 1205 sales on Saturday, June 21 */
?>


try

<?php
$text = 'There are now 5 sales on Wednesday, June 18, 2008.
There are now 1023 sales on Thursday, June 19, 2008.
There are now 1205 sales on Friday, June 20, 2008.
There are now 1205 sales on Saturday, June 21, 2008.';
// $out[0] holds each full match, $out[1] the captured sales counts
preg_match_all('/There are now (\d+) .*?(?= 2008\.)/', $text, $out);
// Print every match whose count equals the highest count (ties included)
$max = max($out[1]);
$keys = array_keys($out[1], $max);
foreach ($keys as $key) echo $out[0][$key], "\n";
?>
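Note that the script above prints the line(s) with the highest sales count, which coincides with the last line only while the counts keep rising. If the goal is strictly the last match, a minimal sketch using the same pattern is to take the final element of $out[0] with end(). The function name lastMatch is hypothetical.

```php
<?php
// Hypothetical helper: returns the LAST full match of the pattern used above,
// or null if nothing matched. Unlike max(), this ignores the sales count.
function lastMatch(string $text): ?string
{
    if (!preg_match_all('/There are now (\d+) .*?(?= 2008\.)/', $text, $out)) {
        return null; // 0 matches (or a regex error)
    }
    // Matches come back in document order, so the last one is the newest
    return end($out[0]);
}

// Here max() would pick Friday's 1205, but the newest line is Saturday's
$text = "There are now 1205 sales on Friday, June 20, 2008.\n"
      . "There are now 1023 sales on Saturday, June 21, 2008.\n";
echo lastMatch($text), "\n"; // prints: There are now 1023 sales on Saturday, June 21,
```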


Thanks for the replies!

However, while those scripts work nicely, I need something that always reflects the latest version of the page I'm pulling from. The external site is updated roughly every day, so on each load the script needs to fetch the page's entire contents, using something like file_get_contents(), rather than me updating the $text variable by hand each time.

I tried combining the first script with file_get_contents(), but that doesn't work either. Still, this is along the lines of what I'm looking for:

<?php
$text = file_get_contents('http://[Website].com');

$lines = explode("\n", trim($text));
$line  = $lines[count($lines) - 1];

preg_match("/^There are now (.+), 2008.$/", $line, $matches);
echo $matches[1]; 
?>

Further help would be greatly appreciated.
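A likely reason the combined script fails: the last line of a fetched HTML page is usually markup (for example a closing </html> tag) rather than the sales sentence, and if the server sends \r\n line endings, every exploded line keeps a trailing \r that stops "2008.$" from matching. A minimal sketch, assuming the sales sentences sit inside ordinary tags that strip_tags() can remove: split on any line ending, keep only the lines that match, and use the last one. The function name lastSalesFromPage is hypothetical.

```php
<?php
// Hypothetical helper: finds the last "There are now ..., 2008." line in a
// fetched HTML page and returns the part between "There are now " and ", 2008."
function lastSalesFromPage(string $html): ?string
{
    // Remove tags, then split on any line ending (\R covers \n, \r\n and \r)
    $lines = preg_split('/\R/', strip_tags($html));

    // Keep only the lines containing a sales sentence (original keys are kept)
    $hits = preg_grep('/There are now .+, 2008\./', $lines);
    if (empty($hits)) {
        return null;
    }

    // The last matching line is the most recent update
    $last = end($hits);
    preg_match('/There are now (.+), 2008\./', $last, $m);
    return $m[1];
}

// In the real script, $html would come from file_get_contents('http://[Website].com')
$html = "<p>There are now 1095 sales on Friday, June 20, 2008.</p>\r\n"
      . "<p>There are now 1205 sales on Saturday, June 21, 2008.</p>\r\n"
      . "</body></html>";
echo lastSalesFromPage($html), "\n"; // prints: 1205 sales on Saturday, June 21
```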


try

<?php
$text = file_get_contents('http://[Website].com');
preg_match_all('/There are now (\d+) .*?(?= 2008\.)/', $text, $out);
$max = max($out[1]);
$keys = array_keys($out[1], $max);
foreach ($keys as $key) echo $out[0][$key],"\n";
?>

You don't need to explode the text into lines; preg_match_all() already finds every match anywhere in the page.


Spoke too soon, I'm afraid; looks like I'll need a bit more help.

I have the following section of text that, as before, is continually updated in descending order (newest updates appearing below the previous ones):

Statistics: March: Model 55: Slow

[some random text here..]

Statistics: April: Model 55: Medium

[some random text here..]

Statistics: May: Model 55: Fast

[some random text here..]

Statistics: June: Model 55: Medium

[some random text here..]

Using the word 'Statistics' as my starting point, I'd like to pull the entire last line beginning with 'Statistics' (here, the June line) and display it on my site. As before, because new lines keep being added, the script should always pull the line that begins with the very last occurrence of "Statistics", so that the most recent information is shown.

For example, if my code were working right now, it would pull the text "Statistics: June: Model 55: Medium"; but if the page were updated with something like "Statistics: July: Model 55: Slow", it would display that line instead, since it would then be farther down the page than the previous month's data.

Here is the code I have so far:

<?php
$text2 = file_get_contents('http://[Website].com');
$text = strtolower($text2);

preg_match_all('/statistics(.+) .*?(?=<br>)/', $text, $out);
$max = max($out[1]);
$keys = array_keys($out[1], $max);
foreach ($keys as $key) echo $out[0][$key],"\n";
?>

I'm almost positive that my error is in the preg_match_all() line, but I just can't figure it out. I used the lookahead (?=<br>) to make the match end where the text line ends, since a break tag would be there, but perhaps that doesn't work too well?

Thanks in advance for any help!
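Two likely problems with the attempt above: "." does not cross line breaks, so if the <br> comes before each Statistics line rather than after it, (?=<br>) finds no <br> later on the same line and nothing matches; and max() compares the captured strings, which is not the same as picking the last one. A minimal sketch, assuming each Statistics line ends at a line break or the next tag and that the word does not also appear inside the random text: match every such line and keep the last. The function name lastStatisticsLine is hypothetical.

```php
<?php
// Hypothetical helper: returns the last line starting with "Statistics",
// cut off at the next tag or line break, or null if there is none.
function lastStatisticsLine(string $html): ?string
{
    // [^<\r\n]* cannot run past the end of the line or into the next tag.
    // The i modifier replaces strtolower() and keeps the original case.
    if (!preg_match_all('/Statistics[^<\r\n]*/i', $html, $out)) {
        return null;
    }
    // Matches are returned in page order, so the last one is the newest
    return trim(end($out[0]));
}

$html = "<br>Statistics: May: Model 55: Fast\n<p>random</p>\n"
      . "<br>Statistics: June: Model 55: Medium\n<div>random</div>";
echo lastStatisticsLine($html), "\n"; // prints: Statistics: June: Model 55: Medium
```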


Thanks for the response. No luck with that code, I'm afraid.

I would have posted the entire page, but I don't think it would matter much: the text, tags and layout between the 'Statistics' lines all differ from one another (and could change), and the only constant is a break tag preceding each Statistics line, like so:

<br>Statistics: March: Model 55: Slow
[Random stuff here..]
<br>Statistics: April: Model 55: Medium
[Random stuff here..]
<br>Statistics: May: Model 55: Fast
[Random stuff here..]
<br>Statistics: June: Model 55: Medium
[Random stuff here..]

To put my question another way: I want to specify a keyword and, using regular expressions, display the 35 characters that come after it. The only catch is that the keyword appears several times on the page, and I want to use its very last occurrence as my starting point.
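Since the question asks for a regular expression specifically, here is a minimal regex-only sketch: a greedy .* in front of the keyword makes the engine backtrack to the keyword's last occurrence, and .{0,35} then takes up to 35 characters. The s modifier lets "." match newlines, and without the u modifier characters are counted as bytes, the same way substr() counts them. The function name afterLastKeyword and its $length parameter are hypothetical.

```php
<?php
// Hypothetical helper: returns up to $length characters following the LAST
// occurrence of $keyword, or null if the keyword does not occur.
function afterLastKeyword(string $text, string $keyword, int $length = 35): ?string
{
    // preg_quote() escapes regex metacharacters in the keyword (and the / delimiter)
    $pattern = '/.*' . preg_quote($keyword, '/') . '(.{0,' . $length . '})/s';
    if (preg_match($pattern, $text, $m)) {
        return $m[1];
    }
    return null;
}

$text = "<br>Statistics: May: Model 55: Fast\n[Random stuff here..]\n"
      . "<br>Statistics: June: Model 55: Medium\n[Random stuff here..]";
echo afterLastKeyword($text, 'Statistics:'), "\n";
```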


try

<?php
$text = '<br>Statistics: March: Model 55: Slow
[Random stuff here..]
<br>Statistics: April: Model 55: Medium
[Random stuff here..]
<br>Statistics: May: Model 55: Fast
[Random stuff here..]
<br>Statistics: June: Model 55: Medium
[Random stuff here..]';
$key_word = 'Statistics:';
// strrpos() returns the position of the LAST occurrence, or false if there is none
$pos = strrpos($text, $key_word);
if ($pos !== false) {
    // Skip past the keyword itself, then take the next 35 characters
    $start = $pos + strlen($key_word);
    $out = substr($text, $start, 35);
    echo $out;
}
?>


The [Random stuff here..] really did stand for arbitrary content, but thanks so much: that was a huge help and exactly what I was looking for!

Here's what worked:

<?php
$text = file_get_contents('http://[Website.com]');
$key_word = 'Statistics';
// Position of the LAST occurrence of the keyword (false if the fetch failed or it is absent)
$pos = ($text === false) ? false : strrpos($text, $key_word);
if ($pos !== false) {
    // Take the 35 characters that follow the keyword
    $start = $pos + strlen($key_word);
    $out = substr($text, $start, 35);
    echo $out;
}
?>