
Getting information from a site


Arkane


Hey,

 

I'm trying to write a script to get data off of another site (with the admin's approval), but I'm having trouble extracting the actual data.

 

I'm trying to use preg_match to find the data I need, but I just can't get it to work for anything more complicated than telling me whether there is a t in test.

 

   <td class="reltdd">Serial Code</td>
   <td class="reltdv">5445-9826</td>

This is the HTML that I am trying to scrape. There's more, but it's all pretty much the same. What I'm looking to get is the '5445-9826', but since the td class is used multiple times, the only thing unique to the data is the 'Serial Code' text.

 

I've gotten the page contents via file_get_contents() and it's all in one variable, $raw.

 

I've tried

preg_match("/<td class="reltdd">(.*)</td>/", $html, $matches);
echo $matches;

but it returned nothing whatsoever. I have also echoed $html, so I know it got the data correctly.

 

I know that what I have there should only return 'Serial Code', but since I can't even get that to work I have no chance with the rest.

 

Any help would be appreciated.
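
For reference, a minimal sketch of how that first attempt could be made to run. Three things stand out in the snippet above: the pattern sits in a double-quoted PHP string that itself contains unescaped double quotes (a parse error), the `/` delimiter clashes with the `/` in `</td>`, and `$matches` is an array, so it needs an index rather than a plain `echo`. The sample string assigned to `$raw` here is an assumption standing in for the downloaded page.

```php
<?php
// Stand-in for the downloaded page (assumption: one cell from the snippet above).
$raw = '<td class="reltdd">Serial Code</td>';

// Single-quoted PHP string, so the double quotes in the HTML need no escaping.
// ~ is used as the delimiter so the / in </td> does not end the pattern early.
// (.*?) is lazy, so it stops at the first </td> instead of the last one.
if (preg_match('~<td class="reltdd">(.*?)</td>~', $raw, $matches)) {
    // $matches[0] is the whole match, $matches[1] the first capture group.
    echo $matches[1], "\n"; // prints: Serial Code
}
```

On a full page this returns the first cell with that class, which is why the label text still has to be part of the pattern to reach one specific value.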


Thanks for getting back so quickly.

 

I've tried that, but I'm not having any luck with it.

 

Basically my entire code:

<?php
$url = "http://www.advanscene.com/html/Releases/dbrelpsp.php?id=1908";
$raw = file_get_contents($url);

preg_match('~<td class="[^"]*">Serial Code</td>\s*<td class="[^"]*">([0-9\-]+)</td>~', $raw, $matches);
print_r($matches);
?>

I'm intending to take about 4 different pieces from the page and write them to variables, but I'm obviously getting nowhere. Even trying the bit you gave me displays nothing but "Array ( )". What am I missing?
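
Since the goal is several fields rather than one, a possible sketch is to collect every label/value pair in a single pass with `preg_match_all` and then pick out the ones needed. The `reltdd` and `reltdv` class names come from the snippet in the first post; the `Region` row and the `<tr>` wrappers are hypothetical sample data added only to show more than one pair.

```php
<?php
// Sample markup: first row is from the thread, second row is hypothetical.
$raw = <<<HTML
<tr>
   <td class="reltdd">Serial Code</td>
   <td class="reltdv">5445-9826</td>
</tr>
<tr>
   <td class="reltdd">Region</td>
   <td class="reltdv">USA</td>
</tr>
HTML;

// Group 1 = label cell text, group 2 = value cell text.
// \s* allows the newline and indentation between the two cells.
preg_match_all(
    '~<td class="reltdd">([^<]*)</td>\s*<td class="reltdv">([^<]*)</td>~',
    $raw,
    $rows,
    PREG_SET_ORDER
);

// Build a label => value map so each field can be read by name.
$fields = [];
foreach ($rows as $row) {
    $fields[trim($row[1])] = trim($row[2]);
}

print_r($fields);
```

Each wanted piece can then be read as, for example, `$fields['Serial Code']`, after checking with `isset()` that the label was actually found on the page.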


Your problem is that $raw does not contain

 

   <td class="reltdd">Serial Code</td>
   <td class="reltdv">5445-9826</td>

 

The closest it comes is

 

   <td class="reltdd">UMD Serial</td>
   <td class="reltdv">ULUS-10457</td>

 

If you use this

 

<?php
$url = "http://www.advanscene.com/html/Releases/dbrelpsp.php?id=1908";
$raw = file_get_contents($url);

preg_match('~<td class="[^"]*">([0-9\-]+)</td>~', $raw, $matches);
print_r($matches);
?>

It'll match any cell whose content is a number or a number with dashes in it. If you want something more specific, you need to know exactly what you're looking for, or at least the pattern you're looking for.
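
To make "something more specific" concrete, here is a rough sketch that anchors on the label cell and captures whatever text sits in the cell right after it. `cellAfterLabel` is a hypothetical helper name, not part of PHP, and the inline sample stands in for `$raw` from `file_get_contents()`. The value is captured with `[^<]*` rather than `[0-9\-]+` because a value like `ULUS-10457` contains letters.

```php
<?php
// Sample markup copied from the closest match quoted above.
$raw = <<<HTML
   <td class="reltdd">UMD Serial</td>
   <td class="reltdv">ULUS-10457</td>
HTML;

// Hypothetical helper: returns the text of the cell that follows the
// cell containing $label, or null when the label is not on the page.
function cellAfterLabel(string $html, string $label): ?string
{
    // preg_quote() escapes regex metacharacters (and the ~ delimiter) in the label.
    $pattern = '~<td class="[^"]*">' . preg_quote($label, '~')
             . '</td>\s*<td class="[^"]*">([^<]*)</td>~';

    return preg_match($pattern, $html, $m) === 1 ? trim($m[1]) : null;
}

var_dump(cellAfterLabel($raw, 'UMD Serial'));  // string(10) "ULUS-10457"
var_dump(cellAfterLabel($raw, 'Serial Code')); // NULL, the label is not in the sample
```

Returning null for a missing label makes the "Serial Code is not on this page" case visible instead of leaving an empty array to puzzle over.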
