Arkane Posted September 12, 2009

Hey, I'm trying to write a script to get data off another site (with the admin's approval), but I'm having trouble extracting the actual data. I'm trying to use preg_match to find the data I need, but I just can't get it to work for anything more complicated than telling me whether there is a "t" in "test".

<td class="reltdd">Serial Code</td> <td class="reltdv">5445-9826</td>

This is the HTML that I am trying to scrape. There's more, but it's all pretty much the same. What I'm looking to get is the '5445-9826', but since the td class is used multiple times, the only thing unique to the data is the 'Serial Code' text. I've fetched the page via file_get_contents() and it's all in one variable, $raw. I've tried

preg_match("/<td class="reltdd">(.*)</td>/", $html, $matches); echo $matches;

but it returned nothing whatsoever. I have also echoed $html, so I know it got the data correctly. I know that what I have there should only return 'Serial Code', but since I can't even get that to work I have no chance with the rest. Any help would be appreciated.
Garethp Posted September 12, 2009

Try this pattern:

'~<td class="[^"]*">Serial Code</td>\s*<td class="[^"]*">([0-9\-]+)</td>~'

And you should print_r($matches); rather than echo it, because it's an array.
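For anyone following along, here is a minimal sketch of that idea run against the sample markup from the first post rather than the live page. The variable names ($html, $pattern, $found, $serial) are only illustrative. The class attribute is matched with "[^"]*" (any run of non-quote characters between the quotes) and \s* allows any amount of whitespace between the two cells; both are assumptions about what the markup looks like, not something confirmed against the real site.

```php
<?php
// Sample markup copied from the first post; a real page would come from file_get_contents().
$html = '<td class="reltdd">Serial Code</td> <td class="reltdv">5445-9826</td>';

// "~" is the delimiter, so the "/" in "</td>" needs no escaping.
// "[^"]*" accepts any class value; \s* tolerates whitespace between the cells.
$pattern = '~<td class="[^"]*">Serial Code</td>\s*<td class="[^"]*">([0-9-]+)</td>~';

// preg_match() returns 1 on a match, 0 on no match, false on error.
$found = preg_match($pattern, $html, $matches);

if ($found === 1) {
    print_r($matches);      // $matches is an array, so dump it instead of echoing it
    $serial = $matches[1];  // index 0 is the whole match, index 1 the first capture group
    echo $serial, PHP_EOL;
} else {
    $serial = null;
    echo 'No match', PHP_EOL;
}
```

Checking the return value before reading $matches[1] avoids an undefined-index notice when the page does not contain the expected cells.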
Arkane Posted September 12, 2009 (Author)

Thanks for getting back so quickly. I've tried that, but I'm not having any luck with it. Basically, my entire code:

<?php
$url = "http://www.advanscene.com/html/Releases/dbrelpsp.php?id=1908";
$raw = file_get_contents($url);
preg_match('~<td class="[^"]*">Serial Code</td>\s*<td class="[^"]*">([0-9\-]+)</td>~', $raw, $matches);
print_r($matches);
?>

I'm intending to take about 4 different pieces from the page and write them to variables, but I'm obviously getting nowhere. Even trying the bit you gave me displays nothing but "Array ( )". What am I missing?
Garethp Posted September 12, 2009

Your problem is that $raw does not contain

<td class="reltdd">Serial Code</td> <td class="reltdv">5445-9826</td>

The closest it comes is

<td class="reltdd">UMD Serial</td> <td class="reltdv">ULUS-10457</td>

If you use this

<?php
$url = "http://www.advanscene.com/html/Releases/dbrelpsp.php?id=1908";
$raw = file_get_contents($url);
preg_match('~<td class="[^"]*">([0-9\-]+)</td>~', $raw, $matches);
print_r($matches);
?>

it'll match the first cell whose contents are just digits, or digits with dashes. If you want something more specific, you need to know exactly what you're looking for, or at least the pattern you're looking for.
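Since the label cell is the only thing that identifies each value, one possible approach is to anchor on the label and capture whatever text sits in the next cell. The sketch below is only an illustration: cell_after_label() is a hypothetical helper name, the sample string reuses the "UMD Serial" row quoted above, and it assumes the value cell contains plain text with no nested tags.

```php
<?php
/**
 * Hypothetical helper: return the text of the cell that follows a label cell,
 * or null when the label is not present in the markup.
 */
function cell_after_label(string $html, string $label): ?string
{
    // preg_quote() escapes regex metacharacters in the label as well as the "~" delimiter.
    $pattern = '~<td class="[^"]*">\s*' . preg_quote($label, '~')
             . '\s*</td>\s*<td class="[^"]*">([^<]*)</td>~i';

    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]);   // text of the value cell, without padding
    }
    return null;
}

// Sample row quoted from the post above; a real run would pass the fetched page instead.
$raw = '<td class="reltdd">UMD Serial</td> <td class="reltdv">ULUS-10457</td>';

$serial  = cell_after_label($raw, 'UMD Serial');   // label that exists in the sample
$missing = cell_after_label($raw, 'Serial Code');  // label that does not exist

var_dump($serial, $missing);
```

Calling the helper once per label would cover the "about 4 different pieces" case without writing four separate patterns. For markup that is nested or inconsistent, PHP's DOMDocument may be a more robust choice than a regex.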
.josh Posted September 12, 2009

If you have admin approval, why not work with them to do something other than scrape the page? Not calling you a liar, just saying... it would make your life easier if they could set something up for you.