jiat, posted April 13, 2011

Hi all,

I'm new to PHP (and the forum) and I'm trying to figure out something I feel should be relatively easy, but I can't work it out. I want to parse the source of multiple web pages to get a list of classes. Here is what I've tried so far, with the help of the internet:

```php
<?php
// Download the first page of the course catalog.
$buffer = file_get_contents("http://catalog.utk.edu/content.php?catoid=5&navoid=386&cpage=1");

// Placeholder: this is the pattern I haven't figured out yet.
$regex = '/ something something /';

// preg_match() stops after the first match; $match[1] is that match's first capture group.
preg_match($regex, $buffer, $match);
var_dump($match);
echo $match[1];
```

I'm trying to extract the course number and name from the part of the source that looks like this:

```html
<td width="15">  </td>
<td width="100%">•  <a href="preview_course_nopop.php?catoid=1&coid=32666" onClick="showCourse('1', '32666',this, 'a:2:{s:8:~location~;s:8:~template~;s:28:~course_program_display_field~;N;}'); return false;" target="_blank">ACCT 200 - Foundations of Accounting</a> </td>
</tr>
```

Now, here's the thing: besides the fact that I can't use regex properly, I want to be able to put this into a loop for multiple courses per page of source. I am pretty fluent in C++, but this is throwing me for a serious loop (pun intended).

As a side note, I want to be able to do this for multiple pages as well. The only thing that changes in the page URL is the cpage=# part, so would it be possible to automate it for all 33 pages?

Thanks for any help.
jiat
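A minimal sketch of one way to handle both loops, assuming every course link on every page has the same shape as the fragment quoted above. `preg_match_all()` collects every match on a page in one call, which replaces a hand-written per-course loop, and a plain `for` loop over the `cpage` value covers the pages. The function names `catalogUrl()`, `extractCourses()` and `fetchAllCourses()` and the course-number pattern `[A-Z]+ \d+[A-Z]?` are hypothetical; a regex like this will break as soon as the markup changes.

```php
<?php
// Sketch only: assumes each course link looks like the quoted fragment,
// i.e. <a href="preview_course_nopop.php?...">CODE NUM - Title</a>.

// Hypothetical helper: build the catalog URL for one page. Only the
// cpage parameter changes between pages.
function catalogUrl(int $page): string
{
    return "http://catalog.utk.edu/content.php?catoid=5&navoid=386&cpage=" . $page;
}

// Hypothetical helper: return every "course number => course name" pair in $html.
// preg_match_all() finds all matches at once, so no manual loop over the string is needed.
function extractCourses(string $html): array
{
    // Group 1: course number such as "ACCT 200" (assumed format); group 2: the title.
    $regex = '~<a href="preview_course_nopop\.php[^"]*"[^>]*>\s*([A-Z]+ \d+[A-Z]?)\s+-\s+(.*?)\s*</a>~s';
    $courses = [];
    if (preg_match_all($regex, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Turn entities such as &amp; back into plain characters.
            $courses[$m[1]] = html_entity_decode($m[2]);
        }
    }
    return $courses;
}

// Hypothetical helper: download pages 1..$lastPage and merge their courses.
function fetchAllCourses(int $lastPage): array
{
    $all = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        $html = file_get_contents(catalogUrl($page));
        if ($html === false) {
            continue; // skip pages that fail to download
        }
        $all += extractCourses($html); // array union keeps the first title seen per course number
    }
    return $all;
}
```

Calling `fetchAllCourses(33)` would download each page in turn and return one array keyed by course number, e.g. `'ACCT 200' => 'Foundations of Accounting'` for the fragment above.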
requinix, posted April 13, 2011

PHP's DOM extension (DOMDocument) is better for parsing HTML than a regular expression can ever be.
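Following that advice, here is a minimal sketch using `DOMDocument` and `DOMXPath` instead of a regular expression. It assumes, as in the fragment quoted in the question, that course links are the `<a>` elements whose `href` starts with `preview_course_nopop.php` and whose text reads `CODE NUM - Title`; the function name `extractCoursesDom()` is hypothetical. The parser deals with attribute order, quoting and whitespace, so only the split on `' - '` depends on how the page words its links.

```php
<?php
// Sketch only: assumes course links have an href starting with
// "preview_course_nopop.php" and link text of the form "CODE NUM - Title".
function extractCoursesDom(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $courses = [];
    foreach ($xpath->query('//a[starts-with(@href, "preview_course_nopop.php")]') as $link) {
        // textContent is already entity-decoded, e.g. "ACCT 200 - Foundations of Accounting".
        $parts = explode(' - ', trim($link->textContent), 2);
        if (count($parts) === 2) {
            $courses[$parts[0]] = $parts[1]; // course number => course name
        }
    }
    return $courses;
}
```

The returned array can be merged across pages like any other PHP array, for example with `+=` inside a loop over `cpage=1` to `cpage=33`.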