jiat
Posted April 13, 2011

Hi all,

I'm new to PHP (and the forum) and I'm trying to figure out something that I feel should be relatively easy, but I can't work it out. I want to parse the source of multiple web pages to get a list of classes. Here is what I've tried so far, following examples from the internet:

    $buffer = file_get_contents("http://catalog.utk.edu/content.php?catoid=5&navoid=386&cpage=1");
    $regex = '/ something something /';
    preg_match($regex, $buffer, $match);
    var_dump($match);
    echo $match[1];

I'm trying to extract the course number and name from a part of the source that looks like this:

    <td width="15">  </td>
    <td width="100%">•  <a href="preview_course_nopop.php?catoid=1&coid=32666" onClick="showCourse('1', '32666',this, 'a:2:{s:8:~location~;s:8:~template~;s:28:~course_program_display_field~;N;}'); return false;" target="_blank">ACCT 200 - Foundations of Accounting</a> </td>
    </tr>

Now, here's the thing: besides the fact that I can't use regex properly, I want to put this into a loop so it handles multiple courses per page of source. I am pretty fluent in C++, but this is throwing me for a serious loop (pun intended).

As a side note, I want to be able to do this for multiple pages as well. The only thing that changes in the page URL is the cpage=# part, so would it be possible to automate it for all 33 pages?

Thanks for any help.
jiat
requinix
Posted April 13, 2011

DOM is better for parsing HTML than a regular expression can ever be.
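For anyone landing here later, this is a minimal sketch of what the DOM approach might look like, not a tested solution for that catalog site. It uses PHP's built-in DOMDocument and DOMXPath. The helper names extractCourses() and fetchAllCourses() are made up for illustration, and the "preview_course" href filter and the " - " separator are assumptions taken from the single snippet quoted in the question; the real pages may need a different filter.

```php
<?php
// Sketch only. extractCourses() and fetchAllCourses() are hypothetical helper
// names; the href filter and the " - " separator are assumptions based on the
// one HTML snippet quoted in the question.

/**
 * Pull course links out of one page of HTML.
 * Returns rows like ['number' => 'ACCT 200', 'name' => 'Foundations of Accounting'].
 */
function extractCourses(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $courses = [];

    // Assumption: every course link points at a preview_course... URL.
    foreach ($xpath->query('//a[contains(@href, "preview_course")]') as $link) {
        // Split "ACCT 200 - Foundations of Accounting" at the first " - ".
        $parts = explode(' - ', trim($link->textContent), 2);
        if (count($parts) === 2) {
            $courses[] = ['number' => trim($parts[0]), 'name' => trim($parts[1])];
        }
    }

    return $courses;
}

/**
 * Loop over the catalog pages by changing only the cpage parameter.
 * The default of 33 pages comes from the question.
 */
function fetchAllCourses(int $lastPage = 33): array
{
    $all = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        $url = 'http://catalog.utk.edu/content.php?catoid=5&navoid=386&cpage=' . $page;
        $html = @file_get_contents($url);
        if ($html === false) {
            continue; // skip pages that fail to download
        }
        $all = array_merge($all, extractCourses($html));
    }
    return $all;
}

// Example use (makes network requests, so it is left commented out):
// foreach (fetchAllCourses() as $course) {
//     echo $course['number'], ': ', $course['name'], PHP_EOL;
// }
```

Keeping the parsing in extractCourses() and the downloading in fetchAllCourses() means the parsing can be tried on a saved copy of one page before hitting all 33 URLs.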