Parsing source

jiat · April 13, 2011

Hi all,

I'm new to PHP (and the forum), and I'm stuck on something I feel should be relatively easy.

I want to parse the source of multiple web pages to get a list of classes. Here is what I've pieced together so far, with the help of the internet:

$buffer = file_get_contents("http://catalog.utk.edu/content.php?catoid=5&navoid=386&cpage=1");
$regex = '/ something something /'; // placeholder, I haven't worked out the real pattern
preg_match($regex, $buffer, $match);
var_dump($match);
echo $match[1];

I'm trying to extract the course number and name from part of the source that looks like this:

<td width="15">&#160;&#160;</td>
		<td width="100%">&#8226;&#160;				<a href="preview_course_nopop.php?catoid=1&coid=32666" onClick="showCourse('1', '32666',this, 'a:2:{s:8:~location~;s:8:~template~;s:28:~course_program_display_field~;N;}'); return false;" target="_blank">ACCT 200 - Foundations of Accounting</a>

		</td>
	</tr>

Now, here's the thing: besides the fact that I can't use regex properly, I want to put this into a loop to handle multiple courses per page of source. I am pretty fluent in C++, but this is throwing me for a serious loop (pun intended).

As a side note, I want to do this for multiple pages as well. The only thing that changes in the page URL is the cpage=# part, so would it be possible to automate it for all 33 pages?

Thanks for any help.

jiat

requinix · April 13, 2011

DOM is better for parsing HTML than a regular expression can ever be.
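
A rough sketch of what that could look like, not a tested scraper for that site. It assumes every course link has "preview_course_nopop.php" in its href and link text of the form "NUMBER - Name", as in the snippet above. The function names extractCourses and fetchAllCourses are made up for illustration; the URL and the 33-page count come from the original post.

```php
<?php
// Hypothetical helper: pull course number and name out of one page of HTML.
function extractCourses(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $courses = [];

    // Assumption: course links are the <a> tags pointing at preview_course_nopop.php.
    foreach ($xpath->query('//a[contains(@href, "preview_course_nopop.php")]') as $link) {
        // Split "ACCT 200 - Foundations of Accounting" on the first " - ".
        $parts = explode(' - ', trim($link->textContent), 2);
        if (count($parts) === 2) {
            $courses[] = ['number' => $parts[0], 'name' => $parts[1]];
        }
    }

    return $courses;
}

// Hypothetical helper: walk cpage=1..$lastPage and merge the results.
function fetchAllCourses(int $lastPage = 33): array
{
    $base = 'http://catalog.utk.edu/content.php?catoid=5&navoid=386&cpage=';
    $all = [];

    for ($page = 1; $page <= $lastPage; $page++) {
        $html = @file_get_contents($base . $page);
        if ($html === false) {
            continue; // skip pages that fail to download
        }
        $all = array_merge($all, extractCourses($html));
    }

    return $all;
}

// Usage (makes 33 HTTP requests, so it is left commented out here):
// foreach (fetchAllCourses() as $course) {
//     echo $course['number'], ': ', $course['name'], "\n";
// }
```

Keeping the parsing in its own function means you can try it on a saved copy of one page before looping over all of them. If the real markup differs from the snippet, the XPath expression is the only part that should need adjusting.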
