Code extraction from HTML

moola · January 14, 2007

Hey guys, I have a bunch of these [code]value="http://www.youtube.com/v/shgRSgYwBv0" />[/code] in an HTML file, as you can see below. I want to run through the whole file and extract the video ids (e.g. shgRSgYwBv0).

NOTE:
I also have urls that look like this [code]http://www.youtube.com/watch?v=shgRSgYwBv0[/code] in the html file. I don't want to include those results in the extraction process. Or vice versa.

How do I run through the file and echo all the ids out, or write them to a file? I know the basics (opening, reading, etc.), but I don't know how to identify the id and extract it from all the other code around it. Could someone help? Thanks.

Sample from html file:
[code] <object width="250" height="250">
<param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />
</param>
<embed src="http://www.youtube.com/v/shgRSgYwBv0" type="application/x-shockwave-flash" width="250" height="250"></embed>
</object>
<object width="250" height="250">
<param name="movie" value="http://www.youtube.com/v/zHKlefHeed8" />
</param>
<embed src="http://www.youtube.com/v/zHKlefHeed8" type="application/x-shockwave-flash" width="250" height="250"></embed>[/code]

effigy · January 14, 2007

Try something like this:

[code]
preg_match_all('%(?<=/v/).+?(?=")%', $data, $matches);
echo '<pre>', print_r($matches, true), '</pre>';
[/code]
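
For the "write them to a file" part, here is a rough sketch of how the same lookbehind idea could be taken one step further. It assumes ids only contain letters, digits, underscores and hyphens (true of the samples above, but an assumption), and the output file name is made up for illustration. Because each video shows up in both the param and the embed tag, the sketch also drops duplicates.

```php
<?php
// Sample markup based on the first post; in practice $data could come from file_get_contents().
$data = <<<HTML
<param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />
<embed src="http://www.youtube.com/v/shgRSgYwBv0" type="application/x-shockwave-flash"></embed>
<a href="http://www.youtube.com/watch?v=shgRSgYwBv0">watch link, ignored</a>
<param name="movie" value="http://www.youtube.com/v/zHKlefHeed8" />
HTML;

// Only text that directly follows "/v/" matches, so watch?v= URLs are skipped.
// The character class [\w-] is an assumption about what an id may contain.
preg_match_all('%(?<=/v/)[\w-]+%', $data, $matches);

// Each video appears in both <param> and <embed>, so drop the repeats.
$ids = array_values(array_unique($matches[0]));

// Hypothetical output location: one id per line.
$outFile = sys_get_temp_dir() . '/ids.txt';
file_put_contents($outFile, implode("\n", $ids) . "\n");

echo implode("\n", $ids), "\n";
```

Swapping the lookbehind to (?<=watch\?v=) should pick up the other URL style instead, if you ever need the "vice versa" case.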

moola · January 14, 2007

One last thing. Each id is now in a file called test.txt.
How do I write a page in PHP that reads the first 5 ids, with the next page reading the next 5, and so on?
I want to index them for easy access with a little bit of math.

I'm thinking of this format for the txt file that is going to be read (test.txt):
" 0: bW55-ybLzYE"
" 1: ilewoJIaYLg"
...

So page 1 will have ids 0 to 4, page 2 will have 5 to 9, etc.

But the problem is another regex (which I suck at). Maybe setting a tab delimiter between 0: and the id would help?

c4onastick · January 14, 2007

[quote author=effigy link=topic=122379.msg504746#msg504746 date=1168805955]
Try something like this:
[code]
preg_match_all('%(?<=/v/).+?(?=")%', $data, $matches);
echo '<pre>', print_r($matches, true), '</pre>';
[/code]
[/quote]The lookaround master! :D

Reading ids from a flat file would require regex; it would be easier to do what you're talking about in a database. There are some good tutorials on that here: http://www.phpfreaks.com/tutorial_cat/25/Page-Number--Pagination.php
Otherwise, you'd have to read the file into an array with regex and just display 0-4 or 5-9, etc. You could look into SQLite (a file-based database system; I'm not very familiar with it, so I'm afraid I can't offer much advice). The file read would look something like this:
[code]$path = "/path_to/test.txt";
$fh = fopen($path, "r");
$contents = fread($fh, filesize($path));
fclose($fh);

// Parse the entries; the m modifier lets ^ match at the start of every line
preg_match_all('/^[ \t]*(\d+):[ \t]*(\S+)/m', $contents, $matches, PREG_SET_ORDER);

// Echo the ones you want
$max_entry = 5; // Set this via GET or something for each page
for ($i = $max_entry - 5; $i < $max_entry && isset($matches[$i]); $i++) {
    echo $matches[$i][1].": ";
    echo $matches[$i][2]."\n";
}
[/code]
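
If the file simply holds one id per line with no "0:" prefix, the regex may not be needed at all. This is a different technique from the one above, sketched under that assumption: file() reads the lines into an array and array_slice() picks out the page. The function name ids_for_page and the demo ids are made up for illustration.

```php
<?php
// Return the ids for one page, assuming one id per line in $path.
// $page is 1-based; $perPage defaults to 5 as in the question.
function ids_for_page($path, $page, $perPage = 5)
{
    $ids = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($ids === false) {
        return [];
    }
    return array_slice($ids, ($page - 1) * $perPage, $perPage);
}

// Demo with a throwaway file holding seven made-up ids.
$path = tempnam(sys_get_temp_dir(), 'ids');
file_put_contents($path, implode("\n", ['a1', 'b2', 'c3', 'd4', 'e5', 'f6', 'g7']) . "\n");

$page = 2; // on a real page this might come from (int) $_GET['page']
$pageIds = ids_for_page($path, $page);
foreach ($pageIds as $offset => $id) {
    // Rebuild the running index from the page number instead of storing it in the file.
    echo ($page - 1) * 5 + $offset, ': ', $id, "\n";
}
unlink($path);
```

The line position doubles as the index here, so the "little bit of math" is just (page - 1) * 5 plus the offset within the page.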
