
Code extraction from HTML


moola


Hey guys, I have a bunch of these [code]value="http://www.youtube.com/v/shgRSgYwBv0" />[/code] in an HTML file, as you can see below. I want to run through the whole file and extract the video IDs (e.g. shgRSgYwBv0).

NOTE:
I also have URLs that look like this [code]http://www.youtube.com/watch?v=shgRSgYwBv0[/code] in the HTML file. I don't want those included in the extraction, or, vice versa, I may sometimes want only those and not the /v/ ones.
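One way to keep the two URL forms apart is to anchor each pattern on the part that differs: `/v/` for the embed URLs and `watch?v=` for the page links. A minimal sketch, assuming YouTube IDs consist only of letters, digits, `-` and `_` (true for the IDs in this thread, but an assumption in general); `extract_ids` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: pull YouTube IDs out of $html for one URL form only.
// $form is 'embed' (http://www.youtube.com/v/ID) or 'watch' (watch?v=ID).
function extract_ids(string $html, string $form): array
{
    // Assumption: IDs are made of letters, digits, '-' and '_'.
    $patterns = [
        'embed' => '%youtube\.com/v/([A-Za-z0-9_-]+)%',
        'watch' => '%youtube\.com/watch\?v=([A-Za-z0-9_-]+)%',
    ];
    preg_match_all($patterns[$form], $html, $m);
    // $m[1] holds the captured IDs; drop duplicates and reindex.
    return array_values(array_unique($m[1]));
}

$html = '<param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />'
      . '<a href="http://www.youtube.com/watch?v=zHKlefHeed8">link</a>';

print_r(extract_ids($html, 'embed')); // only the /v/ ID
print_r(extract_ids($html, 'watch')); // only the watch?v= ID
```

Because each pattern requires its own literal prefix, a `watch?v=` link can never match the `embed` pattern and vice versa.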

How do I run through the file and echo all the IDs, or write them to a file? I know the basics (opening and reading files, etc.), but I don't know how to identify each ID and extract it from all the other code around it. Could someone help? Thanks.
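Once the IDs are in an array, echoing them or writing them to a file with one numbered entry per line (`0: id`) is a loop plus `file_put_contents()`. A sketch, assuming the IDs have already been extracted (they are hard-coded here from the sample) and writing to a hypothetical `test.txt` in the system temp directory:

```php
<?php
// Sample IDs, hard-coded here; in practice they come from the extraction step.
$ids = ['shgRSgYwBv0', 'zHKlefHeed8'];

// Build one "index: id" line per ID.
$lines = [];
foreach ($ids as $i => $id) {
    $lines[] = $i . ': ' . $id;
}

// Hypothetical output file in the system temp directory.
$file = sys_get_temp_dir() . '/test.txt';
file_put_contents($file, implode("\n", $lines) . "\n");

// Echo them as well.
echo implode("\n", $lines), "\n";
```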

Sample from the HTML file:
[code] <object width="250" height="250">
        <param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />
        </param>
        <embed src="http://www.youtube.com/v/shgRSgYwBv0" type="application/x-shockwave-flash" width="250" height="250"></embed>
      </object>
      <object width="250" height="250">
        <param name="movie" value="http://www.youtube.com/v/zHKlefHeed8" />
        </param>
        <embed src="http://www.youtube.com/v/zHKlefHeed8" type="application/x-shockwave-flash" width="250" height="250"></embed>[/code]
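Instead of a regex, PHP's DOM extension can parse the markup and read the `value` attribute of each `<param name="movie">` element directly, which ignores `watch?v=` links because it only looks at those tags. This is a swapped-in technique, not something proposed in the thread; a sketch assuming the DOM extension is available and that the ID is everything after the last `/` in the URL:

```php
<?php
// Sample markup from the post (abbreviated).
$html = '<object width="250" height="250">
  <param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />
  <embed src="http://www.youtube.com/v/shgRSgYwBv0" type="application/x-shockwave-flash"></embed>
</object>
<object width="250" height="250">
  <param name="movie" value="http://www.youtube.com/v/zHKlefHeed8" />
</object>';

$doc = new DOMDocument();
// Real-world HTML is often invalid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('param') as $param) {
    $value = $param->getAttribute('value');
    // Only keep "movie" params whose URL contains /v/.
    if ($param->getAttribute('name') === 'movie' && strpos($value, '/v/') !== false) {
        // Assumption: the ID is everything after the last '/'.
        $ids[] = substr($value, strrpos($value, '/') + 1);
    }
}

echo implode("\n", $ids), "\n";
```

Reading only the `<param>` tags also avoids the duplicates you would get from matching both `<param>` and `<embed>`.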

https://forums.phpfreaks.com/topic/34160-code-extraction-from-html/

One last thing: each ID is now in a file called test.txt.
How do I write a PHP page that reads the first 5 IDs, with the next page reading the next 5, and so on?
I want to index them for easy access with a little bit of math.

I'm thinking of this format for the text file that will be read (test.txt):
[code]0: bW55-ybLzYE
1: ilewoJIaYLg
...[/code]

So page 1 will have IDs 0 to 4, page 2 will have 5 to 9, and so on.
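With zero-based IDs and five per page, the math is: first index = (page - 1) × 5 and last index = first + 4. A sketch of that calculation; `page_range` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: zero-based [first, last] indexes shown on a 1-based page.
function page_range(int $page, int $perPage = 5): array
{
    $first = ($page - 1) * $perPage;
    $last = $first + $perPage - 1;
    return [$first, $last];
}

foreach ([1, 2, 3] as $page) {
    [$first, $last] = page_range($page);
    echo "Page $page: IDs $first to $last\n";
}
```

This prints IDs 0 to 4 for page 1, 5 to 9 for page 2 and 10 to 14 for page 3.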

But the problem is another regex (which I suck at). Maybe putting a tab delimiter between "0:" and the ID would help?
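A delimiter does help: if every line is `index<TAB>id`, `explode()` can split it and no regex is needed at all. A minimal sketch, assuming the tab-delimited format proposed above (not the `0: id` format already shown):

```php
<?php
// Assumed file contents: one "index<TAB>id" entry per line.
$contents = "0\tbW55-ybLzYE\n1\tilewoJIaYLg\n";

$entries = [];
foreach (explode("\n", trim($contents)) as $line) {
    // Strip a stray \r from Windows line endings, then split at the first tab only.
    $parts = explode("\t", rtrim($line, "\r"), 2);
    if (count($parts) === 2) {
        $entries[(int) $parts[0]] = $parts[1];
    }
}

print_r($entries);
```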
[quote author=effigy link=topic=122379.msg504746#msg504746 date=1168805955]
Try something like this:
[code]
preg_match_all('%(?<=/v/).+?(?=")%', $data, $matches);
echo '<pre>', print_r($matches, true), '</pre>';
[/code]
[/quote]The lookaround master! :D
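Run against the sample markup, effigy's pattern matches both the `<param value=...>` and the `<embed src=...>` URL for each video, so every ID shows up twice; `array_unique()` collapses them. A short sketch (the `$data` string is a trimmed copy of the sample from the first post):

```php
<?php
$data = '<param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />
<embed src="http://www.youtube.com/v/shgRSgYwBv0" type="application/x-shockwave-flash"></embed>
<param name="movie" value="http://www.youtube.com/v/zHKlefHeed8" />
<embed src="http://www.youtube.com/v/zHKlefHeed8" type="application/x-shockwave-flash"></embed>';

// Lookbehind (?<=/v/) starts the match right after "/v/";
// lookahead (?=") stops it before the closing quote without consuming it.
preg_match_all('%(?<=/v/).+?(?=")%', $data, $matches);

// $matches[0] holds every match: each ID appears twice (param + embed).
$ids = array_values(array_unique($matches[0]));
echo implode("\n", $ids), "\n";
```

Since the pattern needs `/v/` in front, `watch?v=` URLs are never matched.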


Reading IDs from a flat file means parsing each line yourself (a regex is one way to do it); what you're describing would be easier with a database. There are some good pagination tutorials here: http://www.phpfreaks.com/tutorial_cat/25/Page-Number--Pagination.php
Otherwise, you'd have to parse the file into an array with a regex and display entries 0-4, 5-9, and so on. You could also look into SQLite, a file-based database engine (I'm not very familiar with it, so I'm afraid I can't offer much advice). The file read would look something like this:
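[code]$contents = file_get_contents("/path_to/test.txt");

// Parse the entries. The m modifier makes ^ match at the start of every
// line, not just the start of the file; \s* tolerates a tab or spaces.
preg_match_all('/^(\d+):\s*(\S+)/m', $contents, $matches, PREG_SET_ORDER);

// Echo the ones you want
$per_page = 5;
$page = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1; // e.g. page.php?page=2
$start = ($page - 1) * $per_page;
$end = min($start + $per_page, count($matches)); // don't run past the last entry
for ($i = $start; $i < $end; $i++) {
  echo $matches[$i][1] . ": " . $matches[$i][2] . "<br />\n";
}
[/code]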
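If the file keeps one entry per line, a regex isn't strictly necessary: `file()` returns the lines as an array and `array_slice()` picks out one page, while `ceil()` gives the page count for "next" links. A sketch under the same five-per-page assumption, using a temporary file filled with placeholder entries and a hypothetical `page` query parameter:

```php
<?php
// Write a small placeholder file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'ids');
$sample = [];
for ($i = 0; $i < 12; $i++) {
    $sample[] = "$i: placeholder$i";
}
file_put_contents($path, implode("\n", $sample) . "\n");

$perPage = 5;
// One array element per line, without newlines or blank lines.
$all = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$totalPages = (int) ceil(count($all) / $perPage);

// Hypothetical ?page=N parameter, clamped to the valid range.
$page = isset($_GET['page']) ? (int) $_GET['page'] : 1;
$page = max(1, min($page, $totalPages));

// Take at most $perPage lines starting at this page's first index.
$onPage = array_slice($all, ($page - 1) * $perPage, $perPage);
foreach ($onPage as $line) {
    echo htmlspecialchars($line), "<br />\n";
}

unlink($path);
```

`array_slice()` simply returns fewer elements on the last page, so no bounds check is needed.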

Archived

This topic is now archived and is closed to further replies.
