
Code extraction from HTML


moola


Hey guys, I have a bunch of these in an HTML file, as you can see below: [code]value="http://www.youtube.com/v/shgRSgYwBv0" />[/code] I want to run through the whole file and extract the video IDs (shgRSgYwBv0 in this case).

NOTE:
I also have URLs that look like this [code]http://www.youtube.com/watch?v=shgRSgYwBv0[/code] in the HTML file. I don't want to include those in the extraction (or, in other cases, I want only those and not the /v/ ones).

How do I run through the file and echo all the IDs out or write them to a file? I know the basics (opening, reading, etc.), but I don't know how to identify the ID and extract it from all the other markup around it. Could someone help? Thanks

Sample from html file:
[code] <object width="250" height="250">
        <param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />
        </param>
        <embed src="http://www.youtube.com/v/shgRSgYwBv0" type="application/x-shockwave-flash" width="250" height="250"></embed>
      </object>
      <object width="250" height="250">
        <param name="movie" value="http://www.youtube.com/v/zHKlefHeed8" />
        </param>
        <embed src="http://www.youtube.com/v/zHKlefHeed8" type="application/x-shockwave-flash" width="250" height="250"></embed>[/code]


One last thing. Each ID is now in a file called test.txt.
How do I write a page in PHP that reads the first 5 IDs, with the next page reading the next 5, and so on?
I want to index them for easy access with a little bit of math.

I'm thinking of this format for the text file that will be read (test.txt):
" 0: bW55-ybLzYE"
" 1: ilewoJIaYLg"
...

So page 1 will have IDs 0 to 4, page 2 will have 5 to 9, etc.

But the problem is that this needs another regex (which I suck at). Maybe putting a tab delimiter between "0:" and the ID would help?

[quote author=effigy link=topic=122379.msg504746#msg504746 date=1168805955]
Try something like this:
[code]
preg_match_all('%(?<=/v/).+?(?=")%', $data, $matches);
echo '<pre>', print_r($matches, true), '</pre>';
[/code]
[/quote]The lookaround master! :D
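
For anyone landing here later, here is a minimal sketch of the whole extraction step built around that lookbehind. It assumes the IDs only contain letters, digits, hyphens, and underscores; the function name extract_embed_ids() is made up for illustration. Because the pattern anchors on /v/, the watch?v= links are skipped, and array_unique() drops the duplicate that each embed tag produces.

```php
<?php
// Sketch: collect unique video IDs from embed-style URLs (/v/ID) only.
// extract_embed_ids() is a hypothetical helper name, not from the thread.
function extract_embed_ids(string $html): array
{
    // (?<=/v/) requires "/v/" right before the ID, so watch?v=ID is ignored.
    // [\w-]+ assumes IDs consist of letters, digits, "_" and "-".
    preg_match_all('%(?<=/v/)[\w-]+%', $html, $matches);

    // Each video appears in both <param> and <embed>, so drop duplicates
    // and reindex the result from 0.
    return array_values(array_unique($matches[0]));
}

$html = <<<'HTML'
<param name="movie" value="http://www.youtube.com/v/shgRSgYwBv0" />
<embed src="http://www.youtube.com/v/shgRSgYwBv0"></embed>
<a href="http://www.youtube.com/watch?v=zHKlefHeed8">not an embed</a>
HTML;

$ids = extract_embed_ids($html);
echo implode("\n", $ids), "\n";
```

To write the result out one ID per line, something like file_put_contents('test.txt', implode("\n", $ids)) should do. If you want the watch?v= links instead, swapping the lookbehind for (?<=watch\?v=) is one way to flip it.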


Reading IDs from a flat file in that format means parsing each line (the example below uses regex); it would be easier to do what you're talking about in a database. There are some good tutorials on that here: http://www.phpfreaks.com/tutorial_cat/25/Page-Number--Pagination.php
Otherwise, you'd have to read the file into an array with regex and just display entries 0-4 or 5-9, etc. You could also look into SQLite, a file-based database system; I'm not very familiar with it, so I'm afraid I can't offer much advice. The file read would look something like this:
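[code]// Read the whole file into a string
$contents = file_get_contents("/path_to/test.txt");

// Parse the entries. The m modifier makes ^ match at the start of every line,
// not just the first; \W* skips any leading spaces or quotes before the index.
preg_match_all('/^\W*(\d+):\s*([\w-]+)/m', $contents, $matches, PREG_SET_ORDER);

// Echo the ones you want
$max_entry = 5; // Set this via GET or something for each page
$last = min($max_entry, count($matches)); // Don't run past the end of the list
for ($i = max(0, $max_entry - 5); $i < $last; $i++) {
  echo $matches[$i][1].": ";
  echo $matches[$i][2]."\n";
}
[/code]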
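
If the file holds just one ID per line, the line number can serve as the index and no regex is needed at all. Below is a rough sketch under that assumption. get_page_ids(), the 1-based $page argument, and the ?page= query parameter are all illustrative names, not something from the posts above, and the last three entries in the sample list are placeholders rather than real IDs.

```php
<?php
// Sketch: return the IDs for one page, assuming one ID per line in the file.
// get_page_ids() and its parameters are hypothetical names for illustration.
function get_page_ids(array $ids, int $page, int $per_page = 5): array
{
    $page = max(1, $page);               // treat anything below 1 as page 1
    $offset = ($page - 1) * $per_page;   // page 1 -> 0, page 2 -> 5, ...

    // array_slice() returns fewer items (or none) near the end of the list,
    // so there is no out-of-range access on the last page.
    // The final "true" keeps the original indexes as array keys.
    return array_slice($ids, $offset, $per_page, true);
}

// In a real page the list would come from the file, for example:
//   $ids = file('/path_to/test.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
//   $page = isset($_GET['page']) ? (int) $_GET['page'] : 1;
// The last three values here are placeholders, not real video IDs.
$ids = ['bW55-ybLzYE', 'ilewoJIaYLg', 'shgRSgYwBv0', 'zHKlefHeed8', 'id4', 'id5', 'id6'];

// Print the second page together with each entry's position in the file.
foreach (get_page_ids($ids, 2) as $i => $id) {
    echo $i, ': ', $id, "\n";
}
```

The "little bit of math" is just the offset calculation, so the index prefix and the tab delimiter in test.txt become optional.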
