[SOLVED] Parsing a link using CURL


HaLo2FrEeEk

Hey, I want to retrieve the source code of a web page and parse a single link from it. The link points to an flv file in this format:

 

http://files.redvsblue.com/RvB05/5x(*)/fl4sh/(*).flv

 

Where (*) is a wildcard.  Here is one page I will be parsing:

 

http://rvb.roosterteeth.com/archive/episode.php?id=244

 

Look in the source code and you will see the URL there. I want to download that flv file without having to search through the code to find the URL. Here is the code I have so far to get the page using cURL:

 

$ch = curl_init("http://rvb.roosterteeth.com/archive/episode.php?id=244");
$text=curl_exec($ch);
curl_close($ch);

 

And here is what I have for getting the link (it doesn't work, error is after the code):

 

preg_match("http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv",$text,$matches);
print_r($matches);

 

Error:

 

Warning: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash in /home/.hortense/halo2freeek/claninfectionist.com/misc/testing/curl.php on line 5

 

Can anyone help me please?


Perl-compatible regular expressions (the preg_* functions) need a delimiter around the pattern, like

 

preg_match('/a/', $str, $matches);

 

So try:

 

preg_match("/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/",$text,$matches);

 

Or you can change the delimiter to avoid all that escaping:

 

preg_match("|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|",$text,$matches);
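To check the pattern without fetching anything, here is a minimal sketch that runs it against a hard-coded string. The `$html` fragment is a made-up stand-in for the page source (only the flv URL comes from this thread), and the `#` delimiter and the `[^/"']+` character classes are my own choices, not something the site requires.

```php
<?php
// Hypothetical fragment of page source; the real markup may differ.
$html = '<embed src="player.swf?file=http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv&episodeNum=91">';

// '#' is the delimiter here, so neither '/' nor '|' needs escaping.
// [^/"']+ keeps each capture inside one path segment and inside the attribute.
$pattern = '#http://files\.redvsblue\.com/RvB05/5x([^/"\']+)/fl4sh/([^/"\']+)\.flv#';

$found = preg_match($pattern, $html, $matches);

if ($found === 1) {
    // $matches[0] is the whole URL, [1] and [2] are the two wildcard parts.
    print_r($matches);
} else {
    echo "No match\n";
}
```

If this matches on a sample string but not on the live page, the problem is more likely in how `$text` gets filled than in the pattern.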


This returns an empty array: no error, but nothing in the array. Here is an example link that I will be pulling out:

 

http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv

 

And these are the parts that change with every different link:

 

http://files.redvsblue.com/RvB05/5x(*)/fl4sh/RvB(*).flv

 

But when I use this code to parse it:

 

preg_match("|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|",$text,$matches);

 

Or even this:

 

preg_match("|http://(.*).flv|",$text,$matches);

 

I get an empty array.  What am I doing wrong?


Why? It's not JavaScript. Of course, I don't know regex at all, but this is all PHP, so I figured it would go in the PHP section. I still can't get it to work, even after trying a lot of different things. Here is my code:

 

<?php
$ch = curl_init("http://rvb.roosterteeth.com/archive/episode.php?id=244");
$text=curl_exec($ch);
curl_close($ch);
preg_match("/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/",$text,$matches);
print_r($matches);
?>

 

I also don't want the code from the page to print on the page. Is there a way to get it and set it to a variable without printing it out? (It's the curl_exec that does it, but I can't find an alternative.)


I got it, I used this code:

 

<?php
include('../snoopy.php');
$snoopy = new Snoopy;
// Stop early if the page could not be fetched, so $text is never undefined.
if (!$snoopy->fetch("http://rvb.roosterteeth.com/archive/episode.php?id=244")) {
    die("Could not fetch the page.");
}
$text = $snoopy->results;
// Non-greedy (.+?) keeps each capture as short as possible.
preg_match("|http://files\.redvsblue\.com/RvB05/5x(.+?)/fl4sh/(.+?)\.flv|", $text, $url_arr);
preg_match("|episodeNum=(.+?)&|", $text, $episodenum_arr);
preg_match("|episode=(.+?)&|", $text, $episode_arr);
$url = $url_arr[0];                   // whole flv URL
$episodenum = $episodenum_arr[1];     // episode number
$episode = ucwords($episode_arr[1]);  // episode name, capitalized
echo "Download Red vs. Blue, " . $episodenum . ": " . $episode . ", in High Res flash video format: <a href=\"" . $url . "\">Here</a>.";
?>

 

Snoopy is a free PHP library that fetches the page and stores it in a variable without actually printing it out. And here is the result:
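For anyone who would rather stay with plain cURL: as far as I can tell, the earlier attempts came back empty because `curl_exec()` prints the page and returns `true` unless `CURLOPT_RETURNTRANSFER` is set, so the regex was never run against the page source. Below is a minimal sketch of that option. `fetch_page()` is a hypothetical helper name I made up for illustration, and the code is untested against the live site.

```php
<?php
// Sketch only: fetch_page() is a hypothetical helper name, not part of cURL.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    // Return the response body from curl_exec() instead of echoing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    // curl_exec() returns false on failure; map that to null.
    return $body === false ? null : $body;
}

// Why the earlier attempts matched nothing: without CURLOPT_RETURNTRANSFER,
// curl_exec() returns true, so the regex only ever searched the string "1".
$count = preg_match('|http://(.*)\.flv|', (string) true, $m);
echo "Matches when \$text is true: $count\n";

// Usage (needs network access):
// $text = fetch_page("http://rvb.roosterteeth.com/archive/episode.php?id=244");
```

The demonstration line reports 0 matches, which is the same empty-array symptom described earlier in the thread.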

 

http://claninfectionist.com/misc/testing/curl.php

 

I will now change it so that a person can put in their own URL and it will get the link for them. The reason for this is that Rooster Teeth uses Flash video (FLV) files to stream their videos over the net, but also lets non-subscribers download lo-res versions in WMV. I found out that these FLVs are higher res and better quality, so I began downloading them the hard way: look at the source code, then use a script I made that builds a link for me. It worked, but it was slow. Now I can put in the URL of the page itself and this code will get the FLV URL and make a link for me. Thank you to everyone who helped me, I appreciate it.

 

I also figured out regular expressions, to a degree I think, so I could get the episode number and episode name from this page as well as the URL.
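The part that made the difference for me was `(.+?)` instead of `(.*)`. Here is a small sketch of the two side by side. The `$text` string is a made-up sample (the real page source may look different), and the trailing `other.flv` is only there to show where the greedy version goes wrong.

```php
<?php
// Hypothetical fragment of the page source; the real markup may differ.
$text = 'file=http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv'
      . '&episodeNum=91&episode=some title&autoplay=1 ... other.flv';

// Greedy: (.*) grabs as much as it can, so it runs to the LAST ".flv".
preg_match('|http://(.*)\.flv|', $text, $greedy);

// Lazy: (.+?) grabs as little as it can, so it stops at the FIRST ".flv".
preg_match('|http://(.+?)\.flv|', $text, $lazy);

// The same lazy trick pulls out the values between "=" and the next "&".
preg_match('|episodeNum=(.+?)&|', $text, $num);
preg_match('|episode=(.+?)&|', $text, $name);

echo "Greedy: " . $greedy[0] . "\n";
echo "Lazy:   " . $lazy[0] . "\n";
echo "Number: " . $num[1] . ", name: " . ucwords($name[1]) . "\n";
```

On a real page the greedy version can swallow everything between the first `http://` and the last `.flv` on the same line, which is why the lazy form is the safer default for this kind of scraping.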

