[SOLVED] Parsing a link using CURL

HaLo2FrEeEk · March 14, 2007

Hey, I want to retrieve the source code of a webpage and parse a single link out of it. The link points to an flv file in this format:

http://files.redvsblue.com/RvB05/5x(*)/fl4sh/(*).flv

Where (*) is a wildcard. Here is one page I will be parsing:

http://rvb.roosterteeth.com/archive/episode.php?id=244

Look in the source code and you will see the url there. I want to download that flv file without having to search through the code to find the url. Here is the code I have so far to get the page using CURL:

$ch = curl_init("http://rvb.roosterteeth.com/archive/episode.php?id=244");
$text=curl_exec($ch);
curl_close($ch);

And here is what I have for getting the link (it doesn't work, error is after the code):

preg_match("http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv",$text,$matches);
print_r($matches);

Error:

Warning: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash in /home/.hortense/halo2freeek/claninfectionist.com/misc/testing/curl.php on line 5

Can anyone help me please?

btherl · March 14, 2007

Perl-compatible regexps (the preg_* functions) need a delimiter, like:

preg_match('/a/', $str, $matches);

So try:

preg_match("/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/",$text,$matches);

Or you can change the delimiter to avoid all that escaping:

preg_match("|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|",$text,$matches);
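To make the delimiter point concrete, here is a small self-contained sketch. The $sample string is an assumption: it is just the example link format from this thread wrapped in made-up HTML, not the real page source. Both patterns are the same expression with different delimiters, so they should capture the same pieces.

```php
<?php
// Hypothetical sample input: a made-up HTML fragment with a link in the thread's format.
$sample = '<a href="http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv">video</a>';

// Slash delimiters: every literal slash in the URL has to be escaped.
$withSlashes = "/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/";
// Pipe delimiters: slashes are ordinary characters, so no escaping is needed for them.
$withPipes = "|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|";

preg_match($withSlashes, $sample, $a);
preg_match($withPipes, $sample, $b);

// Index 0 is the whole match, 1 and 2 are the two capture groups.
print_r($a);
var_dump($a === $b);
```

Any character that is not alphanumeric, a backslash, or whitespace can serve as the delimiter; picking one that does not occur in the pattern keeps it readable.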

HaLo2FrEeEk · March 14, 2007

OK, let me try it. My server is being dumb right now, so I can't test it this minute, but I will try it out as soon as I get a chance.

HaLo2FrEeEk · March 14, 2007

This returns an empty array: no error, but nothing in the array. Here is an example link like the ones I will be pulling out:

http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv

And these are the parts that change with every different link:

http://files.redvsblue.com/RvB05/5x(*)/fl4sh/RvB(*).flv

But when I use this code to parse it:

preg_match("|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|",$text,$matches);

Or even this:

preg_match("|http://(.*).flv|",$text,$matches);

I get an empty array. What am I doing wrong?

HaLo2FrEeEk · March 14, 2007

Bump!

per1os · March 14, 2007

Wouldn't this be better placed in the Regex Forum?

HaLo2FrEeEk · March 15, 2007

Why? It's not javascript. Of course, I don't know about REGEX at all, but this is all php, so I figured it would go in the php section. I still can't get it to work, even after trying a lot of different things. Here is my code:

<?php
$ch = curl_init("http://rvb.roosterteeth.com/archive/episode.php?id=244");
$text=curl_exec($ch);
curl_close($ch);
preg_match("/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/",$text,$matches);
print_r($matches);
?>

I also don't want the code from the page to print on the page. Is there a way to get it and store it in a variable without printing it out? (It's the curl_exec that prints it, but I can't find an alternative.)
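For reference, cURL can do this by itself. By default curl_exec() echoes the response and returns true, so $text never holds the page and preg_match has nothing to search, which would explain the empty array. Setting CURLOPT_RETURNTRANSFER makes curl_exec() return the body as a string instead. Below is a minimal sketch; the fetch_page() helper name and the 10 second timeout are assumptions made for illustration, not something from this thread.

```php
<?php
// Hypothetical helper: fetch a URL and return its body as a string, or null on failure.
function fetch_page($url)
{
    $ch = curl_init($url);
    // Without this option curl_exec() prints the body and returns true.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Assumed timeout so a dead server cannot hang the script.
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch); // string on success, false on failure
    curl_close($ch);
    return $body === false ? null : $body;
}
```

Usage would then be $text = fetch_page("http://rvb.roosterteeth.com/archive/episode.php?id=244"); followed by the same preg_match call, after checking that $text is not null.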

HaLo2FrEeEk · March 15, 2007

OK, I have a library called Snoopy that I can use to get the source code from the website, but I still need to know how to parse it; the preg_match I have been using doesn't work. I need some help, people, please.

per1os · March 15, 2007

I said regex because preg_match() uses regular expressions. A lot of php coders are not very good at regex (myself included). I can do it, but it is always a guessing game for me.

HaLo2FrEeEk · March 15, 2007

I got it, I used this code:

<?php
include('../snoopy.php');
$snoopy = new Snoopy;
// Only parse when the fetch succeeds; otherwise $text would be undefined.
if ($snoopy->fetch("http://rvb.roosterteeth.com/archive/episode.php?id=244")) {
    $text = $snoopy->results;
    // (.+?) is non-greedy: it stops at the first place the rest of the pattern can match.
    preg_match("|http://files\.redvsblue\.com/RvB05/5x(.+?)/fl4sh/(.+?)\.flv|", $text, $url_arr);
    preg_match("|episodeNum=(.+?)&|", $text, $episodenum_arr);
    preg_match("|episode=(.+?)&|", $text, $episode_arr);
    $url = $url_arr[0];                  // index 0 is the whole matched url
    $episodenum = $episodenum_arr[1];    // index 1 is the first capture group
    $episode = ucwords($episode_arr[1]);
    echo "Download Red vs. Blue, " . $episodenum . ": " . $episode . ", in High Res flash video format: <a href=\"" . $url . "\">Here</a>.";
}
?>

Snoopy is a free library that fetches the page and stores it in a variable ($snoopy->results) without actually printing out the page. And here is the result:

http://claninfectionist.com/misc/testing/curl.php

I will change it now so that a person can put in their own url and it will get the link for them. The reason for this is that Rooster Teeth uses flash video files to stream their videos over the net, but also allows non-subscribers to download low-res versions in wmv. I found out that these flvs are higher res and better quality, so I began downloading them the hard way: look at the page source, then use a script I made that builds a link for you. It worked, but it was slow. Now I can put in the url of the page itself and this code will get the flv url and make a link for me. Thank you to everyone who helped me, I appreciate it.

I also figured out regular expression statements, I think, to a degree, so I could also get the episode number and episode name from this page as well as the url.
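One thing the working code changes compared with the earlier attempts is the switch from (.*) to (.+?), that is, from a greedy to a non-greedy capture. The sketch below shows the difference. The $sample string is made up in the style the episode= pattern looks for; it is an assumption, not the real page source.

```php
<?php
// Hypothetical input: a made-up query string, not taken from the real page.
$sample = 'episodeNum=91&episode=distract&quality=high&';

// Greedy: (.*) grabs as much as it can, so the match runs to the LAST ampersand.
preg_match("|episode=(.*)&|", $sample, $greedy);
// Non-greedy: (.+?) grabs as little as it can, so the match stops at the FIRST ampersand.
preg_match("|episode=(.+?)&|", $sample, $lazy);

echo $greedy[1], "\n"; // distract&quality=high
echo $lazy[1], "\n";   // distract
```

On a full HTML page the greedy form can swallow everything up to the last matching character on the page, so the non-greedy form is usually the safer choice for pulling out one value.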
