HaLo2FrEeEk Posted March 14, 2007
Hey, I want to retrieve the source code of a web page and parse a single link out of it. The link points to an .flv file in this format:
http://files.redvsblue.com/RvB05/5x(*)/fl4sh/(*).flv
where (*) is a wildcard. Here is one page I will be parsing:
http://rvb.roosterteeth.com/archive/episode.php?id=244
Look in the source code and you will see the URL there. I want to download that .flv file without having to search through the code to find the URL. Here is the code I have so far to get the page using cURL:
$ch = curl_init("http://rvb.roosterteeth.com/archive/episode.php?id=244");
$text = curl_exec($ch);
curl_close($ch);
And here is what I have for getting the link (it doesn't work; the error follows the code):
preg_match("http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv",$text,$matches);
print_r($matches);
Error:
Warning: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash in /home/.hortense/halo2freeek/claninfectionist.com/misc/testing/curl.php on line 5
Can anyone help me, please?
btherl Posted March 14, 2007
Perl-compatible regular expressions (the preg_* functions) need a delimiter around the pattern, like this:
preg_match('/a/', $str, $matches);
So try:
preg_match("/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/",$text,$matches);
Or you can change the delimiter to avoid all that escaping:
preg_match("|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|",$text,$matches);
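A minimal sketch of the delimiter rule described above. The sample string is an assumption built from the example URL the asker gives in this thread, and the variable names are made up for illustration. It also shows preg_quote(), which is one way to escape the literal part of a pattern instead of doing it by hand.

```php
<?php
// Sample input (an assumption): one line of markup containing the example URL.
$text = 'src="http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv"';

// With no delimiter, PHP takes the leading "h" as the delimiter and rejects it:
// preg_match("http:\/\/files...", $text, $m); // Warning: Delimiter must not be alphanumeric or backslash

// "/" as the delimiter: every literal slash inside the pattern must be escaped.
$withSlash = '/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/';

// "|" as the delimiter: slashes can be written as they are.
$withPipe = '|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|';

// preg_quote() escapes regex metacharacters in a literal string; the second
// argument names the delimiter so that it would be escaped too.
$prefix = preg_quote('http://files.redvsblue.com/RvB05/5x', '|');
$built  = '|' . $prefix . '(.*)/fl4sh/(.*)\.flv|';

preg_match($withSlash, $text, $a);
preg_match($withPipe, $text, $b);
preg_match($built, $text, $c);

print_r($a); // index 0 is the whole URL, 1 and 2 are the two captured parts
```

All three patterns should capture the same groups from this sample; which delimiter to use is mostly a matter of which characters the pattern itself contains.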
HaLo2FrEeEk Posted March 14, 2007
OK, let me try it. My server is being dumb right now, so I can't test it this minute, but I will try it as soon as I get a chance.
HaLo2FrEeEk Posted March 14, 2007
This returns an empty array: no error, but nothing in the array. Here is an example of a link that I will be pulling out:
http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv
And these are the parts that change with every link:
http://files.redvsblue.com/RvB05/5x(*)/fl4sh/RvB(*).flv
But when I use this code to parse it:
preg_match("|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|",$text,$matches);
Or even this:
preg_match("|http://(.*).flv|",$text,$matches);
I get an empty array. What am I doing wrong?
HaLo2FrEeEk Posted March 14, 2007
Bump!
per1os Posted March 14, 2007
Wouldn't this be better placed in the Regex Forum?
HaLo2FrEeEk Posted March 15, 2007
Why? It's not JavaScript. Of course, I don't know anything about regex, but this is all PHP, so I figured it would go in the PHP section. And I still can't get it to work. I've tried a lot of different things. Here is my code:
<?php
$ch = curl_init("http://rvb.roosterteeth.com/archive/episode.php?id=244");
$text = curl_exec($ch);
curl_close($ch);
preg_match("/http:\/\/files\.redvsblue\.com\/RvB05\/5x(.*)\/fl4sh\/(.*)\.flv/",$text,$matches);
print_r($matches);
?>
I also don't want the source of the fetched page to be printed on my page. Is there a way to get it and assign it to a variable without printing it out? (It's the curl_exec that prints it, but I can't find an alternative.)
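On the question of capturing the page without printing it: cURL has an option for this, CURLOPT_RETURNTRANSFER. Without it, curl_exec() echoes the response and returns true on success, which is most likely why the earlier attempts produced an empty array (the pattern was being matched against true rather than the page source). The sketch below is not from the thread; fetch_page() is a hypothetical helper name, and following redirects is an assumption about what the site might need.

```php
<?php
// Hypothetical helper: fetch a URL and return the body as a string,
// or null if the transfer fails.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    // Return the response from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Assumption: follow redirects in case the page has moved.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    // The handle is released when $ch goes out of scope.
    return $body === false ? null : $body;
}

// What the posted code effectively did: curl_exec() returned true, so the
// regex was run against the string "1" and found nothing.
$matched = preg_match('|http://(.*)\.flv|', (string) true, $m);
var_dump($matched, $m); // no match, empty array

// Usage (needs network access, so it is left commented out):
// $text = fetch_page("http://rvb.roosterteeth.com/archive/episode.php?id=244");
```

With the page held in $text as a string, the preg_match() calls from the earlier posts have real input to search.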
HaLo2FrEeEk Posted March 15, 2007
OK, I have a library called Snoopy that I can use to get the source code from the website, but I still need to know how to parse it. The preg_match I have been using doesn't work. I need some help, people, please.
per1os Posted March 15, 2007
I said Regex because preg_match() uses regular expressions. A lot of PHP coders (myself included) are not very good at regex. I can do it, but it is always a guessing game for me.
HaLo2FrEeEk Posted March 15, 2007
I got it. I used this code:
<?php
include('../snoopy.php');
$snoopy = new Snoopy;
if($snoopy->fetch("http://rvb.roosterteeth.com/archive/episode.php?id=244"))
    $text = ($snoopy->results);
preg_match("|http://files\.redvsblue\.com/RvB05/5x(.+?)/fl4sh/(.+?)\.flv|",$text, $url_arr);
preg_match("|episodeNum=(.+?)&|", $text, $episodenum_arr);
preg_match("|episode=(.+?)&|", $text, $episode_arr);
$url = $url_arr[0];
$episodenum = $episodenum_arr[1];
$episode = ucwords($episode_arr[1]);
echo "Download Red vs. Blue, " . $episodenum . ": " . $episode . ", in High Res flash video format: <a href=\"" . $url . "\">Here</a>.";
?>
Snoopy is a free library that uses CURL to store the page in a variable without actually printing out the page. And here is the result:
http://claninfectionist.com/misc/testing/curl.php
I will change it now so that a person can put in their own URL and it will get the link for them. The reason for all this is that Rooster Teeth streams its videos over the net as Flash video files, but only lets non-subscribers download low-res versions in WMV. I found out that the .flv files are higher resolution and better quality, so I began downloading them the hard way: look at the page source, then use a script I made to build a link from it. That worked, but it was slow. Now I can put in the URL of the page itself and this code will find the .flv URL and make a link for me.
Thank you to everyone who helped me, I appreciate it. I also figured out regular expressions, I think, to a degree, so I could get the episode number and episode name from this page as well as the URL.
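One detail in the working code is the switch from (.*) to (.+?). The ? makes the quantifier lazy, so each group stops at the first place the rest of the pattern can match instead of the last. The sketch below shows the difference; the page fragment is made up for illustration (the real markup on the episode page may differ), and only the query-string names episodeNum and episode are taken from the code above.

```php
<?php
// Hypothetical page fragment with two .flv links on a single line.
$text = '<embed src="player.swf?episodeNum=91&episode=distract&x=1" />'
      . '<a href="http://files.redvsblue.com/RvB05/5x91distract/fl4sh/RvB91_009.flv">flv</a>'
      . '<a href="http://files.redvsblue.com/RvB05/5x92other/fl4sh/RvB92_001.flv">next</a>';

// Greedy groups run to the LAST "/fl4sh/" and ".flv" on the line.
$greedy = '|http://files\.redvsblue\.com/RvB05/5x(.*)/fl4sh/(.*)\.flv|';
// Lazy groups stop at the FIRST "/fl4sh/" and ".flv".
$lazy   = '|http://files\.redvsblue\.com/RvB05/5x(.+?)/fl4sh/(.+?)\.flv|';

preg_match($greedy, $text, $g);
preg_match($lazy, $text, $l);

echo strlen($g[0]), " characters matched greedily\n"; // spans both links
echo $l[0], "\n"; // only the first link

// The same lazy style pulls the episode number and name out of the query string.
preg_match('|episodeNum=(.+?)&|', $text, $num);
preg_match('|episode=(.+?)&|', $text, $name);
echo $num[1], ': ', ucwords($name[1]), "\n";
```

If the values are echoed into HTML as in the post above, passing them through htmlspecialchars() first is a reasonable precaution, since they come from a page the script does not control.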