
[SOLVED] preg match url


djtozz


Hi,

 

I'm creating a crawler for megaupload.com download links.

 

sample link:

http://www.megaupload.com/?d=SFMTFBRV

 

My current pattern is wrong: I'm only getting part of the URL, 'http://www.megaupload.com/?d'.

 

get_urls_by_kwd("\"megaupload.com/?d=\" ".$row[1],"/megaupload\.com\/\?(\d+)/");

 

Can somebody advise me on the correct pattern?

Thanks


'~(?:http://)?(?:www\.)?megaupload\.com/\?d=[0-9a-z]{8}~i'

 

Assuming the ID consists of a-z, A-Z and/or 0-9, and that it's always 8 in length.
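
A minimal sketch of how that pattern could be used with preg_match_all(). The capture group around the ID and the sample text are additions for illustration, and the 8-character alphanumeric ID is the same assumption as above. For comparison, the pattern in the question expects digits (\d+) directly after the "?", while the sample link has "d=" followed by letters.

```php
<?php
// Pattern from the reply above, with a capture group added around the ID.
// Assumption: IDs are exactly 8 alphanumeric characters.
$pattern = '~(?:http://)?(?:www\.)?megaupload\.com/\?d=([0-9a-z]{8})~i';

// Hypothetical sample text, for illustration only.
$text = 'Get it at http://www.megaupload.com/?d=SFMTFBRV or megaupload.com/?d=abc12345 today.';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matched URLs, $matches[1] only the IDs.
print_r($matches[0]);
print_r($matches[1]);
```

Because "~" is the delimiter here, the slashes inside the pattern need no escaping, and the "i" modifier after the closing "~" makes the match case-insensitive.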

 

Thanks for the help!

I think I made a small typo in line 3 while integrating it into my script, because I'm getting the following error:

 

(Warning:  preg_match_all() [function.preg-match-all]: Unknown modifier '\' )

 

The others are working fine!

 

get_urls_by_kwd("\"rapidshare.com/files\" ".$row[1],"/rapidshare\.com\/files\/(\d+)\/([^\'^\"^\s^>^<^\\^\/]+)/",1);
get_urls_by_kwd("\"badongo.com/file\" ".$row[1],"/badongo\.com\/file\/(\d+)/",2);
get_urls_by_kwd("\"megaupload.com/?d=\" ".$row[1],"/megaupload\.com/\?d=[0-9a-z]{8}~i/",3);
get_urls_by_kwd("\"sendspace.com/file\" ".$row[1],"/sendspace\.com\/file\/(\w+)/",4);
get_urls_by_kwd("\"4shared.com/file\" ".$row[1],"/4shared\.com\/file\/(\d+)\/(\w+)\/([^\'^\"^\s^>^<^\\^\/]+)/",5);
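
A likely cause of the warning, sketched under the assumption that get_urls_by_kwd() passes the pattern straight to preg_match_all(): the megaupload pattern in line 3 uses "/" as its delimiter but contains an unescaped "/" after "megaupload\.com", so PCRE treats everything after that slash as modifiers, and the first of those is "\". The "~i" copied from the suggested pattern also ends up inside the pattern instead of acting as delimiter plus modifier. Two possible fixes are shown below; the sample text is hypothetical.

```php
<?php
// Hypothetical sample input, for illustration only.
$text = 'see http://www.megaupload.com/?d=SFMTFBRV';

// Option 1: keep "/" as the delimiter, escape the inner slash,
// and put the "i" modifier after the closing delimiter.
$slashPattern = '/megaupload\.com\/\?d=[0-9a-z]{8}/i';

// Option 2: use "~" as the delimiter so slashes need no escaping.
$tildePattern = '~megaupload\.com/\?d=[0-9a-z]{8}~i';

preg_match_all($slashPattern, $text, $a);
preg_match_all($tildePattern, $text, $b);

// Both variants should find the same link.
print_r($a[0]);
print_r($b[0]);
```

Option 1 matches the style of the other four calls, so it is probably the smaller change to the existing script.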

 

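$html = file_get_contents('http://www.megaupload.com');

// Capture the ID that follows "?d=", e.g. d=SFMTFBRV.
// Assumes the ID is exactly 8 alphanumeric characters.
preg_match_all('~\?d\s?=\s?([0-9a-z]{8})~i', $html, $matches);

// Rebuild each full download link from the captured ID.
foreach ($matches[1] as $id) {
    echo "http://www.megaupload.com/?d={$id}<br>";
}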

 

Thanks for the feedback, but I'm not sure how to integrate it into my current code.

Since the code already works for the other file-sharing sites, I guess I only need to change the pattern in line 3:

 

get_urls_by_kwd("\"rapidshare.com/files\" ".$row[1],"/rapidshare\.com\/files\/(\d+)\/([^\'^\"^\s^>^<^\\^\/]+)/",1);
get_urls_by_kwd("\"badongo.com/file\" ".$row[1],"/badongo\.com\/file\/(\d+)/",2);
get_urls_by_kwd("\"megaupload.com/?d=\" ".$row[1],"/megaupload\.com/\?d=[0-9a-z]{8}~i/",3);
get_urls_by_kwd("\"sendspace.com/file\" ".$row[1],"/sendspace\.com\/file\/(\w+)/",4);
get_urls_by_kwd("\"4shared.com/file\" ".$row[1],"/4shared\.com\/file\/(\d+)\/(\w+)\/([^\'^\"^\s^>^<^\\^\/]+)/",5);

 

I'm not sure how to do that.

Thanks

 
