djtozz Posted September 30, 2009
Hi, I'm creating a crawler for megaupload.com download links.
Sample link: http://www.megaupload.com/?d=SFMTFBRV
Currently I'm not using the correct pattern; I'm only getting part of the URL: 'http://www.megaupload.com/?d'
get_urls_by_kwd("\"megaupload.com/?d=\" ".$row[1],"/megaupload\.com\/\?(\d+)/");
Can somebody advise me on the correct pattern to use? Thanks
Link to comment https://forums.phpfreaks.com/topic/176042-solved-preg-match-url/
djtozz Posted September 30, 2009 Author
Anybody? Thanks
thebadbad Posted September 30, 2009
'~(?:http://)?(?:www\.)?megaupload\.com/\?d=[0-9a-z]{8}~i'
Assuming the ID consists of a-z, A-Z and/or 0-9, and that it's always 8 in length.
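As a minimal sketch of how this pattern behaves, the snippet below runs it with preg_match_all() over a made-up HTML fragment. The $html sample, the second ID (ABCD1234) and the too-short ID are illustrative assumptions; only the first link comes from the thread.

```php
<?php
// Pattern from the post above: optional scheme, optional "www.", then an 8-character ID.
$pattern = '~(?:http://)?(?:www\.)?megaupload\.com/\?d=[0-9a-z]{8}~i';

// Hypothetical input; only the first URL appears in the thread.
$html = '<a href="http://www.megaupload.com/?d=SFMTFBRV">one</a> '
      . '<a href="megaupload.com/?d=ABCD1234">two</a> '
      . '<a href="http://www.megaupload.com/?d=short">too short</a>';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches in order of appearance.
// The third link is skipped because its ID has fewer than 8 characters.
foreach ($matches[0] as $url) {
    echo $url, "\n";
}
```

Because the scheme and the "www." prefix are both optional groups, the full match contains only as much of the URL as is actually present in the input.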
dreamwest Posted September 30, 2009
$html = file_get_contents('http://www.megaupload.com');
//d=SFMTFBRV
preg_match_all('~d\s?=\s?(.*?)~is', $html, $matches);
foreach ($matches[1] as $link) {
    $link = trim($link);
    echo "http://www.megaupload.com/?{$link}<br>";
}
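One caveat with the snippet above: a lazy group such as (.*?) at the very end of a pattern is satisfied by zero characters, so each capture comes back empty, and the echoed URL also leaves out the d= part. Below is a minimal sketch of a variant that captures the ID explicitly. It assumes, as thebadbad's reply does, that IDs are 8 alphanumeric characters, and the inline $html string is a hypothetical stand-in for the fetched page.

```php
<?php
// Hypothetical stand-in for the page content fetched with file_get_contents().
$html = 'See http://www.megaupload.com/?d=SFMTFBRV for the file.';

// A trailing lazy group matches the empty string, so nothing useful is captured.
preg_match_all('~d\s?=\s?(.*?)~is', $html, $lazy);

// Assumption: IDs are exactly 8 alphanumeric characters.
preg_match_all('~\?d=([0-9a-z]{8})~i', $html, $fixed);

// Rebuild full links from the captured IDs, keeping the "d=" parameter name.
$links = [];
foreach ($fixed[1] as $id) {
    $links[] = "http://www.megaupload.com/?d={$id}";
}
echo implode("\n", $links), "\n";
```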
djtozz Posted September 30, 2009 Author
Quote:
'~(?:http://)?(?:www\.)?megaupload\.com/\?d=[0-9a-z]{8}~i'
Assuming the ID consists of a-z, A-Z and/or 0-9, and that it's always 8 in length.

Thanks for the help! I think I made a little typo in line 3 while integrating it into my script, because I'm getting the following error:
Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '\'
The others are working fine!
get_urls_by_kwd("\"rapidshare.com/files\" ".$row[1],"/rapidshare\.com\/files\/(\d+)\/([^\'^\"^\s^>^<^\\^\/]+)/",1);
get_urls_by_kwd("\"badongo.com/file\" ".$row[1],"/badongo\.com\/file\/(\d+)/",2);
get_urls_by_kwd("\"megaupload.com/?d=\" ".$row[1],"/megaupload\.com/\?d=[0-9a-z]{8}~i/",3);
get_urls_by_kwd("\"sendspace.com/file\" ".$row[1],"/sendspace\.com\/file\/(\w+)/",4);
get_urls_by_kwd("\"4shared.com/file\" ".$row[1],"/4shared\.com\/file\/(\d+)\/(\w+)\/([^\'^\"^\s^>^<^\\^\/]+)/",5);
djtozz Posted September 30, 2009 Author
Quote:
$html = file_get_contents('http://www.megaupload.com');
//d=SFMTFBRV
preg_match_all('~d\s?=\s?(.*?)~is', $html, $matches);
foreach ($matches[1] as $link) {
    $link = trim($link);
    echo "http://www.megaupload.com/?{$link}<br>";
}

Thanks for the feedback, but I'm not sure how to integrate it into my current code. Since the code is already working for the other file sharing sites, I guess I only need to change the pattern in line 3:
get_urls_by_kwd("\"rapidshare.com/files\" ".$row[1],"/rapidshare\.com\/files\/(\d+)\/([^\'^\"^\s^>^<^\\^\/]+)/",1);
get_urls_by_kwd("\"badongo.com/file\" ".$row[1],"/badongo\.com\/file\/(\d+)/",2);
get_urls_by_kwd("\"megaupload.com/?d=\" ".$row[1],"/megaupload\.com/\?d=[0-9a-z]{8}~i/",3);
get_urls_by_kwd("\"sendspace.com/file\" ".$row[1],"/sendspace\.com\/file\/(\w+)/",4);
get_urls_by_kwd("\"4shared.com/file\" ".$row[1],"/4shared\.com\/file\/(\d+)\/(\w+)\/([^\'^\"^\s^>^<^\\^\/]+)/",5);
I'm not sure how to do that, though. Thanks
thebadbad Posted October 1, 2009
get_urls_by_kwd("\"megaupload.com/?d=\" ".$row[1],"/megaupload\.com\/\?d=[0-9a-z]{8}/i",3);
I was using ~ as the pattern delimiter, and you're using /. Fixed that. The i modifier makes the search case-insensitive.
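To illustrate the delimiter point, here is a small sketch comparing the two spellings. With / as the delimiter, every literal slash inside the pattern has to be escaped; otherwise PCRE treats the first unescaped / as the end of the pattern and reads whatever follows as modifiers, which is where the "Unknown modifier" warning comes from. The $text sample and the variable names are assumptions for illustration.

```php
<?php
$text = 'Mirror: http://www.MegaUpload.com/?d=sfmtfbrv';

// "/" as delimiter: the literal slash before "?" must be escaped as "\/".
$withSlashes = '/megaupload\.com\/\?d=[0-9a-z]{8}/i';

// "~" as delimiter: slashes inside the pattern need no escaping.
$withTildes = '~megaupload\.com/\?d=[0-9a-z]{8}~i';

preg_match($withSlashes, $text, $a);
preg_match($withTildes, $text, $b);

// Both spellings describe the same regex, so the matches are identical.
echo $a[0], "\n", $b[0], "\n";

// The mixed-up version from the thread: the pattern ends at the first
// unescaped "/", so the following "\" is read as a modifier and the call fails.
// The @ operator only hides the warning for this demonstration.
$broken = @preg_match('/megaupload\.com/\?d=[0-9a-z]{8}~i/', $text);
var_dump($broken);
```

Which delimiter to pick is mostly a matter of taste; a character that does not occur in the pattern itself saves the escaping.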
redarrow Posted October 1, 2009
I think {0,10} would be better for the end... and ~ as pattern delimiters (why was it changed?)
thebadbad Posted October 1, 2009
Quote:
I think {0,10} would be better for the end... and ~ as pattern delimiters (why was it changed?)

Why? Megaupload IDs are always 8 chars long AFAIK. I swapped to slashes since the OP is using them in the rest of the patterns (to be less confusing).
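The trade-off between a fixed {8} and a looser {0,10} can be sketched as follows. Neither quantifier is checked against Megaupload's actual ID format here, and the sample strings other than the thread's own ID are made up for illustration. Note that {0,10} also accepts a link with no ID at all.

```php
<?php
// Capture group added so the matched ID can be inspected.
$strict = '/megaupload\.com\/\?d=([0-9a-z]{8})/i';
$loose  = '/megaupload\.com\/\?d=([0-9a-z]{0,10})/i';

// Hypothetical inputs: a full 8-character ID, no ID, and a 3-character ID.
$samples = [
    'megaupload.com/?d=SFMTFBRV',
    'megaupload.com/?d=',
    'megaupload.com/?d=ABC',
];

$results = [];
foreach ($samples as $s) {
    $results[$s] = [
        'strict' => preg_match($strict, $s) === 1,
        'loose'  => preg_match($loose, $s) === 1,
    ];
    printf(
        "%-28s strict=%s loose=%s\n",
        $s,
        $results[$s]['strict'] ? 'yes' : 'no',
        $results[$s]['loose'] ? 'yes' : 'no'
    );
}
```

In this sketch the strict pattern only accepts the first sample, while the loose one accepts all three, including the empty ID.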
djtozz Posted October 1, 2009 Author
Thank you guys for the help! It works like a charm!
Archived
This topic is now archived and is closed to further replies.