djtozz Posted September 30, 2009

I have a link checker script that checks whether files hosted on sendspace are still available. If they are, it grabs the "filename" and "size" and writes them down. I made some modifications so I can use it for files hosted on hotfile.com.

But... the link checker works, however I'm not receiving "Filename" and "Filesize", so I guess I'm grabbing the wrong piece of code from the download pages. I'm not very good with the preg_match function, so I could use some help. It's a pretty small script, so it can't be that hard :-)

So basically I would just like to get the filename and file size from those pages.

```php
}if($row[2]==2) // Hotfile check
{
    $row[1]="http://www.".$row[1];
    //print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
    $index=getpage($row[1]);
    preg_match_all('/(<span .*>)(.*)(<\/span>)/',$index,$match);
    $desc=array();
    for($i=0;$i<3;$i++)
        if(isset($match[1][$i])) $desc[$match[1][$i]]=trim($match[2][$i]);
    unset($match);
    if(strpos($index,"Downloading")===false || !$desc)
    {
        mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
        print "bad link\n";
        logstr("log-c.txt","bad link\n");
    }
    else
    {
        $words=trim($desc['Name']);
        $words=preg_split("/[_\.\-\s]/",$words);
        $lastword=array_pop($words);
        $words=implode(" ",$words);
        $words=preg_replace("/\s{2,}/"," ",$words);
        $caption=mysql_real_escape_string($words);
        unset($words);
        print "$caption :: ".$desc['Size']."\n";
        logstr("log-c.txt","$caption :: ".$desc['Size']."\n");
        mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$desc['Size']."',`caption`='$caption' WHERE `id`=".$row[0]);
        unset($desc);
        if(mysql_errno()) print mysql_error()."\n";
    }
}
```

Sample download link: http://hotfile.com/dl/13717843/4c6dfad/cl_backup.part3.rar.html

Thank you!

Thread: https://forums.phpfreaks.com/topic/176074-solved-help-with-preg_match/
djtozz Posted September 30, 2009

Anybody, please? The link check part works; I just want to fetch the filename ($caption) from the page. Current code:

```php
$row[1]="http://www.".$row[1];
//print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
$index=getpage($row[1]);
preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
$caption=0;
if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));
if(strpos($index,"Downloading")===false || !$caption)
{
    print "bad link\n";
    logstr("log-c.txt","bad link\n");
}
else
{
    unset($words);
    print "$caption";
```
leafer Posted September 30, 2009

Is this the part you're trying to match on the page:

Downloading cl_backup.part3.rar | 95.8Mb

from this code?

```html
<table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> <span class="size">| 95.8Mb</span></td></tr></table>
```
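For reference, this is what the pattern in djtozz's code actually captures when run against that snippet. A minimal sketch, assuming PHP 7 or later and using the quoted snippet as stand-in page content: with preg_match_all() and its default PREG_PATTERN_ORDER, $match[1] is an array holding the first group of every match, and the first group here is the literal prefix `<td>Downloading <b>`, not the filename. The filename lands in $match[2][0], and strip_tags(), which expects a string, cannot turn that array into it.

```php
<?php
// Stand-in for the downloaded page: the snippet quoted in the post above.
$index = '<table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> '
       . '<span class="size">| 95.8Mb</span></td></tr></table>';

// The pattern from djtozz's code: three capture groups.
preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/', $index, $match);

// With the default PREG_PATTERN_ORDER, $match[N] is an array holding group N
// of every match, so $match[1] is an array, not a string.
echo gettype($match[1]), "\n"; // prints: array
// Group 1 is the literal prefix, not the filename.
echo $match[1][0], "\n";       // prints: <td>Downloading <b>
// The filename is group 2 of the first match.
echo $match[2][0], "\n";       // prints: cl_backup.part3.rar
```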
djtozz Posted September 30, 2009

Thanks for your help. Yes, that's the part I would like to use, and if possible I would like to split the filename and file size into two variables.
leafer Posted October 1, 2009

For the double variable action you owe me a beer though. Here's one I whipped up quickly and tested using preg_match:

```php
$pattern = '/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/';
preg_match($pattern, $result, $output);
```

Result:

```
Array
(
    [0] => <table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> <span class="size">| 95.8Mb</span>
    [1] => cl_backup.part3.rar
    [2] => 95.8Mb
)
```

Then just grab $output[1] and $output[2] and put them wherever you want. As long as the pattern matches, that info will always be in those slots.

Honestly, I'm terrible at regex, but what I always do is grab the entire line first and work my way inward. The parentheses you see above make the regex put whatever they match into [1] and [2]. [0] will always be the entire match of that ugly pattern.
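A minimal runnable version of leafer's pattern against the snippet quoted earlier, assuming PHP 7 or later. The `=== 1` check is an addition not in the post: preg_match() returns 1 on a match and 0 otherwise, so testing it avoids reading $output[1] when a page has no "downloading" table.

```php
<?php
// Stand-in for the fetched page: the snippet quoted earlier in the thread.
$result = '<table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> '
        . '<span class="size">| 95.8Mb</span></td></tr></table>';

$pattern = '/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/';

// preg_match() returns 1 on a match, 0 on no match, false on error.
if (preg_match($pattern, $result, $output) === 1) {
    $filename = $output[1]; // first (...) group: the text between <b> and </b>
    $filesize = $output[2]; // second (...) group: the text after "| "
    echo "$filename :: $filesize\n"; // prints: cl_backup.part3.rar :: 95.8Mb
} else {
    echo "pattern did not match\n";
}
```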
djtozz Posted October 1, 2009

Thanks for the help, I think I owe you more than a beer already :-) Your pattern works like a charm, exactly what I needed, but I'm not sure how to integrate it into my current code. When I replace my original pattern with yours, I get the data... but the link check part doesn't work anymore. So I'm not sure where to paste it. The only two variables I need are $caption=$output[1] and $fsize=$output[2].

```php
$index=getpage($row[1]);
preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));
if(strpos($index,"Downloading")===false || !$desc)
{
    mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
    print "bad link\n";
    logstr("log-c.txt","bad link\n");
}
else
{
    print "$output[1] :: $output[2]\n";
    logstr("log-c.txt","$output[1] :: $output[2]\n");
    mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
    unset($desc);
    if(mysql_errno()) print mysql_error()."\n";
```
leafer Posted October 1, 2009

Do you want to check if the link exists:

<a href="http://hotfile.com/dl/13717843/4c6dfad/cl_backup.part3.rar.html?uploadid=13717843&fname=cl_backup.part3.rar.html&hash=4c6dfad&lang=it">Italian</a>

Or turn what you've got into a link to then begin a download, basically simulating a click on Regular Download?

http://hotfile.com/get/13717843/4ac4ddbd/24c4d38/cl_backup.part3.rar

If that's the case, you need to grab all of the form values:

```html
<table border="0" cellspacing=0 cellpadding=2 class=premtable2 style="margin: 0 auto 10px auto; width: 640px;">
<form style="margin:0;padding:0;" action="/dl/13717843/4c6dfad/cl_backup.part3.rar.html" method=post name=f>
<input type=hidden name=action value=capt>
<input type=hidden name=tm value=1254416003>
<input type=hidden name=tmhash value=0a193d19d3dec26a23c477237175da1de5be0a90>
<input type=hidden name=wait value=30>
<input type=hidden name=waithash value=baca488ee0ae179f444ad6f97b6d25ef0c4d5c22>
<tr>
<td style="width:267px;"> </td>
<td align=center style="width:188px; padding: 0;"><input type=button class=but value="HIGH SPEED DOWNLOAD" style="width:162px; margin: 0; height: 32px; padding: 5px 5px 6px 5px;" onclick="location='/premium.html?id=13717843'"></td>
<td align=center style="width:185px; padding: 0;"><input type=button class="but" value=" REGULAR DOWNLOAD " style="width:162px; margin: 0; height: 32px; padding: 5px 5px 6px 5px;" onclick="starttimer();"></td></tr>
</table>
```

Then craft a POST request and wait for the link to come out. The LiveHTTPHeaders add-on for Firefox will help you find the exact structure of the URL.
djtozz Posted October 1, 2009

I want to keep it simple: no download, just check whether a link exists, and if so, parse the filename and size and write them down.

As you can see in the current code, the script first checks whether the link is OK:

if(strpos($index,"Downloading")===false)

If not, it prints "bad link". If the link is OK (the else branch), I would like to parse the filename and file size (using your pattern) and write them to the text file, as you can see in my code.

The link checking part works fine; I'm getting the following output:

```
[1] hotfile.com/dl/6491715/ca36ee7/BURP_Kenshi.part02.rar.html
bad link
[2] hotfile.com/dl/10201361/1867cae/H.P.5.Order.of.Phoenix.2007.DvD.RiP.Hindi.By.Innov
bad link
[3] hotfile.com/dl/11825481/1c87a38/2009_-_Magical_Mystery_Tour_Stereo_Remaster.part2.rar.html
 ::
[4] hotfile.com/dl/10201433/68abce6/Harry.Potter.6.Half.Blood.Prince.2009.Hindi.By.Inn
bad link
[5] hotfile.com/dl/11629240/a9f4bec/Please_Please_Me-Mono_2009_Remastered.part1.rar.html
 ::
```

I'm just not getting the filename and file size from the good links. What I would like to see:

```
[1] hotfile.com/dl/6491715/ca36ee7/BURP_Kenshi.part02.rar.html
bad link
[2] hotfile.com/dl/10201361/1867cae/H.P.5.Order.of.Phoenix.2007.DvD.RiP.Hindi.By.Innov
bad link
[3] hotfile.com/dl/11825481/1c87a38/2009_-_Magical_Mystery_Tour_Stereo_Remaster.part2.rar.html
2009_-_Magical_Mystery_Tour_Stereo_Remaster.part2.rar :: 55MB
[4] hotfile.com/dl/10201433/68abce6/Harry.Potter.6.Half.Blood.Prince.2009.Hindi.By.Inn
bad link
[5] hotfile.com/dl/11629240/a9f4bec/Please_Please_Me-Mono_2009_Remastered.part1.rar.html
Please_Please_Me-Mono_2009_Remastered.part1.rar :: 75MB
```

Here is my full code:

```php
}if($row[2]==2) // Hotfile check
{
    $row[1]="http://www.".$row[1];
    $index=getpage($row[1]);
    preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
    if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));
    if(strpos($index,"Downloading")===false || !$desc)
    {
        mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
        print "bad link\n";
        logstr("log-c.txt","bad link\n");
    }
    else
    {
        print "$output[1] :: $output[2]\n";
        logstr("log-c.txt","$output[1] :: $output[2]\n");
        mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
        unset($desc);
        if(mysql_errno()) print mysql_error()."\n";
    }
}
```
leafer Posted October 1, 2009

In that case, add a curl routine for the good links that visits the link, grabs the output and parses the filename and size. Basically:

```php
}if($row[2]==2) // Hotfile check
{
    $row[1]="http://www.".$row[1];
    $index=getpage($row[1]);
    preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
    if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));
    if(strpos($index,"Downloading")===false || !$desc)
    {
        mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
        print "bad link\n";
        logstr("log-c.txt","bad link\n");
    }
    else
    {
        // 1) Curl the good link and return the output
        // 2) Then use my preg_match to grab the file name and size
        // 3) Then print "$filename $output1 $output2"
        // 4) Insert into the DB
        print "$output[1] :: $output[2]\n";
        logstr("log-c.txt","$output[1] :: $output[2]\n");
        mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
        unset($desc);
        if(mysql_errno()) print mysql_error()."\n";
    }
}
```

If you want, PM me the entire code you're using with all of the sensitive data removed (including the curl statement). Firing off a second curl statement should work, though. My method for that would be:

1) Pull links from the DB.
2) Send the curl statement.
3) If the link is not correct, return "bad link".
4) If the link is good, return the output and preg_match the necessary info.

Actually, you shouldn't need the second statement. Post the actual curl statement you're using and from there I can easily give you the routine.
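leafer's four steps can be sketched as a single pass where one page fetch serves as both the link check and the data source, which is his point that a second curl call shouldn't be needed. This is a sketch under assumptions (PHP 7.1+): parse_hotfile() and check_links() are hypothetical names, the fetcher is passed in as a callable (in the real script it would be getpage(), whose implementation isn't shown in the thread), and a page where the pattern does not match is treated as a bad link instead of testing for the word "Downloading".

```php
<?php
// Hypothetical helper: filename and size from a Hotfile page, or null if absent.
function parse_hotfile(string $html): ?array
{
    $pattern = '/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null; // no "downloading" table on the page
    }
    return ['caption' => $m[1], 'fsize' => $m[2]];
}

// Steps 1-4 from the post; $fetch stands in for getpage() or a curl wrapper.
function check_links(array $rows, callable $fetch): array
{
    $results = [];
    foreach ($rows as [$id, $url]) {                      // 1) rows pulled from the DB
        $html = $fetch('http://www.' . $url);             // 2) one request per link
        $info = is_string($html) ? parse_hotfile($html) : null;
        if ($info === null) {
            $results[$id] = 'bad link';                   // 3) bad link
        } else {                                          // 4) good link: regex result
            $results[$id] = $info['caption'] . ' :: ' . $info['fsize'];
        }
    }
    return $results;
}
```

In the real loop, the 'bad link' and good-link branches would run the existing mysql_query() and logstr() calls instead of filling an array.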
leafer Posted October 1, 2009

Here's a general routine I use for simple curl checks. I've removed the function wrapper.

```php
$url = "http://whatever.com";
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.2 en-US;rv:1.9.0.7)Gecko/2009021910 Firefox/3.0.7";
$ch = curl_init();
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $url);
$result = curl_exec($ch);
curl_close($ch);
$pattern = '/(?s)\<item\>.*?\<\/item\>/';
preg_match($pattern, $result, $output);
// ADD IN THE VARS TO GRAB EACH PORTION
// If $output is empty, it's a bad link. Otherwise put the vars into the appropriate places.
```

Instead of that pattern, put in the one I gave you and move the data into the appropriate vars.
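Wrapped back into a function, the curl options from that post might look like the sketch below, assuming PHP 7+ with the curl extension; fetch_page() is a hypothetical name. One difference is deliberate: CURLOPT_HEADER is left off, because with it set to 1 curl_exec() returns the HTTP response headers in front of the HTML. The regex would still find the table, but the returned string is then not just the page.

```php
<?php
// Hypothetical fetch_page(): leafer's curl options, returning the page body or null.
function fetch_page(string $url): ?string
{
    $agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.2 en-US;rv:1.9.0.7)Gecko/2009021910 Firefox/3.0.7';
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_FAILONERROR    => true,  // HTTP status >= 400 counts as a failure
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_RETURNTRANSFER => true,  // return the page instead of printing it
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_USERAGENT      => $agent,
        // CURLOPT_HEADER not set: the result is the body only
    ]);
    $result = curl_exec($ch); // string on success, false on failure
    // The handle is released when $ch goes out of scope at the end of the function.
    return $result === false ? null : $result; // null: treat as a bad link
}
```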
djtozz Posted October 2, 2009

Hey Leafer,

First... I want to give you a BIG THANK YOU! To be honest, I really appreciate your help! I don't know much about PHP and programming, but... I'm trying! I spent several hours over the past few days reading tutorials about preg_match etc. before answering your posts, but it's hard for me. I hope you understand that too!

Now, about my script. The first part is a crawler that collects links from the major file-sharing sites like rapidshare, sendspace, 4shared etc. and puts them all in a database (working fine).

The second part of the script checks whether the fetched links are still OK:

When bad link: "bad link" is written to the log file and the `checked` column in MySQL gets the value -1.
When good link: the script must catch the filename and file size from the download page, and the `checked` column gets the value 1.

The script works like a charm for rapidshare, sendspace, ... but hotfile.com was not supported. I made a copy of the sendspace check and made some modifications to make it hotfile.com compatible, and here's my problem: it can check whether a Hotfile link is valid or not, but I can't get the filename and file size to add them to my MySQL DB.

The pattern you made should do the job, I think; I'm just having problems integrating it with the link check part. I'll post my full code below, which should make things easier. So basically everything works, except that I don't retrieve the filename and file size for files on Hotfile.

The Hotfile check is the `if($row[2]==2)` block (lines 70 to 89 in my file).

```php
#!/usr/bin/php
<?php
chdir("/path_where_script_is_installed/search/cron/");
set_time_limit(1000);
ignore_user_abort(true);
include "./../functions.php";
include "./../config.php";

if(file_exists("run-c.flag"))
{
    exec("rm run-c.flag");
    sleep(1);
    if(file_exists("run-c.flag")) exit("can't clean run-c.flag");
    sleep(60);
}
exec("rm log-c.txt");
logstr("log-c.txt","[start]".date("Y-m-d H:i:s")."[/start] \n");
touch("run-c.flag");
print "<pre>\n";
//mysql_query("UPDATE `v2links` SET checked=0");
$query=mysql_query("SELECT `id`,`url`,`type` FROM `v2links` WHERE `checked`>=0 ORDER BY `lastcheck`,RAND() LIMIT 10000");
print mysql_error();
$counter=0;
while($row=mysql_fetch_row($query))
{
    if(!trim($row[1])) continue;
    // delay if type the same
    if($prevtype==$row[2]) sleep(mt_rand(1,2));
    $prevtype=$row[2];
    print "[".$counter++."] ".$row[1]."\n";
    logstr("log-c.txt","[$counter] ".$row[1]."\n");

    if($row[2]==1) // rapidshare check
    {
        $index=getpage($row[1]);
        if(strpos($index,"The file could not be found. Please check the download link.")===false && strpos($index,"Due to a violation of our terms of use, the file has been removed from the server.")===false)
        {
            preg_match("/<form action=\"([^\"]+)\" method=\"post\">/",$index,$match);
            //print $index;
            if($match[1])
            {
                $fpath=$match[1];
                $index=getpage($fpath,"dl.start=Free",$row[1]);
                preg_match("#<font[^>]*?>(.*?)<\/font>#",$index,$match);
                $fsize=0;
                if($match[1]) $fsize= substr(mysql_real_escape_string(strip_tags($match[1])), 2);
                print $fsize."\n";
                logstr("log-c.txt",$fsize."\n");
                mysql_query("UPDATE `v2links` SET `checked`='1',`fsize`='$fsize',`lastcheck`=NOW() WHERE `id`=".$row[0]);
                if(mysql_errno()) print mysql_error()."\n";
            }
            else
            {
                print "bad link\n";
                logstr("log-c.txt","bad link\n");
                mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
                if(mysql_errno()) print mysql_error()."\n";
            }
        }
        else
        {
            print "bad link\n";
            logstr("log-c.txt","bad link\n");
            mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
            if(mysql_errno()) print mysql_error()."\n";
        }
    }
    if($row[2]==2) // Hotfile check
    {
        $row[1]="http://www.".$row[1];
        $index=getpage($row[1]);
        preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
        if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));
        if(strpos($index,"Downloading")===false || !$desc)
        {
            mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
            print "bad link\n";
            logstr("log-c.txt","bad link\n");
        }
        else
        {
            print "$output[1] :: $output[2]\n";
            logstr("log-c.txt","$output[1] :: $output[2]\n");
            mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
            unset($desc);
            if(mysql_errno()) print mysql_error()."\n";
        }
    }
    if($row[2]==3) // sendspace check
    {
        $row[1]="http://www.".$row[1];
        //print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
        $index=getpage($row[1]);
        preg_match_all("/<b>(\w+)\:<\/b>([^<]+)<br>/",$index,$match);
        $desc=array();
        for($i=0;$i<3;$i++)
            if(isset($match[1][$i])) $desc[$match[1][$i]]=trim($match[2][$i]);
        unset($match);
        if(strpos($index,"The download link is located below")===false || !$desc)
        {
            mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
            print "bad link\n";
            logstr("log-c.txt","bad link\n");
        }
        else
        {
            $words=trim($desc['Name']);
            $words=preg_split("/[_\.\-\s]/",$words);
            $lastword=array_pop($words);
            $words=implode(" ",$words);
            $words=preg_replace("/\s{2,}/"," ",$words);
            $caption=mysql_real_escape_string($words);
            unset($words);
            print "$caption :: ".$desc['Size']."\n";
            logstr("log-c.txt","$caption :: ".$desc['Size']."\n");
            mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$desc['Size']."',`caption`='$caption' WHERE `id`=".$row[0]);
            unset($desc);
            if(mysql_errno()) print mysql_error()."\n";
        }
    }
    if($row[2]==4) // badongo check
    {
        $row[1]="http://www.".$row[1];
        //print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
        //print_r($row);
        $index=getpage($row[1]);
        preg_match_all("/<td> ([^<]+)<\/td>/",$index,$match);
        if(strpos($index,"This file has been deleted because it has been inactive for over 30 days")!==false || strpos($index,"This file has been removed due to copyright infrigment")!==false || strpos($index,"File deactivated!")!==false || !$match[1])
        {
            mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
            print "bad link\n";
            logstr("log-c.txt","bad link\n");
        }
        else
        {
            //print_r($match[1]);
            if(strlen($match[1][0])>strlen($match[1][3]))
            {
                $words=trim($match[1][0]);
                $words=preg_split("/[_\.\-\s]/",$words);
                $lastword=array_pop($words);
                if($lastword=="html") array_pop($words);
                $words=implode(" ",$words);
                $words=preg_replace("/\s{2,}/"," ",$words);
                $caption=mysql_real_escape_string($words);
                unset($words);
            }
            else
            {
                $words=$match[1][3];
                $words=preg_split("/[_\.\-\s]/",$words);
                $words=implode(" ",$words);
                $words=preg_replace("/\s{2,}/"," ",$words);
                $caption=mysql_real_escape_string($words);
                unset($words);
            }
            print "$caption :: ".$match[1][1]."\n";
            logstr("log-c.txt","$caption :: ".$match[1][1]."\n");
            mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$match[1][1]."',`caption`='$caption' WHERE `id`=".$row[0]);
            unset($match);
            if(mysql_errno()) print mysql_error()."\n";
        }
    }
    if($row[2]==5) // mediafire check
    {
        $row[1]="http://www.".$row[1];
        $index=getpage($row[1]);
        //print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
        preg_match("/You requested ([^\(]+)\(([^\)]+)\)<\/div>/",$index,$match);
        if(strpos($index," Please enter the reason for reporting this file:")===false || !$match[1])
        {
            mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
            print "bad link\n";
            logstr("log-c.txt","bad link\n");
        }
        else
        {
            $words=trim($match[1]);
            $words=preg_split("/[_\.\-\s]/",$words);
            $lastword=array_pop($words);
            $words=implode(" ",$words);
            $words=preg_replace("/\s{2,}/"," ",$words);
            $caption=mysql_real_escape_string($words);
            unset($words);
            print "$caption :: ".$match[2]."\n";
            logstr("log-c.txt","$caption :: ".$match[2]."\n");
            mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$match[2]."',`caption`='$caption' WHERE `id`=".$row[0]);
            unset($match);
            if(mysql_errno()) print mysql_error()."\n";
        }
    }
    if($row[2]==6) // 4shared check
    {
        $row[1]="http://www.".$row[1];
        $index=getpage($row[1]);
        preg_match("/Download ([^<]+)<\/div>/",$index,$match);
        if(strpos($index,"Kaspersky Anti-Virus")===false || !$match)
        {
            mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
            print "bad link\n";
            logstr("log-c.txt","bad link\n");
        }
        else
        {
            $words=trim($match[1]);
            $words=preg_split("/[_\.\-\s]/",$words);
            $lastword=array_pop($words);
            $words=implode(" ",$words);
            $words=preg_replace("/\s{2,}/"," ",$words);
            $caption=mysql_real_escape_string($words);
            unset($words);
            unset($match);
            preg_match("/<td>Size\:<\/td><td>([^<]+)<\/td>/",$index,$match);
            $fsize=$match[1];
            unset($match);
            print "$caption :: $fsize\n";
            logstr("log-c.txt","$caption :: $fsize\n");
            mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$fsize',`caption`='$caption' WHERE `id`=".$row[0]);
            if(mysql_errno()) print mysql_error()."\n";
        }
    }
    if(!file_exists("run-c.flag")) exit("run-c.flag was deleted\n");
}
print "</pre>\n";
exec("rm run-c.flag");
logstr("log-c.txt","[end]".date("Y-m-d H:i:s")."[/end] \n");
?>
```

I can give you FTP access if that would make things easier. Thanks again!
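The caption clean-up that the sendspace, mediafire and 4shared branches repeat inline (split on `_`, `.`, `-` and whitespace, drop the last piece, rejoin with spaces, collapse runs of spaces) could be pulled into one helper and reused for Hotfile as well. A sketch, assuming PHP 7+; make_caption() is a hypothetical name, and its result still needs escaping before it goes into SQL. The badongo branch differs slightly, since it also drops a trailing "html" piece.

```php
<?php
// Hypothetical make_caption(): the same steps as the inline $words code above.
function make_caption(string $filename): string
{
    $words = preg_split('/[_\.\-\s]/', trim($filename)); // split on _ . - and whitespace
    array_pop($words);                                   // drop the last piece (the extension)
    $words = implode(' ', $words);
    return preg_replace('/\s{2,}/', ' ', $words);        // collapse runs of whitespace
}

echo make_caption('cl_backup.part3.rar'), "\n";
// prints: cl backup part3
echo make_caption('2009_-_Magical_Mystery_Tour_Stereo_Remaster.part2.rar'), "\n";
// prints: 2009 Magical Mystery Tour Stereo Remaster part2
```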
leafer Posted October 2, 2009

The perfect word to describe regex is torture. It's an extremely powerful tool as long as you're willing to lose a bit of hair in the process.

Anyway, I'll assume the getpage() function is a curl call and you're passing the URL to it. I think I see your problem here, and it's a bit ironic. Remember I was asking you if you wanted to download the file? That's exactly what it's trying to do here, by grabbing the form code once the check succeeds:

```php
preg_match("/<form action=\"([^\"]+)\" method=\"post\">/",$index,$match);
//print $index;
```

See that? It's grabbing the form code when it should be going directly to the regex I gave you. So put my regex in that place, like so:

```php
$index=getpage($row[1]);
if(strpos($index,"The file could not be found. Please check the download link.")===false && strpos($index,"Due to a violation of our terms of use, the file has been removed from the server.")===false)
{
    preg_match('/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/',$index,$match);
    // HERE: grab $match[1] and $match[2], which are the values we spoke about before,
    // THEN insert them into the DB or output them to a txt file.
```

Erase all of this:

```php
//print $index;
if($match[1])
{
    $fpath=$match[1];
    $index=getpage($fpath,"dl.start=Free",$row[1]);
    preg_match("#<font[^>]*?>(.*?)<\/font>#",$index,$match);
    $fsize=0;
    if($match[1]) $fsize= substr(mysql_real_escape_string(strip_tags($match[1])), 2);
    print $fsize."\n";
    logstr("log-c.txt",$fsize."\n");
    mysql_query("UPDATE `v2links` SET `checked`='1',`fsize`='$fsize',`lastcheck`=NOW() WHERE `id`=".$row[0]);
    if(mysql_errno()) print mysql_error()."\n";
}
}
else
{
    print "bad link\n";
    logstr("log-c.txt","bad link\n");
    mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
    if(mysql_errno()) print mysql_error()."\n";
```

The reason I'm telling you to erase that part is that you mentioned you weren't interested in downloading anything, just grabbing the link, file size and filename. That's it. Besides, I wouldn't use that code anyhow and prefer a regex over it. Removing all of that code should make it work, especially since your first regex statement is NOT looking for a form but rather for the values you wanted.
djtozz Posted October 2, 2009

> The perfect word to describe regex is torture. It's an extremely powerful tool as long as you're willing to lose a bit of hair in the process.

I noticed! Except... I've probably already lost more than a bit.

Thanks again for the info. You probably mixed up the rapidshare check with the Hotfile one:

```php
preg_match("/<form action=\"([^\"]+)\" method=\"post\">/",$index,$match);
//print $index;
```

is only used for rapidshare in my code... but your explanation was exactly what I needed! I finally understand it and realize that I had started out wrong. I started from scratch again and it's working now :-) Hip hip!

```php
if($row[2]==2) // Hotfile Check
{
    $row[1]="http://www.".$row[1];
    $index=getpage($row[1]);
    print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
    if(strpos($index,"Downloading")===false) //check if page contains the word downloading if not = bad link
    {
        mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
        print "bad link\n";
        logstr("log-c.txt","bad link\n");
    }
    else
    {
        $pattern = '/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/';
        preg_match($pattern, $index, $output);
        $caption=$output[1];
        $fsize=$output[2];
        print "$caption :: $fsize\n";
        logstr("log-c.txt","$caption :: $fsize\n");
        mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$fsize',`caption`='$caption' WHERE `id`=".$row[0]);
        unset($match);
        if(mysql_errno()) print mysql_error()."\n";
    }
}
```

Works like a charm this way! Thanks again for your time, and if you have a PayPal email, please PM me. I'm going to pay you that BEER! Promised!
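One caveat with this final version: $caption and $fsize go into the SQL string unescaped, so a filename containing an apostrophe would break the UPDATE (the other branches run mysql_real_escape_string() first). Also, the mysql_* functions were removed in PHP 7.0. A sketch of the same two updates with PDO prepared statements, assuming an existing PDO connection $pdo to the database holding v2links; save_check() is a hypothetical name, and lastcheck is set from PHP's date() rather than NOW() so the query is not tied to one database driver.

```php
<?php
// Hypothetical save_check(): record a check result with a prepared statement,
// so quotes in a filename cannot break (or inject into) the query.
function save_check(PDO $pdo, int $id, ?string $caption, ?string $fsize): void
{
    $now = date('Y-m-d H:i:s'); // timestamp supplied by PHP instead of NOW()
    if ($caption === null) {    // bad link: mark as -1
        $stmt = $pdo->prepare('UPDATE v2links SET checked = -1, lastcheck = :now WHERE id = :id');
        $stmt->execute([':now' => $now, ':id' => $id]);
        return;
    }
    // Good link: values are bound as parameters, never spliced into the SQL text.
    $stmt = $pdo->prepare(
        'UPDATE v2links SET checked = 1, lastcheck = :now, fsize = :fsize, caption = :caption WHERE id = :id'
    );
    $stmt->execute([':now' => $now, ':fsize' => $fsize, ':caption' => $caption, ':id' => $id]);
}
```

In the Hotfile branch this would replace the two mysql_query() calls: save_check($pdo, (int)$row[0], $caption, $fsize) for a good link and save_check($pdo, (int)$row[0], null, null) for a bad one.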
leafer Posted October 2, 2009

Glad to hear it worked out for you. As far as the beer goes, I was only kidding. As a fellow noob I enjoy giving back whenever I can. Have a good one.