[SOLVED] Help with Preg_match


djtozz

I have a link checker script that checks whether files hosted on sendspace are still available.

If a file is available, the script grabs its filename and size and records them.

I made some modifications so I can use it for files hosted on hotfile.com.

The link check itself works, but I'm not getting the filename or the file size.

So I guess I'm just grabbing the wrong piece of code from the download pages.

I'm not very good with the preg_match function, so I could use some help.

It's a pretty small script, so it can't be that hard :-)

Basically, I just want to get the filename and file size from those pages.

 

}if($row[2]==2) // Hotfile check
{
	$row[1]="http://www.".$row[1];
	//print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";

	$index=getpage($row[1]);
	preg_match_all('/(<span .*>)(.*)(<\/span>)/',$index,$match);
               $desc=array();
	for($i=0;$i<3;$i++) if(isset($match[1][$i])) $desc[$match[1][$i]]=trim($match[2][$i]);
	unset($match);
	if(strpos($index,"Downloading")===false || !$desc)
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		$words=trim($desc['Name']);
		$words=preg_split("/[_\.\-\s]/",$words);
		$lastword=array_pop($words);
		$words=implode(" ",$words);
		$words=preg_replace("/\s{2,}/"," ",$words);
		$caption=mysql_real_escape_string($words);
		unset($words);
		print "$caption :: ".$desc['Size']."\n";
		logstr("log-c.txt","$caption :: ".$desc['Size']."\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$desc['Size']."',`caption`='$caption' WHERE `id`=".$row[0]);
		unset($desc);
		if(mysql_errno()) print mysql_error()."\n";
	}
}

 

Sample download link:

http://hotfile.com/dl/13717843/4c6dfad/cl_backup.part3.rar.html

 

Thank you!

 

Link to comment
Share on other sites

Anybody please?

 

The link check part works; I just want to fetch the filename ($caption) from the page. Current code:

		$row[1]="http://www.".$row[1];
	//print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";

	$index=getpage($row[1]);
	preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
                $caption=0;
	if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));

	if(strpos($index,"Downloading")===false || !$caption)
	{
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{

	unset($words);
        print "$caption";

Link to comment
Share on other sites

Is this the part you're trying to match on the page: Downloading cl_backup.part3.rar | 95.8Mb

from this code?

<table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> <span class="size">| 95.8Mb</span></td></tr></table>

Link to comment
Share on other sites

Thanks for your help.

Yes, that's the part I would like to use, and if possible I would like to split the filename and file size into two variables.

Link to comment
Share on other sites

For the double variable action you owe me a beer though ;)

 

Here's one I whipped up quick and tested using preg_match:

 

$pattern = '/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/';

preg_match($pattern, $result, $output);

 

Array
(
    [0] => <table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> <span class="size">| 95.8Mb</span>
    [1] => cl_backup.part3.rar
    [2] => 95.8Mb
)

 

Then just grab $output[1] and $output[2] and put them wherever you want. That info will always be in those slots.

 

Honestly I'm terrible at regex, but what I always try is to match the entire line first and work my way inward. The parentheses in the pattern are capture groups: whatever they match ends up in the [1] and [2] slots you see above. Slot [0] always holds the entire match for that ugly pattern I wrote.
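
To make the capture-group idea concrete, here is a minimal sketch that runs the pattern from this post against the sample line quoted earlier in the thread. The $html string is just that sample pasted in as a literal, not a live page, and the variable names $filename and $filesize are only illustrative.

```php
<?php
// Sample markup copied from the hotfile page quoted earlier in the thread.
$html = '<table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> '
      . '<span class="size">| 95.8Mb</span></td></tr></table>';

// Same pattern as in the post; each (...) pair is one capture group.
$pattern = '/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/';

// preg_match returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $html, $output) === 1) {
    $filename = $output[1]; // first group: text between <b> and </b>
    $filesize = $output[2]; // second group: text after "| " inside the size span
    print "$filename :: $filesize\n";
} else {
    print "no match\n";
}
```

Checking the return value of preg_match before touching $output[1] and $output[2] avoids undefined-index notices on pages where the download table is missing.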

Link to comment
Share on other sites

Thanks for the help, I think I already owe you more than a beer :-)

Your pattern works like a charm, exactly what I needed,

but I'm not sure how to integrate it into my current code. When I replace my original pattern with yours, I get the data... but the link check part doesn't work anymore :(

So I'm not sure where to paste it...

The only 2 variables I need are $caption=$output[1] and $fsize=$output[2]

 

$index=getpage($row[1]);
	preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
	if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));

	if(strpos($index,"Downloading")===false || !$desc)
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		print "$output[1] :: $output[2]\n";
		logstr("log-c.txt","$output[1] :: $output[2]\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
		unset($desc);
		if(mysql_errno()) print mysql_error()."\n";

Link to comment
Share on other sites

Do you want to check if the link exists:

 

<a href="http://hotfile.com/dl/13717843/4c6dfad/cl_backup.part3.rar.html?uploadid=13717843&fname=cl_backup.part3.rar.html&hash=4c6dfad&lang=it">Italian</a>

 

Or do you want to turn what you've got into a link that starts a download, basically simulating a click on the regular download button?

 

http://hotfile.com/get/13717843/4ac4ddbd/24c4d38/cl_backup.part3.rar

 

If that's the case, you need to grab all of the form values:

 

<table border="0" cellspacing=0 cellpadding=2 class=premtable2 style="margin: 0 auto 10px auto; width: 640px;">

<form style="margin:0;padding:0;" action="/dl/13717843/4c6dfad/cl_backup.part3.rar.html" method=post name=f>

<input type=hidden name=action value=capt>

<input type=hidden name=tm value=1254416003>

<input type=hidden name=tmhash value=0a193d19d3dec26a23c477237175da1de5be0a90>

 

<input type=hidden name=wait value=30>

<input type=hidden name=waithash value=baca488ee0ae179f444ad6f97b6d25ef0c4d5c22>

<tr>

<td style="width:267px;"> </td>

<td align=center style="width:188px; padding: 0;"><input type=button class=but value="HIGH SPEED DOWNLOAD" style="width:162px; margin: 0; height: 32px; padding: 5px 5px 6px 5px;" onclick="location='/premium.html?id=13717843'"></td>

<td align=center style="width:185px; padding: 0;"><input type=button class="but" value=" REGULAR DOWNLOAD " style="width:162px; margin: 0; height: 32px; padding: 5px 5px 6px 5px;" onclick="starttimer();"></td></tr>

</table>

 

Then craft a POST request and wait for the link to come out. The Live HTTP Headers extension for Firefox will help you find out the exact structure of the URL.
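
If you did go the download route, collecting the hidden form values could look roughly like the sketch below. It assumes the inputs keep the unquoted type=hidden name=... value=... layout shown in the form markup in this post; hidden_fields is a made-up helper name, and a real page may order or quote its attributes differently.

```php
<?php
// Hypothetical helper: collect name => value pairs from hidden inputs
// written in the unquoted style shown in the form markup.
function hidden_fields(string $html): array
{
    $fields = [];
    // PREG_SET_ORDER yields one array per matched <input>: [1] = name, [2] = value.
    preg_match_all('/<input type=hidden name=(\w+) value=(\w+)>/', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $fields[$m[1]] = $m[2];
    }
    return $fields;
}

// A shortened copy of the form markup from the post.
$form = '<input type=hidden name=action value=capt>' . "\n"
      . '<input type=hidden name=tm value=1254416003>' . "\n"
      . '<input type=hidden name=wait value=30>';

$fields = hidden_fields($form);

// http_build_query turns the array into a URL-encoded POST body.
$postBody = http_build_query($fields);
print $postBody . "\n";
```

The resulting string is what you would hand to curl as the POST data; whether the site accepts it depends on the timer and hash values, which this sketch does not deal with.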

 

 

Link to comment
Share on other sites

 

I want to keep it simple: no download, just check whether a link exists, and if so, parse the filename and size and write them down.

As you can see in the current code, the script first checks whether the link is ok:

 if(strpos($index,"Downloading")===false)

If it's not ok, it prints 'bad link'.

If the link is ok (the else branch), I would like to parse the filename and file size (using your pattern) and write them to the text file, as you can see in my code.

 

 

The link checking part works fine. I'm getting the following output:

[1] hotfile.com/dl/6491715/ca36ee7/BURP_Kenshi.part02.rar.html
bad link
[2] hotfile.com/dl/10201361/1867cae/H.P.5.Order.of.Phoenix.2007.DvD.RiP.Hindi.By.Innov
bad link
[3] hotfile.com/dl/11825481/1c87a38/2009_-_Magical_Mystery_Tour_Stereo_Remaster.part2.rar.html
:: 
[4] hotfile.com/dl/10201433/68abce6/Harry.Potter.6.Half.Blood.Prince.2009.Hindi.By.Inn
bad link
[5] hotfile.com/dl/11629240/a9f4bec/Please_Please_Me-Mono_2009_Remastered.part1.rar.html
:: 

 

 

I'm just not getting the filename and file size from the good links.

So this is what I would like to see:

[1] hotfile.com/dl/6491715/ca36ee7/BURP_Kenshi.part02.rar.html
bad link
[2] hotfile.com/dl/10201361/1867cae/H.P.5.Order.of.Phoenix.2007.DvD.RiP.Hindi.By.Innov
bad link
[3] hotfile.com/dl/11825481/1c87a38/2009_-_Magical_Mystery_Tour_Stereo_Remaster.part2.rar.html
2009_-_Magical_Mystery_Tour_Stereo_Remaster.part2.rar :: 55MB
[4] hotfile.com/dl/10201433/68abce6/Harry.Potter.6.Half.Blood.Prince.2009.Hindi.By.Inn
bad link
[5] hotfile.com/dl/11629240/a9f4bec/Please_Please_Me-Mono_2009_Remastered.part1.rar.html
Please_Please_Me-Mono_2009_Remastered.part1.rar :: 75MB

 

 

 

Here is my full code:

 

	}if($row[2]==2) // Hotfile check
{
	$row[1]="http://www.".$row[1];
                $index=getpage($row[1]);
	preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
	if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));

	if(strpos($index,"Downloading")===false || !$desc)
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		print "$output[1] :: $output[2]\n";
		logstr("log-c.txt","$output[1] :: $output[2]\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
		unset($desc);
		if(mysql_errno()) print mysql_error()."\n";
	}
}

Link to comment
Share on other sites

 

Then add a curl routine to the good links found which visits the link, grabs the output and parses the filename and size.

 

Basically:

 

	}if($row[2]==2) // Hotfile check
{
	$row[1]="http://www.".$row[1];
                $index=getpage($row[1]);
	preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
	if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));

	if(strpos($index,"Downloading")===false || !$desc)
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
                        // 1) Curl the good link and return the output
                        // 2) Then use my preg_match pattern to grab the file name and size
                        // 3) Then print "$filename $output1 $output2"
                        // 4) Insert into the DB

		print "$output[1] :: $output[2]\n";
		logstr("log-c.txt","$output[1] :: $output[2]\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
		unset($desc);
		if(mysql_errno()) print mysql_error()."\n";
	}
}

 

If you want, PM me the entire code you're using with all of the sensitive data removed (including the curl statement). Firing off the second curl statement should work though.

My method for that would be:

1) Pull the links from the DB.

2) Send the curl request.

3) If the link is not correct, return "bad link".

4) If the link is good, return the output and preg_match the necessary info.

Actually, you shouldn't need the 2nd statement.

Post the actual curl statement you're using and from there I can easily give you the routine.

Link to comment
Share on other sites

Here's a general function I use for simple curl checks. I'll remove the function part of it.

 

$url = "http://whatever.com";
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.2 en-US;rv:1.9.0.7)Gecko/2009021910 Firefox/3.0.7";

$ch = curl_init();
curl_setopt($ch, CURLOPT_FAILONERROR, 1);      // treat HTTP codes >= 400 as failures
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);   // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the page instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_HEADER, 1);           // include the response headers in the output
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $url);

$result = curl_exec($ch);

curl_close($ch);

$pattern = '/(?s)\<item\>.*?\<\/item\>/';
preg_match($pattern, $result, $output);
//ADD IN THE VARS TO GRAB EACH PORTION

if (empty($output)) {
    // bad link
} else {
    // put vars into appropriate places
}

 

Instead of that pattern, put the one I gave you and move the data into the appropriate vars.
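
Putting those steps together, a rough sketch of the whole check might look like this. fetch_page and check_hotfile are hypothetical names (the script's own getpage() is never shown in the thread), and the pattern is the one posted earlier, so treat this as an outline rather than a drop-in replacement.

```php
<?php
// Hypothetical stand-in for the script's getpage(): fetch a URL with curl.
// Returns the response body, or false when the request fails.
function fetch_page(string $url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

// Returns [filename, size] for a good page, or null for a bad link.
function check_hotfile($html): ?array
{
    if (!is_string($html)) {
        return null; // the fetch itself failed
    }
    $pattern = '/<table\sclass="downloading">.+<b>(.*?)<\/b>.+<span\sclass="size">\| (.*?)<\/span>/';
    if (preg_match($pattern, $html, $output) !== 1) {
        return null; // no download table on the page
    }
    return [$output[1], $output[2]];
}

// Usage with canned markup instead of a live request:
$good = '<table class="downloading"><tr><td>Downloading <b>cl_backup.part3.rar</b> '
      . '<span class="size">| 95.8Mb</span></td></tr></table>';
$info = check_hotfile($good);
print $info === null ? "bad link\n" : "$info[0] :: $info[1]\n";
```

With a live link you would call check_hotfile(fetch_page($url)) instead; a null result covers both a failed request and a page without the download table, so a single "bad link" branch is enough.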

Link to comment
Share on other sites

 

If you want. PM me the entire code your using with all of the sensitive data removed...

 

Hey Leafer,

First, I want to drop you a "BIG" THANK YOU! To be honest, I really appreciate your help!

I don't know much about PHP and programming, but I'm trying! I spent several hours over the past few days reading tutorials about preg_match etc. before answering your posts, but it's hard for me. I hope you understand that too!

Now, about my script: the first part is a crawler that collects links from the major file-sharing sites like rapidshare, sendspace, 4shared etc. and puts them all in a database (working fine).

The second part of the script checks whether the fetched links are still ok.

Bad link: "bad link" is written to the log file and the 'checked' column in the MySQL table gets the value -1.

Good link: the script must grab the filename and file size from the download page, and 'checked' gets the value 1.

The script works like a charm for rapidshare and sendspace, but hotfile.com was not supported.

I made a copy of the sendspace check and modified it to make it hotfile.com compatible...

And here's my problem: it can check whether a hotfile link is valid, but I cannot get the filename and file size to add them to my MySQL DB.

The pattern you made should do the job, I think. I'm just having problems integrating it with the link check part.

I will post my full code below; this should make things easier.

So basically everything works, except that I don't retrieve the filename and file size for files on hotfile. The hotfile check starts at line 70 and ends at line 89.

 

#!/usr/bin/php
<?php
chdir("/path_where_script_is_installed/search/cron/");
set_time_limit(1000);
ignore_user_abort(true);
include "./../functions.php";
include "./../config.php";


if(file_exists("run-c.flag"))
{
exec("rm run-c.flag");
sleep(1);
if(file_exists("run-c.flag")) exit("can't clean run-c.flag");
sleep(60);
}
exec("rm log-c.txt");
logstr("log-c.txt","[start]".date("Y-m-d H:i:s")."[/start] \n");
touch("run-c.flag");

print "<pre>\n";

//mysql_query("UPDATE `v2links` SET checked=0");
$query=mysql_query("SELECT `id`,`url`,`type` FROM `v2links` WHERE `checked`>=0 ORDER BY `lastcheck`,RAND() LIMIT 10000");
print mysql_error();
$counter=0;
while($row=mysql_fetch_row($query))
{
if(!trim($row[1])) continue;
// delay if type the same
if($prevtype==$row[2]) sleep(mt_rand(1,2));
$prevtype=$row[2];

print "[".$counter++."] ".$row[1]."\n";
logstr("log-c.txt","[$counter] ".$row[1]."\n");
if($row[2]==1) // rapidshare check
{
	$index=getpage($row[1]);
	if(strpos($index,"The file could not be found. Please check the download link.")===false && strpos($index,"Due to a violation of our terms of use, the file has been removed from the server.")===false)
	{
		preg_match("/<form action=\"([^\"]+)\" method=\"post\">/",$index,$match);
		//print $index;
		if($match[1])
		{
			$fpath=$match[1];
			$index=getpage($fpath,"dl.start=Free",$row[1]);
			preg_match("#<font[^>]*?>(.*?)<\/font>#",$index,$match);
			$fsize=0;
			if($match[1]) $fsize= substr(mysql_real_escape_string(strip_tags($match[1])), 2);
			print $fsize."\n";
			logstr("log-c.txt",$fsize."\n");
			mysql_query("UPDATE `v2links` SET `checked`='1',`fsize`='$fsize',`lastcheck`=NOW() WHERE `id`=".$row[0]);
			if(mysql_errno()) print mysql_error()."\n";
		}
		else
		{
			print "bad link\n";
			logstr("log-c.txt","bad link\n");
			mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
			if(mysql_errno()) print mysql_error()."\n";
		}
	}
	else
	{
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		if(mysql_errno()) print mysql_error()."\n";
	}
}if($row[2]==2) // Hotfile check
{
	$row[1]="http://www.".$row[1];
                $index=getpage($row[1]);
	preg_match_all('/(<td>Downloading <b>)(.*)(<\/b>)/',$index,$match);
	if($match[1]) $desc=mysql_real_escape_string(strip_tags($match[1]));

	if(strpos($index,"Downloading")===false || !$desc)
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		print "$output[1] :: $output[2]\n";
		logstr("log-c.txt","$output[1] :: $output[2]\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$output[2]',`caption`='$output[1]' WHERE `id`=".$row[0]);
		unset($desc);
		if(mysql_errno()) print mysql_error()."\n";
	}
}

        if($row[2]==3) // sendspace check
{
	$row[1]="http://www.".$row[1];
	//print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";

	$index=getpage($row[1]);
	preg_match_all("/<b>(\w+)\:<\/b>([^<]+)<br>/",$index,$match);
	$desc=array();
	for($i=0;$i<3;$i++) if(isset($match[1][$i])) $desc[$match[1][$i]]=trim($match[2][$i]);
	unset($match);
	if(strpos($index,"The download link is located below")===false || !$desc)
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		$words=trim($desc['Name']);
		$words=preg_split("/[_\.\-\s]/",$words);
		$lastword=array_pop($words);
		$words=implode(" ",$words);
		$words=preg_replace("/\s{2,}/"," ",$words);
		$caption=mysql_real_escape_string($words);
		unset($words);
		print "$caption :: ".$desc['Size']."\n";
		logstr("log-c.txt","$caption :: ".$desc['Size']."\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$desc['Size']."',`caption`='$caption' WHERE `id`=".$row[0]);
		unset($desc);
		if(mysql_errno()) print mysql_error()."\n";
	}
}	
      
        
if($row[2]==4) // badongo check
{
	$row[1]="http://www.".$row[1];
	//print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
	//print_r($row);
	$index=getpage($row[1]);
	preg_match_all("/<td> ([^<]+)<\/td>/",$index,$match);
	if(strpos($index,"This file has been deleted because it has been inactive for over 30 days")!==false ||
		 strpos($index,"This file has been removed due to copyright infrigment")!==false || 
		 strpos($index,"File deactivated!")!==false || !$match[1])
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		//print_r($match[1]);
		if(strlen($match[1][0])>strlen($match[1][3]))
		{
			$words=trim($match[1][0]);
			$words=preg_split("/[_\.\-\s]/",$words);
			$lastword=array_pop($words);
			if($lastword=="html") array_pop($words);
			$words=implode(" ",$words);
			$words=preg_replace("/\s{2,}/"," ",$words);
			$caption=mysql_real_escape_string($words);
			unset($words);
		}
		else
		{
			$words=$match[1][3];
			$words=preg_split("/[_\.\-\s]/",$words);
			$words=implode(" ",$words);
			$words=preg_replace("/\s{2,}/"," ",$words);
			$caption=mysql_real_escape_string($words);
			unset($words);
		}
		print "$caption :: ".$match[1][1]."\n";
		logstr("log-c.txt","$caption :: ".$match[1][1]."\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$match[1][1]."',`caption`='$caption' WHERE `id`=".$row[0]);
		unset($match);
		if(mysql_errno()) print mysql_error()."\n";
	}
}	
if($row[2]==5) // mediafire check
{
	$row[1]="http://www.".$row[1];
	$index=getpage($row[1]);
	//print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
	preg_match("/You requested ([^\(]+)\(([^\)]+)\)<\/div>/",$index,$match);
	if(strpos($index," Please enter the reason for reporting this file:")===false || !$match[1])
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		$words=trim($match[1]);
		$words=preg_split("/[_\.\-\s]/",$words);
		$lastword=array_pop($words);
		$words=implode(" ",$words);
		$words=preg_replace("/\s{2,}/"," ",$words);
		$caption=mysql_real_escape_string($words);
		unset($words);
		print "$caption :: ".$match[2]."\n";
		logstr("log-c.txt","$caption :: ".$match[2]."\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='".$match[2]."',`caption`='$caption' WHERE `id`=".$row[0]);
		unset($match);
		if(mysql_errno()) print mysql_error()."\n";
	}
}	

if($row[2]==6) // 4shared check
{
	$row[1]="http://www.".$row[1];
	$index=getpage($row[1]);
	preg_match("/Download ([^<]+)<\/div>/",$index,$match);
	if(strpos($index,"Kaspersky Anti-Virus")===false || !$match)
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
		$words=trim($match[1]);
		$words=preg_split("/[_\.\-\s]/",$words);
		$lastword=array_pop($words);
		$words=implode(" ",$words);
		$words=preg_replace("/\s{2,}/"," ",$words);
		$caption=mysql_real_escape_string($words);
		unset($words);
		unset($match);
		preg_match("/<td>Size\:<\/td><td>([^<]+)<\/td>/",$index,$match);
		$fsize=$match[1];
		unset($match);
		print "$caption :: $fsize\n";
		logstr("log-c.txt","$caption :: $fsize\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$fsize',`caption`='$caption' WHERE `id`=".$row[0]);
		if(mysql_errno()) print mysql_error()."\n";
	}
}

if(!file_exists("run-c.flag")) exit("run-c.flag was deleted\n");
}
print "</pre>\n";
exec("rm run-c.flag");
logstr("log-c.txt","[end]".date("Y-m-d H:i:s")."[/end] \n");
?>

 

I can give you FTP access if that would make things easier.

 

Thanks again!

Link to comment
Share on other sites

The perfect word to explain regex is torture.

It's an extremely powerful tool as long as you're willing to lose a bit of hair in the process. ;)

Anyway, I'll assume the getpage() function is a curl call and you're passing the URL to it.

I think I see your problem here, and it's a bit ironic. Remember I was asking you if you wanted to download the file? That's exactly what the code is trying to do here, by grabbing the form code once the check is successful:

 

 preg_match("/<form action=\"([^\"]+)\" method=\"post\">/",$index,$match);
         //print $index;

 

See that? It's grabbing the form code when it should be going directly to the regex I gave you.

So put my regex in that place, like so:

 

$index=getpage($row[1]);

      if(strpos($index,"The file could not be found. Please check the download link.")===false && strpos($index,"Due to a violation of our terms of use, the file has been removed from the server.")===false)

      {

preg_match('/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/',$index,$match);

//HERE grab $match[1] and $match[2], which are the values we spoke about before, THEN insert them into the DB or output them to a txt file.

 

Erase all of this:

     

  //print $index;

        if($match[1])

        {

            $fpath=$match[1];

            $index=getpage($fpath,"dl.start=Free",$row[1]);

            preg_match("#<font[^>]*?>(.*?)<\/font>#",$index,$match);

            $fsize=0;

            if($match[1]) $fsize= substr(mysql_real_escape_string(strip_tags($match[1])), 2);

            print $fsize."\n";

            logstr("log-c.txt",$fsize."\n");

            mysql_query("UPDATE `v2links` SET `checked`='1',`fsize`='$fsize',`lastcheck`=NOW() WHERE `id`=".$row[0]);

            if(mysql_errno()) print mysql_error()."\n";

        }

}

        else

        {

            print "bad link\n";

            logstr("log-c.txt","bad link\n");

            mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);

            if(mysql_errno()) print mysql_error()."\n";

The reason I'm telling you to erase that part is that you mentioned you weren't interested in downloading anything, just in grabbing the link, file size and filename. That's it.

Besides, I wouldn't use that code anyhow; I'd prefer a regex over it.

Anyway, removing all of that code should make it work, especially since your first regex statement is then NOT looking for a form but rather for those values you wanted.

 

Link to comment
Share on other sites

 

The perfect word to explain regex is torture.

It's an extremely powerful tool as long as you're willing to lose a bit of hair in the process. ;)

I noticed! Except... I've probably already lost more than a bit ;)

Thanks again for the info. You probably mixed up the "rapidshare file check" part with the Hotfile one.

preg_match("/<form action=\"([^\"]+)\" method=\"post\">/",$index,$match);
         //print $index;

is only used for rapidshare in my code... but your explanation was exactly what I needed!

I finally understand it and realize that I had started out wrong!

I started from scratch again and it's working now :-) Hip Hip

if($row[2]==2) // Hotfile Check
{
	$row[1]="http://www.".$row[1];
	$index=getpage($row[1]);
	print "<a href=\"".$row[1]."\">".$row[1]."</a>\n";
	if(strpos($index,"Downloading")===false) //check if page contains the word downloading  if not = bad link
	{
		mysql_query("UPDATE `v2links` SET `checked`='-1',`lastcheck`=NOW() WHERE `id`=".$row[0]);
		print "bad link\n";
		logstr("log-c.txt","bad link\n");
	}
	else
	{
                        $pattern = '/<table\sclass="downloading"\>.+<b>(.*?)\<\/b\>.+\<span\sclass="size"\>\| (.*?)\<\/span\>/';
                        preg_match($pattern, $index, $output);

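                        // escape the scraped values before they go into the query, like the other checks do
                        $caption=mysql_real_escape_string($output[1]);
                        $fsize=mysql_real_escape_string($output[2]);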


		print "$caption :: $fsize\n";
		logstr("log-c.txt","$caption :: $fsize\n");
		mysql_query("UPDATE `v2links` SET `checked`='1',`lastcheck`=NOW(),`fsize`='$fsize',`caption`='$caption' WHERE `id`=".$row[0]);
		unset($match);
		if(mysql_errno()) print mysql_error()."\n";
	}
}

 

It works like a charm this way!

Thanks again for your time, and if you have a PayPal email, please PM me... I'm gonna pay you that BEER!

Promised!

 

Link to comment
Share on other sites

 

Glad to hear it worked out for you. As far as the beer goes I only kid. :)

 

As a fellow noob I enjoy giving back whenever I can. Have a good one.

Link to comment
Share on other sites
