[SOLVED] Regex Pattern help

djtozz · October 3, 2009

Hello, I'm trying to get the filesize for files hosted on 4shared.

The page has the following structure:

<td class="finfoleft"><b>Size:</b></td><td class="finforight">12,670 KB</td>

So I was hoping the following pattern would do the job, but I'm getting an empty field:

preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",$index,$match);

sample link: http://www.4shared.com/file/102359073/b135d053/Plastik_Funk_-_Rise____Houseshaker_Mix_DRM__.html

Is there somebody who can help me with this?

Thanks

Mchl · October 3, 2009

<?php
preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",'<td class="finfoleft"><b>Size:</b></td><td class="finforight">12,670 KB</td>',$match);
var_dump($match);

Works for me.

[edit]

The site you linked to autodetects the user's locale and displays translated text accordingly. For me it didn't display 'Size:' but the equivalent in my language. This might be causing your problem.
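If the translated label is the only difference, one option is to stop anchoring the pattern on 'Size:' at all. The sketch below is an illustration, not something tested against the live site: it assumes the value cell keeps the `finforight` class in every language and that the value looks like `12,670 KB` (digits with commas or dots, then KB, MB or GB). `extractSize()` is a hypothetical helper, and `Velikost:` (Czech for "Size:") is only a stand-in for a translated label.

```php
<?php
// Hypothetical helper: match the size cell by its class and value format
// instead of the translated "Size:" label. Assumes values like "12,670 KB".
function extractSize(string $html): ?string
{
    $pattern = '~<td class="finforight">\s*([\d.,]+\s*[KMG]B)\s*</td>~i';
    if (preg_match($pattern, $html, $m)) {
        return $m[1];
    }
    return null; // no size-shaped value found
}

// Stand-in for a localized page: the label differs, the value cell does not.
$html = '<td class="finfoleft"><b>Velikost:</b></td><td class="finforight">12,670 KB</td>';
var_dump(extractSize($html)); // string(9) "12,670 KB"
```

The trade-off is that this matches any `finforight` cell holding a size-like value, so it relies on there being only one such cell on the page.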

djtozz · October 3, 2009

Thanks for the feedback! You're probably right!

My server is located in the Czech Republic, and that might cause the language problem.

Is there any way to make this work?

Thanks,

Mchl · October 3, 2009

The simplest? echo $index to see exactly what's in there.

djtozz · October 3, 2009

Thanks again. The page is displayed in English, so I'm a little surprised here.

My code should grab both the filename and the file size. The filename is displayed; I'm just not getting the file size.

preg_match("/<h2 id=\"fileNameText\">(.*?)<\/h2>/i",$index,$match);
if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));

preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",$index,$match2);
if($match2[1]) $fsize=mysql_real_escape_string(strip_tags($match2[1]));

print "$caption :: $fsize\n";

Am I missing something here?
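The thread never pins down why this regex came back empty. One plausible cause, offered here as an assumption rather than a diagnosis, is that the live markup has line breaks or indentation between the tags, which the pattern above does not allow. The sketch compares the original pattern with a whitespace-tolerant one on a hypothetical sample; `$strict` and `$tolerant` are illustrative names.

```php
<?php
// Hypothetical sample: the same cells as in the question, but with a line
// break and indentation between them (unconfirmed for the live page).
$index = "<td class=\"finfoleft\"><b>Size:</b></td>\n    <td class=\"finforight\">12,670 KB</td>";

// Original pattern: requires "</td><td" with nothing in between.
$strict = '~<td class="finfoleft"><b>Size:</b></td><td class="finforight">(.*?)</td>~i';

// Tolerant pattern: \s* allows whitespace/newlines between tags, and the
// s modifier lets .*? match across line breaks inside the cell.
$tolerant = '~<td class="finfoleft">\s*<b>Size:</b>\s*</td>\s*<td class="finforight">\s*(.*?)\s*</td>~is';

var_dump(preg_match($strict, $index)); // int(0)
preg_match($tolerant, $index, $match2);
var_dump($match2[1]);                  // string(9) "12,670 KB"
```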

nrg_alpha · October 3, 2009

I would approach this using DOM / XPath.

Example:

$dom = new DOMDocument;
@$dom->loadHTMLFile('http://www.4shared.com/file/102359073/b135d053/Plastik_Funk_-_Rise____Houseshaker_Mix_DRM__.html');
$xpath = new DOMXPath($dom);
$tdTag = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]'); // find text belonging to td node that contains the characters KB
foreach($tdTag as $val){
    echo $val->nodeValue . "<br />\n";
}

Output:

12,670 KB
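The same query can be tried offline against a string, which is also the form that applies when the page source is already in a variable. This sketch assumes a trimmed-down, hypothetical sample of the markup and uses `libxml_use_internal_errors()` in place of the `@` operator, so malformed-HTML warnings are collected rather than every error being silenced.

```php
<?php
// Hypothetical page source; in the thread this would be the fetched HTML.
$index = '<table><tr>'
       . '<td class="finfoleft"><b>Size:</b></td>'
       . '<td class="finforight">12,670 KB</td>'
       . '</tr></table>';

libxml_use_internal_errors(true);  // collect HTML parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($index);            // loadHTML() parses a string; loadHTMLFile() opens a file/URL
libxml_clear_errors();             // discard the collected warnings

$xpath = new DOMXPath($dom);
$tdTag = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]');
foreach ($tdTag as $val) {
    echo $val->nodeValue . "\n";
}
```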

djtozz · October 4, 2009

Thank you for the Info!

I've tried to implement this in my code, where $index is the URL.

But I'm not getting the size, even though your code works.

Did I make a mistake?

preg_match("/<h2 id=\"fileNameText\">(.*?)<\/h2>/i",$index,$match); // grab filename
if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));

$dom = new DOMDocument;
@$dom->loadHTMLFile('$index');
$xpath = new DOMXPath($dom);
$tdTag = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]'); // find text belonging to td node that contains the characters KB
foreach($tdTag as $val){
}

$fsize=$val->nodeValue . 

print "$caption :: $fsize\n"

thebadbad · October 5, 2009

You're putting $index inside single quotes, so PHP passes the literal text $index rather than the variable's value. Also, $index isn't the URL but the actual page source, right? In that case use

@$dom->loadHTML($index);
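A quick way to see the difference between the three ways of passing the variable (the value of `$index` here is a hypothetical placeholder):

```php
<?php
$index = 'http://www.example.com/file.html'; // hypothetical placeholder value

$single = '$index'; // single quotes: no interpolation, the literal text $index
$double = "$index"; // double quotes: the variable's value is substituted
$bare   = $index;   // simplest: pass the variable itself, no quotes

var_dump($single);  // string(6) "$index"
var_dump($double === $index);
var_dump($bare === $index);
```

So `loadHTMLFile('$index')` tries to open something literally named `$index`. Passing `$index` unquoted avoids the issue, and `loadHTML()` is the right call when the variable holds markup rather than a location.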

djtozz · October 5, 2009

Thank you. I changed that, but I'm still not getting the right value; I'm getting 1.

So I'm not sure what's wrong.

[start]2009-10-05 13:48:50[/start] 
[1] 4shared.com/file/32493315/a7e960a1/Calabria_2008_-_Enur_Ft_Natasja__Nicolas_Montecino_Remix_.html?s=1
Calabria 2008 - Enur Ft Natasja (Nicolas Montecino Remix).mp3 :: 1 
[2] 4shared.com/file/64047131/566538fa/12Alex_Gaudino_Feat_Crystal_Waters_-_Destination_Calabria.html?s=1&showComments=true#firstC
12.Alex Gaudino Feat. Crystal Waters - Destination Calabria.mp3 :: 1 
[3] 4shared.com/file/54010987/8959a596/Destination_Calabria__-dj_Klein.html?s=1&showComments=true#firstC
Destination Calabria  -dj Klein.mp3 :: 1 
[4] 4shared.com/file/42243265/cd66b2e4/Big_Ali_Feat_Lil_Jon__Vynil_Squad_-_Calabria_En_Fuego.html?s=1
Big Ali Feat. Lil Jon & Vynil Squad - Calabria En Fuego.mp3 :: 1 
[5] 4shared.com/file/54010987/8959a596/Destination_Calabria__-dj_Klein.html?s=1
Destination Calabria  -dj Klein.mp3 :: 1

nrg_alpha · October 5, 2009

...
foreach($tdTag as $val){
}

$fsize=$val->nodeValue . 

print "$caption :: $fsize\n"

If your code still has this part, understand that the $fsize=$val->nodeValue line has to be inside the foreach loop so that it runs for each $val; otherwise you'll get incorrect results. As it stands, your foreach loop is empty, so after it finishes $val just holds whatever the last iteration left in it.

Consider this:

$arr = array(1,2,3,4,5,6);
foreach($arr as $val){
}
echo $val . "<br />\n"; // Output: 6

vs this:

$arr = array(1,2,3,4,5,6);
foreach($arr as $val){
    echo $val . "<br />\n"; // Output: 1 2 3 4 5 6
}
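Applied to the size lookup, the fix is to move the work into the loop body. A minimal sketch, assuming a plain array of hypothetical values in place of the `DOMNodeList` returned by `$xpath->query()` (foreach walks a `DOMNodeList` the same way, except each `$val` is a node and you read `$val->nodeValue`):

```php
<?php
// Hypothetical matches standing in for the DOMNodeList from $xpath->query().
$matches = ['12,670 KB', '3,201 KB'];

$sizes = [];
foreach ($matches as $val) {
    $sizes[] = $val;              // the work happens inside the loop body
}
$fsize = implode(', ', $sizes);   // empty string if nothing matched
print "$fsize\n";                 // 12,670 KB, 3,201 KB
```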

And as thebadbad mentions, is $index the page source? If so, use his suggestion of @$dom->loadHTML($index). If $index holds a URL, use @$dom->loadHTMLFile($index) instead.

djtozz · October 6, 2009

Thank you guys!

Sorry, I only removed the quotes after the first answer; I was still using loadHTMLFile($index).

Now I'm using loadHTML($index) and it works like a charm!
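For reference, here is a sketch of the working approach that reads both the filename and the size from one parsed document, rather than mixing a regex for the name with XPath for the size. It assumes the markup shown in this thread (`<h2 id="fileNameText">` and `<td class="finforight">`); `getFileInfo()` and the sample HTML are hypothetical, and the live page may differ. The `mysql_real_escape_string()` step from the original code is left out.

```php
<?php
// Hypothetical helper: parse page source once and pull both values via XPath.
function getFileInfo(string $index): array
{
    libxml_use_internal_errors(true); // keep malformed-HTML warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($index);           // $index holds page source, not a URL
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $name  = $xpath->query('//h2[@id="fileNameText"]');
    $size  = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]');

    return [
        // Take the first match of each query, or '' when nothing matched.
        'caption' => $name->length ? trim($name->item(0)->textContent) : '',
        'size'    => $size->length ? trim($size->item(0)->nodeValue) : '',
    ];
}

// Hypothetical sample page source.
$index = '<h2 id="fileNameText">example.mp3</h2>'
       . '<table><tr><td class="finforight">12,670 KB</td></tr></table>';

$info = getFileInfo($index);
print "{$info['caption']} :: {$info['size']}\n";
```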

Thanks a bunch for this excellent info!

Keep up the good work!

djtozz · October 6, 2009

Sorry thebadbad, I didn't read your comment properly. I only removed the quotes and was still using loadHTMLFile(). Now it works like a charm!

Thanks a bunch!

MadTechie · October 6, 2009

Solved?

(if so please click topic solved at the bottom)
