
[SOLVED] Regex Pattern help


djtozz


Hello, I'm trying to get the filesize for files hosted on 4shared.

 

The page has the following structure:

 

<td class="finfoleft"><b>Size:</b></td><td class="finforight">12,670 KB</td>

 

So I was hoping the following pattern would do the job, but I'm getting an empty field:

 

preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",$index,$match);

 

sample link: http://www.4shared.com/file/102359073/b135d053/Plastik_Funk_-_Rise____Houseshaker_Mix_DRM__.html

 

Is there somebody who can help me with this?

Thanks

Link to comment
Share on other sites

<?php
preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",'<td class="finfoleft"><b>Size:</b></td><td class="finforight">12,670 KB</td>',$match);
var_dump($match);

 

Works for me.

 

[edit]

 

The site you provided link to autodetects user's locale and displays translated text accordingly. For me it didn't display 'Size:' but equivalent in my language. This might be causing your problem.
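If the translated label is the culprit, one way around it is to stop matching on the label at all. Here is a minimal sketch that keys on the finforight cell and a number followed by a unit. The unit list (KB, MB, GB) is my assumption about what the site prints, and the sample markup is just the snippet from the first post:

```php
<?php
// Sample markup copied from the question. On the live page the label
// ("Size:") may be translated, so the pattern below never mentions it.
$html = '<td class="finfoleft"><b>Size:</b></td><td class="finforight">12,670 KB</td>';

// Match a finforight cell whose content is a number followed by a unit.
// The unit list is an assumption, not something confirmed by the site.
$pattern = '~<td class="finforight">\s*([\d.,]+\s*(?:KB|MB|GB))\s*</td>~i';

if (preg_match($pattern, $html, $match)) {
    $fsize = $match[1];
} else {
    $fsize = null; // no size cell found
}

var_dump($fsize);
```

If other finforight cells on the page can also hold a number plus a unit, this would pick up the first one, so check it against the real source.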

Link to comment
Share on other sites

The site you provided link to autodetects user's locale and displays translated text accordingly. For me it didn't display 'Size:' but equivalent in my language. This might be causing your problem.

 

Thanks for the feedback! You're probably right!

My server is located in the Czech Republic, and that might cause the language problem.

 

Is there any way to make this work?

 

Thanks,

 

Link to comment
Share on other sites

The simplest? echo $index to see exactly what's in there.

 

Thanks again. The page is displayed in English, so I'm a little surprised here.

My code should grab the filename and the file size. The filename is displayed, but I'm not getting the file size.

 

preg_match("/<h2 id=\"fileNameText\">(.*?)<\/h2>/i",$index,$match);
if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));

preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",$index,$match2);
if($match2[1]) $fsize=mysql_real_escape_string(strip_tags($match2[1]));

print "$caption :: $fsize\n";

 

Am I missing something here?

 

 

Link to comment
Share on other sites

I would approach this using DOM / XPath.

 

Example:

$dom = new DOMDocument;
@$dom->loadHTMLFile('http://www.4shared.com/file/102359073/b135d053/Plastik_Funk_-_Rise____Houseshaker_Mix_DRM__.html');
$xpath = new DOMXPath($dom);
$tdTag = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]'); // find text belonging to td node that contains the characters KB
foreach($tdTag as $val){
    echo $val->nodeValue . "<br />\n";
}

 

Output:

12,670 KB

Link to comment
Share on other sites

I would approach this using DOM / XPath.

 

Thank you for the Info!

 

I've tried to implement this in my code, where $index is the URL.

But I'm not getting the size, even though your code works.

Did I make an error?

 

                       
                        preg_match("/<h2 id=\"fileNameText\">(.*?)<\/h2>/i",$index,$match); // grab filename
                        if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));
                                    
                        $dom = new DOMDocument;
                        @$dom->loadHTMLFile('$index');
                        $xpath = new DOMXPath($dom);
                        $tdTag = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]'); // find text belonging to td node that contains the characters KB
                        foreach($tdTag as $val){
                         }
      
                        $fsize=$val->nodeValue . 
                        		
		        print "$caption :: $fsize\n"

Link to comment
Share on other sites

You're putting $index inside single quotes. And $index isn't the URL, but the actual source code, right? In that case use

 

@$dom->loadHTML($index);
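To make the quoting point concrete, here is a small sketch. The value assigned to $index is only a stand-in for the fetched page source, and the other variable names are made up for the illustration:

```php
<?php
// Stand-in for the fetched page source (hypothetical sample value).
$index = '<td class="finforight">12,670 KB</td>';

$literal      = '$index'; // single quotes: the six characters $index, no substitution
$interpolated = "$index"; // double quotes: the variable's value is substituted
$direct       = $index;   // no quotes are needed when passing a variable

var_dump($literal === '$index');    // true: still the literal text
var_dump($interpolated === $index); // true: same as the variable's value
var_dump($direct === $index);       // true
```

So loadHTMLFile('$index') asks PHP to open a file literally named $index, which is why nothing comes back.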

 

Thank you, I changed that but I'm still not getting the right value. I'm getting 1.

So I'm not sure what's wrong.

 

[start]2009-10-05 13:48:50[/start] 
[1] 4shared.com/file/32493315/a7e960a1/Calabria_2008_-_Enur_Ft_Natasja__Nicolas_Montecino_Remix_.html?s=1
Calabria 2008 - Enur Ft Natasja (Nicolas Montecino Remix).mp3 :: 1 
[2] 4shared.com/file/64047131/566538fa/12Alex_Gaudino_Feat_Crystal_Waters_-_Destination_Calabria.html?s=1&showComments=true#firstC
12.Alex Gaudino Feat. Crystal Waters - Destination Calabria.mp3 :: 1 
[3] 4shared.com/file/54010987/8959a596/Destination_Calabria__-dj_Klein.html?s=1&showComments=true#firstC
Destination Calabria  -dj Klein.mp3 :: 1 
[4] 4shared.com/file/42243265/cd66b2e4/Big_Ali_Feat_Lil_Jon__Vynil_Squad_-_Calabria_En_Fuego.html?s=1
Big Ali Feat. Lil Jon & Vynil Squad - Calabria En Fuego.mp3 :: 1 
[5] 4shared.com/file/54010987/8959a596/Destination_Calabria__-dj_Klein.html?s=1
Destination Calabria  -dj Klein.mp3 :: 1 

Link to comment
Share on other sites

                       
                       ...
                        foreach($tdTag as $val){
                         }
      
                        $fsize=$val->nodeValue . 
                        		
		        print "$caption :: $fsize\n"

 

If your code still has this part, note that the $fsize=$val line has to be inside the foreach loop in order to pick up each $val value; otherwise you'll get improper results. As it stands, your foreach loop is empty.

 

Consider this:

 

$arr = array(1,2,3,4,5,6);
foreach($arr as $val){
}
echo $val . "<br />\n"; // Output: 6

 

vs this:

 

$arr = array(1,2,3,4,5,6);
foreach($arr as $val){
    echo $val . "<br />\n"; // Output: 1 2 3 4 5 6
}

 

And as thebadbad mentions, $index is assumed to be source code? If so, use his suggestion of @$dom->loadHTML($index). If $index holds an entire URL, use @$dom->loadHTMLFile($index) instead.
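Putting both points together, a rough sketch of the source-code case might look like the following. The $index value here is a hand-written stand-in for the downloaded page, and $nodes and $node are names chosen for the example:

```php
<?php
// $index stands in for page source that has already been downloaded
// (hypothetical sample markup based on the snippet in the first post).
$index = '<html><body><table><tr>'
       . '<td class="finfoleft"><b>Size:</b></td>'
       . '<td class="finforight">12,670 KB</td>'
       . '</tr></table></body></html>';

$dom = new DOMDocument;
@$dom->loadHTML($index); // loadHTML() parses a string; loadHTMLFile() expects a path or URL

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]');

$fsize = '';
foreach ($nodes as $node) {
    $fsize = trim($node->nodeValue); // the assignment lives inside the loop
    break;                           // the first matching cell is enough here
}

print "size :: $fsize\n";
```

Note that the XPath expression only finds sizes written with "KB", same as the original example, so files listed in other units would need a broader condition.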

Link to comment
Share on other sites

And as thebadbad mentions, $index is assumed to be source code? If so, use his suggestion of @$dom->loadHTML($index). If $index holds an entire URL, use @$dom->loadHTMLFile($index) instead.

 

Thank you guys!

Sorry, I only removed the quotes after the first answer; I was still using loadHTMLFile($index).

Now I'm using loadHTML($index) and it works like a charm!

 

Thanks a bunch for this excellent info!

Keep up the good work!

 

Link to comment
Share on other sites

You're putting $index inside single quotes. And $index isn't the URL, but the actual source code, right? In that case use

 

@$dom->loadHTML($index);

 

Sorry, I didn't read your comment properly. I only removed the quotes and was still using loadHTMLFile. Now it works like a charm!

 

thanks a bunch!

Link to comment
Share on other sites
