djtozz Posted October 3, 2009

Hello, I'm trying to get the file size for files hosted on 4shared. The page has the following structure:

    <td class="finfoleft"><b>Size:</b></td><td class="finforight">12,670 KB</td>

So I was hoping the following pattern would do the job, but I'm getting an empty field:

    preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",$index,$match);

Sample link: http://www.4shared.com/file/102359073/b135d053/Plastik_Funk_-_Rise____Houseshaker_Mix_DRM__.html

Is there somebody who can help me with this? Thanks

Mchl Posted October 3, 2009

    <?php
    preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",'<td class="finfoleft"><b>Size:</b></td><td class="finforight">12,670 KB</td>',$match);
    var_dump($match);

Works for me.

[edit] The site you linked to autodetects the user's locale and displays translated text accordingly. For me it didn't display 'Size:' but the equivalent in my language. This might be causing your problem.

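If the label text varies with the visitor's locale, one possible workaround is to match on the value cell alone. The sketch below is an illustration under assumptions, not something posted in the thread: `extractSizeByRegex` is a hypothetical helper name, the Czech label `Velikost:` in the sample markup is made up for the demo, and the pattern assumes the value cell keeps `class="finforight"` and ends in a unit such as KB or MB. It uses PHP 7.1+ type declarations.

```php
<?php
// Hypothetical helper: pull the size out of the value cell without relying
// on the (possibly translated) "Size:" label in the neighbouring cell.
function extractSizeByRegex(string $html): ?string
{
    // Assumption: the size is a number with optional separators followed by a unit.
    $pattern = '~<td class="finforight">\s*([\d.,]+\s*(?:bytes|KB|MB|GB))\s*</td>~i';
    if (preg_match($pattern, $html, $m)) {
        return trim($m[1]);
    }
    return null; // no size cell found
}

// Sample markup with a made-up translated label, for illustration only.
$sample = '<td class="finfoleft"><b>Velikost:</b></td><td class="finforight">12,670 KB</td>';
var_dump(extractSizeByRegex($sample));
```

Returning `null` when nothing matches makes the "empty field" case explicit, so the caller can tell a missing cell apart from an empty value.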
djtozz Posted October 3, 2009 (Author)

Quote (Mchl): "The site you linked to autodetects the user's locale and displays translated text accordingly. For me it didn't display 'Size:' but the equivalent in my language. This might be causing your problem."

Thanks for the feedback! You're probably right! My server is located in the Czech Republic and that might cause the language problem. Is there any way to make this work?

Thanks,

Mchl Posted October 3, 2009

The simplest? echo $index, to see exactly what's in there.

djtozz Posted October 3, 2009 (Author)

Quote (Mchl): "The simplest? echo $index, to see exactly what's in there."

Thanks again. The page is displayed in English, so I'm a little surprised here. My code should grab the filename and the file size. The filename is displayed; I'm just not getting the file size.

    preg_match("/<h2 id=\"fileNameText\">(.*?)<\/h2>/i",$index,$match);
    if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));
    preg_match("/<td class=\"finfoleft\"><b>Size\:<\/b><\/td><td class=\"finforight\">(.*?)<\/td>/i",$index,$match2);
    if($match2[1]) $fsize=mysql_real_escape_string(strip_tags($match2[1]));
    print "$caption :: $fsize\n";

Am I missing something here?

nrg_alpha Posted October 3, 2009

I would approach this using DOM / XPath.

Example:

    $dom = new DOMDocument;
    @$dom->loadHTMLFile('http://www.4shared.com/file/102359073/b135d053/Plastik_Funk_-_Rise____Houseshaker_Mix_DRM__.html');
    $xpath = new DOMXPath($dom);
    // find text belonging to td node that contains the characters KB
    $tdTag = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]');
    foreach($tdTag as $val){
        echo $val->nodeValue . "<br />\n";
    }

Output:

    12,670 KB

djtozz Posted October 4, 2009 (Author)

Quote (nrg_alpha): "I would approach this using DOM / XPath."

Thank you for the info! I've tried to implement this in my code, where $index is the URL. But I'm not getting the size, while your code is working. Did I make an error?

    preg_match("/<h2 id=\"fileNameText\">(.*?)<\/h2>/i",$index,$match); // grab filename
    if($match[1]) $caption=mysql_real_escape_string(strip_tags($match[1]));
    $dom = new DOMDocument;
    @$dom->loadHTMLFile('$index');
    $xpath = new DOMXPath($dom);
    $tdTag = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]'); // find text belonging to td node that contains the characters KB
    foreach($tdTag as $val){
    }
    $fsize=$val->nodeValue .
    print "$caption :: $fsize\n"

thebadbad Posted October 5, 2009

You're putting $index inside single quotes. And $index isn't the URL, but the actual source code, right? In that case use

    @$dom->loadHTML($index);

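The single-quote point can be checked in isolation. In this small sketch the value assigned to `$index` is only a placeholder (an assumption for the demo, not the poster's data): PHP does not interpolate variables inside single quotes, so `'$index'` is the six-character literal text rather than the variable's contents.

```php
<?php
// Placeholder value, for illustration only.
$index = 'http://example.com/page.html';

$single = '$index';  // single quotes: no interpolation, literal text
$double = "$index";  // double quotes: the variable's value is substituted

var_dump($single);
var_dump($double);
var_dump($double === $index);
```

Passing the variable directly, as in `loadHTML($index)`, avoids the question of quoting altogether.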
djtozz Posted October 5, 2009 (Author)

Quote (thebadbad): "You're putting $index inside single quotes. And $index isn't the URL, but the actual source code, right? In that case use @$dom->loadHTML($index);"

Thank you, I changed that but I'm still not getting the right value. I'm getting 1, so I'm not sure what's wrong.

    [start]2009-10-05 13:48:50[/start]
    [1] 4shared.com/file/32493315/a7e960a1/Calabria_2008_-_Enur_Ft_Natasja__Nicolas_Montecino_Remix_.html?s=1 Calabria 2008 - Enur Ft Natasja (Nicolas Montecino Remix).mp3 :: 1
    [2] 4shared.com/file/64047131/566538fa/12Alex_Gaudino_Feat_Crystal_Waters_-_Destination_Calabria.html?s=1&showComments=true#firstC 12.Alex Gaudino Feat. Crystal Waters - Destination Calabria.mp3 :: 1
    [3] 4shared.com/file/54010987/8959a596/Destination_Calabria__-dj_Klein.html?s=1&showComments=true#firstC Destination Calabria -dj Klein.mp3 :: 1
    [4] 4shared.com/file/42243265/cd66b2e4/Big_Ali_Feat_Lil_Jon__Vynil_Squad_-_Calabria_En_Fuego.html?s=1 Big Ali Feat. Lil Jon & Vynil Squad - Calabria En Fuego.mp3 :: 1
    [5] 4shared.com/file/54010987/8959a596/Destination_Calabria__-dj_Klein.html?s=1 Destination Calabria -dj Klein.mp3 :: 1

nrg_alpha Posted October 5, 2009

Quote (djtozz):

    ...
    foreach($tdTag as $val){
    }
    $fsize=$val->nodeValue .
    print "$caption :: $fsize\n"

If your code still has this part, understand that the $fsize=$val line has to be inside the foreach loop in order to properly cycle through the $val values; otherwise you'll get improper results. As it stands, your foreach loop is empty.

Consider this:

    $arr = array(1,2,3,4,5,6);
    foreach($arr as $val){
    }
    echo $val . "<br />\n"; // Output: 6

vs this:

    $arr = array(1,2,3,4,5,6);
    foreach($arr as $val){
        echo $val . "<br />\n"; // Output: 1 2 3 4 5 6
    }

And as thebadbad mentions, $index is assumed to be source code? If so, then use his suggestion of @$dom->loadHTML($index). If $index represents an entire URL, then use @$dom->loadHTMLFile($index) instead.

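Putting the two points together (load from a string with loadHTML, and assign inside the loop) might look like the sketch below. It is an illustration under assumptions, not the code djtozz ended up with: `extractSizeFromHtml` is a hypothetical helper name, the sample markup is a cut-down stand-in for the real page, and the function assumes the page source is already in a non-empty string. It uses PHP 7.1+ type declarations and needs the DOM extension.

```php
<?php
// Hypothetical helper: $html is assumed to hold the page source, not a URL.
function extractSizeFromHtml(string $html): ?string
{
    $dom = new DOMDocument;
    // Suppress parser warnings from malformed real-world markup, as in the thread.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    // Same query as in the thread: text of the value cell containing "KB".
    $nodes = $xpath->query('//td[@class="finforight"]/text()[contains(.,"KB")]');

    $fsize = null;
    foreach ($nodes as $node) {
        // Assign inside the loop so the value comes from an actual match.
        $fsize = trim($node->nodeValue);
        break; // the first matching cell is enough for this sketch
    }
    return $fsize; // stays null when no cell matched
}

// Cut-down sample markup, for illustration only.
$html = '<table><tr><td class="finfoleft"><b>Size:</b></td>'
      . '<td class="finforight">12,670 KB</td></tr></table>';
echo extractSizeFromHtml($html), "\n";
```

Initialising `$fsize` to `null` before the loop means an empty result set yields a clear "not found" value instead of reading a property from an undefined `$val` after the loop.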
djtozz Posted October 6, 2009 (Author)

Quote (nrg_alpha): "And as thebadbad mentions, $index is assumed to be source code? If so, then use his suggestion of @$dom->loadHTML($index). If $index represents an entire URL, then use @$dom->loadHTMLFile($index) instead."

Thank you guys! Sorry, I only removed the quotes after the first answer and was still using loadHTMLFile($index). Now I'm using loadHTML($index) and it works like a charm! Thanks a bunch for this excellent info! Keep up the good work!

djtozz Posted October 6, 2009 (Author)

Quote (thebadbad): "You're putting $index inside single quotes. And $index isn't the URL, but the actual source code, right? In that case use @$dom->loadHTML($index);"

Sorry, I didn't read your comment properly. I only removed the quotes and was still using loadHTMLFile. It works like a charm! Thanks a bunch!

MadTechie Posted October 6, 2009

Solved? (If so, please click "Topic Solved" at the bottom.)