htorbov Posted February 11, 2012

I have a small preg_match problem in my new torrent crawler (which uses cURL and preg_match):

crawler.php:

    $pattern = "/Category<\/td><td[^>]>(.*?)<\/td>/s";
    preg_match($pattern, $contents, $categorymatches);
    if(!isset($categorymatches[1])) {
        echo "FAILED TO MATCH CATEGORY!";
        return FALSE;
    }

Returns: FAILED TO MATCH CATEGORY!

The HTML to be crawled (I need the word MOVIES):

    Type</td><td valign="top" align=left>MOVIES</td>

Any help is appreciated!

Link to comment https://forums.phpfreaks.com/topic/256896-preg_match-crawling-from-html-tags/
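A second problem in that original pattern, which the replies fix without spelling it out: `[^>]` has no quantifier, so it matches exactly one character, and `<td[^>]>` can never match `<td valign="top" align=left>`. A minimal sketch comparing it with `[^>]*`, using "Category" as the label (per the later clarification in this thread):

```php
<?php
$html = 'Category</td><td valign="top" align=left>MOVIES</td>';

// Original pattern: [^>] without a quantifier matches exactly ONE character,
// so "<td" + one char + ">" cannot span ' valign="top" align=left'.
$broken = '~Category</td><td[^>]>(.*?)</td>~s';

// With a quantifier: [^>]* matches any run of attribute text (or none).
$fixed = '~Category</td><td[^>]*>(.*?)</td>~s';

var_dump(preg_match($broken, $html, $m1)); // int(0)
var_dump(preg_match($fixed, $html, $m2));  // int(1)
echo $m2[1], "\n";                          // MOVIES
```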
AyKay47 Posted February 11, 2012

"Category" in your pattern will never match "Type" in the string. Do you want to match anything between table cells? If so:

    $str = "Type</td><td valign='top' align=left>MOVIES</td>";
    $pattern = "~<td(?>[^>]+)>((?>[^<]+))</td>~";
    preg_match($pattern,$str,$ms);
    print_r($ms);

If not, please specify the requirements more thoroughly.
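A small sketch of what this generic pattern does on a whole row. `(?>...)` is an atomic group (once it has matched, the engine does not backtrack into it), and `[^>]+` requires at least one character after `<td`, so a bare `<td>` cell is skipped. The row is a made-up example, and the `[^>]*` variant is an addition for comparison, not something from the thread:

```php
<?php
// Made-up row: the label cell is a bare <td>, the value cell has attributes.
$row = '<tr><td>Category</td><td valign="top" align=left>MOVIES</td></tr>';

// AyKay47's pattern: [^>]+ needs at least one char after "<td",
// so "<td>Category</td>" is not matched.
$atomic = '~<td(?>[^>]+)>((?>[^<]+))</td>~';

// Variant (assumption, for comparison): * instead of +, so cells without
// attributes, and empty cells, are matched too.
$anyCell = '~<td[^>]*>([^<]*)</td>~';

preg_match_all($atomic, $row, $a);
preg_match_all($anyCell, $row, $b);

// $a[1] holds only the value cell; $b[1] holds both cells in order.
print_r($a[1]);
print_r($b[1]);
```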
htorbov (Author) Posted February 11, 2012

Thanks a lot, but I also want to include "Category" in the pattern, for example:

    $str = "Category</td><td valign='top' align=left>MOVIES</td>";
    $pattern = "/Category<td(?>[^>]+)>((?>[^<]+))<\/td>/s";
    preg_match($pattern,$str,$ms);
    print_r($ms);

But it gives me an empty array.

EDIT: I made a mistake in my first post: the label is not Type, it is also Category.
htorbov (Author) Posted February 11, 2012

So, in short, I want this:

    $str = "Category</td><td valign='top' align=left>MOVIES</td>";
    $pattern = "/Category<td(?>[^>]+)>((?>[^<]+))<\/td>/s";
    preg_match($pattern,$str,$ms);
    print_r($ms);

to return:

    Array ( [0] => MOVIES [1] => MOVIES )

But it returns:

    Array ( )
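Why that attempt returns an empty array, as a runnable sketch: the pattern expects `<td` immediately after `Category`, but the string has `</td><td` there. The sketch also shows that `$ms[0]` always holds the entire match and `$ms[1]` the first capture group, so `[0] => MOVIES` is not something this pattern can produce:

```php
<?php
$str = "Category</td><td valign='top' align=left>MOVIES</td>";

// Attempt from the post: "Category" must be followed directly by "<td",
// but in $str it is followed by "</td><td", so nothing matches.
$attempt = "/Category<td(?>[^>]+)>((?>[^<]+))<\/td>/s";
var_dump(preg_match($attempt, $str, $ms)); // int(0)
var_dump(count($ms));                      // int(0): $ms is an empty array

// Adding the missing "<\/td>" lets the pattern match.
$corrected = "/Category<\/td><td(?>[^>]+)>((?>[^<]+))<\/td>/s";
preg_match($corrected, $str, $ms);

echo $ms[0], "\n"; // whole match: Category</td><td valign='top' align=left>MOVIES</td>
echo $ms[1], "\n"; // first capture group: MOVIES
```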
AyKay47 Posted February 11, 2012

Ah, the pattern needs the closing </td> after Category:

    $str = "Category</td><td valign='top' align=left>MOVIES</td>";
    $pattern = "~Category</td><td(?>[^>]+)>((?>[^<]+))</td>~";
    preg_match($pattern,$str,$ms);
    print_r($ms);

$ms[1] will hold the captured value, in this case "MOVIES".
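Since the label text differs between pages in this thread ("Category" here, "Тип" on the real site), one option is to build the pattern from a variable. A sketch under assumptions: the function name `extractCellAfterLabel` is hypothetical, and allowing whitespace between the two cells (`\s*`) is a guess about the page layout. `preg_quote()` escapes any regex metacharacters in the label, plus the `~` delimiter:

```php
<?php
/**
 * Hypothetical helper (not from the thread): return the text of the <td>
 * that immediately follows a cell ending in $label, or null if none is found.
 */
function extractCellAfterLabel(string $html, string $label): ?string
{
    // Escape the label so characters like "." or "(" are taken literally.
    // \s* tolerates whitespace/newlines between the two cells (assumption).
    $pattern = '~' . preg_quote($label, '~')
             . '</td>\s*<td[^>]*>([^<]*)</td>~';

    return preg_match($pattern, $html, $ms) === 1 ? $ms[1] : null;
}

$html = "Category</td><td valign='top' align=left>MOVIES</td>";
var_dump(extractCellAfterLabel($html, 'Category')); // string(6) "MOVIES"
var_dump(extractCellAfterLabel($html, 'Type'));     // NULL
```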
htorbov (Author) Posted February 11, 2012

Hi, thanks a lot for the help. It works now, but only if I put it in a new file. Here's what happens when I use it in the crawler:

    // ... the cURL code (it's working) ...
    // Content of the page
    $contents = curl_exec($crawler->curl);
    // Find the title
    $pattern = "/<title>(.*?)<\/title>/s";
    preg_match($pattern, $contents, $titlematches);
    echo $titlematches[1]."<br/>";
    // Find the category
    $pattern = "~Тип</td><td(?>[^>]+)>((?>[^<]+))</td>~";
    preg_match($pattern, $contents, $categorymatches);
    echo $categorymatches[1]."<br/>";

The HTML page ("Тип" means Category and "Филми" means Movies):

    <title>The Matrix</title>
    <!-- Some code here -->
    <tr><td>Тип</td><td valign="top" align=left>Филми</td></tr>
    <!-- Some code here -->

The result:

    The Matrix
    Notice: Undefined offset: 1 in /var/www/spider.php on line 117

Very strange! It shows the title but not the category. I've tried to echo $categorymatches[0], $categorymatches[2] and $categorymatches[3] without any luck.
AyKay47 Posted February 11, 2012

It matches for me when tested:

    $str = '<td>Тип</td><td valign="top" align=left>Филми</td></tr>';
    $pattern = '~Тип</td><td(?>[^>]+)>((?>[^<]+))</td>~';
    preg_match($pattern,$str,$ms);
    print_r($ms);

Results:

    Array ( [0] => Тип</td><td valign="top" align=left>Филми</td> [1] => Филми )
htorbov (Author) Posted February 11, 2012

Yep, it works that way, but when the page is fetched with cURL, something gets messed up. Here's the charset header I send:

    Accept-Charset: windows-1251,utf-8;q=0.7,*;q=0.3

Could this be the problem? How can I see the cURL results?
AyKay47 Posted February 11, 2012

Most likely a charset issue. If you want to view the transfer result as a string, the CURLOPT_RETURNTRANSFER option needs to be set:

    $ch = curl_init("http://www.test.com");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
    $transf = curl_exec($ch);
    curl_close($ch);
    if($transf !== false) var_dump($transf);
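A sketch of one common fix for this kind of charset mismatch while keeping the script in UTF-8: convert the fetched page to UTF-8 before matching. Assumptions: the site serves Windows-1251 (which the Accept-Charset header and the eventual "ANSI" fix suggest), the page bytes are simulated rather than fetched, and `charsetFromContentType` is a hypothetical helper for reading the charset from the value `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)` returns:

```php
<?php
/**
 * Hypothetical helper: extract the charset from a Content-Type value such as
 * "text/html; charset=windows-1251". curl_getinfo($ch, CURLINFO_CONTENT_TYPE)
 * returns null when the server sent no valid Content-Type header.
 */
function charsetFromContentType(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('~charset=["\']?([\w-]+)~i', $contentType, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Simulated cURL result: the row from the page encoded as Windows-1251
// ("Тип" = D2 E8 EF, "Филми" = D4 E8 EB EC E8).
$contents = "<tr><td>\xD2\xE8\xEF</td><td valign=\"top\" align=left>"
          . "\xD4\xE8\xEB\xEC\xE8</td></tr>";

// This file is saved as UTF-8, so 'Тип' here is the bytes D0 A2 D0 B8 D0 BF.
$pattern = '~Тип</td><td(?>[^>]+)>((?>[^<]+))</td>~';

var_dump(preg_match($pattern, $contents)); // int(0): the bytes differ

// Fall back to Windows-1251 if no charset is declared (assumption).
$charset = charsetFromContentType('text/html; charset=windows-1251') ?? 'Windows-1251';
$utf8 = iconv($charset, 'UTF-8', $contents);

if ($utf8 !== false && preg_match($pattern, $utf8, $m) === 1) {
    echo $m[1], "\n"; // Филми
}
```

In a real crawler the literal Content-Type string would be replaced by the `curl_getinfo()` result taken before `curl_close()`.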
htorbov (Author) Posted February 11, 2012

I'll check it now.
AyKay47 Posted February 11, 2012

Alright. Wherever you call curl_exec, set it up like the code I posted: store the result in a variable, then var_dump() it.
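Because var_dump() of text in an unexpected encoding just prints garbled characters, dumping raw bytes with bin2hex() can make a mismatch like this one obvious. A sketch with simulated page bytes, assuming the page is Windows-1251 encoded:

```php
<?php
// This file is saved as UTF-8, so the literal below is UTF-8 bytes.
$labelInScript = 'Тип';

// Simulated cURL result (assumption: the page is Windows-1251 encoded).
$contents = "<tr><td>\xD2\xE8\xEF</td><td valign=\"top\" align=left>"
          . "\xD4\xE8\xEB\xEC\xE8</td></tr>";

echo bin2hex($labelInScript), "\n"; // d0a2d0b8d0bf

// Dump the bytes of the first cell's text: everything between "<td>" and "</td>".
$start = strpos($contents, '<td>') + strlen('<td>');
$end   = strpos($contents, '</td>', $start);
echo bin2hex(substr($contents, $start, $end - $start)), "\n"; // d2e8ef

// Different bytes, so a plain substring search (and likewise the regex) fails.
var_dump(strpos($contents, $labelInScript)); // bool(false)
```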
htorbov (Author) Posted February 11, 2012

OMG!!! I just had to convert the .php file to ANSI! Now it's working!! Thanks a lot for the help, I'm very happy it works now! Thanks again.
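Saving the script as "ANSI" presumably works because, on a Cyrillic-locale Windows, that means Windows-1251, so the literal `Тип` in the pattern ends up as the same bytes the page uses. An alternative sketch that keeps the file in UTF-8 and converts the label at runtime instead; the Windows-1251 page bytes are simulated, and treating the site as Windows-1251 is an assumption based on this thread:

```php
<?php
// Simulated Windows-1251 page ("Тип" = D2 E8 EF, "Филми" = D4 E8 EB EC E8).
$contents = "<tr><td>\xD2\xE8\xEF</td><td valign=\"top\" align=left>"
          . "\xD4\xE8\xEB\xEC\xE8</td></tr>";

// Convert the UTF-8 literal to Windows-1251 bytes, which is what re-saving
// the file as "ANSI" effectively does to it.
$label   = iconv('UTF-8', 'Windows-1251', 'Тип');
$pattern = '~' . preg_quote($label, '~') . '</td><td(?>[^>]+)>((?>[^<]+))</td>~';

if (preg_match($pattern, $contents, $m) === 1) {
    // The capture is still Windows-1251; convert it back to UTF-8 for output.
    echo iconv('Windows-1251', 'UTF-8', $m[1]), "\n"; // Филми
}
```

Converting the whole page to UTF-8 once is usually simpler when many fields are extracted; converting only the label keeps the page bytes untouched.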