I have a small preg_match problem in my new torrent crawler (using cURL and preg_match):

 

crawler.php

$pattern = "/Category<\/td><td[^>]>(.*?)<\/td>/s";
preg_match($pattern, $contents, $categorymatches);
if(!isset($categorymatches[1])) {
echo "FAILED TO MATCH CATEGORY!";
return FALSE;
}

 

Returns:

FAILED TO MATCH CATEGORY!

 

HTML that needs to be crawled (the word MOVIES):

Type</td><td valign="top" align=left>MOVIES</td>

 

Any help is appreciated!

 


"Category" in your pattern will never match "Type" in the string.

 

Do you want to match anything in between table columns?

 

If so...

 

$str = "Type</td><td valign='top' align=left>MOVIES</td>";
$pattern = "~<td(?>[^>]+)>((?>[^<]+))</td>~";
preg_match($pattern,$str,$ms);
print_r($ms);

 

If not, specify the requirements more thoroughly.
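
For what it's worth, the pattern from the first post would also fail when the label text is right: `[^>]` without a quantifier matches exactly one character, so `<td[^>]>` cannot get past a tag that carries several attributes. A minimal sketch, assuming the HTML fragment from the question with the label changed to Category (the variable names are made up for illustration):

```php
<?php
// Fragment from the question, with the label assumed to be "Category".
$html = 'Category</td><td valign="top" align=left>MOVIES</td>';

// Original pattern: [^>] matches exactly ONE character before the ">".
$strict = "/Category<\/td><td[^>]>(.*?)<\/td>/s";

// Same pattern with a * quantifier, so any run of attribute characters is allowed.
$relaxed = "/Category<\/td><td[^>]*>(.*?)<\/td>/s";

$strictHit  = preg_match($strict, $html, $strictMatches);
$relaxedHit = preg_match($relaxed, $html, $relaxedMatches);

var_dump($strictHit);          // int(0)
var_dump($relaxedMatches[1]);  // string(6) "MOVIES"
```

The only difference between the two patterns is the `*` after `[^>]`.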

Thanks a lot, but I also want to include "Category" in the pattern, for example:

$str = "Category</td><td valign='top' align=left>MOVIES</td>";
$pattern = "/Category<td(?>[^>]+)>((?>[^<]+))<\/td>/s";
preg_match($pattern,$str,$ms);
print_r($ms);

 

But it gives me an empty array..

 

EDIT: I made a mistake in my last post: it's not Type, it's also Category :)

So, in a few words, I want this:

$str = "Category</td><td valign='top' align=left>MOVIES</td>";
$pattern = "/Category<td(?>[^>]+)>((?>[^<]+))<\/td>/s";
preg_match($pattern,$str,$ms);
print_r($ms);

 

to return:

Array ( [0] => MOVIES [1] => MOVIES )

 

But it returns:

Array ( )

 


ah,

 

$str = "Category</td><td valign='top' align=left>MOVIES</td>";
$pattern = "~Category</td><td(?>[^>]+)>((?>[^<]+))</td>~";
preg_match($pattern,$str,$ms);
print_r($ms);

 

$ms[1] will hold the captured value, in this case "MOVIES".
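
If the same lookup is needed for several labels, it could be wrapped in a small function that returns null instead of triggering an undefined-offset notice when nothing matches. This is only a sketch: `cellAfterLabel` is a hypothetical helper, not something from this thread, and it is a slight variation on the pattern above (it uses `[^>]*` so a bare `<td>` also works, and allows whitespace between the cells). The nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Return the text of the <td> that follows a cell ending in $label,
 * or null when nothing matches. Hypothetical helper for illustration.
 */
function cellAfterLabel(string $html, string $label): ?string
{
    // preg_quote() escapes regex metacharacters in the label, including the "~" delimiter.
    $pattern = '~' . preg_quote($label, '~') . '</td>\s*<td[^>]*>([^<]+)</td>~';

    if (preg_match($pattern, $html, $m) === 1) {
        return trim($m[1]);
    }
    return null; // no match (or a regex error): the caller decides what to do
}

$str = "Category</td><td valign='top' align=left>MOVIES</td>";
var_dump(cellAfterLabel($str, 'Category')); // string(6) "MOVIES"
var_dump(cellAfterLabel($str, 'Size'));     // NULL
```

Checking the return value of preg_match before touching index 1 is the part that matters; the rest is a matter of taste.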

Hi, thanks a lot for the help. It works now, but only if I put it in a new file.

 

Here's what happens when I use it in the crawler:

// ... the cURL code (it's working) ...
// Content of the Page
$contents = curl_exec($crawler->curl);

// Find the Title
$pattern = "/<title>(.*?)<\/title>/s";
preg_match($pattern, $contents, $titlematches);
echo $titlematches[1]."<br/>";

// Find the Category
$pattern = "~Тип</td><td(?>[^>]+)>((?>[^<]+))</td>~";
preg_match($pattern, $contents, $categorymatches);
echo $categorymatches[1]."<br/>";

 

The HTML page: ("Тип" means Category and "Филми" means Movies)

<title>The Matrix</title>
<!--Some Codes Here--!>
<tr><td>Тип</td><td valign="top" align=left>Филми</td></tr>
<!--Some Codes Here--!>

 

The result:

The Matrix
Notice: Undefined offset: 1 in /var/www/spider.php on line 117

 

Very strange! It's showing the title but not the category..

I've tried to echo $categorymatches[0], $categorymatches[2], $categorymatches[3] without any luck.

 

It matches for me when tested:

 

$str = '<td>Тип</td><td valign="top" align=left>Филми</td></tr>';
$pattern = '~Тип</td><td(?>[^>]+)>((?>[^<]+))</td>~';
preg_match($pattern,$str,$ms);
print_r($ms);

 

results:

 

Array
(
    [0] => Тип</td><td valign="top" align=left>Филми</td>
    [1] => Филми
)

Yep, it works that way, but when the page content is fetched with cURL, the match fails, so I suspect cURL is the culprit.

Here's my charset: Accept-Charset: windows-1251,utf-8;q=0.7,*;q=0.3

Could that be the problem?

How can I see the cURL results?

 

Most likely a charset issue. If you want to view the transfer results as a string, the CURLOPT_RETURNTRANSFER option needs to be set.

 

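$ch = curl_init("http://www.test.com");
// Return the response body as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$transf = curl_exec($ch);
curl_close($ch);
// curl_exec() returns false on failure.
if ($transf !== false) {
    var_dump($transf);
}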
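
To illustrate the charset theory: if the page is served as windows-1251 while the pattern in the script is saved as UTF-8, the Cyrillic label is a different byte sequence on each side and cannot match. The sketch below assumes exactly that (the Accept-Charset header alone does not prove how the page is encoded) and assumes the iconv extension is available. The cURL response is simulated with iconv so the example runs without a network; `$contents` stands in for the result of curl_exec(), and the pattern is a simplified version of the one above.

```php
<?php
// Assumption: this source file is saved as UTF-8, so the literals below are UTF-8 bytes.
$pattern  = '~Тип</td><td[^>]*>([^<]+)</td>~';
$utf8Html = '<tr><td>Тип</td><td valign="top" align=left>Филми</td></tr>';

// Simulate what cURL would return from a windows-1251 page (stand-in for curl_exec()).
$contents = iconv('UTF-8', 'WINDOWS-1251', $utf8Html);

// The UTF-8 pattern does not match the windows-1251 bytes.
$before = preg_match($pattern, $contents, $m1);

// Convert the response to UTF-8 first, then match.
$converted = iconv('WINDOWS-1251', 'UTF-8', $contents);
$after     = preg_match($pattern, $converted, $m2);

var_dump($before); // int(0)
var_dump($after);  // int(1)

// Dumping the raw bytes of the label makes the encoding visible:
// "Тип" is d2e8ef in windows-1251 and d0a2d0b8d0bf in UTF-8.
echo bin2hex(substr($contents, 8, 3)), "\n"; // d2e8ef
```

In the real crawler the conversion would go right after curl_exec(), before any preg_match call. A hex dump like the last line is also a quick way to check which encoding the server actually sent.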
