
PHP Web Scrape Problem?


wwfc_barmy_army


Hello.

 

I am trying out web scraping. My test page has a div with the class 'total_price'.

 

I am using the Simple HTML DOM Parser with this code:

// Create DOM from URL or file
$html = file_get_html('**My Test Page**');

$ret = $html->find('.total_price');

print_r($ret);

 

However, it seems to get stuck in some kind of loop, and I get a huge amount of output:

Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => p [attr] => Array ( [class] => total_price ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => £44.99 ) [dom:private] => simple_html_dom Object ( [root] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 6 [tag] => unknown [attr] =>  Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [0] => 2 [4] => ) [dom:private] => simple_html_dom Object *RECURSION* ) [1] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => html [attr] =>  Array ( [xmlns] => http://www.w3.org/1999/xhtml [lang] => en-GB ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => head [attr] => Array ( [id] => ctl00_HtmlHead1_Head1 ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => script [attr] => Array ( [type] => text/javascript ) [children]

..etc...etc...

 

At one point the output does include the value I am trying to get (the price):

Array ( [4] => £44.99 )
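
The dump above is most likely not a real loop. print_r() follows every property of the object it is given, and the output shows each node holding a [parent] and a [dom:private] reference, so printing one node ends up printing most of the document, with *RECURSION* wherever it meets an object it is already printing. Here is a minimal sketch of that effect. TreeNode is a hypothetical stand-in written for this example, not Simple HTML DOM's real class:

```php
<?php
// Hypothetical stand-in for a parsed DOM node (NOT Simple HTML DOM's class).
class TreeNode
{
    public $tag;
    public $text = '';
    public $parent = null;   // back-reference to the parent node
    public $children = [];

    public function __construct(string $tag, string $text = '')
    {
        $this->tag = $tag;
        $this->text = $text;
    }

    public function append(TreeNode $child): void
    {
        $child->parent = $this;      // this creates the circular reference
        $this->children[] = $child;
    }
}

$root = new TreeNode('body');
$price = new TreeNode('div', '£44.99');
$root->append($price);
$root->append(new TreeNode('div', 'FOOTER'));

// Dumping the single price node also walks up to its parent and back down
// through every sibling, so the footer shows up in the dump as well.
$dump = print_r($price, true);

// Reading only the field that is needed avoids the dump entirely.
$value = $price->text;
echo $value, PHP_EOL;
```

With Simple HTML DOM itself, the equivalent is to read a property of the found element instead of dumping it: find() takes an optional index as its second argument, so something like $html->find('.total_price', 0)->plaintext should return only the text.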

 

Can anyone see what I'm doing wrong?

 

Thanks.


My test page is simply:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<h1>My test page</h1>
<div id="header"></div>
<div class="total_price">£44.99</div>
<div id="footer">FOOTER</div>
</body>
</html>

 

Any ideas?

 

Thanks.
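
For comparison, here is a rough sketch that pulls the price out of the test page using PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM. The page is embedded as a string with the doctype left out, findTextByClass() is a helper name made up for this example, and the class name is assumed to contain no quote characters:

```php
<?php
// Copy of the test page (doctype omitted), embedded as a string for the sketch.
$page = <<<'HTML'
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>My test page</h1>
<div id="header"></div>
<div class="total_price">£44.99</div>
<div id="footer">FOOTER</div>
</body>
</html>
HTML;

// Hypothetical helper: returns the trimmed text of every element whose
// class attribute contains $class as a whole word.
function findTextByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Common XPath idiom for matching one class among several.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$prices = findTextByClass($page, 'total_price');
print_r($prices);   // an array of plain strings, so the dump stays small
```

Because the result is an array of strings rather than node objects, print_r() has no parent or document references to follow.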
