PHP Web Scrape Problem?

wwfc_barmy_army · July 12, 2010

Hello.

I am trying out web scraping. My test page has a div with the class 'total_price'.

I am using the Simple HTML DOM Parser with this code:

// Create DOM from URL or file
$html = file_get_html('**My Test Page**');

$ret = $html->find('.total_price');

print_r($ret);

However, it seems to get stuck in some kind of loop, and I get a lot of output text:

Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => p [attr] => Array ( [class] => total_price ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => Â£44.99 ) [dom:private] => simple_html_dom Object ( [root] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 6 [tag] => unknown [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [0] => 2 [4] => ) [dom:private] => simple_html_dom Object *RECURSION* ) [1] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => html [attr] => Array ( [xmlns] => http://www.w3.org/1999/xhtml [lang] => en-GB ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => head [attr] => Array ( [id] => ctl00_HtmlHead1_Head1 ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => script [attr] => Array ( [type] => text/javascript ) [children]

..etc...etc...

At one point it does return the value I am trying to get (the price):

Array ( [4] => Â£44.99 )

Can anyone see what I'm doing wrong?

Thanks.

.josh · July 12, 2010

Not really enough info. My best guess is that your test page's HTML is so malformed that the parser fails to parse it correctly.

wwfc_barmy_army · July 12, 2010

My test page is simply:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<h1>My test page</h1>
<div id="header"></div>
<div class="total_price">£44.99</div>
<div id="footer">FOOTER</div>
</body>
</html>

Any ideas?

Thanks.

.josh · July 12, 2010

Okay, well, what does file_get_html() look like?
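
For what it's worth, the output in the first post may not be a loop at all. print_r() walks every property of an object, and each simple_html_dom_node holds references to its parent and to the whole document, so printing one node ends up dumping the entire tree (the *RECURSION* markers are print_r() noting an object it has already visited). Reading the text of the matched node, rather than printing the node object, should avoid that; in Simple HTML DOM that would be something like $html->find('.total_price', 0)->plaintext. The "Â£" looks like a UTF-8 pound sign being displayed as Latin-1, which a charset header usually fixes. Below is a minimal sketch of the same idea using PHP's built-in DOM extension instead of Simple HTML DOM, so it runs without the library. The $page string is a stand-in for the test page above, and the variable names are my own assumptions.

```php
<?php
// Tell the browser the output is UTF-8 so the pound sign is not shown as "Â£".
// (Under the command line this call has no visible effect.)
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Stand-in for the test page posted above. In practice this string would
// come from file_get_contents() on the real URL.
$page = <<<'HTML'
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>My test page</h1>
<div id="header"></div>
<div class="total_price">£44.99</div>
<div id="footer">FOOTER</div>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// Match any element whose class attribute contains the word "total_price".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' total_price ')]"
);

// Keep only the text of each match, not the node objects themselves.
$prices = [];
foreach ($nodes as $node) {
    $prices[] = trim($node->textContent);
}

foreach ($prices as $price) {
    echo $price, "\n";
}
```

The point is only that echoing the node's text gives the price on its own, whereas print_r() on the node object drags the rest of the document along with it.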
