
PHP Web Scrape Problem?


wwfc_barmy_army


Hello.

 

I am trying out web scraping. My test page has a div with the class 'total_price'.

 

I am using the Simple HTML DOM Parser with this code:

// Create DOM from URL or file
$html = file_get_html('**My Test Page**');

$ret = $html->find('.total_price');

print_r($ret);

 

However, it seems to get stuck in some kind of loop, and I get a lot of output text:

Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => p [attr] => Array ( [class] => total_price ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => £44.99 ) [dom:private] => simple_html_dom Object ( [root] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 6 [tag] => unknown [attr] =>  Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [0] => 2 [4] => ) [dom:private] => simple_html_dom Object *RECURSION* ) [1] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => html [attr] =>  Array ( [xmlns] => http://www.w3.org/1999/xhtml [lang] => en-GB ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => head [attr] => Array ( [id] => ctl00_HtmlHead1_Head1 ) [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => script [attr] => Array ( [type] => text/javascript ) [children]

..etc...etc...

 

At one point the output does contain the value I am trying to get (the price):

Array ( [4] => £44.99 )

 

Can anyone see what I'm doing wrong?

 

Thanks.


My test page is simply:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<h1>My test page</h1>
<div id="header"></div>
<div class="total_price">£44.99</div>
<div id="footer">FOOTER</div>
</body>
</html>

 

Any ideas?

 

Thanks.
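
For what it's worth, the output above is probably not a loop. print_r dumps every property of each simple_html_dom_node, and each node holds references to its parent and to the whole DOM object, so printing one match ends up printing most of the document (hence the *RECURSION* markers). With Simple HTML DOM the usual approach is to ask for a single element and echo only its text, for example $html->find('.total_price', 0)->plaintext, after checking that the result is not null. The sketch below shows the same idea using only PHP's bundled DOM extension, so it runs without extra files. The inline $page string is a trimmed stand-in for the test page, and the variable names are illustrative, not from the original code.

```php
<?php
// Trimmed stand-in for the test page posted above (assumption: the real
// page would be fetched from a URL or read from a file instead).
$page = <<<'HTML'
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>My test page</h1>
<div id="header"></div>
<div class="total_price">£44.99</div>
<div id="footer">FOOTER</div>
</body>
</html>
HTML;

// Parse the markup; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// Select every element whose class list contains "total_price".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' total_price ')]"
);

// Print only the text of the first match, not the whole node object.
$price = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $price, PHP_EOL;
```

The point in either library is the same: pick one node and print its text rather than passing the node object to print_r.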
