Read link text between div tags

suggys · February 9, 2013

Hi guys

I am attempting to create my own price checker feature for a site in a small niche. I am trying to do it myself as I am on a budget and the feature won't be used for profit, just to enhance my users' experience.

My aim is to have my script check the URLs in my database on a weekly basis for any price changes etc. At the moment I've got it working pretty much the way I want using preg_match, but I am stuck on the following.

I am trying to read the link text that's within a div.

<div class="ProductPageNav">
  <a href='Categories.asp'>Our Products</a>: <a href=COMPONENTS.htm' onmouseover="javascript:document.getCatPre.idcategory.value='40'; CatPrecallxml='1'; return runPreCatXML('cat_40');" onmouseout="javascript: CatPrecallxml=''; hidetip();">COMPONENTS</a> > <a href=c42.htm' onmouseover="javascript:document.getCatPre.idcategory.value='42'; CatPrecallxml='1'; return runPreCatXML('cat_42');" onmouseout="javascript: CatPrecallxml=''; hidetip();">Small Parts</a>
 </div>

There may be more or fewer links within this div. Could someone help me create the code to read each link and insert them into a field in my database, separated by commas?

So in this example, COMPONENTS, Small Parts should be extracted from the div and put in the database.

I would like it to ignore Our Products, though.

Is this possible?

Thank you

doddsey_65 · February 10, 2013

$str = '<div class="ProductPageNav">
<a href="Categories.asp">Our Products</a>: <a href="COMPONENTS.htm" onmouseover="javascript:document.getCatPre.idcategory.value="40"; CatPrecallxml="1"; return runPreCatXML("cat_40");" onmouseout="javascript: CatPrecallxml=""; hidetip();">COMPONENTS</a> > <a href="c42.htm" onmouseover="javascript:document.getCatPre.idcategory.value="42"; CatPrecallxml="1"; return runPreCatXML("cat_42");" onmouseout="javascript: CatPrecallxml=""; hidetip();">Small Parts</a>
</div>';

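// Match each <a ...>text</a>; [^>]* skips all attributes, so the quoting style does not matter
$regex = '<a\b[^>]*>(.*?)<\/a>';

preg_match_all('/' . $regex . '/is', $str, $matches);

// $matches[1] holds the link texts
var_dump($matches[1]);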

Then you can just pick the ones you want to insert into MySQL.
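
As a rough sketch of the "pick the ones you want" step, the snippet below takes the captured link texts, drops "Our Products", and joins the rest with a comma. The sample markup is a trimmed, hypothetical version of the HTML in the question, and the variable names `$labels` and `$categories` are only illustrative.

```php
<?php
// Hypothetical sample input, trimmed from the markup in the question.
$str = '<div class="ProductPageNav">'
     . '<a href="Categories.asp">Our Products</a>: '
     . '<a href="COMPONENTS.htm">COMPONENTS</a> &gt; '
     . '<a href="c42.htm">Small Parts</a>'
     . '</div>';

// Capture the text of every anchor, whatever its attributes are.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $str, $matches);

// Tidy the captured texts, then drop the entry that should be ignored.
$labels = array_map('trim', $matches[1]);
$labels = array_values(array_diff($labels, array('Our Products')));

// One comma-separated string, ready to store in a single column.
$categories = implode(', ', $labels);

echo $categories, PHP_EOL;
```

The resulting string can then be written to the database with whatever insert code the script already uses, ideally through a prepared statement.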

Edited February 10, 2013 by doddsey_65

Zane · February 10, 2013

PHP's DOMDocument class is the most effective method for scraping... unless of course you just LOOOVE regex that much... I would imagine not.

Something like this should get you started

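$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from malformed HTML
$doc->loadHTMLFile('http://somewebsitepage.wtf'); // load() expects XML; loadHTMLFile() parses HTML

$xpath = new DOMXPath($doc);
// Select every <a> that is a direct child of the ProductPageNav div
$nodes = $xpath->query("//div[@class='ProductPageNav']/a");

foreach ($nodes as $node) {
   echo $node->nodeValue; // nodeValue is a property, not a method
}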
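
For the comma-separated result asked for in the question, the DOMDocument approach could be extended roughly as follows. This is only a sketch: it parses a hypothetical HTML string with `loadHTML()` rather than fetching a live page, and `$labels` and `$categories` are illustrative names.

```php
<?php
// Hypothetical sample input; in practice this would be the fetched page source.
$html = '<html><body><div class="ProductPageNav">'
      . '<a href="Categories.asp">Our Products</a>: '
      . '<a href="COMPONENTS.htm">COMPONENTS</a> &gt; '
      . '<a href="c42.htm">Small Parts</a>'
      . '</div></body></html>';

libxml_use_internal_errors(true); // real-world HTML is rarely valid
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//div[@class='ProductPageNav']/a");

// Collect the text of each link, skipping the one that should be ignored.
$labels = array();
foreach ($nodes as $node) {
    $text = trim($node->textContent);
    if ($text !== '' && $text !== 'Our Products') {
        $labels[] = $text;
    }
}

// Join into a single comma-separated string for the database field.
$categories = implode(', ', $labels);

echo $categories, PHP_EOL;
```

Filtering on the link text is the simplest option here; if the first link is always the one to skip, dropping the first node would work as well.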

suggys · February 10, 2013

Thank you guys, I will give this a go tonight and let you know how I get on.
