
Read link text between div tags


suggys


Hi guys

 

I am attempting to create my own price checker feature for a site in a small niche. I am trying to do it myself as I am on a budget, and the feature won't be used for profit, just to enhance my users' experience.

 

My aim is to have my script check the URLs in my database on a weekly basis for any price changes etc. At the moment I've got it working pretty much the way I want using preg_match, but I am stuck on the following.

 

I am trying to read the link text that's within a div.

 

<div class="ProductPageNav">
  <a href='Categories.asp'>Our Products</a>: <a href='COMPONENTS.htm' onmouseover="javascript:document.getCatPre.idcategory.value='40'; CatPrecallxml='1'; return runPreCatXML('cat_40');" onmouseout="javascript: CatPrecallxml=''; hidetip();">COMPONENTS</a> > <a href='c42.htm' onmouseover="javascript:document.getCatPre.idcategory.value='42'; CatPrecallxml='1'; return runPreCatXML('cat_42');" onmouseout="javascript: CatPrecallxml=''; hidetip();">Small Parts</a>
 </div>

 

There may be more or fewer links within this div. Could someone help me create the code to read each link's text and insert the results into a field in my database, separated by commas?

 

So in this example, "COMPONENTS, Small Parts" should be extracted from the div and put in the database.

 

I would like it to ignore "Our Products" though.

 

Is this possible?

 

Thank you :)


$str = '<div class="ProductPageNav">
<a href="Categories.asp">Our Products</a>: <a href="COMPONENTS.htm" onmouseover="javascript:document.getCatPre.idcategory.value="40"; CatPrecallxml="1"; return runPreCatXML("cat_40");" onmouseout="javascript: CatPrecallxml=""; hidetip();">COMPONENTS</a> > <a href="c42.htm" onmouseover="javascript:document.getCatPre.idcategory.value="42"; CatPrecallxml="1"; return runPreCatXML("cat_42");" onmouseout="javascript: CatPrecallxml=""; hidetip();">Small Parts</a>
</div>';

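// Match any <a> tag regardless of its attributes; capture group 1 is the link text
$regex = '<a\b[^>]*>(.*?)<\/a>';

preg_match_all('/' . $regex . '/is', $str, $matches);

// $matches[0] holds the full tags, $matches[1] the link texts
var_dump($matches);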

 

Then you can just pick the link texts you want from $matches and insert them into MySQL.
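To make the "pick the ones you want" step concrete, here is a minimal sketch of how the link texts could be filtered and joined into one comma-separated string. The function name breadcrumbText() and the $skip list are made up for illustration, and the sample markup is a shortened copy of the div from the question. It assumes the div is written exactly as `<div class="ProductPageNav">` and contains no nested divs.

```php
<?php
// Hypothetical helper (not from the thread): returns the link texts inside the
// ProductPageNav div as one comma-separated string, skipping unwanted labels.
function breadcrumbText(string $html, array $skip = ['Our Products']): string
{
    // Isolate the div first so links elsewhere on the page are ignored.
    if (!preg_match('/<div class="ProductPageNav">(.*?)<\/div>/is', $html, $div)) {
        return '';
    }

    // Match every <a ...>text</a>, whatever attributes the tag carries.
    preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $div[1], $matches);

    $labels = [];
    foreach ($matches[1] as $label) {
        // Drop any inner markup and surrounding whitespace from the link text.
        $label = trim(html_entity_decode(strip_tags($label)));
        if ($label !== '' && !in_array($label, $skip, true)) {
            $labels[] = $label;
        }
    }

    return implode(', ', $labels);
}

// Shortened sample of the markup from the question (assumed structure).
$str = '<div class="ProductPageNav">
<a href="Categories.asp">Our Products</a>: <a href="COMPONENTS.htm" onmouseover="hidetip();">COMPONENTS</a> > <a href="c42.htm">Small Parts</a>
</div>';

echo breadcrumbText($str), PHP_EOL; // COMPONENTS, Small Parts
```

The returned string is what would go into the database field, ideally through a prepared statement rather than by concatenating it into the SQL.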

Edited by doddsey_65

PHP's DOMDocument class is usually a more robust way to scrape HTML than regex... unless of course you just LOOOVE regex that much, which I would imagine you don't.

 

Something like this should get you started:

 

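$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep warnings about sloppy HTML out of the output
$doc->loadHTMLFile('http://somewebsitepage.wtf'); // parse the page as HTML rather than strict XML

$xpath = new DOMXPath($doc);
// Select every <a> that is a direct child of the ProductPageNav div
$nodes = $xpath->query("//div[@class='ProductPageNav']/a");

foreach ($nodes as $node) {
   echo $node->nodeValue, "\n"; // print the link text
}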
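Building on the DOMDocument approach, here is a rough sketch of how the link texts could be collected into the comma-separated string the original question asks for, leaving out "Our Products". It assumes "Our Products" is always the first link in the div, and it parses a shortened copy of the sample markup from a string instead of fetching a live page. The variable names $labels and $categories are made up for illustration.

```php
<?php
// Shortened sample of the markup from the question (assumed structure).
$html = '<div class="ProductPageNav">
  <a href="Categories.asp">Our Products</a>: <a href="COMPONENTS.htm">COMPONENTS</a> &gt; <a href="c42.htm">Small Parts</a>
</div>';

libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// position() > 1 skips the first link in the div (assumed to be "Our Products").
$nodes = $xpath->query("//div[@class='ProductPageNav']/a[position() > 1]");

$labels = [];
foreach ($nodes as $node) {
    $labels[] = trim($node->nodeValue);
}

// One string, ready to be stored in a single database field.
$categories = implode(', ', $labels);
echo $categories, PHP_EOL; // COMPONENTS, Small Parts
```

If the first link is not guaranteed to be "Our Products", filtering by text with a[normalize-space(.) != 'Our Products'] in the XPath expression would be the safer variant.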
