jk2010 Posted January 2, 2010

Hi guys, how can I use preg_match (or any other condition) when trying to extract information from an HTML page? I've tried preg_match, but I get an error because I'm using it inside an XPath query. This is my query code:

$prod_quicklinx_node = $xpath->query('//table[@class=maintbl]/descendant::div/td[@class=textblacksmblue]');
$prod_mfr_node = $xpath->query('//table[@class=maintbl]/descendant::td[@class=textblacksm] and not(contains(@*, "Manufacturer"))');

I don't get any results.

Source code of the HTML (I only need the items wrapped in [b]...[/b]: the code, the manufacturer name and the manufacturer part number):

<tbody><tr>
<td align="left" nowrap="nowrap">
<a href="javascript:MM_openBrWindow('http://img.misco.co.uk/images/uploadedimages/large/20091027141548.jpg','LargeImage','scrollbars=no,width=350,height=350')" class="details">
<img src="/images/itemdetails/icon-enlarge.gif" alt="" align="absmiddle" border="0" hspace="0"></a>
<!--<img src="http://img3.misco.co.uk/images/misc/pixel-clr.gif" width="2" height="1" alt="">-->
<a href="/applications/email/emailafriend.asp">
<img src="/images/itemdetails/icon-email.gif" alt="" align="absmiddle" border="0" hspace="6"></a><a href="http://www.misco.co.uk/applications/SearchTools/item-details-print.asp?EdpNo=336830&Sku=Q151273"><img src="/images/itemdetails/icon-print.gif" alt="" align="absmiddle" border="0" hspace="4"></a>
</td>
<td width="44" align="right" valign="top"><img src="http://img1.misco.co.uk/images/itemdetails/itemtitle_yellowleft.gif" alt="" width="44" height="24"></td>
<td style="background-image: url(http://img.misco.co.uk/images/itemdetails/itemtitle_yellow_bg.gif); background-repeat: repeat-x;" class="textblackmed" width="340" valign="middle">
<table width="100%" border="0" cellpadding="0" cellspacing="0" height="18">
<tbody><tr valign="top">
<td class="textblacksm" width="35" nowrap="nowrap">Misco No: </td>
[b]<td class="textblacksmblue" width="40%"><b>Q151273</b></td>[/b]
<td align="right" nowrap="nowrap">
<table border="0" cellpadding="3" cellspacing="0">
<tbody><tr valign="top">
<td><div style="position: relative; top: -3px;"><a href="javascript:void(0);" onclick="postReview();" alt="Add Review" style="font-size: 12px;">
<img src="/images/itemdetails/ADD_REVI.GIF" alt="Add Review" border="0">
</a></div></td></tr></tbody></table></td></tr>
<!--</td> </tr>-->
</tbody></table>
</td>
<td width="18" align="right" valign="top"><img src="http://img3.misco.co.uk/images/itemdetails/itemtitle_yellowright1.gif" alt="" width="18" height="24"></td>
</tr>
<tr> <td></td> <td></td> <td align="left"> </td> <td></td> </tr>
<tr> <td></td> <td></td> <td align="left">
<table>
<tbody><tr><td class="textblacksm" width="110">Manufacturer:</td>[b]<td class="textblacksm" nowrap="nowrap"> <strong>Canon </strong> </td>[/b]</tr>
<tr><td class="textblacksm" width="110">Manufacturer Part No:</td>[b]<td class="textblacksm" nowrap="nowrap"> <strong>2925B008AA </strong> </td>[/b] </tr>
</tbody></table>
</td> <td></td> </tr>
</tbody>

Thanks for any help you can provide.
Cheers,
jari

https://forums.phpfreaks.com/topic/186927-extract-specific-text-from-html-elements-using-xpath-help-needed/
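A likely reason both queries above return nothing, plus one way to call preg_match from inside an XPath query, sketched with PHP's DOM extension. Assumptions: the outer table really carries class="maintbl" (the pasted snippet doesn't show it), and the markup is cut down to the rows of interest. DOMXPath::registerPhpFunctions() and php:functionString() are part of PHP's DOM extension; for a plain substring test, XPath 1.0's own contains() and starts-with() are usually enough and need no PHP callback.

```php
<?php
// Cut-down version of the markup from the question. ASSUMPTION: the outer
// table carries class="maintbl", as the original queries imply.
$html = <<<'HTML'
<table class="maintbl"><tr><td>
  <table><tr>
    <td class="textblacksm">Misco No: </td>
    <td class="textblacksmblue"><b>Q151273</b></td>
  </tr></table>
  <table>
    <tr><td class="textblacksm">Manufacturer:</td><td class="textblacksm"> <strong>Canon </strong> </td></tr>
    <tr><td class="textblacksm">Manufacturer Part No:</td><td class="textblacksm"> <strong>2925B008AA </strong> </td></tr>
  </table>
</td></tr></table>
HTML;

libxml_use_internal_errors(true); // scraped HTML is rarely valid; keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Fix 1: quote attribute values. Unquoted, maintbl is read as a child element
// name, so @class=maintbl compares against an empty node-set and never matches.
// descendant::div/td is also dropped: the td is not a child of any div.
$code = $xpath->query('//table[@class="maintbl"]//td[@class="textblacksmblue"]/b')
    ->item(0)->nodeValue;

// Fix 2: extra conditions go inside a [...] predicate. Appended with "and",
// the whole expression becomes a boolean instead of a node-set. contains()
// is applied to "." (the cell text) because @* only looks at attribute values.
$values = [];
$query = '//table[@class="maintbl"]//td[@class="textblacksm"][not(contains(., "Manufacturer"))]';
foreach ($xpath->query($query) as $td) {
    $values[] = trim($td->nodeValue);
}
// $values still includes the "Misco No:" label, which shares the class.

// preg_match inside XPath: register the "php" namespace, whitelist the
// function, then call it through php:functionString(). Compare with "= 1"
// explicitly: a bare non-boolean result is not a reliable true/false test
// inside a predicate.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');
$mfr = [];
$query = '//td[php:functionString("preg_match", "/^Manufacturer( Part No)?:$/", normalize-space(.)) = 1]'
       . '/following-sibling::td[1]/strong';
foreach ($xpath->query($query) as $strong) {
    $mfr[] = trim($strong->nodeValue);
}

var_dump($code);
print_r($values);
print_r($mfr);
```

The regex version only pays off when the condition really needs a pattern; here the two labels could equally be matched with starts-with(normalize-space(.), "Manufacturer").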
salathe Posted January 2, 2010

One way would be to grab the table that wraps all of your required information, then run a couple of XPath queries that look for the specific details within that wrapping table. For example:

// Pick the table that wraps the product details.
$table = $xpath->query('//table[4]')->item(0);
// Find each label cell, then read the value in the cell next to it.
$code = $xpath->query("//td[starts-with(text(), 'Misco')]/following-sibling::td/b", $table)->item(0)->nodeValue;
// rtrim() strips the trailing space inside the <strong> tags.
$man = rtrim($xpath->query("//td[.='Manufacturer:']/following-sibling::td/strong", $table)->item(0)->nodeValue);
$part = rtrim($xpath->query("//td[.='Manufacturer Part No:']/following-sibling::td/strong", $table)->item(0)->nodeValue);
var_dump($code, $man, $part);
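A self-contained sketch of the same approach on a cut-down copy of the markup. One XPath detail worth knowing: a query that starts with // is absolute, so the $table context node passed to query() is effectively ignored and the whole document is searched; prefixing it with . (.//td...) keeps the search inside $table. Also, //table[4] selects every table that is the fourth table child of its parent, whereas (//table)[4] selects the fourth table in document order. The sample HTML, the use of (//table)[1] to pick the wrapper, and the firstText() helper are assumptions for illustration only.

```php
<?php
// Cut-down version of the markup from the question; no table has an id or class.
$html = <<<'HTML'
<table><tr><td>
  <table><tr>
    <td class="textblacksm">Misco No: </td>
    <td class="textblacksmblue"><b>Q151273</b></td>
  </tr></table>
  <table>
    <tr><td class="textblacksm">Manufacturer:</td><td class="textblacksm"> <strong>Canon </strong> </td></tr>
    <tr><td class="textblacksm">Manufacturer Part No:</td><td class="textblacksm"> <strong>2925B008AA </strong> </td></tr>
  </table>
</td></tr></table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: trimmed text of the first node matched by $query
// (evaluated against $context), or null when nothing matches.
function firstText(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $nodes = $xpath->query($query, $context);
    return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->nodeValue) : null;
}

// ASSUMPTION: in this sample the wrapper is the first table in document order.
$table = $xpath->query('(//table)[1]')->item(0);

// The leading "." keeps each search inside $table.
$code = firstText($xpath, ".//td[starts-with(text(), 'Misco')]/following-sibling::td/b", $table);
$man  = firstText($xpath, ".//td[.='Manufacturer:']/following-sibling::td/strong", $table);
$part = firstText($xpath, ".//td[.='Manufacturer Part No:']/following-sibling::td/strong", $table);
var_dump($code, $man, $part);

// Scoping check: the manufacturer table (third in document order) has no "Misco" cell.
$mfrTable = $xpath->query('(//table)[3]')->item(0);
$scoped   = firstText($xpath, ".//td[starts-with(text(), 'Misco')]/following-sibling::td/b", $mfrTable);
$absolute = firstText($xpath, "//td[starts-with(text(), 'Misco')]/following-sibling::td/b", $mfrTable);
var_dump($scoped, $absolute); // the absolute query still finds the code elsewhere in the page
```

Returning null instead of chaining ->item(0)->nodeValue avoids an error on pages where a label is missing.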
jk2010 Posted January 2, 2010

Hi salathe, thanks for the reply, mate. I'll give this a go, but one problem I have is that the wrapping table doesn't have any name, class or id, and all three elements are in a td with the same class, <td class="textblacksm">. Would this make any difference? I'm going to try your idea first anyway.

Cheers
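A shared class and a missing id shouldn't matter for this approach, since the queries key off the label text rather than attributes. If the wrapping table can't be picked by position either, one option is to anchor on a label you know exists and climb to its nearest enclosing table with ancestor::table[1]. A minimal sketch; the "Delivery" table and the cellAfterLabel() helper are hypothetical, added only to show that the search stays inside the chosen table.

```php
<?php
// Cut-down sample: no table has an id or class, and every cell shares
// class="textblacksm". The "Delivery" table is HYPOTHETICAL.
$html = <<<'HTML'
<table><tr><td class="textblacksm">Delivery:</td><td class="textblacksm">Free</td></tr></table>
<table>
  <tr><td class="textblacksm" width="110">Manufacturer:</td><td class="textblacksm" nowrap="nowrap"> <strong>Canon </strong> </td></tr>
  <tr><td class="textblacksm" width="110">Manufacturer Part No:</td><td class="textblacksm" nowrap="nowrap"> <strong>2925B008AA </strong> </td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: trimmed text of the cell right after the label cell,
// searched only inside $context. Assumes the label contains no double quote.
function cellAfterLabel(DOMXPath $xpath, string $label, DOMNode $context): ?string
{
    $nodes = $xpath->query('.//td[normalize-space(.)="' . $label . '"]/following-sibling::td[1]', $context);
    return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->nodeValue) : null;
}

// Anchor on a known label, then take its nearest enclosing table.
$wrapper = $xpath->query('//td[normalize-space(.)="Manufacturer:"]/ancestor::table[1]')->item(0);

$man      = cellAfterLabel($xpath, 'Manufacturer:', $wrapper);
$part     = cellAfterLabel($xpath, 'Manufacturer Part No:', $wrapper);
$delivery = cellAfterLabel($xpath, 'Delivery:', $wrapper); // lives outside $wrapper

var_dump($man, $part, $delivery);
```

On a reverse axis such as ancestor::, the predicate [1] means the closest match, which is why it selects the innermost table around the label.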
jk2010 Posted January 2, 2010

Hi salathe, thanks for your help, I've got it working now. Just one thing: on the price side the result looks a bit funny. How can I clean it up?

[Price] => £11.74 inc VAT          

Thanks a lot.
jari
salathe Posted January 2, 2010

That's a character encoding issue unrelated to XPath. It's probably better to ask the question in a new thread.
cags Posted January 2, 2010

That looks like it's to do with character encoding. I'd suggest setting the encoding of the page the data is output on to UTF-8, either with a meta tag in the <head> section of the HTML (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8">) or with the header() function (e.g. header('Content-Type: text/html; charset=utf-8');).

Edit: salathe beat me to it.
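The garbled output looks like UTF-8 text being displayed as ISO-8859-1: "Â£" is how the two UTF-8 bytes of "£" render in that encoding, and each trailing "Â " is probably a UTF-8 non-breaking space (U+00A0). Even once the page is served as UTF-8, rtrim() won't remove those characters, because by default it only strips ASCII whitespace. A minimal sketch, assuming the scraped value is valid UTF-8 and really ends in non-breaking spaces; cleanPrice() and priceAmount() are hypothetical helper names.

```php
<?php
// Hypothetical helper: trims ASCII whitespace and U+00A0 (no-break space) from
// both ends of a UTF-8 string. Plain trim()/rtrim() only strip " \t\n\r\0\x0B".
function cleanPrice(string $raw): string
{
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $raw) ?? $raw;
}

// Hypothetical helper: returns just the amount after the pound sign, or null.
function priceAmount(string $raw): ?string
{
    return preg_match('/£\s*(\d+(?:\.\d{2})?)/u', $raw, $m) === 1 ? $m[1] : null;
}

// ASSUMPTION: sample value standing in for the scraped price text.
$raw = "£11.74 inc VAT \u{00A0}\u{00A0}\u{00A0}";

echo cleanPrice($raw), "\n";
echo priceAmount($raw), "\n";
```

The /u modifier matters here: without it, PCRE works on bytes and \x{00A0} would not match the two-byte UTF-8 sequence.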
jk2010 Posted January 2, 2010

I'm grateful, guys. Thanks for your time. Cheers.

BTW, are the posts on this forum editable? I couldn't find an edit link.
cags Posted January 2, 2010

Only within 10 minutes of posting them. Mine was technically a 'faux edit' (you can tell because it doesn't show an edit time at the bottom): I clicked Post, got a warning that salathe had already posted, and stuck the disclaimer in before hitting Submit again.