
parsing a table with DOMXPath


abazoskib


Here is the code I am using to parse the table:

 

<?php
$tidy = new tidy();
$tidy->parseString($page);
$tidy->cleanRepair();
//echo $tidy;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML($tidy);

$xpath = new DOMXPath( $doc );
$elements = $xpath->query( "//html/body//table/*" );

foreach ( $elements as $item ) {
        $newDom = new DOMDocument;
        $newDom->appendChild($newDom->importNode($item,true));

        $xpath = new DOMXPath( $newDom );

        foreach ($item->attributes as $attribute) {
                for ($node = $item->firstChild; $node !== NULL; $node = $node->nextSibling) {
                        print($node->nodeValue);
                }
        }
        print("\n");
}
?>

 

Here is a sample of the data I am parsing:

 

	<table width="97%" align="center" border="0" cellpadding="4" class="borderTable" cellspacing="0">
	<tr bgcolor="#2a5a7c" class="trheader">
        	
		<th align="left"><a href="domain_a.cfm?int_order=1" class="trheader">Domain</a></th>

		<th align="left"><a href="domain_a.cfm?int_order=5" class="trheader">Registrar</a></th>
		<th align="center">Used</th>
		<th align="center">Live</th>

		<th align="center">Assigned</th>

		<th align="center">Purchased</th>
		<th align="center"><a href="domain_a.cfm?int_order=3" class="trheader">Enabled</a></th>

		<th align="center">Info</th>
		<th align="center">Edit</th>
		<th align="center">Delete</th>

	</tr>

<form action="domain_b.cfm" method="post">
    
    


    	
    	


    <tr id="t149"><td colspan="9" style="padding:0px;" bgcolor="#c6dbde" height="1"><img src="/images/space.gif" width="1" height="1" /></td></tr>
	<tr onmouseover="hilite(149);" onmouseout="hilite2(149);" bgcolor="#EEEEEE" style="border-top:thin" id="149">
        	
		<td><span  id="cf_tooltip_1257270905778">

<a href="http://xyxyxyxyxyxyx.com" target="_blank" class="standard">xyxyxyxyxyxyxyx.com</a>
</span></td>

		<td align="left">DomainSite</td>

		<td align="center"><a href="#" onclick="launchwin('iused149', '<span style=color:#2a5a7c;>Domain ID: 149 Usage History</span>', '/domains/noapp/u_info.cfm?p_domain_id=149',{width:800,height:300,center:true,modal:true});"><img src="/images/red.gif" border="0"/></a></td>
		<td align="center"><div id="s149" style="color:#FF6600"><span style='color:red'>-</span></div></td>
		<td align="center"><a href="#" onclick="launchwin('idomain149', '<span style=color:#2a5a7c;>Domain 149 Info</span>', '/domains/domain_info.cfm?p_domain_id=149',{width:300,height:220,center:true,modal:true});"><img src='/images/green.gif' alt='Yes' border='0'></a></td>

		<td align="center" width="100"><div id="e149"><img src='/images/green.gif' alt='Yes'></div></td>
		<td align="center"><img src='/images/green.gif' alt='Enabled'></td>

		<td align="center"><a href="#" onclick="launchwin('infod149', '<span style=color:#2a5a7c;>Information for Domain ID: 149</span>', '/domains/noapp/d_info.cfm?p_domain_id=149',{width:600});"><img src="/images/idetailpage.gif" border="0" alt="Domain Creation Log" /></a></td>
		<td align="center"><a href="#" onclick="launchwin('edomain149', '<span style=color:#2a5a7c;>Edit Domain</span>', '/domains/new_domain.cfm?edit=149',{width:350,height:250,center:true,modal:true});"><img src="/images/edit.gif" alt="Edit" border="0"></a></td>
		<td align="center"><a href="#" onclick="confirmation('Warning.  If you delete a Domain it will delete all Domain Links. Continue?','miniwin','/domains/domain_b.cfm?del=149');"><img src="/images/drop.gif" alt="Delete" border="0"></a></td>

	</tr>

I can get the link 'xyxyxyxyxyxyx.com', but I also need to get the images that appear next to it. They signify the status of the URL: green = live, red = down. Here is the output I am getting:

 

xyxyxyxyxyxyxyx.com
DomainSite

 

and nothing else.


Forgot to mention: I know what is happening, I just can't get the information I need. Since the remaining TD elements contain no text, my program returns only the text that is available in each tr. I need to pull the additional img src attributes from the same tr.

I'm not entirely sure what you want to do, but this will get the image src under the 'Used' column:

 

$xpath = new DOMXPath( $doc );
$elements = $xpath->query( "//tr/td[3]/a/img/@src" );
foreach ( $elements as $result ) {
        echo $result->textContent;
}
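
To keep each domain together with the images from the same tr, one option is to loop over the rows and run relative XPath queries against each row. The following is only a minimal sketch, not a drop-in fix: the markup is a hypothetical, trimmed-down stand-in for the table in the question (example.com and the four-column layout are assumptions), and the tidy step is left out. It relies on the optional context-node argument of DOMXPath::query() and DOMXPath::evaluate().

```php
<?php
// Hypothetical, simplified version of the table from the question.
$html = <<<'EOT'
<table>
  <tr><th>Domain</th><th>Registrar</th><th>Used</th><th>Live</th></tr>
  <tr id="t149"><td colspan="4"><img src="/images/space.gif"></td></tr>
  <tr id="149">
    <td><span><a href="http://example.com">example.com</a></span></td>
    <td>DomainSite</td>
    <td><a href="#"><img src="/images/red.gif"></a></td>
    <td><div><img src="/images/green.gif" alt="Yes"></div></td>
  </tr>
</table>
EOT;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$rows = [];
// Select only rows whose first cell contains a link; this skips the
// header row (th cells only) and the spacer row (no link).
foreach ($xpath->query('//table//tr[td[1]//a]') as $tr) {
    // Passing $tr as the second argument makes the expression relative to this row.
    $domain = trim($xpath->evaluate('string(td[1]//a)', $tr));

    // Collect the src attribute of every img inside this row's cells.
    $images = [];
    foreach ($xpath->query('td//img/@src', $tr) as $src) {
        $images[] = $src->nodeValue;
    }

    $rows[] = ['domain' => $domain, 'images' => $images];
}

foreach ($rows as $row) {
    echo $row['domain'], ': ', implode(', ', $row['images']), "\n";
}
```

Because nodeValue on an element only returns its text content, the image file names have to be read from the src attribute nodes, as done here. On the real page the column position (for example td[3] for 'Used') could be used instead of td//img to pick out one specific status image.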

