DOM

The Little Guy · May 29, 2009

I am trying to load this file into a DOM, but I don't think what I am doing is working...

Here is the file I am reading:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<title><clp name="title" type="text"></clp></title> 
<link rel="stylesheet" href="style.css" />
</head>
<body>
<a href="main.php" class="logo"><img src="images/logo.png" alt="logo" /></a>
<div id="left">
	<clp construct="multiple">
		<clp name="navigation" type="link" construct="multiple"></clp>
		<clp name="sub navigation" type="link" construct="multiple"></clp>
	</clp>
</div>
<div id="right"><clp name="content" type="textarea"></clp></div>
</body>
</html>

And here is the code that is supposed to read the DOM of the file, but I am not sure whether it is working:

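// basename() keeps the request values from escaping the templates directory.
$filename = './templates/' . basename($_GET['template']) . '/' . basename($_GET['style']);
$doc = new DOMDocument();
$doc->load($filename);
$clp = $doc->getElementsByTagName("clp");
// Note: print_r() on a DOMNodeList may show little or nothing; $clp->length gives the match count.
print_r($clp);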

Thanks!

The Little Guy · May 29, 2009

All in all, I would really like to obtain an array of all the tags and each tag's attribute/value pairs.
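
For reference, here is a minimal sketch of how such an array could be built with DOMDocument. It assumes the template is well-formed XML; the inline `$xml` string is a shortened, hypothetical stand-in for the template file, and the `tag`/`attributes` array keys are just names picked for this example.

```php
<?php
// Hypothetical inline sample standing in for the template file from the first post.
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title><clp name="title" type="text"></clp></title></head>
<body>
<div id="left">
    <clp construct="multiple">
        <clp name="navigation" type="link" construct="multiple"></clp>
    </clp>
</div>
<div id="right"><clp name="content" type="textarea"></clp></div>
</body>
</html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Collect every <clp> element, in document order, with its attribute/value pairs.
$tags = array();
foreach ($doc->getElementsByTagName('clp') as $node) {
    $attrs = array();
    foreach ($node->attributes as $attr) {
        $attrs[$attr->name] = $attr->value;
    }
    $tags[] = array('tag' => $node->nodeName, 'attributes' => $attrs);
}

print_r($tags);
```

Looping over the node list is what exposes the elements; the list object itself carries very little that print_r() can display.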

.josh · May 29, 2009

so what was wrong with the regex solution you said worked?

roopurt18 · May 29, 2009

Your markup is invalid. Does the DOM API work with invalid markup? I don't know the answer to that, but if it doesn't then that's why it's not working for you.
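
One way to find out what the DOM extension thinks of a piece of markup is to ask libxml for its parse errors. The sketch below assumes the markup is handed over as a string; `collectXmlErrors()` is a hypothetical helper name, and the two sample strings are made up for illustration. Note that loading only checks well-formedness by default, so unknown tags such as `<clp>` are not rejected at this stage as long as they are properly nested and closed.

```php
<?php
// Collect parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: returns libxml's error messages for an XML string.
function collectXmlErrors($source) {
    libxml_clear_errors();
    $doc = new DOMDocument();
    $doc->loadXML($source);
    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    return $messages;
}

$wellFormed = '<div id="right"><clp name="content" type="textarea"></clp></div>';
$broken     = '<div id="right"><clp name="content" type="textarea"></div>'; // <clp> never closed

print_r(collectXmlErrors($wellFormed)); // empty array
print_r(collectXmlErrors($broken));     // one or more mismatch errors
```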

The Little Guy · May 29, 2009

.josh wrote: "so what was wrong with the regex solution you said worked?"

Because Danial0 recommended DOM, I wanted to give it a try; I have also added a construct to my language.

I also tried SimpleXML. It seems to read the file, but I am not sure whether it is returning all the correct values.
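
A quick way to check what SimpleXML returns is to dump the attributes of every `<clp>` element it finds. This is a rough sketch under a few assumptions: the inline `$xml` string is a shortened, hypothetical version of the template, and the XPath prefix `x` is an arbitrary name registered for the document's default namespace (without a registered prefix, an XPath query for `//clp` would not match elements in that namespace).

```php
<?php
// Hypothetical shortened copy of the template.
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<div id="left">
    <clp name="navigation" type="link" construct="multiple"></clp>
</div>
<div id="right"><clp name="content" type="textarea"></clp></div>
</body>
</html>
XML;

$root = simplexml_load_string($xml);

// The document declares a default namespace, so XPath needs a prefix for it.
$root->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');

$found = array();
foreach ($root->xpath('//x:clp') as $clp) {
    $attrs = array();
    foreach ($clp->attributes() as $name => $value) {
        $attrs[$name] = (string) $value; // cast SimpleXMLElement to a plain string
    }
    $found[] = $attrs;
}

print_r($found);
```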

roopurt18 · May 29, 2009

You have to be very careful when parsing XHTML.

SimpleXML and DOMDocument are the best tools to use if they'll work. I've never used DOMDocument, but I can say that the last time I used SimpleXML, it barfed all over itself when the file was not valid XML to start with. If you can trust the validity of the document, then either should work fine.

I've never used regexps to parse markup, but I have a coworker who tried (after SimpleXML failed because the document was invalid markup). He basically ran into the problem of being unable to write a regexp that matches an opening tag with the appropriate closing tag. The problem is with nested elements, such as tables or divs. Say you try to write a regexp to match a div with id="foobar"; that is trivial. Now try to match its appropriate closing div tag when:

1) You don't know how deep in the page div::id="foobar" is (i.e. it has div parents)

2) You don't know how many div children div::id="foobar" has
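
The two points above can be seen in a small experiment. The sample `$html` string is made up for illustration; the element with id="foobar" actually contains `a<div>b</div>c`, and neither a lazy nor a greedy pattern recovers that.

```php
<?php
// Made-up sample: the target div has one nested div and one sibling div after it.
$html = '<div id="foobar">a<div>b</div>c</div><div>d</div>';

// Non-greedy: stops at the first </div>, which belongs to the inner div.
preg_match('#<div id="foobar">(.*?)</div>#s', $html, $lazy);

// Greedy: runs on to the last </div>, swallowing the sibling div as well.
preg_match('#<div id="foobar">(.*)</div>#s', $html, $greedy);

var_dump($lazy[1]);   // string(8) "a<div>b"
var_dump($greedy[1]); // string(27) "a<div>b</div>c</div><div>d"
```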

The last resort (and most reliable) is to program a parser of your own. I mean a real parser: a finite-state machine (FSM) that grabs either a character or a token at a time and switches between states to determine how the current token (or character) should be handled.
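
To give an idea of what such a state machine might look like, here is a deliberately small sketch. Everything in it is hypothetical (the state constants, `tokenize()`, `classifyTag()`, `tagName()`, `findMatchingClose()`), and it ignores comments, CDATA sections, and malformed input. It reads one character at a time, switches between three states, and then uses a depth counter over the resulting tokens to pair an opening tag with its closing tag.

```php
<?php
// States of the hypothetical scanner.
const S_TEXT  = 0; // outside any tag
const S_TAG   = 1; // between '<' and '>'
const S_QUOTE = 2; // inside a quoted attribute value

// Turn the inside of <...> into an [type, value] token.
function classifyTag($inner) {
    if ($inner !== '' && $inner[0] === '/') {
        return array('close', trim(substr($inner, 1)));
    }
    if (substr($inner, -1) === '/') {
        return array('selfclose', rtrim(substr($inner, 0, -1)));
    }
    return array('open', $inner);
}

// Character-at-a-time scanner producing open/close/selfclose/text tokens.
function tokenize($html) {
    $tokens = array();
    $state  = S_TEXT;
    $buffer = '';
    $quote  = '';
    $length = strlen($html);
    for ($i = 0; $i < $length; $i++) {
        $ch = $html[$i];
        switch ($state) {
            case S_TEXT:
                if ($ch === '<') {
                    if ($buffer !== '') {
                        $tokens[] = array('text', $buffer);
                    }
                    $buffer = '';
                    $state  = S_TAG;
                } else {
                    $buffer .= $ch;
                }
                break;
            case S_TAG:
                if ($ch === '>') {
                    $tokens[] = classifyTag($buffer);
                    $buffer = '';
                    $state  = S_TEXT;
                } else {
                    if ($ch === '"' || $ch === "'") {
                        $quote = $ch; // remember which quote must close the value
                        $state = S_QUOTE;
                    }
                    $buffer .= $ch;
                }
                break;
            case S_QUOTE:
                if ($ch === $quote) {
                    $state = S_TAG;
                }
                $buffer .= $ch; // a '>' inside quotes does not end the tag
                break;
        }
    }
    if ($state === S_TEXT && $buffer !== '') {
        $tokens[] = array('text', $buffer);
    }
    return $tokens;
}

// Element name of an 'open' token value such as 'div id="foobar"'.
function tagName($value) {
    return substr($value, 0, strcspn($value, " \t\r\n/"));
}

// Index of the token that closes the element opened at $openIndex, or -1.
function findMatchingClose(array $tokens, $openIndex) {
    $name  = tagName($tokens[$openIndex][1]);
    $depth = 0;
    for ($i = $openIndex; $i < count($tokens); $i++) {
        list($type, $value) = $tokens[$i];
        if ($type === 'open' && tagName($value) === $name) {
            $depth++;
        } elseif ($type === 'close' && $value === $name) {
            $depth--;
            if ($depth === 0) {
                return $i;
            }
        }
    }
    return -1;
}

$tokens = tokenize('<div><div id="foobar">a<div>b</div>c</div><div>d</div></div>');
echo findMatchingClose($tokens, 1), "\n"; // 7
```

Token 1 is the opening tag of the div with id="foobar", and token 7 is its closing tag, regardless of how many div parents or children surround it.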
