
Matching Certain tags


cooldude832


I want to use preg_split to split a page around its <div>, <table>, <tr>, and <td> tags, so I need a pattern that matches any of those four. It also has to handle the fact that a tag could have a style, a class, or other attributes on it. I think I need something like <div*>, but I don't know the rest of it. Any ideas?


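<?php
   $code = "Hello\n<div align=\"center\">foo=bar</div>";
   // Match an opening or closing div/table/tr/td tag, with optional attributes.
   // \b keeps "tr" from matching inside other tag names; [^>]* stops at the tag's own ">".
   $matches = preg_split('/(<\/?(?:div|table|tr|td)\b[^>]*>)/i', $code, -1, PREG_SPLIT_DELIM_CAPTURE);
   print_r($matches);
?>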

A thanks would be nice ;)


Well, my first issue was that I wanted to remove everything before the body tag. I did it using explode, but not all body tags are lower case, and some had other issues. Again a regex issue: I tried "\<body*>\", but no good. If you have an idea for doing a preg_split at that tag, I'd love to see it.
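For the body-tag case, a minimal sketch along these lines may work. The sample page and the variable names are made up for illustration; the idea is only that the `i` modifier ignores case and `[^>]*` skips over any attributes on the tag.

```php
<?php
// Hypothetical sample page; the upper-case BODY tag with an attribute is deliberate.
$page = "<html><head><title>T</title></head>\n<BODY class=\"main\">\n<div>Hi</div>\n</body></html>";

// Split once at the opening body tag: /i ignores case, [^>]* skips attributes.
// The limit of 2 means at most two pieces: before the tag and after it.
$parts = preg_split('/<body\b[^>]*>/i', $page, 2);

// If no body tag was found, $parts has one element and the page is kept whole.
$afterBody = count($parts) === 2 ? $parts[1] : $page;

echo $afterBody;
```

Note that `<body*>` in a regex means "`<bod` followed by zero or more `y`", not "`<body` followed by anything", which is why the original attempt did not match.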

 

That clears up some of it, but <script> tags inside the body also need to be removed. My goal is to strip a page of everything but container elements (<div>, <table>, <tr>, <td>), which that pattern is doing for me, but I also need to kill all the script/CSS tags, since those are special cases. So I guess I need to find a replacement pattern for

<script*>*</script> and <style*>*</style>
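One possible way to write that as a real pattern is sketched below, assuming the script and style blocks are not nested and do not contain their own closing tag inside a string. The sample input and variable names are hypothetical.

```php
<?php
// Hypothetical sample input with one script block and one style block.
$html = "<div>Keep</div><SCRIPT type=\"text/javascript\">\nvar a = '<td>';\n</SCRIPT><style>\ntd { color: red; }\n</style><table><tr><td>Cell</td></tr></table>";

// (script|style) captures the tag name; \1 requires the same name in the closing tag.
// .*? is lazy, so each block ends at its own closing tag.
// Modifiers: i = case-insensitive, s = let "." match newlines too.
$clean = preg_replace('%<(script|style)\b[^>]*>.*?</\1\s*>%is', '', $html);

echo $clean;
```

This removes the tags together with their contents, which strip_tags alone would not do (it drops the tags but keeps the text between them). For messy real-world pages a DOM parser is likely to be more robust than a regex.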

 


What about something like this?

 

<pre>
<?php
   $data = <<<DATA
<html>
	<head>
		<title>Title</title>
	</head>
	<body>
		<font>Font Tag</font>
		<div>Div Content</div>
		<div id="1">More Div Content</div>
		<b>Bold</b>
		<table>
			<tr>
				<td>A Cell</td>
			</tr>
		</table>
		<hr>
	</body>
</html>
DATA;
### Split on the begin/end tags of what is desired
### and pull some content along.
$matches = preg_split(
	'%(</?(?:div|t(?:able|[rd]))[^>]*>[^<]*)%',
	$data,
	-1,
	PREG_SPLIT_DELIM_CAPTURE
);
### For each match...
$num_matches = count($matches);
for ($i = 0; $i < $num_matches; $i++) {
	### Strip unwanted tags.
	$matches[$i] = strip_tags($matches[$i], '<div><table><tr><td>');
	### If the entry doesn't start with a "<" (tag) it wasn't
	### included in our split; thus, not desired.
	if (strpos($matches[$i], '<') !== 0) {
		unset($matches[$i]);
	}
	### Otherwise, escape it for viewing purposes.
	else {
		$matches[$i] = htmlspecialchars($matches[$i]);
	}
}
### Display.
print_r($matches);
?>
</pre>
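To see which tags the nested alternation `div|t(?:able|[rd])` picks up, here is a small check. It uses a variant of the pattern above with `\b` and the `i` modifier added; both are my own additions (so that mixed-case tags match and look-alike names such as `<track>` do not), and the sample string is made up.

```php
<?php
// Variant of the split pattern; the \b and the trailing "i" are assumptions, not from the original post.
$pattern = '%(</?(?:div|t(?:able|[rd]))\b[^>]*>[^<]*)%i';

// Hypothetical sample mixing wanted tags with look-alikes (title, track) and an unwanted tag (b).
$sample = '<DIV class="x">one</DIV><title>no</title><td>two</td><track><b>three</b>';

// Collect every wanted tag together with the text that directly follows it.
preg_match_all($pattern, $sample, $found);
print_r($found[0]);
```

Only the div and td tags (opening and closing) are captured, each with whatever text follows it up to the next "<".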

