Finding A Position In A String To Display

1internet · November 27, 2012

I have

$content = '<h1>heading</h1><p>page content</p>'

in a variable. How can I create another variable containing just the p tags?

i.e. $newContent = '<p>page content</p>'

Jessica · November 27, 2012

You'll want to use a DOM parser. HTML is too complex to be handled reliably by string functions or by most regular expressions.
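
A minimal sketch of the DOM parser route, assuming PHP's bundled DOM extension is available. The helper name extractParagraphs is hypothetical and not from this thread; it returns the outer HTML of every p element it finds.

```php
<?php
// Hypothetical helper (not from the thread): return the outer HTML of each <p> element.
function extractParagraphs(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        // saveHTML() with a node argument serializes just that node.
        $paragraphs[] = $doc->saveHTML($p);
    }
    return $paragraphs;
}

$content = '<h1>heading</h1><p>page content</p>';
$newContent = implode('', extractParagraphs($content));
echo $newContent, "\n";   // <p>page content</p>
```

One caveat with this sketch: loadHTML() assumes ISO-8859-1 when the markup declares no charset, so UTF-8 content would need an explicit encoding hint before parsing.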

requinix · November 27, 2012

Just the <p> tags? Or are you stripping out the <h1>?

How are you getting $content? Is it just a string? Do you know if it's always valid HTML? What if it isn't? Is there always just the <h1> and <p>? How else can $content vary?

Psycho · November 27, 2012

As the others have stated, we need to know exactly how the content can vary. But IF the variable will only contain one pair of P tags, then a simple regex will suffice. In fact, you can use regex even if there are multiple P tag pairs, as long as they are properly paired and not nested.

//If only one P tag pair in content
if(preg_match("#<p>[^<]+</p>#i", $content, $match))
{
  //Assign the paragraph to a variable
  $para = array_shift($match);
}
else
{
  $para = false;
}


//If multiple P tag pairs in content
if(preg_match_all("#<p>[^<]+</p>#i", $content, $matches))
{
  //Assign the paragraphs to an array
  $paraAry = array_shift($matches);
}
else
{
  $paraAry = false;
}

requinix · November 27, 2012

In fact, you can use regex if there are multiple P tag pairs as long as they are not nested and properly paired.

Properly paired is a definite requirement, but with that expression

#<p>[^<]+</p>#i

it's quite easy to turn it into something that can handle nested <p>s. You know, as an academic exercise.

#<p>([^<]+|(?!<p>)<|(?R))+</p>#i

Same as before but the contents of the tag are either a] normal-looking text, b] the start of an HTML tag that isn't "<p>", or c] the entire expression matched recursively.

Psycho · November 27, 2012

Properly paired is a definite requirement, but with that expression
#<p>[^<]+</p>#i
it's quite easy to turn it into something that can handle nested <p>s. You know, as an academic exercise.
#<p>([^<]+|(?!<p>)<|(?R))+</p>#i
Same as before but the contents of the tag are either a] normal-looking text, b] the start of an HTML tag that isn't "<p>", or c] the entire expression matched recursively.

That's beyond my skill set. But in testing that code, in the hope of breaking it down, it doesn't seem to work for nested content. Using this as the content:

$content = '<h1>heading</h1><p>page content</p> <p>outer content 1 <p>Nested Content</p> outer content 2 </p>';

The regex is succeeding, but with 0 matches. I'm actually quite interested in this possible solution as I had to implement a workaround to a similar problem in some previous code and I'd like to go back and refactor if there is a simpler solution.

requinix · November 27, 2012

Succeeding? I tried and it does not, even though I can (thought I could) see how it should be able to match something, even if it's the wrong text.

Anyway, the middle part in the list was to exclude the delimiters. I made sure it wasn't "<p>" but didn't include "</p>". Together they're "</?p>".

#<p>([^<]+|(?!</?p>)<|(?R))+</p>#i

$content = '<h1>heading</h1><p>page content</p> <p>outer content 1 <p>Nested Content</p> outer content 2 </p>';
$regex = '#<p>([^<]+|(?!</?p>)<|(?R))+</p>#i';

preg_match_all($regex, $content, $matches);
var_dump($matches);

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(19) "<p>page content</p>"
    [1]=>
    string(61) "<p>outer content 1 <p>Nested Content</p> outer content 2 </p>"
  }
  [1]=>
  array(2) {
    [0]=>
    string(12) "page content"
    [1]=>
    string(17) " outer content 2 "
  }
}

Without trying to hijack the topic, the basic form is

beginning delimiter ( valid content that isn't either delimiter | (?R) )+ ending delimiter

In this case your original expression defined the valid content to be "not a <". When trying to match paired parentheses, the regex would look like

/
\(   # beginning delimiter
(
	[^()]+   # valid content is everything, besides a parenthesis (the delimiter)
	| (?R)   # recursion
)+
\)   # ending delimiter
/ix
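
As a rough illustration, the parenthesis pattern above can be run as-is from a nowdoc string. The sample text and variable names below are made up for the example, and the i flag is dropped because the pattern contains no letters.

```php
<?php
// Same pattern as above; the x flag allows the whitespace and # comments inside it.
$regex = <<<'REGEX'
/
\(            # beginning delimiter
(
    [^()]+    # valid content: anything except a parenthesis
    | (?R)    # or the whole pattern again, recursively
)+
\)            # ending delimiter
/x
REGEX;

// Illustrative input: one nested group and one flat group.
$text = 'a (b (c) d) e (f)';

preg_match_all($regex, $text, $matches);

// $matches[0] holds the full outermost matches.
print_r($matches[0]);   // (b (c) d) and (f)
```

Note that an empty pair "()" does not match, because the group must repeat at least once.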

1internet · November 28, 2012

The variable is coming from a database. These are actually search results, and I am trying to build a snippet that gives a brief description of each page. So, thinking about it, I don't actually want the tags, just the content inside, and I want to limit it to e.g. 300 characters.

Does that make sense?
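
One way the snippet requirement described above might be sketched, assuming the paragraphs are not nested and the mbstring extension is available. The helper name makeSnippet, its parameters, and the "..." suffix are hypothetical choices for illustration, not something from this thread.

```php
<?php
// Hypothetical helper: name, signature, and the "..." suffix are illustrative choices.
function makeSnippet(string $html, int $limit = 300): string
{
    // Capture the inside of each <p>...</p> pair (assumes paragraphs are not nested).
    preg_match_all('#<p\b[^>]*>(.*?)</p>#is', $html, $matches);

    // Drop any inline tags, decode entities, and collapse runs of whitespace.
    $text = html_entity_decode(strip_tags(implode(' ', $matches[1])), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }

    // Cut at the limit, then back up to the last space so a word is not split.
    $cut = mb_substr($text, 0, $limit, 'UTF-8');
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false) {
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    }
    return $cut . '...';
}

$content = '<h1>heading</h1><p>page content</p><p>more <b>text</b> here</p>';
echo makeSnippet($content), "\n";       // page content more text here
echo makeSnippet($content, 15), "\n";   // page content...
```

The returned text has its entities decoded, so it would need htmlspecialchars() again before being echoed into a results page.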

Barand · November 28, 2012

Sans regex method

<?php
$content = '<h1>heading1</h1><p>page content 1</p><h1>heading 2</h1><p>page content 2</p><h1>heading 3</h1><p>page content 3</p>';
$new = parasOnly($content);
echo htmlentities($new);

// Concatenate every <p>...</p> pair found in $html (assumes no nested <p> tags)
function parasOnly($html)
{
    $pos1 = 0;
    $res = '';
    $k = substr_count($html, '<p>');
    for ($i = 0; $i < $k; $i++) {
        $pos2 = strpos($html, '<p>', $pos1);    // start of the next opening tag
        $pos3 = strpos($html, '</p>', $pos2);   // its closing tag
        if ($pos3 === false) break;             // unclosed <p>: stop here
        $res .= substr($html, $pos2, $pos3 - $pos2 + 4);   // +4 keeps the "</p>"
        $pos1 = $pos3;
    }
    return $res;
}
?>

RESULT:

<p>page content 1</p><p>page content 2</p><p>page content 3</p>
