Finding A Position In A String To Display


1internet

Just the <p> tags? Or are you stripping out the <h1>?

 

How are you getting $content? Is it just a string? Do you know if it's always valid HTML? What if it isn't? Is there always just the <h1> and <p>? How else can $content vary?

As the others have stated, we need to know exactly how the content can vary. But if the variable will only ever contain one pair of P tags, then a simple regex will suffice. In fact, you can use regex if there are multiple P tag pairs, as long as they are not nested and are properly paired.

 

//If only one P tag pair in content
if(preg_match("#<p>[^<]+</p>#i", $content, $match))
{
   //Assign the paragraph to a variable
  $para = array_shift($match);
}
else
{
  $para = false;
}


//If multiple P tag pairs in content
if(preg_match_all("#<p>[^<]+</p>#i", $content, $matches))
{
  //Assign the paragraphs to an array
  $paraAry = array_shift($matches);
}
else
{
  $paraAry = false;
}
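One caveat with the pattern above: `[^<]+` stops at any `<`, so a paragraph that contains inline markup such as `<b>` is skipped entirely. A minimal sketch of one way to relax that, assuming the paragraphs are still properly paired and not nested (the sample `$content` and the variable names `$strict`, `$relaxed` and `$texts` are made up for illustration):

```php
<?php
// Sample input, made up for illustration: the first paragraph has an inline tag.
$content = '<h1>heading</h1><p>page <b>content</b> 1</p><p>page content 2</p>';

// Strict pattern from the post above: no "<" allowed inside the paragraph.
preg_match_all('#<p>[^<]+</p>#i', $content, $strict);

// Relaxed pattern: lazily match anything up to the next </p>.
// The "s" flag lets "." match newlines as well.
preg_match_all('#<p>(.*?)</p>#is', $content, $relaxed);

// Drop any inline tags from the captured paragraph bodies.
$texts = array_map('strip_tags', $relaxed[1]);

print_r($strict[0]);  // only the second paragraph
print_r($texts);      // both paragraphs, as plain text
```

The lazy `.*?` only works here because of the "not nested" assumption; with nested paragraphs it would stop at the first closing tag it sees.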

  On 11/27/2012 at 9:35 PM, Psycho said:

In fact, you can use regex if there are multiple P tag pairs, as long as they are not nested and are properly paired.

Properly paired is a definite requirement, but with that expression

#<p>[^<]+</p>#i

it's quite easy to turn it into something that can handle nested <p>s. You know, as an academic exercise.

#<p>([^<]+|(?!<p>)<|(?R))+</p>#i

Same as before but the contents of the tag are either a] normal-looking text, b] the start of an HTML tag that isn't "<p>", or c] the entire expression matched recursively.

  On 11/27/2012 at 9:55 PM, requinix said:

Properly paired is a definite requirement, but with that expression

#<p>[^<]+</p>#i

it's quite easy to turn it into something that can handle nested <p>s. You know, as an academic exercise.

#<p>([^<]+|(?!<p>)<|(?R))+</p>#i

Same as before but the contents of the tag are either a] normal-looking text, b] the start of an HTML tag that isn't "<p>", or c] the entire expression matched recursively.

 

That's beyond my skillset. But, in testing that code in the hopes of breaking it down it doesn't seem to be working for nested content. Using this as the content:

$content = '<h1>heading</h1><p>page content</p> <p>outer content 1 <p>Nested Content</p> outer content 2 </p>';

 

The regex is succeeding, but with 0 matches. I'm actually quite interested in this possible solution as I had to implement a workaround to a similar problem in some previous code and I'd like to go back and refactor if there is a simpler solution.

Succeeding? I tried and it does not, even though I can (thought I could) see how it should be able to match something, even if it's the wrong text.

 

Anyway, the middle part in the list was to exclude the delimiters. I made sure it wasn't "<p>" but didn't include "</p>". Together they're "</?p>".

#<p>([^<]+|(?!</?p>)<|(?R))+</p>#i

$content = '<h1>heading</h1><p>page content</p> <p>outer content 1 <p>Nested Content</p> outer content 2 </p>';
$regex = '#<p>([^<]+|(?!</?p>)<|(?R))+</p>#i';

preg_match_all($regex, $content, $matches);
var_dump($matches);

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(19) "<p>page content</p>"
    [1]=>
    string(61) "<p>outer content 1 <p>Nested Content</p> outer content 2 </p>"
  }
  [1]=>
  array(2) {
    [0]=>
    string(12) "page content"
    [1]=>
    string(17) " outer content 2 "
  }
}

 

Without trying to hijack the topic, the basic form is

beginning delimiter ( valid content that isn't either delimiter | (?R) )+ ending delimiter

In this case your original expression defined the valid content to be "not a )". When trying to match paired parentheses the regex would look like

/
\(   # beginning delimiter
(
	[^()]+   # valid content is everything, besides a parenthesis (the delimiter)
	| (?R)   # recursion
)+
\)   # ending delimiter
/ix
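To make the basic form concrete, here is a small runnable sketch of the parentheses version in PHP. The sample `$input` string is made up; the pattern is the one above, minus the `i` flag, which has nothing to do when the only literals are parentheses.

```php
<?php
// Balanced-parentheses pattern, written with the x flag so it can be commented.
$regex = '/
    \(          # beginning delimiter
    (
        [^()]+  # valid content: anything that is not a parenthesis
        | (?R)  # or a whole nested group, matched recursively
    )+
    \)          # ending delimiter
/x';

// Sample input, made up for illustration.
$input = 'f(a (b c) d) and (e)';

preg_match_all($regex, $input, $matches);

// $matches[0] holds the full matches: "(a (b c) d)" and "(e)".
print_r($matches[0]);
```

Note that because of the `+`, an empty pair `()` is not matched by this version.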

The variable is coming from a database. It's actually for search results: I am trying to build a snippet that gives a brief description of the page. So, thinking about it, I don't actually want the tags, just the content inside, and I want to limit it to e.g. 300 characters.

Does that make sense?

Sans regex method

 

<?php
$content = '<h1>heading1</h1><p>page content 1</p><h1>heading 2</h1><p>page content 2</p><h1>heading 3</h1><p>page content 3</p>';
$new = parasOnly($content);
echo htmlentities($new);

function parasOnly($html)
{
    $pos1 = 0;
    $res = '';
    $k = substr_count($html, '<p>');
    for ($i = 0; $i < $k; $i++) {
        $pos2 = strpos($html, '<p>', $pos1);    // start of the next <p>
        $pos3 = strpos($html, '</p>', $pos2);   // its closing tag
        if ($pos3 === false) {
            break;                              // unclosed <p>: nothing more to extract
        }
        $res .= substr($html, $pos2, $pos3 - $pos2 + 4);   // +4 keeps the "</p>"
        $pos1 = $pos3;
    }
    return $res;
}
?>

RESULT:

<p>page content 1</p><p>page content 2</p><p>page content 3</p>
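For the snippet use case mentioned earlier in the thread (text only, no tags, capped at around 300 characters), here is a minimal sketch. `makeSnippet` and its `$limit` parameter are hypothetical names, not something posted in the thread. It assumes the paragraphs are plain text, not nested, and single-byte; for multibyte content `mb_strlen`/`mb_substr` would be the safer choice.

```php
<?php
// Hypothetical helper: builds a plain-text snippet from the <p> contents
// of an HTML fragment, cut to at most $limit characters plus an ellipsis.
function makeSnippet($html, $limit = 300)
{
    // Capture the text inside each non-nested <p>...</p> pair.
    if (!preg_match_all('#<p>([^<]+)</p>#i', $html, $matches)) {
        return '';
    }

    // Join the paragraph texts and collapse runs of whitespace.
    $text = trim(preg_replace('/\s+/', ' ', implode(' ', $matches[1])));

    if (strlen($text) <= $limit) {
        return $text;
    }

    // Cut at the limit, then back up to the last space so a word is not split.
    $cut = substr($text, 0, $limit);
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = substr($cut, 0, $lastSpace);
    }
    return $cut . '...';
}

$content = '<h1>heading</h1><p>page content</p><p>more content</p>';
echo makeSnippet($content, 15);   // small limit just to show the truncation
```

With the default limit of 300 the same call would return both paragraphs joined by a space, since the combined text is shorter than the limit.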

