
Finding A Position In A String To Display


1internet


As others have stated, we need to know exactly how the content can vary. But IF the variable will only ever contain one pair of <p> tags, then a simple regex will suffice. In fact, you can use a regex even if there are multiple <p> tag pairs, as long as they are not nested and are properly paired.

// If there is only one <p> pair in the content.
// [^<]+ means the paragraph text may not contain any other tag (e.g. <b>).
if(preg_match("#<p>[^<]+</p>#i", $content, $match))
{
  // $match[0] is the full match, i.e. the whole <p>...</p> block
  $para = $match[0];
}
else
{
  $para = false;
}


// If there are multiple <p> pairs in the content
if(preg_match_all("#<p>[^<]+</p>#i", $content, $matches))
{
  // $matches[0] is an array of every full <p>...</p> match, in order
  $paraAry = $matches[0];
}
else
{
  $paraAry = false;
}
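A quick way to see what the pattern above does and does not accept. The sample string is made up for illustration; the block just runs the same pattern through preg_match_all().

```php
<?php
// Hypothetical sample: a heading, a plain paragraph, a paragraph with an
// inline <b> tag, and an uppercase <P> pair.
$content = '<h1>Title</h1><p>First para</p><p>Second <b>bold</b> para</p><P>Third</P>';

// Same pattern as the multiple-pairs case: "<p>", one or more characters
// that are not "<", then "</p>"; the i flag makes the tag match case-insensitive.
preg_match_all('#<p>[^<]+</p>#i', $content, $matches);

// $matches[0] lists the full matches in order.
print_r($matches[0]);
```

This prints only "<p>First para</p>" and "<P>Third</P>". Because [^<]+ forbids any "<" inside the paragraph, the paragraph containing <b> is skipped entirely rather than partially matched.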

Edited by Psycho


Properly paired is a definite requirement, but with that expression

#<p>[^<]+</p>#i

it's quite easy to turn it into something that can handle nested <p>s. You know, as an academic exercise.

#<p>([^<]+|(?!<p>)<|(?R))+</p>#i

Same as before but the contents of the tag are either a] normal-looking text, b] the start of an HTML tag that isn't "<p>", or c] the entire expression matched recursively.

Edited by requinix


That's beyond my skill set. But when I tested that code, hoping to break it down, it didn't seem to work for nested content. Using this as the content:

$content = '<h1>heading</h1><p>page content</p> <p>outer content 1 <p>Nested Content</p> outer content 2 </p>';

The regex succeeds, but with 0 matches. I'm actually quite interested in this possible solution: I had to implement a workaround for a similar problem in some earlier code, and I'd like to go back and refactor it if there is a simpler solution.

Edited by Psycho

Succeeding? When I tried it, it doesn't, even though I can (or thought I could) see how it should be able to match something, even if it's the wrong text.

Anyway, the middle item in the list (b) was meant to exclude the delimiters. I made sure it wasn't "<p>" but forgot to also exclude "</p>". Together they're "</?p>".

#<p>([^<]+|(?!</?p>)<|(?R))+</p>#i

$content = '<h1>heading</h1><p>page content</p> <p>outer content 1 <p>Nested Content</p> outer content 2 </p>';
$regex = '#<p>([^<]+|(?!</?p>)<|(?R))+</p>#i';

preg_match_all($regex, $content, $matches);
var_dump($matches);

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(19) "<p>page content</p>"
    [1]=>
    string(61) "<p>outer content 1 <p>Nested Content</p> outer content 2 </p>"
  }
  [1]=>
  array(2) {
    [0]=>
    string(12) "page content"
    [1]=>
    string(17) " outer content 2 "
  }
}

 

Without trying to hijack the topic, the basic form is

beginning delimiter ( valid content that isn't either delimiter | (?R) )+ ending delimiter

In this case your original expression defined the valid content as "not a <". When trying to match paired parentheses, the regex would look like

/
\(   # beginning delimiter
(
	[^()]+   # valid content is everything, besides a parenthesis (the delimiter)
	| (?R)   # recursion
)+
\)   # ending delimiter
/ix
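For anyone who wants to try the parenthesis version, here is a minimal PHP run of that pattern. The input string is hypothetical, and the i flag from the post is left off because case-insensitivity has no effect on parentheses.

```php
<?php
// The parenthesis-matching pattern from the post above. The x flag makes
// PCRE ignore the whitespace and the # comments inside the pattern.
$regex = '/
    \(          # beginning delimiter
    (
        [^()]+  # valid content: anything except a parenthesis
        | (?R)  # or a complete balanced group, matched recursively
    )+
    \)          # ending delimiter
/x';

// Hypothetical input: one nested group, one simple group, one unclosed "(".
$text = 'f(a, (b + c)) and (d) but not (e';

preg_match_all($regex, $text, $matches);
print_r($matches[0]);
```

The full matches are "(a, (b + c))" and "(d)": the inner "(b + c)" is consumed by the recursive branch rather than reported separately, and the unclosed "(e" never matches because the ending delimiter is missing.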


The variable comes from a database. It's actually for search results: I'm trying to show a snippet that gives a brief description of each page. Thinking about it, I don't actually want the tags, just the content inside them, and I want to limit it to, e.g., 300 characters.

Does that make sense?
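One way to get a plain-text snippet of roughly 300 characters is sketched below. It is not from the thread: makeSnippet() is a hypothetical helper name, and it assumes UTF-8 content, non-nested <p> elements, and that PHP's mbstring extension is available.

```php
<?php
// Hypothetical helper: build a plain-text search-result snippet from stored HTML.
// Assumes UTF-8 input and non-nested <p> elements; needs the mbstring extension.
function makeSnippet(string $html, int $limit = 300): string
{
    // Grab the inside of every <p ...>...</p>; inline tags such as <b> are allowed.
    preg_match_all('#<p\b[^>]*>(.*?)</p>#is', $html, $m);
    $text = implode(' ', $m[1]);

    // Drop remaining tags, decode entities, collapse runs of whitespace.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }

    $cut = mb_substr($text, 0, $limit, 'UTF-8');
    // Only back up to the previous space when the cut lands inside a word.
    if (mb_substr($text, $limit, 1, 'UTF-8') !== ' ') {
        $space = mb_strrpos($cut, ' ', 0, 'UTF-8');
        if ($space !== false) {
            $cut = mb_substr($cut, 0, $space, 'UTF-8');
        }
    }
    return $cut . '...';
}
```

Calling makeSnippet($content) returns at most 300 characters of paragraph text, plus a trailing "..." when it had to cut, so the displayed string can be up to three characters longer than the limit.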


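Sans regex method

<?php
$content = '<h1>heading1</h1><p>page content 1</p><h1>heading 2</h1><p>page content 2</p><h1>heading 3</h1><p>page content 3</p>';
$new = parasOnly($content);
echo htmlentities($new);

// Return every <p>...</p> block in $html, concatenated in order.
// Assumes lowercase, attribute-free, non-nested <p> tags.
function parasOnly($html)
{
    $res = '';
    $offset = 0;
    // strpos() returns false once there is no further opening tag
    while (($start = strpos($html, '<p>', $offset)) !== false) {
        $end = strpos($html, '</p>', $start);
        if ($end === false) {
            break; // unclosed paragraph: stop instead of copying the rest
        }
        // + 4 keeps the closing "</p>" in the slice
        $res .= substr($html, $start, $end - $start + 4);
        $offset = $end + 4;
    }
    return $res;
}
?>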

RESULT:

<p>page content 1</p><p>page content 2</p><p>page content 3</p>
