string search and replace


gevans

Hey guys, I'm currently using this function:

 

    function input_rte($input, $title = "", $edit = FALSE){
        // Call the function; the bare string 'get_magic_quotes_gpc' is always truthy.
        if(get_magic_quotes_gpc()) $input = stripslashes($input);
        $input = mysql_real_escape_string($input);
        $input = str_replace($this->old_string,'',$input);
        if(strpos($input,'<p>') !== FALSE && strpos($input,'</p>') !== FALSE) $input = str_replace($this->ie_fix1,$this->ie_fix2,$input);
        while(substr($input,-4) == '<br>' || substr($input,-5) == '<br/>' || substr($input,-6) == '<br />'){
            // Compare the substring to the tag; the closing parenthesis was misplaced.
            if(substr($input,-4) == '<br>') $input = substr($input,0,-4);
            elseif(substr($input,-5) == '<br/>') $input = substr($input,0,-5);
            elseif(substr($input,-6) == '<br />') $input = substr($input,0,-6);
        }
        $input = str_replace('<br><br><br><br>','<br><br>',$input);
        $input = str_replace('<br>','<br />',$input);
        if(strpos($input, '<img') !== FALSE && strpos($input, '<img alt') === FALSE && strpos($input, '<img  alt') === FALSE) $input = str_replace("<img","<img alt=\"$title - content image\"",$input);
        if(strpos($input, '.jpg"><img') !== FALSE) $input = str_replace(".jpg\"><img",".jpg\"><img class=\"second\"",$input);
        $input = trim($input);
        $input = urlencode($input);
        $input = ($input == "") ? NULL : $input;
        return $input; 
    }

 

to sort some POST data from a rich text editor.

 

The next addition to the function needs to search for a link to a PDF.

 

The HTML would look like this:

 

<a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a>

 

What I'm trying to do (and can't manage at the moment) is find that bit of code and check whether the href has a .pdf extension. If it does, the HTML needs to be replaced with the following:

 

<div class="pdf">
<a href="the link in here" target="_blank" title="link text in here"><img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" /></a>
<span class="title">title in here</span>
<span class="info">download pdf</span>
<a href="the link in here" target="_blank" title="link text in here" class="link">DOWNLOAD</a>
</div><div class="pdf-bot2"></div>

 

I'm not entirely sure of the best way to do this. I was going to use a regular expression and preg_replace. Has anyone got any ideas about what else I could do?


Do you mean something like this?

<?php
$html = 'this is some stuff <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a> for update dating the <a href="http://youdomain.com/another.pdf">Other Stuff</a>html';
$html = preg_replace('%<a href=(["\'])(.*?\.pdf)\1>(.*)</a>%sim', "<div class=\"pdf\">\r\n<a href=\"the link in here\" target=\"_blank\" title=\"link text in here\"><img class=\"left\" src=\"images/pdf_download.png\" alt=\"Download PDF\" width=\"64\" height=\"74\" /></a>\r\n<span class=\"title\">title in here</span>\r\n<span class=\"info\">download pdf</span>\r\n<a href=\"\2\" target=\"_blank\" title=\"\3\" class=\"link\">DOWNLOAD</a>\r\n</div><div class=\"pdf-bot2\"></div>", $html );

echo $html;

?>


It's nearly there; it outputs the following:

 

<div class="pdf">
<a href="the link in here" target="_blank" title="link text in here"><img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" /></a>
<span class="title">title in here</span>
<span class="info">download pdf</span>
<a href="" target="_blank" title="" class="link">DOWNLOAD</a>

</div><div class="pdf-bot2"></div>

 

The first a href needs to be a real link (not just placeholder text), and the href and title of the second link come out as strange characters instead of the text captured by the regular expression.
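
A note on the strange characters, as a minimal sketch rather than a definitive diagnosis: inside a double-quoted PHP string, "\2" is read as an octal escape (the control character with code 2) before preg_replace ever sees it, so the back-reference is lost. The example below assumes a shortened replacement string for illustration only; writing the replacement in single quotes with $2 and $3 keeps the references intact.

```php
<?php
// In a double-quoted PHP string, "\2" is an octal escape, not a back-reference.
$broken = "\2";
var_dump($broken === chr(2));

$html    = '<a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a>';
$pattern = '%<a href=(["\'])(.*?\.pdf)\1>(.*?)</a>%si';

// Single-quoted replacement (shortened for illustration): $2 and $3 reach
// preg_replace untouched and are filled in with the captured URL and text.
$fixed = preg_replace($pattern, '<a href="$2" title="$3" class="link">DOWNLOAD</a>', $html);

echo $fixed, "\n";
```

The same effect can be had in a double-quoted string by doubling the backslash ("\\2"), which PHP turns into the two characters \2.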


Here's a quick update:

<?php
$html = 'this is some stuff <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a> for update dating the <a href="http://youdomain.com/another.pdf">Other Stuff</a>html';
// "\\2" and "\\3" keep the back-references literal (a bare "\2" in double quotes is an octal escape);
// (.*?) is lazy so the link text stops at the first </a>.
$html = preg_replace('%<a href=(["\'])(.*?\.pdf)\1>(.*?)</a>%sim', "<div class=\"pdf\">\r\n<a href=\"\\2\" target=\"_blank\" title=\"\\3\"><img class=\"left\" src=\"images/pdf_download.png\" alt=\"Download PDF\" width=\"64\" height=\"74\" /></a>\r\n<span class=\"title\">\\3</span>\r\n<span class=\"info\">download pdf</span>\r\n<a href=\"\\2\" target=\"_blank\" title=\"\\3\" class=\"link\">DOWNLOAD</a>\r\n</div><div class=\"pdf-bot2\"></div>", $html );

echo $html;

?>

 

What input are you using, and what do you expect to see?


<pre>
<?php
$html = 'this is some stuff <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a> for updating the <a href="http://youdomain.com/another.pdf">Other Stuff</a>html';

$replace = <<<REPLACE
<div class="pdf">
<a href="$2" target="_blank" title="$3">
	<img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" />
</a>
<span class="title">$3</span>
<span class="info">download pdf</span>
<a href="$2" target="_blank" title="$3" class="link">DOWNLOAD</a>
</div>
<div class="pdf-bot2"></div>
REPLACE;

$html = preg_replace(
	'%<a href=([\'"])?((?(1).+?|[^\s>]+)\.pdf)(?(1)\1)>(.*?)</a>%si',
	$replace,
	$html
);

echo htmlspecialchars($html);
?>
</pre>


I tried it with more complex input and it didn't work as expected.

 

input html

 

<img alt="Pupil Launch - content image" src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/ks3_04.jpg">
<img src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/cbcv_03.jpg">
<div class="clearfix"></div>
<br />
<strong><span style="text-decoration: underline;">What the Challenge is all about</span></strong>
<br /><br />
The focus is on the website but the challenge involves a wide range of skills - research, presentation, innovative design, graphics, and project management skills are equally important. The Challenge can be used effectively to bring a work related learning element to many aspects of the curriculum including English (presentations); ICT; and Business Studies. It will also help to develop students' 'enterprise skills'.
<br /><br />
The <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a>
<br /><br />
<a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf">Website Challenge Entry Form</a>

 

output html

 

<img  alt="Pupil Launch - content image" src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/ks3_04.jpg"><img  src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/cbcv_03.jpg"><div class="clearfix"></div><br /><strong><span style="text-decoration: underline;">What the Challenge is all about</span></strong><br /><br />The focus is on the website but the challenge involves a wide range of skills - research, presentation, innovative design, graphics, and project management skills are equally important. The Challenge can be used effectively to bring a work related learning element to many aspects of the curriculum including English (presentations); ICT; and Business Studies. It will also help to develop students' 'enterprise skills'.<br /><br />The <div class="pdf">

   <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br /><a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form">
      <img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" />
   </a>
   <span class="title">Website Challenge Entry Form</span>
   <span class="info">download pdf</span>
   <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br /><a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form" class="link">DOWNLOAD</a>

</div>
<div class="pdf-bot2"></div>

 

expected output

 

<img  alt="Pupil Launch - content image" src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/ks3_04.jpg"><img  src="http://thinking.uk.com/projects/ebp_cms/images/uploads/small/cbcv_03.jpg"><div class="clearfix"></div><br /><strong><span style="text-decoration: underline;">What the Challenge is all about</span></strong><br /><br />The focus is on the website but the challenge involves a wide range of skills - research, presentation, innovative design, graphics, and project management skills are equally important. The Challenge can be used effectively to bring a work related learning element to many aspects of the curriculum including English (presentations); ICT; and Business Studies. It will also help to develop students' 'enterprise skills'.<br /><br />

The <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a><br /><br />

<div class="pdf">

   <a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form">
      <img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" />
   </a>
   <span class="title">Website Challenge Entry Form</span>
   <span class="info">download pdf</span>
   <a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf" target="_blank" title="Website Challenge Entry Form" class="link">DOWNLOAD</a>

</div>
<div class="pdf-bot2"></div>
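
A likely reason for the output above, sketched under the assumption that the pattern in use was the conditional one from the previous reply: with the "s" flag, the lazy ".+?" is still allowed to match quotes, ">" and newlines, so when the first link is not a PDF it keeps expanding until it reaches the ".pdf" of the second link. The input here is a shortened copy of the HTML in this post.

```php
<?php
// Shortened version of the input HTML from this post.
$html = 'The <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a>'
      . '<br /><br />'
      . '<a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf">Website Challenge Entry Form</a>';

// Pattern from the earlier reply. ".+?" is not limited to the quoted value,
// so it can run from the first link's href into the second link.
$pattern = '%<a href=([\'"])?((?(1).+?|[^\s>]+)\.pdf)(?(1)\1)>(.*?)</a>%si';

preg_match($pattern, $html, $m);

// Group 2 should hold one URL, but it begins at the first link's href
// and swallows everything up to the PDF link.
echo $m[2], "\n";
```

Restricting the URL part so it cannot cross the closing quote or the end of the tag avoids this.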


Actually, it needs another tweak in case the attributes are not quoted:

%<a href=([\'"])?((?:(?!\1)[^>\s])+\.pdf)(?(1)\1)>(.*?)</a>%si

 

The first non-literal part of the regex looks for an optional single or double quote. It then captures one character at a time (anything that is not whitespace or ">"), but only while the next character is not the quote it began with. In other words, if the value opened with a single quote, it matches everything up to the closing single quote, and likewise for a double quote. If there was no opening quote, it stops at the first whitespace or at the ">" that ends the tag. It then backtracks to make sure the URL ends with ".pdf", matches the closing quote if an opening one was found, then the ">" that ends the tag, the link content up to the first "</a>", and finally "</a>" itself.

 

Keep in mind that this regex only works if no other attributes are present and the formatting is exact.
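
If other attributes might be present, one possible variation (a sketch only, not part of the regex above) is to allow extra attributes around a quoted href and build the replacement in a callback. The function name pdf_links_to_boxes and the sample input are made up for illustration; the sketch assumes the href is always quoted and that stripping tags from the link text is acceptable.

```php
<?php
// Hypothetical helper: wraps links to .pdf files in the "pdf" box markup from this thread.
function pdf_links_to_boxes($html)
{
    // Allows other attributes before and after a quoted href.
    // [^>]* can never cross the end of the opening tag.
    $pattern = '%<a\b[^>]*?\shref=(["\'])([^"\'>\s]+\.pdf)\1[^>]*>(.*?)</a>%si';

    return preg_replace_callback($pattern, function ($m) {
        $url = $m[2];
        // Plain-text title, escaped for use inside an attribute (no double encoding).
        $title = htmlspecialchars(strip_tags($m[3]), ENT_QUOTES, 'UTF-8', false);

        return '<div class="pdf">'
             . '<a href="' . $url . '" target="_blank" title="' . $title . '">'
             . '<img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" /></a>'
             . '<span class="title">' . $title . '</span>'
             . '<span class="info">download pdf</span>'
             . '<a href="' . $url . '" target="_blank" title="' . $title . '" class="link">DOWNLOAD</a>'
             . '</div><div class="pdf-bot2"></div>';
    }, $html);
}

// Made-up input: one PDF link with extra attributes, one ordinary link.
$out = pdf_links_to_boxes(
    '<a class="doc" href="/files/a.pdf" rel="nofollow">Form</a> and <a href="http://example.com/">Home</a>'
);
echo $out, "\n";
```

A regex is still a fragile way to read HTML; for input that varies a lot, a real HTML parser would be the safer choice.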

 

Here's a technical breakdown:

NODE                       EXPLANATION
----------------------------------------------------------------------
  <a href=                 '<a href='
----------------------------------------------------------------------
  (                        group and capture to \1 (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    ['"]                     any character of: ''', '"'
----------------------------------------------------------------------
  )?                       end of \1 (NOTE: because you're using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (?:                      group, but do not capture (1 or more
                             times (matching the most amount
                             possible)):
----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
        \1                       what was matched by capture \1
----------------------------------------------------------------------
      )                        end of look-ahead
----------------------------------------------------------------------
      [^>\s]                   any character except: '>', whitespace
                               (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    )+                       end of grouping
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
    pdf                      'pdf'
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (?(1)                    if back-reference \1 matched, then:
----------------------------------------------------------------------
    \1                       what was matched by capture \1
----------------------------------------------------------------------
  |                        else:
----------------------------------------------------------------------
                             succeed
----------------------------------------------------------------------
  )                        end of conditional on \1
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  </a>                     '</a>'
----------------------------------------------------------------------
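
To see the tweaked regex in action, here is a minimal sketch that runs it over a shortened copy of the input posted earlier in the thread, plus one made-up unquoted link ($unquoted) to exercise the no-quote branch. Only the PDF link should be picked up.

```php
<?php
// Shortened version of the input HTML posted earlier in the thread.
$html = 'The <a href="http://www.portsmouthebp.co.uk">Education Business Partnership</a>'
      . '<br /><br />'
      . '<a href="http://thinking.uk.com/projects/ebp_cms/uploads/pdfs/autumn_07_1229439562_pdf.pdf">Website Challenge Entry Form</a>';

$pattern = '%<a href=([\'"])?((?:(?!\1)[^>\s])+\.pdf)(?(1)\1)>(.*?)</a>%si';

// Collect every match as its own array: [full match, quote, URL, link text].
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[2], ' => ', $m[3], "\n";
}

// Made-up example of an unquoted attribute, the case this tweak was written for.
$unquoted = '<a href=docs/file.pdf>Spec</a>';
preg_match($pattern, $unquoted, $u);
echo $u[2], ' => ', $u[3], "\n";
```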
