DOMDocument or preg_match_all


Andy-H

I am writing a simple templating system for a website I'm making. Basically, I want to retrieve content from files and insert it into a document using jQuery-style selectors. I'm unsure whether to use DOMDocument or regular expressions; which one would be faster for this? I'm leaning towards regex at the moment, but I'm not too clued up on it:

 

 


<?php


$tag = 'span';
$pat = <<<PAT
   ~(.*?)<{$tag}.*id\s*=\s*["?|'?]testing["?|'?][^>]*>(.*?)</\s*{$tag}\s*>|\s*>|\s*/>(.*?)~i
PAT;
$htm = <<<HTM
test
<span id="test">Test</span>
<span class="testing" id="testing">Testing</span>
tredst
HTM;
preg_match_all($pat, $htm, $matches);
echo '<pre>'. htmlentities(print_r($matches, 1), ENT_QUOTES, 'UTF-8');


?>

OUTPUT:

Array
(
    [0] => Array
        (
            [0] => >
            [1] => >
            [2] => <span class="testing" id="testing">Testing</span>
        )

    [1] => Array
        (
            [0] => 
            [1] => 
            [2] => 
        )

    [2] => Array
        (
            [0] => 
            [1] => 
            [2] => Testing
        )

    [3] => Array
        (
            [0] => 
            [1] => 
            [2] => 
        )

)

 

Desired:

Array
(
    [0] => Array
        (
            [0] => test
<span id="test">Test</span>
            [1] => <span class="testing" id="testing">
            [2] => Testing
            [3] => </span>
tredst
        )
)

 

Any help appreciated.

 

P.S. Sorry if this should be in regex help; I was unsure, as I also wanted advice on whether it was the right decision not to go with DOMDocument.


If you're wanting to parse HTML, you should use DOMDocument.  Your HTML will have to be mostly valid for it to work properly, though.  Parsing HTML with regex is generally considered a bad idea.  It works sometimes, and I'll do it for one-off re-format scripts or small personal-use scrapers, but for something like a template engine you'd be better off going with a real HTML parsing solution like DOMDocument.
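
As a rough illustration of the DOMDocument route, here is a minimal sketch that pulls the span with id="testing" out of markup like the sample in the first post. The wrapping div and the variable names are assumptions made for the example, and DOMXPath is only one of several ways to query the tree.

```php
<?php
// Sample markup modelled on the first post; the wrapping <div> is an assumption.
$htm = <<<HTM
<div>
<span id="test">Test</span>
<span class="testing" id="testing">Testing</span>
</div>
HTM;

// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($htm);
libxml_clear_errors();

// Query for every <span> whose id attribute is exactly "testing".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//span[@id="testing"]');

$found = [];
foreach ($nodes as $node) {
    // saveHTML($node) serialises just that element, tags included.
    $found[] = [
        'outer' => $doc->saveHTML($node),
        'text'  => $node->textContent,
    ];
}

print_r($found);
```

Each entry holds the element's own markup and its text, which covers roughly what the capture groups in the regex were trying to extract.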

 


I've now got:

 

 



$tag = 'span';
$pat = <<<PAT
   ~(.*)(<{$tag}.*class\s*=\s*["?|'?]testing["?|'?][^>]*>)(.*?)(</\s*{$tag}\s*>)(.*)~is
PAT;
$htm = <<<HTM
<span>
   <span class="testing" id="test">Test</span>
   <span class="testing" id="testing">Testing</span>
</span>
HTM;
preg_match_all($pat, $htm, $matches, PREG_SET_ORDER);
echo '<pre>'. htmlentities(print_r($matches, 1));

 

 

Outputs:


Array
(
    [0] => Array
        (
            [0] => <span>
   <span class="testing" id="test">Test</span>
   <span class="testing" id="testing">Testing</span>
</span>
            [1] => <span>
   <span class="testing" id="test">Test</span>
   
            [2] => <span class="testing" id="testing">
            [3] => Testing
            [4] => </span>
            [5] => 
</span>
        )
)

 

 

Desired output:


Array
(
    [0] => Array
        (
            [0] => <span>
   <span class="testing" id="test">Test</span>
   <span class="testing" id="testing">Testing</span>
</span>
            [1] => <span>
   
            [2] => <span class="testing" id="test">
            [3] => Test
            [4] => </span>
            [5] => <span class="testing" id="testing">Testing</span>
</span>
        )
    [1] => Array
        (
            [0] => <span>
   <span class="testing" id="test">Test</span>
   <span class="testing" id="testing">Testing</span>
</span>
            [1] => <span>
   <span class="testing" id="test">Test</span>
   
            [2] => <span class="testing" id="testing">
            [3] => Testing
            [4] => </span>
            [5] => 
</span>
        )
)
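
A note on why only one set comes back: preg_match_all returns non-overlapping matches, and this pattern consumes the whole subject in its first match, so there is nothing left for a second one. The desired output would need two overlapping matches of the full string. A possible alternative, sketched here with DOMXPath under the assumption that you mainly need each matching element separately, is to select by class instead (the variable names are made up for the example):

```php
<?php
// Markup copied from the post above.
$htm = <<<HTM
<span>
   <span class="testing" id="test">Test</span>
   <span class="testing" id="testing">Testing</span>
</span>
HTM;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($htm);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match "testing" as a whole word within a space-separated class attribute.
$query = '//span[contains(concat(" ", normalize-space(@class), " "), " testing ")]';

$matches = [];
foreach ($xpath->query($query) as $span) {
    $matches[] = [
        'id'   => $span->getAttribute('id'),
        'text' => $span->textContent,
    ];
}

print_r($matches);
```

The outer span has no class attribute, so only the two inner spans are selected, one result per element.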



 

 

Ok, so scrap the regex idea, thanks.

 

 

I also have another problem: I want to be able to call templates like so:

 

 

$Page = (new Template('default', ['site' => 'b2c']))->getContent('slider')->insertAfter('#header');

 

 

However, I want getContent to return another object that provides insertAfter, rather than updating a class member to hold the content. This way the insertBefore/insertAfter and appendTo/prependTo methods are only exposed once content is loaded. Is this the right way to go?

 

 

Here's what I have so far.

Template.class.php


namespace phantom\classes\templating;
class Template {
   protected $_pageContent;
   
   public function __construct($template, array $data = array())
   {
      $this->_pageContent = $this->_getContent($template, $data);
   }
   public function getContent($file_name, array $data = array())
   {
      return new Content($this->_getContent($file_name, $data), $this);
   }
   public function querySelector($selector)
   {
      // Split a "tag#id" selector into its tag name and id (unfinished stub).
      $selector = explode('#', $selector);
      $tag      = $selector[0];
      $match    = $selector[1];
   }
   protected function _getContent($file_name, array $data = array())
   {
      ob_start();
      extract($data, EXTR_SKIP);
      include 'templates'. DIRECTORY_SEPARATOR . $file_name .'.tmpl.php';
      return ob_get_clean();
   }
}

Content.class.php


namespace phantom\classes\templating;
class Content {
   protected $_content;
   protected $_template;
   
   public function __construct($content, Template $tmpl)
   {
      $this->_content  = $content;
      $this->_template = $tmpl;
   }
   public function insertBefore($tag)
   {
      
   }
   public function insertAfter($tag)
   {
      
   }
   public function appendTo($tag)
   {
      
   }
   public function prependTo($tag)
   {
      
   }
}

 

 

But now I am unsure how to update the template contents without exposing public methods to set the content.
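
One possible approach, offered only as a sketch: PHP passes objects by handle, so if the page is held in a DOMDocument, the Template can hand that same object to Content, and Content can modify it without Template needing a public setter. The classes below are simplified, hypothetical stand-ins for the ones above; the render() method and the constructor arguments are assumptions made for the example.

```php
<?php
// Hypothetical, simplified stand-ins for the Template and Content classes above.
class Template
{
    protected $_document;

    public function __construct($html)
    {
        $this->_document = new DOMDocument();
        $this->_document->loadHTML($html);
    }

    public function getContent($text)
    {
        // Objects are passed by handle, so Content works on the same document.
        return new Content($text, $this->_document);
    }

    public function render()
    {
        return $this->_document->saveHTML();
    }
}

class Content
{
    protected $_text;
    protected $_document;

    public function __construct($text, DOMDocument $document)
    {
        $this->_text     = $text;
        $this->_document = $document;
    }

    public function appendTo($id)
    {
        $target = $this->_document->getElementById($id);
        if ($target !== null) {
            // Wrap the text in a <p> and add it as the last child of the target.
            $target->appendChild($this->_document->createElement('p', $this->_text));
        }
        return $this;
    }
}

$page = new Template('<html><body><div id="header"></div></body></html>');
$page->getContent('Hello')->appendTo('header');
echo $page->render();
```

The insert methods stay on Content only, which keeps them unavailable until some content has been loaded.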


OK, I now have:

Template.class.php


<?php
namespace phantom\classes\templating;
use DOMDocument; // import the global class; otherwise the name resolves inside this namespace
class Template {
   protected $_document;
   
   public function __construct($tmpl_file, array $data = array(), $ver = '4.01', $enc = 'UTF-8')
   {
      $this->_document = new DOMDocument($ver, $enc);
      $this->_document->loadHTML($this->_getContent($tmpl_file, $data));
   }
   public function getContent($file_name, array $data = array())
   {
      return new Content($this->_getContent($file_name, $data), $this->_document);
   }
   protected function _getContent($file_name, array $data = array())
   {
      ob_start();
      extract($data, EXTR_SKIP);
      include 'templates'. DIRECTORY_SEPARATOR . $file_name .'.tmpl.php';
      return ob_get_clean();
   }
}

Content.class.php


<?php
namespace phantom\classes\templating;
use DOMDocument; // needed so the type hint below refers to the global class
class Content {
   protected $_content;
   protected $_document;
   
   public function __construct($content, DOMDocument $document)
   {
      $this->_content  = $content;
      $this->_document = $document;
   }
   public function insertBefore($tag)
   {
      $this->_getElement($tag);
   }
   public function insertAfter($tag)
   {
      
   }
   public function appendTo($tag)
   {
      
   }
   public function prependTo($tag)
   {
      
   }
   protected function _getElement($tag)
   {
      if ( substr($tag, 0, 1) == '#' )
         return $this->_document->getElementById(substr($tag, 1));
   }
}

 

 

I am now stuck on how to convert an HTML string into a document fragment. I know you can do this with well-formed XHTML, but I am using HTML 4.01. Has anyone got any ideas how I could do something along the lines of:

 

 

$DOMDocument->loadFragment('<div class="slider"><h1>Tracking vehicles</h1><p>Blah blah blah</p></div>')->insertAfter(DOMNode);

 

 

Thanks for any help.
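
One way this might be done without requiring well-formed XML, sketched below: parse the snippet with loadHTML in a scratch document, copy the resulting nodes into the real document with importNode, and insert them before the reference node's next sibling (DOM has no insertAfter). The function name insertHtmlAfter is hypothetical, and the sketch assumes ASCII or otherwise encoding-safe input, since loadHTML guesses the encoding when no charset is declared.

```php
<?php
/**
 * Hypothetical helper: parse an HTML snippet and insert its nodes after $ref.
 * Assumes $ref belongs to $doc and has a parent node.
 */
function insertHtmlAfter(DOMDocument $doc, DOMNode $ref, $html)
{
    // Parse the snippet leniently in a scratch document.
    $tmp = new DOMDocument();
    libxml_use_internal_errors(true);
    $tmp->loadHTML($html);
    libxml_clear_errors();

    // The HTML parser wraps the snippet in <html><body>...</body></html>.
    $body = $tmp->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return; // nothing parseable in the snippet
    }

    $frag = $doc->createDocumentFragment();
    foreach ($body->childNodes as $child) {
        // importNode(..., true) deep-copies the node into the target document.
        $frag->appendChild($doc->importNode($child, true));
    }

    // Inserting before the next sibling is equivalent to "insert after";
    // a null next sibling simply appends at the end of the parent.
    $ref->parentNode->insertBefore($frag, $ref->nextSibling);
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="header"></div><div id="footer"></div></body></html>');

insertHtmlAfter(
    $doc,
    $doc->getElementById('header'),
    '<div class="slider"><h1>Tracking vehicles</h1><p>Blah<br>blah</p></div>'
);

echo $doc->saveHTML();
```

Because the snippet goes through the HTML parser, unclosed void tags such as the br above are accepted as they are.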


 

 

$doc = new DOMDocument('4.01', 'UTF-8');
$doc->loadHTML('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
   <head>
      <title>Phantom - Tracking Systems and Accessories</title>
      <!-- META //-->
      <meta name="description"
         content="" >
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" >
      <!-- LINKS AND SCRIPTS //-->
      <link rel="stylesheet" href="css/layout.css" type="text/css" >
   </head>
   <body>
      <div id="clouds"></div>
      <div id="container">
         <!-- HEADER //-->
         <div id="header">
            <h1>
               <a href="/">
                  <img src="images/layout/b2c/phantom.png" alt="Phantom vehicle tracking and accessories" >
               </a>
            </h1>
            <div id="header_right">
               <ul id="header_nav" class="navigation">
                  <li class="first"><a href="#">About</a></li>
                  <li><a href="#">News</a></li>
               </ul>
               <img class="telephone" src="images/layout/b2c/tel.png" alt="Telephone number" >
            </div>
         </div>
         <!-- SLIDER //-->
         <div id="slider">
         </div>
         <!-- PRODUCT NAVIGATION IMAGES //-->
         <ul id="product_navigation">
            <li class="first"><a href="#" id="remap">Engine ECU remapping</a></li>
            <li><a href="#" id="tyre-pro">Tyre protector</a></li>
            <li><a href="#" id="sat-dish">Caravan and motorhome satellite dish</a></li>
            <li><a href="#" id="reverse-sensor">Reverse sensor</a></li>
            <li><a href="#" id="in-car-cam">In car camera</a></li>
            <li><a href="#" id="alarms">Caravan and motorhome alarms</a></li>
            <li><a href="#" id="tracking">Caravan and motorhome tracking</a></li>
            <li><a href="#" id="subs">Renew tracking subscription</a></li>
         </ul>
         <!-- CONTENT //-->
         <div id="content">
            <div class="clr"></div>
         </div>
      </div>
      <!-- FOOTER //-->
      <div id="footer">
         <div id="logo"></div>
         <div class="green_banner">
            <div id="motto">Protect, Secure, Enjoy</div>
         </div>
         <div class="blue_banner"></div>
      </div>
   </body>
</html>');
$frag = $doc->createDocumentFragment();
$frag->appendXML('
         <!-- MAIN NAVIGATION //-->
         <ul id="main_nav" class="navigation">
            <li class="first"><a href="#">Home</a></li>
            <li><a href="#">Tracking</a></li>
            <li><a href="#">Remapping</a></li>
            <li><a href="#">Tyre protector</a></li>
            <li><a href="#">Alarms</a></li>
            <li><a href="#">Cameras and sensors</a></li>
            <li><a href="#">Insurance</a></li>
            <li><a href="#">Contact us</a></li>
         </ul>');
$doc->getElementById('container')->insertBefore($frag, $doc->getElementById('slider'));
echo $doc->saveHTML();

 

 

This seems to work quite well, as long as I self-close the non-closing tags with /> in the fragment (appendXML needs well-formed XML). saveHTML() still outputs them correctly as HTML :)

 

 

Cheers
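
For anyone finding this later, a small sketch of the well-formedness point: appendXML should reject markup with an unclosed void tag and accept the self-closed form, while saveHTML() writes the element back out in HTML style. The markup and variable names here are made up for the example.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="container"></div></body></html>');

// Keep XML parse errors internal so the failing attempt stays quiet.
libxml_use_internal_errors(true);

// An unclosed <br> is fine in HTML 4.01 but is not well-formed XML.
$bad   = $doc->createDocumentFragment();
$badOk = @$bad->appendXML('<p>Line one<br>Line two</p>');

// The same markup with the void element self-closed.
$good   = $doc->createDocumentFragment();
$goodOk = $good->appendXML('<p>Line one<br />Line two</p>');

libxml_clear_errors();
libxml_use_internal_errors(false);

if ($goodOk) {
    $doc->getElementById('container')->appendChild($good);
}

var_dump($badOk, $goodOk);
echo $doc->saveHTML();
```

Checking the return value of appendXML before inserting the fragment avoids appending an empty fragment when the template markup is not valid XML.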
