Improve regex for META tags


4 replies to this topic

#1 boby

boby
  • Members
  • Member
  • 25 posts

Posted 05 August 2006 - 02:08 PM

Hello,

I am writing a script that extracts the META tags out of a webpage.
Because the "get_meta_tags" function is very slow and does not handle line breaks well, I'm using the following code.

Can someone please review this regex and, if possible, improve it or suggest something better?
<?php

preg_match_all ('/<[\s]*meta[\s]*name[\s]*=[\s]*["\']?([^>"\']*)["\']?[\s]*content[\s]*=[\s]*["\']?([^>"\']*)["\']?[\s]*[\/]?[\s]*>/si', $content, $matches);

?>

Thank you very much!
Boby

#2 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • Location: IL

Posted 05 August 2006 - 04:48 PM

That's not a good approach, because the attributes can appear in any order. First match the whole meta tag, then pull the attributes out of it.
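
The two-step idea (match the whole tag first, then read its attributes) could be sketched roughly as follows. This is only an illustration, not code posted in the thread: the function name `extract_meta_tags` and the sample page are made up, and the sketch assumes that no attribute value contains a literal `>`.

```php
<?php

// Hypothetical helper (not from the thread): returns one array of
// attributes per <meta> tag, whatever order the attributes are in.
function extract_meta_tags(string $html): array
{
    $tags = [];

    // Step 1: match each whole <meta ...> tag. [^>]* stops at the first ">"
    // and also matches line breaks inside the tag.
    if (!preg_match_all('/<\s*meta\b([^>]*)>/i', $html, $tagMatches)) {
        return $tags;
    }

    foreach ($tagMatches[1] as $attrString) {
        $attrs = [];

        // Step 2: read name="value", name='value' or name=value pairs.
        // The (?|...) branch-reset group keeps the value in group 2
        // whichever quoting style matched.
        preg_match_all(
            '/([a-z][a-z0-9_:-]*)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i',
            $attrString,
            $pairs,
            PREG_SET_ORDER
        );

        foreach ($pairs as $pair) {
            // Attribute names are case-insensitive in HTML, so normalise them.
            $attrs[strtolower($pair[1])] = $pair[2];
        }

        $tags[] = $attrs;
    }

    return $tags;
}

// Made-up sample page: attributes in a different order, mixed quoting,
// and a tag spread over several lines.
$html = "<head>\n<meta content='A test page' name=\"description\">\n"
      . "<meta\n  name=\"keywords\"\n  content=\"php, regex\" />\n</head>";

$meta = extract_meta_tags($html);
print_r($meta);
```

Splitting the work this way keeps each pattern short, and the attribute pattern no longer has to assume that `name` comes before `content`.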

#3 boby

boby
  • Members
  • Member
  • 25 posts

Posted 05 August 2006 - 05:57 PM

Do you mean I should first get the content between <head> and </head>, or just the meta tags? And how do I get just the meta tags?

Is this OK?

preg_match_all ('/<[\s]*meta(.*)[\/]?[\s]*>/si', $content, $matches);


Thank you

#4 wildteen88

wildteen88
  • Staff Alumni
  • Advanced Member
  • 10,482 posts
  • Location: UK, Bournemouth

Posted 05 August 2006 - 06:44 PM

This is what I have to get the meta tag contents:
<?php

$text = '<meta name="keywords" content="PHP, MySQL, bulletin, board, free, open, source, smf, simple, machines, forum" />';

if(preg_match("#<meta([^>]*)>#si", $text, $matches))
{
    //echo '<pre>' . htmlentities(print_r($matches, true)) . '</pre><br /><br />';

    // $matches[1] stores the contents of the meta tag. Trim whitespace and the
    // self-closing slash from the ends only, so slashes inside values (e.g. URLs) survive.
    $matches[1] = trim($matches[1], " \t\n\r/");

    // put each attribute and its value into an array
    $attrs = preg_split("#(\"\s)#i", $matches[1]);

    // We now create a meta array. The format of the array will be:
    // Array([attribute_name] => [attribute_value])
    $meta = array();
    foreach($attrs as $attr)
    {
        // we now get the attribute name and attribute value in separate variables;
        // the limit of 2 keeps any "=" inside the value intact
        list($attr_name, $attr_value) = explode("=", $attr, 2);

        // we create our meta array, trimming off any double quotes remaining in the attribute value string.
        $meta[trim($attr_name)] = trim($attr_value, '"');
    }

    echo '<pre>' . print_r($meta, true) . '</pre>';
}

?>

I used the meta tag from this forum as the reference input.

The code outputs the following:
Array
(
    [name] => keywords
    [content] => PHP, MySQL, bulletin, board, free, open, source, smf, simple, machines, forum
)

EDIT: Updated the code to get each attribute into an array.
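
As a side note on the `(.*)` pattern asked about earlier in the thread: with the `s` flag, a greedy `.*` keeps going to the last `>` in the input, while the `[^>]*` used above stops at the end of each tag. A small sketch of the difference, using a made-up two-tag page (the variable names are only for illustration):

```php
<?php

// Hypothetical input with two meta tags and another tag in between.
$page = '<meta name="a" content="1"><title>Demo</title><meta name="b" content="2">';

// Greedy (.*) with the "s" flag: one match that swallows everything
// from the first "<meta" to the last ">" in the input.
preg_match_all('/<[\s]*meta(.*)[\/]?[\s]*>/si', $page, $greedy);

// [^>]* cannot cross a ">", so each meta tag is matched on its own.
preg_match_all('#<meta([^>]*)>#si', $page, $bounded);

echo count($greedy[0]), "\n";   // prints 1
echo count($bounded[0]), "\n";  // prints 2
```

The negated character class is usually the simpler fix here, on the assumption that attribute values do not contain a literal `>`.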

#5 boby

boby
  • Members
  • Member
  • 25 posts

Posted 05 August 2006 - 08:16 PM

Woow, thank you :D

I will test it asap



