boby

Improve regex for META tags

Hello,

I am writing a script that extracts the META tags from a webpage.
Because the "get_meta_tags" function is very slow and does not handle line breaks very well, I'm using the regex below.

Can someone please review it and, if possible, improve it or suggest something better?
[code=php]<?php

preg_match_all ('/<[\s]*meta[\s]*name[\s]*=[\s]*["\']?([^>"\']*)["\']?[\s]*content[\s]*=[\s]*["\']?([^>"\']*)["\']?[\s]*[\/]?[\s]*>/si', $content, $matches);

?>[/code]

Thank you very much!
Boby

That's not a good approach because the attributes can be in any order. First match the whole meta tag, then get the attributes. Something like [url=http://www.phpfreaks.com/forums/index.php/topic,96844.msg387986.html#msg387986]this[/url].
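
To illustrate the ordering problem, here is a small check that runs the pattern from the first post against two hand-written tags. The two sample strings are made up for this example and are not from the original script.

[code=php]<?php

// The pattern from the first post, unchanged.
$pattern = '/<[\s]*meta[\s]*name[\s]*=[\s]*["\']?([^>"\']*)["\']?[\s]*content[\s]*=[\s]*["\']?([^>"\']*)["\']?[\s]*[\/]?[\s]*>/si';

// Hypothetical sample tags: same attributes, different order.
$nameFirst    = '<meta name="keywords" content="php, regex">';
$contentFirst = '<meta content="php, regex" name="keywords">';

var_dump(preg_match($pattern, $nameFirst));    // int(1): name comes before content
var_dump(preg_match($pattern, $contentFirst)); // int(0): content comes first, so the tag is missed

?>[/code]

Both tags are equally valid HTML, but only the first one is found, because the pattern hard-codes "name" before "content".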

Do you mean I should first get the content between <head> and </head>, or just the meta tags? And how do I get just the meta tags?

Is this OK?

[code=php:0]preg_match_all ('/<[\s]*meta(.*)[\/]?[\s]*>/si', $content, $matches);[/code]


Thank you

This is what I have to get the meta tag contents:
[code=php:0]<?php

$text = '<meta name="keywords" content="PHP, MySQL, bulletin, board, free, open, source, smf, simple, machines, forum" />';

$meta = array();

if(preg_match("#<meta([^>]*)>#si", $text, $matches))
{
    //echo '<pre>' . htmlentities(print_r($matches, true)) . '</pre><br /><br />';

    // $matches[1] stores the attribute string of the meta tag.
    // Strip only the trailing "/" of a self-closing tag, so slashes inside values (e.g. URLs) are kept.
    $matches[1] = rtrim(trim($matches[1]), '/');

    // put each attribute and its value into an array
    $attrs = preg_split("#(\"\s)#i", trim($matches[1]));

    // We now create a meta array. The format of the array will be:
    // Array([attribute_name] => [attribute_value])
    foreach($attrs as $attr)
    {

        // we now get the attribute name and attribute value in separate variables;
        // the limit of 2 keeps any "=" inside the value intact
        list($attr_name, $attr_value) = explode("=", $attr, 2);

        // we create our meta array, trimming off any double quotes remaining in the attribute value string.
        $meta[$attr_name] = trim($attr_value, '"');
    }

    echo '<pre>' . print_r($meta, true) . '</pre>';
}

?>[/code]

I used this forum's own meta tag as the test input.

The code outputs the following:
[code]Array
(
    [name] => keywords
    [content] => PHP, MySQL, bulletin, board, free, open, source, smf, simple, machines, forum
)[/code]

[b]EDIT[/b] Updated code to get each attribute into an array.
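
For pages with several meta tags, the same two-step idea (match each tag first, then read its attributes) can be sketched as below. This is only a rough sketch, not the code from this thread: the function name extract_meta_tags and the sample HTML are made up for illustration. It assumes every attribute value is quoted (single or double quotes) and that no value contains a ">" character; unquoted values are skipped.

[code=php]<?php

// Hypothetical helper (not from the thread): returns one attribute array per <meta> tag.
function extract_meta_tags($html)
{
    $tags = array();

    // Step 1: grab the attribute string of every <meta ...> tag.
    // [^>]* also matches line breaks, so tags spread over several lines are fine.
    if (!preg_match_all('#<meta\b([^>]*)>#si', $html, $tagMatches)) {
        return $tags;
    }

    foreach ($tagMatches[1] as $attrString) {
        $attrs = array();

        // Step 2: pull out name="value" or name='value' pairs, in whatever order they appear.
        // \2 requires the closing quote to be the same character as the opening quote.
        preg_match_all('#([\w:-]+)\s*=\s*(["\'])(.*?)\2#s', $attrString, $pairs, PREG_SET_ORDER);

        foreach ($pairs as $pair) {
            // Attribute names are case-insensitive in HTML, so normalise them.
            $attrs[strtolower($pair[1])] = $pair[3];
        }

        $tags[] = $attrs;
    }

    return $tags;
}

// Made-up sample input: reversed attribute order, mixed quotes, and a line break inside a tag.
$html = "<meta content='a=b, c' name=\"description\" />\n<meta http-equiv=\"refresh\"\n content=\"5; url=http://example.com/\">";

echo '<pre>' . htmlentities(print_r(extract_meta_tags($html), true)) . '</pre>';

?>[/code]

With this input the first array holds the "content" and "name" attributes and the second holds "http-equiv" and "content", regardless of the order in which they were written.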
