
preg_match for css style


sachavdk


I'm trying to get the CSS code out of the style tags of an HTML document like this:

<style type="text/css">
<!--
.style1 {
font-size: 11px;
}
-->
</style>

I'm trying to get just this part:

<!--
.style1 {
font-size: 11px;
}
-->

This is what I have now:

preg_match("/(<style type=\"text\/css\">)+([.])+(<\/style>)/", $fread, $style);
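Why this attempt finds nothing: inside square brackets a dot loses its special meaning, so [.] matches only a literal dot, not any character. The + after the first group also repeats the whole <style> tag rather than the content. A minimal check, using a hypothetical $html string that holds the sample above:

```php
<?php
// Hypothetical sample: the <style> block from the question.
$html = "<style type=\"text/css\">\n<!--\n.style1 {\nfont-size: 11px;\n}\n-->\n</style>";

// Inside [...] the dot is literal: [.] matches "." only, while . matches (almost) anything.
$literalDot = preg_match('/[.]/', 'a');   // 0: "a" is not a dot
$anyChar    = preg_match('/./', 'a');     // 1: . matches "a"

// The original pattern: ([.])+ needs one or more literal dots right after the tag,
// but a newline follows it, so nothing in the string matches.
$original = preg_match("/(<style type=\"text\/css\">)+([.])+(<\/style>)/", $html, $style);

var_dump($literalDot, $anyChar, $original);  // prints int(0), int(1), int(0)
```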


Both methods give errors.

The first one gives:

Warning: preg_match() [function.preg-match]: Unknown modifier 'c' in ...

The second gives:

Warning: preg_match() [function.preg-match]: Compilation failed: unmatched parentheses at offset 53 in ...
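The first method's exact code isn't shown here, but "Unknown modifier" usually means the pattern is delimited with / and contains an unescaped / of its own, as in text/css. PCRE then treats that inner slash as the closing delimiter and reads the letters after it as modifiers; c (from css) is not a valid one. A minimal sketch of two common fixes: escape the slash (preg_quote() does this when you pass the delimiter as its second argument), or pick a delimiter that doesn't occur in the pattern, such as @.

```php
<?php
// preg_quote() escapes regex metacharacters; passing the delimiter as the
// second argument also escapes every occurrence of that delimiter.
$escaped = preg_quote('text/css', '/');   // text\/css

// Safe with "/" as the delimiter once the inner slash is escaped.
$pattern = '/type="' . $escaped . '"/';

// Alternative: a delimiter that never appears in the pattern, such as "@".
$altPattern = '@type="text/css"@';

$tag = '<style type="text/css">';
$matchedEscaped = preg_match($pattern, $tag);     // 1
$matchedAlt     = preg_match($altPattern, $tag);  // 1
```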


preg_match("@<style[^>]+(type=['\"]?[^'\"]+['\"]?)?[^>]+?>(.+?)</style>@is", $fread, $style);
print_r($style);

My bad, had an extra parenthesis.

Well, I'll go through all of the "regex" parts, because I don't think the plain text parts need explanation.

[^>]+ - This matches one or more characters other than >, the character that closes the tag. It sits between style and type in case the tag has other attributes, such as an id, or extra spaces.

(type=['\"]?[^'\"]+['\"]?)? - This matches the type attribute. The question mark after the parentheses makes the whole group optional, so if type="" is omitted, the pattern will still find the style block.

['\"]? - Some people use single quotes, some use double quotes, and some don't use any. That's what this handles.

[^'\"]+ - This matches the value of the type attribute: one or more characters that are not quotes.

[^>]+? - This was added in case anything comes after the type attribute. It matches one or more characters other than >, taking as few as possible.

(.+?) - This captures everything between the style tags. The ? makes the + lazy, so the match stops at the first </style> instead of running on to the last one.
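To see what the lazy ? in (.+?) buys you, here is a sketch with two style blocks in one hypothetical document. The tag is simplified to a bare <style> so that only the quantifier differs between the two patterns; the s modifier lets . cross newlines.

```php
<?php
// Hypothetical document with two separate <style> blocks.
$html = "<style>\n.a { color: red; }\n</style>\n<p>text</p>\n<style>\n.b { color: blue; }\n</style>";

// Lazy (.+?): stops at the first </style> it reaches.
preg_match('@<style>(.+?)</style>@is', $html, $lazy);

// Greedy (.+): runs on to the last </style>, swallowing the <p> and the second block.
preg_match('@<style>(.+)</style>@is', $html, $greedy);

echo $lazy[1], "\n---\n", $greedy[1], "\n";
```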


Maybe some last questions: what is the "is" doing after the closing @ delimiter?

And if I just replace style with body, it doesn't work.

I'm doing:

$fread = fread($cont, filesize($file));

preg_match("@<body[^>]+(type=['\"]?[^'\"]+['\"]?)?[^>]+?>(.+?)</body>@is", $fread, $fbody);

preg_match("@<style[^>]+(type=['\"]?[^'\"]+['\"]?)?[^>]+?>(.+?)</style>@is", $fread, $fstyle);

$fstyle[2] contains the content between the style tags, but $fbody[2] is empty.

And last, shouldn't $fread[1] contain "text/css"? It is empty...
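A likely reason $fbody[2] is empty, if the document's tag is a bare <body>: after <body the pattern needs at least one character from [^>]+ and then at least one more from [^>]+? before the >, so <body> with no attributes can never match and preg_match() returns 0. A minimal sketch of a looser pattern, assuming you only want the inner HTML: [^>]* allows zero attribute characters, and \b stops <body from matching a longer tag name. The $page string is hypothetical.

```php
<?php
// Hypothetical page with a bare <body> tag.
$page = "<html><head><title>t</title></head>\n<body>\n<p>Hello</p>\n</body></html>";

// Pattern from the thread: needs at least two characters between "<body" and ">".
$old = preg_match("@<body[^>]+(type=['\"]?[^'\"]+['\"]?)?[^>]+?>(.+?)</body>@is", $page, $fbody);

// Looser sketch: [^>]* accepts zero or more attribute characters.
$new = preg_match('@<body\b[^>]*>(.*?)</body>@is', $page, $m);

var_dump($old);          // int(0): "<body>" has nothing between the name and ">"
echo trim($m[1]), "\n";  // <p>Hello</p>
```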


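$fread holds the whole file as one string, so $fread[1] is just its second character; the type would be captured in $fstyle[1]. With this pattern, though, $fstyle[1] comes back empty: the greedy [^>]+ right after <style swallows almost all of type="text/css" before the optional group gets a chance, and because the group is optional, the regex simply skips it.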
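One possible variant that does fill group 1, sketched here as an assumption rather than anything from the original posts: move the attribute scan inside the optional group, so the engine tries to reach type= before it falls back to skipping the group. The sample $html string is hypothetical.

```php
<?php
// Hypothetical sample: the <style> block from the question.
$html = "<style type=\"text/css\">\n<!--\n.style1 {\nfont-size: 11px;\n}\n-->\n</style>";

// Pattern from the thread: group 1 ends up empty for this input.
preg_match("@<style[^>]+(type=['\"]?[^'\"]+['\"]?)?[^>]+?>(.+?)</style>@is", $html, $old);

// Variant: the optional group scans lazily up to "type=", then captures the value
// (no quotes, whitespace or ">"). Without a type attribute, the group is skipped.
$pattern = "@<style\b(?:[^>]*?\btype=['\"]?([^'\"\s>]+)['\"]?)?[^>]*>(.*?)</style>@is";
preg_match($pattern, $html, $new);

var_dump($old[1]);  // string(0) ""
var_dump($new[1]);  // string(8) "text/css"
```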

 

i and s are pattern modifiers.

i (PCRE_CASELESS)

    If this modifier is set, letters in the pattern match both upper and lower case letters.

s (PCRE_DOTALL)

    If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
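A quick sketch of both modifiers in action, using made-up subjects:

```php
<?php
// Without "i", matching is case-sensitive.
$caseSensitive = preg_match('@<style>@', '<STYLE>');    // 0
$caseless      = preg_match('@<style>@i', '<STYLE>');   // 1

// Without "s", "." does not match "\n", so content spanning lines fails.
$noDotall = preg_match('@<style>(.+?)</style>@', "<style>\n.a{}\n</style>");   // 0
$dotall   = preg_match('@<style>(.+?)</style>@s', "<style>\n.a{}\n</style>");  // 1
```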
