
[SOLVED] Help with regex


AHA7


Hello,

 

I am struggling with a regex and I am starting to lose it.

 

I want to use PHP's preg_match_all() function to search HTML files for <img> and <embed> tags and extract all the src URLs from those tags in a given HTML document. I want to cover all the forms in which those tags may be formatted.

 

Here's an example with all the matches highlighted:

 

<html>

<body>

<h1>Multimedia Page</h1>

< img src="http://ex.com/img.jpg"> this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example this is a falsh object <embed type="application/x-shockwave-flash" src="

width="425" height="350"></embed> this is another flash object <embed

(a newline, a tab, and a space character separate the rest of this tag from its opening <embed)

type="application/x-shockwave-flash" src="

width="425" height="350"></embed> Here is another image tag <IMG

(newline)

(new line and tab)

(new line)

SRC="http://ex.com/img.jpg" HEIGHT="10">...

</body>

</html>

 

The regex in words:

 

MATCH THE FOLLOWING: <img (or <IMG), followed by any characters (including any number of spaces, tabs, and newlines), followed by src= (or SRC=), which may be followed by a single or double quotation mark, followed by anything (this is the URL part, which will be the first set of matches stored in the multi-dimensional array generated by preg_match_all()), followed by an optional single or double quotation mark, followed by optional anything (including any number of spaces, tabs, and newlines) up to the first > (not greedy). OR (|) MATCH THE FOLLOWING: the same scenario, but this time for the <embed> tag, with the URL after src= stored as the second set of matches.

 

I know that the regex would be only one line long or so, but writing all the above is much simpler, at least to me!
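The description above can be sketched roughly as follows. This is only one possible reading, not a definitive pattern: the sample input, its URLs, and the names $html, $pattern, and $urls are made up for illustration. It also captures the tag name in one group instead of using two separate alternatives, then sorts the URLs by tag afterwards. The leading quote is optional, so unquoted values are picked up too.

```php
<?php
// Hypothetical sample input; these URLs are placeholders for illustration.
$html = <<<HTML
<img src="http://ex.com/a.jpg"> text <IMG
   SRC='http://ex.com/b.jpg' HEIGHT="10">
<embed type="application/x-shockwave-flash" src=http://ex.com/c.swf width="425">
HTML;

// <img or <embed, then as few non-">" characters as possible (newlines
// included), then whitespace + src=, an optional quote, and the URL.
// The URL stops at the first quote, whitespace character, or ">".
$pattern = '/<(img|embed)\b[^>]*?\ssrc\s*=\s*["\']?([^"\'\s>]+)/i';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Sort the captured URLs by (lower-cased) tag name.
$urls = ['img' => [], 'embed' => []];
foreach ($matches as $m) {
    $urls[strtolower($m[1])][] = $m[2];
}

print_r($urls);
```

Two limits of this sketch: a quoted URL that contains a space is cut at the space, and a tag with a literal > inside an attribute value before src is not matched.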


Run this and view the page source to see the output:

 

<pre>
<?php

$string = <<<STR
<html>
<body>
<h1>Multimedia Page</h1>
<img src="http://ex.com/img.jpg"> this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example this is a falsh object <embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> this is another flash object <embed
type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> Here is another image tag <IMG



SRC="http://ex.com/img.jpg" HEIGHT="10">...
<body>
</html>
STR;

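// Case-insensitive match: group 1 is the tag name (img or embed), group 2 is the src URL.
// [^>]* also matches newlines, so attributes split across lines are handled.
preg_match_all('/<(img|embed)[^>]*src=[\'"]([^\'"]+)/i', $string, $matches, PREG_SET_ORDER);
// PREG_SET_ORDER groups the result by match: $matches[0] holds the first tag found, and so on.
print_r($matches);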

?>
</pre>
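If only the URLs are needed, something like the following might do. It is a minimal sketch using the same pattern on a shortened, hypothetical input; the $urls variable is an addition for illustration. With PREG_SET_ORDER each entry of $matches is one tag, so array_column() can pull out index 2. Escaping the dump with htmlspecialchars() should also make the result readable in the browser without viewing the source.

```php
<?php
// Hypothetical input, shortened from the sample above.
$string = '<img src="http://ex.com/img.jpg"> text <embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425">';

preg_match_all('/<(img|embed)[^>]*src=[\'"]([^\'"]+)/i', $string, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, each $matches[$i] is [full match, tag name, URL].
$urls = array_column($matches, 2);

// Escape the dump so the browser shows the matched text instead of rendering it.
echo '<pre>' . htmlspecialchars(print_r($matches, true)) . '</pre>';
print_r($urls);
```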
