
[SOLVED] Help with regex


AHA7


Hello,

I am struggling with a regex pattern and I am starting to lose it!

I want to use PHP's preg_match_all() function to search HTML files for <img> and <embed> tags and extract the src URLs from all of those tags in a given HTML document. I want to cover every way those tags might be formatted.

Here's an example; every src URL in it should be matched:

<html>
<body>
<h1>Multimedia Page</h1>
<img src="http://ex.com/img.jpg"> this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example this is a flash object <embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> this is another flash object <embed
(a newline, a tab and a space separate the rest of this tag from its opening <embed)
type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> Here is another image tag <IMG
(newline)
(newline and tab)
(newline)
SRC="http://ex.com/img.jpg" HEIGHT="10">...
</body>
</html>

The regex in words:

Match <img (or <IMG), followed by any characters (including any number of spaces, tabs and newlines), followed by src= (or SRC=), followed by an optional single or double quote, followed by the URL (captured as the first set of matches in the multidimensional array that preg_match_all() returns), followed by an optional single or double quote, followed by any characters (again including spaces, tabs and newlines) up to the first > (non-greedy). OR (|) match the same pattern for the <embed> tag, capturing the URL after src= as the second set of matches.

I know the regex itself would only be a line or so long, but describing it in words is much simpler, at least for me!
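For illustration, here is a minimal PHP sketch that follows that description closely: one alternative for <img> with its URL in group 1, one for <embed> with its URL in group 2, optional quotes, and [^>]* to run on to the first >. The function name extract_media_urls() is hypothetical, the arrow function needs PHP 7.4 or later, and it assumes a src value never contains whitespace, quotes or >.

```php
<?php

// Hypothetical helper name; returns ['img' => [...URLs], 'embed' => [...URLs]].
function extract_media_urls(string $html): array
{
    // First alternative: <img ...>, URL captured in group 1.
    // Second alternative: <embed ...>, URL captured in group 2.
    // [^>]*? lazily skips other attributes plus any spaces, tabs and newlines
    // (a negated character class also matches "\n"); quotes are optional.
    $pattern = '/<img\b[^>]*?\bsrc\s*=\s*[\'"]?([^\'"\s>]+)[^>]*>'
             . '|<embed\b[^>]*?\bsrc\s*=\s*[\'"]?([^\'"\s>]+)[^>]*>/i';

    preg_match_all($pattern, $html, $m); // default: PREG_PATTERN_ORDER

    // In pattern order, the group that did not take part in a given match
    // is filled with '', so drop those empty entries from each list.
    $notEmpty = static fn (string $url): bool => $url !== '';

    return [
        'img'   => array_values(array_filter($m[1], $notEmpty)),
        'embed' => array_values(array_filter($m[2], $notEmpty)),
    ];
}
```

Usage: `$urls = extract_media_urls($html);` after which `$urls['img']` and `$urls['embed']` hold the URLs in document order.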

https://forums.phpfreaks.com/topic/52451-solved-help-with-regex/

Run this and view the page source:

<pre>
<?php

$string = <<<STR
<html>
<body>
<h1>Multimedia Page</h1>
<img src="http://ex.com/img.jpg"> this is just an <img style='margin-top: 10px' src='http://ex.com/img.jpg' >example this is a flash object <embed type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> this is another flash object <embed
type="application/x-shockwave-flash" src="http://www.youtube.com/v/azWRiwAmGRM" width="425" height="350"></embed> Here is another image tag <IMG



SRC="http://ex.com/img.jpg" HEIGHT="10">...
</body>
</html>
STR;

// <(img|embed)\b   tag name, case-insensitive (group 1)
// [^>]*?           other attributes, spaces, tabs or newlines, up to the first src
// \bsrc\s*=\s*     the src attribute
// [\'"]?           optional opening quote
// ([^\'"\s>]+)     the URL itself (group 2), ending at a quote, whitespace or >
preg_match_all('/<(img|embed)\b[^>]*?\bsrc\s*=\s*[\'"]?([^\'"\s>]+)/i', $string, $matches, PREG_SET_ORDER);
print_r($matches);

?>
</pre>
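Regex matching gets fragile with real-world HTML (comments, attribute values that contain >, and so on). As an alternative technique, not the one used in the reply above, PHP's DOM extension can parse the markup and read the attributes directly. This is a sketch assuming ext-dom is enabled; media_srcs_via_dom() is a hypothetical name.

```php
<?php

// Hypothetical helper: collects the src attributes of <img> and <embed>
// elements by parsing the HTML instead of pattern-matching it.
function media_srcs_via_dom(string $html): array
{
    $doc = new DOMDocument();
    // libxml warns about markup it considers invalid; keep those warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach (['img', 'embed'] as $tag) {
        $urls[$tag] = [];
        // The HTML parser lowercases tag and attribute names,
        // so <IMG SRC=...> is found by 'img' / 'src' as well.
        foreach ($doc->getElementsByTagName($tag) as $el) {
            if ($el->hasAttribute('src')) {
                $urls[$tag][] = $el->getAttribute('src');
            }
        }
    }
    return $urls;
}
```

Quotes, whitespace and letter case inside the tags are handled by the parser, so no pattern has to anticipate them.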

Archived

This topic is now archived and is closed to further replies.
