
[SOLVED] URL Capturing RegEx !


d.shankar


I have these three regexes. Each of them retrieves links that the other two miss:

preg_match_all("/<a (?:.*?)href=\"([^\"]+?)\"(?:[^>]*?)>/si", $src, $val);
preg_match_all("/<a\s+[^>]*href\s*=\s*[\"']?([^'\" >]+)['\" >]/s",$src,$val);
preg_match_all("/href=\"(.*?)\"|<frame.*?src=\"(.*?)\"/",$src,$val);

Is it possible to combine the three regexes into a single one?
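Here is a sketch of one way the three patterns could be merged. It was not posted in the thread, and extract_links is a hypothetical helper name. It uses alternation over the tag and attribute names (a/href and frame/src) plus a PCRE branch-reset group (?|...), so the double-quoted, single-quoted and unquoted alternatives all write their value into capture group 1.

```php
<?php
// Sketch: one pattern for <a href=...> and <frame src=...>.
// (?|...) is a PCRE branch-reset group: each alternative's capture is group 1,
// so "value", 'value' and unquoted value all land in $m[1].
function extract_links(string $src): array
{
    $pattern = '/<(?:a|frame)\b[^>]*?\b(?:href|src)\s*=\s*'
             . '(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))/i';
    preg_match_all($pattern, $src, $m);
    return $m[1]; // values in document order
}

// Usage: unquoted values work as well as quoted ones.
print_r(extract_links('<a href=www.yahoo.com>click</a>'));
```

It is still a regex over HTML, so an attribute whose value contains ">" or a link inside an HTML comment will trip it up.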

https://forums.phpfreaks.com/topic/64383-solved-url-capturing-regex/

Thanks for the reply.

I have HTML source like this, and I need to extract the values of the href attributes:

<html>
.....
<a href="www.google.com">click</a>
<a href=www.yahoo.com>click</a> <!-- NOTE: no quotes here -->
<a href='www.yahoo.com'>click</a> <!-- NOTE: single quotes here -->
<a class=subclass href="www.ask.com">click</a>
</html>

In these cases my code is unable to retrieve the links.

Is it possible to write a regular expression that retrieves all the href values?

Please help!

Try this:

$HTML = $thehtmlpage; // the page source
preg_match_all('/(?:href\s?=\s?(?:"|\'))(.*?)(?:"|\')/i', $HTML, $result, PREG_PATTERN_ORDER);
$result = $result[1]; // [1] = the captured values; [0] would be the whole href="..." match
print_r($result);

or

$HTML = $thehtmlpage;
preg_match_all('/\bwww\.[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i', $HTML , $result, PREG_PATTERN_ORDER);
$result = $result[0]; // no capture group here, so [0] holds the matched URLs

This one finds URL-formatted text (anything starting with www.) anywhere in the page.
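The snippets in this thread switch between $result[0] and $result[1], and between PREG_PATTERN_ORDER and PREG_SET_ORDER. Here is a small sketch of how preg_match_all lays out its results in each mode, using a made-up two-link $html string.

```php
<?php
// Made-up sample input: one double-quoted and one single-quoted href.
$html = '<a href="www.google.com">click</a> <a href=\'www.ask.com\'>click</a>';
$pattern = '/href\s*=\s*["\']([^"\']*)["\']/i';

// PREG_PATTERN_ORDER (the default): [0] = all full matches, [1] = all group-1 captures.
preg_match_all($pattern, $html, $byPattern, PREG_PATTERN_ORDER);
print_r($byPattern[0]); // full matches, including href= and the quotes
print_r($byPattern[1]); // just the URLs

// PREG_SET_ORDER: one array per match; [0] = full match, [1] = group 1.
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);
foreach ($bySet as $set) {
    echo $set[1], "\n";
}
```

So "$result = $result[0]" is only right when the pattern has no capture group, as in the www. pattern above.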

Your code doesn't work for the fourth case I mentioned above.

This regex works for me for extracting the value of any href:

// group 1 = the href value (without quotes), group 2 = the link text
preg_match_all("/<a\s+[^>]*?href\s*=\s*[\"']?([^\"'\s>]+)[^>]*>(.*?)<\/a>/i",$source,$result);

Now I need to get the value of the action attribute, like this:

<form action="new.asp" method="post">

Is it possible to adapt my expression above to handle the form tag?

Seems to work here. For example,

<a class=subclass href="www.ask.com">click</a>

returns

www.ask.com

As for

<form action="new.asp" method="post">

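// [^>]*? keeps each match inside a single <form> tag; with PREG_SET_ORDER, $result[0][1] is the first form's action
preg_match_all('/<form\b[^>]*?\baction\s*=\s*["\']([^"\']*)/i', $subject, $result, PREG_SET_ORDER);

will find the value of action.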

or

preg_match_all('/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|\'))(.*?)(?:"|\')/si', $subject, $result, PREG_SET_ORDER);

to extend the one above so that it matches both href and action.

OK, I had to break it up a little:

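<?php
$subject = '<form action="new.asp" method="post">';
// The pattern is split into pieces to sidestep quote escaping; joined, it reads:
// /(?:(?:href\s?=\s?|action\s?=\s?)(?:"|\')?)(.*?)(?:"|\'|\s|>)/si
$Reg1 = '/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|';
$Reg2 = "\\')?)(.*?)";   // opening quote is optional, so unquoted values match too
$Reg3 = '(?:"|';
$Reg4 = "\\'|\s|>)/si";  // the value ends at a quote, whitespace or the end of the tag

preg_match_all($Reg1.$Reg2.$Reg3.$Reg4, $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1]; // captured values only
print_r($result);
?>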
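Because each quoting style keeps forcing another tweak to the regex, here is a sketch of a different technique that nobody used in this thread: parsing the page with PHP's DOM extension and reading the href and action attributes directly. It assumes the dom and libxml extensions are enabled (they usually are), and extract_attributes is a hypothetical helper name.

```php
<?php
// Swapped-in technique: let libxml's HTML parser handle the quoting rules.
// Double-quoted, single-quoted and unquoted attribute values are all parsed.
function extract_attributes(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = ['href' => [], 'action' => []];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $found['href'][] = $a->getAttribute('href');
        }
    }
    foreach ($doc->getElementsByTagName('form') as $form) {
        if ($form->hasAttribute('action')) {
            $found['action'][] = $form->getAttribute('action');
        }
    }
    return $found;
}

// Usage:
print_r(extract_attributes('<form action="new.asp" method="post"></form>'));
```

The trade-off is a full parse of the page, but ">" inside attribute values and links inside comments no longer cause false matches.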

Archived

This topic is now archived and is closed to further replies.
