[SOLVED] PHP regular expressions help


Alexk

I have just started using PHP regular expressions. I am trying to write a script that returns all the URLs from the links on a page. I thought that, first of all, I'd have to pull out the link markup, such as:

<a href = 'url'>urltext</a>

Each match could then be placed in an array of the links on a page. Then I would have to remove all of the HTML around the URL. I have tried to do this myself, using:

 

preg_split("/<a href = '(.*)'>(.*)<\/a>",$src,-1)

 

However, it returns completely useless output (when I use print_r).

 

Has anyone got any ideas on how I could get only the link markup?


maybe this

<?php
// Group 1 captures the href value, group 2 captures the link text.
preg_match_all('%<a\s+href\s*=\s*["\']?(.*?)["\']?\s*>(.*?)</a>%is', $src, $result, PREG_PATTERN_ORDER);
$URL = $result[1];
$linkname = $result[2];
print_r($URL);
print_r($linkname);
?>
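
To check which capture group holds what, here is a minimal sketch that runs a pattern of this kind against a made-up string. The sample HTML and the names `$pattern`, `$urls` and `$texts` are assumptions for illustration only. With `PREG_PATTERN_ORDER`, `$result[1]` collects every first-group match (the href value) and `$result[2]` every second-group match (the link text).

```php
<?php
// Sample input, invented for this example.
$src = "<p><a href = 'http://example.com/a'>First</a> and "
     . '<a href="http://example.com/b">Second</a></p>';

// Group 1: href value (quotes optional). Group 2: link text.
$pattern = '%<a\s+href\s*=\s*["\']?(.*?)["\']?\s*>(.*?)</a>%is';
preg_match_all($pattern, $src, $result, PREG_PATTERN_ORDER);

$urls  = $result[1]; // every href value, in document order
$texts = $result[2]; // every link text, in the same order

print_r($urls);
print_r($texts);
```

This only handles anchors where href is the first attribute, so treat it as a starting point rather than a general HTML parser.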

 

Or, to find any absolute URL in the text (anything starting with http or https), try:

 

<?php
preg_match_all('/\bhttps?:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
?>
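
A small sketch of how that second pattern behaves, using an invented `$subject` string (the sample text and the name `$found` are assumptions). Because the final character class leaves out `,`, `.`, `;`, `!`, `?` and `:`, trailing sentence punctuation should be dropped from each match.

```php
<?php
// Sample text, invented for this example.
$subject = 'See http://example.com/page?id=1, or https://example.org/docs.';

$pattern = '/\bhttps?:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';
preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);

// There are no capture groups, so the full matches are in $result[0].
$found = $result[0];
print_r($found);
```

Note that this finds URLs anywhere in the text, not only inside href attributes, and it will miss relative links.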

 

Please note that we have a dedicated "Regex within PHP" section.

$filefound = "\n";
# "<" does not need escaping in PCRE, although escaping it is harmless
# {4,}? is a lazy "four or more" quantifier; +? would also work and is the easier option
if (preg_match("/<a href=(\"|')(.{4,}?)<\/a>/", $inputString, $foundset)) { # matches the first link only
    $filefound .= $foundset[2]; # second capture group
}
#.... print it to file

 

OOPS! SORRY THAT WAS A BIT QUICK!!!

$filefound = "\n";
$walk = 0; # byte offset at which the next search starts
# "<" does not need escaping in PCRE
# {4,}? is a lazy "four or more" quantifier; +? would be the easier option
# the first capture group holds the href value, possibly still wrapped in quotes
while (preg_match("/<a href\s*=\s*(.{4,}?)\s*>.*?<\/a>/s", $inputString, $foundset, PREG_OFFSET_CAPTURE, $walk)) { # first match at or after $walk
    $href = $foundset[1][0];
    $st = substr($href, 0, 1);
    $en = substr($href, -1);
    # trim a quote from either end
    if ($st === '"' || $st === "'") {
        $href = substr($href, 1);
    }
    if ($en === '"' || $en === "'") {
        $href = substr($href, 0, -1);
    }
    $filefound .= $href . "\n";
    $walk = $foundset[0][1] + strlen($foundset[0][0]); # resume just after this match
} # end while
#.... print it to file
###### oops! that was close
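
For comparison, here is a rough sketch of a different technique from the regex approaches above: letting PHP's DOM extension parse the markup and reading each href attribute directly. Nobody in this thread posted this. It assumes the DOM extension is enabled, and the sample `$inputString` and the name `$links` are invented for illustration.

```php
<?php
// Sample input, invented for this example.
$inputString = "<p><a href = 'http://example.com/a'>First</a> "
             . '<a class="x" href="/relative/b">Second</a></p>';

// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($inputString);
libxml_clear_errors();

$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    // Skip anchors without an href (for example, named anchors).
    if ($anchor->hasAttribute('href')) {
        $links[] = $anchor->getAttribute('href');
    }
}
print_r($links);
```

The parser copes with attribute order, quoting style and spacing, which is where hand-written patterns tend to go wrong.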
