d.shankar Posted August 11, 2007
I have these three regexes. Each of them retrieves links that the other two miss.
preg_match_all("/<a (?:.*?)href=\"([^\"]+?)\"(?:[^>]*?)>/si", $src, $val);
preg_match_all("/<a[\s]+[^>]*href\s*=\s*[\"']?([^'\" >]+)['\" >]/s", $src, $val);
preg_match_all("/href=\"(.*?)\"|<frame.*?src=\"(.*?)\"/", $src, $val);
Is it possible to combine the three regexes into a single one?
MadTechie Posted August 11, 2007
It's easier to work from the things you want to match, but try this:
preg_match_all('/\b(?<=(<a[\s]+|<a[^>]|<frame.*?src=)).*?(https?):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i', $subject, $result, PREG_PATTERN_ORDER);
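One way to fold the anchor and frame cases into a single call is sketched below. It avoids a lookbehind, because PCRE rejects lookbehinds that contain unbounded quantifiers such as `+` or `.*?`. The sample `$src` content is made up for illustration, and this pattern assumes quoted attribute values only.

```php
<?php
// Sketch: collect link targets from <a href=...> and <frame src=...> in one pass.
// $src is a hypothetical sample, not taken from the thread.
$src = '<a class="x" href="http://example.com/a">A</a>'
     . "<frame name='f' src='menu.html'>"
     . '<a href="b.html">B</a>';

// Group 1 captures the opening quote, \1 requires the same quote to close,
// and group 2 is the attribute value.
$pattern = '/<(?:a|frame)\b[^>]*?\b(?:href|src)\s*=\s*(["\'])(.*?)\1/is';

preg_match_all($pattern, $src, $m);
$links = $m[2];
print_r($links);
```

Because the tag name and the attribute name are separate alternations, the pattern would also accept an unlikely `<a src=...>`. For real pages that looseness may or may not matter.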
d.shankar Posted August 13, 2007
Thanks for the reply. I have HTML source like this, and I need to extract the values of the href attributes:
<html>
.....
<a href="www.google.com">click</a>
<a href=www.yahoo.com>click</a> NOTE: here there are no quotes
<a href='www.yahoo.com'>click</a> NOTE: here there are single quotes
<a class=subclass href="www.ask.com">click</a>
</html>
With input like this, my code is unable to retrieve all the links. Is it possible to write a regular expression that retrieves every href value? Please help!
MadTechie Posted August 13, 2007
Try this:
$HTML = $thehtmlpage;
preg_match_all('/(?:href\s?=\s?(?:"|\'))(.*?)(?:"|\')/i', $HTML, $result, PREG_PATTERN_ORDER);
$result = $result[1]; // group 1 holds the href value; index 0 would be the whole match
print_r($result);
Or:
$HTML = $thehtmlpage;
preg_match_all('/\bwww\.[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i', $HTML, $result, PREG_PATTERN_ORDER);
$result = $result[0];
The second one finds any text formatted like a URL.
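For the four anchor forms listed earlier in the thread (double-quoted, unquoted, single-quoted, and an extra attribute before href), one pattern with an alternative per quoting style might look like the sketch below. It relies on PCRE's branch-reset group `(?|...)`, so it assumes a PCRE version that supports that syntax. The variable names are illustrative.

```php
<?php
// Sample input: the four anchor forms quoted earlier in the thread.
$html = <<<'HTML'
<a href="www.google.com">click</a>
<a href=www.yahoo.com>click</a>
<a href='www.yahoo.com'>click</a>
<a class=subclass href="www.ask.com">click</a>
HTML;

// (?| ... ) is a branch-reset group: each alternative numbers its capture
// as group 1, so the value lands in $m[1] whichever quoting style matched.
// Alternatives: "double quoted" | 'single quoted' | bare value.
$pattern = '/<a\b[^>]*?\bhref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/i';

preg_match_all($pattern, $html, $m);
$links = $m[1];
print_r($links);
```

The bare-value alternative stops at whitespace, a quote, or `>`, which is why the unquoted `href=www.yahoo.com>` form is picked up without the closing bracket.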
d.shankar Posted August 13, 2007
Thanks, buddy. Will check that out.
d.shankar Posted August 14, 2007
Your code doesn't work for the 4th case I mentioned above. This regex works perfectly for extracting any href value:
preg_match_all("/<a\s+.*?href=[\"\'\s]?(.*?)>(.*?)<\/a>/i", $source, $result);
Now I need to get the value of the action attribute, like this:
<form action="new.asp" method="post">
Is it possible to adapt my expression above to handle the form tag?
MadTechie Posted August 14, 2007
Seems to work here, i.e.
<a class=subclass href="www.ask.com">click</a>
returns www.ask.com.
As for <form action="new.asp" method="post">:
preg_match_all('/<form(?:.*)(?:action=)(?:"|\')([^"\']*)/si', $subject, $result, PREG_SET_ORDER);
will find the value of action, or
preg_match_all('/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|\\'))(.*?)(?:"|\\')/si', $subject, $result, PREG_SET_ORDER);
to extend the one above.
d.shankar Posted August 14, 2007
Does your code work only for <form action="new.asp" method="post">, or does it also work for forms like these?
<form action=new.asp method=post>
<form name=frm1 action="new.asp">
<form name="frm2" action="new.asp">
MadTechie Posted August 14, 2007
Works with all:
preg_match_all('/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|\\'|))(.*?)(?:"|\\'|\s)/si', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
d.shankar Posted August 14, 2007
I am getting a parenthesis error in the code:
preg_match_all('/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|\\'|))(.*?)(?:"|\\'|\s)/si', $source, $result);
MadTechie Posted August 14, 2007
preg_match_all('/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|\\')?)(.*?)(?:"|\\'|\s)/si', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
d.shankar Posted August 14, 2007
Still the same error, MT. No problem.
MadTechie Posted August 14, 2007
OK, had to break it up a little:
<?php
$subject = '<form action="new.asp" method="post">';
// The pattern is split so that the pieces containing \' sit in double-quoted strings.
$Reg1 = '/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|';
$Reg2 = "\\')?)(.*?)";
$Reg3 = '(?:"|';
$Reg4 = "\\'|\s)/si";
preg_match_all($Reg1.$Reg2.$Reg3.$Reg4, $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1]; // group 1 holds the attribute value
print_r($result);
?>
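A likely reason the one-line versions failed, as posted, is PHP string quoting rather than the regex itself. Inside a single-quoted PHP string, `\\` is an escaped backslash, so in `\\'` the quote that follows ends the string early and PHP then trips over the leftover `)`. The sketch below assumes the intended pattern is the one assembled from `$Reg1` to `$Reg4` above and writes it as a single single-quoted string with `\'`. The sample input combines the form variants asked about earlier with one anchor, and `$values` is an illustrative name.

```php
<?php
// Form variants from earlier in the thread, plus one anchor tag.
$subject = <<<'HTML'
<form action="new.asp" method="post">
<form action=new.asp method=post>
<form name=frm1 action="new.asp">
<form name="frm2" action="new.asp">
<a class=subclass href="www.ask.com">click</a>
HTML;

// In a single-quoted PHP string, \' is a literal quote; \s is passed through
// to PCRE unchanged. The opening quote is optional, and the value ends at a
// quote or at whitespace.
$pattern = '/(?:(?:href\s?=\s?|action\s?=\s?)(?:"|\')?)(.*?)(?:"|\'|\s)/si';

preg_match_all($pattern, $subject, $result, PREG_PATTERN_ORDER);
$values = $result[1]; // group 1 holds the attribute value
print_r($values);
```

One caveat with this pattern: an unquoted value that is immediately followed by `>` would keep the `>` and everything after it up to the next space or quote, since `>` is not among the terminators.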
d.shankar Posted August 14, 2007
Great, buddy. You rock!
MadTechie Posted August 14, 2007
Cool. Click "Solved" if all is fine.