The Little Guy, posted February 3, 2009

I have cURL grabbing a web page, but I need to parse the page and get a link tag. Is this the best way to grab the tag? (I still haven't tested it.)

    if(preg_match("~<link(.*?)rel=\"image_src\"(.*?)href=\"(.*?)\"~",$opt,$matches)){
        $title = $matches[1];
    }else{
        $title = 'No Title Found!';
    }
The Little Guy, posted February 3, 2009

OK, I modified it. This does work, but is it the best way to do it?

    if(preg_match("~<link(.+?)>~",$opt,$matches)){
        if(preg_match("~rel=\"image_src\"~",$matches[0],$matches)){
            if(preg_match("~href=\"(.*?)\"~",$matches[0],$matches)){
                $imgSrc = '<img src="'.$matches[0].'" />';
            }
        }
    }
.josh, posted February 3, 2009

Well, it looks like you're trying to get the stuff between the quotes in the href attribute inside a link tag, so...

    preg_match('~<link.+?href="([^"]*)"[^>]*>~', $string, $match);
    echo $match[1];
The Little Guy, posted February 3, 2009

Yeah, but it also MUST contain this: rel="image_src"
.josh, posted February 3, 2009

Don't "yeah but" me, son. You've been here long enough and have posted more than enough to know how it goes. You didn't get a regex that accounts for that, because you didn't ask for it. You didn't even say what you were trying to get inside the link tag. I just made a guess. The only thing you did actually say is that you were trying to "grab a link tag". Technically I could have given you

    ~<link[^>]*>~

and sent you on your way. It's obvious to you that that's not what you want, because you know what you want. We don't. We're not psychic. So we've gone from:

"I need to grab a link tag"

    ~<link[^>]*>~

to

"I need to get the stuff in-between the quotes of the href="..." inside a link tag" (that extra part I assumed from your coding efforts, not from anything you actually bothered to mention)

    ~<link.+?href="([^"]*)"[^>]*>~

Hopefully you can see the difference between those two patterns, or at least see that they are different because your information was more specific.

Now you want to be more specific and only grab the info from tags that contain rel="image_src". So is that the exact thing that's going to be in there, or is image_src going to be saying different things, and you really mean to say rel="anythingcanbehere"?
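For the case where the value is literally rel="image_src", a lookahead is one way to add that requirement to the href pattern without caring which attribute comes first. This is only a sketch: the sample markup and the variable names are made up for illustration, and it assumes double-quoted attribute values, as in the patterns above.

```php
<?php
// Hypothetical sample markup; in the thread the input would be the cURL response body.
$html = '<link rel="stylesheet" href="/css/site.css" />'
      . '<link href="/img/thumb.png" rel="image_src" />';

// The lookahead (?=...) checks that rel="image_src" occurs somewhere before the
// closing ">" of this tag, without consuming any text. The rest of the pattern
// then captures the href value, so href may sit before or after rel.
$pattern = '~<link\b(?=[^>]*\brel="image_src")[^>]*\bhref="([^"]*)"[^>]*>~i';

if (preg_match($pattern, $html, $match)) {
    $imageSrc = $match[1];
} else {
    $imageSrc = null;
}

echo $imageSrc === null ? 'No image_src link found' : $imageSrc;
```

Using [^>]* instead of .+? keeps the match inside a single tag, which matters when all the link tags sit on one line.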
nrg_alpha, posted February 3, 2009

This is why I cannot emphasize enough that people need to take the time, while constructing a post about a regex question, to stop and think about what exactly they are looking for. It is extremely commonplace, within the regex forum especially, for people to ask one version of their problem, have it answered, and then come back with requirements that weren't specified initially. Truth be told, it isn't fair to the people helping out, as it turns out more often than not to be a waste of time in the end, because the solution isn't adequate (simply due to miscommunication).

There is a reason for this sticky thread. People really ought to adhere to what is mentioned within that thread. It makes logical sense. While I am certainly not trying to pick sides, it feels almost like an epidemic developing.

I often request an example string or two and the end results of what they are looking for (as in, let's pretend we plugged in the correct regex, show me what the absolute end results should look like [not in regex form, but in string form, or array matched/captured form], and include notes of what in the string might be dynamic, and what MUST be matched/captured/what have you). It saves time and frustration for both parties involved.
The Little Guy, posted February 4, 2009

Sorry... I have another one. I just can't seem to grab it, please help.

    preg_match("~<(link(.+?)rel=\"(shortcut icon|icon)\"[^>])*>~");

Here is what I would like this preg_match to match (if this tag exists in the HTML document):
- rel="shortcut icon" OR rel="icon"

I can't get it because:
- rel and href are sometimes reversed
- all the link tags are on the same line

Thanks for the help...
nrg_alpha, posted February 4, 2009

    $str = '<link href="/css/somefile.css" rel="stylesheet" type="text/css" /><link rel="shortcut icon" href="/favicon.ico" />';
    preg_match('#<link.+?(rel=([\'"])(?:shortcut )?icon\2)#i', $str, $match);
    echo $match[1];

Output:

    rel="shortcut icon"

If it is simply a boolean test you are looking for, you can use:

    if(preg_match('#<link.+?rel=([\'"])(?:shortcut )?icon\1#i', $str)){
        echo 'true';
    } else {
        echo 'false';
    }
The Little Guy, posted February 6, 2009

Thanks, can I get that to return the href value though?
nrg_alpha, posted February 6, 2009

    $str = '<link rel="shortcut icon" href="/favicon.ico" />';
    if(preg_match('#<link.+?rel=([\'"])(?:shortcut )?icon\1.+?href=([\'"])([^\2]+)\2[^/>]*/?>#i', $str, $match)){
        echo $match[3];
    } else {
        echo 'No match found.';
    }
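One caveat worth checking before relying on that pattern: inside a character class, PCRE does not treat \2 as a backreference (it is read as an octal character code), so [^\2]+ is effectively "any character" and only the backtracking to the later \2 limits the capture. With the sample string this makes no difference, but a tag with further attributes after href appears to over-capture. The comparison below is a sketch; the test tag and the variant pattern are illustrative assumptions, not something posted in the thread, and the variant still expects rel before href.

```php
<?php
// Hypothetical tag with an extra attribute after href.
$str = '<link rel="icon" href="/favicon.ico" type="image/x-icon" />';

// Pattern from the post above, unchanged.
$original = '#<link.+?rel=([\'"])(?:shortcut )?icon\1.+?href=([\'"])([^\2]+)\2[^/>]*/?>#i';

// Variant: a lazy (.*?) followed by \2 stops at the first quote that matches
// the opening quote, and [^>]*? keeps the search inside one tag.
$variant = '#<link\b[^>]*?\brel=([\'"])(?:shortcut )?icon\1[^>]*?\bhref=([\'"])(.*?)\2[^>]*>#i';

$fromOriginal = preg_match($original, $str, $m) ? $m[3] : null;
$fromVariant  = preg_match($variant, $str, $m) ? $m[3] : null;

echo 'original: ', var_export($fromOriginal, true), "\n";
echo 'variant:  ', var_export($fromVariant, true), "\n";
```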
The Little Guy, posted February 7, 2009

Will that work if the tag looks like one of these?

    <link rel="shortcut icon" href="/favicon.ico" />
    <link href="/favicon.ico" rel="shortcut icon" />

I have seen sites where rel is before href and vice versa, so I was just wondering if that will work on those sites?
nrg_alpha, posted February 7, 2009

Ok, Little Guy.. I supplied the meat and potatoes.. you supply the gravy:

    #$str = '<link rel="shortcut icon" href="/favicon.ico" />';
    #$str = '<link href="/favicon.ico" />';
    #$str = '<link rel="shortcut icon" />';
    $str = '<link href="/favicon.ico" rel="shortcut icon" />';

    if(preg_match('#<link(?:.+?(?:href|rel)=[\'"](?:(?<icon>(?:shortcut )?icon)|(?<path>[^\"]+))[\'"])+.*>#i', $str, $match)){
        foreach($match as $key=>$val){
            if(empty($key) || $key!='icon' && $key != 'path' || $val == ''){
                unset($match[$key]);
            }
        }
        echo '<pre>'.print_r($match, true);
    } else {
        echo 'Error... no valid link tag found...';
    }

I gave the captures some names to make it easier for you to choose which one to use (icon and path) [I really didn't need to do that.. it's just clearer labelling for you]. Unfortunately, the regex engine by nature will still assign values to $1, $2, etc., so I just strip out any empty, non-icon and non-path results. What you are left with is simply [icon] and/or [path]. Configure / fine-tune this to your liking.

Cheers
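Since attribute order and quoting are what make the regex awkward here, an HTML parser is an alternative worth considering. The sketch below swaps the regex for PHP's DOM extension; it is not what was posted in the thread. The function name findIconHref is hypothetical, and the code assumes PHP 7.1 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper: returns the href of the first <link> whose rel is
// "icon" or "shortcut icon", or null when there is none.
function findIconHref(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        // The parser has already dealt with attribute order and quote style.
        $rel = strtolower(trim($link->getAttribute('rel')));
        if ($rel === 'icon' || $rel === 'shortcut icon') {
            $href = $link->getAttribute('href');
            return $href !== '' ? $href : null;
        }
    }
    return null;
}

echo findIconHref('<html><head><link href="/favicon.ico" rel="shortcut icon" /></head></html>');
```

The same loop could check for rel="image_src" instead, which would cover the original question in this thread.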