carrotcake1029 Posted December 16, 2008

Hello all! I am having an issue with a regular expression I am using with preg_match_all(). What it does is look at whatever data I throw at it and return any links it finds in an array. For the most part it is doing its job, but it is grabbing a little too much. All the links it returns look like this:

http://www.google.com<br

So obviously it is matching a little too much and I can't see how to fix it. Can you guys let me know what you think?

$regex = '/https?\:\/\/[^\" ]+/i';

Edit: Sorry, I didn't see until now that you had a whole regex subforum. You can move this if you would like. Sorry for any hassle.

Link to thread: https://forums.phpfreaks.com/topic/137233-solved-regular-expression-trouble/

effigy Posted December 16, 2008

%https?://[^\"\s>]+%i

Will the URLs always be double quoted?

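To see why the match runs into the tag, here is a minimal sketch that runs the original pattern and the one suggested above against a sample string. The sample string and the third pattern (which also excludes "<") are assumptions for illustration, not something posted in the thread.

```php
<?php
// Hypothetical sample; the real data comes from the poster's database.
$html = 'http://www.google.com<br />Go there for a cool search engine!';

// Original pattern: the class only excludes double quotes and spaces,
// so the match runs on until the space inside "<br />".
preg_match_all('/https?\:\/\/[^\" ]+/i', $html, $original);

// Suggested pattern: also stops at whitespace and ">", but "<" is still allowed.
preg_match_all('%https?://[^\"\s>]+%i', $html, $suggested);

// Assumed variant: excluding "<" as well ends the match where the tag begins.
preg_match_all('%https?://[^"\s<>]+%i', $html, $variant);

print_r($original[0]);
print_r($suggested[0]);
print_r($variant[0]);
```

With this sample, the first two patterns both keep the "<br" and only the variant stops at the end of the host name. Excluding "<" is probably fine for this purpose, since a raw "<" should not appear inside a URL.
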
carrotcake1029 Posted December 16, 2008

I am unsure what you mean by that, sorry. What I am doing is looping through a MySQL database and finding links in all the entries. I also discovered that if any tag follows the link, it always seems to get merged with it, such as </a

Edit: I went to regexlib.com and found this one, which is supposed to extract URLs, but I can't modify it to be used in PHP. (I am not very good at regex.)

(?<http>(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)

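For reference, one way the regexlib pattern could be adapted for PHP's preg functions is sketched below. The main changes are adding delimiters and folding the alternation of single-character classes into one class. The sample string is hypothetical, and three changes go beyond the original pattern: "https?" instead of "http:", an escaped dot in "www\.", and the "i" modifier in place of separate [a-z] and [A-Z] classes.

```php
<?php
// Hypothetical sample text containing one full URL and one bare "www." link.
$text = 'See http://www.google.com<br>and www.example.org/~me for more';

// "#" is used as the delimiter so that "/" needs no escaping.
// The named group "http" is kept from the regexlib original.
$pattern = '#(?<http>(?:https?://|www\.)[a-z0-9/.~]*)#i';

preg_match_all($pattern, $text, $m);

// With the default PREG_PATTERN_ORDER, $m['http'] holds every match of the named group.
print_r($m['http']);
```

Because the character class is a whitelist, the match ends at the first character outside it, such as "<" or a space. The cost is that URLs containing other characters (query strings, hyphens, and so on) would be cut short.
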
effigy Posted December 16, 2008

What is the format of these entries? HTML? Prose? Anything?

carrotcake1029 Posted December 16, 2008

HTML

effigy Posted December 16, 2008

Do you want to pull URLs from tags, content, or both?

carrotcake1029 Posted December 16, 2008

Just the content.

effigy Posted December 16, 2008

How about something like this?

<pre>
<?php
$html = <<<HTML
<a href="http://www.phpfreaks.com">PHP Freaks</a>
<a href="http://www.google.com/index.html">Visit http://www.google.com!</a>
HTML;
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>

carrotcake1029 Posted December 16, 2008

Well, that got rid of the tags, but I am still getting extra data. Now, if there was some text after the link, it gets appended. For example, if the post looked like this:

http://www.google.com Go there for a cool search engine!

it returns

http://www.google.comGo

effigy Posted December 16, 2008

That data works in the example code:

<pre>
<?php
$html = <<<HTML
http://www.google.com Go there for a cool search engine!
HTML;
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>

What else is happening in your code?

carrotcake1029 Posted December 17, 2008

Well, it's coming more in this form:

$html = "http://www.google.com<br>Go there for a cool search engine!";

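That form explains the "http://www.google.comGo" result reported earlier: strip_tags() removes the <br> without leaving anything in its place, so the URL and the next word become one run of non-whitespace. A minimal sketch using the string from the post above:

```php
<?php
$html = "http://www.google.com<br>Go there for a cool search engine!";

// strip_tags() deletes the tag outright, gluing "com" and "Go" together.
$stripped = strip_tags($html);

// \S+ then consumes everything up to the first space.
preg_match('%https?://\S+(?<!\p{P})%i', $stripped, $matches);

echo $stripped, "\n";
echo $matches[0], "\n";
```

The pattern itself behaves as intended here. The extra text comes from the tag removal step, so that is the part that needs changing.
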
.josh Posted December 17, 2008

Will there always be a <br> after the link? Will the link always be at the beginning of the string? In order to accurately extract it from the string, a pattern has to be established. A pattern, of course, being something that happens on a regular, predictable basis. It's not really going to be possible to accurately pull a URL out of a string if it's just sitting randomly among other stuff...

carrotcake1029 Posted December 17, 2008

I think that for my purposes, either a <br> or a <br /> will follow the link most of the time.

nrg_alpha Posted December 17, 2008

Not sure if I understand this correctly, but would this work?

$str = <<<DATA
http://www.google.com Go there for a cool search engine!
DATA;
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);
echo '<pre>'.print_r($matches[1], true);

Output:

Array
(
    [0] => http://www.google.com
)

EDIT - By my calculations, it shouldn't matter whether a <br> trails the link or not with the above pattern. I am using preg_match_all in case what you are plugging into the pattern contains multiple URLs.

.josh Posted December 17, 2008

Okay, well, if the link is going to be at the beginning of the string and a <br> is there "most" of the time, then you can do this:

$html = "http://www.google.com<br />Go there for a cool search engine!";
preg_match("/(.*?)<br.*?>/",$html,$matches);
print_r($matches);

carrotcake1029 Posted December 17, 2008

Well, my string doesn't always begin with the link. Is there a way you can modify it to find the link as well? My first post contained a regex that found all the links.

nrg_alpha Posted December 17, 2008

My pattern does not work for what you are looking for?

carrotcake1029 Posted December 17, 2008

Nope, I checked.

nrg_alpha Posted December 17, 2008

"Nope, I checked."

Really? Because when I test this:

$str = "http://www.google.com<br />Go there for a cool search engine!";
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);
echo '<pre>'.print_r($matches[1], true);

it reports back what you seek (in the form of an array element, of course).

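As a rough illustration of how this whitelist-style pattern behaves on a string with more than one link, here is a small sketch. The sample string and the example.com URL are hypothetical.

```php
<?php
// Hypothetical sample with two links; the second one has a query string.
$str = 'Visit http://www.google.com<br />or http://www.example.com/search.php?q=regex today';

// The class [.\w/-] only allows dots, word characters, slashes and hyphens,
// so the match ends at "<", at a space, and also at "?".
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);

print_r($matches[1]);
```

Both links are found and neither picks up the <br />. The second one is cut at the "?" because that character is not in the class, so query strings would need extra characters added to the class if they matter.
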
carrotcake1029 Posted December 17, 2008

Yes, you are right, but for some reason it is still not working for me. Here is some info from the MySQL table I am reading from:

Field  Type        Collation          Null  Default
post   mediumtext  latin1_swedish_ci  Yes   NULL

I don't know what else to tell you.

carrotcake1029 Posted December 17, 2008

Sorry for the double post, but I could not edit. I think I know what I need: a regex that matches from http:// at the beginning to <br.*?> at the end.

nrg_alpha Posted December 17, 2008

No, you don't need "yet another regex solution", as you have already been offered an adequate one. The problem here (it seems) is not knowing how to load your MySQL table into an array and then pass that array through one of the solutions offered here. If you have managed that far, it wouldn't be hard to use a solution from this thread to quickly hammer out the URLs.

This is why a response like "nope, I checked" tells us absolutely nothing! Perhaps you should post your entire block of MySQL code (hide your SQL password and username, though), as well as how you integrated one of the solutions offered here, so that others can see the bigger picture and pinpoint where you are going wrong. A small sample of what is stored in your MySQL database might also help in troubleshooting this. Without knowing more about what's happening, it is basically shooting in the dark.

I for one am not knowledgeable in databases, so unfortunately I cannot help you there. But rest assured, you have enough viable regex solutions here that actually do what you are seeking. Now it is a matter of properly connecting to the database, pulling everything into an array, and then passing that array through one of the regex patterns in this thread.

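The "array through a pattern" step described above might look roughly like the sketch below. It deliberately leaves out the database code, which was never posted: the $posts array is a hypothetical stand-in for values of the post column, including a NULL because the column allows it.

```php
<?php
// Hypothetical rows standing in for the `post` column; in real code these
// would come from a database query that is not shown in the thread.
$posts = array(
    'http://www.google.com<br>Go there for a cool search engine!',
    null,
    'No links in this one.',
    'Two here: http://www.phpfreaks.com/forums <br /> and https://www.php.net/preg_match_all',
);

$urls = array();
foreach ($posts as $post) {
    // Cast to string so a NULL row is treated as empty text.
    // preg_match_all() returns the number of matches, 0 when there are none.
    if (preg_match_all('#https?://[.\w/-]+#', (string) $post, $matches)) {
        $urls = array_merge($urls, $matches[0]);
    }
}

print_r($urls);
```

Collecting everything into one flat array keeps the example short. Keeping the URLs grouped per row would only require indexing $urls by the row's key instead of merging.
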
effigy Posted December 17, 2008

<pre>
<?php
$html = 'http://www.google.com<br>Go there for a cool search engine!';
### Similar to strip_tags, but replace with a space.
$html = preg_replace('/<[^>]*>/', ' ', $html);
preg_match('%https?://\S+(?<!\p{P})%i', $html, $matches);
print_r($matches);
?>
</pre>

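Since the original question used preg_match_all() to collect every link, the same idea can be wrapped in a small helper, sketched here. The function name extract_urls and the sample string are made up for illustration. The tag replacement and the pattern are the ones from the post above.

```php
<?php
// Hypothetical helper: returns every URL found in the text content of $html.
function extract_urls($html)
{
    // Replace each tag with a space so adjacent text cannot merge with a URL.
    $text = preg_replace('/<[^>]*>/', ' ', $html);

    // \S+ grabs the run of non-whitespace; the lookbehind makes the match
    // back off until it no longer ends in a punctuation character.
    preg_match_all('%https?://\S+(?<!\p{P})%i', $text, $matches);

    return $matches[0];
}

$html = 'Try http://www.google.com,<br>or <b>http://www.phpfreaks.com/forums/index.php</b>.';
$urls = extract_urls($html);
print_r($urls);
```

One caveat of the punctuation lookbehind: "/" counts as punctuation too, so a URL that ends in a slash, such as http://example.com/, is returned without it.
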