
Hello all!

 

I am having an issue with a regular expression I am using for preg_match_all(). What it does is look at whatever data I throw at it and return any links it finds in an array. Well, for the most part it is doing its job, but it's getting a little too much. All the links it returns look like this:

http://www.google.com<br

So obviously it is grabbing a little too much, and I can't see how to fix it. Can you guys let me know what you think?

$regex = '/https?\:\/\/[^\" ]+/i';

 

Edit: Sorry, I didn't see until now you had a whole regex subforum.  You can move this if you would like.  Sorry for any hassle.
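Why the original pattern over-matches: the class `[^\" ]` only stops at a double quote or a space, so `<` (the start of `<br` or `</a`) and line breaks are swallowed into the match. A minimal sketch, assuming a URL in these posts always ends at whitespace, a quote, or the start of a tag, is to add those characters to the excluded set. `findLinks()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper. The excluded set covers whitespace, both quote
// characters and angle brackets, so a following tag such as <br /> or </a>
// ends the match instead of being absorbed into it.
function findLinks(string $text): array
{
    preg_match_all('~https?://[^\s"\'<>]+~i', $text, $matches);
    return $matches[0];
}

print_r(findLinks("http://www.google.com<br />Go there for a cool search engine!"));
print_r(findLinks('<a href="http://www.google.com">Google</a>'));
```

This is still a heuristic: punctuation directly after a URL (a full stop at the end of a sentence, say) is kept as part of the match.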

https://forums.phpfreaks.com/topic/137233-solved-regular-expression-trouble/

I am unsure what you mean by that, sorry.

What I am doing is looping through a MySQL database and finding links in all the entries.

 

I also discovered that if any tag follows the link, it always gets merged into the match, such as </a

 

Edit: I went to regexlib.com and found this one, which is supposed to extract URLs, but I can't adapt it for use in PHP (I am not very good at regex):

(?<http>(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)
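PHP's preg_* functions use PCRE, which understands the `(?<name>...)` named-group syntax, so the regexlib pattern mostly just needs delimiters. A sketch of a port with two adjustments that are assumptions, not part of the regexlib original: the dot in `www.` is escaped so it only matches a literal dot, and the `([a-z]|[A-Z]|[0-9]|[/.]|[~])*` alternation is collapsed into the single class `[a-zA-Z0-9/.~]*`, which matches the same text:

```php
<?php
// Port of the regexlib pattern: '#' delimiters (since '~' appears in the
// class), named group "http", literal dot after www, one character class.
$pattern = '#(?<http>(?:http://|www\.)[a-zA-Z0-9/.~]*)#';

$post = "http://www.google.com<br />Go there for a cool search engine!";
preg_match_all($pattern, $post, $matches);

// Named groups are available by name as well as by number.
print_r($matches['http']);
```

Like the original, this only recognises `http:` (not `https:`) and stops at characters such as `-`, `?` or `=`, so it will cut many real URLs short.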

How about something like this?

 

<pre>
<?php
$html = <<<HTML
<a href="http://www.phpfreaks.com">PHP Freaks</a>
<a href="http://www.google.com/index.html">Visit http://www.google.com!</a>
HTML;
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>
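Two things worth knowing about this approach: `strip_tags()` throws away the tags including their `href` attributes, so only URLs that appear in the visible text survive, and `preg_match()` stops after the first hit. A sketch with the same pattern but `preg_match_all()`, so every visible URL is collected (the example.com line is made-up sample data):

```php
<?php
$html = <<<HTML
<a href="http://www.phpfreaks.com">PHP Freaks</a>
<a href="http://www.google.com/index.html">Visit http://www.google.com!</a>
Also see https://example.com/docs/page.html.
HTML;

// Same pattern as above: the negative lookbehind (?<!\p{P}) makes \S+ back
// off so the match cannot end on punctuation such as '!' or '.'.
preg_match_all('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches[0]);
```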

Well, that got rid of the tags, but I am still getting extra data. Now, if there is text after the link, it gets appended. For example, if the post looks like this:

http://www.google.com
Go there for a cool search engine!

it returns

http://www.google.comGo

That data works in the example code:

 

<pre>
<?php
$html = <<<HTML
http://www.google.com
Go there for a cool search engine!
HTML;
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>

 

What else is happening in your code?
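One likely explanation, assuming the stored posts contain `<br>` tags rather than real line breaks: `strip_tags()` deletes a tag without leaving anything in its place, so the URL and the following word become one unbroken run of non-whitespace, and `\S+` matches all of it. A side-by-side comparison:

```php
<?php
$withNewline = "http://www.google.com\nGo there for a cool search engine!";
$withBr      = "http://www.google.com<br>Go there for a cool search engine!";

$pattern = '%https?://\S+(?<!\p{P})%i';

// The newline survives strip_tags() and stops \S+ ...
preg_match($pattern, strip_tags($withNewline), $a);
// ... but the <br> is simply removed, gluing "Go" onto the URL.
preg_match($pattern, strip_tags($withBr), $b);

echo $a[0], "\n"; // http://www.google.com
echo $b[0], "\n"; // http://www.google.comGo
```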

Will there always be a <br> after the link? Will the link always be at the beginning of the string? To extract it accurately from the string, a pattern has to be established; a pattern, of course, being something that happens on a regular, predictable basis. It's not really going to be possible to accurately pull a URL out of a string if it just appears randomly amongst other stuff...

Not sure if I understand this correctly, but would this work?

 

$str = <<<DATA
http://www.google.com
Go there for a cool search engine!
DATA;
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);
echo '<pre>'.print_r($matches[1], true);

 

Output:

Array
(
    [0] => http://www.google.com
)

 

EDIT: by my calculations, it shouldn't matter whether there is a trailing <br> with the above pattern, because < is not in the character class [.\w/-], so the match stops there. I am using preg_match_all in case what you are plugging into the pattern contains multiple URLs.
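One caveat with `[.\w/-]+`: it is a whitelist, so a URL is also cut short at the first character outside the class, such as the `?` that starts a query string. A sketch comparing it with a widened class; which extra characters to allow (`:`, `?`, `=`, `&`, `%`, `#`, `~`) is an assumption about the links in these posts:

```php
<?php
$str = "http://www.google.com/search?q=php<br />Go there for a cool search engine!";

// Pattern from the reply above: stops at '?', which is not in the class.
preg_match_all('#https?://[.\w/-]+#', $str, $narrow);

// Widened whitelist (an assumption): also allows : ? = & % # ~,
// but still not '<', quotes or whitespace, so the <br /> still ends the match.
preg_match_all('@https?://[\w.\-/:?=&%#~]+@i', $str, $wide);

print_r($narrow[0]); // contains only http://www.google.com/search
print_r($wide[0]);   // contains only http://www.google.com/search?q=php
```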

Okay, well, if the link is going to be at the beginning of the string and a <br> is there "most" of the time, then you can do this:

 

$html = "http://www.google.com<br />Go there for a cool search engine!";
preg_match("/(.*?)<br.*?>/",$html,$matches);
print_r($matches);
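This relies on the link coming first and on a `<br>` actually following it; if a post has no `<br>`, `preg_match()` finds nothing. A sketch of a hypothetical `leadingLink()` helper that takes everything up to the first `<br>` tag (any of `<br>`, `<br />`, `<BR>`), the first line break, or the end of the string, whichever comes first, so it also copes with posts that use plain newlines. It still assumes the link is at the very start of the post:

```php
<?php
// Hypothetical helper: lazily take characters until the first <br> tag,
// the first line break (\R), or the end of the string.
function leadingLink(string $html): string
{
    preg_match('/^(.*?)(?:<br\b[^>]*>|\R|$)/i', $html, $m);
    return trim($m[1] ?? '');
}

echo leadingLink("http://www.google.com<br />Go there for a cool search engine!"), "\n";
echo leadingLink("http://www.google.com\nGo there for a cool search engine!"), "\n";
echo leadingLink("http://www.google.com"), "\n";
```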

 

Nope, I checked.

 

Really? Because when I test this:

 

$str = "http://www.google.com<br />Go there for a cool search engine!";
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);
echo '<pre>'.print_r($matches[1], true);

 

It reports back what you seek (in the form of an array element of course).

Yes, you are right, but for some reason it is still not working for me. Here is some info from the MySQL table I am reading from:

Field   Type         Collation           Null   Default
post    mediumtext   latin1_swedish_ci   Yes    NULL

I don't know what else to tell you.

No, you don't need 'yet another regex solution'; you already have adequate solutions offered to you.

 

The problem here (it seems) is not knowing how to load your MySQL table into an array and pass its entries through one of the solutions offered here (once you have managed that, it wouldn't be hard to apply a solution from this thread to quickly hammer out the URLs).

 

This is why a response like 'nope.. I checked' tells us absolutely nothing! Perhaps you should post your entire block of MySQL code (hiding your SQL username and password) as well as how you integrated one of the solutions offered here, so that others can see the bigger picture and pinpoint where you are going wrong (a small sample of what is stored in your MySQL database might also help in troubleshooting this). Without knowing more of what's happening, it is basically 'shooting in the dark'. I for one am not knowledgeable about databases, so unfortunately I cannot help you there. But rest assured, you have enough viable regex solutions here that actually do what you are seeking; now it is a matter of properly connecting to the database, pulling everything into an array, and then passing that array through one of the regex patterns in this thread.
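A sketch of that last step, under assumptions: PDO with the MySQL driver, a table called `posts` (hypothetical; only the `post` column of type MEDIUMTEXT is known from this thread), and replacing every tag with a space before matching, so a `<br>` can never glue the URL to the next word. `fetchPosts()` and `linksFromPosts()` are hypothetical helper names, and the connection details in the comment are placeholders:

```php
<?php
// Hypothetical: table name `posts` is an assumption; the `post` column is
// the one shown in the thread. NULL posts are skipped in SQL.
function fetchPosts(PDO $pdo): array
{
    $stmt = $pdo->query('SELECT post FROM posts WHERE post IS NOT NULL');
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Replace each tag with a space, then collect every URL, letting the
// lookbehind drop trailing punctuation. Duplicates are removed.
function linksFromPosts(array $posts): array
{
    $links = [];
    foreach ($posts as $post) {
        $text = preg_replace('/<[^>]*>/', ' ', $post);
        if (preg_match_all('%https?://\S+(?<!\p{P})%i', $text, $m)) {
            array_push($links, ...$m[0]);
        }
    }
    return array_values(array_unique($links));
}

// Usage with a real connection (placeholder credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=your_db', 'your_user', 'your_password');
// print_r(linksFromPosts(fetchPosts($pdo)));

print_r(linksFromPosts([
    "http://www.google.com<br>Go there for a cool search engine!",
    "See http://www.php.net, it has the manual.",
    "http://www.google.com<br />again",
]));
```

Two side effects to be aware of: URLs that appear only inside `href` attributes are dropped along with the tag, and because `/` is itself punctuation to `\p{P}`, a trailing slash is trimmed as well.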

<pre>
<?php
   $html = 'http://www.google.com<br>Go there for a cool search engine!';
   ### Similar to strip_tags, but replace with a space.
   $html = preg_replace('/<[^>]*>/', ' ', $html);
   preg_match('%https?://\S+(?<!\p{P})%i', $html, $matches);
   print_r($matches);
?>
</pre>
