[SOLVED] Regular Expression Trouble


carrotcake1029

Hello all!

 

I am having an issue with a regular expression I am using with preg_match_all().  It looks at whatever data I throw at it and returns any links it finds in an array.  For the most part it is doing its job, but it grabs a little too much.  All the links it returns look like this:

http://www.google.com<br

So obviously it is grabbing a little too much, and I can't see how to fix it.  Can you guys let me know what you think?

$regex = '/https?\:\/\/[^\" ]+/i';

 

Edit: Sorry, I didn't see until now you had a whole regex subforum.  You can move this if you would like.  Sorry for any hassle.
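
For anyone following along, here is a minimal sketch that reproduces the symptom. The sample string is an assumption (the real data sits in the poster's database), and the second pattern is only one possible narrowing, not a fix confirmed by the poster. The character class [^\" ] stops only at a double quote or a space, so with a tag written as <br /> the match runs through "<br" and ends at the space.

```php
<?php
// Hypothetical sample entry; the real data lives in the poster's MySQL table.
$data = 'http://www.google.com<br />Go there for a cool search engine!';

// Pattern from the post: [^\" ] stops only at a double quote or a space.
$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $data, $original);

// Illustrative variant: also exclude "<" and any whitespace,
// so a tag that follows the URL ends the match.
preg_match_all('/https?:\/\/[^"<\s]+/i', $data, $narrowed);

print_r($original[0]); // first element: http://www.google.com<br
print_r($narrowed[0]); // first element: http://www.google.com
```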

I am unsure what you mean by that, sorry.

What I am doing is looping through a mysql database and finding links from all the entries.

 

I also discovered that if any tag follows the link, the start of that tag always seems to get merged into the match, such as </a

 

Edit: I went to regexlib.com and found this one, which is supposed to extract URLs, but I can't modify it to be used in PHP. (I am not very good at regex.)

(?<http>(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)
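
As a rough sketch of how that pattern could be used from PHP: preg functions need a delimiter around the pattern, and # is chosen here only because the pattern itself contains / and ~. The sample string is an assumption. Note the pattern's limits as written: it does not match https, hyphens, or query strings, and the unescaped dot in "www." matches any character.

```php
<?php
// Hypothetical sample text standing in for one database entry.
$data = 'http://www.google.com<br />Go there for a cool search engine!';

// The regexlib pattern, unchanged, wrapped in # delimiters so PHP accepts it.
$regex = '#(?<http>(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)#';

preg_match_all($regex, $data, $matches);

// Named groups are available under their name as well as by number.
print_r($matches['http']); // first element: http://www.google.com
```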

How about something like this?

 

<pre>
<?php
$html = <<<HTML
<a href="http://www.phpfreaks.com">PHP Freaks</a>
<a href="http://www.google.com/index.html">Visit http://www.google.com!</a>
HTML;
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>
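
To illustrate what the lookbehind in that pattern does, here is a small sketch on an assumed sample string. It uses preg_match_all() instead of preg_match() purely so that more than one URL is returned; that swap is my choice for the illustration, not part of the code above.

```php
<?php
// Hypothetical sample text with punctuation directly after each URL.
$text = 'See http://www.phpfreaks.com, then visit http://www.google.com/index.html!';

// \S+ first grabs the trailing "," and "!"; the lookbehind (?<!\p{P}) then
// forces the match to back off until its last character is not punctuation.
preg_match_all('%https?://\S+(?<!\p{P})%i', $text, $matches);

print_r($matches[0]);
// [0] => http://www.phpfreaks.com
// [1] => http://www.google.com/index.html
```

One side effect worth knowing: "/" counts as punctuation too, so a URL that ends in a slash would be returned without it.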

That data works in the example code:

 

<pre>
<?php
$html = <<<HTML
http://www.google.com
Go there for a cool search engine!
HTML;
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>

 

What else is happening in your code?

Will there always be a <br> after the link? Will the link always be at the beginning of the string?  In order to accurately extract it from the string, a pattern has to be established.  A pattern, of course, being something that happens on a regular, predictable basis.  It's not really going to be possible to accurately pull a URL out of a string if it just appears at random among other stuff...

Not sure if I understand this correctly, but would this work?

 

$str = <<<DATA
http://www.google.com
Go there for a cool search engine!
DATA;
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);
echo '<pre>'.print_r($matches[1], true);

 

Output:

Array
(
    [0] => http://www.google.com
)

 

EDIT - by my calculations, it shouldn't matter with the above pattern whether a <br> trails the URL or not. I am using preg_match_all in case what you are plugging into the pattern contains multiple URLs.
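
A quick sketch of that point, using an assumed sample string with two URLs that are each followed directly by markup. The class [.\w/-] only allows dots, word characters, slashes and hyphens, so the match ends as soon as it reaches "<" or a double quote.

```php
<?php
// Hypothetical sample containing two URLs, each followed directly by markup.
$str = 'http://www.google.com<br />and <a href="https://www.phpfreaks.com/forums">the forums</a>';

// Only ".", word characters, "/" and "-" may follow the scheme,
// so "<" and the double quote are never part of a match.
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);

print_r($matches[1]);
// [0] => http://www.google.com
// [1] => https://www.phpfreaks.com/forums
```

The trade-off is that anything outside the class ends the URL early: an address such as http://example.com/search?q=php would be cut at the "?".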

Nope, I checked.

 

Really? Because when I test this:

 

$str = "http://www.google.com<br />Go there for a cool search engine!";
preg_match_all('#(https?://[.\w/-]+)#s', $str, $matches);
echo '<pre>'.print_r($matches[1], true);

 

It reports back what you seek (in the form of an array element of course).

No, you don't need 'yet another regex solution' as you already have an adequate solution offered to you.

 

The problem here (it seems) is not knowing how to load your MySQL table into an array, which in turn passes through one of the solutions offered here (if you have managed that far, it wouldn't be hard to implement a solution offered in this thread to quickly hammer out the urls).

 

This is why a response like 'nope.. I checked' tells us absolutely nothing! Perhaps you should reveal your entire block of MySQL code (hide your SQL password and username though), as well as how you integrated one of the solutions offered here, so that others can see the bigger picture and pinpoint where you are going wrong (a small sample of what is stored in your MySQL database might also help in troubleshooting this matter). Without knowing more of what's happening, it is basically 'shooting in the dark'. I for one am not knowledgeable in databases, so unfortunately I cannot help you there. But rest assured, you have enough viable regex solutions here that actually do what you are seeking.. now it is a matter of properly connecting to the database, pulling everything into an array, and then passing that array through one of the regex patterns in this thread.
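
As a rough sketch of that last step only: the array below is a hypothetical stand-in for rows already fetched from the database (the query itself is not shown, and the poster's schema is unknown), and extract_urls() is a made-up helper name. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical stand-in for rows fetched from the database; in real code
// this array would be filled from a MySQL query (not shown here).
$rows = [
    'http://www.google.com<br />Go there for a cool search engine!',
    'No links in this entry.',
    'Docs: <a href="http://www.php.net/manual">manual</a> and http://www.phpfreaks.com<br>',
];

// Hypothetical helper: returns every URL found in one entry,
// using one of the patterns offered in this thread.
function extract_urls(string $entry): array
{
    preg_match_all('#https?://[.\w/-]+#', $entry, $matches);
    return $matches[0];
}

$urls = [];
foreach ($rows as $entry) {
    // Append this entry's URLs to the running list.
    $urls = array_merge($urls, extract_urls($entry));
}

print_r($urls);
// [0] => http://www.google.com
// [1] => http://www.php.net/manual
// [2] => http://www.phpfreaks.com
```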

<pre>
<?php
   $html = 'http://www.google.com<br>Go there for a cool search engine!';
   ### Similar to strip_tags, but replace with a space.
   $html = preg_replace('/<[^>]*>/', ' ', $html);
   preg_match('%https?://\S+(?<!\p{P})%i', $html, $matches);
   print_r($matches);
?>
</pre>
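
A side-by-side sketch of why the tag is replaced with a space here rather than simply stripped, using the same sample string and pattern as the code above. The variable names are only for illustration.

```php
<?php
$html = 'http://www.google.com<br>Go there for a cool search engine!';
$pattern = '%https?://\S+(?<!\p{P})%i';

// strip_tags() removes the tag outright, so the URL runs into the next word.
preg_match($pattern, strip_tags($html), $stripped);

// Replacing each tag with a space keeps the URL separated from the text.
preg_match($pattern, preg_replace('/<[^>]*>/', ' ', $html), $spaced);

echo $stripped[0], "\n"; // http://www.google.comGo
echo $spaced[0], "\n";   // http://www.google.com
```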
