How to do this regex?


jjk2

I am kind of lost on how to approach this.

Basically, I have many file paths like this:

crazy.com/main/videos/something/popular/index.html

crazy.com/latest/news/odds/home.jpg

crazy2.com/funny/world/politics/welcome.html

another.com/news/business/index.html

How can I get only the things in bold?

Also, the filenames differ dynamically.

Even though the above regex does what you ask, here's another one ;)

preg_match('#(?<=crazy\.com).*?(?=(?:index|welcome\.html)|home\.jpg)#', $string, $match);
print_r($match);

 

There are a few issues with your suggestion, however...

a) That's probably more work than sasa's method. I don't advocate .* too often, but it does have its uses, and if each URL is checked by itself (not nested within some large block of text), sasa's method is more likely to be faster. Granted, I haven't tested the speed difference between yours and sasa's; I'm going on the assumption that positive lookbehind and lookahead assertions cost more than some minor .* backtracking, and admittedly I could be wrong on this.

b) Your pattern requires specific domains, (?<=crazy\.com) [so what happens with crazy2.com or another.com?], along with specific ending file names (such as index or welcome.html, for example). The following code illustrates these issues:

$arr = array(
    'crazy.com/main/videos/something/popular/index.html',
    'crazy.com/latest/news/odds/home.jpg',
    'crazy2.com/funny/world/politics/welcome.html',
    'another.com/news/business/index.html',
);
foreach ($arr as $val) {
    // Print the URL when the pattern matches, otherwise a failure notice.
    echo preg_match('#(?<=crazy\.com).*?(?=(?:index|welcome\.html)|home\.jpg)#', $val)
        ? $val . "<br />\n"
        : 'Url format not found using regex pattern...' . "<br />\n";
}

 

output:

crazy.com/main/videos/something/popular/index.html
crazy.com/latest/news/odds/home.jpg
Url format not found using regex pattern...
Url format not found using regex pattern...

 

Point being, I think the idea is to be able to match the directories of any URL (so the regex pattern needs to be flexible), which sasa's is.
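To make the "flexible" point concrete, here is a rough sketch of a pattern that hard-codes neither the domain nor the filename. This is not sasa's pattern, just one possible illustration; the helper name extract_directories() is made up for this example. It assumes each URL is checked by itself, has no scheme in front, and contains a domain, at least one slash, and a filename (PHP 7.1 or later for the nullable return type).

```php
<?php
// Hypothetical helper (not from the thread): returns the directory part of a
// scheme-less URL, or null when the string does not look like domain/dirs/file.
function extract_directories(string $url): ?string
{
    // ^[^/]+          the domain: everything up to the first slash
    // (/(?:[^/]+/)*)  captured: the leading slash plus zero or more "dir/" segments
    // [^/]+$          the filename: the final slash-free segment
    if (preg_match('#^[^/]+(/(?:[^/]+/)*)[^/]+$#', $url, $match)) {
        return $match[1];
    }
    return null;
}

$urls = array(
    'crazy.com/main/videos/something/popular/index.html',
    'crazy.com/latest/news/odds/home.jpg',
    'crazy2.com/funny/world/politics/welcome.html',
    'another.com/news/business/index.html',
);
foreach ($urls as $url) {
    // Falls back to a notice when the URL does not fit the expected shape.
    echo extract_directories($url) ?? 'Url format not found', "\n";
}
```

All four sample URLs match, because the pattern relies only on the position of the slashes rather than on particular domain or file names.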

I wasn't illustrating the path capturing so much as the restriction on domain names and file names that need to be found within the pattern in the first place. sasa's is more flexible. And yes, parse_url would be even better (again, assuming that the url in question is by itself and not embedded within a string).
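For what it's worth, a parse_url() version might look something like the sketch below. The function name directory_via_parse_url() is hypothetical, and the code assumes the URL stands by itself. Since the sample URLs have no scheme, and parse_url() would then treat the whole string as a path, the sketch prepends http:// first (that prefix is an assumption about the input, not something stated in the thread).

```php
<?php
// Hypothetical helper (not from the thread): directory part of a URL's path,
// obtained with parse_url() and dirname() instead of a regex.
function directory_via_parse_url(string $url): ?string
{
    // parse_url() only recognises the host when a scheme is present,
    // so add one if it is missing (assumption: http-style URLs).
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null; // malformed URL, or no path component at all
    }
    // dirname() drops the final segment, i.e. the filename.
    return dirname($path);
}

echo directory_via_parse_url('crazy2.com/funny/world/politics/welcome.html'), "\n";
```

Note that dirname() returns the directory without a trailing slash, so append one if you need it.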
