
Searching for certain links (preg_match_all)



Hello,

 

I have two preg_match_all patterns. Is it possible to combine them into a single preg_match_all call?

 

Regex that searches for links:

preg_match_all("#([ \s\n\t\r]|)([\w]+?://(([\w]+?.|)(imdb.com|gamespot.com)[\S]+?)[^ \"\n\r\t<]*)#is", $fix, $extralinks);

If a link matches #([\w]+?://(.*)(([.]{3})|â|€|¦)(.*)\n*)#is, it should not be counted or shown.

 

I wrote about 70% of it myself; the other 30% came from other forum members :)

Well... the first one searches any text for links and, if it finds any, shows them.

The second one checks whether the links it found contain "..." (three dots).

I need to fit both into one preg_match_all...

So: once the links are found, check whether they contain those three dots. If they do, don't show them; otherwise, show them.
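For reference, the two-step logic described above (collect the links, then drop the ones containing "...") can be sketched with preg_match_all followed by preg_grep and its PREG_GREP_INVERT flag, which keeps only the entries that do not match. The sample text and the simplified imdb/gamespot pattern are assumptions for illustration, not the original regex:

```php
<?php
// Hypothetical sample text; in the original code the text lives in $fix.
$fix = <<<TXT
See http://www.imdb.com/title/tt0111161/ for details.
Also http://www.gamespot.com/so...thing and http://imdb.com
TXT;

// Step 1: collect every imdb.com / gamespot.com link (simplified pattern,
// dots escaped, optional subdomain such as "www.").
preg_match_all('%https?://(?:[\w-]+\.)?(?:imdb|gamespot)\.com[^\s"\'<>]*%i', $fix, $m);

// Step 2: keep only the links that do NOT contain "...", then reindex.
$links = array_values(preg_grep('/\.\.\./', $m[0], PREG_GREP_INVERT));
print_r($links);
```

This works, but it is still two regex passes; the replies below fold the "..." check into the link pattern itself.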

Gotcha. Try this:

 

<pre>
<?php

$data = <<<URLS
	http://www.imdb.com/.../.../etc
	http://gamespot.com/.../etc
	http://www.google.com/.../etc
	http://www.perl.com
	http://www.gamespot.com
	http://imdb.com
URLS;

preg_match_all('%
	https?://
	(?:www\.)?
	(?:imdb|gamespot)
	\.com
	(?(?=\S)(?:(?!\.\.\.)\S)+(?<!\p{P}))
%x', $data, $matches);
print_r($matches);
?>
</pre>
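The part of the pattern above that handles the dots is the tempered token (?:(?!\.\.\.)\S)+: each \S is accepted only if "..." does not start at that position. A small check of how the token behaves on its own (the sample strings are made up):

```php
<?php
// Tempered token from the pattern above: consume non-space characters,
// but refuse any position where "..." begins.
$token = '%(?:(?!\.\.\.)\S)+%';

preg_match($token, 'abc...def', $m1);
echo $m1[0], "\n"; // the run stops right before "..."

preg_match($token, '/title/tt0111161', $m2);
echo $m2[0], "\n"; // no "...", so the whole run matches
```

Because the token stops instead of failing, a link like http://imdb.com/foo...bar should come back truncated as http://imdb.com/foo. The (?<!\p{P}) lookbehind only throws the match away when the character just before "..." is punctuation, as in the .../etc samples.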

<pre>
<?php

$data = <<<URLS
	<a href="http://www.imdb.com/.../.../etc">X</a>
	<a href="http://gamespot.com/.../etc">X</a>
	<a href="http://www.google.com/.../etc">X</a>
	<a href="http://www.perl.com">X</a>
	<a href="http://www.gamespot.com">X</a>
	<a href="http://imdb.com">X</a>
	http://www.imdb.com/.../.../etc
	http://gamespot.com/.../etc
	http://www.google.com/.../etc
	http://www.perl.com
	http://www.gamespot.com
	http://imdb.com
URLS;

preg_match_all('%
	https?://
	(?:www\.)?
	(?:imdb|gamespot)
	\.com
	(?(?!$|[\s"\'>])
		(?:(?!\.\.\.)[^\s"\'>])+
		(?<!\p{P})
	)
%x', $data, $matches);
print_r($matches);
?>
</pre>

Sorry, but I don't understand how to fit that code into this code:

 

$linksearch = '#([ \s\n\t\r]|)([\w]+?://(([\S]+?.|)(lix.in|linkbank.eu|rsmonkey.com|xirror.com|stealth.to|badongo.com|filefront.com|sendspace.com|midload.com|rapidshare.(com|de)|easy-share.com|depositfiles.com|filefactory.com|megaupload.com|aliveupload.com|megashares.com|uploaded.to|netload.in)\/([\w\d\<?\=\\\/]{3})|([\S]+?)\/([\S]+?)[.](r([\d]{2})|([\d]{3})avi|mov|wmv|mp3|exe|zip|rar|nfo|sfv))[^ \>\<\s\'\"\n\r\t<]*)#is';
preg_match_all($linksearch, $fix, $links);

 

Could you write a very simple pattern that doesn't pick up links containing "..."?

Here is a sample (the input, and the output I'd expect):

 

Input:

http://someurl.com/something.rar
http://someurl1.com/so...thing.rar
http://someurl2.com/something.rar
http://someurl3.com/something.rar
http://someurl4.com/s...ething.rar
http://someurl5.com/someth....rar

 

Output (show only the links without three dots):

http://someurl.com/something.rar
http://someurl2.com/something.rar
http://someurl3.com/something.rar
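A minimal sketch of the requested behaviour in a single preg_match_all, assuming a link is any http://, https:// or ftp:// URL followed by a run of characters other than whitespace, quotes and angle brackets (a simplification: the domain and extension lists from the pattern above are left out). The negative lookahead right after :// rejects the whole link when "..." appears anywhere in it, rather than just stopping before the dots:

```php
<?php
// Sample input from the post above.
$fix = <<<TXT
http://someurl.com/something.rar
http://someurl1.com/so...thing.rar
http://someurl2.com/something.rar
http://someurl3.com/something.rar
http://someurl4.com/s...ething.rar
http://someurl5.com/someth....rar
TXT;

// Assumed link shape: protocol, "://", then a run of characters that are not
// whitespace, quotes or angle brackets. The lookahead fails the whole match
// if that run contains "..." anywhere.
$pattern = '%(?:ftp|https?)://(?![^\s"\'<>]*\.\.\.)[^\s"\'<>]+%i';

preg_match_all($pattern, $fix, $links);
print_r($links[0]);
```

The same lookahead should drop into a longer domain-list pattern right after the :// part. It only looks for three ASCII dots, not a single ellipsis character.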

 

The newer expression

$linksearch = '#([ \s\n\t\r]|)((ftp|http)://(([\S]+?.|)((lix.in|fast-load.net|linkbank.eu|rsmonkey.com|xirror.com|stealth.to|badongo.com|filefront.com|sendspace.com|midload.com|rapidshare.(com|de)|easy-share.com|depositfiles.com|filefactory.com|megaupload.com|aliveupload.com|megashares.com|uploaded.to|netload.in)\/([\w\d\<?\=\\\/]{3,150}))|([\S]{6,150})\/([\S]{6,150})[.]((r|z)([\d]{2,3})|([\d]{3})avi|7z|str|txt|mov|wmv|mp3|exe|zip|rar|nfo|sfv))[^ \>\<\s\'\"\n\r\t<]*)#is';

The sample you gave shouldn't return anything, because your pattern requires certain domains. This might be a start:

 

%
### Protocol
(?:(?:f|ht)tp)://
### Domain
(?:
	(?:lix|netload)\.in|
	fast-load\.net|
	linkbank\.eu|
	(?:rsmonkey|xirror|badongo|filefront|sendspace|midload|easy-share|depositfiles|filefactory|megaupload|aliveupload|megashares)\.com|
	(?:stealth|uploaded)\.to|
	rapidshare\.(?:com|de)
)
(?(?!$|[\s"\'>])
	(?:(?!\.\.\.)[^\s"\'>])+
	(?<!\p{P})
)
%ix

Warning: Unexpected character in input: '\' (ASCII=92) state=1 in /home/domains/warezdir.com/www_root/new.php on line 77

 

Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING in /home/domains/warezdir.com/www_root/new.php on line 89

 

Using this code (I also tried it without the ### Protocol and ### Domain lines):

preg_match_all("#
### Protocol
(?:(?:f|ht)tp)://
### Domain
(?:
	(?:lix|netload)\.in|
	fast-load\.net|
	linkbank\.eu|
	(?:rsmonkey|xirror|badongo|filefront|sendspace|midload|easy-share|depositfiles|filefactory|megaupload|aliveupload|megashares)\.com|
	(?:stealth|uploaded)\.to|
	rapidshare\.(?:com|de)
)
(?(?!$|[\s"\'>])
	(?:(?!\.\.\.)[^\s"\'>])+
	(?<!\p{P})
)
#ix", $fix, $links);
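The two errors appear to come from the PHP quoting, not from the regex: inside a double-quoted string, the " in [\s"\'>] ends the string early, PHP then trips over the stray \' (the "Unexpected character" warning), and the text after it is read as a second string literal (the T_CONSTANT_ENCAPSED_STRING parse error). Writing the pattern as a single-quoted string, as in the earlier examples, avoids this. Separately, # cannot be both the delimiter and the x-mode comment character, because PHP ends the pattern at the first # after the opening one, so the ### comment lines also need another delimiter such as %. A minimal sketch, with a shortened domain list and sample text that are purely illustrative:

```php
<?php
// Single-quoted PHP string: only \' and \\ are special, so the " inside the
// character class no longer ends the string. "%" is the delimiter, so "#"
// is free to start x-mode comments.
$pattern = '%
    (?:(?:f|ht)tp)://                                  # protocol
    (?:rapidshare\.(?:com|de)|megaupload\.com)         # shortened, illustrative domain list
    (?(?!$|[\s"\'>])
        (?:(?!\.\.\.)[^\s"\'>])+
        (?<!\p{P})
    )
%ix';

// Hypothetical sample text standing in for $fix.
$fix = 'Get http://rapidshare.com/files/1/a.rar or "http://megaupload.com/?d=X1"';
preg_match_all($pattern, $fix, $links);
print_r($links[0]);
```

The full domain list from the pattern above can replace the shortened one unchanged, as long as the comments contain no ' or % characters.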
