
Searching for certain links (preg_match_all)


mysterbx


Hello,

I have two preg_match_all patterns. Is it possible to combine them into a single preg_match_all call?

 

Search for links regex:

preg_match_all("#([ \s\n\t\r]|)([\w]+?://(([\w]+?.|)(imdb.com|gamespot.com)[\S]+?)[^ \"\n\r\t<]*)#ise", $fix, $extralinks);

If a link matches #([\w]+?://(.*)(([.]{3})|â|€|¦)(.*)\n*)#ise, it won't be counted or shown.

 

About 70% of it is my own work, with about 30% help from other forum members :)

The first pattern searches any text for links and shows the ones it finds.

The second one checks whether the found links contain "..." (three dots).

I need to fit both into one preg_match_all: if links are found, check whether they contain those three dots. If they do, don't show them; otherwise show them.

Gotcha. Try this:

 

<pre>
<?php

$data = <<<URLS
	http://www.imdb.com/.../.../etc
	http://gamespot.com/.../etc
	http://www.google.com/.../etc
	http://www.perl.com
	http://www.gamespot.com
	http://imdb.com
URLS;

preg_match_all('%
	https?://
	(?:www\.)?
	(?:imdb|gamespot)
	\.com
	(?(?=\S)(?:(?!\.\.\.)\S)+(?<!\p{P}))
%x', $data, $matches);
print_r($matches);
?>
</pre>

<pre>
<?php

$data = <<<URLS
	<a href="http://www.imdb.com/.../.../etc">X</a>
	<a href="http://gamespot.com/.../etc">X</a>
	<a href="http://www.google.com/.../etc">X</a>
	<a href="http://www.perl.com">X</a>
	<a href="http://www.gamespot.com">X</a>
	<a href="http://imdb.com">X</a>
	http://www.imdb.com/.../.../etc
	http://gamespot.com/.../etc
	http://www.google.com/.../etc
	http://www.perl.com
	http://www.gamespot.com
	http://imdb.com
URLS;

preg_match_all('%
	https?://
	(?:www\.)?
	(?:imdb|gamespot)
	\.com
	(?(?!$|[\s"\'>])
		(?:(?!\.\.\.)[^\s"\'>])+
		(?<!\p{P})
	)
%x', $data, $matches);
print_r($matches);
?>
</pre>
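One thing worth checking with the patterns above: when the three dots sit in the middle of a path segment, the match appears to stop just before them instead of being rejected, so a truncated URL is returned. The sketch below compares the first pattern's tail with a variant that also requires the URL to end where the match ends. The `$strict` pattern and the test URL are assumptions for illustration, not something posted in this thread.

```php
<?php
// Hypothetical test URL: the "..." sits in the middle of a path segment.
$url = 'http://www.imdb.com/so...thing';

// Tail as posted above, written on one line.
$posted = '%https?://(?:www\.)?(?:imdb|gamespot)\.com(?(?=\S)(?:(?!\.\.\.)\S)+(?<!\p{P}))%x';

// Assumed variant: consume URL characters without stepping onto "...",
// then insist that no further non-space character follows.
$strict = '%https?://(?:www\.)?(?:imdb|gamespot)\.com(?:(?!\.\.\.)\S)*(?!\S)%x';

preg_match_all($posted, $url, $a);
preg_match_all($strict, $url, $b);

print_r($a[0]); // the posted tail yields the truncated "http://www.imdb.com/so"
print_r($b[0]); // the strict variant yields no match at all
```

If truncated matches are acceptable, the posted version is fine as it stands; the end check only matters when a link containing "..." must be dropped entirely.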

Sorry, but I don't understand how to fit that code into this code:

 

$linksearch = '#([ \s\n\t\r]|)([\w]+?://(([\S]+?.|)(lix.in|linkbank.eu|rsmonkey.com|xirror.com|stealth.to|badongo.com|filefront.com|sendspace.com|midload.com|rapidshare.(com|de)|easy-share.com|depositfiles.com|filefactory.com|megaupload.com|aliveupload.com|megashares.com|uploaded.to|netload.in)\/([\w\d\<?\=\\\/]{3})|([\S]+?)\/([\S]+?)[.](r([\d]{2})|([\d]{3})avi|mov|wmv|mp3|exe|zip|rar|nfo|sfv))[^ \>\<\s\'\"\n\r\t<]*)#ise';
preg_match_all($linksearch, $fix, $links);

 

Could you write some very simple code that wouldn't add links containing "..."?

Here is a sample (input and expected output):

Input:

http://someurl.com/something.rar
http://someurl1.com/so...thing.rar
http://someurl2.com/something.rar
http://someurl3.com/something.rar
http://someurl4.com/s...ething.rar
http://someurl5.com/someth....rar

 

Output (only links that do not contain three consecutive dots):

http://someurl.com/something.rar
http://someurl2.com/something.rar
http://someurl3.com/something.rar
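
For this sample, one simple route is two steps instead of one pattern: collect every URL first, then throw away the ones containing three dots. A minimal sketch, assuming a deliberately loose URL pattern with no domain list; the pattern and the names `$text`, `$found` and `$clean` are illustrative and not taken from the thread.

```php
<?php
// Sample input from the post above.
$text = <<<TXT
http://someurl.com/something.rar
http://someurl1.com/so...thing.rar
http://someurl2.com/something.rar
http://someurl3.com/something.rar
http://someurl4.com/s...ething.rar
http://someurl5.com/someth....rar
TXT;

// Step 1: collect every http, https or ftp URL (loose on purpose).
preg_match_all('~\b(?:https?|ftp)://[^\s"\'<>]+~i', $text, $found);

// Step 2: drop URLs containing "..." or the single ellipsis character
// (U+2026, written here as its UTF-8 bytes).
$clean = array_values(array_filter($found[0], function ($url) {
    return strpos($url, '...') === false
        && strpos($url, "\xE2\x80\xA6") === false;
}));

print_r($clean);
```

This leaves the three links listed in the expected output. The last input line has four dots, which still contains three in a row, so it is dropped too.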

 

The newer expression:

$linksearch = '#([ \s\n\t\r]|)((ftp|http)://(([\S]+?.|)((lix.in|fast-load.net|linkbank.eu|rsmonkey.com|xirror.com|stealth.to|badongo.com|filefront.com|sendspace.com|midload.com|rapidshare.(com|de)|easy-share.com|depositfiles.com|filefactory.com|megaupload.com|aliveupload.com|megashares.com|uploaded.to|netload.in)\/([\w\d\<?\=\\\/]{3,150}))|([\S]{6,150})\/([\S]{6,150})[.]((r|z)([\d]{2,3})|([\d]{3})avi|7z|str|txt|mov|wmv|mp3|exe|zip|rar|nfo|sfv))[^ \>\<\s\'\"\n\r\t<]*)#ise';

The sample you gave shouldn't return anything, because your pattern requires certain domains. This might be a start:

 

%
### Protocol
(?:(?:f|ht)tp)://
### Domain
(?:
	(?:lix|netload)\.in|
	fast-load\.net|
	linkbank\.eu|
	(?:rsmonkey|xirror|badongo|filefront|sendspace|midload|easy-share|depositfiles|filefactory|megaupload|aliveupload|megashares)\.com|
	(?:stealth|uploaded)\.to|
	rapidshare\.(?:com|de)
)
(?(?!$|[\s"\'>])
	(?:(?!\.\.\.)[^\s"\'>])+
	(?<!\p{P})
)
%ix

Warning: Unexpected character in input: '\' (ASCII=92) state=1 in /home/domains/warezdir.com/www_root/new.php on line 77

 

Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING in /home/domains/warezdir.com/www_root/new.php on line 89

 

Using this code (I also tried it without the ### Protocol and ### Domain lines):

preg_match_all("#
### Protocol
(?:(?:f|ht)tp)://
### Domain
(?:
	(?:lix|netload)\.in|
	fast-load\.net|
	linkbank\.eu|
	(?:rsmonkey|xirror|badongo|filefront|sendspace|midload|easy-share|depositfiles|filefactory|megaupload|aliveupload|megashares)\.com|
	(?:stealth|uploaded)\.to|
	rapidshare\.(?:com|de)
)
(?(?!$|[\s"\'>])
	(?:(?!\.\.\.)[^\s"\'>])+
	(?<!\p{P})
)
#ix", $fix, $links);
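
The two messages above look like PHP parser errors rather than regex errors. The pattern is wrapped in double quotes, and the `"` inside `[\s"\'>]` closes the string early, so PHP trips over the `\` that follows. A second problem would likely surface once that is fixed: `#` is used as the delimiter, so the first `#` of the `### Protocol` comment line ends the pattern. Below is a minimal sketch that sidesteps both by using a single-quoted string with `~` as the delimiter. The shortened domain list, the sample text in `$fix` and the end-of-URL check `(?![^\s"\'<>])` are assumptions added for illustration.

```php
<?php
// Made-up sample text; the paths are placeholders.
$fix = <<<TXT
http://rapidshare.com/files/123/something.rar
http://rapidshare.com/files/123/so...thing.rar
<a href="http://uploaded.to/abc">X</a>
http://someurl.com/something.rar
TXT;

// Single-quoted, so the double quote in the character class is harmless;
// only the single quote needs a backslash. The delimiter is ~, which leaves
// # free for comments in x mode. Comments must avoid ~ and single quotes.
$linksearch = '~
    # Protocol
    (?:(?:f|ht)tp)://
    # Domain (shortened list, for illustration only)
    (?:
        rapidshare\.(?:com|de)|
        (?:stealth|uploaded)\.to
    )
    # Rest of the URL, never stepping onto three consecutive dots
    (?:(?!\.\.\.)[^\s"\'<>])*
    # The URL has to end here, so a link containing ... is rejected
    (?![^\s"\'<>])
~ix';

preg_match_all($linksearch, $fix, $links);
print_r($links[0]);
```

With this input, `$links[0]` holds the first rapidshare link and the uploaded.to link. The link with "..." is skipped, and someurl.com is skipped because it is not in the domain list.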
