regular expressions


advancedfuture

You wouldn't need to, you could simply use explode to find out the extension:

 

$url="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"; // (in your script it'd be a dynamic variable)

$parts=explode(".", $url);

$extension=strtolower(end($parts)); // take the last piece; index 1 would be "w3" for this URL

 

if (($extension=='dtd') || ($extension=='css')){

echo "move to next link";

}

 

EDIT: Corrected a couple syntax errors
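
Splitting on every dot is fragile, because the host name contains dots too: for the URL above the pieces are "http://www", "w3", "org/TR/xhtml1/DTD/xhtml1-strict" and "dtd", so only the last piece is the extension. A minimal sketch of an alternative, assuming you are happy to use PHP's built-in parse_url() and pathinfo(); the helper name url_extension is made up for illustration:

```php
<?php
// Hypothetical helper: returns the lowercase file extension of a URL's path,
// or an empty string when the path has no extension (or the URL has no path).
function url_extension($url)
{
    // Only look at the path component, so dots in the host name and
    // anything in the query string are ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return "";
    }
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}

$url = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
if (in_array(url_extension($url), array("dtd", "css"), true)) {
    echo "move to next link";
}
```

Because only the path is inspected, a link such as style.css?v=2 would still be recognised as css.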

 


Thanks for that. I'm still having a problem with it looping on that stupid dtd file; only certain pages do that:

 

<?php
// aredhelrim
$seed = "http://www.phpfreaks.com/forums/index.php/board,1.0.html";
spider($seed);
function spider($url)
{
$html = file_get_contents($url);
echo "Page : " . $url;
$extension=explode(".", $url);
preg_match_all("/http:\/\/[^\"\s']+/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val)
{
	if (($extension[1]!='dtd') && ($extension[1]!='css'))
	{
		echo "<br><font color=red>links :</font> " . $val[0] . "\r\n";
		spider($val[0]);
	}
}
}
?>


How do you mean looping? Does it continually echo "<br><font color=red>links :</font> " . $val[0] . "\r\n"; ?

 

You could maybe put the if statement before the preg_match_all. It probably won't make a difference, but there's no need to run the regular expression first, and it'd make the script more time efficient even if it doesn't fix the problem.
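
A rough sketch of that reordering, with the extension check done before the regular expression runs. The helper names is_skippable and extract_links are made up for illustration, and the check takes the last dot-separated piece of the URL, which is an assumption about what the test is meant to do:

```php
<?php
// Hypothetical helper: true when the last dot-separated piece of the URL
// is "dtd" or "css".
function is_skippable($url)
{
    $parts = explode(".", $url);
    return in_array(strtolower(end($parts)), array("dtd", "css"), true);
}

// Hypothetical helper: returns the http:// links found in $html, or an
// empty list when $url itself is one we do not want to process.
function extract_links($url, $html)
{
    if (is_skippable($url)) {
        return array(); // bail out before running the regular expression
    }
    preg_match_all("/http:\/\/[^\"\s']+/", $html, $matches);
    return $matches[0]; // full matches only
}
```

Whether this stops the repeated output depends on which URL the check is applied to: it only helps if the URL being tested is the one about to be visited.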


When I say looping, I mean my code will visit some pages and just repeatedly echo the following if that file exists:

 

Page : http://www.phpfreaks.com/forums/index.php/board,1.0.html

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml

links : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd Page : http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

links : http://www.w3.org/1999/xhtml Page : http://www.w3.org/1999/xhtml


You can do something like this:

if( preg_match('/\.dtd(?:$|\?)/', $url) ) {
   return;
}

 

More likely you'd like something like this:

if( !preg_match('/\.(?:html|htm|shtml|xtml|php|asp|cgi)(?:$|\?)/', $url) ) {
   return;
}

So as to limit yourself to pages you're pretty sure you want to see.
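
To show how that whitelist check could sit inside a recursive spider, here is a minimal sketch. It goes beyond what was posted in this thread: the names looks_like_page and crawl, the $fetch callback, the $visited array and the $maxDepth limit are all assumptions added for illustration. The $visited array is there because two pages that link to each other would otherwise be fetched forever, whatever their extensions are.

```php
<?php
// Hypothetical helper wrapping the whitelist pattern from this post.
function looks_like_page($url)
{
    return preg_match('/\.(?:html|htm|shtml|xtml|php|asp|cgi)(?:$|\?)/', $url) === 1;
}

// Hypothetical crawler. $fetch is any callable that returns a page's HTML
// (or false on failure); $visited records every URL already processed so
// that nothing is fetched twice; $maxDepth caps the recursion.
function crawl($url, $fetch, array &$visited, $maxDepth = 3)
{
    if ($maxDepth < 0 || isset($visited[$url]) || !looks_like_page($url)) {
        return;
    }
    $visited[$url] = true;

    $html = call_user_func($fetch, $url);
    if ($html === false) {
        return;
    }

    // Same link pattern as the spider earlier in the thread.
    preg_match_all('/http:\/\/[^"\s\']+/', $html, $matches);
    foreach ($matches[0] as $link) {
        crawl($link, $fetch, $visited, $maxDepth - 1);
    }
}
```

Passing 'file_get_contents' as $fetch and an empty array as $visited would crawl real pages, provided allow_url_fopen is enabled; passing a function that reads from a fixed array makes it easy to try out without touching the network.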

 
