Jump to content

SQL command to crawl for full image file on the web (eg: image.jpg)


Recommended Posts

hello all,

i am doing search engine,

i based my code on Daniel Solin's examples

http://www.onlamp.com/pub/a/php/2002/10/24/simplesearchengine.html?page=3

 

its a good example but i`m wondering how do i make it useful by searching for whole content?

the current operation will populate single words by words.

eg: it will populate the keyword "image" instead of "image.jpg"

 

1.) how do i tweak the sql command so that it will populate both keyword and keyword with their file extension?

 

it is also not efficient because when i search for "buil", the "building" does not appear,

 

2.) how do i tweak the sql command so that it can search for both "buil" and "building" ?

 

thanks.. i`m a real beginner here..

 

heres my populate.php code

    <?php
/*
* populate.php
*
* Script for populating the search database with words,
* pages and word-occurences.
*/


/*  populating eg.: http://localhost/fyp/populate.php?url=http://www.chedet.com   */


/* Connect to the database: */
mysql_pconnect("localhost","admin","password")
    or die("ERROR: Could not connect to database!");

mysql_select_db("fyp");


// clear the previous saved session
mysql_query(" TRUNCATE page; ");
mysql_query(" TRUNCATE word; ");
mysql_query(" TRUNCATE occurrence; ");


/* Define the URL that should be processed: */

$url = addslashes( $_POST['url'] );

if( !$url )
{
   
   die( "<center> <br>Could not open URL! <br><br> <a href=\"javascript:history.back()\">Back</a></center> <br><br><br>" );
   
   
}
else if( substr($url,0,7) != "http://" )
{
   $url = "http://$url";
}

/* Does this URL already have a record in the page-table? */
$result = mysql_query("SELECT page_id FROM page WHERE page_url = \"$url\"");
$row = mysql_fetch_array($result);

if( $row['page_id'] )
{
   /* If yes, use the old page_id: */
   $page_id = $row['page_id'];
}
else
{
   /* If not, create one: */
   mysql_query("INSERT INTO page (page_url) VALUES (\"$url\")");
   $page_id = mysql_insert_id();
}

/* Start parsing through the text, and build an index in the database: */
if( !($fd = fopen($url,"r")) )
   
   die( "<center> <br>Could not open URL! <br><br> <a href=\"javascript:history.back()\">Back</a></center> <br><br><br>" );

while( $buf = fgets($fd,1024) )
{
   /* Remove whitespace from beginning and end of string: */
   $buf = trim($buf);

   /* Try to remove all HTML-tags: */
   $buf = strip_tags($buf);
   $buf = ereg_replace('/&\w;/', '', $buf);

   /* Extract all words matching the regexp from the current line: */
   preg_match_all("/(\b[\w+]+\b)/",$buf,$words);

   /* Loop through all words/occurrences and insert them into the database: */
   for( $i = 0; $words[$i]; $i++ )
   {
      for( $j = 0; $words[$i][$j]; $j++ )
      {
         /* Does the current word already have a record in the word-table? */
         $cur_word = addslashes( strtolower($words[$i][$j]) );

         $result = mysql_query("SELECT word_id FROM word 
                                WHERE word_word = '$cur_word'");
         $row = mysql_fetch_array($result);
         if( $row['word_id'] )
         {
            /* If yes, use the old word_id: */
            $word_id = $row['word_id'];
         }
         else
         {
            /* If not, create one: */
            mysql_query("INSERT INTO word (word_word) VALUES (\"$cur_word\")");
            $word_id = mysql_insert_id();
         }

         /* And finally, register the occurrence of the word: */
         mysql_query("INSERT INTO occurrence (word_id,page_id) 
                      VALUES ($word_id,$page_id)");
         print "Indexing: $cur_word<br>";
      }
   }
}

fclose($fd);



?>


<!-- <script language=javascript>
	setTimeout("location.href='http://localhost/fyp/search.php'",2);
</script> -->
<?php 
if(isset($_POST['submit1'])) {

print "
<script language=javascript>
	setTimeout(\"location.href='http://localhost/fyp/search.php'\",2);
</script>

";

}

if(isset($_POST['submit2'])) {

print "
<script language=javascript>
	setTimeout(\"location.href='http://localhost/fyp/search2.php'\",2);
</script>

";

}
?>

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.