
[SOLVED] Indexing Using PHP


jj20051


Ha ha ha... >:( I know the basics of PHP; I just haven't figured out how I would go about solving this problem. I already made the first portion of the search engine script (http://apenex.net), but I want to make an auto-indexer. At the moment I am adding a description, a title and keywords for each page by hand (very time-consuming if I were to index 1 million pages).

You would need to read the page into a string using file_get_contents(), then parse that string for meta tags using preg_match().

Something like...

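<?php

  // Fetch the raw HTML (URLs need allow_url_fopen enabled)
  $file = file_get_contents('http://foo.com/index.php');
  // ([^"]*) stops at the closing quote instead of running to the end of the line,
  // and \s*\/?> accepts both "/>" and ">" endings
  $found = preg_match('/<meta\s+name="keywords"\s+content="([^"]*)"\s*\/?>/i', (string) $file, $matches);

  // On a match, $matches[1] holds just the keywords
  if ($found) {
      print_r($matches[1]);
  }

?>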
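A regex like the one above depends on the exact attribute order and spacing of each tag. Below is a sketch of a sturdier alternative, assuming PHP's DOM extension is available: let DOMDocument parse the HTML, then read the <title> plus every <meta name="..." content="..."> pair, which covers the title, description and keywords the original post asked about. The function name extract_page_info() and the sample HTML are hypothetical. (PHP also ships a built-in get_meta_tags() if you only need the meta tags.)

```php
<?php
// Hypothetical helper: pull the <title> and all named <meta> tags out of a page.
// DOMDocument does not care about attribute order, quoting style, or whether
// the tag ends in "/>" or ">".
function extract_page_info(string $html): array
{
    $info = ['title' => '', 'meta' => []];
    if ($html === '') {
        return $info; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $info['title'] = trim($titles->item(0)->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name !== '') {
            $info['meta'][$name] = $meta->getAttribute('content');
        }
    }
    return $info;
}

// Sample page (hypothetical); note the description tag lists content before name
$html = '<html><head><title>Post reply</title>'
      . '<meta content="Post reply" name="description">'
      . '<meta name="keywords" content="PHP, MySQL, bulletin, board" />'
      . '</head><body></body></html>';
print_r(extract_page_info($html));
```

In a real indexer you would pass in the string returned by file_get_contents() instead of the sample HTML.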

I only know basic PHP, so I can't give you many details, but here goes.

First, I see you have a form to get the URL. From there, you're going to need some way to get the info that's on that site. Most search engines use web crawlers: programs that visit a site, follow all the links, and grab the data for you. Another way is to grab the source of the page. From there you can pass that data into a form and have it look for key meta tags. This page has:

<meta name="description" content="Post reply" /> 
<meta name="keywords" content="PHP, MySQL, bulletin, board, free, open, source, smf, simple, machines, forum" />
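The crawler approach mentioned above boils down to collecting the links on each page so they can be visited next. Here is a minimal sketch of just that step, again assuming the DOM extension; extract_links() and the sample HTML are hypothetical, and a real crawler would also need to resolve relative URLs, remember which pages it has already visited, and respect robots.txt.

```php
<?php
// Hypothetical helper: return the unique, non-empty href values of all <a> tags.
function extract_links(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // ignore parse warnings on bad HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute() returns '' when there is no href (e.g. <a name="top">)
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    // A crawler would queue these for fetching; duplicates are dropped here
    return array_values(array_unique($links));
}

$html = '<p><a href="http://apenex.net/">Home</a> <a href="/about.php">About</a>'
      . ' <a href="http://apenex.net/">Home again</a> <a name="top">Anchor</a></p>';
print_r(extract_links($html));
```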

Then your script will take the tags and, well, store them. As you can see just from the tags, this page is used to post replies and is about PHP, MySQL and so on.
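One way to "store them" so they can be searched later is an inverted index: a map from each keyword to the pages that list it. The sketch below keeps it in a plain PHP array for illustration only; add_to_index(), search_index() and the sample keywords are hypothetical, and a real indexer would more likely persist the same keyword-to-URL pairs in a database table (e.g. MySQL).

```php
<?php
// Hypothetical helper: add one page's comma-separated keywords to the index.
function add_to_index(array $index, string $url, string $keywords): array
{
    foreach (explode(',', $keywords) as $word) {
        $word = strtolower(trim($word)); // "PHP" and " php" map to the same key
        if ($word === '') {
            continue;
        }
        // Avoid listing the same URL twice under one keyword
        if (!in_array($url, $index[$word] ?? [], true)) {
            $index[$word][] = $url;
        }
    }
    return $index;
}

// Hypothetical helper: looking up a keyword is a single array access.
function search_index(array $index, string $word): array
{
    return $index[strtolower(trim($word))] ?? [];
}

$index = [];
$index = add_to_index($index, 'http://apenex.net/', 'PHP, Search, Engine');
$index = add_to_index($index, 'http://apenex.net/222.php', 'php, MySQL');
print_r(search_index($index, 'PHP'));
```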

As for what Aqpti posted, all I can say is: give as much detail as you can so the real pros can help. It helps keep them from getting testy. ;)

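I tried to use this (a modified version of your code):

<?php

  // Fetch the raw HTML of the page
  $file = file_get_contents('http://phpfreaks.com/index.php');
  // If the tag matches exactly, $matches[2] holds the keywords; they are never printed below
  $meta = preg_match('/(<meta name="keywords" content="(.*)" \/>)/i', $file, $matches);
  // Dumps the entire fetched page, not just the meta tags
  print_r($file);

?>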

It worked to an extent: it displays just the content of the page (http://apenex.net/222.php).

If I could remove all links, forms and other HTML content, it would be perfect for my purposes.
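strip_tags() would drop the markup but keep the text inside links, forms and scripts. Below is a sketch that removes those elements entirely before taking the remaining text, assuming the DOM extension; page_text() and its default list of tags to drop (a, form, script, style) are assumptions to adjust as needed.

```php
<?php
// Hypothetical helper: delete whole "noise" elements, then return the readable text.
function page_text(string $html, array $drop = ['a', 'form', 'script', 'style']): string
{
    if ($html === '') {
        return '';
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($drop as $tag) {
        // Copy the live NodeList into an array first; removing nodes while
        // iterating the live list would skip elements
        $nodes = iterator_to_array($doc->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }
    $root = $doc->getElementsByTagName('body')->item(0) ?? $doc->documentElement;
    $text = $root !== null ? $root->textContent : '';
    // Collapse runs of whitespace left behind by the removed markup
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = "<html><body>\n<h1>Apenex</h1>\n<p>Welcome to the <a href=\"/\">home</a> page.</p>\n"
      . "<form><input name=\"q\"><button>Go</button></form>\n"
      . "<script>var x = 1;</script>\n</body></html>";
echo page_text($html);
```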

Archived

This topic is now archived and is closed to further replies.
