renfley Posted January 20, 2012

Hey guys, I'm looking for some input from the community. Since this is a complicated coding issue, I will ask the pros! [PS: that's called sucking up.]

OK, so here is my end goal: a user arrives at a certain webpage, inputs his web URL, and hits submit.

Here is where I get stuck. Once the URL is submitted, I would like to run a script that scans the URL and collects and stores information in a database. Basically, I would like to save three things: the site name, the URL, and the description.

I really hope this isn't as complicated as it seems. Thanks.

Link to comment https://forums.phpfreaks.com/topic/255432-personal-website-crawler/
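Editor's sketch: before a submitted URL is fetched at all, it usually helps to check that it is an absolute http or https address. The helper name `isFetchableUrl` is hypothetical and not from the thread; this is only one possible way to do the check, using PHP's built-in `filter_var()` and `parse_url()`.

```php
<?php
// Hypothetical helper (not from the thread): accept only absolute http(s) URLs.
function isFetchableUrl(string $url): bool
{
    // filter_var() returns false when the string is not a syntactically valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Restrict the scheme so that input such as file:// or ftp:// is rejected.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(isFetchableUrl('http://www.example.com/'));
var_dump(isFetchableUrl('www.example.com'));
var_dump(isFetchableUrl('file:///etc/passwd'));
```

A bare host name such as `www.example.com` is rejected here because it has no scheme; a form handler could prepend `http://` before validating if that input should be accepted.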
renfley (Author) Posted January 20, 2012

OK, after some more research I have found that this works great for getting the site name:

    <?php
    function getTitle($Url){
        // Download the page as one string
        $str = file_get_contents($Url);
        // Only return a title when the page was fetched and the pattern matched
        if(strlen($str)>0 && preg_match("/\<title\>(.*)\<\/title\>/",$str,$title)){
            return $title[1];
        }
        return null;
    }
    //Example:
    echo getTitle("http://localhost/");
    ?>

Now I would like something to get the description. Any helpers?
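Editor's sketch: the regular expression above only matches a lowercase `<title>` tag with no attributes and a title on a single line. Assuming the `dom` extension is available, an HTML parser handles those cases as well. The function name `extractTitle` is hypothetical, and it takes an HTML string that has already been downloaded rather than a URL.

```php
<?php
// Hypothetical helper (not from the thread): pull the <title> text out of an HTML string.
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    // Collapse newlines and runs of whitespace inside the title to single spaces.
    $title = trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
    return $title === '' ? null : $title;
}

$html = "<html><head><TITLE lang=\"en\">\n  Example Web Page\n</TITLE></head><body></body></html>";
var_dump(extractTitle($html));
```

Returning `null` when there is no title lets the caller decide what to store instead. Pages that do not declare a character set may need extra encoding handling, which this sketch leaves out.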
renfley (Author) Posted January 20, 2012

OK, so thanks to something called old friends, I've written the following:

    <?php
    //$url="www.viraleh.com";
    function getTitle($Url){
        $str = file_get_contents($Url);
        // Only return a title when the page was fetched and the pattern matched
        if(strlen($str)>0 && preg_match("/\<title\>(.*)\<\/title\>/",$str,$title)){
            return $title[1];
        }
        return null;
    }
    //Example:
    $website = "http://www.hotscripts.com";
    echo getTitle($website);
    // get_meta_tags() returns the page's meta tags as an array keyed by name
    $tags = get_meta_tags($website);
    echo "<br/><br/><br/>";
    echo $tags['description'];
    echo "<br/><br/><br/>";

    // Database connection settings
    $host = "localhost";
    $user = "root";
    $pass = "";
    $db = "spider";
    $con = mysql_connect($host,$user,$pass);
    if (!$con) {
        die('Could not connect: ' . mysql_error());
    }
    mysql_select_db($db, $con);
    mysql_query("INSERT INTO Persons (FirstName, LastName, Age) VALUES ('Peter', 'Griffin', '35')");
    mysql_close($con);

Except instead of FirstName, etc., I would like to insert into Site (Sitename, Site_desc) the values of getTitle($website) and $tags['description']. This last part is where I really need some help.
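Editor's sketch: `get_meta_tags()` takes a file name or URL, so the script above downloads the page twice, once for the title and once for the meta tags. Assuming the page has already been downloaded into a string and the `dom` extension is available, the description can be read from that same string. The function name `extractDescription` is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): read <meta name="description"> from an HTML string.
function extractDescription(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Suppress warnings about invalid markup while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Compare the name attribute case-insensitively ("Description" is common in the wild).
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $content = trim($meta->getAttribute('content'));
            return $content === '' ? null : $content;
        }
    }
    return null;
}

$html = '<html><head><meta name="Description" content=" A demo page "></head><body></body></html>';
var_dump(extractDescription($html));
```

Many pages have no description meta tag at all, so the caller should be prepared for `null`.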
ultraloveninja Posted January 20, 2012

What other information are you looking to collect? If you are looking for a web crawler, then you might want to look into Nutch: http://nutch.apache.org/

I am not sure that PHP is going to "scrape" info off of a site.
renfley (Author) Posted January 20, 2012

Actually, I've tested the above code and it works perfectly. I am stuck at the insert-to-database part. Basically, here is the code:

    <?php
    mysql_select_db($db, $con);
    mysql_query("INSERT INTO ls_sites (site_title, site_url, site_desc) VALUES ('Peter', 'Griffin', '35')");
    mysql_close($con);

With the above in place, I am able to echo the following:

    <?php
    $tags = get_meta_tags("$website");
    echo $tags['description'];

Now I would like to change the name Peter in the insert statement to whatever is stored in $tags['description'], kind of like:

    <?php
    mysql_query("INSERT INTO ls_sites (site_title, site_url, site_desc) VALUES ($tags['description'], 'Griffin', '35')");
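Editor's sketch: the `mysql_*` functions used in this thread were removed in PHP 7.0, and building the query by pasting scraped text into the SQL string breaks as soon as the text contains a quote. One common alternative is a PDO prepared statement. The helper name `saveSite` is hypothetical, the column types are assumptions, and the demo uses an in-memory SQLite database (which assumes `pdo_sqlite` is installed) only so that it runs without a MySQL server.

```php
<?php
// Hypothetical helper (not from the thread): store one scraped site with a prepared statement.
function saveSite(PDO $pdo, string $title, string $url, string $description): void
{
    // Placeholders keep the values out of the SQL text, so quotes need no escaping.
    $stmt = $pdo->prepare(
        'INSERT INTO ls_sites (site_title, site_url, site_desc)
         VALUES (:title, :url, :description)'
    );
    $stmt->execute([
        ':title'       => $title,
        ':url'         => $url,
        ':description' => $description,
    ]);
}

// Demo database. For MySQL the DSN would look something like
// 'mysql:host=localhost;dbname=spider;charset=utf8mb4' plus a user name and password.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Assumed table layout; the thread only names the three columns.
$pdo->exec('CREATE TABLE ls_sites (site_title TEXT, site_url TEXT, site_desc TEXT)');

saveSite($pdo, "Bob's Site", 'http://www.example.com', 'A "quoted" description');
echo $pdo->query('SELECT COUNT(*) FROM ls_sites')->fetchColumn(), "\n";
```

With `ERRMODE_EXCEPTION` set, a failed insert raises a `PDOException` instead of failing silently.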
ultraloveninja Posted January 20, 2012

Looks like you already have it defined:

    $tags = get_meta_tags("$website");

Then make your SQL query like this:

    // $tags is an array, so pull out the description and escape it first
    $site_desc = mysql_real_escape_string($tags['description']);
    mysql_query("INSERT INTO ls_sites (site_title, site_url, site_desc) VALUES ('$site_desc', 'Griffin', '35')");
ultraloveninja Posted January 20, 2012

I also had no idea that you could scrape a site with file_get_contents().
renfley (Author) Posted January 20, 2012

lol, well, the good news is that if I create a variable with the description it works great:

    <?php
    $site_desc = $tags['description'];

I can then echo this and put it into my insert statement, which works great. But the title is an issue:

    <?php
    $url="http://www.example.com";
    function getTitle($Url){
        $str = file_get_contents($Url);
        // Only return a title when the page was fetched and the pattern matched
        if(strlen($str)>0 && preg_match("/\<title\>(.*)\<\/title\>/",$str,$title)){
            return $title[1];
        }
        return null;
    }
    $site_title = getTitle("http://www.example.com");
    echo $site_title;

The echo works perfectly, but when I add the $site_title variable to my insert statement, everything seems OK yet nothing gets inserted into the database. Any ideas?
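Editor's sketch: the thread never shows the failing query or an error message, so the actual cause here is not known. One common reason for an insert that silently does nothing is a value containing an apostrophe, which ends the SQL string early; with the old API, checking the return value of `mysql_query()` and printing `mysql_error()` would reveal it. The demo below reproduces that situation under stated assumptions: an in-memory SQLite database through PDO (requires `pdo_sqlite`), a one-column version of the table, and a hypothetical helper named `tryInterpolatedInsert`.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Assumed, simplified table; the thread's table has more columns.
$pdo->exec('CREATE TABLE ls_sites (site_title TEXT)');

// Hypothetical helper (not from the thread): build the INSERT by string
// interpolation, as the thread does, and return the error message if it fails.
function tryInterpolatedInsert(PDO $pdo, string $title): ?string
{
    try {
        $pdo->exec("INSERT INTO ls_sites (site_title) VALUES ('$title')");
        return null; // the row was inserted
    } catch (PDOException $e) {
        return $e->getMessage();
    }
}

// A plain title goes through.
var_dump(tryInterpolatedInsert($pdo, 'Example Web Page'));
// The apostrophe closes the SQL string early, so the statement is rejected.
var_dump(tryInterpolatedInsert($pdo, "Bob's Site"));
```

Escaping the value with the database driver's own function, or using a prepared statement, avoids this class of failure.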
renfley (Author) Posted January 20, 2012

hahahahahahahahahaha figured it out

    <?php
    $urlContents = file_get_contents("http://viraleh.com/");
    // Case-insensitive match on the <title> tag
    preg_match("/<title>(.*)<\/title>/i", $urlContents, $matches);
    print($matches[1] . "\n"); // "Example Web Page"
    echo "<br>";
    echo "<br>";
    echo "<br>";
    $shit = ($matches[1] . "\n");
    echo $shit;

I can then insert the $shit variable in the insert statement and voilà, full success! LOL. Everyone echo $shit FTW.