c_pattle Posted March 5, 2011

I'm thinking about building a search engine as a fun project and a way to develop my skills. However, I'm not sure where to start.

I want it to be a comparison search engine for computer parts, so if the user types in "RAM" it would display results from PC World, Ebuyer, etc.

I was thinking of using cURL, but how do I know where the search results are on each page? For example, say I get an HTML page of results from PC World and store it in a variable. How do I then know what to strip out to get the results I want? Would I just have to look at the source code? Also, would I then have to work this out for each site I wanted to search, and have a different method of getting the results from each site?

If anyone has done anything like this in the past and could give me some advice, that would be great.

Thanks
ignace Posted March 5, 2011

To get the results from a website, you'll need to scrape them from its pages. Read the site's terms of use before scraping; otherwise you may face a lawsuit.

You can scrape the content using DOMDocument or using regular functions like file_get_contents() and fopen().

$content = file_get_contents('http://www.domain.tld');

Using this or the DOMDocument technique is memory intensive, as it loads all of the content into memory. I prefer to use a combination of fopen() and fread() to read the data in chunks and conserve memory in the process.
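The fopen()/fread() approach mentioned above could look roughly like the sketch below. The helper names readInChunks() and streamContains() are made up for illustration and are not built-in functions, and the 8192-byte chunk size is only an assumed default. The second helper shows one way to search a page without holding all of it in memory, carrying a short tail over from one chunk to the next so a match that straddles a chunk boundary is not missed.

```php
<?php
declare(strict_types=1);

/**
 * Yield the contents of a file or URL in chunks of at most $chunkSize bytes.
 * Hypothetical helper, not part of PHP itself.
 */
function readInChunks(string $source, int $chunkSize = 8192): Generator
{
    $handle = fopen($source, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open $source");
    }
    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read failed for $source");
            }
            if ($chunk !== '') {
                yield $chunk; // network streams may return short chunks
            }
        }
    } finally {
        fclose($handle); // also runs if the caller stops iterating early
    }
}

/**
 * Report whether $needle occurs anywhere in $source, reading chunk by chunk.
 */
function streamContains(string $source, string $needle, int $chunkSize = 8192): bool
{
    if ($needle === '') {
        return true;
    }
    $keep = strlen($needle) - 1; // bytes to carry over between chunks
    $tail = '';
    foreach (readInChunks($source, $chunkSize) as $chunk) {
        $window = $tail . $chunk;
        if (strpos($window, $needle) !== false) {
            return true;
        }
        $tail = $keep > 0 ? substr($window, -$keep) : '';
    }
    return false;
}
```

Opening an http:// URL this way relies on allow_url_fopen being enabled. The memory saving only holds as long as the chunks are processed one at a time; handing the reassembled page to DOMDocument would load it all into memory again.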
QuickOldCar Posted March 5, 2011

I use cURL and MySQL for mine. There's actually a lot more involved than one would think. It takes a lot of time to connect to all the sites just to get the information you want from them to save.

Pulling in random links from various websites, storing them, and just showing the latest discovered links by date is easier (like a normal search engine does).

If you wanted to do a serious search engine, I would suggest using Cassandra as the database and Python to search the data.
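To tie the cURL suggestion back to the original question: yes, in practice each site usually needs its own extraction rules, found by reading that site's HTML source. One possible way to organise this is a small per-site configuration plus a shared extractor, as in the sketch below. Everything site-specific here is an assumption made up for illustration: the shop-a.example and shop-b.example URLs, the XPath expressions, and the function names fetchPage(), extractResults(), and searchSite(). None of it reflects the real markup of PC World or Ebuyer.

```php
<?php
declare(strict_types=1);

// Per-site settings. Every URL and XPath expression below is a placeholder;
// real values have to be worked out from each site's HTML source.
$sites = [
    'shop-a' => [
        'searchUrl' => 'https://shop-a.example/search?q=%s',
        'item'      => "//div[@class='product']",
        'name'      => ".//h2",
        'price'     => ".//span[@class='price']",
    ],
    'shop-b' => [
        'searchUrl' => 'https://shop-b.example/find?term=%s',
        'item'      => "//li[@class='result']",
        'name'      => ".//a",
        'price'     => ".//em",
    ],
];

/** Download a page with cURL and return its body, or throw on failure. */
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 15,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException("cURL failed for $url: " . curl_error($ch));
    }
    return $body;
}

/** Pull name/price pairs out of an HTML string using one site's XPath rules. */
function extractResults(string $html, array $site): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parse warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query($site['item']) as $item) {
        $name  = $xpath->query($site['name'], $item)->item(0);
        $price = $xpath->query($site['price'], $item)->item(0);
        if ($name === null || $price === null) {
            continue; // skip items that do not match the expected layout
        }
        $results[] = [
            'name'  => trim($name->textContent),
            'price' => trim($price->textContent),
        ];
    }
    return $results;
}

/** Fetch one site's result page for a search term and extract its items. */
function searchSite(array $site, string $term): array
{
    $url = sprintf($site['searchUrl'], rawurlencode($term));
    return extractResults(fetchPage($url), $site);
}
```

With real settings in place, something like searchSite($sites['shop-a'], 'RAM') would return an array of name/price pairs for that site, and looping over $sites would give the comparison. Keeping the rules in data rather than in code means a layout change on one site only needs a configuration change. Pages containing non-ASCII characters may need extra care with character encoding before being passed to loadHTML().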