phpsycho Posted September 27, 2011

I'm scraping websites for their meta descriptions, keywords, and titles, and I've noticed that a lot of sites use the same keywords and description on every single page. So my idea is to scrape the index page, find all the links on it, and scrape those pages too. Once they've all been scraped, I'd compare the descriptions, and if they match I'd pull some text that is unique to each page and use that instead. I can't seem to wrap my head around how to accomplish this.

Right now I fetch each page with cURL, pull out the keywords, description, and title, then find all the links on the site and scrape those as well. I was thinking of building an array of the descriptions, then checking them against each other and inserting into the db, but that doesn't seem like it would work. Any ideas?

Also, how would I grab just the text from each page that is different from every other page? Very confusing.
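Not something from the original thread, but a rough sketch of what that pipeline could look like in PHP (untested; the function names, the DOMDocument/XPath approach, and the "fall back to the first <p> as the unique text" idea are all just assumptions for illustration):

```php
<?php
// Fetch a page with cURL and return its HTML (or false on failure).
function fetchPage($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// Pull the title, meta keywords/description, the links, and the first
// paragraph (as a crude "unique text" candidate) out of one page.
function parsePage($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);              // silence warnings from sloppy markup
    $xpath = new DOMXPath($doc);

    $page = array(
        'title'           => '',
        'keywords'        => '',
        'description'     => '',
        'first_paragraph' => '',
        'links'           => array(),
    );

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $page['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($xpath->query('//meta[@name]') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'keywords' || $name === 'description') {
            $page[$name] = trim($meta->getAttribute('content'));
        }
    }

    $paras = $xpath->query('//p');
    if ($paras->length > 0) {
        $page['first_paragraph'] = trim($paras->item(0)->textContent);
    }

    foreach ($xpath->query('//a[@href]') as $a) {
        $page['links'][] = $a->getAttribute('href');
    }

    return $page;
}

// Scrape a list of URLs (the index plus the links found on it) and decide
// which description to keep: if every page reports the same meta description,
// it isn't page-specific, so fall back to each page's first paragraph.
function collectDescriptions(array $urls) {
    $pages = array();
    foreach ($urls as $url) {
        $html = fetchPage($url);
        if ($html !== false && $html !== '') {
            $pages[$url] = parsePage($html);
        }
    }

    $descriptions = array();
    foreach ($pages as $url => $page) {
        $descriptions[$url] = $page['description'];
    }

    if (count(array_unique($descriptions)) <= 1) {
        foreach ($pages as $url => $page) {
            $descriptions[$url] = $page['first_paragraph'];
        }
    }

    return $descriptions;   // url => description, ready to insert into the db
}
```

The "first paragraph" fallback is only one possible stand-in for page-unique text; a first `<h1>` or a stripped-down chunk of body text would work the same way.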
phpsycho Posted September 27, 2011 Author

Hmm, is that even a good idea? It would take forever to scrape those sites since I'd have to connect to every link. Any better ideas?
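If the slow part is making one cURL request per link in series, one common workaround (not something confirmed anywhere in this thread, just a sketch) is to fetch several pages at once with curl_multi:

```php
<?php
// Fetch many URLs in parallel with curl_multi instead of one at a time.
// Minimal sketch only -- real code would want per-handle error checks,
// a cap on concurrency, politeness delays, etc.
function fetchAll(array $urls) {
    $mh      = curl_multi_init();
    $handles = array();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until every one has finished.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);   // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = array();
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;   // url => html (may be empty on a failed transfer)
}
```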