Arel3 Posted October 29, 2009 (https://forums.phpfreaks.com/topic/179466-rss-poster-script/)
I have found a script that posts RSS feeds for me on a site that I'm building. However, I would also like to post other articles that don't have a feed. Is there a legal/respectable way to harvest and post these articles on my site? Does such a script or application exist? What keywords should I search for?

jonsjava Posted October 29, 2009
simplexml_load_file() is what you're looking for. You'll need to get permission from the copyright holder to redistribute their work; otherwise you can be slapped with a take-down notice and/or taken to court.

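For the feed side of the question, here is a minimal sketch of reading an RSS 2.0 document with SimpleXML. The helper name `feed_items()`, the sample feed, and the example.com URLs are made up for illustration; the sample is parsed with `simplexml_load_string()` so the snippet runs without network access, and `simplexml_load_file()` would take a path or URL instead.

```php
<?php
// Hypothetical helper: collect title/link pairs from a parsed RSS 2.0 document.
function feed_items(SimpleXMLElement $rss): array
{
    $items = [];
    foreach ($rss->channel->item as $item) {
        $items[] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}

// Inline sample feed (an assumption, not a real feed).
$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($sample);
// For a live feed: $rss = simplexml_load_file('http://example.com/feed.xml');
if ($rss === false) {
    exit("Could not parse the feed\n");
}

// Print one "title -> link" line per feed item.
foreach (feed_items($rss) as $entry) {
    echo $entry['title'], ' -> ', $entry['link'], "\n";
}
```

Casting each element to a string matters here: without the cast the array would hold SimpleXMLElement objects rather than plain text.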
JonnoTheDev Posted October 29, 2009
Quote: "However I would like to also post other articles that don't have a feed"
If the data is not freely available via an API or feed, then it's almost guaranteed that the website owner doesn't want you to have the data (or hasn't the skills to create a data source). However, most article sites contain user-submitted articles; they do not belong to the website owner, so you will find the same article all over the web. If you want these articles, then you need to write a bot that can extract the page content, filter all the junk out (HTML, etc.), and leave the article. There will not be a specific script to do this, as every website is different; however, the tools to make it work are available. Look at cURL. Be careful when scraping data.

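The fetch-then-filter idea above could be sketched roughly as follows, assuming the cURL and DOM extensions are available. The function names, the user-agent string, the sample markup, and the XPath expression are all hypothetical; since every site is different, the XPath is the part that would have to be written per site. The demo runs on an inline HTML sample rather than a live page.

```php
<?php
// Fetch a page with cURL; returns the HTML, or null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_USERAGENT      => 'ExampleArticleBot/0.1', // hypothetical name
    ]);
    $html = curl_exec($ch);
    return is_string($html) ? $html : null;
}

// Return the text of every node matched by $xpath, one paragraph per match.
function extract_paragraphs(string $html, string $xpath): string
{
    if ($html === '') {
        return '';
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false) {
        return '';
    }
    $paragraphs = [];
    foreach ($nodes as $node) {
        // textContent drops the tags; collapse leftover whitespace runs.
        $text = trim((string) preg_replace('/\s+/', ' ', $node->textContent));
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return implode("\n\n", $paragraphs);
}

// Inline sample page (an assumption) so the sketch runs without network access.
$sample = <<<'HTML'
<html><body>
<div id="nav"><p>Home | About</p></div>
<div id="article">
  <h1>Title</h1>
  <p>First   <b>paragraph</b>.</p>
  <p>Second paragraph.</p>
</div>
</body></html>
HTML;

echo extract_paragraphs($sample, '//div[@id="article"]//p'), "\n";
// For a live page: extract_paragraphs(fetch_page($url) ?? '', $siteSpecificXpath);
```

Selecting only the paragraphs inside the article container is what leaves the navigation and other page furniture behind; a different site would need a different expression.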
Arel3 Posted October 29, 2009
Thank you both. jonsjava, the sites I'm referring to, the ones with the articles I'd like to post, don't have a feed, so they don't have an XML file. That is very good info, though, the kind of info I was looking for. Thank you! Yes, neil.johnson, I am being very careful with it. That is why I've asked the expert freaks.

Arel3 Posted October 29, 2009
Quote (JonnoTheDev): "If you want these articles then you need to write a bot that can extract the page content [...] There will not be a specific script to do this as every website is different, however the tools to make it work are available. Look at CURL. Be careful when scraping data."
Do you have any suggestions for where I can look for help creating a bot specific to each site?