I have found a script that posts RSS feeds for me on a site that I'm building. However, I would like to also post other articles that don't have a feed. Is there a legal/respectable way to harvest and post these articles on my site? Does such a script or application exist? What keywords should I search for?

https://forums.phpfreaks.com/topic/179466-rss-poster-script/

However I would like to also post other articles that don't have a feed

If the data is not freely available via an API or feed, then it's a safe bet that the website owner doesn't want you to have the data (or doesn't have the skills to create a data source). However, most article sites contain user-submitted articles that do not belong to the website owner, so you will find the same article all over the web.

If you want these articles, then you need to write a bot that can fetch the page content, filter out all the junk (HTML, etc.), and leave just the article. There will not be one specific script to do this, as every website is different; however, the tools to make it work are available. Look at cURL. Be careful when scraping data.
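
The fetch-then-filter idea can be sketched roughly as follows. This is only an illustration, written in Python with the standard library; the same steps map onto PHP's cURL extension plus an HTML parser such as DOMDocument. The bot name `ExampleArticleBot`, the helper names, and the assumption that the article sits in a container like `<div id="content">` are all hypothetical, and the container has to be looked up per site. The `allowed_by_robots` check is one way to be careful about scraping; it does not settle whether you are allowed to republish the text.

```python
from html.parser import HTMLParser
from urllib import robotparser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

USER_AGENT = "ExampleArticleBot/0.1"  # hypothetical bot name
SKIP_TAGS = {"script", "style"}       # content that is never article text
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class ContainerTextExtractor(HTMLParser):
    """Collect the text inside one container element, e.g. <div id="content">."""

    def __init__(self, container_tag, attr_name, attr_value):
        super().__init__(convert_charrefs=True)
        self.container_tag = container_tag
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.container_depth = 0  # > 0 while inside the chosen container
        self.skip_depth = 0       # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.container_depth:
            if tag == self.container_tag:
                self.container_depth += 1  # nested tag of the same name
            if tag in SKIP_TAGS:
                self.skip_depth += 1
            if tag in BLOCK_TAGS:
                self.chunks.append("\n")
        elif (tag == self.container_tag
              and dict(attrs).get(self.attr_name) == self.attr_value):
            self.container_depth = 1

    def handle_endtag(self, tag):
        if not self.container_depth:
            return
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == self.container_tag:
            self.container_depth -= 1
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.container_depth and not self.skip_depth:
            self.chunks.append(data)

    def text(self):
        # Normalise whitespace and drop the empty lines left by the markup.
        lines = (" ".join(line.split()) for line in "".join(self.chunks).splitlines())
        return "\n".join(line for line in lines if line)


def allowed_by_robots(url, user_agent=USER_AGENT):
    """Ask the site's robots.txt whether this bot may fetch the URL (network call)."""
    parser = robotparser.RobotFileParser()
    parser.set_url(urljoin(url, "/robots.txt"))
    parser.read()
    return parser.can_fetch(user_agent, url)


def fetch_html(url):
    """Download a page and decode it to text (network call)."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_article(html, container_tag, attr_name, attr_value):
    """Return the plain text of the chosen container, or "" if it is not found."""
    extractor = ContainerTextExtractor(container_tag, attr_name, attr_value)
    extractor.feed(html)
    extractor.close()
    return extractor.text()


if __name__ == "__main__":
    # Offline demo on a made-up page; no request is sent.
    sample = ('<div id="nav">Menu</div><div id="content"><h1>Title</h1>'
              '<p>Hello <b>world</b>.</p><script>var x = 1;</script></div>')
    print(extract_article(sample, "div", "id", "content"))
```

For a real page you would call `allowed_by_robots(url)` first, then `fetch_html(url)`, and pass the result to `extract_article` with whatever tag and attribute that particular site uses for its article body. Because every site is different, that tag/attribute pair is the part you would keep per site.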

https://forums.phpfreaks.com/topic/179466-rss-poster-script/#findComment-947195

Thank you both.

jonsjava, the sites I'm referring to that have the articles I'd like to post don't have a feed, so they don't have an XML file. That is very good info, though, and the kind of info I was looking for. Thank you!

Yes, neil.johnson, I am being very careful with it. That is why I've asked the expert freaks ;):P

https://forums.phpfreaks.com/topic/179466-rss-poster-script/#findComment-947396


Do you have suggestions of where I can look to create a bot specific to each site?

https://forums.phpfreaks.com/topic/179466-rss-poster-script/#findComment-947397