Those are good suggestions, but I am already doing some of the things you mentioned. There are cron jobs set up for crawling, and those cron jobs parse the results from the import.io crawlers I set up for Reddit and Imgur. The crawlers were already trained in my custom import.io scripts, so I don't need to go through the hassle of DOM/XPath scraping: the crawlers grab the image descriptions, paths and everything else I need. The idea of using the Cloudinary CDN was that I can manipulate images on the fly (resizing, custom overlays etc.) without using my own server's CPU. Simply hotlinking the images would be bad for Imgur's bandwidth, and handling the resizing and manipulation myself would kill my server. The image URLs have a unique index in the database, so duplicates are ignored. The crawlers run every three hours, and the content is updated on the site as well as on Twitter and Facebook using Zapier.
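To make the duplicate handling concrete, here is a rough sketch of the unique-index approach. It is only an illustration: it uses Python's built-in sqlite3 module rather than my actual database, and the table name `posts`, the column names, and the helpers `open_store` and `store_posts` are made up for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store whose image_url column carries a UNIQUE index.

    The schema is hypothetical; only the UNIQUE constraint matters here.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " id INTEGER PRIMARY KEY,"
        " image_url TEXT NOT NULL UNIQUE,"
        " description TEXT)"
    )
    return conn


def store_posts(conn, posts):
    """Insert (image_url, description) pairs and return how many were new.

    INSERT OR IGNORE silently skips any row that would violate the
    UNIQUE constraint, so a URL seen in an earlier crawl is not stored twice.
    """
    inserted = 0
    for url, description in posts:
        cur = conn.execute(
            "INSERT OR IGNORE INTO posts (image_url, description) VALUES (?, ?)",
            (url, description),
        )
        inserted += cur.rowcount  # 1 for a new row, 0 for an ignored duplicate
    conn.commit()
    return inserted
```

A cron-driven crawl would call `store_posts` with whatever the crawler returned on each run, and only the rows that were actually inserted would need to be pushed on to the site and the social accounts.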
Of course, the idea of the site seems like stealing, as you said, which makes me kind of sad. I made it to collect all the cool posts that come through Reddit every day and then get lost over time and become impossible to find again. I just wanted a nice archive of the best posts from each day. Putting ads on the site was a way to pay for the servers and SaaS costs involved in running the service. I know a lot of people might come to appreciate an archive of this kind, but is there a more "white hat" way of doing it? I put source links to the original Imgur posts on each page, so if people want to see the original post, they can click the source link and visit it, where credit is due. It's not like the original posters can never be reached or found from my site, and it's not like I am taking all the credit for their pictures. If anyone has a problem with any of their images being posted here, they can always ping the Twitter account or Facebook account and I'll take that image down, no questions asked.
I could take out the ads, or just shut down the service. I am not making any money out of it, and I don't expect to in the near future, as it is hard to generate unique content on a site like this to attract organic traffic from search engines. I don't know what to do, so I guess I'll take some time to decide. It was fun to build though, and I enjoyed my time with it.