joe92 Posted June 13, 2012

Just recently my colleague emailed me, rather concerned that a website we provide data for wasn't being cached by Google properly. I've spent the last half an hour thinking of a way to solve the problem before arriving at an annoying question: what is the point of Google's cache?

I can see the need for it with static, informative websites such as Wikipedia; being able to view the content if Wikipedia goes down could come in handy. But for a small, dynamically changing website, is there any need to worry about the cache? The data on this site changes at a very regular interval, at least once a week. I cannot think of any purpose for Google's cache other than viewing a site's content while that site is down. In fact, I had never actually viewed a cached copy of a website until this morning, after getting the email.

Is there some other purpose to the Google cache that I'm missing, such as helping Google categorise, index and retrieve your site? There is so much hype around building a website that's perfect for Google's crawlers that we're spending more time worrying about Google than about the actual damn site. Haha.

Cheers, Joe
RobertP Posted June 13, 2012

Keep your HTML and CSS valid.
ManiacDan Posted June 13, 2012

The purpose of the Google cache is so Google can return search results; the cache is what they search against. You have to keep a copy of every website on the internet if you're going to search them; that's how it works. Sites that are updated far more frequently than yours are cached with no issue. What's the actual problem you're seeing?
joe92 Posted June 13, 2012 (Author)

I always thought Google operated on a database-type system, wherein websites are indexed along with other data such as clicks and views, and searches run against that database. That would mean the cache was an extra, not essential to the workings of the whole system. Having said that, I am aware that for SEO it's always good to view the cache, and then the plain-text version, so you can see which links are being crawled. Thankfully, though, that's not my job.

We deliver the data through a quick AJAX request on page load. The data is so small, usually less than a kilobyte, that there is no visible lag as the page loads. The only problem is that Google does not cache this data (all the sites are listed, though). I think that's because Google reads the page as plain text and so ignores the JavaScript, but I'm not sure; Google is immensely complicated.

What solution would you suggest? We currently provide data for about five websites, so a complete overhaul now is much preferable to a complete overhaul in a month's time or even later. What we're thinking at the moment is using iframes and storing the data in small HTML files for the iframes to read, which we could push to the clients' servers via FTP if needed.

Joe
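For context, the delivery method described above is roughly the following. This is only an illustrative sketch; the endpoint URL and element ID are made-up placeholders, not taken from the actual sites:

```javascript
// Rough sketch of the on-load AJAX delivery described above.
// The URL and element ID are hypothetical placeholders.
window.onload = function () {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/data/latest.txt', true); // small (<1 KB) payload
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            // The data is injected after the page has loaded, so it is not
            // part of the HTML document itself.
            document.getElementById('data-panel').innerHTML = xhr.responseText;
        }
    };
    xhr.send();
};
```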
ManiacDan Posted June 13, 2012

They do work off a database, but it's not any database you would recognize; Google's data store is massively complex. But when you get right down to it, they need a copy of every page. They have a copy of this page. They have to, if they're going to see that I said "they need a copy of every page." In an hour, if you search for:

maniacdan "they need a copy of every page"

this page will come up.

Data fetched through AJAX probably will not be read by Google, no. I'm not sure how sites like Gawker do it, but I assume they have special code that detects whether they're being crawled by the Googlebot and serves up the article in plain text rather than through AJAX. I would do something similar: if the user agent supports JavaScript (like all modern browsers), send the page "normally" with the AJAX call; if it's the Googlebot, a "mini" browser like Opera Mini or a phone browser, or a text-only browser like Lynx, serve the article in a more plain-text format. Or you could just print the content when you load the page and keep the AJAX call for subsequent clicks.

Warning: serving the Googlebot different content than normal users is against their rules and could get you blacklisted from Google. Read up on their rules. I think something like what I suggested is OK; the rule is about the content itself.
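A minimal sketch of the "print the content on load, AJAX for subsequent clicks" suggestion: the initial data sits directly in the markup where a crawler can read it, and the script only fires a request when the visitor asks for fresh data. The endpoint and element IDs here are hypothetical:

```html
<!-- Initial data is written directly into the HTML, so it is part of the
     document a crawler fetches. -->
<div id="data-panel">
  <!-- latest data baked into the page at publish time -->
</div>
<button id="refresh">Refresh data</button>

<script>
  // AJAX is used only for subsequent updates, not for the first render.
  document.getElementById('refresh').onclick = function () {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/data/latest.txt', true); // hypothetical endpoint
    xhr.onreadystatechange = function () {
      if (xhr.readyState === 4 && xhr.status === 200) {
        document.getElementById('data-panel').innerHTML = xhr.responseText;
      }
    };
    xhr.send();
  };
</script>
```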
joe92 Posted June 13, 2012 (Author)

That sounds like an avenue to explore; unfortunately, some of these websites are written purely in HTML with CSS and JS, so it can't be implemented there. I'm thinking the iframe route is going to be the most effective. I just can't think of another way to get the data into an HTML page. Thanks for your help.

Joe
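For illustration, the iframe route would look roughly like this; the data file name is made up, and the idea is simply that a small static HTML file is pushed to the client's server (e.g. via FTP) whenever the data changes:

```html
<!-- The parent page embeds a small static HTML file containing the data.
     The file name and path are hypothetical. -->
<iframe src="/data/latest-data.html" width="300" height="120" title="Latest data">
  <!-- Fallback shown only by browsers that do not render iframes -->
  <a href="/data/latest-data.html">View the latest data</a>
</iframe>
```

Note that the iframe's contents remain a separate document, so the data still does not appear in the parent page's own markup.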