pthurmond Posted September 28, 2010

Hello,

I am working on a caching class, and all I have left to figure out is string compression. I am storing the cache files locally on the server and may eventually add an option to store them in the database. I want to reduce file size while sacrificing as little speed as possible.

The first function I tried was bzip2's bzcompress(). It did a great job, shrinking the data by about 92%, but I noticed a half-second to full-second delay on decompression.

So does anyone know which compression functions are fastest, and what their compression ratios look like? The input is a plain string (XML that I have already reduced with some string replacements), so it is not pre-compressed in any way. The compression libraries available on my server are zlib, zip, and bz2.

Thoughts?

Thanks,
Patrick
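For reference, PHP's gzcompress() and bzcompress() have close counterparts in Python's standard zlib and bz2 modules. As a minimal sketch, assuming the cache stores one compressed string per file and that the codec is picked by name, the write/read round trip could look like this. The helpers `write_cache`, `read_cache`, and the `CODECS` table are hypothetical illustrations, not part of the poster's class.

```python
import bz2
import tempfile
import zlib
from pathlib import Path

# Hypothetical codec table: name -> (compress, decompress).
# "zlib" roughly matches PHP's gzcompress(), "bz2" matches bzcompress().
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "none": (lambda data: data, lambda data: data),  # uncompressed baseline
}


def write_cache(path: Path, text: str, codec: str = "zlib") -> int:
    """Encode text as UTF-8, compress it, write it to path, and return the stored size."""
    compress, _ = CODECS[codec]
    payload = compress(text.encode("utf-8"))
    path.write_bytes(payload)
    return len(payload)


def read_cache(path: Path, codec: str = "zlib") -> str:
    """Read the stored bytes, decompress them, and decode them back to a string."""
    _, decompress = CODECS[codec]
    return decompress(path.read_bytes()).decode("utf-8")


if __name__ == "__main__":
    # Synthetic XML-like sample; real cache contents will compress differently.
    xml = "<items>" + "<item id='1'>cached value</item>" * 2000 + "</items>"
    with tempfile.TemporaryDirectory() as tmp:
        for name in CODECS:
            target = Path(tmp) / f"page.{name}"
            stored = write_cache(target, xml, name)
            assert read_cache(target, name) == xml
            print(f"{name:5} {stored:>8} bytes (original {len(xml)})")
```

Keeping the codec behind a name makes it cheap to switch between bzip2 and zlib later without touching the rest of the cache code.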
AbraCadaver Posted September 28, 2010

Unless you have large amounts of data that can be highly compressed, you will see a performance hit using compression here. I wouldn't use it.
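To illustrate why the data has to be "highly compressible" for this to pay off: deflate only saves space when the input contains redundancy. This Python sketch uses the standard zlib module as a stand-in for PHP's gzcompress() and compares a repetitive XML-like string with random bytes of the same length. Both sample inputs are made up for illustration.

```python
import os
import zlib


def ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Highly redundant input, similar in spirit to repetitive XML markup.
repetitive = b"<row><name>example</name><value>42</value></row>" * 200
# Incompressible input of the same length.
random_bytes = os.urandom(len(repetitive))

print(f"repetitive XML: {ratio(repetitive):.3f}")
# Random data cannot be compressed, so zlib's header, block and checksum
# overhead pushes the ratio slightly above 1.0.
print(f"random bytes:   {ratio(random_bytes):.3f}")
```

The CPU cost of compressing is paid in both cases, but only the first one gets anything back for it.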
pthurmond Posted September 28, 2010

The problem is that our cache is currently about 40,000 files taking up about 815MB. Compressed, it would take up only about 65.22MB, so compression may be beneficial. On the flip side, I also realize that disk space is really cheap. That said, it would still be interesting to see some good comparisons.

Thanks,
Patrick
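The space estimate comes from applying the roughly 92% reduction that bzip2 achieved on the test file to the whole cache. The second line is plain arithmetic on the same figures (decimal units) and assumes the 40,000 files are of similar size; an average file of about 20 KB is much smaller than a single large test file, so the real ratio across the cache may differ.

$$815\ \text{MB} \times (1 - 0.92) = 815\ \text{MB} \times 0.08 \approx 65.2\ \text{MB}$$

$$\frac{815\ \text{MB}}{40{,}000\ \text{files}} \approx 20\ \text{KB per file}$$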
pthurmond Posted September 28, 2010

OK, I finally had the chance to do some benchmarking. I compared the zlib functions to the bzip2 functions.

| Operation | Run time (seconds) |
|---|---|
| Load file | 0.000370979309082 |
| BZip2 compression | 0.0489809513092 |
| BZip2 decompression | 0.00877499580383 |
| ZLib compression | 0.0100789070129 |
| ZLib decompression | 0.000354051589966 |

| File | Size (bytes) |
|---|---|
| Original | 258,699 |
| BZip2 | 21,503 |
| ZLib | 27,123 |

So it depends on what you want to trade off. For my purposes, I am willing to sacrifice a little extra disk space for the speed gain. The difference is obviously almost negligible right now, but that could change as server load increases.

Thanks,
Patrick
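A comparable benchmark can be sketched in Python, assuming the standard zlib and bz2 modules as stand-ins for PHP's gzcompress() and bzcompress(). Absolute timings will not match the PHP numbers, and the sample input below is synthetic. Taking the best of several runs with time.perf_counter gives steadier numbers than timing a single call.

```python
import bz2
import time
import zlib
from typing import Callable, Dict, Tuple


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Run func several times and return the fastest wall-clock duration in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(data: bytes) -> Dict[str, Tuple[int, float, float]]:
    """Return {codec: (compressed_size, compress_seconds, decompress_seconds)}."""
    codecs = {
        "bz2": (bz2.compress, bz2.decompress),
        "zlib": (zlib.compress, zlib.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        assert decompress(packed) == data  # sanity check: lossless round trip
        results[name] = (
            len(packed),
            best_time(lambda: compress(data)),
            best_time(lambda: decompress(packed)),
        )
    return results


if __name__ == "__main__":
    sample = ("<item><title>Example</title><price>9.99</price></item>" * 5000).encode()
    print(f"original: {len(sample)} bytes")
    for name, (size, c_time, d_time) in benchmark(sample).items():
        print(f"{name:5} size={size:>8}  compress={c_time:.6f}s  decompress={d_time:.6f}s")
```

Reading the posted numbers the same way: bzip2 kept about 8.3% of the original bytes (21,503 / 258,699) versus about 10.5% for zlib (27,123 / 258,699), while zlib compressed roughly 5 times faster and decompressed roughly 25 times faster.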
AbraCadaver Posted September 28, 2010

How long does it take your app to build a cache of 815MB from scratch? The main purpose of the cache is to improve speed, but compression negates some of that. So how about purging the cache periodically? That's what I do.
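Periodic purging can be as simple as deleting files whose modification time is older than a cutoff and running that on a schedule (for example from cron). This is a minimal Python sketch; the function name `purge_cache` and the two-week retention window are assumptions for illustration, not values from either poster's setup.

```python
import sys
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 14 * 24 * 60 * 60  # assumed retention window: two weeks


def purge_cache(cache_dir: Path, max_age: float = MAX_AGE_SECONDS,
                now: Optional[float] = None) -> int:
    """Delete regular files in cache_dir whose mtime is older than max_age seconds.

    Returns the number of files removed.
    """
    now = time.time() if now is None else now
    removed = 0
    for entry in cache_dir.iterdir():
        try:
            if entry.is_file() and now - entry.stat().st_mtime > max_age:
                entry.unlink()
                removed += 1
        except FileNotFoundError:
            # Another process removed the file between listing and deleting it.
            continue
    return removed


if __name__ == "__main__":
    # Intended usage, e.g. from a nightly cron job: python3 purge_cache.py /path/to/cache
    if len(sys.argv) == 2:
        count = purge_cache(Path(sys.argv[1]))
        print(f"removed {count} expired cache files")
    else:
        print("usage: purge_cache.py CACHE_DIR")
```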
gizmola Posted September 28, 2010

Interesting benchmarks, but not really surprising. It's well known that bzip2 compresses certain types of data better than zlib, but it is also slower. While it's great to get 815mb of data down to about 65mb, we're still only talking about 815mb. There is no way you could convince me that the effort is worth the processing time and delay, regardless of the storage savings. If the data were going to grow rapidly and someday take up terabytes, I would have a different opinion, but for less than 1 gig of data it's just not worth doing.
pthurmond Posted September 28, 2010

That is what we have built up in the cache over about 2 weeks. When a file reaches its expiration age, it is simply overwritten on the next page visit. Still, setting up a cleanup cron job is not a bad idea.

The reason I want to compress the cache files is that this site unfortunately sits on a clustered server system with a ton of other sites (all owned by the same company), so they share resources on an internal shared server. Because of that we are more limited on space, and since the compression/decompression time is negligible relative to overall page load time, I think the middle ground (zlib) is the right balance here.

If I had my choice, the site would be on its own dedicated server. But alas, we can't always get what we want.

-Patrick
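The expire-and-overwrite behaviour described here can be sketched as a read-through helper. This Python version is illustrative only: the function name `get_cached`, the `build` callback, and zlib level 6 as the "middle ground" setting are assumptions. Writing to a temporary file and then calling os.replace means a concurrent request never reads a half-written cache file, which matters on a busy shared server.

```python
import os
import tempfile
import time
import zlib
from pathlib import Path
from typing import Callable


def get_cached(path: Path, build: Callable[[], str], max_age: float) -> str:
    """Return the cached string at path, rebuilding it if missing or older than max_age seconds."""
    try:
        if time.time() - path.stat().st_mtime <= max_age:
            return zlib.decompress(path.read_bytes()).decode("utf-8")
    except FileNotFoundError:
        pass  # no cache entry (or it was just purged); fall through and rebuild

    text = build()
    # Level 6 is zlib's default speed/size trade-off.
    payload = zlib.compress(text.encode("utf-8"), 6)
    # Write to a temp file in the same directory, then atomically swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return text
```

A fresh entry is served straight from disk after one decompression; an expired or missing one is rebuilt and overwritten, matching the behaviour described in the post.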