Caching is all about keeping data as close as possible to where it is needed. It is first and foremost a performance optimization; however, before you cache anything, ask yourself a few questions about the data:
1. Is it read-only, read-mostly or write-mostly?
2. How often does the data change?
3. What is the tolerance in each case for data staleness?
Data that is read-only or read-mostly, changes infrequently and can tolerate a reasonable degree of staleness is an ideal candidate for caching. A key performance metric is the ratio of cache hits to cache misses. Obviously, you want a high hit rate to get a good return on the investment. If you get few cache hits and lots of cache misses, be careful: every miss pays for the cache lookup on top of the trip to the original source, so you may have made performance worse rather than better and may need to rethink your caching strategy.
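To make the hit/miss metric concrete, here is a minimal sketch in Java (assumed here because the products discussed in this post are Java-centric). The `TtlCache` class and its methods are hypothetical and are not the API of memcached, JBossCache or any other product mentioned. The time-to-live stands in for your tolerance for staleness, and the counters give you the hit ratio to watch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical minimal cache: entries expire after a fixed time-to-live
// (the tolerated staleness), and every lookup is counted as a hit or a miss.
public class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so time can be controlled in tests
    private long hits;
    private long misses;

    public TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, or null on a miss (absent or expired).
    public V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null || clock.getAsLong() >= e.expiresAt) {
            entries.remove(key); // drop the stale entry, if there is one
            misses++;
            return null;
        }
        hits++;
        return e.value;
    }

    public void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    public long hits() { return hits; }
    public long misses() { return misses; }

    // Fraction of lookups served from the cache; 0.0 before any lookups.
    public double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        long[] now = {0};
        TtlCache<String, String> cache = new TtlCache<>(1000, () -> now[0]);
        cache.get("product:42");             // miss: nothing cached yet
        cache.put("product:42", "cached row");
        cache.get("product:42");             // hit
        now[0] = 1500;                       // advance past the 1-second TTL
        cache.get("product:42");             // miss: entry is stale
        System.out.println(cache.hits() + " hits, " + cache.misses() + " misses"); // prints "1 hits, 2 misses"
    }
}
```

If `hitRatio()` stays low under a realistic workload, that is the signal described above that the data may not be a good caching candidate.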
For caching web objects on a global website, edge caching providers such as Akamai are the way to go. They work well for static objects such as images, CSS, JavaScript and HTML, and they move the content closer to the end user. Also, recall that every point along the network path may offer some degree of cacheability (e.g. proxy servers, routers, browsers).
Ensure the HTTP Cache-Control headers are set appropriately, to avoid unnecessary trips back to the server when the copy in the browser cache is still fresh enough.
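As a sketch of where that header gets set, the following uses the JDK's built-in `com.sun.net.httpserver` package. The `cacheControlFor` policy (one day for images, CSS and JS; `no-cache` for everything else, including HTML) is an illustrative assumption, not a recommendation from this post; pick `max-age` values that match how stale each kind of content can safely be.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class CacheHeaders {
    // Hypothetical policy: static assets may be reused by browsers and proxies for a day;
    // everything else may be stored but must be revalidated with the server before reuse.
    public static String cacheControlFor(String path) {
        String p = path.toLowerCase();
        if (p.endsWith(".png") || p.endsWith(".jpg") || p.endsWith(".gif")
                || p.endsWith(".css") || p.endsWith(".js")) {
            return "public, max-age=86400"; // 86400 seconds = 1 day
        }
        return "no-cache";
    }

    static void handle(HttpExchange ex) throws IOException {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        // Set Cache-Control before sending the status line and headers.
        ex.getResponseHeaders().set("Cache-Control", cacheControlFor(ex.getRequestURI().getPath()));
        ex.sendResponseHeaders(200, body.length);
        try (OutputStream os = ex.getResponseBody()) {
            os.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", CacheHeaders::handle);
        server.start(); // serves until the process is stopped
    }
}
```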
For caching data on the app server that is sourced from a back-end service, database or EIS, memcached or JBossCache work well. Clustering or cache replication is not always required; it matters most when the end user is the data source. If the data source is a database, you can recover the cache quite easily by reloading it. If the data source is the end user (e.g. session information), you cannot very well ask them, "My cache seems to have gone down the toilet, taking your session information with it. Tell me again, how many shares of IBM did you want to buy? (sorry)"
You can take a significant amount of load off back-end resources through caching, but only if the data is a good candidate for caching. Hibernate with Ehcache as its second-level cache may work pretty well here, in addition to memcached and JBossCache.
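A minimal cache-aside sketch shows both points above. It assumes a hypothetical `loader` function standing in for a database query: repeated reads of the same keys reach the back end only on a miss, and a lost cache can be rebuilt from the database on demand. Data that exists only in the cache, such as session state, has no loader to fall back on, which is why replication matters there. `CacheAside` is an illustrative name, not a library class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside (lazy loading): check the cache first; on a miss, load from the
// system of record and keep the result for subsequent readers.
public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for a database query

    public CacheAside(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) {
        // computeIfAbsent invokes the loader only when the key is not cached
        return cache.computeIfAbsent(key, loader);
    }

    // Simulates losing the cache contents (node restart, flush, ...).
    public void clear() { cache.clear(); }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>(); // hypothetical back end
        database.put(1, "IBM");
        database.put(2, "ORCL");
        int[] backendCalls = {0};
        CacheAside<Integer, String> cache = new CacheAside<>(id -> {
            backendCalls[0]++;
            return database.get(id);
        });

        for (int i = 0; i < 100; i++) cache.get(1 + (i % 2)); // 100 reads, 2 distinct keys
        System.out.println("backend calls: " + backendCalls[0]); // prints "backend calls: 2"

        cache.clear(); // the cache is gone...
        cache.get(1);  // ...but key 1 is simply reloaded from the database
        System.out.println("backend calls: " + backendCalls[0]); // prints "backend calls: 3"
    }
}
```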
Oracle Coherence, in my opinion, has evolved well beyond its caching roots. Using Coherence, you can build a huge data grid across commodity hardware. Each node in the grid is responsible for its share of the data and is backed up by n other nodes for redundancy. Operations against the data can move to where the data is, rather than moving the data to where the operation occurs (our traditional way of approaching data processing for the past gazillion years).
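The idea of moving the operation to the data can be illustrated with a toy, single-process model. This is not Coherence's API (Coherence has its own mechanism for this, entry processors); `ToyGrid`, `invoke` and `copiesOf` are hypothetical names, and hash partitioning with backups on the next n nodes is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Toy model of a partitioned grid: each key has one primary owner plus
// `backupCount` backups. Updates are functions shipped to the owner instead of
// reading the value out, changing it and writing it back.
public class ToyGrid {
    private final List<Map<String, Integer>> nodes = new ArrayList<>();
    private final int backupCount;

    public ToyGrid(int nodeCount, int backupCount) {
        if (backupCount >= nodeCount) throw new IllegalArgumentException("need more nodes than backups");
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
        this.backupCount = backupCount;
    }

    // Primary owner chosen by hashing the key across the nodes.
    public int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Runs `op` against the entry on its owner, then copies the result to the backups.
    public Integer invoke(String key, BiFunction<String, Integer, Integer> op) {
        int owner = ownerOf(key);
        Integer result = op.apply(key, nodes.get(owner).get(key));
        for (int i = 0; i <= backupCount; i++) { // i == 0 is the primary itself
            nodes.get((owner + i) % nodes.size()).put(key, result);
        }
        return result;
    }

    // Number of nodes currently holding a copy of `key`.
    public int copiesOf(String key) {
        int n = 0;
        for (Map<String, Integer> node : nodes) if (node.containsKey(key)) n++;
        return n;
    }

    public static void main(String[] args) {
        ToyGrid grid = new ToyGrid(4, 1);
        for (int i = 0; i < 3; i++) {
            grid.invoke("positions:IBM", (k, v) -> v == null ? 100 : v + 100);
        }
        System.out.println(grid.invoke("positions:IBM", (k, v) -> v)); // prints 300
        System.out.println(grid.copiesOf("positions:IBM"));            // prints 2
    }
}
```

In a real grid each map would live in a separate JVM, so shipping the small function avoids pulling the value across the network and pushing it back again.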
Friday, July 25, 2008
2 comments:
hi there,
"Principal Architect currently specializing in securing data at rest": do you mean securing data exposed through REST?
Thank you,
BR,
~A
Hi BR
Data at rest is data sitting somewhere, perhaps on a filesystem, in a database or on a web server. Think safety-deposit box or vault.
The other challenge in securing data is securing data in motion, or in transit. Think Wells Fargo armoured truck.
Jason