Monday, December 29, 2008

How to scale the data layer

Many performance problems that surface as a system struggles to support growing demand can be traced to the data layer. The database can easily become a bottleneck, and it can be the hardest component to scale after the fact because it manages state (in contrast, stateless components are easy to scale).

Of course, the problem may be solved by throwing more hardware at it, i.e. upgrading the database server with more memory and/or CPU horsepower. However, it is prudent to think about scalability before rather than after the fact.

First, let us consider some elementary physics:

pressure = force / area

For a given force, if we reduce the area to which the force is applied, the pressure increases. Try, for example, standing on tip-toes like a ballerina to get a sense of what I’m talking about. The force, i.e. your body weight, is constant, but we are reducing the surface area to which the force is applied, resulting in increased pressure.

Likewise, if we increase the surface area, the pressure is reduced. This is how a person can lie on a bed of nails without puncturing the skin, provided of course that there are enough nails.

Typically, we don’t have any real control over the forces applied to the products or solutions we are designing and building. We can speculate and design with certain limits in mind, but what if we come across a situation where those limits are shattered? In the design, we need to think of a way to support this and yet reduce the pressure on the entire system.

One approach is to use horizontal and vertical partitioning techniques in the design of the data layer. Compared to a single monolithic data layer (i.e. a single database server), both approaches have the effect of increasing the surface area and thus reducing pressure points as the force (or load) on the system increases.

Horizontal partitioning is where the rows of a single logical table are spread over multiple physical databases. For example, customers whose last name begins with A-F may be stored in one database server and customers whose last name begins with G-L are stored in another database server and so on. It is a popular technique used by several large dot-coms as a way to spread the load.

Vertical partitioning is a similar concept but involves storing different tables in different physical databases. For example, purchase orders may be stored in one database server whereas invoices may be stored in another.

To learn more about horizontal and vertical partitioning, take a look at http://en.wikipedia.org/wiki/Partition_(database)

Conceptually, the approaches are very straightforward; however, if you want to leverage them in your design, there are several things to look out for.

For horizontal partitioning, you want to strive for even distribution across the partitions. For example, partitioning based on the first letter of the last name may not provide even distribution: V through Z, for example, may be quite light in the number of records. Hashing the partitioning key may give a more even spread.
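
To make the hashing idea concrete, here is a minimal sketch. It assumes a fixed number of partitions and a string key such as the customer's last name. The class name HashPartitioner and the choice of MD5 as the hash function are my own assumptions for illustration, not a recommendation of a particular product or scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: picks a partition for a record by hashing its key. */
class HashPartitioner {

    private final int partitionCount;

    HashPartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    /** Returns a partition index in the range [0, partitionCount). */
    int partitionFor(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            // Fold the first four bytes of the digest into an int
            int hash = ((digest[0] & 0xff) << 24)
                     | ((digest[1] & 0xff) << 16)
                     | ((digest[2] & 0xff) << 8)
                     | (digest[3] & 0xff);
            // floorMod keeps the result non-negative even when hash is negative
            return Math.floorMod(hash, partitionCount);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        HashPartitioner partitioner = new HashPartitioner(4);
        for (String name : new String[] {"Adams", "Chambers", "Vaughn", "Zimmerman"}) {
            System.out.println(name + " -> partition " + partitioner.partitionFor(name));
        }
    }
}
```

The trade-off is that a query for a range of last names no longer maps to a single partition, and changing the number of partitions moves most keys to a different partition.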

For vertical partitioning, references across data models may be tricky. In particular, forget about foreign key constraints. If the two data models have a high degree of coupling between them, then I would go back to the drawing board; perhaps you haven’t found the right boundary. It’s ok to go to one data model, find what you are looking for, then use the result from that query to find what you ultimately need from the second data model. Avoid distributed transactions (XA/2PC) that span more than one data model if at all possible. Ask yourself, do you really need referential integrity across two data models at all times? If so, then again you should revisit your data design and carve out the right boundaries.
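
Here is a rough sketch of that two-step lookup. The two maps are in-memory stand-ins for the purchase order database and the invoice database, and the class and method names are hypothetical. The point is that the application, not a foreign key, resolves the reference, so it has to cope with a reference that points nowhere.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch: resolving a reference across two vertically partitioned data models. */
class CrossModelLookup {

    // Stand-in for the purchase order database: order id -> customer name
    private final Map<Long, String> purchaseOrders = new HashMap<>();

    // Stand-in for the invoice database: invoice number -> purchase order id
    private final Map<String, Long> invoices = new HashMap<>();

    void addPurchaseOrder(long orderId, String customer) {
        purchaseOrders.put(orderId, customer);
    }

    void addInvoice(String invoiceNumber, long orderId) {
        invoices.put(invoiceNumber, orderId);
    }

    /** Finds the customer behind an invoice using one query per data model. */
    Optional<String> customerForInvoice(String invoiceNumber) {
        // Query 1: the invoice data model gives us the purchase order id
        Long orderId = invoices.get(invoiceNumber);
        if (orderId == null) {
            return Optional.empty();
        }
        // Query 2: the purchase order data model; with no foreign key
        // constraint the order may be missing, so the result can be empty
        return Optional.ofNullable(purchaseOrders.get(orderId));
    }

    public static void main(String[] args) {
        CrossModelLookup lookup = new CrossModelLookup();
        lookup.addPurchaseOrder(42L, "Acme");
        lookup.addInvoice("INV-1", 42L);
        lookup.addInvoice("INV-2", 99L); // refers to an order that does not exist
        System.out.println(lookup.customerForInvoice("INV-1"));
        System.out.println(lookup.customerForInvoice("INV-2"));
    }
}
```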

When designing data models at the highest level, strive for high-cohesion (if it changes together, then it stays together) and loose-coupling between the data models. In other words, apply some of the same desirable properties for code design to your data design.

Monday, December 8, 2008

You know you're a geek when

Writing a shopping list the other day, I wrote down Guice instead of Juice. Time for a break from work I think.

Thursday, December 4, 2008

Tomcat Controller update

http://www.jasondchambers.com/2008/07/tomcat-controller.html

Wednesday, November 12, 2008

Using Grails to explore and develop the domain model

Data modeling is typically a fairly important design activity. However, I have a hard time with EAR diagrams and data models as a starting point – particularly for a new system (you don’t really have a choice when dealing with legacy databases). Thinking in objects is much more natural to me. Also, I like the idea of using code to explore the domain model and try a few things out.

In the past, I have used the forward engineering features of Hibernate for this very purpose. Once the tables have been created in the database, I take advantage of a neat feature in JDeveloper where you can reverse engineer a data model diagram. So the process goes roughly like this: Object model->Java code->Hibernate DDL->Database->Data model. It worked well in thrashing out the data model for my last project. Of course, it may not work for everyone but that’s ok – it works for me and that’s all that matters ;-)

For a new project I’m working on, I thought I’d try something different. I thought I’d give Grails a spin. This worked even better because Grails can generate such a lot of boiler-plate code for me enabling me to move a lot faster. It can also generate a scaffolding UI so I can interact with and test out the model. Here was the process:
  1. Download and install Grails from grails.org

  2. Follow the Quick Start guide to familiarize yourself with Grails http://grails.org/Quick+Start . During the Quick Start guide, you will learn how to create an application, a domain class, and a controller

  3. Configure Grails to use Oracle instead of HSQL (I needed to externalize the database so I could browse it and reverse engineer the data model diagram using JDeveloper) – to do this modify grails-app/conf/DataSource.groovy – change the driver class name and the JDBC url – you may also have to copy the Oracle JDBC driver (ojdbc14.jar) to the lib directory

  4. For each entity (where name is the name of the entity you want to model)

    1. $ grails create-domain-class name

    2. Modify the generated class located in grails-app/domain/name.groovy – add the attributes

    3. $ grails create-controller name

    4. Modify the generated controller class located in grails-app/controllers/nameController.groovy – change the body of the class to look like def scaffold = name

  5. Run the application $ grails run-app

  6. Point your browser to the application

  7. Populate the model by interacting with the controllers through the generated scaffolding

  8. Add some more entities by going back to step 4

  9. If you want to auto-populate the model with data on startup, add your code to grails-app/conf/BootStrap.groovy

I have to say, I found the out-of-the-box experience with Grails pretty polished. It passed the 15-minute test with flying colors. (If I can’t get something working in under 15 minutes, I tend to dump it.)

Wednesday, October 8, 2008

Performance testing tip - assumptions are dangerous

I've been doing a lot of research over the past couple of weeks with a goal to understand how cryptography performs on different platforms, with different algorithms, with different data characteristics. In addition, I was also interested to learn how crypto accelerators perform compared to software based crypto.

I can't share all the findings with you, but I can tell you I was quite surprised by what I found. Once again, I was reminded that assumptions regarding performance are invariably wrong. The reason for this is that there are so many subtle factors involved that can affect performance. Also, technology is constantly moving forward. Results and conclusions found today may not necessarily hold true tomorrow. As an example, does anybody remember how people used to chastise Java for being slow? That may have been the case in 1996, but in 2008 it most certainly is not.

Before I conducted my research, I was convinced that the crypto accelerator would outperform s/w based crypto. This, it turned out, was not the case. There were some cases where the s/w based crypto outperformed the crypto accelerator significantly. I also assumed that the SPARC chip would perform pretty well compared to AMD/Intel chips; it didn't perform well at all in terms of raw speed for the kinds of crypto operations I was performing.

The graph to the left shows the fastest and average performance for performing a SHA-1 hash on a credit card number on various platforms. The units are in nanoseconds. Look how fast Java performs on Intel on both Linux and Windows.
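
For anyone who wants to try this kind of measurement themselves, here is a minimal sketch of timing a SHA-1 hash of a card number in Java and reporting the fastest and average times in nanoseconds. It is not the harness I used for the results above. The class name, the warm-up and run counts, and the well-known test card number are all assumptions, and timing single calls with System.nanoTime is crude, so treat the numbers as rough.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: times a SHA-1 hash of a short string and reports fastest and average. */
class Sha1TimingSketch {

    static String sha1Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so every byte prints as two hex digits
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        String cardNumber = "4111111111111111"; // well-known test number, not a real account
        int warmupRuns = 10000;
        int timedRuns = 100000;

        // Warm up so the JIT compiler has a chance to optimize the hot path
        for (int i = 0; i < warmupRuns; i++) {
            sha1Hex(cardNumber);
        }

        long fastest = Long.MAX_VALUE;
        long total = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            sha1Hex(cardNumber); // includes the hex conversion, not just the digest
            long elapsed = System.nanoTime() - start;
            fastest = Math.min(fastest, elapsed);
            total += elapsed;
        }
        System.out.println("fastest (ns): " + fastest);
        System.out.println("average (ns): " + (total / timedRuns));
    }
}
```

In keeping with the theme of this post, don't assume the numbers you get on one machine or JVM carry over to another; a dedicated harness such as JMH gives more trustworthy results than a hand-rolled loop.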

Saturday, September 13, 2008

Quote of the day - Build one to throw it away

This is a quote from Fred Brooks' "The Mythical Man-Month".

“Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time. Hence plan to throw one away; you will, anyhow.”

Great insight. What's also remarkable is that this book is over thirty years old. In technology, we like to talk about how much and how fast things change, and they do; however, there appear to be lots of things that remain timeless.

Friday, September 12, 2008

Using the Flickr Authentication API - part I - generating the api_sig

All Flickr API calls using an authentication token must be signed. In addition, calls to the flickr.auth.* methods and redirections to the auth page on Flickr must also be signed.

The api_sig is to be appended to the URL for all such API calls. The following code can be used to generate the api_sig (of course, you can use a higher-level library such as flickrj if you enjoy eating microwave ready to eat meals over growing your own tomatoes ;-).

package com.jasondchambers.flickr;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Generates an api_sig from a list of parameter key value pairs
 *
 * The signing process goes like this. The parameters are sorted by key
 * e.g. foo=1, bar=2, baz=3 sorts to bar=2, baz=3, foo=1
 * The secret and the parameters are concatenated together as follows to provide
 * the raw signature
 * e.g. SECRETbar2baz3foo1
 * An MD5 hash is created, converted to hex and returned - this is to be used as
 * the api_sig parameter
 *
 * See section 8 of [1]
 *
 * [1] http://www.flickr.com/services/api/auth.spec.html
 *
 * @author Jason Chambers
 *
 */
public class ApiSigGenerator {

    private final String secret;

    public ApiSigGenerator(String secret) {
        this.secret = secret;
    }

    /**
     * @param paramKeyValuePairs alternating keys and values, e.g. "foo", "1", "bar", "2"
     */
    public String sign(String... paramKeyValuePairs) {
        try {
            // Sort the parameters first
            SortedMap<String, String> sortedParameterMap = sort(paramKeyValuePairs);
            // Generate the raw signature
            String rawApiSig = generateRawApiSig(sortedParameterMap);
            // Hash the raw signature and return
            return generateMd5(rawApiSig);
        }
        catch (Exception e) {
            // FlickrClientException (defined elsewhere in this package) is unchecked
            throw new FlickrClientException(e);
        }
    }

    private SortedMap<String, String> sort(String[] paramKeyValuePairs) {
        // A trailing key without a value would otherwise be dropped silently
        if (paramKeyValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected key/value pairs but got an odd number of arguments");
        }
        // A TreeMap keeps its entries sorted by key
        SortedMap<String, String> sortedParameterMap = new TreeMap<String, String>();
        for (int i = 0; i < paramKeyValuePairs.length; i += 2) {
            sortedParameterMap.put(paramKeyValuePairs[i], paramKeyValuePairs[i + 1]);
        }
        return sortedParameterMap;
    }

    private String generateRawApiSig(SortedMap<String, String> sortedParameterMap) {
        // The secret comes first, followed by each key and its value in sorted order
        StringBuilder rawApiSig = new StringBuilder();
        rawApiSig.append(secret);
        for (Map.Entry<String, String> entry : sortedParameterMap.entrySet()) {
            rawApiSig.append(entry.getKey());
            rawApiSig.append(entry.getValue());
        }
        return rawApiSig.toString();
    }

    private static String generateMd5(String input) throws NoSuchAlgorithmException {
        StringBuilder output = new StringBuilder();
        MessageDigest md = MessageDigest.getInstance("MD5");
        // Use an explicit charset so the hash does not depend on the platform default
        byte[] md5 = md.digest(input.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < md5.length; i++) {
            // Left-pad with a zero and keep the last two characters, giving two hex digits per byte
            String tmpStr = "0" + Integer.toHexString(0xff & md5[i]);
            output.append(tmpStr.substring(tmpStr.length() - 2));
        }
        return output.toString();
    }
}
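
To show where the api_sig ends up, here is a self-contained sketch that signs a set of parameters and appends the api_sig to the query string. It repeats the signing steps rather than reusing the class above so that it can run on its own. The class name SignedQuerySketch, the secret, and the api_key value are made up for illustration, and signing the raw values while URL-encoding only the query string is my assumption about how the pieces fit together.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Sketch: builds a query string with the api_sig appended as the last parameter. */
class SignedQuerySketch {

    /** Secret followed by each key and value in sorted key order. */
    static String rawSignature(String secret, SortedMap<String, String> params) {
        StringBuilder raw = new StringBuilder(secret);
        for (Map.Entry<String, String> entry : params.entrySet()) {
            raw.append(entry.getKey()).append(entry.getValue());
        }
        return raw.toString();
    }

    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    /** Returns key=value pairs joined with '&', followed by the api_sig parameter. */
    static String signedQuery(String secret, SortedMap<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.append(encode(entry.getKey())).append('=')
                 .append(encode(entry.getValue())).append('&');
        }
        // The signature is computed over the raw (unencoded) values
        query.append("api_sig=").append(md5Hex(rawSignature(secret, params)));
        return query.toString();
    }

    public static void main(String[] args) {
        SortedMap<String, String> params = new TreeMap<String, String>();
        params.put("method", "flickr.auth.getFrob");
        params.put("api_key", "YOUR_API_KEY"); // placeholder, not a real key
        System.out.println(signedQuery("YOUR_SECRET", params));
    }
}
```

The resulting string is what goes after the ? in the request URL.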