Wednesday, March 26, 2014

A new blog

Hi there. It's been a while since I last blogged here. I've been pretty busy, but now it's time for me to take a third stab at blogging. This time, I will focus more narrowly on a particular topic. The lucky topic that has caught my attention is microservices-based architectures. I'm dropping my thoughts and ideas at a new site called microsvcs.io. Go check it out!

Sunday, December 5, 2010

Introduction to Gradle

At my new gig, one of my first assignments was to re-engineer our build system. For this task I used Gradle. To ease the transition for the team, I put together a YouTube video.

Wednesday, September 29, 2010

Tips on designing an upgrade process

I’d like to pass on some of the lessons I’ve learned.
Consider the current working version (x) sacred – the upgrade process should not touch it, because you always need to provide an option to roll back to (x), perhaps even a couple of hours after the new version (y) has been deployed. Design hint: when upgrading, lay the new version (y) alongside version (x), not on top of it. This implies a migration of configuration files at the application layer and a migration of data at the data layer from version (x) to version (y).
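To make this concrete, here is a minimal sketch of what the side-by-side layout might look like, assuming a hypothetical install where each version lives in its own directory and a symlink points at the active one (the paths and file names are illustrative, not from any particular product):

    import java.io.IOException;
    import java.nio.file.*;

    public class SideBySideUpgrade {
        public static void main(String[] args) throws IOException {
            // Hypothetical layout: each version in its own directory,
            // so (x) stays untouched and roll-back is cheap.
            Path current = Paths.get("/opt/app/versions/1.0");   // version (x)
            Path next    = Paths.get("/opt/app/versions/2.0");   // version (y)

            Files.createDirectories(next.resolve("conf"));

            // Migrate configuration by copying, never by editing (x) in place.
            Files.copy(current.resolve("conf/app.properties"),
                       next.resolve("conf/app.properties"),
                       StandardCopyOption.REPLACE_EXISTING);

            // Activate (y) by re-pointing the symlink; rolling back to (x)
            // later is just pointing it back.
            Path active = Paths.get("/opt/app/current");
            Files.deleteIfExists(active);
            Files.createSymbolicLink(active, next);
        }
    }

The point of the symlink flip is that activation and roll-back become a single atomic-ish operation rather than a file-by-file overwrite.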


In a high-availability environment, consider whether it is possible to run versions (x) and (y) concurrently – this supports the concept of a rolling upgrade, where you upgrade each node in turn.


By far, the trickiest part of designing an upgrade process is considering evolutionary changes to the data schema. New tables, columns or constraints may have been added. You need to pay close attention to changes to the schema during development and always think about what this means for the migration of existing data. It’s usually a good idea to embed the version number in the name of the schema/database.
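As a sketch of the schema-versioning idea over JDBC – the connection details, schema names and columns below are invented for illustration, and the CREATE TABLE ... AS TABLE form is PostgreSQL syntax:

    import java.sql.*;

    public class SchemaMigration {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/app", "app", "secret");
                 Statement stmt = conn.createStatement()) {

                // Version (y) gets its own schema; version (x)'s schema
                // (app_v1) is never touched, preserving the roll-back path.
                stmt.executeUpdate("CREATE SCHEMA IF NOT EXISTS app_v2");
                stmt.executeUpdate(
                    "CREATE TABLE app_v2.customer AS TABLE app_v1.customer");

                // Evolutionary change: a new column with a sensible default
                // so rows migrated from (x) remain valid under (y).
                stmt.executeUpdate(
                    "ALTER TABLE app_v2.customer " +
                    "ADD COLUMN loyalty_tier VARCHAR(16) DEFAULT 'basic'");
            }
        }
    }
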


It’s essential that data created with the old version (x) remains available with the new version (y) – but also ensure that the operations the system provides on that data behave consistently where expected.

Wednesday, September 22, 2010

DevNexus presentation

Back in March of 2010, I gave a presentation at the devnexus.com conference in Atlanta.

The presentation was entitled "From whiteboard to product launch". The intent was to share my team's experience in bringing a brand-new product to the marketplace. It covers a wide variety of areas, including process, architecture and team organization.

The slides are available here and the audio is available here (I recommend you flip through the slides while listening to the audio).

Monday, December 29, 2008

How to scale the data layer

Many a performance problem, as a system struggles to support growing demand, can be traced to the data layer. The database can easily become a bottleneck, and it can be the hardest component to scale after the fact because it manages state (in contrast, stateless components are easy to scale).

Of course, you may be able to throw more hardware at the problem – i.e. upgrade the database server with more memory and/or CPU horsepower. However, it is prudent to think about scalability before rather than after the fact.

First, let us consider some elementary physics:

pressure = force / area

For a given force, if we reduce the area to which the force is applied, the pressure increases. Try, for example, standing on tiptoes like a ballerina to get a sense of what I’m talking about. The force (i.e. your body weight) is constant, but the surface area to which it is applied shrinks, resulting in increased pressure.

Likewise, if we increase the surface area, the pressure is reduced. This is how a person can lie on a bed of nails without puncturing the skin – provided, of course, there are enough nails.

Typically, we don’t have any real control over the forces applied to the products or solutions we are designing and building. We can speculate and design with certain limits in mind, but what if those limits are shattered? The design needs a way to absorb that extra load while reducing the pressure on the system as a whole.

One approach is to use horizontal and vertical partitioning techniques in the design of the data layer. Both approaches have the effect of increasing the surface area, and thus reducing pressure points as the force (or load) on the system increases, compared to a single monolithic data layer (i.e. a single database server).

Horizontal partitioning is where the rows of a single logical table are spread over multiple physical databases. For example, customers whose last name begins with A-F may be stored in one database server and customers whose last name begins with G-L are stored in another database server and so on. It is a popular technique used by several large dot-coms as a way to spread the load.
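A minimal sketch of that routing logic, with made-up partition names standing in for what would really be connection pools or JDBC URLs, one per database server:

    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class CustomerRouter {
        // Hypothetical partitions keyed by the first letter of each range.
        private static final NavigableMap<Character, String> PARTITIONS =
            new TreeMap<>();
        static {
            PARTITIONS.put('A', "customers-a-f");
            PARTITIONS.put('G', "customers-g-l");
            PARTITIONS.put('M', "customers-m-r");
            PARTITIONS.put('S', "customers-s-z");
        }

        // Pick the partition whose range contains the last name.
        static String partitionFor(String lastName) {
            char first = Character.toUpperCase(lastName.charAt(0));
            return PARTITIONS.floorEntry(first).getValue();
        }

        public static void main(String[] args) {
            System.out.println(partitionFor("Fowler"));  // customers-a-f
            System.out.println(partitionFor("Smith"));   // customers-s-z
        }
    }
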

Vertical partitioning is a similar concept but involves storing different tables in different physical databases. For example, purchase orders may be stored in one database server whereas invoices may be stored in another.

To learn more about horizontal and vertical partitioning, take a look at http://en.wikipedia.org/wiki/Partition_(database)

Conceptually, the approaches are very straightforward; however, if you want to leverage them in your design, there are several things to look out for.

For horizontal partitioning, you want to strive for an even distribution across the partitions. For example, partitioning based on the first letter of the last name may not provide an even distribution – V through Z, for example, may be quite light in the number of records. Hashing techniques may provide a better approach.
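For instance, hashing a customer id spreads rows far more evenly than the alphabet does. A sketch (the partition count is illustrative):

    public class HashPartitioner {
        private static final int PARTITIONS = 8; // illustrative count

        // Math.floorMod keeps the result non-negative even when
        // hashCode() returns a negative value.
        static int partitionFor(String customerId) {
            return Math.floorMod(customerId.hashCode(), PARTITIONS);
        }

        public static void main(String[] args) {
            System.out.println(partitionFor("cust-1001"));
            System.out.println(partitionFor("cust-1002"));
        }
    }

One caveat: changing the partition count reshuffles nearly every row, which is why consistent hashing is often preferred when partitions are expected to be added or removed over time.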

For vertical partitioning, references across data models may be tricky. In particular, forget about foreign key constraints. If the two data models have a high degree of coupling between them, then I would go back to the drawing board – perhaps you haven’t found the right boundary. It’s OK to go to one data model, find what you are looking for, and then use the result from that query to find what you ultimately need from the second data model. Avoid distributed transactions (XA/2PC) that span more than one data model if at all possible. Ask yourself: do you really need referential integrity across two data models at all times? If so, then again you should revisit your data design and carve out the right boundaries.
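Here is a sketch of that two-step lookup – two plain local transactions instead of a cross-database join or an XA transaction. The connection strings, tables and columns are invented for illustration:

    import java.sql.*;

    public class CrossModelLookup {
        public static void main(String[] args) throws SQLException {
            long orderId = 42L; // example key

            String customerId;
            // Step 1: query the orders data model.
            try (Connection orders = DriverManager.getConnection(
                     "jdbc:postgresql://orders-db/orders", "app", "secret");
                 PreparedStatement ps = orders.prepareStatement(
                     "SELECT customer_id FROM purchase_order WHERE id = ?")) {
                ps.setLong(1, orderId);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    customerId = rs.getString("customer_id");
                }
            }

            // Step 2: feed that result into the invoices data model.
            try (Connection invoices = DriverManager.getConnection(
                     "jdbc:postgresql://invoices-db/invoices", "app", "secret");
                 PreparedStatement ps = invoices.prepareStatement(
                     "SELECT total FROM invoice WHERE customer_id = ?")) {
                ps.setString(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getBigDecimal("total"));
                    }
                }
            }
        }
    }
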

When designing data models at the highest level, strive for high cohesion (if it changes together, it stays together) and loose coupling between the data models. In other words, apply some of the same desirable properties of code design to your data design.

Monday, December 8, 2008

You know you're a geek when

Writing a shopping list the other day, I wrote down Guice instead of Juice. Time for a break from work, I think.