When a system becomes distributed …

When we build a system, we normally start with a simple framework and then add more features to it. As the system grows bigger and becomes distributed, some weird and interesting bugs start to appear.

One example involves dates and times. When the system was first designed, the developers decided to store all time fields in local time, including “date of birth”. While the program ran locally, there was no problem. But when a call center using a Windows Forms application became part of the system, a weird bug arose: some clients’ dates of birth differed by one day between subsystems (e.g. 08/18/1966 in one database, but 08/19/1966 in another). Because of the wrong DOB, those clients could not pass the credit check.

I spent the whole afternoon and evening yesterday checking code and logs, but I couldn’t find the cause. Then, just before I went to bed last night, I suddenly realized the bug was caused by time zone differences between computers.

For example, the business layer server is in the “-5” time zone, but a Call Center agent’s computer is set to time zone “-4”. When the agent enters the DOB “08/18/1966”, the Call Center Windows Forms program passes that date (“08/18/1966 00:00:00”) to the business layer server as a local time. Because of the time zone difference, the server receives “08/17/1966 23:00:00” instead, and the date is off by one day.
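The shift is easy to reproduce outside the real system. Here is a minimal sketch in Python (not the Windows Forms code itself), assuming fixed UTC offsets of -4 and -5 for the two machines; the names AGENT_TZ and SERVER_TZ are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets standing in for the two machines' time zone settings.
AGENT_TZ = timezone(timedelta(hours=-4))   # Call Center agent's computer
SERVER_TZ = timezone(timedelta(hours=-5))  # business layer server

# The agent enters 08/18/1966; the client treats it as local midnight.
dob_on_agent = datetime(1966, 8, 18, 0, 0, 0, tzinfo=AGENT_TZ)

# The server interprets the same instant in its own time zone.
dob_on_server = dob_on_agent.astimezone(SERVER_TZ)

print(dob_on_agent.isoformat())   # 1966-08-18T00:00:00-04:00
print(dob_on_server.isoformat())  # 1966-08-17T23:00:00-05:00
print(dob_on_server.date())       # 1966-08-17
```

Both values describe the same instant; only the calendar date read from it changes, which is exactly what breaks a date of birth.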

Some computers may have arbitrary time zone settings, so whether a date shifts, and in which direction, depends on which machine entered the value. The DOB field becomes messy.

A quick, temporary, logically (not physically) simple solution is to set all machines to the same time zone. For the long-term solution, the system should switch to UTC instead of local time. That will take longer, because both the database and the code have to change.
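To give a rough idea of the UTC approach, here is a small Python sketch. It is only an illustration under my own assumptions, not the actual change to our code: the helper names dob_to_wire and dob_from_wire are hypothetical, and the ISO 8601 string is just one possible format for passing the value between machines.

```python
from datetime import date, datetime, timedelta, timezone

def dob_to_wire(dob: date) -> str:
    """Encode a date of birth as midnight UTC, ignoring the machine's time zone."""
    return datetime(dob.year, dob.month, dob.day, tzinfo=timezone.utc).isoformat()

def dob_from_wire(value: str) -> date:
    """Decode on any machine: normalize to UTC before reading the calendar date."""
    return datetime.fromisoformat(value).astimezone(timezone.utc).date()

wire = dob_to_wire(date(1966, 8, 18))
print(wire)                 # 1966-08-18T00:00:00+00:00
print(dob_from_wire(wire))  # 1966-08-18

# Even if some layer in between converts the value to a "-5" local time,
# normalizing back to UTC recovers the original date.
shifted = datetime.fromisoformat(wire).astimezone(timezone(timedelta(hours=-5)))
print(dob_from_wire(shifted.isoformat()))  # 1966-08-18
```

The point is that neither side ever reads the date from a local time, so the time zone setting of an individual computer no longer matters.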
