The Need to Log & Retain Activity Data

In the current age of on-demand software and SaaS, combined with multi-tenant hosting, systems generate large volumes of activity data every hour. For this data to be useful to administration and support teams, IT has to plan how to turn it into information. The logging strategy should be built right into the development process.

The Confusion

However, to most people I have worked with while developing systems:

  • the terms audit log, server log, audit trail, time-stamping and change history are synonymous
  • implementing ‘soft-delete’ probably appears to be a development overhead

I don’t know whether it is because of my exposure to ERP or something else, but unlike these people, I am overly sensitive about recording audit trails. Are you one of them? Are you not convinced about implementing a logging strategy? Then this post was written with you in mind.

An Example

Imagine a scenario from a Leave Management System with some fictitious characters:

When Bob built the system, he thought that for running his workflow, capturing the relationship between users – employee and manager – was sufficient.

Now, Trudy (an employee) has convinced Darth (system administrator) to configure her friend Mallory (a manager) as Trudy’s leave approver in the system. But in reality, Trudy’s manager is Alice.

Now when Trudy applies for leave, she gets it approved by Mallory. One day Alice figures this out and asks David, manager of the admin team, to investigate.

The investigation puts David in an embarrassing situation: all he knows is that Trudy & Mallory are party to the fraud, but he can’t figure out who from his team set up this hierarchy. Because he does not know when it was done, he has to check every day’s trace from the day Trudy joined (a year ago). At a modest average of 50 MB of trace per day, that is 365 × 50 MB = 18.25 GB of trace in which he has to find the culprit from his team.

Would you like to be David? Or, have you been Bob already?
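
Had Bob recorded who changed each approver and when, David’s question would have been a single lookup instead of a year of trace files. Here is a minimal sketch of that idea in Python with the standard sqlite3 module. The table names (approver, approver_audit) and the functions set_approver and approver_history are hypothetical, made up for this illustration; they are not taken from any real leave system.

```python
import sqlite3
from datetime import datetime, timezone


def open_db():
    """Create an in-memory database with a hypothetical approver table and its audit trail."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE approver (
            employee TEXT PRIMARY KEY,
            manager  TEXT NOT NULL
        );
        CREATE TABLE approver_audit (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            employee    TEXT NOT NULL,
            old_manager TEXT,
            new_manager TEXT NOT NULL,
            changed_by  TEXT NOT NULL,
            changed_at  TEXT NOT NULL
        );
    """)
    return conn


def set_approver(conn, employee, manager, changed_by):
    """Change an employee's approver and record who did it and when, in one transaction."""
    row = conn.execute(
        "SELECT manager FROM approver WHERE employee = ?", (employee,)
    ).fetchone()
    old_manager = row[0] if row else None
    changed_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits both statements together, or rolls both back on error
        conn.execute(
            "INSERT OR REPLACE INTO approver (employee, manager) VALUES (?, ?)",
            (employee, manager),
        )
        conn.execute(
            "INSERT INTO approver_audit "
            "(employee, old_manager, new_manager, changed_by, changed_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (employee, old_manager, manager, changed_by, changed_at),
        )


def approver_history(conn, employee):
    """Return every approver change for an employee, oldest first."""
    return conn.execute(
        "SELECT old_manager, new_manager, changed_by, changed_at "
        "FROM approver_audit WHERE employee = ? ORDER BY id",
        (employee,),
    ).fetchall()
```

With something like this in place, David would call approver_history(conn, "Trudy") and read the name in the changed_by column of the row where the approver became Mallory.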

The Analyst’s Stand

As an analyst, you will often see development teams question the very need for this requirement, or its priority: “Has this come from a customer?”, “Will this information be accessed even once in a million reads?” Analysts need to take a rock-solid stand on this: “Customers are not interested in the database schema; this has not been asked for, nor are we going to validate this need.” Even if the logged information is going to be queried once in a trillion reads, it needs to be captured! You can surely create a win-win by agreeing to defer development of the reporting UI, but you should start capturing activity right from day one of the PDLC.

‘Hotel California’ Database

Always ask the dev team for a ‘Hotel California’ database! I’ve built this analogy on the Eagles’ hit song, adapting the lyrics to read ‘data can check out any time it likes, but it can never leave!’ How do you create one? Simple: get them to implement a soft-delete for every record in the database, so that a deleted record is only marked as deleted and never physically removed. And this rule does not depend on what you’re building: retained data will come in handy in every software application, unless you’re building something that defies gravity.
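
For readers who have not seen a soft-delete before, here is a minimal sketch in Python with the standard sqlite3 module. The leave_request table, its deleted_at and deleted_by columns, and the function names are assumptions made for this illustration; a real schema would pick its own names and would probably hide the filter behind a view or the data-access layer.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn):
    """Hypothetical table: deleted_at stays NULL for as long as the record is live."""
    conn.execute("""
        CREATE TABLE leave_request (
            id         INTEGER PRIMARY KEY,
            employee   TEXT NOT NULL,
            days       INTEGER NOT NULL,
            deleted_at TEXT,
            deleted_by TEXT
        )
    """)


def soft_delete(conn, request_id, deleted_by):
    """Mark a record as deleted instead of removing it. Returns True if a live row was marked."""
    deleted_at = datetime.now(timezone.utc).isoformat()
    with conn:
        cur = conn.execute(
            "UPDATE leave_request SET deleted_at = ?, deleted_by = ? "
            "WHERE id = ? AND deleted_at IS NULL",
            (deleted_at, deleted_by, request_id),
        )
    return cur.rowcount == 1


def active_requests(conn):
    """What the application normally shows: only rows that were never soft-deleted."""
    return conn.execute(
        "SELECT id, employee, days FROM leave_request "
        "WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
```

The data has ‘checked out’ of active_requests, but the row, along with who deleted it and when, is still in the table for anyone who needs to investigate later.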

I hope I have conveyed a lesson here. Logging should be so tightly knit into the security guidelines, and thus the framework, that no one ever has to code for it! In my next post, I will attempt to list various logging strategies, along with a brief explanation, utility, associated constraints and effectiveness of each one.
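
Until then, here is one possible reading of ‘no one ever has to code for it’, sketched as a Python decorator. Everything in it is hypothetical: the audited decorator, the in-memory AUDIT_LOG list (a real framework would write to durable storage) and the convention that the acting user is always the first argument.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory sink; stands in for a durable audit store.
AUDIT_LOG = []


def audited(action):
    """Wrap a function so that every successful call is recorded with actor, action and time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(actor, *args, **kwargs):
            result = func(actor, *args, **kwargs)
            # Recorded only after func returns; a call that raises leaves no entry in this sketch.
            AUDIT_LOG.append({
                "actor": actor,
                "action": action,
                "args": args,
                "kwargs": kwargs,
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@audited("set_approver")
def set_approver(actor, employee, manager):
    """Business logic only; the developer writes no logging code here."""
    return {"employee": employee, "manager": manager}
```

The person writing set_approver only adds one line above the function; the recording of who did what, and when, comes from the framework.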