{"id":710,"date":"2010-02-28T13:06:43","date_gmt":"2010-02-28T07:36:43","guid":{"rendered":"http:\/\/www.prasadgupte.com\/go\/?p=710"},"modified":"2011-01-28T16:55:17","modified_gmt":"2011-01-28T11:25:17","slug":"strategies-to-log-retain-activity-data","status":"publish","type":"post","link":"https:\/\/www.prasadgupte.com\/blog\/strategies-to-log-retain-activity-data\/","title":{"rendered":"Strategies to Log &#038; Retain Activity Data"},"content":{"rendered":"<p style=\"text-align: justify;\">My previous article <a href=\"http:\/\/www.prasadgupte.com\/go\/the-need-to-log-retain-activity-data\/\" target=\"_blank\">The Need to Log &amp; Retain Activity Data<\/a> argued the very need of logging &amp; retaining data. In this post, I am listing out various logging strategies along with some brief explanation, utility, associated constraints and effectiveness of each method. As highlighted before, most people fail to understand the difference between logs\/traces, audit trails and database time-stamps. Each of Log\/Trace, Audit Trail &amp; Timestamping has its purpose, pros and cons.<\/p>\n<h2><strong>Log or Trace<\/strong><\/h2>\n<p style=\"text-align: justify;\">When I think of a log, the first thing that comes to my mind is a trace consisting of developer injected <a href=\"http:\/\/java.sun.com\/j2se\/1.4.2\/docs\/api\/java\/lang\/System.html\" target=\"_blank\">SOP<\/a>s (SysOuts), messages\/exceptions generated by the server or any third-party component used. This trace could be written to a flat file or a database table.<\/p>\n<p><strong>Example:<\/strong><\/p>\n<p style=\"font-family:'Courier New',Courier\">2009-31-12 23:59:59 ::: LoginServlet &gt;&gt;&gt; john.doe &gt;&gt;&gt; Incorrect Password<br \/>\n192.168.10.101 &#8211; 10\/Nov\/09:13:55:36 -0700 &#8220;GET \/logo.gif HTTP\/1.0&#8221; 200 2326<br \/>\ninstantiated Bean: com.detangle.ejbs.whatever<br \/>\nJava.Lang.NullPointerException at \u2026..<br \/>\nConnected to ProductionDB: Saved record #862<br \/>\nExecuted Query: INSERT INTO SUPPLIERS\u2026 : 1 row affected<br \/>\ninside getSuppliersForCategory: Category = &#8220;Laptops&#8221;<!--more--><\/p>\n<p><strong>Comments:<\/strong><br \/>\nBeing the most detailed trace for system activity, user activity, exceptions, database operations, it is likely to generate thousands of lines per hour in Multi-tenant SaaS hosting. Such logging should surely be done for various reasons, but is not a real information source for administrators.<\/p>\n<p><strong>#Think:<\/strong> <em>Imagine an auditor asking for all the changes to the &#8216;Amount&#8217; field of Purchase Order #102365 for customer ABC; can you query the trace to generate this report?<\/em><\/p>\n<p>If you would agree, it is not the easiest way for an administrator to trace user activity. Thus, writing enough SOPs does not imply that you have great audit capability.<\/p>\n<h2>Audit Trail or Change History<\/h2>\n<p style=\"text-align: justify;\">Audit trail is nothing but change history at some grain (explained below), accompanied by timestamps and user information. There is thus a table containing the current values for a record, and another history table that maintains versions for each record in the main table. This is the same as \u2018Type 2 change\u2019 as understood by Data-warehousing professionals.<\/p>\n<p><strong>Example:<\/strong><br \/>\nBy grain I mean that we could either add one version row in the history table per changed field or per changed record. Let\u2019s understand this:<\/p>\n<p><strong>Main table:<\/strong><\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>ID (PK)<\/th>\n<th>Name<\/th>\n<th>Designation<\/th>\n<\/tr>\n<tr>\n<td>4052<\/td>\n<td>Peter<\/td>\n<td>Manager<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The day Peter is promoted to \u2018Sr Manager\u2019, a Type1 change would be applied to the Main Table and the designation over-ridden.<\/p>\n<p>To record history of Peter\u2019s past designation(s) and the date of promotion, changes would be recorded in the history table with one of the two schemas (and thus the grain set at field or record):<\/p>\n<p><strong>History Table (Grain: Field):<\/strong><\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>ID (PK)<\/th>\n<th>Version<\/th>\n<th>Field<\/th>\n<th>Old_Val<\/th>\n<th>New_Val<\/th>\n<th>Date<\/th>\n<th>User<\/th>\n<\/tr>\n<tr>\n<td>4052<\/td>\n<td>1<\/td>\n<td>Designation<\/td>\n<td>Manager<\/td>\n<td>Sr Manager<\/td>\n<td>2009-11-10&#8230;<\/td>\n<td>HR_Admin<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>History Table (Grain: Record):<\/strong><\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>ID (PK)<\/th>\n<th>Version<\/th>\n<th>Designation<\/th>\n<th>Archived_On<\/th>\n<th>User<\/th>\n<\/tr>\n<tr>\n<td>4052<\/td>\n<td>1<\/td>\n<td>Manager<\/td>\n<td>2009-11-10&#8230;<\/td>\n<td>HR_Admin<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>Comments:<\/strong><br \/>\nManaging versions of records is more of an overhead, unless the end-user or administrator would really want to revert to a previous version or track changes on a frequent basis. ERP databases carry this overhead, but with the advantage of being able to generate any sort of audit report. Without such logging, you could be inviting a lot of trouble from one of the four biggies! Overheads can be minimized by building this mechanism right into the persistence framework.<\/p>\n<h2>Time-stamping<\/h2>\n<p style=\"text-align: justify;\">During my (good-old) development days, no matter whether I created a table in MS-Access or MySQL, I always started off with a key field (PK, Identity, and Auto-Increment OR FK), a couple of timestamps and a flag to indicate row validity.<br \/>\nWith the latest generation of persistence &amp; ORM frameworks, this kind of comes free. But the framework will manage these fields, provided the developer opts in.<\/p>\n<p><strong>Example:<\/strong><br \/>\nContinuing with the example from my <a href=\"http:\/\/www.prasadgupte.com\/go\/the-need-to-log-retain-activity-data\/\" target=\"_blank\">last post<\/a>, when Darth assigned Malory as Trudy\u2019s approver in the leave management system, Bob programmed to create the following record:<\/p>\n<p>Establishing the relation (in a factless fact table) without any additional information:<\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>Employee<\/th>\n<th>Manager<\/th>\n<\/tr>\n<tr>\n<td>Trudy<\/td>\n<td>Malory<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>David wished that some more information be captured; something like\u2026<\/p>\n<p>Establishing the relation with a time-stamp, validity &amp; login:<\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<th>Employee<\/th>\n<th>Manager<\/th>\n<th>Date<\/th>\n<th>isValid<\/th>\n<th>Login<\/th>\n<\/tr>\n<tr>\n<td>Trudy<\/td>\n<td>Malory<\/td>\n<td>2009-11-10&#8230;<\/td>\n<td>false<\/td>\n<td>Darth<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong>Soft-Delete: <\/strong>Did you notice the \u2018isValid\u2019 above and something mentioned about row validity? The flag allows you to perform a logical- or a soft-delete. This way the record is not physically removed, but marked invalid. Transactions should become squatters in the database, and live there forever. I believe that timestamps should always be implemented.<\/p>\n<p><strong>Comments:<\/strong><br \/>\nTimestamps will always act as a starting point for investigation. It gives an immediate insight into the ownership of the record.<\/p>\n<p><strong>#Think:<\/strong> <em>Even in the absence of the \u2018User_Login\u2019 field, if only David had the timestamp, he would only have to go through the 100 or fewer entries generated during that fraction of a second! Makes sense to you?<\/em><\/p>\n<p><em><strong>Few Links:<\/strong><\/em><\/p>\n<p><a href=\"http:\/\/aplawrence.com\/MDesrosiers\/audit_logging.html\" target=\"_blank\">Logging as required by US regulation\/law<\/a>:<br \/>\n<a href=\"http:\/\/www.tonymarston.net\/php-mysql\/auditlog.html\" target=\"_blank\">A bit over-engineered, yet comprehensive design<\/a><\/p>\n<p style=\"text-align: justify;\"><a href=\"http:\/\/en.wikipedia.org\/wiki\/Common_Log_Format\" target=\"_blank\">Common Log Format<\/a><br \/>\n<a href=\"http:\/\/en.wikipedia.org\/wiki\/Extended_Log_Format\" target=\"_blank\">Extended Log Format<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My previous article The Need to Log &amp; Retain Activity Data argued the very need of logging &amp; retaining data. In this post, I am listing out various logging strategies along with some brief explanation, utility, associated constraints and effectiveness of each method. As highlighted before, most people fail to understand the difference between logs\/traces, &hellip; <a href=\"https:\/\/www.prasadgupte.com\/blog\/strategies-to-log-retain-activity-data\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Strategies to Log &#038; Retain Activity Data<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[8],"tags":[253,265,293,291,263,264,19,257,292,582,252,266,262,583,255,254],"class_list":["post-710","post","type-post","status-publish","format-standard","hentry","category-data","tag-audit-trail","tag-big-four","tag-change-history","tag-choosing-best-strategy-method","tag-data-granularity","tag-data-warehousing","tag-tech-db","tag-developers","tag-erp","tag-linkedin","tag-log","tag-orm-framework","tag-persistence","tag-pm_ux","tag-timestamp","tag-trace"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/posts\/710","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/comments?post=710"}],"version-history":[{"count":0,"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/posts\/710\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/media?parent=710"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/categories?post=710"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.prasadgupte.com\/blog\/wp-json\/wp\/v2\/tags?post=710"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}