ETL for Lights Out

What does it mean?
If you have ever worked in the Data Management space (i.e., a data warehouse environment), you will readily agree that it is very resource intensive. Every step of the way, be it build, deploy or support, you need hands on deck to actively manage the process. Yet often enough you see resources tied up with maintenance-related activities rather than build-related ones.

In a nutshell, “ETL for Lights Out” can be described as developing a solution that does not require a lot of maintenance, so that resources can focus primarily on build activities and on adding value to the solution.

As easy as it is to summarize, several factors influence building a robust ETL solution that works in “lights-out” mode.

Consistent architecture
First off, you need an architecture that is consistent in its approach. Not to state the obvious, but this is the foundation on which you can build a risk-reduced, fault-tolerant ETL solution. Design considerations need to stay consistent over time and, to the greatest extent possible, minimize exceptions to the rules.

Some of the best practices that facilitate a robust architecture could be grouped under three broad categories – Data, Process and Technology.

Data: While building the Integration model, focus on flexibility. Always separate the Business data model from the Integration metadata model, and be sure to include a Data Quality strategy as part of the overall architecture; I touch on this a bit more later in the blog. Equally important is having a Master Data Management strategy as part of the overall architecture. Design your Integration architecture with attention to load and downstream query performance. Plan a data retention strategy that will keep your data volumes trim and responsive. Implement Data Marts and Views to support reporting requirements and new requests for information.
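As a loose illustration of that separation (plain Python, with hypothetical field names rather than any particular model), the sketch below keeps a record’s business attributes and its integration/audit metadata in distinct structures instead of mixing them together:

```python
from datetime import datetime, timezone
from uuid import uuid4

# Hypothetical business record as it arrives from a source system.
business_record = {"customer_id": 1001, "customer_name": "Acme Corp", "region": "EMEA"}

def wrap_with_integration_metadata(record, source_system, batch_id):
    """Keep integration/audit attributes separate from the business attributes."""
    return {
        "business": dict(record),        # the business data model, untouched
        "integration": {                 # the integration metadata model
            "batch_id": batch_id,
            "source_system": source_system,
            "load_timestamp": datetime.now(timezone.utc).isoformat(),
            "record_uuid": str(uuid4()),
        },
    }

if __name__ == "__main__":
    batch_id = f"BATCH_{datetime.now(timezone.utc):%Y%m%d%H%M%S}"
    print(wrap_with_integration_metadata(business_record, "CRM", batch_id))
```

Keeping the two models apart means the business schema can evolve without disturbing the audit trail, and vice versa.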

Process: When it comes to process, implement re-usable routines and sound standards across all components of the solution – whether the MDM, Data Integration or Reporting layer. Be sure to define Service Levels for data and information availability, and include a high-availability strategy in your overall architecture. One commonly overlooked item is a disaster recovery strategy and its provisioning and readiness.

Technology: When considering technology, implement or recommend only best-of-breed toolsets for all components such as Data Management, Data Integration, Reporting, EPM, Workflow Management and even scripting.

As you can see, a consistent approach to architecture promotes a uniform, predictable look and feel for the solution, which minimizes points of failure in the system. Issue resolution becomes much easier with a consistent approach. Needless to say, embracing best practices, consistent conventions and standards is essential to keeping the architecture intact.

Scalable design
Creating a solution that adapts well to constant scope changes requires a lot of up-front effort. Managing scope is critical early in the engagement and remains so throughout.

Design considerations here include a staged approach, a consistent load philosophy (i.e., all or nothing), how error-tolerant the code is, automatic data-quality checks, performance on high volumes of data, optimal configuration settings for the ETL tool, and so on. This knowledge comes from deep experience accumulated over the years.
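To make the “all versus nothing” point concrete, here is a minimal sketch (plain Python with sqlite3 and invented table and column names, not any particular client implementation) of a staged load where the whole batch either commits or rolls back, with an automatic data-quality gate inside the same transaction:

```python
import sqlite3

def load_all_or_nothing(batch_id, rows, db_path="warehouse.db"):
    """Staged, all-or-nothing load: either the whole batch commits or none of it does."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS stg_sales (batch_id TEXT, sale_id INTEGER, amount REAL)"
            )
        with conn:  # one transaction: commits on success, rolls back on any error
            conn.executemany(
                "INSERT INTO stg_sales VALUES (?, ?, ?)",
                [(batch_id, sale_id, amount) for sale_id, amount in rows],
            )
            # Automatic data-quality gate on just this batch: negative amounts reject the load.
            bad = conn.execute(
                "SELECT COUNT(*) FROM stg_sales WHERE batch_id = ? AND amount < 0",
                (batch_id,),
            ).fetchone()[0]
            if bad:
                raise ValueError(f"{bad} rows failed the data-quality check; batch rolled back")
    finally:
        conn.close()

if __name__ == "__main__":
    load_all_or_nothing("BATCH_20240101", [(1, 120.0), (2, 75.5)])
```

Because the quality check runs inside the transaction, a bad batch never becomes partially visible to downstream queries.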

All of the above factors need to be addressed one way or the other when taking a “Lights-Out” approach.

Data quality and rigorous test cycles
There are two focus areas that contribute greatly to creating a Lights-Out solution: one addressed early in the project, the other towards the latter part.

It’s no secret that data quality issues contribute significantly to failures, so it’s important to spend a decent amount of time early in the project timeline firming up data quality and governance issues. In most organizations, data governance is an integral part of the overall data management practice.
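As a rough sketch of how data-quality rules agreed with a governance team might be codified so they run automatically on every load (the rules and field names below are purely illustrative):

```python
# Illustrative rule catalog: each rule is owned by governance and versioned with the code.
DQ_RULES = [
    ("customer_id_present", lambda r: r.get("customer_id") is not None),
    ("amount_non_negative", lambda r: r.get("amount", 0) >= 0),
    ("region_in_reference", lambda r: r.get("region") in {"AMER", "EMEA", "APAC"}),
]

def validate(record):
    """Return the names of every rule the record violates (an empty list means clean)."""
    return [name for name, check in DQ_RULES if not check(record)]

if __name__ == "__main__":
    sample = {"customer_id": 1001, "amount": -5, "region": "EMEA"}
    print(validate(sample))  # -> ['amount_non_negative']
```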

Almost equally important is the thoroughness of the test cycles. I can’t stress enough the role this plays in creating a stable solution. Regardless of how good the design and coding standards are, they mean nothing if the code keeps failing. Constant failures make the solution “support-heavy”, which pushes it back towards a “lights-on” approach. This requires extensive planning for the test phases. Creating intuitive test scenarios and use cases comes from a wealth of experience in the subject matter and in the vertical industry the solution is built for. Regular “lessons learned” sessions are required to improve the quality of the solution and make it “self-healing”.
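As a small example of what an automated test for a single transformation step could look like (the currency-normalization rule and function below are hypothetical, using Python’s built-in unittest), the idea is to pin expected behaviour down so regressions surface during the test cycle rather than in the nightly batch:

```python
import unittest

# Hypothetical transformation under test: normalize amounts to USD.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "GBP": 1.27}

def normalize_amount(amount, currency):
    if currency not in RATES_TO_USD:
        raise ValueError(f"Unknown currency: {currency}")
    return round(amount * RATES_TO_USD[currency], 2)

class NormalizeAmountTests(unittest.TestCase):
    def test_known_currency_converts(self):
        self.assertEqual(normalize_amount(100, "EUR"), 110.0)

    def test_usd_passes_through(self):
        self.assertEqual(normalize_amount(42.5, "USD"), 42.5)

    def test_unknown_currency_fails_loudly(self):
        with self.assertRaises(ValueError):
            normalize_amount(10, "XYZ")

if __name__ == "__main__":
    unittest.main()
```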

Robust automation
Where there are data warehouses, you will almost always see large batch cycles running the data integration and management components. To integrate this process seamlessly, you will almost always use scheduler products, with a mix of scripting, to tie into your ETL layer. Several robust scheduling tools on the market, such as Autosys and Control-M, handle workflow management tasks effortlessly. These are all configurable, scalable and stable enough for any enterprise to leverage as part of its data management practice.
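For readers without access to an enterprise scheduler, the following sketch shows the underlying idea in miniature: a dependency-aware batch runner with retries, written in plain Python with invented job names (it is not meant to stand in for Autosys or Control-M):

```python
import subprocess
import time

# Hypothetical batch definition: each job lists its command, upstream dependencies and retry count.
JOBS = {
    "extract_sales":  {"cmd": "echo extracting sales",  "depends_on": [],                 "retries": 2},
    "load_warehouse": {"cmd": "echo loading warehouse", "depends_on": ["extract_sales"],  "retries": 1},
    "refresh_marts":  {"cmd": "echo refreshing marts",  "depends_on": ["load_warehouse"], "retries": 1},
}

def run_job(name, spec):
    """Run one job, retrying on failure; return True only if it eventually succeeds."""
    for _ in range(spec["retries"] + 1):
        if subprocess.run(spec["cmd"], shell=True).returncode == 0:
            return True
        time.sleep(5)  # back off before retrying
    return False

def run_batch(jobs):
    """Run jobs in dependency order, halting the batch on any unrecoverable failure."""
    done, remaining = set(), dict(jobs)
    while remaining:
        ready = [n for n, s in remaining.items() if set(s["depends_on"]) <= done]
        if not ready:
            raise RuntimeError("No runnable jobs left; check for circular dependencies")
        for name in ready:
            if not run_job(name, remaining.pop(name)):
                raise RuntimeError(f"Job {name} failed after retries; halting the batch")
            done.add(name)

if __name__ == "__main__":
    run_batch(JOBS)
```

A real scheduler layers calendars, alerting and operator overrides on top of this, which is exactly why purpose-built tools are worth leveraging.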

Looking beyond ETL integration tools
Sometimes the functional requirements are so complex that not all use-case scenarios are easily supported by a scheduling tool. This warrants building custom integration toolkits. At one of Cervello’s current engagements with a Global Risk & Insurance Broker, we successfully implemented such a solution, one that seamlessly integrates the various layers end to end. A metadata-driven architecture like this greatly reduces the need for a hands-on approach to running ETL jobs: the business user invokes data integration tasks via the Portal with the click of a button.
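Purely as a sketch of the metadata-driven idea (the catalog, task labels and workflow names below are hypothetical, not the actual client implementation), a small metadata catalog can map the task a business user picks in a portal to the workflow the ETL layer actually runs:

```python
# Hypothetical metadata catalog: what the portal shows the user -> what actually runs.
TASK_CATALOG = {
    "Refresh Policy Data":   {"workflow": "wf_policy_refresh", "notify": "risk-team@example.com"},
    "Reload Claims History": {"workflow": "wf_claims_reload",  "notify": "claims-team@example.com"},
}

def invoke_task(task_label):
    """Resolve a user-facing task label to its workflow via metadata, then launch it."""
    entry = TASK_CATALOG.get(task_label)
    if entry is None:
        raise KeyError(f"No workflow registered for task: {task_label}")
    # In a real solution this would hand off to the scheduler or ETL tool;
    # here we simply report what would be launched.
    print(f"Launching {entry['workflow']} and notifying {entry['notify']}")

if __name__ == "__main__":
    invoke_task("Refresh Policy Data")
```

Because new tasks are added as metadata rows rather than new code, the portal and the ETL layer stay decoupled.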

What’s next? Automated build, deploy and operate?
So we’ve discussed the opportunities to revisit ETL for a Lights-Out approach to build and support, but what about deployment? I can still remember spending countless weekends over the past few years doing production releases. At a recent engagement with a Global Financial Services firm, however, we had the opportunity to automate the entire release workflow as well, using robust tools like Bamboo coupled with the existing version control repositories (and a little bit of scripting thrown in). With an intuitive web-based interface, we could seamlessly deploy all the ETL components in a, get this, consistent and repeatable manner! And just like that, we no longer needed any resources on deck during release processes.
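As a hedged illustration of the kind of repeatable release script a build server such as Bamboo might call (the component names and per-component deploy scripts below are invented for the example), the sketch checks out a tagged release and deploys each ETL component in a fixed order:

```python
import subprocess
import sys

# Hypothetical, ordered list of ETL components to deploy for a tagged release.
COMPONENTS = ["schemas", "mappings", "workflows", "schedules"]

def sh(cmd):
    """Run a shell command and fail the deployment immediately if it fails."""
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

def deploy(release_tag, environment):
    sh(f"git fetch --tags && git checkout {release_tag}")
    for component in COMPONENTS:
        # Each component has its own idempotent deploy script, so reruns are safe.
        sh(f"./deploy_{component}.sh {environment}")
    print(f"Release {release_tag} deployed to {environment}")

if __name__ == "__main__":
    deploy(sys.argv[1], sys.argv[2])  # e.g. python deploy.py v1.4.2 prod
```

The same script runs identically from a developer workstation or the build server, which is what makes the release consistent and repeatable.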

Summary
In summary, the challenges we face in running “support-heavy” solutions should be viewed as opportunities to apply known best practices and transition into a self-supporting model. Clearly, there is more value in budget dollars spent on build-related initiatives. Solution integrators, such as Cervello, increasingly play a critical role both in building solutions and in the transition to a more “lights-out” model.

In my next blog, I’ll cover the importance of having a Master Data strategy as part of the overall data management strategy.

POSTED BY: Shakti Krishnamurthy
