Why analytics is getting big

February 8, 2011

One of the most-heard buzzwords in the world of Business Intelligence these days is analytics. As different vendors use this term differently (to their own advantage, of course 🙂 ), it is important to give some sort of definition before I continue this post. I would therefore like to refer to a quote from Thomas Davenport: “I think of analytics as a subset of BI based on statistics, prediction and optimization.” Using this definition it is easy to conclude that analytics is not the same as reporting, dashboarding or even OLAP. Yes, these techniques can be used to present the results of analytics, but they are not analytics themselves. In my view analytics is more like an engine that applies data mining and predictive algorithms to the data in order to enable things like sales forecasts, customer segmentation, fraud detection, revenue prediction and cash flow optimization.

From a technical point of view an analytical model is like an ETL mapping that adds “intelligence” to the raw data. From a functional point of view analytics enables organisations to answer questions like “why is this happening?”, “what will happen (if this trend continues)?” and even “what should we do to make it happen?”, whereas traditional BI solutions only focus on what has happened and what is currently happening.
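To make the contrast concrete: the “what will happen (if this trend continues)?” question can be as simple as fitting a trend line through historical figures and extrapolating, something a traditional report does not do. A minimal sketch in Python, with made-up monthly sales numbers:

```python
# Fit y = a + b*x through a sales history by ordinary least squares
# and extrapolate one period ahead. The figures are invented.

def linear_forecast(history, periods_ahead=1):
    """Return the trend-line value `periods_ahead` periods past the data."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a + b * (n - 1 + periods_ahead)

monthly_sales = [100, 110, 121, 128, 142, 151]
print(round(linear_forecast(monthly_sales), 1))
```

Real analytical models are of course far richer than a straight line, but the principle is the same: an algorithm over the data produces a forward-looking answer.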

Where in the past analytics was only used in the world of science, its usage is spreading. That is no wonder if you take into account the rapidly growing amounts of data that decisions can be based on. Of course there are obstacles to overcome before an organisation can fully take advantage of analytics. High-quality data, or at least good insight into the quality of the data, is needed before applying analytics. And from a business point of view, a change in management culture from “common business sense” to fact-based decision making, relying on analytics, is needed to take full advantage. Although this is not something that can be achieved in months, the return on investment can be enormous if it enables organisations to “sense” and “anticipate”. That given, it is no wonder that SAS reports a 26% growth in income related to business analytics. It is also no wonder that IBM bought SPSS and is investing heavily in this area. Analytics is getting big!

A new godfather?

August 20, 2010

Discussions of the Inmon versus Kimball approach to datawarehousing still pop up sometimes. And just when you think things are getting quiet, the “godfathers” themselves cannot resist continuing the argument.

In my opinion these discussions are a waste of time, since all that can be said about it has already been said. Just as there is no single version of the truth, I’m convinced there is no single best approach to datawarehousing. Each approach has its own benefits (and downsides). So choose the approach that is best suited to your situation. Moreover, you don’t have to choose one of them; depending on your situation a hybrid approach might fit even better.

That said, the times are changing. The information environment business users live in is changing. The boundaries between operational, tactical and strategic information are blurring. Because the outside world is changing at an ever higher pace, businesses need to react faster. There is no time for bureaucratic, hierarchically controlled processes, and collaboration becomes key. The IT environment needs to support this. That is what the enterprise 2.0 approaches are all about. Instead of, or besides, a datawarehouse architecture, this needs to be supported by an information architecture that captures all relevant information: structured or unstructured, operational, tactical or strategic. That is what makes the Business Integrated Insight concept of Barry Devlin, although it is still largely conceptual, so strong.

So stop discussing Inmon versus Kimball and dive into the Business Integrated Insight concept of Barry Devlin. A new godfather is born!

Reducing the DWH data integration effort

April 30, 2010

Why is it that in most projects 70% of the BI/DWH development effort is spent on data integration? What makes data integration so complex? According to datawarehouse guru Ralph Kimball there are 38 unique subsystems of ETL. After reading the Kimball article, you may conclude it is no wonder that so much effort is spent on this aspect of BI/DWH. However, I’m convinced the effort spent on ETL in BI/DWH development can be reduced drastically by:

1. Focussing on data quality issues at the source instead of dealing with them in ETL

Having just completed an excellent training on Total Information Quality Management, I’m even more convinced that a lot of the effort spent on data quality issues should be avoided. If the data quality of the source system is not improved, the effort spent here is scrap and rework at best.

So reduce the effort here by spending more time on finding the causes of these data quality issues where they arise (gemba: in quality management, gemba means the manufacturing floor, and the idea is that if a problem occurs, the engineers must go there to understand its full impact). Improve processes and use mistake-proofing techniques to avoid these issues (poka-yoke: a Japanese term that means fail-safing or “mistake-proofing”. A poka-yoke is any mechanism in a manufacturing process that helps an equipment operator avoid (yokeru) mistakes (poka). Its purpose is to eliminate product defects by preventing, correcting, or drawing attention to human errors as they occur).

In other words: focus on source data quality improvement so you can reduce the effort spent on data quality checks and data cleansing.
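As a toy illustration of mistake-proofing at the point of entry rather than cleansing downstream in ETL: a bad record can be rejected the moment it is keyed in. The field names and validation rules below are invented for the example:

```python
# Poka-yoke at the source: validate a record on entry so it never
# reaches the warehouse dirty. Field names and rules are made up.
import re

def validate_customer(record):
    """Return a list of problems; an empty list means the record is clean."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("name is required")
    if not re.fullmatch(r"\d{4}[A-Z]{2}", record.get("postcode", "")):
        problems.append("postcode must look like 1234AB")
    return problems

print(validate_customer({"name": "Acme", "postcode": "1234AB"}))
print(validate_customer({"name": "", "postcode": "oops"}))
```

Every rule enforced here is one cleansing step that no longer has to be repeated, forever, in the ETL layer.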

2. Architecting for change

The datawarehouse and datamarts should be architected in such a way that changes require minimal effort. One elegant way to achieve this is to apply the Data Vault method, in which the datawarehouse is modelled according to a standard pattern and the data integration complexity is placed between the datawarehouse and the staging-out area or datamarts, allowing data to be loaded one-to-one from the source in a largely standardised manner. In this way changes in business requirements have minimal impact on the datawarehouse layer. Be sure to check out the “The next generation EDW” articles on the articles page for more information.
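As a much-simplified sketch of the Data Vault idea (not the full method): each source row is split into a hub carrying only the business key and a satellite carrying the descriptive attributes stamped with a load date, so loading stays one-to-one and standardised whatever the requirements do. All table, column and customer names below are invented:

```python
# Split a source row into a Data Vault-style hub and satellite.
# Names and the example record are invented for illustration.
from datetime import date

def split_row(row, business_key, load_date):
    """Return (hub, satellite) dicts for one source row."""
    hub = {"customer_key": row[business_key]}
    satellite = {"customer_key": row[business_key], "load_date": load_date}
    # Everything that is not the business key goes into the satellite.
    satellite.update({k: v for k, v in row.items() if k != business_key})
    return hub, satellite

hub, sat = split_row({"customer_no": 42, "name": "Acme", "city": "Utrecht"},
                     "customer_no", date(2010, 4, 30))
print(hub)
print(sat)
```

Because the split follows one mechanical rule, a new source attribute only touches the satellite; the hub, and everything keyed on it, stays untouched.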

3. Generate instead of build

I am strongly in favour of products that provide metadata-driven automation and generation of SQL and DDL, such as Kalido DIW and BI-Ready. When using these tools changes are much faster to realise; moreover, a lot of time and effort is saved because less coding is required. With these tools data integration becomes much more a matter of modelling, configuring and generating instead of building.
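To illustrate the generate-instead-of-build idea, here is a minimal sketch that derives DDL from a metadata description. The metadata format and type names are invented and do not reflect how Kalido DIW or BI-Ready actually work:

```python
# Metadata-driven generation in miniature: describe a table once,
# generate the DDL from that description. All names are invented.

def generate_ddl(table, columns):
    """Render CREATE TABLE DDL from a (name, sql_type) column list."""
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"

ddl = generate_ddl("dim_customer",
                   [("customer_key", "INTEGER"),
                    ("name", "VARCHAR(100)")])
print(ddl)
```

The point is that a change in the model means editing the metadata and regenerating, not rewriting hand-coded DDL and mappings.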

So yes, I’m convinced the data integration effort can be significantly reduced.

Requirements Engineering for BI

January 26, 2010

When you start your BI project, gathering the functional business requirements is one of the first steps you will need to take. What reports are needed? Which measures need to be shown, and against which dimensions do people want to analyse these measures?

Of course you might simply ask each stakeholder these questions, but chances are that stakeholders find it hard to define exactly what they need. This might result in stakeholders not asking for all they need, or asking for more than they really need. As explained in this post, various techniques can be used to gather the requirements. However, how do you make sure the requirements are consistent, unambiguous, complete, but also really necessary? How do I challenge the stakeholders?

A few years ago I took a requirements analysis course to learn more about the steps involved in defining and analysing requirements. However, this course was aimed at ICT projects in general. The techniques that were covered (use case modelling and RUP) seem to me very useful for requirements engineering for process-oriented applications. However, when used for specifying BI requirements they seem to lack sufficient data orientation.

In my last project, where I needed to determine and specify the reporting requirements for one of my customers, I used a top-down approach. I started off with all business processes. I was lucky that these had just been reviewed and updated, so they were a good starting point. Business processes consist of several steps (phases). For every phase, reporting needs might exist to monitor or analyse progress and performance. For every such reporting need one or more reports can be created. This helped me specify a complete set of reports (are all business process steps that are in scope covered?). Moreover, it helped me challenge the stakeholders. I was now able to ask questions like: how does this report help you in monitoring or analysing the progress and performance of this specific part of the business process? And which of the reports you have specified will help you best in monitoring that progress?
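The completeness check described above can be sketched as a simple coverage test over the in-scope process phases. The phase and report names below are invented:

```python
# Top-down coverage check: every in-scope process phase should have
# at least one reporting requirement attached. All names are made up.

phases = ["lead generation", "quotation", "order intake", "delivery"]

reports_per_phase = {
    "lead generation": ["lead conversion report"],
    "order intake": ["order backlog report"],
}

# Phases with no reporting need attached are gaps to discuss
# with the stakeholders.
uncovered = [p for p in phases if not reports_per_phase.get(p)]
print(uncovered)
```

Even as a spreadsheet rather than code, this phase-to-report mapping makes both gaps (uncovered phases) and gold-plating (reports tied to no phase) visible.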

So yes, this helped, but it is still neither a perfect nor a complete requirements engineering methodology for BI. My search continues…

Insights from Gartner’s Magic Quadrant for Data Integration Tools

December 6, 2009

On 25 November Gartner published a new version of its Magic Quadrant for Data Integration Tools, 2009. Here are the things that caught my attention, and my thoughts on them. Don’t take it all too seriously, and be sure to check the Gartner Magic Quadrant for Dummies as well 🙂

Gartner’s vision is that leaders in the data integration tools market are also front runners in the convergence of single-purpose tools into an offering that supports a range of data delivery styles. When looking at the vendors in the leaders quadrant, the following can be noticed:

  • Informatica has a (historical?) lack of focus on virtualization and data federation compared with some of its competitors.
  • Oracle focusses mainly on two different tools for data integration, OWB and ODI. And although both are part of the Oracle Data Integrator Enterprise Edition (ODIEE), additional Oracle data integration tools, such as ODSI (which adds federation capabilities) and the data quality OEM of Trillium, are not part of this package.
  • With Data Integrator, Data Federator and the metadata management tools from Business Objects, plus the data integration capabilities of the extractors for BW and Netweaver PI, it will take SAP a lot of integration and rationalization effort to realize a consistent and integrated data integration platform.
  • IBM’s InfoSphere product is probably the most extensive data integration platform. However, it is also perceived as being very complex. Yes, completeness comes at a cost.

Another thing that caught my attention is that Microsoft is still lagging behind in this area. Although the introduction of SSIS was a leap forward compared to its previous SQL Server DTS ETL tooling, SSIS is mainly focussed on batch and bulk oriented data delivery. Other data integration styles, such as data federation and data replication, are not well addressed. Moreover, metadata management capabilities still seem to be very weak. They are not there (yet).

Why not to split up the DWH and BI teams

October 27, 2009

Several organisations have split their BI front-end development team from their DWH back-end development team. Because of the strong interdependency between these teams I do not think this is a good idea.

Often the DWH team, dealing with complex data integration and architectural issues, cannot keep up with the pace of the BI front-end development team. To avoid delays caused by the dependency on the DWH team, the front-end team creates “work-arounds” that reduce that dependency: data integration complexity is implemented in the front-end, and when data from operational source systems is not available in the DWH it is accessed directly from the source. These choices might turn out positive in the short term, but in the long term these sub-optimal choices result in an uncontrollable “spaghetti BI and DWH architecture”. Moreover, organisations tend to give the BI team all the credit for delivering fancy reports, while disregarding the added value of the DWH team that enables the creation of these reports and deals with the most complex data integration issues.

Insights from Gartner’s Hype Cycle for BI

September 11, 2009

On 27 July this year Gartner published a new version of its Hype Cycle for Business Intelligence and Performance Management, 2009 (I have not found a freely available copy of this report, but as soon as I find one I will add it to the analysts page). Here are some points that caught my attention, and my thoughts on them (for an explanation of the various stages in the hype cycle model, see this page):

  • Collaborative Decision Making (Technology Trigger, Mainstream adoption: 5 to 10 years, Transformational Benefit): by merging social software, knowledge management and Business Intelligence, a new style of decision support system will emerge that allows users to collaborate remotely in discussions around assumptions, analyses and other decision inputs, and to explore and decide on alternatives. This will require a huge change in the way companies make decisions (culture), but with improved transparency and “corporate memory” the quality of decisions can improve significantly.
  • Open Source BI (Peak of Inflated Expectations, Mainstream adoption: 5 to 10 years, Moderate Benefit): given its comments, Gartner seems to be very cautious about this:
    • “open-source vendors tend to lag behind their commercial counterparts in delivering innovative, emerging capabilities such as interactive visualization, in-memory analytics and search-based BI.”
    • “customers should know that the skills required for open-source BI products are generally hard to find, and that many open-source BI projects are defunct.”
    • “Although there is still a significant gap in terms of functionality, scalability and usability, opensource BI tools have advanced significantly to become viable alternatives.”
  • SaaS BI (Technology Trigger, Mainstream adoption: 2 to 5 years, Moderate Benefit): Gartner sees SaaS BI mainly as a solution for small and medium businesses that have yet to start with Business Intelligence.

What is also interesting to notice is that, compared to the 2008 version, Gartner has removed BI appliances from the hype cycle. The reason they give, “this technology no longer appears on the Hype Cycle because we consider it to be obsolete”, is not very clear to me.

Social BI

July 20, 2009

In Gartner’s Magic Quadrant for Social Software, Gartner compares different “products that focus on team collaboration, communities and social interaction.” These products offer some interesting capabilities that are currently not, or only partly, available in the existing BI platforms.

Because of its integration with SharePoint, the Microsoft BI platform might have a competitive advantage in this area, although key capabilities for social BI such as social tagging, bookmarking and social search are not (yet) offered. For Business Objects there is a separate product on the market, Antivia Desktop, that adds social software functionality to the platform. Besides user-based rankings, Antivia also provides smart recommendations (like Amazon’s), usage statistics, and supports discussions, capturing insights and actions in a centralized way.

In my opinion a mature BI platform should support social features such as:

– Content Rating: What’s hot, what’s not
– Community Polls
– People finder (by skill)
– Group Chat
– Comments
– Workflow (for approvals and publishing etc.)
– User Profiles (information on each user)
– Document Sharing
– Wiki
– Tagging

This will enable users to find relevant information, perform collaborative analysis, learn from each other and to discuss and share their findings and thoughts.

Magic Quadrant for Data Quality Tools

June 24, 2009

Published this month by Gartner:

The Gartner Magic Quadrant for Data Quality Tools (issue date 9 June 2009): http://mediaproducts.gartner.com/reprints/dataflux/167657.html

See the Analysts Page for other Magic Quadrants

In-memory analytics

June 21, 2009

In-memory analytic tools such as QlikTech’s QlikView, Tibco’s Spotfire and IBM’s TM1 query data in RAM instead of on disk. By eliminating the need to create cubes for analysis purposes, they offer a strong and more flexible alternative to OLAP tools (structures do not have to be defined up front). Since the time available to apply relevant information in business decisions is ever decreasing, it is no wonder that these tools get a lot of attention.
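As a toy illustration of why no pre-built cube is needed: once the detail rows sit in memory, any ad hoc grouping can be computed on the fly. The data and column names below are invented:

```python
# Ad hoc in-memory aggregation: group detail rows by any column
# without a predefined cube. Data and names are invented.
from collections import defaultdict

rows = [
    {"region": "North", "product": "A", "revenue": 100},
    {"region": "North", "product": "B", "revenue": 150},
    {"region": "South", "product": "A", "revenue": 200},
]

def aggregate(rows, by, measure):
    """Sum `measure` per distinct value of the `by` column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[by]] += row[measure]
    return dict(totals)

print(aggregate(rows, "region", "revenue"))
print(aggregate(rows, "product", "revenue"))
```

With an OLAP cube, each grouping dimension has to be designed in up front; in memory, the `by` column is simply a parameter chosen at query time.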

Some even claim these in-memory tools eliminate the need for a separate datawarehouse. But as Dutch BI blogger Ronald Damhof clearly points out in a recent blog post, there are plenty of reasons why organizations still need datawarehouses: they meet requirements these in-memory tools do not tackle.

On the front-end side these in-memory analytic tools surely offer a more intuitive interface than most other BI tools do. That is the other reason these tools get so much attention. With project Gemini (see a demo here) Microsoft is now also entering the in-memory analytics arena. According to QlikTech, which has already released the 9th version of QlikView, Microsoft still has a long way to go. However, this will surely have a positive impact on the attention for these kinds of applications.