Big data algorithms, machine learning & reasoning have become the heart of almost all applications today. These smart applications solve crucial business problems and help decision makers reach business-critical decisions in a matter of minutes. Backed by statistical analysis & predictive modelling, these techniques are setting the norm.
But all that glitters is not gold, and we have to understand that NOT every insight that comes out of such models is CORRECT. Business leaders have to recognize the inflexion point where data starts to control them rather than the other way round. If they think that insights coming from machines will always be right, then that is a mistake!
In this post, I shall explain the various issues that come with using data analytics as it is.
Simpson’s Paradox
The best way to understand this statistical paradox: the averages within each group point in one direction, whereas the overall average points in the other direction.
Let’s understand this with 2 real-world examples:
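To make the reversal concrete, here is a minimal sketch in Python. The counts are invented purely for illustration (they are not from any real dataset), and the names `data`, `rate`, `group_winner` and `overall_winner` are hypothetical. Option X has the better success rate inside each group, yet option Y comes out ahead once the groups are pooled.

```python
# Hypothetical (successes, attempts) for two options, X and Y, in two groups.
# These numbers are made up for illustration only.
data = {
    "group 1": {"X": (90, 100), "Y": (800, 1000)},
    "group 2": {"X": (300, 1000), "Y": (20, 100)},
}

def rate(successes, attempts):
    """Success rate as a fraction between 0 and 1."""
    return successes / attempts

# Within each group, find the option with the higher success rate.
group_winner = {
    group: max(options, key=lambda o: rate(*options[o]))
    for group, options in data.items()
}

# Pool the raw counts across groups, then compare the options again.
overall = {}
for option in ("X", "Y"):
    successes = sum(data[group][option][0] for group in data)
    attempts = sum(data[group][option][1] for group in data)
    overall[option] = rate(successes, attempts)
overall_winner = max(overall, key=overall.get)

print("Winner per group:", group_winner)
print("Pooled rates:", overall, "-> winner:", overall_winner)
```

The flip happens because the group sizes are lopsided: most of X's attempts fall in the harder group 2, which drags its pooled rate down.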
Take #1:
In tennis, if the loser of a match has actually won more games than the winner, then we have an example of “Simpson’s paradox”. For example, though not very likely, if the final score is 0-6, 7-5, 7-5, then the loser has won more games in the match (16) than the winner (14).
A real example is the Isner–Mahut match at the 2010 Wimbledon Championships. If you look at the Records section for that match, the last point but one explains it. Mahut won 502 points in the match compared to Isner’s 478 (a difference of 24). But we all know that Isner won the match 6-4, 3-6, 6-7, 7-6, 70-68.
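The arithmetic for the hypothetical scoreline above can be checked with a few lines of Python. This is only a sketch; the variable names are my own, and each tuple is read as (games won by the match winner, games won by the match loser) in one set.

```python
# Hypothetical scoreline from the text: 0-6, 7-5, 7-5.
# Each tuple is (games won by the match winner, games won by the match loser).
sets = [(0, 6), (7, 5), (7, 5)]

# Count sets: whoever takes more games in a set wins that set.
winner_sets = sum(1 for w, l in sets if w > l)
loser_sets = sum(1 for w, l in sets if l > w)

# Count games across the whole match.
winner_games = sum(w for w, _ in sets)
loser_games = sum(l for _, l in sets)

print(f"Sets:  winner {winner_sets}, loser {loser_sets}")
print(f"Games: winner {winner_games}, loser {loser_games}")
```

The match is decided by sets (2 against 1), while the total games (14 against 16) point the other way.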
Take #2:
Suppose your enterprise has two business application towers: A & B. Now let us analyze the overall tickets generated by those applications:
If you look at the analytics reports & dashboards created at the Business Applications level, you will see that the predicted tickets match the actual tickets. So,
Inferences:
1. The maturity of the model is very high.
2. Therefore, we can scale it up to new towers.
However, if you deep-dive into the two towers individually, you can see that this inference is incorrect. This is one of the most important challenges in the reporting & dashboarding world: it is easy to think that we are meeting our numbers when, in reality, the case might be completely different.
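A small Python sketch shows how this can happen. The ticket counts below are hypothetical, chosen only to illustrate the effect (they are not the figures behind the dashboard discussed above), and `pct_error` is a helper I am introducing for the example. The prediction looks perfect at the combined level because the error in tower A cancels the error in tower B.

```python
# Hypothetical ticket counts per tower, for illustration only.
towers = {
    "A": {"predicted": 100, "actual": 140},
    "B": {"predicted": 200, "actual": 160},
}

def pct_error(predicted, actual):
    """Signed prediction error as a percentage of the actual count."""
    return 100.0 * (predicted - actual) / actual

# Combined view, as a top-level dashboard would show it.
total_predicted = sum(t["predicted"] for t in towers.values())
total_actual = sum(t["actual"] for t in towers.values())
overall_error = pct_error(total_predicted, total_actual)

# Tower-level view.
per_tower_error = {name: pct_error(**t) for name, t in towers.items()}

print(f"Overall: predicted {total_predicted}, actual {total_actual}, "
      f"error {overall_error:.1f}%")
for name, err in per_tower_error.items():
    print(f"Tower {name}: error {err:+.1f}%")
```

Under these assumed numbers the combined error is zero, while tower A is under-predicted and tower B is over-predicted by a wide margin, so scaling the model to new towers on the strength of the top-level report would be premature.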
Idiosyncrasies in the data
In many businesses, important decisions are based on statistical inferences drawn from historical references and experience. A major caveat here is that if the sample size in use is small, then a few outliers can skew the inferences a lot.
Many predictive models use historical data to make predictions about the future. Hence, if the past data and the data model built on it rely heavily on past incidents, the model may not give accurate predictions about the future.
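As a quick illustration of the small-sample caveat, here is a sketch using Python's standard `statistics` module. The resolution times are hypothetical values I made up for the example. A single outlier added to a sample of five nearly triples the mean, while the median does not move.

```python
import statistics

# Hypothetical ticket resolution times in hours (made up for illustration).
typical = [4, 5, 5, 6, 5]
with_outlier = typical + [60]  # one unusually slow ticket

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)

print(f"Mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"Median: {median_before:.1f} -> {median_after:.1f}")
```

Comparing a robust statistic such as the median against the mean is one simple way to spot when a handful of outliers is driving the number on a report.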
Believing numbers blindly
Too often, we are so driven by numbers that we forget that biases creep into the system, possibly during the initial requirement validation phase, the data model design phase, etc. Such biases, though very small, constrain the way we look at the end results (in the form of dashboards & reports). In addition, it is important to continuously normalize & check the data for inconsistencies and to do a ground-level verification before any major decision is taken. To give an example, business/operations leaders may see a high inflow of tickets even though, at ground level, those tickets always existed in the system; the only difference is that they were NEVER being tracked!
Though these challenges exist, the solution to them is domain expertise, tacit business knowledge, common sense & above all, Critical Thinking, which can help business managers escape such pitfalls.