05 August 2019

What Should an Executive Know about Machine Learning? "Unsupervised Learning"


This article continues my previous articles on Machine Learning and Supervised Learning.
In this post, I am going to share my views on Unsupervised Learning. I have tried to capture the basics here.
The basic fact in unsupervised learning is that the model draws its predictions, actions, or inferences by learning from input training data that has no outputs or results defined. In other words, the data carries no labels: there is no known solution, target, or output, and therefore no error measure against which to evaluate an outcome or prediction.
Unsupervised Learning can be further divided into two categories:
Clustering
It means grouping items into subsets (or clusters) so that the observations within the same cluster are similar to one another. It also implies that the behavior of one cluster will be different from that of another.
Applications of clustering:
1. You run an e-commerce firm (with a large volume of data on customers and their buying patterns) and want to find groups of customers with similar buying behavior for chronographic watches. Clustering is what you do.
2. You are an insurance company and want to identify groups of policyholders with high average claims.
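To make the e-commerce example concrete, here is a minimal sketch of k-means, one common clustering algorithm (not the only one). The function name `kmeans` and the customer figures are made up for illustration; a real project would normally use a library and scale the features first.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters with plain k-means (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from k randomly chosen points as the initial cluster centers.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (orders per year, average spend on watches)
customers = [(2, 150), (3, 180), (1, 120), (12, 900), (14, 950), (11, 870)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # customers with the same label fall into the same group
```

Notice that nobody told the algorithm which customers are "occasional" and which are "frequent" buyers; the groups emerge from the data alone, which is exactly what makes it unsupervised.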
Dimensionality Reduction (DR)
A straightforward approach covering both feature selection and feature extraction, dimensionality reduction cuts down the number of features to process, so that the downstream technique becomes computationally more efficient and performance improves.
For example, consider a situation where you want to classify buyers of watches from non-buyers of watches based upon their demographics. The number of dimensions in this data can be very large (age, education, race, sex, etc.). Therefore, if one starts applying classification across all these dimensions, the system may take a very long time to process the records. A computationally easier way is to use DR to find a smaller set of features that represents the original data in a non-redundant way, so that both approaches lead to nearly the same result.
In addition, it is common experience that projecting high-dimensional data onto 2D makes the data set much easier to visualize.
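As a rough illustration of the watch-buyer example, the sketch below uses principal component analysis (PCA), one widely used feature-extraction technique, to compress three correlated demographic columns into a single score. The helper names (`first_principal_component`, `project`) and the buyer figures are hypothetical, and the power-iteration approach is just one simple way to compute the first component.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return column means and the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, v):
    """Collapse each row to one number: its position along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

# Hypothetical buyers: (age, years of education, income in thousands)
buyers = [(25, 12, 30), (32, 14, 45), (41, 16, 60), (48, 16, 72), (55, 18, 90)]
means, direction = first_principal_component(buyers)
scores = project(buyers, means, direction)

# Share of the total variance that the single new feature retains.
n = len(buyers)
total_var = sum(
    sum((r[j] - means[j]) ** 2 for r in buyers) / (n - 1) for j in range(3)
)
kept_var = sum(s * s for s in scores) / (n - 1)
explained = kept_var / total_var
print(round(explained, 3))
```

Because age, education, and income move together in this made-up data, one combined score keeps most of the information that the three original columns carried, which is why a classifier trained on the reduced data can perform almost as well.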

Summary:
· In unsupervised learning, we do not know the outcomes
· It can be of two types: Clustering (grouping) & Dimensionality Reduction (for example, 50,000 features become 10)
Hope it helps in your next sales pitch to convey these concepts better!
