This post is a continuation of my earlier post on Machine Learning & the three buckets in which it can be understood. You can read it here.
In this post, I am going to express my opinion on Supervised Learning.
In supervised learning, the output variable is known, and this output variable is used during training.
There are three steps in building a supervised model: building the model, training the model & testing the model. Let us understand these three steps with the help of an example.
Step #1: Building model
Suppose you have joined a coaching class to learn machine learning. In this analogy, you become the model.
Step #2: Training model
Your faculty will be teaching you. She will also use various teaching aids during this process. This is the training process. Here, we train the model using historical & recent data. The aim of this process is to identify patterns or dependencies in the data.
Step #3: Testing model
Now is the time when you (the model) have to appear for the exam. Obviously, the teacher will not test you on the same data she trained you on; the exam paper will have similar patterns to which you have to respond, but not the same ones.
Generally, to test the predictive accuracy of the model, we evaluate it on data that was not used for training. Usually, the ratio of training data to test data is 70:30.
If your exam score falls below a configured threshold, the model is re-trained.
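To make the three steps a little more concrete, here is a minimal sketch in Python using scikit-learn. The synthetic dataset, the choice of logistic regression, and the 0.80 threshold are my own assumptions for illustration, not part of the original example.

```python
# A minimal sketch of Build -> Train -> Test with a 70:30 split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic labelled data: X = inputs, y = the known output variable
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Usual 70/30 split: train on 70% of the data, test on the unseen 30%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Step #1: Building the model (the "student")
model = LogisticRegression(max_iter=1000)

# Step #2: Training the model on historical data (the "coaching")
model.fit(X_train, y_train)

# Step #3: Testing the model on data it has never seen (the "exam")
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Exam score (accuracy): {accuracy:.2%}")

# If the score falls below a configured threshold, re-train
# (e.g. with more data or a different model). 0.80 is an assumption.
THRESHOLD = 0.80
if accuracy < THRESHOLD:
    print("Below threshold - re-training needed")
```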
Let me correlate it with IT & Business use-cases:
Case #1: IT | Proactive Maintenance of Infrastructure
You take 100,000 tickets from your ITSM tool, train a model on 70,000 of them & test it on the remaining 30,000. If the accuracy of the model is above a chosen threshold, say 85%, you roll out the model for proactive maintenance of your infrastructure (servers, routers, etc.). This is the same split-and-score pattern sketched above, just with 85% as the threshold.
Case #2: Business | Detect Fraud Transactions of Credit Cards
You gather data on fraudulent transactions and again split it in a 70:30 ratio. In this case, let's assume the model has 75% accuracy (which may be good enough to roll it out live). When this model encounters a pattern like abc in new transactions, it can predict the probability of fraud for each of those transactions, and you can then take the necessary actions.
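For the fraud case, the useful output is often a probability rather than a hard yes/no. Here is a small, hedged continuation of the earlier sketch; the randomly generated "transactions", the choice of column 1 as the fraud class, and the 0.9 alert threshold are all assumptions for illustration.

```python
# Sketch only: assumes `model` and `X_train` from the earlier snippet,
# i.e. a trained scikit-learn classifier where class 1 stands for "fraud".
import numpy as np

# Placeholder for a feed of new, unseen transactions (random numbers here;
# in reality these would carry the same features the model was trained on).
rng = np.random.default_rng(0)
new_transactions = rng.normal(size=(5, X_train.shape[1]))

# predict_proba returns one probability per class for each transaction;
# column 1 is read as the probability of fraud (an assumption of this sketch).
fraud_probability = model.predict_proba(new_transactions)[:, 1]

# Act on transactions whose fraud probability is high. The 0.9 cut-off
# is an illustrative business rule, not a standard value.
for i, p in enumerate(fraud_probability):
    if p > 0.9:
        print(f"Transaction {i}: flag for review (fraud probability {p:.0%})")
```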
Both the cases I mentioned fall under the category of Classification problems within Supervised Learning. There is a second category, called Regression (the same thing you did during your MBA days using SPSS, & hence you have all already learnt some machine learning!).
Technically speaking, Regression is not tied to any single framework: it appears in machine learning as well as in classical statistical methods.
Regression refers to a model that predicts numbers, i.e. real values. This is different from Classification, which predicts discrete classes (fraud, mangoes, etc.).
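To make the distinction concrete, here is a minimal regression sketch (again scikit-learn, with a synthetic dataset and model choice that are purely my own assumptions): the regressor outputs real numbers, whereas the classifiers above output discrete classes.

```python
# Minimal regression sketch: predict a continuous, real-valued output.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

regressor = LinearRegression().fit(X_train, y_train)

# The output is a continuous number (e.g. a price or a demand forecast),
# not a discrete class such as "fraud" / "not fraud".
print("Predicted real values:", regressor.predict(X_test[:3]))
```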
Summary:
● In supervised learning, we know the outcomes
● The three-step process: Build, Train, Test
● It can be of two types: Classification (discrete classes) & Regression (real values)
Hope this helps you convey these concepts better in your next sales pitch!
Special thanks to Aditi Aggarwal for helping me with the content & Debapriya for the motivation!