Batch Learning, Online Learning, Instance-Based Learning and Model-Based Machine Learning || Beginner Guide To Machine Learning 2021

What is Batch Learning? What is Online Learning? What is Instance-Based Learning? What is Model-Based Learning?

We have already looked at Supervised, Unsupervised, Reinforcement, and Semi-supervised Machine Learning in detail. Let’s dig deeper into this topic and explore some other ways of classifying Machine Learning systems. Each of them relies on different kinds of algorithms, and all of them are widely used in today’s growing ML culture.

What is Batch Learning

Another criterion used to classify machine learning systems is whether the system can learn incrementally or not. In batch learning, the system can’t learn incrementally from a stream of incoming data. Instead, it is trained using all the available data. This generally takes a lot of time and computing resources, so it is usually done offline. First the system is trained, and then it is launched into production, where it runs without learning anymore. It just applies what it has learned. This is called offline learning.

To make a batch learning system aware of new data, you have to train a new version of the system from scratch on the full dataset, which means the old data as well as the new data. Then you stop the old system and replace it with the new one in production.
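To make the retraining cycle concrete, here is a minimal sketch. The "model" is deliberately trivial (it just predicts the mean of the targets it was trained on), and the function name train_batch and the numbers are made up for illustration.

```python
from statistics import mean


def train_batch(targets):
    """Train from scratch on the full dataset.

    The 'model' is a stand-in: it only stores the mean of the targets.
    """
    return {"prediction": mean(targets), "n_seen": len(targets)}


# Made-up data, for illustration only
old_data = [2.0, 4.0, 6.0]
model_v1 = train_batch(old_data)  # trained offline, then deployed as-is

new_data = [8.0, 10.0]
# A batch learner cannot absorb new_data on its own:
# we retrain on old + new and swap the deployed model.
model_v2 = train_batch(old_data + new_data)
```

Notice that model_v2 had to go through the three old values again just to take two new ones into account. With a trivial model this is free, but with a large dataset this full pass is exactly where the time and money go.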

If you work with a small dataset, then training, evaluating, and launching a machine learning system can be automated fairly easily. But with a large dataset, retraining may take a long time, because every run goes through the full dataset again. Training on the full set of data also requires a lot of computing resources, i.e., CPU, memory, disk space, disk I/O, network I/O, etc. Retraining on a huge dataset each time can cost a lot of money.

So, we have to use algorithms that are capable of learning incrementally.

What is Online Learning

In online learning, you train the system incrementally. You feed it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data as it arrives.
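Here is a minimal sketch of what "learning in mini-batches" can look like. It assumes a tiny linear model trained by mini-batch gradient descent, which is only one of many possible online algorithms. The class OnlineLinearModel, its partial_fit method, and the data_stream generator are hypothetical names, and the stream is synthetic.

```python
class OnlineLinearModel:
    """Tiny 1-D linear model y = w * x + b, trained by mini-batch gradient descent."""

    def __init__(self, learning_rate=0.3):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, batch):
        """Run one cheap learning step on a mini-batch of (x, y) pairs."""
        n = len(batch)
        # Gradient of the mean squared error with respect to w and b
        grad_w = sum(2 * (self.predict(x) - y) * x for x, y in batch) / n
        grad_b = sum(2 * (self.predict(x) - y) for x, y in batch) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b


def data_stream(n_instances):
    """Synthetic stream: instances arrive one at a time and follow y = 2x + 1."""
    for i in range(n_instances):
        x = (i % 10) / 10
        yield x, 2 * x + 1


model = OnlineLinearModel()
batch = []
for instance in data_stream(20000):
    batch.append(instance)
    if len(batch) == 5:           # a mini-batch is ready
        model.partial_fit(batch)  # learn from it...
        batch = []                # ...then discard it
```

The model never sees more than five instances at a time, yet its parameters drift toward the underlying w = 2 and b = 1 as the stream goes on.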

Unlike batch learning, online learning is great for systems that receive data as a continuous flow and need to adapt to change rapidly. It is also a good option on low-spec machines: once the system has learned from new data instances, it no longer needs them, so you can discard them. This saves a lot of space.

For example, take stock market data. The machine receives new data continuously, and an online learning system can keep learning from it while still helping to analyze the market.

Online learning algorithms can also be used when the dataset is too large to fit in the machine’s main memory (this is called out-of-core learning). You break the data into small batches: the algorithm loads part of the data, runs a training step on it, and repeats the process until it has run on all of the data.
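The load-a-part, train, repeat loop can be sketched like this. An in-memory io.StringIO stands in for the huge file, and the "training step" is just an incremental mean update, so treat it as an illustration of the loop rather than a real training pipeline. The helper read_in_chunks is a hypothetical name.

```python
import io
from itertools import islice


def read_in_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size parsed values, never the whole file."""
    iterator = iter(lines)
    while True:
        chunk = [float(line) for line in islice(iterator, chunk_size)]
        if not chunk:
            return
        yield chunk


# Stand-in for a file too large for main memory (made-up values 1.0 ... 10.0)
big_file = io.StringIO("\n".join(str(float(v)) for v in range(1, 11)))

count = 0
running_mean = 0.0
for chunk in read_in_chunks(big_file, chunk_size=3):
    # "Training step" on the part of the data currently loaded
    for value in chunk:
        count += 1
        running_mean += (value - running_mean) / count
```

Only three values are held in memory at any moment, but after the last chunk the running mean matches the mean of the whole file.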

(Figure: Online Learning)

One important parameter of online learning systems is how fast they adapt to changing data. This is generally called the learning rate. If you set the learning rate too high, the system will adapt to new data quickly and work pretty well on it, but it will also tend to forget the old data, and nobody wants that. On the other hand, if you set the learning rate too low, the system will be less sensitive to new data and will learn more slowly.
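A small experiment shows the trade-off. The sketch below assumes the simplest possible online learner, one that tracks a single number by nudging its estimate toward each new value. The function name track and the stream values are made up.

```python
def track(stream, learning_rate):
    """Follow a stream of numbers, moving the estimate toward each new value."""
    estimate = 0.0
    for value in stream:
        estimate += learning_rate * (value - estimate)
    return estimate


# Made-up stream: a long stretch of "old" data at 10, then a few "new" values at 20
stream = [10.0] * 50 + [20.0] * 5

fast = track(stream, learning_rate=0.9)   # adapts quickly, forgets the old data
slow = track(stream, learning_rate=0.05)  # remembers the old data, adapts slowly
```

After only five new values, the fast learner already sits almost exactly on 20 and has practically forgotten the fifty values at 10. The slow learner has barely moved away from 10.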

A big challenge with online learning is that if bad data is fed to the system, its performance will gradually decline. If the system is live, your clients will notice.
Good data matters in every part of ML. Just as good wine needs good-quality grapes, Machine Learning needs good-quality data. A bad batch will spoil your mood in both cases.

What is Instance-Based Learning

How do you learn things? The most trivial form of learning is simply learning by heart, right?
Let’s take an example. Suppose you build an email spam filter that learns by heart: it flags every email that is identical to an email already flagged by users. It is not the worst solution, but you can’t call it a good one either.
Instead of flagging only identical emails, the spam filter can be programmed to also flag emails that are very similar to known spam emails. This requires a measure of similarity between two emails. A very simple similarity measure is to count the words they have in common. The more words a new email shares with a known spam email, the more likely it is to be spam.
Here the system learns the examples by heart and then generalizes to new cases by using a similarity measure to compare them to the learned examples. This is called Instance-based Learning.
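The word-counting idea fits in a few lines. In the sketch below, the example emails are invented, the function names are hypothetical, and the decision rule (copy the label of the single most similar stored email) is one simple way to use the similarity measure, not the only one.

```python
def shared_word_count(email_a, email_b):
    """Similarity measure: number of distinct words the two emails have in common."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))


def classify(new_email, known_emails):
    """Return the label of the stored email most similar to new_email."""
    best_text, best_label = max(
        known_emails,
        key=lambda item: shared_word_count(new_email, item[0]),
    )
    return best_label


# The "learning by heart" part: the system simply stores labeled examples (made up here)
known_emails = [
    ("win a free prize now", "spam"),
    ("claim your free money today", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team tomorrow", "ham"),
]

label_1 = classify("free prize waiting for you", known_emails)
label_2 = classify("agenda for the team meeting", known_emails)
```

There is no training step at all here. All the work happens at prediction time, when the new email is compared to every stored example.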

(Figure: Instance-based Learning)

What is Model-Based Learning

Another form of learning is to build a model of the examples and then use that model to make predictions. This is called model-based learning.
For example, let’s check whether money makes people happy. Download the Better Life Index data from the OECD’s website. We take GDP per capita as the single feature and look at the life satisfaction of people in different countries.

Well, it seems pretty clear that life satisfaction goes up as a country’s GDP per capita increases. The trend looks roughly linear when you plot the data. So you decide to model life satisfaction as a linear function of GDP per capita. This step is called model selection.
Math enthusiasts can write this as:
Life_satisfaction = θ1 + θ2 * GDP_per_capita (the familiar y = mx + c, with c = θ1 and m = θ2)
where θ1 and θ2 are the model’s parameters.
Before using the model, you have to define the values of the parameters. To know which values fit best, you need a performance measure: you can define a fitness function that measures how good the model is, or a cost function that measures how bad it is. For regression problems, people generally use a cost function that measures the distance between the linear model’s predictions and the training examples. The main objective is to minimize this distance.
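As a sketch, here is one common choice of cost function, the mean squared error, applied to two candidate pairs of parameters. The text above does not name a specific distance, so the mean squared error is an assumption. The training examples are made-up numbers, not real OECD figures.

```python
def predict(theta1, theta2, gdp_per_capita):
    """Linear model: life_satisfaction = theta1 + theta2 * gdp_per_capita."""
    return theta1 + theta2 * gdp_per_capita


def cost(theta1, theta2, examples):
    """Mean squared error between the model's predictions and the examples."""
    squared_errors = [(predict(theta1, theta2, x) - y) ** 2 for x, y in examples]
    return sum(squared_errors) / len(squared_errors)


# Made-up examples: (GDP per capita in thousands of USD, life satisfaction)
examples = [(10.0, 5.0), (20.0, 5.5), (30.0, 6.0), (40.0, 6.5)]

good_cost = cost(4.5, 0.05, examples)  # this line passes through every example
bad_cost = cost(0.0, 0.1, examples)    # this line sits far below the examples
```

The first pair of parameters gives a cost of practically zero, the second a much larger one. Training is simply the search for the pair with the lowest cost.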
When you feed it the training examples, the linear regression algorithm finds the parameters that make the model fit the data best. This is called training the model.
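For a single feature, the best-fitting parameters can be computed directly with the classic least-squares formulas. The sketch below does exactly that in plain Python. The function name fit_linear and the training examples are made up for illustration and are not real OECD data.

```python
def fit_linear(examples):
    """Least-squares fit of y = theta1 + theta2 * x for one feature."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    # Slope: covariance of x and y divided by the variance of x
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    s_xx = sum((x - mean_x) ** 2 for x, _ in examples)
    theta2 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means
    theta1 = mean_y - theta2 * mean_x
    return theta1, theta2


# Made-up examples: (GDP per capita in thousands of USD, life satisfaction)
examples = [(10.0, 5.1), (20.0, 5.4), (30.0, 6.1), (40.0, 6.4)]

theta1, theta2 = fit_linear(examples)

# Use the trained model to predict for a country that is not in the data
predicted = theta1 + theta2 * 25.0
```

Once θ1 and θ2 are known, the training examples are no longer needed to make a prediction. That is the key difference from instance-based learning, where the stored examples are consulted every time.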

Thank You