Which ML Model To Use?


 

It is normal for a Python developer who is new to artificial intelligence to feel overwhelmed by the many models and neural network architectures that are readily available. We often rack our brains and try out a lot of algorithms in search of the best model. My teacher used to say that the best approach is to try every plausible model until you figure out which one works. After several years of hard work and many implemented models, I gradually built an intuitive understanding of how different models fit themselves to the data.

 

So here I am sharing my understanding of "Which machine learning model to use?"

 

I have categorized machine learning models into subgroups for better understanding. I will briefly describe the models in each category and discuss their applications.

 

Group 1: Linear statistical models

  • Generalized Linear Models: These models are built on probability distributions, and the often-used linear and logistic regression belong to this category. They explicitly model the uncertainty of their predictions. For example, if you are building a model to predict the number of defects, use a Poisson distribution; if you are dealing with a multi-class classification problem, use a multinomial distribution. (A short code sketch of this group follows the list.)

  • Linear Discriminant Analysis: An algorithm for multi-class classification. Its ease of use and excellent interpretability make it a strong choice. The algorithm assumes Gaussian-distributed features and produces a linear separation boundary.

  • Naive Bayes Algorithm: A probabilistic model commonly used for binary classification that is excellent at handling large amounts of categorical data. The Multinomial Naive Bayes variant is also used for multi-class classification. A lesser-known fact is that the decision boundary of the algorithm is linear, so it often performs similarly to logistic regression.
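To make the choices above concrete, here is a minimal sketch, assuming scikit-learn and tiny synthetic datasets invented purely for illustration: a Poisson GLM for a count target, multinomial logistic regression and LDA for a multi-class target, and Multinomial Naive Bayes for non-negative count features.

```python
# Minimal sketch of the linear statistical models, using scikit-learn.
# The data below is synthetic and exists only to show the API.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression, PoissonRegressor
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)

# Count target (e.g. number of defects) -> Poisson GLM.
X_counts = rng.normal(size=(200, 4))
y_counts = rng.poisson(lam=3.0, size=200)
glm = PoissonRegressor().fit(X_counts, y_counts)

# Multi-class target -> multinomial logistic regression or LDA.
X_cls = rng.normal(size=(200, 4))
y_cls = rng.integers(0, 3, size=200)
logit = LogisticRegression(max_iter=1000).fit(X_cls, y_cls)
lda = LinearDiscriminantAnalysis().fit(X_cls, y_cls)

# Non-negative count features (e.g. word counts) -> Multinomial Naive Bayes.
X_cat = rng.integers(0, 10, size=(200, 4))
nb = MultinomialNB().fit(X_cat, y_cls)

print(glm.predict(X_counts[:3]))
print(logit.predict(X_cls[:3]), lda.predict(X_cls[:3]), nb.predict(X_cat[:3]))
```

The point is not the accuracy on this random data but the mapping from target type to model: counts to Poisson, multi-class labels to multinomial logistic regression or LDA, and count-style categorical features to Naive Bayes.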

 

 

Advantages                | Disadvantages
Works well with less data | Assumption of linearity
High interpretability     | Low flexibility
High speed                | High bias
Predicts uncertainty      | Requires data preparation

 

Application: Macroeconomics, scientific research, market research

 

Group 2: Non-linear statistical models

  • Support Vector Machines: One of the most elegant models in machine learning and statistics. A decade or so ago, these were the statistical models that competed neck and neck with neural networks. Their ability to handle non-linearity is what sets them apart from traditional statistical models. The basic concept has existed since the 1960s in the form of optimal margin classifiers, but SVMs became practical only in the 1990s, when researchers applied what is known as the kernel trick. The kernel trick replaces the dot product of the input vectors in the dual optimization problem with a kernel function, which allows SVMs to find a decision boundary in a very high-dimensional (even infinite-dimensional) feature space with a limited amount of computation. Knowing how to choose the right kernel, or build your own, for different scenarios is an important skill to develop before you use SVMs (see the sketch after this list).

  • Polynomial Regression: Please don't use this model unless you are confident that a polynomial relationship actually exists. It is easily overfitted and can produce unrealistic results on the test set. However, it is one of the best models for explaining overfitting and the bias-variance tradeoff.
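Here is a minimal sketch of both models, assuming scikit-learn and synthetic data made up for illustration: it compares a linear and an RBF kernel for an SVM on data with a circular class boundary, and shows how the train/test gap of polynomial regression widens as the degree grows.

```python
# Minimal sketch: kernel choice for an SVM, and polynomial regression
# overfitting as the degree increases. All data is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Circular class boundary: an RBF kernel usually beats a linear one here.
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for kernel in ("linear", "rbf"):
    svm = make_pipeline(StandardScaler(), SVC(kernel=kernel)).fit(X_tr, y_tr)
    print(kernel, "SVM test accuracy:", round(svm.score(X_te, y_te), 3))

# Polynomial regression: watch the train/test gap widen with the degree.
x = rng.uniform(-3, 3, size=(100, 1))
t = np.sin(x).ravel() + rng.normal(scale=0.3, size=100)
x_tr, x_te, t_tr, t_te = train_test_split(x, t, random_state=0)
for degree in (2, 10):
    poly = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    poly.fit(x_tr, t_tr)
    print("degree", degree,
          "train R2:", round(poly.score(x_tr, t_tr), 2),
          "test R2:", round(poly.score(x_te, t_te), 2))
```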

 

Advantages                          | Disadvantages
Ability to handle larger data sets  | Longer training times
Handles non-linearity               | Difficulty in implementing multiclass classification
Not dependent on strong assumptions | Low interpretability
High speed while making predictions | Steep learning curve

 

Group 3: Tree-based models

Models that fall in this category:

  • Decision Trees: Highly interpretable but, on their own, unreliable models, so use them with caution. The tradeoff for their high interpretability is high variance and error in predictions: it is not uncommon to get an entirely different tree after changing a single data point in the dataset.

  • Random Forest: This model overcomes the shortcomings of the decision tree by averaging the results of hundreds or thousands of decision trees. It is fast and simple to implement, with fewer hyperparameters than gradient boosted trees, and its resistance to overfitting makes it a good baseline when implementing tree-based models.

  • Gradient Boosted Trees: One of the algorithms that revolutionized the field as a go-to solution for many machine learning problems. Like the random forest, it uses hundreds of trees, but it fits each new tree to the errors of the previous ones, systematically reducing the error with every iteration. The way the model learns is a form of controlled overfitting: the error on the training set decreases with each iteration, which makes the algorithm flexible enough to learn almost any decision boundary. With the right set of hyperparameters, gradient boosted trees will outperform a random forest in the vast majority of cases. (A baseline comparison of all three models is sketched after this list.)
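As a rough illustration of the baseline argument above, here is a minimal sketch, assuming scikit-learn and a synthetic tabular dataset, that fits a single decision tree, a random forest, and gradient boosted trees with mostly default hyperparameters.

```python
# Minimal sketch: the three tree-based models as baselines on synthetic
# tabular data, with mostly default hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "test accuracy:", round(model.score(X_te, y_te), 3))
```

On a real problem the ordering depends on tuning, but the single decision tree typically trails the two ensembles.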

 

 

Advantages                         | Disadvantages
Ability to handle larger data sets | Longer training times
Ability to handle tabular data     | A lot of hyperparameter tuning
Handles non-linearity              | Requires large datasets

 

Group 4: Neural Networks

Neural network architectures have revolutionized the field of machine learning. Neural networks are flexible function approximators with a very large number of parameters, which gives them the ability to model almost any non-linearity in the data. It is important to note that while the theoretical possibilities of a neural network are nearly endless, training is limited by the data and compute you have available. In simple terms, you may often find that the design of your model is sound, yet its performance falls short or you are unable to train it at all. A small sketch follows below.
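As a small illustration, here is a minimal sketch, assuming scikit-learn's MLPClassifier and a synthetic dataset, of a small feed-forward network; dedicated deep learning frameworks such as TensorFlow or PyTorch would be the usual choice for anything larger.

```python
# Minimal sketch: a small feed-forward neural network on synthetic data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two hidden layers; scaling the inputs helps the optimizer converge.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
net.fit(X_tr, y_tr)
print("test accuracy:", round(net.score(X_te, y_te), 3))
```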

 

Advantages                         | Disadvantages
Ability to handle any form of data | Slow training speed
Ability to handle any task         | Requires large amounts of data
Supports transfer learning         |

 

 

Conclusion:

Machine learning is an ever-developing field with a new model always around the corner. It is extremely important to keep yourself updated and to keep practicing and learning.

 

Thank you for making it to the end. Keep exploring and continue learning!

Until next time, Bye bye!

