Data Science

The Beautiful Binomial Logistic Regression

Logistic regression is an important classification model to understand in all its complexity. There are a few reasons to consider it: it is faster to train than some other classification algorithms like Support Vector Machines and Random For...

The Worst Kind of Data: Missing Data

Most publicly available datasets or datasets at the workplace are complete. However, from time to time we encounter datasets where some or many entries are missing. The problem of missing data exists on a spectrum; only a few entries missing among mi...

How to Overcome the Curse of Dimensionality

Dimensionality reduction is an important technique to overcome the curse of dimensionality in data science and machine learning. As the number of predictors (or dimensions or features) in the dataset increases, it becomes computationally more expensiv...

K-Means Clustering: All You Need to Know

In machine learning, we are often in the realm of “function approximation”. That is, we have a certain ground truth (y) and associated variables (X), and our aim is to identify a function to wrap our variables in that does a good job in approx...

Interpreting and Visualizing AutoCorrelation

By Jithin J and Karthik Ravindra, Byte Academy

Analyzing time series data needs special attention. Here, we would like to explore working with time series data and identify the effect of autocorrelation to come up with a more practical approach ...