During my latest machine learning project, I took some time to explore Scikit-learn pipelines. A pipeline is an object that chains data preprocessing/transformation steps with a final model, so you can transform your data, train the model, and make predictions all with one easy tool. Below I will talk about some of the cool things you can do with pipelines, and how they can be a HUGE time-saver when building and validating models.
Building a pipeline is simple. Let’s say you need to do the following steps in building your model:
We can take…
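The article's own list of steps is cut off in this preview, so here is a minimal sketch built on my own assumptions rather than the author's code: the built-in iris dataset, a `StandardScaler` step, and a `LogisticRegression` model stand in for whatever steps the original uses. It shows the core idea: the whole chain is fitted, scored, and cross-validated as a single object.

```python
# A minimal sketch of a Scikit-learn pipeline. The dataset and the two steps
# (StandardScaler -> LogisticRegression) are illustrative assumptions only.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Each step is a (name, estimator) pair; every step except the last
# must be a transformer (it needs fit/transform).
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# fit() fits the scaler on the training data, transforms it,
# and then fits the model on the scaled data.
pipe.fit(X_train, y_train)

# predict() and score() reuse the already-fitted scaler on new data.
predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")

# cross_val_score refits the whole pipeline on each fold, so the scaler
# never sees the held-out fold's data during fitting.
cv_scores = cross_val_score(pipe, X, y, cv=5)
print(f"Mean CV accuracy: {cv_scores.mean():.3f}")
```

Bundling the scaler with the model is where the time savings (and the validation safety) come from: you never have to remember to apply the same transformation to test data by hand.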
Natural Language Processing, or NLP, is a very popular subfield of Data Science at the moment. It focuses on enabling computers to process and analyze human language. Siri and Alexa, spam filters, chatbots, auto-complete, and translation apps are all examples of everyday technology that use NLP.
For a Data Scientist, working with text data is a bit trickier than working with other types of data. Why? Because words are not numbers! This makes Exploratory Data Analysis, data cleaning, and preprocessing look a bit different in the Data Science workflow. Text data generally requires much more cleaning (removing stop words and…
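To make "removing stop words" concrete, here is a small sketch in plain Python. The stop-word set below is a hypothetical, deliberately tiny list I made up for illustration; in a real project you would more likely use a curated list such as scikit-learn's `ENGLISH_STOP_WORDS` or NLTK's stop-word corpus.

```python
import re

# Hypothetical, deliberately tiny stop-word set for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "is", "it", "of", "the", "to"}


def clean_text(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Keep runs of lowercase letters and apostrophes; punctuation is discarded.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(clean_text("The spam filter is one of the apps that use NLP!"))
# ['spam', 'filter', 'one', 'apps', 'that', 'use', 'nlp']
```

Notice that "that" survives because it is not in this toy list, which is exactly why the choice of stop-word list matters.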
During my journey as a Data Science student, I created a project using Convolutional Neural Networks (CNNs). CNNs are deep learning models that are becoming increasingly popular in the world of Data Science for image classification.
There is just one major issue when you’re building them at home: the training time for these models is very long!
Enter Google Colab.
Colab is great because it lets you run your notebook on a hosted computer that is most likely better/faster/stronger than your local machine. That means faster training for your CNN model.
In Colab, you can make use of GPUs…
The first time I ever built a Linear Regression model, I thought two things:
I had learned to check all of the assumptions of a Linear Regression model (the residuals should be normally distributed, each feature should have a linear relationship with the target, there should be no multicollinearity, etc.). I learned to scale, and sometimes even log-transform, my features and target. I even learned about mean squared error (MSE) and root mean squared error (RMSE) to measure the size of the model’s residuals. The problem was, once I…
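MSE is the average of the squared residuals, and RMSE is its square root, which puts the error back in the target's original units. Here is a minimal sketch in plain Python; the numbers are made up for illustration and are not results from any real model.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: same units as the target variable."""
    return math.sqrt(mse(y_true, y_pred))


# Made-up example values. Residuals are 0.5, 0.0, -1.0, -1.0.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

print(mse(y_true, y_pred))   # 0.5625
print(rmse(y_true, y_pred))  # 0.75
```

One caveat that ties back to log-scaling: if the model was trained on a log-transformed target, an RMSE computed on those predictions is in log units, so you would need to back-transform the predictions before reading the error in the original units.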
Data Science Student | Yoga Teacher | Marketer | Life-Long Learner | GitHub: https://github.com/tiaplagata