During my latest machine learning project, I took some time to explore Scikit-learn pipelines. A pipeline is an object that lets you preprocess/transform data, train a model, and make predictions, all in one easy tool. Below I will talk about some of the cool things you can do with pipelines, and how they can be a HUGE time-saver when building and validating models.

Building a Basic Pipeline

Building a pipeline is simple. Let’s say you need to do the following steps in building your model:

  • Scale data to a standardized scale
  • Fit a Random Forest Classifier

We can take…
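
A minimal sketch of those two steps chained into a single scikit-learn Pipeline might look like this (the data here is a synthetic placeholder, and StandardScaler is assumed as the scaling step):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data -- substitute your own feature matrix and labels
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Chain the scaler and the classifier into one estimator
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=42)),
])

# One call fits both steps; one call scales and predicts
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))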


A guide to creating stunning visualizations for your next NLP project

Word Cloud by Author | Trevi Fountain Image by James Lee via Unsplash

Natural Language Processing, or NLP, is a very popular subfield in Data Science at the moment because it allows computers to process and analyze human language. Siri and Alexa, spam filters, chatbots, auto-complete, and translate apps are all examples of everyday technology that use NLP.

For a Data Scientist, working with text data is a bit trickier than working with other types of data. Why? Because words are not numbers! This makes the Exploratory Data Analysis and the data cleaning and preprocessing steps a bit different in the Data Science workflow. Text data generally requires much more cleaning (removing stop words and…
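
Once the text is cleaned, a word cloud like the one above takes only a few lines. Here is a minimal sketch, assuming the wordcloud and matplotlib packages are installed and that cleaned_text is a placeholder for your preprocessed corpus as a single string:

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Placeholder corpus -- in practice this would be your cleaned text
cleaned_text = "rome trevi fountain coins wish water italy travel fountain"

# STOPWORDS is the package's built-in stop word list
wc = WordCloud(stopwords=STOPWORDS, background_color="white",
               width=800, height=400).generate(cleaned_text)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()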


During my journey as a Data Science student, I created a project using Convolutional Neural Networks (CNNs). CNNs are deep learning models that are becoming increasingly popular in the world of Data Science for image classification.

Source: https://github.com/tiaplagata

There is just one major issue when you’re building them at home — the training time for these models is very long!

Enter Google Colab.

Colab is great because it allows you to run your notebook on a hosted computer that is most likely better/faster/stronger than your local machine. That means faster training for your CNN model.

Why use Google Colab?

In Colab, you can make use of GPUs…
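
As a quick sanity check after switching the runtime to GPU, you can confirm that TensorFlow actually sees the accelerator before starting a long training run. A minimal sketch (assuming a TensorFlow 2.x runtime, as Colab provides by default):

import tensorflow as tf

# Lists any GPUs visible to TensorFlow; an empty list means
# the notebook is still running on CPU only
gpus = tf.config.list_physical_devices("GPU")
print(gpus)

# Alternatively, inspect the GPU from the shell
# (in Colab, prefix shell commands with "!"):
# !nvidia-smi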


The first time I ever built a Linear Regression model, I thought two things:

  • Wow! I built something that can actually predict housing prices!
  • Ok, but how good are these predictions?

I had learned to check all of the assumptions of a Linear Regression model (the residuals should be normally distributed, the features should be linearly correlated with the target, there should be no multicollinearity, etc.). I learned to scale and sometimes even log-scale my features and target. I even learned about mean squared error (MSE) and root mean squared error (RMSE) to interpret the residuals of the model. The problem was, once I…
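
For reference, MSE and RMSE can be computed directly from a fitted model's predictions. Here is a minimal sketch with scikit-learn, where synthetic regression data stands in for the housing features and prices:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder data -- substitute your housing features and prices
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE averages the squared residuals; RMSE puts the error back
# into the same units as the target
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(mse, rmse)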

Tia Plagata

Data Science Student | Yoga Teacher | Marketer | Life-Long Learner | github: https://github.com/tiaplagata
