During my latest machine learning project, I took some time to explore Scikit-learn pipelines. A pipeline is an object that lets you preprocess and transform data, train a model, and make predictions with that model, all in one easy tool. Below I will talk about some of the cool things you can do with pipelines, and how they can be a HUGE time-saver when building and validating models.

Building a Basic Pipeline | Pipeline Basic Abilities

Building a pipeline is simple. Let’s say you need to do the following steps in building your model:

  • Standardize the data (put all features on a common scale)
  • Fit a Random Forest Classifier

We can take…
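As a rough illustration of the two steps listed above, here is a minimal sketch of such a pipeline. The synthetic dataset, the step names ("scaler", "rf"), and the hyperparameters are my own assumptions for the example, not something taken from the original project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Step 1: standardize the features. Step 2: fit a Random Forest Classifier.
# The step names "scaler" and "rf" are arbitrary labels chosen for this sketch.
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
])

# fit() learns the scaling from the training data only, then trains the model.
pipe.fit(X_train, y_train)

# predict() and score() apply the same fitted scaling before using the model.
predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because the scaler lives inside the pipeline, it is fit on the training split only, which helps avoid leaking information from the test data into preprocessing.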


A guide to creating stunning visualizations for your next NLP project

Word Cloud by Author | Trevi Fountain Image by James Lee via Unsplash

Natural Language Processing, or NLP, is a very popular subfield in Data Science at the moment because it allows computers to process and analyze human language. Siri and Alexa, spam filters, chatbots, auto-complete, and translation apps are all examples of everyday technology that use NLP.

For a Data Scientist, working with text data is a bit trickier than working with other types of data. Why? Because words are not numbers! This makes the Exploratory Data Analysis and the data cleaning and preprocessing steps a bit different in the Data Science workflow. Text data generally requires much more cleaning (removing stop words and…
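To give a feel for what removing stop words can look like, here is a small sketch in plain Python. The tiny stop word list and the `clean_text` helper are hypothetical and only for illustration; a real project would likely use a much larger list from an NLP library.

```python
import re

# A tiny hand-picked stop word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def clean_text(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


cleaned = clean_text("Words are not numbers, and the cleaning is different.")
print(cleaned)
```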


During my journey as a Data Science student, I created a project using Convolutional Neural Networks (CNNs). CNNs are deep learning models that are becoming increasingly popular in the world of Data Science for image classification.

Source: https://github.com/tiaplagata

There is just one major issue when you’re building them at home — the training time for these models is very long!

Enter Google Colab.

Colab is great because it allows you to run your notebook on a hosted computer that is most likely better, faster, and stronger than your local machine. That means faster training for your CNN model.

Why use Google Colab?

In Colab, you can make use of GPUs…


The first time I ever built a Linear Regression model, I thought two things:

  • Wow! I built something that can actually predict housing prices!
  • Ok, but how good are these predictions?

I had learned to check all of the assumptions of a Linear Regression model (residuals should have a normal distribution, features are linearly related to the target, there’s no multicollinearity, etc.). I learned to scale and sometimes even log-transform my features and target. I even learned about mean squared error (MSE) and root mean squared error (RMSE) to interpret the residuals of the model. The problem was, once I…
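For reference, MSE is the average of the squared residuals, and RMSE is its square root, which puts the error back in the units of the target. Here is a minimal sketch in plain Python; the housing prices are made-up numbers for illustration only.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Made-up housing prices in dollars (assumption for illustration).
actual = [200_000, 310_000, 150_000, 420_000]
predicted = [210_000, 300_000, 170_000, 400_000]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MSE: {mse:,.0f}  RMSE: {rmse:,.2f}")
```

Since RMSE is in dollars here, it is usually easier to read than MSE, which is in squared dollars.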

Tia Plagata

Data Science Student | Yoga Teacher | Marketer | Life-Long Learner | github: https://github.com/tiaplagata
