Python for Citizen Data Scientist

How to do Cross Validation and Grid Search for Model Selection in Python

A typical machine learning process involves training different models on a dataset and selecting the one with the best performance. However, evaluating the performance of an algorithm is not always a straightforward task. There are several factors that can help …
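To make the idea of selecting a model by cross-validated performance concrete, here is a minimal pure-Python sketch. The toy data, the 1-D nearest-neighbour classifier, and the helper names (`k_fold_indices`, `cross_val_accuracy`, `grid_search`) are all illustrative assumptions and are not taken from the article, which may well use scikit-learn's own tools instead.

```python
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, near-equal folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest 1-D training points (use odd k)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cross_val_accuracy(xs, ys, k, n_folds=5):
    """Mean accuracy of k-NN over the folds for one hyperparameter value."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


def grid_search(xs, ys, k_values, n_folds=5):
    """Score every candidate k by cross-validation and return the best one."""
    results = {k: cross_val_accuracy(xs, ys, k, n_folds) for k in k_values}
    best_k = max(results, key=results.get)
    return best_k, results


# Hypothetical toy data: two well-separated classes, interleaved so that
# every contiguous fold contains one sample of each class.
xs = [1.0, 9.0, 2.0, 8.5, 1.5, 9.5, 2.5, 8.0, 3.0, 7.5]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

best_k, results = grid_search(xs, ys, k_values=[1, 3, 5])
print("best k:", best_k)
print("mean accuracy per k:", results)
```

The grid here is a single hyperparameter with three candidate values; a real grid search would usually cover several hyperparameters and shuffle the data before splitting.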

The Naive Bayes Algorithm in Python with Scikit-Learn

When studying probability and statistics, one of the first and most important theorems students learn is Bayes’ theorem. This theorem is the foundation of Bayesian reasoning, which focuses on determining the probability of an event occurring based on prior knowledge of conditions that might be …
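As a small worked example of the theorem mentioned above, P(A|B) = P(B|A) P(A) / P(B), the sketch below computes a posterior probability. The function name and the numbers (a 1% prior, a 90% detection rate, a 5% false-positive rate) are made up for illustration only.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Total probability of observing B, with or without A.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: A is rare (1%), B is observed 90% of the time
# when A holds, and 5% of the time when it does not.
posterior = bayes_posterior(prior=0.01, likelihood=0.9,
                            false_positive_rate=0.05)
print(f"P(A|B) = {posterior:.4f}")
```

Even with a fairly reliable signal, the posterior stays well below the 90% likelihood because the prior is so small.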

How to Implement LDA in Python with Scikit-Learn

In our previous article, Implementing PCA in Python with Scikit-Learn, we studied how to reduce the dimensionality of a feature set using PCA. In this article we will study another very important dimensionality reduction technique: linear discriminant analysis (LDA). But first let’s briefly discuss how PCA and …

Introduction to Neural Networks with Scikit-Learn in Python

What is a Neural Network? Humans can identify patterns in the information available to them with an astonishingly high degree of accuracy. Whenever you see a car or a bicycle, you can immediately recognize what it is. This is because we have learned over …

How to do K-Means Clustering with Scikit-Learn in Python

K-means clustering is one of the most widely used unsupervised machine learning algorithms; it forms clusters of data based on the similarity between data instances. For this algorithm to work, the number of clusters has to be defined beforehand. The K in …
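The point that the number of clusters must be fixed up front can be seen in a minimal pure-Python sketch of the algorithm. This version works on 1-D points and initialises the centroids from the first k points; both are simplifying assumptions for illustration, and the function name `k_means_1d` is hypothetical rather than the article's scikit-learn code.

```python
from statistics import mean


def k_means_1d(points, k, n_iter=10):
    """Cluster 1-D points into k groups; k must be chosen by the caller."""
    centroids = list(points[:k])  # naive initialisation (an assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data with two obvious groups.
points = [1.0, 10.0, 2.0, 11.0, 3.0, 12.0]
centroids, clusters = k_means_1d(points, k=2)
print("centroids:", centroids)
print("clusters:", clusters)
```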

TensorFlow Neural Network Tutorial in Python

TensorFlow is an open-source library for machine learning applications. It is Google Brain’s second-generation system, replacing the closed-source DistBelief, and is used by Google for both research and production applications. TensorFlow applications can be written in a few languages: Python, Go, Java and C. This …

How to Save and Restore scikit learn Models

On many occasions, while working with the scikit-learn library, you’ll need to save your prediction models to a file and then restore them in order to reuse your previous work: to test your model on new data, to compare multiple models, or anything else. This saving procedure is also …
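A minimal sketch of the save-then-restore round trip, using Python's standard `pickle` module. To keep it dependency-free, a plain dictionary of fitted parameters stands in for a trained estimator; the dictionary, the `predict` helper, and the file name are illustrative assumptions, not the article's code.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model: parameters of a hypothetical linear fit.
model = {"slope": 2.0, "intercept": 1.0}


def predict(params, x):
    """Apply the stored linear parameters to a new input."""
    return params["slope"] * x + params["intercept"]


# Save the model to a file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Restore it later (here, immediately) and reuse it on new data.
with open(path, "rb") as f:
    restored = pickle.load(f)

print("prediction from restored model:", predict(restored, 3.0))
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.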

How to do Parallel Processing in Python

When you start a program on your machine, it runs in its own “bubble”, which is completely separate from other programs that are active at the same time. This “bubble” is called a process, and comprises everything needed to manage this program call. For …
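One way to see that each program runs in its own process is to compare process IDs. The sketch below starts a second Python interpreter with the standard `subprocess` module and prints both IDs; it only illustrates the "separate bubble" idea and is not the parallel-processing approach the article itself goes on to describe.

```python
import os
import subprocess
import sys

# ID of the process running this script.
parent_pid = os.getpid()

# Launch a separate Python process that reports its own ID.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_pid = int(result.stdout.strip())

print(f"parent process id: {parent_pid}")
print(f"child process id:  {child_pid}")
```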

How to Read a File Line-by-Line in Python

Over the course of my working life I have had the opportunity to use many programming concepts and technologies to do countless things. Some of these things involve relatively low-value fruits of my labor, such as automating the error-prone or …

What is Command Line Arguments in Python

With Python being such a popular programming language, and having support for most operating systems, it has become widely used to create command line tools for many purposes. These tools can range from simple CLI apps to more complex ones, like AWS’ awscli tool. …

Data Viz in Python – Stacked Percentage Bar Plot In MatPlotLib

Preliminaries:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
Create dataframe:
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'pre_score': [4, 24, 31, 2, 3], 'mid_score': [25, 94, 57, 62, 70], 'post_score': [5, 43, 23, 23, 51]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'pre_score', 'mid_score', …

Data Viz in Python – Pie Chart In MatPlotLib

Preliminaries:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
Create dataframe:
raw_data = {'officer_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'jan_arrests': [4, 24, 31, 2, 3], 'feb_arrests': [25, 94, 57, 62, 70], 'march_arrests': [5, 43, 23, 23, 51]}
df = pd.DataFrame(raw_data, columns = ['officer_name', 'jan_arrests', 'feb_arrests', 'march_arrests'])
df …

Data Viz in Python – Group Bar Plot In MatPlotLib

Preliminaries:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
Create dataframe:
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'pre_score': [4, 24, 31, 2, 3], 'mid_score': [25, 94, 57, 62, 70], 'post_score': [5, 43, 23, 23, 51]}
df = pd.DataFrame(raw_data, columns = …

Data Viz in Python – Creating A Time Series Plot With Seaborn And pandas

Preliminaries:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'], 'deaths_regiment_1': [34, 43, 14, 15, 15, …

Data Viz in Python – Color Palettes in Seaborn

Preliminaries:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'], 'deaths_regiment_1': [34, 43, 14, 15, 15, 14, 31, 25, 62, 41], …

Data Wrangling in Python – Pandas Time Series Basics

Import modules:
from datetime import datetime
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as pyplot
Create a dataframe:
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'], 'battle_deaths': [34, 25, 26, 15, 15, 14, …