Exploratory Data Analysis (EDA)
• Exploratory Data Analysis (EDA) is a term coined by John W. Tukey in his seminal book (Tukey 1977).
• It is the practice of inspecting and exploring your data before stating hypotheses and fitting your model.
• It typically involves building visualizations and computing simple summary statistics.
• These components are sometimes referred to as Descriptive Statistics and Visual Analytics.
Tables & Visualizations:
We can quickly profile our data. One package that can help is ‣, for which we can run a demo here.
The package is built on the original open source project ‣. Although some of these methods help you perform EDA, they cannot possibly cover every operation.

Summary Statistics
You can do more with your data than simply calling df.describe() in Python or summary(df) in R. For example, you can compute slightly more advanced statistics, such as those found in datatile.
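As a minimal sketch of going one step beyond df.describe(), the snippet below adds missing-value counts, unique counts, skewness, kurtosis, and the interquartile range to the standard output. It uses plain pandas rather than datatile, and the function name `extended_summary` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def extended_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return describe() plus a few extra statistics, one row per numeric column."""
    numeric = df.select_dtypes(include="number")
    summary = numeric.describe().T  # transpose so each column becomes a row
    summary["missing"] = numeric.isna().sum()
    summary["n_unique"] = numeric.nunique()
    summary["skew"] = numeric.skew()
    summary["kurtosis"] = numeric.kurt()
    summary["iqr"] = summary["75%"] - summary["25%"]
    return summary


# Hypothetical toy data, only for illustration.
toy = pd.DataFrame(
    {
        "price": [10.0, 12.5, 11.0, np.nan, 30.0],
        "volume": [100, 120, 90, 110, 400],
    }
)
stats = extended_summary(toy)
print(stats[["count", "missing", "skew", "iqr"]])
```

Skewness and kurtosis flag heavy tails and outliers that the mean and standard deviation alone can hide, which is often the first thing worth knowing before choosing a transformation.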
A. Exploratory Data Analysis
We use the term data augmentation deliberately: "data" because it covers not just features but also labels, and "augmentation" because it covers not just feature engineering but also data generation, sampling, and preprocessing methods. There are five data augmentation techniques that we will discuss.
1. Transforming (One to One Column)
2. Interacting (Many to One Column)
3. Mapping (Many to Many Column)
4. Extracting (Column/s to Row/s)
5. Synthesizing (Many to Many Rows)
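The five categories above can be illustrated with one small operation each. This is only a sketch under our own reading of the category names: the specific operations (log transform, ratio, PCA via SVD, per-column summaries, resampling with noise), the column names, and the random toy data are all assumptions made for illustration, not the methods the text goes on to discuss.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with three positive numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "a": rng.uniform(1, 10, 50),
        "b": rng.uniform(1, 10, 50),
        "c": rng.uniform(1, 10, 50),
    }
)

# 1. Transforming: one column in, one column out.
transformed = np.log(df["a"])

# 2. Interacting: several columns in, one column out.
interacted = df["a"] / df["b"]

# 3. Mapping: several columns in, several columns out (PCA scores via SVD).
centered = df - df.mean()
_, _, vt = np.linalg.svd(centered.to_numpy(), full_matrices=False)
mapped = pd.DataFrame(
    centered.to_numpy() @ vt.T, columns=["pc1", "pc2", "pc3"], index=df.index
)

# 4. Extracting: each column is collapsed into a single row of summary values.
extracted = df.agg(["mean", "std", "max"]).T

# 5. Synthesizing: existing rows in, new rows out (resample rows and add small noise).
sampled = df.sample(n=20, replace=True, random_state=0).reset_index(drop=True)
noise = rng.normal(0.0, 0.05, size=sampled.shape) * df.std().to_numpy()
synthetic = sampled + noise

print(transformed.shape, interacted.shape, mapped.shape, extracted.shape, synthetic.shape)
```

The shapes are the point of the example: the first three operations keep the number of rows and change the columns, the fourth collapses rows, and the fifth produces rows that did not exist in the original data.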
• There is another type called Exploding (One to Many Columns) that is not included; for example, the Signature Method can convert a single feature into multiple features.
• We are also currently focusing only on two-dimensional datasets; there are additional extraction methods for higher-dimensional datasets.
• Some methods, such as Recurrence Plots, Gramian Angular Fields, and Markov Transition Fields, can create a three-dimensional dataset out of a two-dimensional dataset.
• Other methods, such as the Tucker and CANDECOMP decompositions, can create a two-dimensional dataset out of a three-dimensional dataset.
• It is unlikely that you would ever have to deal with these augmentation methods, so we ignore them here.
• We are also only looking at numerical augmentation methods and not audio, text, or image augmentation methods.
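To make the dimensionality remark about Recurrence Plots concrete, here is a minimal sketch of the simplest form of a recurrence plot: a thresholded pairwise-distance matrix without time-delay embedding. The function name `recurrence_plot`, the threshold `eps`, and the sine-wave input are assumptions chosen for illustration.

```python
import numpy as np


def recurrence_plot(series, eps):
    """Binary matrix marking which pairs of time points have values within eps of each other."""
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])  # pairwise absolute differences
    return (dist <= eps).astype(int)


# Hypothetical input: two periods of a sine wave sampled at 40 points.
signal = np.sin(np.linspace(0, 4 * np.pi, 40))
rp = recurrence_plot(signal, eps=0.1)
print(rp.shape)
```

A single series of length n becomes an n by n matrix, so applying this to every series in a two-dimensional dataset yields a three-dimensional one.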
C. Data Augmentation