Key Features and Explanation

Some of the key features of Seaborn are:
- Built-In Datasets
- Default Colour Themes
- Simplified Code

📌 Built-In Datasets

Seaborn provides access to a collection of built-in datasets. These can be used to demonstrate Seaborn’s functionalities and data visualization capabilities.

We can load a built-in dataset from Seaborn using the following code:

# Loading libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

flights = sns.load_dataset('flights')  # Load the 'flights' dataset

print(flights.head())  # Printing the first 5 rows of the dataset

   year month  passengers
0  1949   Jan         112
1  1949   Feb         118
2  1949   Mar         132
3  1949   Apr         129
4  1949   May         121
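load_dataset() returns an ordinary pandas DataFrame, so the usual pandas tools work on it directly. The minimal sketch below uses only the five rows printed above. They are typed in by hand so the example runs offline, and the variable name flights_head is illustrative.

```python
import pandas as pd

# The five rows printed above, typed in by hand so no download is needed
flights_head = pd.DataFrame({
    "year": [1949] * 5,
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "passengers": [112, 118, 132, 129, 121],
})

# Average number of passengers over these five months
mean_passengers = flights_head["passengers"].mean()

# Month-over-month change (the first month has no predecessor, so it is dropped)
monthly_change = flights_head["passengers"].diff().dropna().astype(int).tolist()

print(mean_passengers)  # 122.4
print(monthly_change)   # [6, 14, -3, -8]
```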

📌 Default Colour Themes

Seaborn provides default colour themes and palettes that make it easy to produce visually appealing graphs. We can also define our own custom colour palettes and reuse them in later plots.
Seaborn's built-in themes (which it calls styles) are darkgrid, whitegrid, dark, white, and ticks. Its named colour palettes include deep, muted, bright, pastel, dark, and colorblind.
A theme controls elements such as the background colour, gridlines, axis spines, and text colour, giving our plots a cleaner overall appearance.
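Under the hood, each style is simply a dictionary of Matplotlib settings, and a palette is a list of RGB colours. The sketch below inspects a style and a palette, then builds a small custom palette. The three hex codes are arbitrary choices for illustration.

```python
import seaborn as sns

# A style is a dictionary of Matplotlib rc parameters
dark = sns.axes_style("darkgrid")
white = sns.axes_style("white")
print(dark["axes.grid"], white["axes.grid"])  # True False

# A named palette is a list of RGB tuples
pastel = sns.color_palette("pastel")
print(len(pastel))  # 10

# A custom palette built from our own colours (hex codes chosen arbitrarily)
custom = sns.color_palette(["#1b9e77", "#d95f02", "#7570b3"])
print(len(custom))  # 3

# Make the custom palette the default colour cycle for later plots
sns.set_palette(custom)
```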

These themes and palettes can be used in the following way:

# Applying the darkgrid theme, the notebook context and the bright palette
sns.set_theme(style="darkgrid", context="notebook", palette='bright')
plt.figure(figsize=(10, 8))

for y in range(1949, 1961):  # one line per year, 1949 to 1960 inclusive
    year_data = flights[flights['year'] == y]
    plt.plot(year_data['month'], year_data['passengers'], label=f'{y}')
plt.xlabel('Month')
plt.ylabel('Number of passengers')
plt.title('Number of Passengers traveling in flights')
plt.legend()
plt.tight_layout()
plt.show()

The line plot above shows how the number of passengers changes from month to month, with one line for each year from 1949 to 1960.

📌 Simplified Code

Seaborn is designed to make visualizing a dataset quick and easy. It accepts data in several formats, most commonly pandas DataFrames and NumPy arrays.
Seaborn takes care of small details such as colours, legends, and axis labels, so we can plot complex graphs with fewer lines of code than plain Matplotlib.
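Axis labelling is a good example of these small details. When the data comes from a DataFrame, Seaborn uses the column names as axis labels, while plain NumPy arrays carry no names, so the labels stay empty. A small sketch with a handful of hand-typed bill/tip values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# A handful of hand-typed bill/tip pairs (illustrative values)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# From a DataFrame: Seaborn borrows the column names as axis labels
fig1, ax = plt.subplots()
sns.scatterplot(data=df, x="total_bill", y="tip", ax=ax)
print(ax.get_xlabel(), ax.get_ylabel())  # total_bill tip

# From plain NumPy arrays: there are no names to borrow, so the labels stay empty
fig2, ax2 = plt.subplots()
sns.scatterplot(x=np.array(df["total_bill"]), y=np.array(df["tip"]), ax=ax2)
print(repr(ax2.get_xlabel()))  # ''
```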

We will see this property in action later in this blog, when we look at the regression plot.

Now let’s explore some plot types that work effectively in Seaborn.

📌 Regression Plot

This type of plot is helpful for studying the relationship between two variables. Seaborn's regplot() function draws a scatter plot of the data and fits a linear regression model to it, which makes the relationship easier to see.

First, let's build the plot with Matplotlib alone:

# Loading the built-in dataset
tips = sns.load_dataset('tips')

# Plotting the scatter plot
plt.scatter(tips['total_bill'], tips['tip']) 

# Using numpy to get the slope and Y-intercept of the Regression line
slope, y_intercept = np.polyfit(tips['total_bill'], tips['tip'], 1) 

# Plotting the regression line
plt.plot(tips['total_bill'], slope * tips['total_bill'] + y_intercept, color='orange') 

plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.title('Tips in a restaurant')
plt.show()

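As you can see, we needed 8 lines of code, including a separate NumPy call to fit the line, to build a regression plot with Matplotlib. Below, Seaborn produces the same scatter plot and regression line (plus a confidence band) with a single regplot() call; the other lines only load the data and set the labels and title. This demonstrates Seaborn's simplified code.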

tips = sns.load_dataset("tips")
sns.regplot(x='total_bill', y='tip', data=tips, line_kws={'color': 'orange'})  # Create a regression plot
plt.xlabel('Total bill')
plt.ylabel('Tips')
plt.title('Tips in a restaurant')
plt.show()

The graph above shows how tips vary with the total bill. The orange line is the regression line: the best-fit line through the data, which captures the overall trend.
The orange shaded band around the regression line is the confidence interval for the regression estimate (95% by default, estimated by bootstrapping). The wider the band, the more uncertain the fitted line.
The line_kws argument passes keyword arguments to the regression line; here we use it to colour the line orange.
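To check that regplot() fits the same least-squares line as the np.polyfit() approach in the Matplotlib version, and to see what the ci parameter controls, the sketch below draws the plot twice on a few hand-typed bill/tip pairs: once with the default confidence band and once with ci=None. The values are illustrative, not the full tips dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# A few hand-typed bill/tip pairs (illustrative values)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00],
})

fig, (ax_ci, ax_no_ci) = plt.subplots(1, 2, figsize=(10, 4))

# Default: regression line plus a shaded (bootstrapped) confidence band
sns.regplot(x="total_bill", y="tip", data=df, ax=ax_ci, line_kws={"color": "orange"})
# ci=None keeps the line but drops the band
sns.regplot(x="total_bill", y="tip", data=df, ax=ax_no_ci,
            line_kws={"color": "orange"}, ci=None)

# Slope of the line Seaborn drew, read from its first and last points
line = ax_no_ci.get_lines()[0]
xs, ys = line.get_xdata(), line.get_ydata()
seaborn_slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])

# Same slope from NumPy's least-squares fit, as in the Matplotlib version
numpy_slope, _ = np.polyfit(df["total_bill"], df["tip"], 1)
print(np.isclose(seaborn_slope, numpy_slope))  # True

# The band is an extra collection next to the scatter points
print(len(ax_ci.collections), len(ax_no_ci.collections))  # 2 1
```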

You might also be wondering why the darkgrid theme was applied even though the code snippets above never set a theme. Once sns.set_theme() is called, its settings become the global defaults and apply to all subsequent plots, including plain Matplotlib ones.
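The sketch below illustrates this behaviour. set_theme() changes Matplotlib's global settings, while sns.axes_style() used in a with block applies a style only to the axes created inside that block. The check reads each new axes' background colour.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import seaborn as sns

# set_theme() writes into Matplotlib's global settings
sns.set_theme(style="darkgrid")
fig1, ax1 = plt.subplots()
print(mcolors.to_hex(ax1.get_facecolor()))  # #eaeaf2 (darkgrid's grey background)

# axes_style() as a context manager only affects axes created inside the block
with sns.axes_style("white"):
    fig2, ax2 = plt.subplots()
print(mcolors.to_hex(ax2.get_facecolor()))  # #ffffff

# Outside the block, new axes pick up the global darkgrid theme again
fig3, ax3 = plt.subplots()
print(mcolors.to_hex(ax3.get_facecolor()))  # #eaeaf2
```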

📌 Violin Plot

A violin plot is a mix of a kernel density estimate (KDE) plot and a boxplot. It is useful for comparing the distribution of a variable across multiple groups in a dataset, since it shows the data's spread, density, and shape.
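To make the "KDE plus boxplot" description concrete, the sketch below computes both ingredients by hand with NumPy for one illustrative group of values. It finds the quartiles that the inner box marks and a Gaussian kernel density estimate like the one that forms the violin's outline. The bandwidth follows Scott's rule, which Seaborn's violinplot uses by default. This is a from-scratch illustration, not Seaborn's internal code.

```python
import numpy as np

# Hypothetical sample of total bills for one group (illustrative values)
values = np.array([16.99, 10.34, 21.01, 23.68, 24.59])

# Boxplot part: the quartiles marked by the inner box
q1, median, q3 = np.percentile(values, [25, 50, 75])
print(q1, median, q3)  # 16.99 21.01 23.68

# KDE part: Gaussian kernel density estimate, drawn as the violin's outline
# Bandwidth from Scott's rule: sample standard deviation * n ** (-1/5)
n = len(values)
bandwidth = values.std(ddof=1) * n ** (-1 / 5)

def kde(x):
    """Average of Gaussian bumps centred on each observation."""
    z = (x[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

grid = np.linspace(values.min() - 5 * bandwidth, values.max() + 5 * bandwidth, 2001)
density = kde(grid)

# A density should integrate to (approximately) 1 over the grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))  # 1.0
```

Seaborn mirrors this density curve on both sides of the category position to draw the violin shape.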

plt.figure(figsize=(8, 6))
ax = sns.violinplot(x="day", y="total_bill", hue="sex", data=tips, split=True)
plt.xlabel('Day')
plt.ylabel('Total bill')
sns.move_legend(ax, 'upper right')  # reposition the legend Seaborn created for hue
plt.title('Distribution of total bill by day')
plt.tight_layout()
plt.show()

In the plot above, we compare the distribution of total bills between male and female customers for each day of the week. Wider sections of a violin show where the values are concentrated, and the outline shows the shape of the distribution.
We use hue="sex" to plot the distribution separately for each sex of the customers.
We also use split=True to draw the two groups as the two halves of a single violin, side by side for easy comparison.
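The same hue and split options work on any long-form DataFrame. The sketch below uses a few hand-typed, illustrative rows (not the real tips data) so it runs offline. It pairs the plot with the per-group medians that the two halves let us compare visually.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-typed, illustrative rows with the same columns as the tips example
df = pd.DataFrame({
    "day": ["Thur"] * 6 + ["Fri"] * 6,
    "sex": ["Male", "Male", "Male", "Female", "Female", "Female"] * 2,
    "total_bill": [12.5, 18.0, 22.3, 10.1, 14.6, 16.9,
                   15.2, 20.4, 27.8, 11.3, 13.9, 19.5],
})

# hue separates the sexes; split=True draws them as the two halves of one violin per day
fig, ax = plt.subplots(figsize=(6, 4))
sns.violinplot(x="day", y="total_bill", hue="sex", data=df, split=True, ax=ax)

# Numeric counterpart of the visual comparison: median total bill per day and sex
medians = df.groupby(["day", "sex"])["total_bill"].median()
print(medians)
```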