mds_2025_helper_functions.eda

Functions

perform_eda(dataframe[, rows, cols])

A universal EDA function to generate data summaries and visualize features.

Module Contents

mds_2025_helper_functions.eda.perform_eda(dataframe, rows=5, cols=2)

A general-purpose exploratory data analysis (EDA) function that generates data summaries and visualizes features.

Parameters:
  • dataframe (pd.DataFrame) – The input dataset for EDA.

  • rows (int) – Number of rows in the grid layout for visualizations. Defaults to 5.

  • cols (int) – Number of columns in the grid layout for visualizations. Defaults to 2.

Returns:

None
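
A rows × cols grid holds rows * cols plot panels. This page does not say how perform_eda behaves when a dataset has more features than panels, so it can help to check the numbers before calling it. The sketch below uses a hypothetical helper, grid_capacity, which is not part of mds_2025_helper_functions and makes no claim about the function's internals.

```python
import math


def grid_capacity(n_features: int, rows: int = 5, cols: int = 2) -> dict:
    """Hypothetical helper: compare a feature count to a rows x cols grid.

    Not part of mds_2025_helper_functions; the defaults only mirror the
    documented defaults of perform_eda (rows=5, cols=2).
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive integers")
    cells = rows * cols
    return {
        "cells": cells,                                 # panels in one grid
        "fits": n_features <= cells,                    # one panel per feature?
        "grids_needed": math.ceil(n_features / cells),  # grids to show them all
    }


# Five features in a 2 x 2 grid: four panels, so one feature does not fit.
print(grid_capacity(5, rows=2, cols=2))
# {'cells': 4, 'fits': False, 'grids_needed': 2}
```

With the default rows=5 and cols=2, the grid has ten panels, which is enough for a five-column DataFrame.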

Example

>>> import pandas as pd
>>> from mds_2025_helper_functions.eda import perform_eda
>>> data = {
...     'Age': [25, 32, 47, 51, 62],
...     'Salary': [50000, 60000, 120000, 90000, 85000],
...     'Department': ['HR', 'Finance', 'IT', 'Finance', 'HR'],
...     'JoiningDate': pd.to_datetime(['2015-01-01', '2016-07-15', '2017-03-12', '2018-06-01', '2019-08-19']),
...     'Bonus': [0, 5000, 12000, 7500, 7000]
... }
>>> df = pd.DataFrame(data)
>>> # Use the function to perform EDA
>>> perform_eda(df, rows=2, cols=2)

# The above call generates the following:
# 1. A dataset overview
# 2. Basic statistics for all columns
# 3. A missing values report and heatmap (if applicable)
# 4. Correlation heatmap for numeric columns
# 5. Feature distribution/count plots
# 6. Scatterplots for numeric feature pairs (if applicable)
# 7. Outlier detection report for numeric features

# Note: Visualizations are rendered with matplotlib and seaborn.
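
This page does not state which rule the outlier detection report uses. As a rough illustration of what such a report can contain, the sketch below applies the common 1.5 × IQR rule with pandas. The function name iqr_outliers and the multiplier k are assumptions made for illustration. They are not taken from perform_eda, and its output may differ.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical sketch: per-column outlier counts using the IQR rule.

    Not the implementation used by perform_eda; the 1.5 * IQR rule is an
    assumption chosen because it is a widely used default.
    """
    numeric = df.select_dtypes(include="number")  # skip text and date columns
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # True where a value falls outside [lower, upper] for its column.
    mask = numeric.lt(lower, axis="columns") | numeric.gt(upper, axis="columns")
    return pd.DataFrame({"lower": lower, "upper": upper, "n_outliers": mask.sum()})


df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 62],
    "Department": ["HR", "Finance", "IT", "Finance", "HR"],
    "Bonus": [0, 5000, 12000, 7500, 7000],
})
report = iqr_outliers(df)
print(report)
```

On this sample, Bonus has quartiles 5000 and 7500, so the bounds are 1250 and 11250 and two values (0 and 12000) are flagged. Age has none, and Department is skipped because it is not numeric.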