mds_2025_helper_functions.eda
Functions
perform_eda(dataframe, rows=5, cols=2): A universal EDA function to generate data summaries and visualize features.
Module Contents
- mds_2025_helper_functions.eda.perform_eda(dataframe, rows=5, cols=2)
A universal EDA function to generate data summaries and visualize features.
- Parameters:
dataframe (pd.DataFrame) – The input dataset for EDA.
rows (int) – Number of rows in the grid layout for visualizations.
cols (int) – Number of columns in the grid layout for visualizations.
- Returns:
None
Example
>>> import pandas as pd
>>> from mds_2025_helper_functions.eda import perform_eda
>>> data = {
...     'Age': [25, 32, 47, 51, 62],
...     'Salary': [50000, 60000, 120000, 90000, 85000],
...     'Department': ['HR', 'Finance', 'IT', 'Finance', 'HR'],
...     'JoiningDate': pd.to_datetime(['2015-01-01', '2016-07-15', '2017-03-12', '2018-06-01', '2019-08-19']),
...     'Bonus': [0, 5000, 12000, 7500, 7000]
... }
>>> df = pd.DataFrame(data)
>>> # Use the function to perform EDA
>>> perform_eda(df, rows=2, cols=2)
# The above call will generate the following:
# 1. A dataset overview
# 2. Basic statistics for all columns
# 3. A missing values report and heatmap (if applicable)
# 4. Correlation heatmap for numeric columns
# 5. Feature distribution/count plots
# 6. Scatterplots for numeric feature pairs (if applicable)
# 7. Outlier detection report for numeric features
# Note: Visualizations will be shown as matplotlib and seaborn plots.
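To give a rough idea of what the missing values report and the outlier detection report might contain, here is a minimal sketch using plain pandas on the same example data. It is not the implementation of perform_eda: the helper name iqr_outlier_counts and the 1.5 × IQR rule are assumptions chosen for illustration, since this page does not say which outlier criterion the function applies.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: count values outside [Q1 - k*IQR, Q3 + k*IQR]
    for each numeric column. The IQR rule is an assumption, not
    necessarily what perform_eda uses."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame columns.
    outside = (numeric < (q1 - k * iqr)) | (numeric > (q3 + k * iqr))
    return outside.sum()


df = pd.DataFrame({
    'Age': [25, 32, 47, 51, 62],
    'Salary': [50000, 60000, 120000, 90000, 85000],
    'Department': ['HR', 'Finance', 'IT', 'Finance', 'HR'],
    'JoiningDate': pd.to_datetime(['2015-01-01', '2016-07-15', '2017-03-12',
                                   '2018-06-01', '2019-08-19']),
    'Bonus': [0, 5000, 12000, 7500, 7000],
})

# Missing values per column (all zero for this example data).
missing = df.isna().sum()

# Outlier counts per numeric column under the assumed IQR rule.
outliers = iqr_outlier_counts(df)

print(missing)
print(outliers)
```

Under this assumed rule, only Bonus has flagged values (0 and 12000); Age and Salary have none. The report produced by perform_eda may differ if it uses another criterion.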