mds_2025_helper_functions.eda
=============================

.. py:module:: mds_2025_helper_functions.eda


Functions
---------

.. autoapisummary::

   mds_2025_helper_functions.eda.perform_eda


Module Contents
---------------

.. py:function:: perform_eda(dataframe, rows=5, cols=2)

   A universal EDA function to generate data summaries and visualize features.

   :param dataframe: The input dataset for EDA.
   :type dataframe: pd.DataFrame
   :param rows: Number of rows in the grid layout for visualizations.
   :type rows: int
   :param cols: Number of columns in the grid layout for visualizations.
   :type cols: int

   :returns: None

   .. rubric:: Example

   >>> import pandas as pd
   >>> from mds_2025_helper_functions.eda import perform_eda
   >>> data = {
   ...     'Age': [25, 32, 47, 51, 62],
   ...     'Salary': [50000, 60000, 120000, 90000, 85000],
   ...     'Department': ['HR', 'Finance', 'IT', 'Finance', 'HR'],
   ...     'JoiningDate': pd.to_datetime(['2015-01-01', '2016-07-15', '2017-03-12', '2018-06-01', '2019-08-19']),
   ...     'Bonus': [0, 5000, 12000, 7500, 7000]
   ... }
   >>> df = pd.DataFrame(data)
   >>> # Use the function to perform EDA
   >>> perform_eda(df, rows=2, cols=2)
   # The above call will generate the following:
   # 1. A dataset overview
   # 2. Basic statistics for all columns
   # 3. A missing values report and heatmap (if applicable)
   # 4. Correlation heatmap for numeric columns
   # 5. Feature distribution/count plots
   # 6. Scatterplots for numeric feature pairs (if applicable)
   # 7. Outlier detection report for numeric features
   # Note: Visualizations will be shown as matplotlib and seaborn plots.
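
The reference above does not say which rule the outlier detection report (item 7) applies. As a rough illustration only, the sketch below assumes the common 1.5 × IQR rule and runs it on the numeric columns of the example data. The helper name ``iqr_outliers`` and its ``k`` parameter are hypothetical and are not part of ``mds_2025_helper_functions``; the sketch uses only the standard library so it can be run without pandas.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: an assumed stand-in for whatever rule
    perform_eda actually uses in its outlier report.
    """
    # n=4 yields the three quartile cut points; "inclusive" interpolates
    # linearly between the observed data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the flagged values.
    return [v for v in values if v < lower or v > upper]


# Numeric columns copied from the example above.
numeric_columns = {
    "Age": [25, 32, 47, 51, 62],
    "Salary": [50000, 60000, 120000, 90000, 85000],
    "Bonus": [0, 5000, 12000, 7500, 7000],
}

report = {name: iqr_outliers(col) for name, col in numeric_columns.items()}
print(report)
# → {'Age': [], 'Salary': [], 'Bonus': [0, 12000]}
```

Under this assumed rule only ``Bonus`` has flagged values; the report that ``perform_eda`` prints may differ if it uses another method or threshold.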