This project aimed to predict educational outcomes, specifically whether students would drop out, enroll, or graduate, based on various features. The dataset encompassed diverse information, including application modes, numerical features, and binary indicators. Methodology: The project compared a range of machine learning models to find the one best suited to the classification task. Models included K-Nearest Neighbors, Gradient Boosting, Decision Trees, Random Forests, Support Vector Machines, Gaussian Naive Bayes, Neural Networks, Linear Discriminant Analysis, and Quadratic Discriminant Analysis. Exploratory Data Analysis (EDA): Explored the distributions of application modes, numerical features, and the target variable. Used visualizations such as pie charts, histograms, and correlation matrices for insights. Data Preprocessing: Transformed the target variable into binary classes for simplification. Split the dataset into training and testing sets. Standard...
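The model comparison described in this summary can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is a synthetic stand-in generated with scikit-learn, the target is already binary, only a subset of the listed models is shown, and the feature scaling step, the 75/25 split, and the helper name compare_models are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student dataset (hypothetical, not the real data).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Hold out a test set; the 75/25 stratified split is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A subset of the model families named in the summary, with default settings.
models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
}


def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training set and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        # Scaling inside the pipeline is fitted on training data only,
        # which avoids leaking test-set statistics (assumed step).
        pipeline = make_pipeline(StandardScaler(), model)
        pipeline.fit(X_train, y_train)
        scores[name] = pipeline.score(X_test, y_test)
    return scores


scores = compare_models(models, X_train, X_test, y_train, y_test)

# Print models from highest to lowest test accuracy.
for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

Accuracy on a single split is a coarse criterion; with imbalanced classes, a metric such as F1 or cross-validated scores would likely give a fairer ranking.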
This report presents an analysis of the Olympics History dataset using Python and the Pandas library. The dataset contains information about athletes who participated in various Olympic events, including details such as age, height, weight, medals won, and more. The analysis aims to explore and visualize different aspects of the data to gain insights into athlete demographics, sports participation, and medal achievements. Data Loading and Overview: The analysis began by loading the dataset into a Pandas DataFrame named 'olympics', which holds the athletes' personal details and performance records. After loading the data, we inspected the DataFrame with the info() method, which gave an overview of the columns, their data types, and the presence of missing values. Missing Value Analysis: Next, we conducted a missing value analysis, calling isna().sum() to count the number of missing values in eac...
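The overview and missing-value steps mentioned in this summary might look like the sketch below. The rows and column names are made up for illustration and are not taken from the actual dataset; only the DataFrame name olympics and the calls info() and isna().sum() come from the report.

```python
import pandas as pd

# Tiny made-up sample standing in for the real file (hypothetical columns and values).
olympics = pd.DataFrame(
    {
        "Name": ["Athlete A", "Athlete B", "Athlete C", "Athlete D"],
        "Age": [24.0, None, 31.0, 22.0],
        "Height": [180.0, 175.0, None, None],
        "Medal": ["Gold", None, None, "Bronze"],
    }
)

# Overview: column names, dtypes, and non-null counts per column.
olympics.info()

# isna() marks each missing cell as True; sum() then counts them per column.
missing = olympics.isna().sum()
print(missing)
```

In a medal-style column, a missing value may simply mean that no medal was won rather than that data is absent, so the counts are worth interpreting per column before any rows are dropped or filled.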