This project aimed to predict educational outcomes, specifically whether students would drop out, remain enrolled, or graduate, based on a variety of features. The dataset encompassed diverse information, including application modes, numerical features, and binary indicators.
Methodology:
The project compared a range of machine learning models to find the most suitable algorithm for the classification task. Models included K-Nearest Neighbors, Gradient Boosting, Decision Trees, Random Forests, Support Vector Machines, Gaussian Naive Bayes, Neural Networks, Linear Discriminant Analysis, and Quadratic Discriminant Analysis.
Exploratory Data Analysis (EDA):
Explored the distributions of application modes, numerical features, and the target variable.
Used visualizations such as pie charts, histograms, and correlation matrices to surface insights.
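The EDA steps above can be sketched as follows. The column names and values here are illustrative stand-ins, not the actual schema of the dataset:

```python
# EDA sketch: pie chart, histograms, and a correlation heatmap.
# Column names below are hypothetical placeholders for the real features.
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen (no display needed)
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({
    "application_mode": ["online", "onsite", "online", "transfer", "online"],
    "admission_grade": [120.0, 135.5, 110.0, 142.0, 128.5],
    "age_at_enrollment": [19, 23, 18, 25, 20],
    "target": ["Graduate", "Dropout", "Enrolled", "Graduate", "Dropout"],
})

# Pie chart of application-mode distribution
df["application_mode"].value_counts().plot.pie(autopct="%1.1f%%")
plt.savefig("application_modes.png")
plt.clf()

# Histograms of the numerical features
df[["admission_grade", "age_at_enrollment"]].hist()
plt.savefig("histograms.png")
plt.clf()

# Correlation matrix of numerical features as a heatmap
corr = df[["admission_grade", "age_at_enrollment"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.savefig("correlation.png")
```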
Data Preprocessing:
Transformed the three-class target variable into binary classes for simplification.
Split the dataset into training and testing sets.
Standardized numerical features using Z-score scaling.
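A minimal sketch of that preprocessing pipeline, on synthetic stand-in data. The binary mapping shown ("Dropout" versus everything else) is an assumption, as is fitting the scaler on the training split only:

```python
# Preprocessing sketch: binarize the target, split, Z-score scale.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # stand-in numerical features
y3 = rng.choice(["Dropout", "Enrolled", "Graduate"], size=200)

# Collapse three classes to two (assumed mapping: Dropout vs. rest)
y = (y3 == "Dropout").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Z-score scaling: fit on the training data only to avoid leakage
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```

Fitting the scaler on the training set alone keeps test-set statistics out of the model, which matters for honest evaluation.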
Modeling:
Conducted Grid Search for optimal K value in K-Nearest Neighbors.
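The grid search for K can be sketched with scikit-learn's `GridSearchCV`. The candidate K values and synthetic data here are illustrative, not the grid used in the project:

```python
# Grid-search sketch for the K in K-Nearest Neighbors.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real training data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9, 11]},  # illustrative candidates
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)

print("best K:", grid.best_params_["n_neighbors"])
print("cross-validated accuracy:", round(grid.best_score_, 3))
```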
Evaluated performance metrics for each model, including accuracy, confusion matrices, and classification reports.
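The per-model evaluation can be sketched as below, with a Random Forest shown as one representative model on synthetic stand-in data:

```python
# Evaluation sketch: accuracy, confusion matrix, classification report.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and binary target
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
pred = model.predict(X_te)

print("Accuracy:", accuracy_score(y_te, pred))
print(confusion_matrix(y_te, pred))          # rows: true, cols: predicted
print(classification_report(y_te, pred))     # precision/recall/F1 per class
```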
Results:
The Random Forest Classifier emerged as the most effective model, achieving an accuracy of 84%. Other notable performers included the Support Vector Machine Classifier (83%) and the Decision Tree Classifier (80%).
Challenges and Future Work:
While the project yielded promising results, challenges such as class imbalance and suboptimal neural network performance were encountered. Future work could involve further hyperparameter tuning, feature engineering, and exploring ensemble techniques for enhanced predictive accuracy.
Technologies Used:
Python, scikit-learn, TensorFlow, seaborn, matplotlib
The complete Python code for the project can be accessed at: Github Repository