
Project 3: Criminal Cases Against Indian Politicians


The 2019 Indian Lok Sabha elections saw intense competition among political parties and their candidates. It is equally important to examine the criminal charges against those candidates. For this project, data on criminal cases registered against candidates who contested the 2019 Lok Sabha elections was extracted from myneta.info. The data was analyzed with Python libraries such as pandas and NumPy, and several machine learning algorithms were trained to predict criminal cases against the candidates.

The Python script scrapes data from a specific URL using the requests library and parses it with BeautifulSoup. It also imports re, sqlite3, pandas, and numpy, and creates two tables (candidates and winners) in an SQLite database using SQL queries. It then inserts rows into these tables with the cursor object's executemany() method and saves the changes with the connection's commit() method.
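The storage step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the sample rows, and the `build_database` helper are all assumptions made for the example, and an in-memory database stands in for the real file.

```python
import sqlite3

# Hypothetical sample rows standing in for scraped data:
# (name, party, criminal_cases). These are placeholders, not real records.
candidate_rows = [
    ("Candidate A", "Party X", 0),
    ("Candidate B", "Party Y", 2),
    ("Candidate C", "Party X", 1),
]
winner_rows = [
    ("Candidate B", "Party Y", 2),
]


def build_database(path=":memory:"):
    """Create the two tables, bulk-insert the rows, and commit."""
    conn = sqlite3.connect(path)
    cur = conn.cursor()

    # Assumed schema; the real tables may have different columns.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS candidates "
        "(name TEXT, party TEXT, criminal_cases INTEGER)"
    )
    cur.execute(
        "CREATE TABLE IF NOT EXISTS winners "
        "(name TEXT, party TEXT, criminal_cases INTEGER)"
    )

    # executemany() runs the same parameterized INSERT once per row.
    cur.executemany("INSERT INTO candidates VALUES (?, ?, ?)", candidate_rows)
    cur.executemany("INSERT INTO winners VALUES (?, ?, ?)", winner_rows)

    # commit() is called on the connection to persist the inserts.
    conn.commit()
    return conn


conn = build_database()
print(conn.execute("SELECT COUNT(*) FROM candidates").fetchone()[0])
```

Using `?` placeholders rather than string formatting lets sqlite3 handle quoting, which matters when names scraped from a web page contain apostrophes.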

After data cleaning, basic exploratory analysis was performed using pandas, NumPy, and Matplotlib. A word cloud highlights the political parties that fielded the most candidates in the 2019 Lok Sabha elections.
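A word cloud sizes each word by how often it occurs, so the quantity behind the party word cloud is a simple frequency count. The sketch below shows that count using only the standard library; the party labels are made-up placeholders, not values from the dataset.

```python
from collections import Counter

# Hypothetical party label for each candidate row (placeholders only).
parties = ["Party X", "Party Y", "Party X", "Party Z", "Party X", "Party Y"]

# Count how many candidates each party fielded.
party_counts = Counter(parties)

# List parties from most to fewest candidates.
for party, count in party_counts.most_common():
    print(party, count)
```

A frequency mapping like `party_counts` is the kind of input a word cloud is drawn from: larger counts produce larger words.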

The analysis also used several machine learning algorithms (logistic regression, K Neighbors, SVC Linear, SVC rbf, Gaussian NB, Decision Tree, and Random Forest) to predict criminal cases against the candidates. The models produced training accuracies ranging from 80.8% to 99.5%.

Logistic regression, K Neighbors, SVC Linear, SVC rbf, Gaussian NB, and Random Forest produced training accuracies of 81.05%, 81.71%, 81.06%, 81.2%, 80.8%, and 96.4%, respectively. Decision Tree produced the highest training accuracy, 99.5%. These figures measure how well each model fits the data it was trained on. A near-perfect training score for a decision tree often reflects memorization of the training set, so accuracy on held-out test data is needed to confirm how reliably the models predict criminal cases for candidates they have not seen.
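To illustrate why training accuracy should be read alongside held-out accuracy, here is a small sketch using scikit-learn. It assumes scikit-learn is the library behind the models named above, and it runs on synthetic data from `make_classification`, not on the Lok Sabha dataset, so the numbers it prints say nothing about the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with deliberately noisy labels (flip_y).
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=4,
    flip_y=0.2,
    random_state=0,
)

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

# Record (training accuracy, test accuracy) for each model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (
        model.score(X_train, y_train),
        model.score(X_test, y_test),
    )
    print(f"{name}: train={scores[name][0]:.3f} test={scores[name][1]:.3f}")
```

An unpruned decision tree can fit every training row, including the noisy ones, so its training accuracy alone overstates how well it generalizes. Comparing the two columns is a quick way to spot that gap.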

In conclusion, this project shows that machine learning algorithms, together with Python libraries like pandas and NumPy, can be applied to predict criminal cases against Lok Sabha candidates, with high training accuracies. Such predictions can provide valuable insights into the nature of Indian politics and the role of criminal charges in shaping electoral outcomes. These insights can help political parties and voters make informed decisions during elections.

The complete project code can be found at the following link: navyasrivattikuti/LokSabha2019 (github.com)
You can also check out my Instagram, YouTube, and Twitter pages.
