My Data Diary

Posts

Project 4: Insights from 'Olympics History' Data

This report presents an analysis of the Olympics History dataset using Python and the Pandas library. The dataset contains information about athletes who competed in various Olympic events, including their age, height, weight, medals won, and more. The analysis explores and visualizes different aspects of the data to gain insights into athlete demographics, sports participation, and medal achievements.

Data Loading and Overview

The analysis begins by loading the dataset into a Pandas DataFrame named 'olympics'. The dataset contains information about Olympic athletes, including their personal details and performance records. After loading the data, we used the info() method to get an overview of the DataFrame's columns, their data types, and the presence of missing values.

Missing Value Analysis

Next, we conducted a missing value analysis, using the isna().sum() method to count the number of missing values in eac...
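A minimal sketch of the loading and missing-value steps is shown below. It is not the project's code: the tiny inline CSV and its columns (Name, Sex, Age, Height, Weight, Medal) are hypothetical stand-ins for the real Olympics History file, which would normally be read from disk with pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Olympics History CSV (assumed columns, made-up rows).
csv_text = """Name,Sex,Age,Height,Weight,Medal
A,M,24,180,80,Gold
B,F,,165,,
C,M,31,,90,Bronze
"""

# Empty fields in the CSV are parsed as NaN.
olympics = pd.read_csv(io.StringIO(csv_text))

# Overview: column names, dtypes and non-null counts per column.
olympics.info()

# Count missing values in each column.
missing = olympics.isna().sum()
print(missing)
```

The same isna().sum() call on the full dataset gives one count per column, which is a quick way to decide which columns need imputation or can be dropped.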

Project 3: Criminal Cases Against Indian Politicians

The 2019 Indian Lok Sabha elections saw intense competition among political parties and their candidates. Alongside that competition, it is important to examine the criminal charges faced by the candidates. To that end, data on criminal cases registered against Lok Sabha MPs who contested the 2019 elections has been extracted from myneta.info. This data has been analyzed using Python libraries such as pandas and NumPy, along with machine learning algorithms, to predict criminal cases against the candidates. The Python script starts by scraping data from a specific URL using the requests library and parsing it with BeautifulSoup. It then imports libraries such as re, sqlite3, pandas, and numpy, and creates two tables (candidates and winners) in an SQLite database using SQL queries. It inserts data into these tables using the cursor object's executemany() method and saves the changes to the database with the commit() method...
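The table-creation and insertion steps can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the original script: the column names and the sample rows are hypothetical placeholders for the scraped myneta.info data, and an in-memory database is used instead of a file.

```python
import sqlite3

# Hypothetical rows; in the project these come from the scraped pages.
# Assumed columns: name, party, constituency, criminal_cases.
candidates = [
    ("Candidate A", "Party X", "Constituency 1", 2),
    ("Candidate B", "Party Y", "Constituency 1", 0),
]
winners = [
    ("Candidate A", "Party X", "Constituency 1", 2),
]

# In-memory database for the sketch; a file path would persist it.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create the two tables named in the post (column layout is an assumption).
for table in ("candidates", "winners"):
    cur.execute(
        f"CREATE TABLE {table} ("
        "name TEXT, party TEXT, constituency TEXT, criminal_cases INTEGER)"
    )

# Bulk-insert all rows with parameterized queries, then save the changes.
cur.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?)", candidates)
cur.executemany("INSERT INTO winners VALUES (?, ?, ?, ?)", winners)
conn.commit()

# Example query: how many candidates have at least one criminal case.
count = cur.execute(
    "SELECT COUNT(*) FROM candidates WHERE criminal_cases > 0"
).fetchone()[0]
print(count)
```

Using executemany() with ? placeholders inserts every row in one call and lets sqlite3 handle quoting, which is safer than building INSERT strings from scraped text.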

Project 2: Exploring Factors Affecting Life Expectancy

The goal of this report was to explore the factors that affect life expectancy across geographies and time periods. The study relied on accurate data from the Global Health Observatory (GHO) data repository of the World Health Organization (WHO). The dataset on life expectancy and health factors for 193 countries was collected from the same WHO data repository website, and the corresponding economic data was collected from the United Nations website. Among all categories of health-related factors, only the most critical and representative ones were chosen. The study found that life expectancy is largely determined by the resources available in a country and how they are utilized: wealthier countries have a higher average life expectancy than poorer countries. The study also found that alcohol consumption is one of the biggest factors affecting life expectancy. The research question was: "What changes are needed for a country to improve life expectancy?" The null hypoth...
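One common way to check a relationship like "wealthier countries live longer" is a Pearson correlation between an economic indicator and life expectancy. The sketch below computes it by hand; the four countries and their figures are made up for illustration (they are not WHO or UN numbers), and the report itself may have used a different test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical country-level values, for illustration only.
gdp_per_capita = [1_000, 5_000, 20_000, 45_000]
life_expectancy = [60.0, 68.0, 75.0, 81.0]

r = pearson(gdp_per_capita, life_expectancy)
print(round(r, 3))
```

A value of r near +1 indicates a strong positive linear association; correlation alone does not show that wealth causes longer lives.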

Project 1: Analyzing Titanic Survivors

The sinking of the Titanic is undoubtedly one of the most infamous maritime disasters in history, and its passenger data has long interested data enthusiasts. This project report provides a comprehensive analysis of the Titanic passengers' attributes and their survival rates. The report starts by describing the data source and its attributes, noting a significant number of missing values in the age and cabin columns. The descriptive statistics indicate that most passengers were young and that the 3rd class carried the most passengers. Most passengers did not survive, and males made up a high proportion of those who perished. The visualizations in the report highlight some interesting patterns, such as the higher survival rates of women and of higher-class passengers. The correlation analysis further emphasizes the importance of age and sex in determining survival. The machine learning models, particu...
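Group survival rates like the ones the visualizations highlight reduce to a groupby followed by the mean of the 0/1 survival column. The sketch below assumes the column names used in the widely circulated Kaggle Titanic CSV (Survived, Pclass, Sex, Age) and uses six toy rows; it is not the project's data or code.

```python
import pandas as pd

# Toy rows with assumed Kaggle-style column names; not the real passenger data.
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 1],
    "Pclass": [3, 1, 2, 3, 1, 3],
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Age": [22.0, 38.0, None, 35.0, 54.0, 4.0],
})

# Missing values per column (Age has a gap here, as in the real data).
print(titanic.isna().sum())

# Mean of a 0/1 column is the survival rate for each group.
survival_by_sex = titanic.groupby("Sex")["Survived"].mean()
survival_by_class = titanic.groupby("Pclass")["Survived"].mean()
print(survival_by_sex)
print(survival_by_class)
```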