
Project 4: Insights from 'Olympics History' Data

This report presents an analysis of the Olympics History dataset using Python and the Pandas library. The dataset contains information about athletes who participated in various Olympic events, including details such as age, height, weight, medals won, and more. The analysis aims to explore and visualize different aspects of the data to gain insights into athlete demographics, sports participation, and medal achievements.

Data Loading and Overview

The analysis begins by loading the dataset into a Pandas DataFrame named 'olympics'. The dataset contains information about Olympic athletes, including their personal details and performance records. After loading the data, we inspected the DataFrame with the info() method, which lists each column's name, data type, and number of non-null values, and so shows which columns contain missing values.
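A minimal sketch of this step, assuming the data comes as a CSV file with columns such as 'Sex', 'Age', 'Height', 'Weight', 'NOC', 'Year', 'Season', 'Sport', 'Event' and 'Medal'. The post does not name the file, so a few made-up rows are read from an in-memory string here; with the real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real CSV file (column names assumed).
# Empty fields are read as NaN by default.
csv_text = """ID,Name,Sex,Age,Height,Weight,NOC,Year,Season,Sport,Event,Medal
1,Athlete A,M,24,180,80,USA,2016,Summer,Swimming,Swimming Men's 100 metres Freestyle,Gold
2,Athlete B,F,,165,,GBR,2016,Summer,Athletics,Athletics Women's Marathon,
3,Athlete C,M,31,,,FRA,2014,Winter,Biathlon,Biathlon Men's 10 kilometres Sprint,
"""

olympics = pd.read_csv(io.StringIO(csv_text))

# Prints each column's dtype and non-null count; columns whose non-null
# count is below the row count contain missing values
olympics.info()
```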

Missing Value Analysis

Next, we counted the missing values in each column with the isna().sum() method. Missing values appear in the 'Age', 'Height', 'Weight', and 'Medal' columns. For 'Medal', a missing value means the athlete did not win a medal in that event, so those gaps are expected rather than errors.
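A short sketch of the missing-value count on a few made-up rows (the column names are assumed to match the dataset's). Alongside isna().sum(), isna().mean() gives the share of missing values per column, which makes it easier to judge how large each gap is.

```python
import numpy as np
import pandas as pd

# Made-up rows, not the real dataset
olympics = pd.DataFrame({
    "Sex":    ["M", "F", "M", "F"],
    "Age":    [24, np.nan, 31, 22],
    "Height": [180, 165, np.nan, np.nan],
    "Weight": [80, np.nan, np.nan, 58],
    "Medal":  ["Gold", np.nan, np.nan, "Bronze"],
})

# Number of missing values in each column
missing = olympics.isna().sum()
print(missing)

# Percentage of missing values in each column
missing_pct = olympics.isna().mean().mul(100).round(1)
print(missing_pct)
```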

Data Imputation

To handle the missing values, we calculated the average age, height, and weight for each combination of 'Sex' and 'Sport', then used a custom function to fill each missing value with the average for the athlete's 'Sex' and 'Sport'. After imputation, we rechecked the DataFrame and confirmed that the missing 'Height' and 'Weight' values had been filled.
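The post's custom function is not shown, so here is an alternative sketch of the same idea using Pandas' groupby(...).transform('mean'), which computes the mean of each ('Sex', 'Sport') group and aligns it back to the original rows; fillna then replaces only the missing cells. The rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up rows with assumed column names
olympics = pd.DataFrame({
    "Sex":    ["M", "M", "M", "F", "F"],
    "Sport":  ["Rowing", "Rowing", "Rowing", "Rowing", "Rowing"],
    "Age":    [22.0, np.nan, 26.0, 25.0, np.nan],
    "Height": [190.0, 194.0, np.nan, 178.0, 180.0],
    "Weight": [90.0, np.nan, 94.0, np.nan, 72.0],
})

cols = ["Age", "Height", "Weight"]

# Mean of each column within every (Sex, Sport) group, repeated for each row
# of that group; NaNs are skipped when the means are computed
group_means = olympics.groupby(["Sex", "Sport"])[cols].transform("mean")

# Fill only the missing cells with the matching group mean
olympics[cols] = olympics[cols].fillna(group_means)
print(olympics)
```

One caveat that applies to any group-mean imputation: if every value in a group is missing, the group mean is itself NaN and those cells stay empty.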

Data Visualization

The analysis includes various data visualizations to explore and understand the dataset:

Scatterplots: Scatterplots were created to visualize the relationships between age, height, and weight, with points colored by gender. These scatterplots help identify trends and correlations between these variables.
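The correlations that the scatterplots hint at can also be measured directly. A minimal sketch on made-up rows, assuming numeric 'Age', 'Height' and 'Weight' columns; the seaborn call for the plot itself is left as a comment so the snippet runs with Pandas alone.

```python
import pandas as pd

# Made-up rows; Weight is an exact linear function of Height here
olympics = pd.DataFrame({
    "Sex":    ["M", "F", "M", "F"],
    "Age":    [20, 30, 25, 35],
    "Height": [160, 170, 180, 190],
    "Weight": [60, 70, 80, 90],
})

# Pairwise Pearson correlation coefficients between the numeric columns
corr = olympics[["Age", "Height", "Weight"]].corr()
print(corr.round(2))

# Plotting sketch (requires seaborn):
# import seaborn as sns
# sns.scatterplot(data=olympics, x="Height", y="Weight", hue="Sex")
```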

Athlete Participation Over Time: Line plots were used to visualize the participation trends in the top 5 sports over the years. This provides insights into the popularity and growth of specific sports in the Olympics.
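A sketch of the data preparation behind such a line plot, assuming 'Year' and 'Sport' columns and counting one row per athlete entry. The made-up data keeps only the top 2 sports so the output stays small; setting TOP_N = 5 matches the post.

```python
import pandas as pd

# Made-up athlete entries; 'Year' and 'Sport' are assumed column names
olympics = pd.DataFrame({
    "Year":  [2008, 2008, 2008, 2012, 2012, 2012, 2012, 2016, 2016, 2016],
    "Sport": ["Athletics", "Swimming", "Rowing", "Athletics", "Athletics",
              "Swimming", "Judo", "Athletics", "Swimming", "Swimming"],
})

TOP_N = 2  # the post uses the top 5 sports

# Sports with the most entries overall
top_sports = olympics["Sport"].value_counts().nlargest(TOP_N).index

# Entries per year for each top sport: rows are years, columns are sports
participation = (
    olympics[olympics["Sport"].isin(top_sports)]
    .groupby(["Year", "Sport"])
    .size()
    .unstack(fill_value=0)
)
print(participation)

# participation.plot() would draw one line per sport (requires matplotlib)
```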


Top 10 Medal-Winning Athletes: A bar plot was used to display the top 10 athletes with the highest medal counts, highlighting their achievements.
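One way to compute this ranking, sketched on made-up rows and assuming a 'Name' column: drop the rows without a medal, then count the remaining rows per athlete. Grouping by name merges different athletes who share a name, so an athlete ID column, if available, is a safer key.

```python
import numpy as np
import pandas as pd

# Made-up rows; NaN in 'Medal' means no medal was won
olympics = pd.DataFrame({
    "Name":  ["Athlete A", "Athlete A", "Athlete A", "Athlete B",
              "Athlete B", "Athlete C", "Athlete D", "Athlete D"],
    "Medal": ["Gold", "Gold", "Silver", "Bronze",
              np.nan, np.nan, "Gold", "Gold"],
})

# Medals per athlete, largest first, limited to the top 10
top_athletes = (
    olympics.dropna(subset=["Medal"])["Name"]
    .value_counts()
    .head(10)
)
print(top_athletes)
```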

Age Distribution in Top 10 Sports: A violin plot was used to visualize the distribution of athletes' ages in the top 10 sports, offering insights into the age demographics of athletes in these sports.

Gender Distribution in Top 5 Sports: Bar plots were used to show the gender distribution in the top 5 sports, providing insights into the gender balance in these sports.
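The counts behind such a bar plot can be obtained with pd.crosstab, and normalize='index' turns them into shares within each sport. The made-up rows and the top-2 cutoff are for illustration only; the post uses the top 5.

```python
import pandas as pd

# Made-up rows with assumed 'Sport' and 'Sex' columns
olympics = pd.DataFrame({
    "Sport": ["Athletics", "Athletics", "Athletics", "Swimming", "Swimming", "Boxing"],
    "Sex":   ["M", "F", "M", "F", "F", "M"],
})

top_sports = olympics["Sport"].value_counts().nlargest(2).index
subset = olympics[olympics["Sport"].isin(top_sports)]

# Entries per sport and sex (missing combinations are filled with 0)
gender_counts = pd.crosstab(subset["Sport"], subset["Sex"])

# Same table as row-wise proportions, so each sport's row sums to 1
gender_shares = pd.crosstab(subset["Sport"], subset["Sex"], normalize="index")
print(gender_counts)
print(gender_shares.round(2))
```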

Distribution of Age, Height, and Weight by Season: Violin plots were created to visualize the distribution of age, height, and weight by season, showing how these attributes vary between Summer and Winter Olympics.
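The numbers behind these violin plots can be summarized per season with a groupby. This sketch uses made-up rows and the median, which is less sensitive to outliers than the mean; the seaborn call is left as a comment.

```python
import pandas as pd

# Made-up rows with assumed column names
olympics = pd.DataFrame({
    "Season": ["Summer", "Summer", "Summer", "Winter", "Winter"],
    "Age":    [22, 26, 30, 25, 29],
    "Height": [170, 180, 190, 172, 178],
    "Weight": [65, 75, 85, 70, 80],
})

# Median age, height and weight for each season
season_stats = olympics.groupby("Season")[["Age", "Height", "Weight"]].median()
print(season_stats)

# Plotting sketch (requires seaborn):
# import seaborn as sns
# sns.violinplot(data=olympics, x="Season", y="Age")
```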

Top 10 Sports by Season: A count plot was used to visualize the top 10 sports by season, showcasing the most popular sports in both Summer and Winter Olympics.

Trends in Total Medals Over the Last 10 Years: A line plot was used to display trends in total medals over the last 10 years, providing insights into the overall performance of athletes.


Top 10 Events with the Most Participants: A bar plot was used to show the top 10 events with the highest number of participants, highlighting the most popular events.
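A sketch of this ranking, assuming an athlete 'ID' column and treating a participant as a distinct athlete, so an athlete entered in the same event at several Games counts once. The post does not say whether it counted distinct athletes or all entries; replacing nunique() with size() gives the latter.

```python
import pandas as pd

# Made-up rows; athlete 1 entered the Marathon twice
olympics = pd.DataFrame({
    "ID":    [1, 2, 3, 1, 4, 5, 2],
    "Event": ["Marathon", "Marathon", "Marathon", "Marathon",
              "100m", "100m", "Long Jump"],
})

# Distinct athletes per event, largest first, limited to the top 10
top_events = olympics.groupby("Event")["ID"].nunique().nlargest(10)
print(top_events)
```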

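Top 10 Countries with the Most Total and Gold Medals: Bar plots were used to display the 10 countries with the most total medals and the 10 with the most gold medals. Countries are identified by their three-letter National Olympic Committee (NOC) codes; for example, URS is the Soviet Union and GDR is East Germany.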

Top 10 Countries with the Most Total Medals:
NOC   Total medals
USA   4985
URS   2063
GBR   1919
GER   1776
FRA   1611
ITA   1446
AUS   1304
HUN   1122
SWE   1108
NED    917

Top 10 Countries with the Most Gold Medals:
NOC   Gold medals
USA   2468
URS    832
GBR    610
GER    590
ITA    518
FRA    461
HUN    432
SWE    354
AUS    342
GDR    339
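Tables like these can be produced along the following lines: keep the rows that have a medal, count them per NOC, and repeat with gold medals only. This is a sketch on made-up rows, not the post's exact code; reset_index(name="Medal") turns each result into a two-column NOC/Medal DataFrame.

```python
import numpy as np
import pandas as pd

# Made-up rows; NaN in 'Medal' means no medal was won
olympics = pd.DataFrame({
    "NOC":   ["USA", "USA", "USA", "GBR", "GBR", "FRA", "FRA", "FRA"],
    "Medal": ["Gold", "Silver", "Gold", "Gold", "Bronze", np.nan, "Bronze", "Silver"],
})

medals = olympics.dropna(subset=["Medal"])

# Total medals per country, largest first
total_medals = medals.groupby("NOC").size().nlargest(10)

# Gold medals per country (countries with no gold are absent)
gold_medals = medals[medals["Medal"] == "Gold"].groupby("NOC").size().nlargest(10)

# Two-column layout: NOC code and medal count
total_table = total_medals.reset_index(name="Medal")
print(total_table)
print(gold_medals)
```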

Conclusion

This report has provided an overview of the Olympics History dataset, covering data loading, missing value analysis, data imputation, and a range of data visualizations. The analysis sheds light on athlete demographics, sports participation trends, and medal achievements.

The complete Python code for the project is available on GitHub at navyasrivattikuti/Olympics_History.

