October 2023 – Stats

October 31, 2023

30th October, 2023

Some clustering techniques did not produce sensible clusters on this dataset, so I switched to other techniques and adjusted the approach, keeping the ones that worked best for our analysis.

October 28, 2023

27th October, 2023

Worked on clustering to group similar police shootings together, so that I can identify different types of police shootings and develop targeted interventions for each type. Alongside that, I used PCA to reduce the dimensionality of the data and to identify the most important factors that contribute to police shootings.

October 26, 2023

25th October, 2023

Looked further at trends, correlations, and outliers: for example, how the number of police killings has changed over time, how the rate of police killings varies across states, and how police killings relate to other factors such as poverty, crime rates, and the presence of body cameras.
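As a rough sketch of the first two questions (counts over time and rates across states), the snippet below uses only the Python standard library. The records, the ISO date format, the field names, and the population figures are all made-up assumptions for illustration, not values from the dataset.

```python
from collections import Counter

# Hypothetical incident records; the real dataset has many more columns.
incidents = [
    {"date": "2015-03-14", "state": "CA"},
    {"date": "2015-07-02", "state": "TX"},
    {"date": "2016-01-20", "state": "CA"},
    {"date": "2016-05-11", "state": "CA"},
    {"date": "2016-09-30", "state": "TX"},
]

# Hypothetical round-number populations, used only to show the rate calculation.
state_population = {"CA": 40_000_000, "TX": 30_000_000}

# Trend over time: number of incidents per calendar year (first four characters of the date).
killings_per_year = Counter(row["date"][:4] for row in incidents)

# Variation across states: incidents per million residents.
killings_per_state = Counter(row["state"] for row in incidents)
rate_per_million = {
    state: count / state_population[state] * 1_000_000
    for state, count in killings_per_state.items()
}

for year in sorted(killings_per_year):
    print(year, killings_per_year[year])
for state, rate in sorted(rate_per_million.items()):
    print(state, round(rate, 3))
```

Comparing rates rather than raw counts matters here, because a large state will have more incidents simply because more people live there.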

October 23, 2023

23rd October, 2023

Explored the types of weapons involved in incidents and their impact on threat levels, and investigated the relationship between signs of mental illness and the outcome of incidents. Logistic regression was not predicting correctly, so I am trying to adjust it; if that does not work, I will move on to other classification models.

October 20, 2023

20th October, 2023

Signs of mental illness prediction: used logistic regression to predict whether an individual exhibits signs of mental illness based on other features such as age, gender, and threat level. Alongside that, performed clustering on longitude and latitude to identify geographic regions with a higher frequency of fatal police shootings.
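To make the logistic regression step concrete, here is a minimal sketch written with plain Python and batch gradient descent. It is not the code used for the project (a library such as scikit-learn would be the usual choice); the feature encoding, the rows, and the labels are hypothetical.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Hypothetical, already-encoded rows: [standardized age, is_male, threat_is_attack].
X = [
    [-1.2, 1, 1], [-0.8, 1, 1], [-0.3, 0, 1], [0.1, 1, 0],
    [0.4, 1, 0], [0.9, 0, 0], [1.3, 1, 0], [-0.6, 1, 1],
]
# Hypothetical labels: 1 = signs of mental illness recorded, 0 = not recorded.
y = [0, 0, 0, 1, 1, 1, 1, 0]

w, b = fit_logistic(X, y)
probabilities = predict_proba(X, w, b)
predictions = [int(p >= 0.5) for p in probabilities]
print(predictions)
```

Categorical fields such as gender and threat level have to be turned into numbers first, and standardizing age keeps the gradient steps on a comparable scale across features.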

October 19, 2023

18th October, 2023

The correlation heatmap gave some more clarity about the dataset:

There is a strong positive correlation between age and signs of mental illness. This means that older individuals in the dataset are more likely to show signs of mental illness.
There is a strong negative correlation between body camera usage and signs of mental illness. This means that incidents recorded by a body camera are less likely to involve signs of mental illness.
There is a weak negative correlation between longitude and signs of mental illness. Since longitude increases toward the east, this means that incidents at more easterly locations are slightly less likely to involve signs of mental illness.
There is a weak positive correlation between latitude and signs of mental illness. This means that incidents at more northerly locations are slightly more likely to involve signs of mental illness.
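The numbers behind a heatmap like this are pairwise Pearson correlation coefficients. The sketch below computes them in plain Python; the column names and values are hypothetical and will not reproduce the correlations described above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical columns; binary fields are encoded as 0/1 before correlating.
columns = {
    "age": [23, 35, 41, 52, 60, 29],
    "body_camera": [1, 0, 0, 1, 0, 1],
    "longitude": [-118.2, -95.4, -87.6, -74.0, -71.1, -122.4],
    "signs_of_mental_illness": [0, 0, 1, 1, 1, 0],
}

# Correlation matrix as a nested dict: matrix[a][b] is the correlation of a with b.
matrix = {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}

for name, value in matrix["signs_of_mental_illness"].items():
    print(f"{name:>24}: {value:+.2f}")
```

A coefficient close to +1 or -1 indicates a strong linear relationship and one near 0 a weak one; the sign only gives the direction, and none of this shows that one variable causes the other.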

October 16, 2023

16th October, 2023

Several columns have missing values, particularly the longitude and latitude fields; these must be handled before analysis, as they would otherwise lead to inaccurate results. Key summary statistics: the average age of the individuals involved is approximately 37.21 years, with a broad range (2 to 92 years). The incident locations span a wide range of longitudes and latitudes, indicating a geographically dispersed set of incidents.
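One simple way to handle the missing values is to count them per column, drop rows without coordinates before any location-based analysis, and skip missing ages when averaging. The sketch below shows that in plain Python; the rows are hypothetical and representing a missing value as `None` is an assumption.

```python
# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 30, "latitude": 42.36, "longitude": -71.06},
    {"age": None, "latitude": 34.05, "longitude": -118.24},
    {"age": 50, "latitude": None, "longitude": None},
    {"age": 25, "latitude": 29.76, "longitude": -95.37},
]

def count_missing(rows):
    """Number of missing (None) values in each column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts

def drop_missing(rows, columns):
    """Keep only rows that have a value in every listed column."""
    return [row for row in rows if all(row[c] is not None for c in columns)]

def mean_ignoring_missing(rows, column):
    """Average of a column, skipping rows where it is missing."""
    values = [row[column] for row in rows if row[column] is not None]
    return sum(values) / len(values)

missing = count_missing(rows)
located = drop_missing(rows, ["latitude", "longitude"])
mean_age = mean_ignoring_missing(rows, "age")
print(missing, len(located), mean_age)
```

Dropping rows is only one option; whether to drop or impute depends on how many rows are affected and on which analysis the column feeds into.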

October 14, 2023

13th October, 2023

I learnt about logistic regression and how it can be applied to our dataset, looking at why and how this algorithm is used so that we can get better results and generate insights from the data.

October 12, 2023

11th October, 2023

I analyzed the data to understand what it is about. I tried to grasp the essence of the data, clear my doubts about it, and decide what problem I should solve while analyzing the dataset. The key to a good analysis is to ask the right questions of the data and then find the answers in the data.

October 8, 2023

Report on Exploring Obesity, Inactivity, and Diabetes Patterns in the United States

PROJECT1

October 7, 2023

6th October, 2023

Compiled all the findings and completed the report. Finally, discussed it with all the team members to get everyone on the same page and to check whether anything needed to be added.

October 5, 2023

4th October, 2023

Writing the report and discussing the results and findings. Alongside that, trying other techniques on the dataset to get better results or find something insightful for our project.

October 4, 2023

2nd October, 2023

Drafted the report, combining and reviewing the different elements and findings. Alongside that, tested multiple regression to understand how a combination of independent variables influences or predicts the dependent variable: here Year and Overall SVI were the independent variables and Diagnosed Diabetes Percentage was the dependent variable.
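A multiple regression of this shape can be sketched by solving the least-squares normal equations directly. The code below is a plain-Python illustration rather than the code used for the report; the data values are hypothetical, and expressing Year as years since 2016 is an assumption made to keep the numbers well scaled.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A must be non-singular)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple_regression(X, y):
    """Least squares with an intercept; returns [intercept, coef_1, coef_2, ...]."""
    design = [[1.0] + list(row) for row in X]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical rows: [years since 2016, overall SVI].
X = [[0, 0.2], [0, 0.8], [1, 0.3], [1, 0.7], [2, 0.4], [2, 0.9]]
# Hypothetical diagnosed diabetes percentages.
y = [7.0, 9.5, 7.6, 9.2, 8.1, 10.3]

intercept, coef_year, coef_svi = fit_multiple_regression(X, y)
print(intercept, coef_year, coef_svi)
```

Each coefficient is read as the expected change in the dependent variable for a one-unit change in that predictor while the other predictor is held fixed.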

October 4, 2023

29th September, 2023

Started on the key findings and discussion sections of the report, analyzing the whole team's results together to get everyone on the same page. This gave further clarity about all the algorithms that were applied.

October 4, 2023

27th September, 2023

Used K-Means clustering to divide the counties into groups according to the proportion of obese and diabetic people in each county (% Obese and % Diabetic, respectively). The clusters produced from these two features help locate patterns and similarities in the data. Each cluster is shown in a separate color so that the grouping is easier to see.
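For reference, the K-Means procedure itself (Lloyd's algorithm) can be sketched in a few lines of plain Python. This is an illustration, not the project code; the county values, the choice of k = 2, and the random initialisation are assumptions.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assigning points and updating centroids until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, new_labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

# Hypothetical counties as (% Obese, % Diabetic) pairs.
points = [(20.0, 6.0), (21.0, 7.0), (22.0, 6.5), (35.0, 12.0), (36.0, 13.0), (37.0, 12.5)]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary; what matters is which counties end up sharing a label, and those labels are what would be mapped to colors in the scatter plot.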

October 4, 2023

25th September, 2023

Principal Component Analysis was used to transform the features % Obese and % Inactive into two principal components, ordered by how much of the variance in the data they capture. The scatter plot shows how the data points are distributed in this new two-dimensional space. This was done to look for any pattern or cluster between the two.
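With only two features, PCA can be written out by hand from the 2x2 covariance matrix, which makes it easier to see what the components are. The sketch below centers the data but does not rescale it; the county values are hypothetical and the function name `pca_2d` is made up for this example.

```python
import math

def pca_2d(x, y):
    """PCA for two features via the eigen-decomposition of their 2x2 covariance matrix.

    Returns (eigenvalues, eigenvectors, scores), with the component that
    captures the most variance first.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(a * a for a in xc) / (n - 1)
    syy = sum(b * b for b in yc) / (n - 1)
    sxy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (half_trace + root, half_trace - root)
    # Direction of the first principal axis; the second is perpendicular to it.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    # Project each centered point onto the two axes.
    scores = [(a * v1[0] + b * v1[1], a * v2[0] + b * v2[1]) for a, b in zip(xc, yc)]
    return eigenvalues, (v1, v2), scores

# Hypothetical county values.
pct_obese = [28.1, 31.4, 25.3, 35.0, 30.2, 27.6]
pct_inactive = [20.5, 24.1, 18.2, 27.3, 22.8, 19.9]

eigenvalues, components, scores = pca_2d(pct_obese, pct_inactive)
explained_ratio = eigenvalues[0] / (eigenvalues[0] + eigenvalues[1])
print(components[0], explained_ratio)
```

Because there are two input features and two components, nothing is discarded: the transform rotates the axes so that the first one follows the direction of greatest spread, and `explained_ratio` shows how much of the total variance that first axis carries.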

October 4, 2023

22nd September, 2023

The accuracy score and confusion matrix are two essential tools for evaluating the performance of a logistic regression model. The accuracy score provides a general measure of how well the model is performing, while the confusion matrix provides more detailed information about the model’s ability to predict specific classes. In this case I got a low accuracy, which led me to move on to the next algorithm.
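Both measures are easy to compute by hand for a binary classifier, as the sketch below shows. The true labels and predictions are hypothetical and are not the model outputs from the project.

```python
def confusion_matrix(y_true, y_pred):
    """Counts (tn, fp, fn, tp) for binary labels encoded as 0 and 1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # true positive
        elif truth == 0 and pred == 0:
            tn += 1  # true negative
        elif truth == 0 and pred == 1:
            fp += 1  # false positive
        else:
            fn += 1  # false negative
    return tn, fp, fn, tp

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(tn, fp, fn, tp, acc)
```

Accuracy is just (tp + tn) divided by the total, so it can hide the fact that a model is much worse on one class than the other; the four counts make that visible.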

October 4, 2023

20th September, 2023

I conducted linear regression of diabetes rate on obesity rate. The slope of the fitted line indicates the change in diabetes rate for a one-unit change in obesity rate. In this case, the slope is positive, indicating that obesity and diabetes are positively correlated: as the obesity rate increases, the diabetes rate tends to increase as well. Next I am going to use logistic regression to estimate the probability that a county has a higher percentage of diabetics or a higher percentage of obese individuals.
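The slope and intercept of a one-predictor regression come from the ordinary least squares formulas, sketched here in plain Python. The county values are hypothetical and were chosen only so that the fitted line is easy to check by hand.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sum of co-deviations / sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical county values, not taken from the project data.
pct_obese = [20.0, 25.0, 30.0, 35.0]
pct_diabetic = [7.0, 8.0, 9.0, 10.0]

slope, intercept = fit_line(pct_obese, pct_diabetic)
print(slope, intercept)
```

For this toy data the slope is about 0.2, meaning each additional percentage point of obesity goes with roughly 0.2 additional percentage points of diabetes; a positive slope always goes together with a positive correlation.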

October 4, 2023

18th September, 2023

Learning about the concepts of linear regression, such as the y-intercept and slope, and how to interpret the results of a linear regression model. Also looking at how we can use this to solve our problem and how best to interpret the results.

October 4, 2023

15th September, 2023

Analyzing the dataset and breaking our problem down into the following points:

What exactly are we trying to solve? What are the desired outcomes?
What factors are influencing the problem?
Breaking the problem down into smaller parts makes the dataset easier to analyze and the problem easier to solve.
How should we approach this problem?