8th December,2023
Finalizing the report and working on the final stages of the results section, including all the relevant code and results.
6th December,2023
Started working on the report, adding all the generated results and compiling them together.
4th December,2023
Applied a neural network classifier using scikit-learn’s MLPClassifier and assessed its performance through cross-validation. First, the features are extracted from the DataFrame by selecting the columns ‘inquiry_type’, ‘how_contacted’, and ‘research_type’. The categorical variables are then one-hot encoded to convert them into a format suitable for the neural network.
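A rough sketch of what this step could look like; the toy DataFrame and the ‘target’ column below are made-up stand-ins for the real data:

```python
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical toy frame standing in for the research-services DataFrame.
df = pd.DataFrame({
    'inquiry_type':  ['email', 'phone', 'email', 'walk-in'] * 25,
    'how_contacted': ['web', 'phone', 'web', 'desk'] * 25,
    'research_type': ['genealogy', 'legal', 'academic', 'legal'] * 25,
    'target':        [0, 1, 0, 1] * 25,   # placeholder label
})

# One-hot encode the categorical features.
X = pd.get_dummies(df[['inquiry_type', 'how_contacted', 'research_type']])
y = df['target']

clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
print("Mean CV accuracy:", scores.mean())
```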
1st December,2023
Used bootstrapping, which involves repeatedly resampling with replacement from the original data to create multiple bootstrap samples. In this case, for each of 1000 iterations a bootstrap sample is drawn from the mean service times for each ‘inquiry_type’; the mean of each bootstrap sample is then computed and stored in the list ‘bootstrap_means’.
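A minimal sketch of the resampling loop, using synthetic service times in place of the real dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: service time per inquiry type.
df = pd.DataFrame({
    'inquiry_type': rng.choice(['email', 'phone', 'walk-in'], size=300),
    'service_time': rng.gamma(shape=2.0, scale=15.0, size=300),
})

# Mean service time for each inquiry type (the quantity being resampled).
group_means = df.groupby('inquiry_type')['service_time'].mean().values

bootstrap_means = []
for _ in range(1000):
    # Resample the group means with replacement and store the bootstrap mean.
    sample = rng.choice(group_means, size=len(group_means), replace=True)
    bootstrap_means.append(sample.mean())

# 95% confidence interval for the mean service time across inquiry types.
print(np.percentile(bootstrap_means, [2.5, 97.5]))
```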
29th November,2023
The features are standardized using z-score scaling, PCA is applied to derive the principal components, and the explained variance ratio of each component is printed. Additionally, the cumulative explained variance is plotted against the number of principal components, aiding in the identification of an optimal level of dimensionality reduction.
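Roughly, the scaling and PCA step looks like the following; the random feature matrix is only a placeholder for the actual columns:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical numeric feature matrix standing in for the project data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))

# Standardize with z-score scaling, then fit PCA on all components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Cumulative explained variance vs. number of components.
plt.plot(np.arange(1, X.shape[1] + 1),
         np.cumsum(pca.explained_variance_ratio_), marker='o')
plt.xlabel("Number of principal components")
plt.ylabel("Cumulative explained variance")
plt.show()
```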
27th November,2023
Created a binary variable indicating whether the service time is longer than 30 minutes. The model is used to predict this variable on the test set; the model’s accuracy is then evaluated and printed along with a confusion matrix, providing insight into its performance on the binary classification task.
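A small sketch of the binary setup; the classifier type (logistic regression) is my assumption, and the features and service times below are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical data: two numeric features and a service time in minutes.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
service_time = 25 + 10 * X[:, 0] + rng.normal(scale=5, size=400)

# Binary target: is the service time longer than 30 minutes?
y = (service_time > 30).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression().fit(X_train, y_train)   # assumed classifier
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```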
24th November,2023
I worked on the research dataset, which was divided into training and testing sets; computed the average service time for each type of study; and prepared the features and target variable for a linear regression model. Service times were then forecast on the test set, matplotlib was used to show the regression line, and Root Mean Squared Error (RMSE) was used to evaluate the model. The final result is a figure showing how the linear regression model predicts average service time depending on the type of study.
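A sketch of the regression and evaluation step, with made-up average service times standing in for the real training data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical single feature: the average service time for each research type.
rng = np.random.default_rng(0)
avg_time_per_type = rng.uniform(10, 60, size=200).reshape(-1, 1)
service_time = 5 + 0.9 * avg_time_per_type.ravel() + rng.normal(scale=4, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    avg_time_per_type, service_time, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE:", rmse)

# Scatter of the test set with the fitted regression line.
plt.scatter(X_test, y_test, label="actual")
plt.plot(X_test, y_pred, color="red", label="predicted")
plt.xlabel("Average service time for research type")
plt.ylabel("Service time")
plt.legend()
plt.show()
```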
22nd November,2023
I carried out a comprehensive analysis with the goal of highlighting important facets of organizational dynamics by examining pay scales by department, analyzing overtime patterns, and examining budgetary distributions. We will also look at the effect of educational background on income. I also intend to find anomalies and outliers, offering insights into departmental and individual performance.
20th November,2023
One statistical technique to determine if a sample’s mean deviates significantly from an assumed or known population mean is the Z-test. The Z-test, which is used in hypothesis testing, entails figuring out the Z-score, which expresses how far a data point deviates from the mean in standard deviations. An elevated Z-score indicates a noteworthy departure from the average. When the population standard deviation is known, this test is very helpful. The Z-test is a widely used statistical tool in many disciplines, including psychology, economics, and quality control.
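For illustration, a one-sample Z-test on made-up numbers looks like this:

```python
import numpy as np
from scipy import stats

# Illustrative example: does a sample mean differ from an assumed
# population mean of 100, with a known population standard deviation?
sample = np.array([102, 98, 105, 110, 99, 103, 107, 101, 104, 106])
pop_mean = 100
pop_std = 5    # assumed known population standard deviation

z = (sample.mean() - pop_mean) / (pop_std / np.sqrt(len(sample)))
p_value = 2 * (1 - stats.norm.cdf(abs(z)))   # two-sided p-value
print("Z =", round(z, 3), "p =", round(p_value, 4))
```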
17th November,2023
We discussed the type of dataset we should work on for our project and also discussed various time series algorithms.
Time series are broken down into their component parts using techniques such as Seasonal-Trend decomposition using LOESS (STL), while conventional models such as the Autoregressive Integrated Moving Average (ARIMA) handle forecasting. ETS models, or exponential smoothing state space models, account for error, trend, and seasonality. Complex dependencies are handled by machine learning models such as XGBoost, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM). Algorithms like Prophet and SARIMA specialize in seasonality-aware forecasting, and Holt-Winters exponential smoothing accommodates trend and seasonality in time series data.
15th November,2023
Studied time series analysis, which is commonly used in fields such as finance and meteorology and involves examining data points ordered over time. By examining patterns, trends, and fluctuations in temporal data, this statistical method frequently provides insights into underlying behaviors. Finding trends, seasonal patterns, and autocorrelation are essential elements, and methods like decomposition and smoothing help reveal important information. Forecasting and anomaly detection are crucial components for making predictions and identifying anomalies. Time series analysis helps to understand and exploit the temporal dependencies within data to make predictions and decisions by using techniques like the Autoregressive Integrated Moving Average (ARIMA).
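As a small illustration of the ARIMA approach, fitting and forecasting on a synthetic monthly series could look like:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Illustrative series: a trend plus noise, monthly from 2013, as a toy stand-in.
rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-01", periods=84, freq="MS")
y = pd.Series(100 + 0.5 * np.arange(84) + rng.normal(scale=2, size=84), index=idx)

# Fit an ARIMA(1, 1, 1) model and forecast the next 12 months.
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.forecast(steps=12))
```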
13th November,2023
Analyzed the dataset, which includes economic indicators tracked monthly by the Boston Planning & Development Agency (BPDA) from January 2013 to December 2019. The data reflects the BPDA’s efforts to track and analyze important metrics for well-informed decision-making in city planning and development. These metrics span a variety of economic aspects, including employment, housing, travel, and real estate development, which must be analyzed further to generate insights.
Project 2 Report
Project 1 Updated
10th November,2023
We have finalized our report and are about to complete the material for the project with a recap of our findings.
8th November,2023
Compiling the report and making the changes advised by the team members, specifically in the Discussion section and Appendix A. Comparing the results of other members and trying to explain the results and our findings in the most convincing way possible.
6th November,2023
Working on the draft of the report, specifically on the Findings and Discussion sections, where my main finding with respect to the dataset must be explained concisely and the Discussion must include the conclusions supported by my findings.
3rd November,2023
Made some edits to the project report and worked on using other algorithms to get better accuracy for our data. Furthermore, discussed the insights among the group members to keep them all consistent, and made some corrections based on their input.
1st November,2023
I started working on the report, gathering all the data results and writing the report according to the prescribed format. All the results are being added and a concise, to-the-point report is being created to explain my results and findings.
30th October,2023
Had some issues performing the clustering, which were resolved by switching techniques, as some methods did not produce sensible clusters. I modified the approach accordingly, using the techniques best suited to our analysis.
27th October,2023
Worked on using clustering techniques to group similar police shootings together so that I can identify different types of incidents and develop targeted interventions for each type, along with using PCA to reduce the dimensionality of the data and identify the most important factors that contribute to police shootings.
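A rough sketch of combining PCA with K-Means for this purpose, using random numbers in place of the actual incident features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical numeric features (e.g. age, longitude, latitude) for the incidents.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# Scale, reduce to two principal components, then cluster the incidents.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_pca)
print(np.bincount(labels))   # size of each cluster of incidents
```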
25th October,2023
Further looking at trends, correlations, and outliers: for example, how the number of police killings has changed over time, how the rate varies across different states, and the relationship between police killings and other factors such as poverty, crime rates, and the presence of body cameras.
23rd October,2023
Exploring the types of weapons involved in incidents and their impact on threat levels, and investigating the relationship between signs of mental illness and the outcome of incidents. The logistic regression was not predicting correctly, so I am trying to make changes; if that is not possible, I will move on to other classification models.
20th October,2023
Predicting signs of mental illness using logistic regression based on features like age, gender, and threat level, along with performing clustering on longitude and latitude to identify geographic regions with a higher frequency of fatal police shootings.
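A minimal sketch of both steps; the ages, labels, and coordinates below are made-up stand-ins for the real columns:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical features: age, gender (0/1), threat level (0/1), and a binary
# signs-of-mental-illness label standing in for the real columns.
X = np.column_stack([rng.integers(18, 80, 600),
                     rng.integers(0, 2, 600),
                     rng.integers(0, 2, 600)])
y = rng.integers(0, 2, 600)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Training accuracy:", clf.score(X, y))

# Geographic clustering on longitude/latitude to find high-frequency regions.
coords = np.column_stack([rng.uniform(-125, -67, 600),   # longitude
                          rng.uniform(25, 49, 600)])     # latitude
regions = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(coords)
print(np.bincount(regions))
```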
18th October,2023
Some more clarity with respect to the dataset was gained with the help of a correlation heatmap (a sketch of how such a heatmap can be produced follows the list below):
- There is a strong positive correlation between age and signs of mental illness. This means that older people are more likely to show signs of mental illness.
- There is a strong negative correlation between body camera usage and signs of mental illness. This means that people who have body cameras used on them are less likely to show signs of mental illness.
- There is a weak negative correlation between longitude and signs of mental illness. This means that people who live in more westerly locations are slightly less likely to show signs of mental illness.
- There is a weak positive correlation between latitude and signs of mental illness. This means that people who live in more northerly locations are slightly more likely to show signs of mental illness.
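A sketch of producing the correlation heatmap with seaborn; the column values here are random placeholders, not the real data:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical numeric columns standing in for the shootings dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'age': rng.integers(18, 80, 300),
    'signs_of_mental_illness': rng.integers(0, 2, 300),
    'body_camera': rng.integers(0, 2, 300),
    'longitude': rng.uniform(-125, -67, 300),
    'latitude': rng.uniform(25, 49, 300),
})

# Pairwise correlations visualized as a heatmap.
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()
```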
16th October,2023
Notably, several columns exhibit missing values, particularly the longitude/latitude fields, which must be addressed before performing the analysis, as they would otherwise distort the results. Key statistical insights reveal that the average age of the individuals involved is approximately 37.21 years, with a broad range (2 to 92 years). Additionally, the incidents’ locations are dispersed across a wide range of longitudes and latitudes, indicating a diverse geographic distribution of incidents.
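For illustration, checking and handling the missing coordinate fields in pandas could look like this (the tiny frame below is made up):

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame with missing coordinates, standing in for the real data.
df = pd.DataFrame({
    'age': [25, 37, np.nan, 52, 19],
    'longitude': [-71.06, np.nan, -118.24, -87.63, np.nan],
    'latitude': [42.36, np.nan, 34.05, 41.88, 39.95],
})

print(df.isna().sum())                             # missing values per column
print(df['age'].describe())                        # mean, min, max of age
df = df.dropna(subset=['longitude', 'latitude'])   # drop rows missing coordinates
```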
13th October,2023
I learned about logistic regression and how it can be applied to our dataset, analyzing the importance and usage of this algorithm to get better results and generate insights.
11th October,2023
I analyzed the data to understand what it is all about. I tried to grasp the essence of the data, clear my doubts about its content, and decide what problem I should solve while analyzing the dataset. The key to a good analysis is asking the right questions of the data and then finding their answers in the data.
Report on Exploring Obesity, Inactivity, and Diabetes Patterns in the United States
6th October,2023
Compiled all the report findings and completed the report. Lastly, discussed it once with all the team members to have everyone on the same page and decide whether anything needs to be added.
4th October,2023
Writing the report and discussing the results and findings, along with trying other techniques on the dataset to get better results or find something insightful for our project.
2nd October,2023
Drafted the report, in which the different elements and findings were combined and thoroughly reviewed. Also tested multiple regression to understand how a combination of independent variables (Year and Overall SVI) influences or predicts the value of the dependent variable (Diagnosed Diabetes Percentage).
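A small sketch of that multiple regression, with synthetic Year, Overall SVI, and diabetes percentages standing in for the real columns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical stand-ins for Year and Overall SVI as the independent variables.
year = rng.integers(2013, 2020, 300)
svi = rng.uniform(0, 1, 300)
X = np.column_stack([year, svi])

# Hypothetical diagnosed diabetes percentage as the dependent variable.
y = 5 + 0.1 * (year - 2013) + 4 * svi + rng.normal(scale=0.5, size=300)

model = LinearRegression().fit(X, y)
print("Coefficients (Year, Overall SVI):", model.coef_)
print("R^2:", model.score(X, y))
```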
29th September,2023
Starting on our key findings and discussion for the report and analyzing all of the team’s results together to get everyone aligned. This led to further clarity regarding all the algorithms that were applied.
27th September,2023
Used K-Means clustering to divide the counties into groups according to the proportion of obese and diabetic people in each county (% Obese and % Diabetic, respectively). Based on these two features, clusters are produced that help locate patterns and similarities in the data. Each cluster is drawn in a separate color to give a better visual representation of the dataset.
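Roughly, the clustering and colored scatter plot can be produced like this, using random county values as placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical county-level values for % Obese and % Diabetic.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(20, 45, 300),   # % Obese
                     rng.uniform(5, 20, 300)])   # % Diabetic

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Each cluster is drawn in a different color.
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.xlabel("% Obese")
plt.ylabel("% Diabetic")
plt.show()
```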
25th September,2023
Principal Component Analysis was used to reduce the features % Obese and % Inactive to two principal components, which capture the most variance in the data. The scatter plot shows how the data points are distributed in this new two-dimensional space. This was done to look for any pattern or clustering between the two features.
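A minimal sketch of the PCA projection, again with random placeholder values for the two features:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical county-level values for % Obese and % Inactive.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(20, 45, 300), rng.uniform(15, 35, 300)])

# Standardize, then project onto the two principal components.
X_pca = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

plt.scatter(X_pca[:, 0], X_pca[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```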
22nd September,2023
The accuracy score and confusion matrix are two essential tools for evaluating the performance of a logistic regression model. The accuracy score provides a general measure of how well the model is performing, while the confusion matrix provides more detailed information about the model’s ability to predict specific classes. In this case I got a low accuracy, which led me to move on to the next algorithm.
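For reference, the two metrics can be computed and read like this on toy labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy true vs. predicted labels to show how the two metrics are read.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)
```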
20th September,2023
I conducted linear regression and found that the slope of the line indicates the change in diabetes rate for a one-unit change in obesity rate. In this case, the slope is positive, indicating that obesity and diabetes are positively correlated: as the obesity rate increases, the diabetes rate tends to increase as well. Next, I am going to use logistic regression to estimate, for each county, whether it is more likely to have a higher percentage of diabetic individuals or a higher percentage of obese individuals.
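A small sketch of fitting the line and reading off the slope, with synthetic obesity and diabetes rates in place of the county data:

```python
import numpy as np
from scipy import stats

# Hypothetical county-level obesity and diabetes rates.
rng = np.random.default_rng(0)
obesity = rng.uniform(20, 45, 300)
diabetes = 2 + 0.25 * obesity + rng.normal(scale=1.5, size=300)

result = stats.linregress(obesity, diabetes)
print("Slope:", result.slope)          # positive slope: rates rise together
print("Intercept:", result.intercept)
print("r:", result.rvalue)
```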
18th September,2023
Learning about the different concepts of linear regression, such as the y-intercept and slope, how to interpret the results of a linear regression model, how we can use it to solve our problem, and how to interpret the results in the best way possible.
15th September,2023
Analyzing the dataset and breaking our problem down into:
- What exactly are we trying to solve? What are the desired outcomes?
- What factors are influencing the problem?
- Breaking down the problems and analyzing the dataset as it will make it easier to analyze and solve.
- How should we approach this problem?