NYC Crime Analysis
NYC Crime Analysis - An Advanced Data Science Project
The "New York City Crime Analysis" project uses data science techniques to explore and forecast crime trends in New York City. Its primary objective is to identify the safest areas to live in NYC by analyzing historical crime data and forecasting future crime rates.
To achieve this, the project employs a variety of technologies and methodologies:
- Data Collection and Preprocessing: The project utilizes data from NYC Open Data, Census Surveys, and EquityNYC. The data is cleaned and preprocessed to ensure accuracy and consistency, preparing it for detailed analysis.
- Exploratory Data Analysis (EDA): Visualization tools such as Plotly are used to explore the data, identify trends, and understand the distribution of different types of crimes (felonies, misdemeanors, violations, and infractions) across various precincts and boroughs of NYC.
- Time Series Analysis with ARIMA: The project uses the ARIMA (AutoRegressive Integrated Moving Average) model to forecast future crime rates from historical monthly crime data. Trend is handled through differencing, short-term dependence through the autoregressive and moving-average terms, and seasonality in the monthly data through seasonal terms (seasonal ARIMA), with the remainder treated as noise.
- Machine Learning: A multiple linear regression model is developed to analyze how socio-economic factors (such as median income, poverty rate, and unemployment) and the presence of schools relate to crime rates. The model quantifies the association between these variables and crime, which can point to potential causes and prevention measures.
- Geospatial Analysis: The project uses geospatial analysis to examine the relationship between schools and nearby crime. By analyzing crime data within a 500-meter radius of schools, it assesses whether proximity to a school is associated with lower crime.
- Visualization and Mapping: Interactive maps and visualizations are created to present the findings. Tools like Leaflet and OpenStreetMap are used to visualize crime distribution across NYC, highlighting the safest precincts and boroughs.
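The 500-meter radius check described under Geospatial Analysis can be sketched as follows. This is a minimal illustration in plain Python, not the project's code: the function names `haversine_m` and `crimes_near_schools` and all coordinates are hypothetical, and the haversine great-circle formula is an assumed way of measuring distance, since the text does not say how distances are computed.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def crimes_near_schools(schools, crimes, radius_m=500.0):
    """Count crimes within radius_m of each school.

    schools: dict mapping a school name to (lat, lon)
    crimes:  list of (lat, lon) crime locations
    """
    counts = {}
    for name, (slat, slon) in schools.items():
        counts[name] = sum(
            1
            for clat, clon in crimes
            if haversine_m(slat, slon, clat, clon) <= radius_m
        )
    return counts


# Hypothetical coordinates, for illustration only (not project data).
schools = {"School A": (40.7500, -73.9900)}
crimes = [
    (40.7500, -73.9900),  # same point as the school
    (40.7530, -73.9900),  # roughly 334 m north
    (40.7600, -73.9900),  # roughly 1.1 km north
]
print(crimes_near_schools(schools, crimes))  # {'School A': 2}
```

A brute-force loop like this compares every school with every crime, which is fine for a sketch; a city-wide dataset would likely call for a spatial index or a dedicated geospatial library.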
The core questions the project aims to answer are:
- Which are the safest precincts and boroughs in NYC in terms of crime rates?
- How do socio-economic factors influence crime rates in different areas?
- What is the impact of schools on local crime rates?
- How will crime trends evolve in the near future?
By addressing these questions, the project provides valuable insights for individuals looking to relocate to NYC and policymakers aiming to improve public safety.
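The second question, how socio-economic factors relate to crime rates, is the one the project's linear regression addresses. The sketch below shows the idea with ordinary least squares solved through the normal equations in plain Python. It is illustrative only: the project lists R for its modelling, the function names `solve_linear_system` and `fit_ols` are hypothetical, and the data is synthetic, generated from known coefficients so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic predictors per area (not project data):
# (median income in $1000s, poverty rate in %, unemployment rate in %)
rows = [(60, 15, 6), (85, 10, 4), (45, 25, 9), (120, 5, 3), (70, 18, 5)]

# Synthetic response built from known coefficients, so the fit is checkable.
true_coefs = (20.0, -0.1, 0.8, 1.5)  # intercept, income, poverty, unemployment
y = [
    true_coefs[0] + true_coefs[1] * inc + true_coefs[2] * pov + true_coefs[3] * un
    for inc, pov, un in rows
]

coefs = fit_ols(rows, y)
print([round(c, 6) for c in coefs])
```

Because the response is generated without noise, the fitted coefficients should match `true_coefs` up to floating-point error. With real data, the sign and size of each coefficient describe an association with crime rates, not a proven cause.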
Concepts Used:
- Machine Learning (tidymodels + parsnip)
- R
- Python (Selenium)
- Quarto
- ARIMA
- Random walk
- Geospatial Data
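The random walk and ARIMA entries in this list are closely related: a random walk with drift has the same point forecast as ARIMA(0,1,0) with a constant, the simplest member of the ARIMA family. The sketch below illustrates that baseline in plain Python. It is an assumption-laden example, not the project's fitted model: the model orders used in the project are not stated here, the function name `random_walk_forecast` is hypothetical, and the monthly counts are made up.

```python
def random_walk_forecast(series, steps, drift=True):
    """Forecast `steps` values ahead with a random walk, optionally with drift.

    Each forecast is the last observed value plus h times the average
    historical change (the drift), for horizons h = 1..steps.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # The mean of the first differences simplifies to (last - first) / (n - 1).
    step = (series[-1] - series[0]) / (len(series) - 1) if drift else 0.0
    return [series[-1] + step * h for h in range(1, steps + 1)]


# Hypothetical monthly crime counts, for illustration only.
monthly_counts = [410, 395, 420, 430, 415, 440]

print(random_walk_forecast(monthly_counts, 3))               # [446.0, 452.0, 458.0]
print(random_walk_forecast(monthly_counts, 3, drift=False))  # [440.0, 440.0, 440.0]
```

A baseline like this is useful mainly as a yardstick: a fitted ARIMA model with autoregressive, moving-average, or seasonal terms is worth its complexity only if it forecasts held-out months better than the random walk does.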