Context
Lancaster City Council faces a growing number of fly-tipping incidents in the region. Fly-tipping poses environmental and health hazards; more than 1 million incidents were reported in England in 2018-2019. This project analyzes fly-tipping data for the Lancaster area from September 2015 to October 2021 to identify hotspots, characterize typical incidents, and forecast future incidents, so that the council can allocate resources and target preventative measures more effectively.
Requirements
Hotspot Identification: Identify fly-tipping hotspots in the Lancaster City region.
Incident Characteristics: Determine common characteristics of fly-tipping incidents (e.g., waste types, locations).
Predictive Analysis: Predict future fly-tipping incidents and locations.
Incident-Location Links: Analyze potential links between incident types and locations.
Focused Enforcement: Recommend areas for focused enforcement efforts.
Visualization and Tools: Develop visualizations and tools to aid council staff in understanding and addressing the issue.
Approach
Data Preprocessing:
Data Cleaning:
Clean and format raw data from Lancaster City Council.
Remove irrelevant columns and handle missing values.
Date Conversion: Convert date formats and remove outliers.
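The preprocessing steps above could look roughly like the following pandas sketch. The column names (`incident_date`, `ward`, `waste_type`, `internal_ref`) and the sample rows are hypothetical placeholders, not the council's actual schema, and outlier removal is assumed here to mean dropping records dated outside the study window.

```python
import pandas as pd

# Hypothetical raw extract; the real council export will have different columns.
raw = pd.DataFrame({
    "incident_date": ["01/09/2015", "15/03/2018", "not recorded", "20/10/2021", "05/01/2030"],
    "ward": ["Ward A", "Ward B", "Ward A", None, "Ward C"],
    "waste_type": ["Household", "Tyres", "Household", "Green", "Household"],
    "internal_ref": [101, 102, 103, 104, 105],  # assumed irrelevant to the analysis
})

# Remove columns that are not needed for the analysis.
df = raw.drop(columns=["internal_ref"])

# Convert day-first date strings; values that cannot be parsed become NaT.
df["incident_date"] = pd.to_datetime(
    df["incident_date"], format="%d/%m/%Y", errors="coerce"
)

# Handle missing values: drop rows that lack a date or a ward.
df = df.dropna(subset=["incident_date", "ward"])

# Treat dates outside the study window (Sep 2015 to Oct 2021) as outliers.
start, end = pd.Timestamp("2015-09-01"), pd.Timestamp("2021-10-31")
df = df[df["incident_date"].between(start, end)].reset_index(drop=True)

print(df)
```

Dropping incomplete rows is only one way to handle missing values; imputing or flagging them may suit the real data better.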
Exploratory Data Analysis (EDA):
Frequency Analysis: Analyze frequency of incidents by location, waste type, and time.
Identification of Key Factors: Identify top locations and waste types for fly-tipping.
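As a minimal sketch of the frequency analysis described above, the counts by location, waste type, and month can be derived with pandas. The data and column names below are illustrative assumptions, not project results.

```python
import pandas as pd

# Hypothetical cleaned incidents; values are illustrative only.
incidents = pd.DataFrame({
    "incident_date": pd.to_datetime([
        "2019-01-05", "2019-01-20", "2019-02-11",
        "2019-02-12", "2019-03-03", "2019-03-30",
    ]),
    "ward": ["Ward A", "Ward A", "Ward B", "Ward A", "Ward C", "Ward B"],
    "waste_type": ["Household", "Tyres", "Household", "Household", "Green", "Household"],
})

# Frequency by location and by waste type, most common first.
by_ward = incidents["ward"].value_counts()
by_waste = incidents["waste_type"].value_counts()

# Frequency over time: number of incidents per calendar month.
by_month = incidents.groupby(incidents["incident_date"].dt.to_period("M")).size()

# Key factors: the top-n locations and waste types.
top_wards = by_ward.head(2).index.tolist()
top_waste = by_waste.head(1).index.tolist()

print(by_ward, by_waste, by_month, sep="\n\n")
```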
Clustering:
Hotspot Identification: Apply K-means clustering to incident locations and create interactive maps to visualize the clusters and incident details.
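The clustering step might be sketched as follows with scikit-learn's `KMeans`. The coordinates are synthetic points around three made-up centres, and the choice of three clusters is an assumption for this toy data; in practice the number of clusters would need to be chosen from the real incidents (for example with an elbow plot). Map rendering is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic (latitude, longitude) points scattered around three made-up centres.
centres = np.array([[54.05, -2.80], [54.07, -2.86], [54.01, -2.78]])
coords = np.vstack([c + rng.normal(scale=0.003, size=(40, 2)) for c in centres])

# n_clusters=3 is an assumption that matches the toy data above.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

# Rank clusters by incident count; the largest are candidate hotspots.
counts = np.bincount(labels, minlength=3)
for cluster_id in np.argsort(counts)[::-1]:
    lat, lon = kmeans.cluster_centers_[cluster_id]
    print(f"cluster {cluster_id}: {counts[cluster_id]} incidents near ({lat:.4f}, {lon:.4f})")
```

K-means on raw degrees treats latitude and longitude as flat Euclidean coordinates, which is a rough approximation over a small area; projecting to metres first would be more accurate.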
Time Series Analysis:
Forecasting: Apply ARIMA models to forecast future incidents for key wards.
Model Diagnostics: Run diagnostic tests on the fitted models to assess model fit.
Data Visualization:
Graphical Representation: Create graphs showing incidents by waste type, ward, and land type over time.
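One possible form for such a graph is a stacked bar chart of yearly incidents per waste type, sketched here with pandas and Matplotlib. The data, column names, and output filename are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical incidents; values are illustrative only.
incidents = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021],
    "waste_type": ["Household", "Tyres", "Household", "Household",
                   "Green", "Household", "Tyres", "Household"],
})

# Rows are years, columns are waste types, values are incident counts.
yearly = incidents.groupby(["year", "waste_type"]).size().unstack(fill_value=0)

fig, ax = plt.subplots(figsize=(7, 4))
yearly.plot(kind="bar", stacked=True, ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Number of incidents")
ax.set_title("Fly-tipping incidents by waste type (illustrative data)")
fig.tight_layout()
fig.savefig("incidents_by_waste_type.png")
```

Replacing `waste_type` with a ward or land-type column would give the other breakdowns in the same way.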
Deep Dive Analysis:
Comparison of Locations: Compare the characteristics of the four locations with the most incidents and the four with the fewest.
Factor Analysis: Analyze factors like bin collection schedules and nearby recycling facilities.
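The comparison above can be sketched as a simple group summary. Every value below is invented purely to show the mechanics and implies nothing about the project's findings; the factor columns (`collections_per_month`, `km_to_recycling_centre`) are hypothetical names.

```python
import pandas as pd

# Invented per-location summary; not real council data.
locations = pd.DataFrame({
    "location": [f"Location {c}" for c in "ABCDEFGHIJ"],
    "incidents": [120, 95, 88, 76, 40, 33, 12, 9, 7, 3],
    "collections_per_month": [2, 2, 2, 4, 4, 2, 4, 4, 4, 4],
    "km_to_recycling_centre": [6.5, 5.0, 7.2, 4.8, 3.0, 3.5, 1.2, 2.0, 0.8, 1.5],
})

# The four locations with the most and the fewest incidents.
top4 = locations.nlargest(4, "incidents")
bottom4 = locations.nsmallest(4, "incidents")

# Compare the average of each candidate factor between the two groups.
factors = ["collections_per_month", "km_to_recycling_centre"]
comparison = pd.DataFrame({
    "top_4_mean": top4[factors].mean(),
    "bottom_4_mean": bottom4[factors].mean(),
})
print(comparison)
```

With only four locations per group, differences in means are descriptive at best and should not be read as causal.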
Technologies Used
RStudio: For data preprocessing and statistical analysis.
Python: For clustering, time series analysis, and visualization.
Libraries:
Scikit-learn: For K-means clustering.
Pandas & NumPy: For data manipulation.
Matplotlib & Seaborn: For data visualization.
ARIMA Models: For time series forecasting.