Agriculture Data Processing with Python
Updated: Nov 26, 2024

Link to the code: https://github.com/Chauloroches/Agricultural_Data_Processing_Python
Cleaned, processed, and visualized agricultural datasets using Python and pandas, identifying key trends that informed productivity improvements and sales forecasting.
This project has two files:
crop_production.csv

Sales_Data
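The snippets in this post use two data frames named production and sales. A minimal sketch of how they could be loaded, assuming both files are CSVs readable by pandas; the helper name load_datasets and the file name Sales_Data.csv are assumptions, since the post only gives the name Sales_Data:

```python
import pandas as pd


def load_datasets(production_path, sales_path):
    """Read the two project files into pandas data frames.

    Hypothetical helper: the paths are supplied by the caller, and both
    files are assumed to be plain CSVs with a header row.
    """
    production = pd.read_csv(production_path)
    sales = pd.read_csv(sales_path)
    return production, sales


# Example call (the second file name and extension are assumed):
# production, sales = load_datasets("crop_production.csv", "Sales_Data.csv")
```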

Cleaning Data
I start with a summary of the sales and production data frames.
This tells me what the columns look like and how many rows and columns each data frame has.
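One common way to get this kind of summary is info() together with shape. The sketch below runs on a tiny made-up sales frame (the values are placeholders, not the project's data); the same calls would apply to production:

```python
import pandas as pd

# Toy stand-in for the sales data; the values are invented for illustration.
sales = pd.DataFrame({"Month": [1, 1, 2], "Sales_Volume": [120.0, 120.0, 95.5]})

sales.info()               # column names, dtypes, and non-null counts
n_rows, n_cols = sales.shape
print(n_rows, n_cols)      # number of rows, then number of columns
```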
I also check quick summary statistics for the numerical columns of both data frames:
sales.describe()
production.describe()
I then check for total missing values in sales and production as shown below:
sales.isnull().sum()
production.isnull().sum()
I then check for duplicates in both data frames:
sales.duplicated()
production.duplicated()
I now remove duplicates and assign the result back, so that the analysis below runs on the cleaned data frames:
sales = sales.drop_duplicates()
production = production.drop_duplicates()
production
sales
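Since duplicated() returns one True/False value per row, summing it gives a quick duplicate count. A small sketch of the check and the removal on a made-up frame (the values and the names dup_mask and deduped are only for illustration):

```python
import pandas as pd

# Toy frame where the second row repeats the first.
sales = pd.DataFrame({"Month": [1, 1, 2], "Sales_Volume": [120.0, 120.0, 95.5]})

dup_mask = sales.duplicated()        # True for each row that repeats an earlier row
n_duplicates = int(dup_mask.sum())   # count of duplicate rows
deduped = sales.drop_duplicates()    # keeps the first occurrence of each row

print(n_duplicates)   # 1
print(len(deduped))   # 2
```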
Exploratory Data Analysis (EDA)
Top crops by average yield per hectare:
top_crops = production.groupby("Crop")["Yield_per_Hectare"].mean().sort_values(ascending=False)
top_crops

Monthly trends in sales volume:
monthly_sales = sales.groupby("Month")["Sales_Volume"].sum()
monthly_sales
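One thing to watch for: groupby only returns months that actually appear in the data. The line chart later in this post sets ticks for months 1 to 12, so it can help to reindex the result over all twelve months. This is a sketch on made-up numbers, assuming Month is stored as an integer from 1 to 12 and that a missing month means zero sales:

```python
import pandas as pd

# Toy sales data with no rows for month 2 (values are invented).
sales = pd.DataFrame({"Month": [1, 1, 3], "Sales_Volume": [10, 5, 7]})

monthly_sales = sales.groupby("Month")["Sales_Volume"].sum()

# Fill in the months that have no rows, treating them as zero sales (an assumption).
monthly_sales_full = monthly_sales.reindex(range(1, 13), fill_value=0)
print(monthly_sales_full)
```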

Seasonal trends:
seasons_yield = production.groupby(['Season','Crop'])["Yield_per_Hectare"].mean().unstack()
seasons_yield
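To make the shape of this result concrete: unstack() moves the last grouping level (Crop) into the columns, so each row is a season and each column is a crop. A sketch with made-up numbers (the season names and yields are placeholders):

```python
import pandas as pd

# Toy production data; the values are invented for illustration.
production = pd.DataFrame({
    "Season": ["Wet", "Wet", "Wet", "Dry", "Dry"],
    "Crop": ["Maize", "Maize", "Soybean", "Maize", "Soybean"],
    "Yield_per_Hectare": [4.0, 6.0, 2.0, 3.0, 1.0],
})

seasons_yield = (
    production.groupby(["Season", "Crop"])["Yield_per_Hectare"]
    .mean()      # one value per (Season, Crop) pair
    .unstack()   # Crop becomes the columns, Season stays as the index
)
print(seasons_yield)
```

Because the index is Season, a bar plot of this table puts seasons on the x-axis and draws one bar per crop within each season.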

Visualization
Plotting a bar chart of average yield per crop:
import matplotlib.pyplot as plt
top_crops.plot(kind="bar", figsize=(8, 5),colormap="Set2")
plt.ylabel("Yield (tons per hectare)",labelpad=10)
plt.xlabel("Crop",labelpad=10)
plt.title("Average Yield per Crop",y=1.02)
plt.show()

Plotting a line chart of monthly sales
This line plot shows the total sales volume for each month, making it easy to spot any seasonal patterns in sales.
monthly_sales.plot(kind="line", figsize=(6, 4),marker="o")
plt.ylabel("Sales Volume (tons)",labelpad=10)
plt.xlabel("Month",labelpad=10)
plt.title("Monthly Sales Volume",y=1.02)
plt.grid(True)
plt.xticks(range(1,13))
plt.show()

Plotting Seasonal Yield Trends for all the crops
seasons_yield.plot(kind='bar',figsize=(8,5),colormap="Set2")
# The x-axis shows the index of seasons_yield, which is Season; crops appear in the legend
plt.xlabel('Season',labelpad=10)
plt.ylabel('Yield (tonnes per hectare)',labelpad=10)
plt.title('Seasonal Yield per Hectare',y=1.02)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Insights and Report
Having done the analysis and visualization, we concluded two major things:
1. Maize had the highest average yield per hectare and Soybean had the lowest.
2. The months with the highest sales volumes were month 6, followed by month 4.
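Conclusions like these can also be read off the aggregated series directly instead of from the charts. A sketch using idxmax, idxmin, and nlargest; the numbers below are placeholders chosen for illustration, not the project's actual results:

```python
import pandas as pd

# Placeholder values standing in for the real top_crops and monthly_sales series.
top_crops = pd.Series({"Maize": 5.0, "Wheat": 3.5, "Soybean": 1.5})
monthly_sales = pd.Series({4: 300, 5: 200, 6: 350, 7: 100})

best_crop = top_crops.idxmax()                        # label of the highest average yield
worst_crop = top_crops.idxmin()                       # label of the lowest average yield
top_months = monthly_sales.nlargest(2).index.tolist() # two months with the largest totals

print(best_crop, worst_crop, top_months)
```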