Food Waste - Exploring “Food Loss and Waste” Database from FAO

March 29, 2022 7 minute read

In this post, we’ll take a quick look at the Food Loss and Waste Database (FLW DB) from the Food and Agriculture Organization (FAO) of the United Nations. The post will be mostly about data visualization and the limitations of the dataset.

I used Python and Pandas/Seaborn/Matplotlib libraries to visualize the data. You can find the full Python scripts on my GitHub repository.

TL;DR

As described in the Background of the Food Loss and Waste Database page, the dataset has clear limitations. The data collection research for food waste and loss seems to be at a very early stage, so it’s hard to draw a meaningful and insightful conclusion just by looking at the dataset. However, since data visualization is fun, I plotted some satisfying graphs.

Database Overview

Downloading the Data

Please follow the below steps to download the dataset. (Note: The following steps are valid as of March 2022)

Go to FAO’s FLW data page. link.
Select “Year Range.” I used 1965 - 2021 for this post.
Download data by clicking the “Download Data” button on the bottom left of the page.

Here are the first few rows of the dataset.

# read and show the first few rows of the dataframe
df = pd.read_csv('./data/Data.csv')
df[:8]

	m49_code	country	region	cpc_code	commodity	year	loss_percentage	loss_percentage_original	loss_quantity	activity	food_supply_stage	treatment	cause_of_loss	sample_size	method_data_collection	reference	url	notes
0	104	Myanmar	NaN	0142	Groundnuts, excluding shelled	2009	5.22	5.22%	68100	NaN	Whole supply chain	NaN	NaN	NaN	FAO's annual Agriculture Production Questionna...	FAO Sources	NaN	NaN
1	104	Myanmar	NaN	0142	Groundnuts, excluding shelled	2008	5.43	5.43%	65240	NaN	Whole supply chain	NaN	NaN	NaN	FAO's annual Agriculture Production Questionna...	FAO Sources	NaN	NaN
2	104	Myanmar	NaN	0142	Groundnuts, excluding shelled	2007	5.61	5.61%	61080	NaN	Whole supply chain	NaN	NaN	NaN	FAO's annual Agriculture Production Questionna...	FAO Sources	NaN	NaN
3	104	Myanmar	NaN	0142	Groundnuts, excluding shelled	2006	5.40	5.4%	55270	NaN	Whole supply chain	NaN	NaN	NaN	FAO's annual Agriculture Production Questionna...	FAO Sources	NaN	NaN
4	104	Myanmar	NaN	0142	Groundnuts, excluding shelled	2005	5.00	5%	51970	NaN	Whole supply chain	NaN	NaN	NaN	FAO's annual Agriculture Production Questionna...	FAO Sources	NaN	NaN
5	104	Myanmar	NaN	0142	Groundnuts, excluding shelled	2004	5.00	5%	47310	NaN	Whole supply chain	NaN	NaN	NaN	FAO's annual Agriculture Production Questionna...	FAO Sources	NaN	NaN
6	104	Myanmar	NaN	0142	Groundnuts, excluding shelled	2003	5.00	5%	43880	NaN	Whole supply chain	NaN	NaN	NaN	FAO's annual Agriculture Production Questionna...	FAO Sources	NaN	NaN
7	108	Burundi	NaN	0111	Wheat	2020	3.50	3.5	NaN	Shelling, Threshing	Farm	NaN	NaN	NaN	Modelled Estimates	NaN	https://www.aphlis.net/en/page/20/data-tables#...	NaN

Columns

Here are the columns that I’ll be primarily using for this post. I believe the column names are self-explanatory, but please look at the Background of the Food Loss and Waste Database page for the full details.

year
- e.g. 2017, 2018, …
country
- There are 149 unique countries.
- e.g. ‘Austria’, ‘Jordan’, ‘Kenya’, …
commodity
- There are 194 unique commodities.
- e.g. ‘Apples’, ‘Pears’, ‘Apricots’, ‘Cherries’, ‘Peaches and nectarines’, ‘Plums and sloes’, ‘Kiwi fruit’, ‘Strawberries’, ‘Blueberries’, …
food_supply_stage
- There are 20 unique food supply stages.
- e.g. ‘Whole supply chain’, ‘Farm’, ‘Harvest’, ‘Storage’, ‘Processing’, ‘Trader’, ‘Retail’, …
loss_percentage
- e.g. 5%, 13.5%, 43.2%…

Data Visualization

I apologize for the small font size of the plots. I’m too lazy to rewrite the code. Please open the images in new windows to see the plots better.

Which Commodity Has The Most Food Waste?

This section visualizes loss percentages for each commodity.

The U.S. (Whole Supply Chain)

Let’s first look at the U.S. data. I’ll be looking at the “Whole supply chain” data from 2005.

country = 'United States of America'
year = 2005
supply_stage = "Whole supply chain"
df_filtered = df[(df['country'] == country) & 
                 (df['food_supply_stage'] == supply_stage) & 
                 (df['year'] > year)]

plot_order = df_filtered.groupby(by='commodity').mean().sort_values(by='loss_percentage', ascending=False).index.tolist()
fig, ax = plt.subplots(figsize=(20, 6))
plot = sns.barplot(x='commodity', y='loss_percentage', data=df_filtered, order=plot_order, ci='sd')
ax.set_xticklabels(ax.get_xticklabels(), rotation = 45, horizontalalignment='right')
ax.set_title(f"Food Loss ({supply_stage}) by Commodity in {country}, (data from {year}, vertical bars show stddev)".title())
ax.set(xlabel=None)
plt.show()

png

Here are some interesting findings.

“Grapefruit juice” has a large loss percentage while “Grapefruits” has a small loss percentage.
“Mixed grain” loss is large while individual grains (“Wheat” and “Barley”) losses are very small.
There are no data for meat. I don’t see beef, pork, and chicken.

Other Countries (Whole Supply Chain)

Here are some plots for different countries that used the same code as the above. There are more plots in my repo.

Canada

png

Canada data is very similar to U.S. data.
There is “Meat of pig, fresh or chilled” in Candata data, which is not found in the U.S. data.
Austria, Germany, Hungary

The followings data are from three European countries.

png png png

The above plots are mostly about grains (e.g., Rye, Wheat, Barley). I guess grain data in these countries were easier to collect than other commodities for FAO.
“Plums and sloes” loss is low in Hungary (about 2.5%), while the same number is high for Austria (about 28%). It’s hard to believe that Austria is ten times more efficient in processing plums than Hungary. I looked at the “reference” and “method_data_collection” of the rows, but they seem to be the same. To me, this seems like a limitation of the dataset - even though the data collection method is the same, it’s hard to believe the numbers.

year = 2005
supply_stage = "Whole supply chain"
countries = ["Hungary", "Austria"]
commodity = "Plums and sloes"
df_filtered = df[df['country'].isin(countries) & 
                 (df['food_supply_stage'] == supply_stage) & 
                 (df['year'] > year) & (df['commodity'] == commodity)]
df_filtered[['country', 'commodity', 'loss_percentage', 'food_supply_stage', 'method_data_collection','reference', 'year']]

	country	commodity	loss_percentage	food_supply_stage	method_data_collection	reference	year
7720	Hungary	Plums and sloes	2.40	Whole supply chain	FAO's annual Agriculture Production Questionna...	FAO Sources	2014
10267	Austria	Plums and sloes	26.12	Whole supply chain	FAO's annual Agriculture Production Questionna...	FAO Sources	2007
10282	Austria	Plums and sloes	35.30	Whole supply chain	FAO's annual Agriculture Production Questionna...	FAO Sources	2006

Mexico, Peru

png png

“Watermelons” loss percentages seem too different for these two countries. Mexico is about 0.4%, and Peru is about 45%.
The Mexico data doesn’t have a reference, so the large difference makes sense. One can imagine how difficult collecting food waste data is.

year = 2005
supply_stage = "Whole supply chain"
countries = ["Mexico", "Peru"]
commodity = "Watermelons"
df_filtered = df[df['country'].isin(countries) & (df['food_supply_stage'] == supply_stage) & (df['year'] > year) & (df['commodity'] == commodity)]
df_filtered[['country', 'commodity', 'loss_percentage', 'food_supply_stage', 'method_data_collection','reference', 'year']]

	country	commodity	loss_percentage	food_supply_stage	method_data_collection	reference	year
13919	Mexico	Watermelons	0.42	Whole supply chain	NaN	NaN	2011
17149	Peru	Watermelons	45.19	Whole supply chain	FAO's annual Agriculture Production Questionna...	FAO Sources	2007
17178	Peru	Watermelons	45.17	Whole supply chain	FAO's annual Agriculture Production Questionna...	FAO Sources	2006

Food Loss Per Commodity (Whole Supply Chain)

As we have seen from the Watermelon data from Mexico and Peru, there might be a massive difference if the data collection methods are different. From now on, I’ll only use the data where the reference is “FAO Sources” and “method_data_collection” is “FAO’s annual Agriculture Production Questionnaires.” The below box plot shows the food loss percentage distribution per commodity from all the countries.

supply_stage = "Whole supply chain"
year = 2005
reference = "FAO Sources"
data_collection_method = "FAO's annual Agriculture Production Questionnaires"
df_filtered = df[(df['food_supply_stage'] == supply_stage) & 
                 (df['year'] >= year) & 
                 (df['reference'] == reference) & 
                 (df['method_data_collection'] == data_collection_method)]
# Get commodities with data points from many countries.
n_country_threshold = 6  # We only use commidity where data is collected from more than n_country_threshold countries. 
unique_country_commodity_pairs = set(zip(df_filtered['country'], df_filtered['commodity'])) # unique (country, commodity) pairs
commodity_counter = Counter([commodity_ for _, commodity_ in unique_country_commodity_pairs]) 
common_commodities = set({x: count for x, count in commodity_counter.items() if count >= n_country_threshold}.keys())
df_filtered = df_filtered[df_filtered['commodity'].isin(common_commodities)]
commodity_plot_order = df_filtered.groupby(by='commodity').mean().sort_values(by='loss_percentage', ascending=False).index.tolist()
fig, ax = plt.subplots(figsize=(25, 6))
plot = sns.boxplot(x='commodity', y='loss_percentage', data=df_filtered, order=commodity_plot_order)
ax.set_xticklabels(ax.get_xticklabels(), rotation = 45, horizontalalignment='right')
ax.set_title(f"Food Loss ({supply_stage}) from all countries (from {year}, reference={reference})".title() )
ax.set(xlabel=None)
plt.show()

png

By looking at the plot, it seems like fruits (e.g., tomatoes, watermelons) produce more waste than grains (e.g., oats, rye)

Which Country Has the Highest Food Loss?

The below plots show how much food loss percentage for “Potatoes”, “Wheat”, “Rice” and “Hen eggs in shell, fresh” for different countries. We can interpret how efficient/inefficient each country’s supply chain is for various commodities.

supply_stage = "Whole supply chain"
year = 2005
reference = "FAO Sources"
data_collection_method = "FAO's annual Agriculture Production Questionnaires"
df_filtered = df[(df['food_supply_stage'] == supply_stage) & 
                 (df['year'] >= year) & 
                 (df['reference'] == reference) & 
                 (df['method_data_collection'] == data_collection_method)]
unique_country_commodity_pairs = set(zip(df_filtered['country'], df_filtered['commodity'])) # unique (country, commodity) pairs
common_commodities = [commodity for commodity, _ in Counter([commodity for _, commodity in unique_country_commodity_pairs]).most_common(10)]

commodity_to_plot = ['Potatoes',
                     'Wheat', 
                     'Rice',
                     'Hen eggs in shell, fresh']
for commodity in common_commodities:
    if commodity not in commodity_to_plot:
        continue
    df_filtered_by_commodity = df_filtered[df_filtered['commodity'] == commodity]
    fig, ax = plt.subplots(figsize=(15, 6))
    plot_order = df_filtered_by_commodity.groupby(by='country').mean().sort_values(by='loss_percentage', ascending=False).index.tolist()
    sns.barplot(x='country', y='loss_percentage', data=df_filtered_by_commodity, ax=ax, order=plot_order)
    ax.set_xticklabels(ax.get_xticklabels(), rotation = 45, horizontalalignment='right')
    ax.set_title(f"'{commodity}' Loss ({supply_stage}) per Country (from {year}, Vertical bars show stddev)".title() )
    ax.set(xlabel=None)
    ax.set_ylabel(f"{commodity} loss percentage")
    plt.show()

png

Limitation of the Dataset

If you look at the Background of the Food Loss and Waste Database page, FAO describes the limitations of the dataset.

Often studies will sum the loss percentages across the supply chain. This tends to inflate the overall percentage losses and negate that the quantities of the commodities decrease through the supply chain with respect to losses and other utilizations.

The issue of food loss may not be fully captured within national estimates as they often present the lower bound of the problem. On the other hand, studies done with small sample sizes may not be fully representative, even though they represent the upper bound of the problem.

The figures show differences between estimates along different stages of the value chains and regions, making a one-size fits all estimate at the international level less practical when addressing losses. Considering how these estimates change over time by stage and ranges, it will be more effective to look at supply stages which fluctuate in their losses in order to focus on effective policy and investments.

Effective comparison of data is presently hindered by a lack of comparability on many levels: different measurement methodologies across space and commodities, and estimates that may not be repeated over time.

There may be underlying biases when it comes to the data that has been collected, both at the national level and within the individual studies and the focus will be put on problematic areas

In the case studies, the sample sizes may not be large enough to extrapolate causes and solutions to a general policy prescription.

IMO, this means we can’t draw any conclusions from the data. I appreciate and highly think of the initial effort to collect the data. Still, I look forward to more standardized data collection so people can do more insightful analyses.

Share on

Twitter Facebook LinkedIn

Comments

M

July 16, 2022 at 04:09 AM

Very interesting

Your email address will not be published. Required fields are marked *

Donghyeon Won

Food Waste - Exploring “Food Loss and Waste” Database from FAO

TL;DR

Database Overview

Downloading the Data

Columns

Data Visualization

Which Commodity Has The Most Food Waste?

The U.S. (Whole Supply Chain)

Other Countries (Whole Supply Chain)

Canada

Austria, Germany, Hungary

Mexico, Peru

Food Loss Per Commodity (Whole Supply Chain)

Which Country Has the Highest Food Loss?

Limitation of the Dataset

Share on

Comments

M

Leave a comment

You may also enjoy

2024 Recap — Books

Satellite Image Analysis - NDVI Change Over Time

2024 Recap — Concerts

Half-Earth: Our Planet’s Fight for Life - E.O. Wilson