Global Data Science Forum

Call for Code: Water Sustainability Datasets

By Susan Malaika posted 14 days ago

  

Webinar Series starting March 2020:

Sustainable food systems and nutrition: Food Post-Harvest Losses


There is an upcoming webinar on March 25, 2020, organized by Agreenium (l'Institut agronomique, vétérinaire et forestier de France), UN-ESCAP (United Nations Economic and Social Commission for Asia and the Pacific), and FAO (Food and Agriculture Organization of the United Nations):
  • How can we better measure and reduce post-harvest losses worldwide, especially in South East Asian Countries?
  • Join the webinar and share your thoughts and ideas! Mar 25, 2020, 02:30 PM (CET) and 9:30 AM US Eastern
You will find more information about the webinar at bit.ly/2vU0dis. The session is the first in a series.

Call for Code: Water Sustainability Datasets

Following on from the blog Useful data sets for Call for Code, we now turn to datasets and models for water sustainability and agriculture, and to some basic tools for manipulating them. Agriculture is the largest consumer of water, at about 70% of all withdrawals globally, as stated by the World Bank. To emphasize the point, an article at Penn State entitled How much water does it take to make a pair of jeans? says: "It takes around 1,800 gallons of water to grow enough cotton to produce just one pair of regular ol’ blue jeans. That’s more water than it takes to make a ton of cement."

You will find some water-, agriculture-, soil-, and crop-related datasets through the Google Dataset Search Tool, as described in the earlier Useful data sets for Call for Code blog. Pay attention to the license when you download. You will also find water-related datasets on the various government websites listed in the useful datasets blog, such as https://www.data.gov/climate/water/, and in other locations such as Oak Ridge National Lab in the US https://daac.ornl.gov/. See the Get Data section https://daac.ornl.gov/get_data/

Another great source is the Food and Agriculture Organization (FAO) of the United Nations, whose goal is to make sure that people have regular access to enough high-quality food to lead active and healthy lives. The organization compiles datasets such as http://www.fao.org/aquastat/en/countries-and-basins/country-profiles in collections such as http://www.fao.org/aquastat/en/. The European Data Portal https://www.europeandataportal.eu/data/datasets has datasets in agriculture and other important categories. On data.world https://data.world/ you will find datasets about soil. You will find further suitable datasets in many collections to help you motivate your Call for Code solution and tell the story, or to build your solution by:

  • visualizing data, perhaps as part of a dashboard
  • using data in an application
  • training or using a model to make predictions
 
Visualizing data
The Humanitarian Data Exchange (HDX) is an open data sharing platform managed by the United Nations Office for the Coordination of Humanitarian Affairs. In the exchange you will find a climate change dataset for each country, derived from World Bank data. The climate change datasets typically track indicators such as arable land, land under cereal production, and fertilizer consumption over a number of years. The indicators vary from country to country, but they will help you tell a story around your solution. You can find and download the climate change country datasets by selecting a location or by using the search option. You can also create quick graphs using the HDX tools.
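As a minimal sketch of working with such a download, the Pandas snippet below reshapes a country indicator file so that each indicator becomes a column indexed by year. The column names (Year, Indicator Name, Value), the second row of HXL hashtags, the country "Exampleland", and all the numbers are assumptions made for illustration; check them against the file you actually download.

```python
import io

import pandas as pd

# Inline stand-in for a downloaded CSV. The layout and the values are
# illustrative assumptions, not real HDX or World Bank data.
SAMPLE_CSV = """Country Name,Year,Indicator Name,Value
#country+name,#date+year,#indicator+name,#indicator+value+num
Exampleland,2015,Arable land (% of land area),10.0
Exampleland,2016,Arable land (% of land area),10.5
Exampleland,2015,Land under cereal production (hectares),2000
Exampleland,2016,Land under cereal production (hectares),2100
"""


def load_indicators(source):
    """Read a long-format indicator file and return one column per indicator."""
    # skiprows=[1] drops the assumed hashtag row that follows the header.
    long_table = pd.read_csv(source, skiprows=[1])
    # Turn (Year, Indicator Name, Value) rows into a Year x Indicator table.
    return long_table.pivot(index="Year", columns="Indicator Name", values="Value")


indicators_by_year = load_indicators(io.StringIO(SAMPLE_CSV))
print(indicators_by_year)
```

To use a real download, pass its file path to load_indicators instead of the in-memory sample. With matplotlib installed, indicators_by_year.plot() draws one line per indicator.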

GeoJSON is sometimes used as a format for data on the HDX site, for example https://data.humdata.org/dataset/indicator-6-3-2-proportion-of-bodies-of-water-with-good-ambient-water-quality-percent, a dataset for the indicator "proportion of bodies of water on earth with good ambient water quality (%)". GeoJSON is a format for encoding geographic data structures. The Pandas Python library is an excellent tool for manipulating GeoJSON files, along with many other data formats including the popular CSV format. You can learn how to use Pandas by following this Call for Code Learning Path https://developer.ibm.com/technologies/analytics/tutorials/data-analysis-in-python-using-pandas.
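To make the format concrete, here is a small sketch that flattens a GeoJSON FeatureCollection into a Pandas table. The two features are hand-written placeholders rather than records from the HDX dataset, and the "Value" property name is an assumption; only the overall FeatureCollection / Feature / geometry / properties layout comes from the GeoJSON format itself.

```python
import json

import pandas as pd

# Hand-written placeholder features, not real measurements.
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [2.35, 48.85]},
     "properties": {"GeoAreaName": "Area A", "Value": 10.0}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [100.5, 13.75]},
     "properties": {"GeoAreaName": "Area B", "Value": 20.0}}
  ]
}
"""

collection = json.loads(GEOJSON_TEXT)

# json_normalize flattens nested keys into dotted column names,
# for example "properties.GeoAreaName" and "geometry.coordinates".
table = pd.json_normalize(collection["features"])
table = table.rename(columns=lambda name: name.replace("properties.", ""))

# GeoJSON stores positions as [longitude, latitude].
table["longitude"] = table["geometry.coordinates"].apply(lambda xy: xy[0])
table["latitude"] = table["geometry.coordinates"].apply(lambda xy: xy[1])

print(table[["GeoAreaName", "Value", "longitude", "latitude"]])
```

This works well for point data and attribute tables. For polygons and map plots, a geospatial library is usually the more convenient choice.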

Below is a simple visualization of the ambient water quality dataset mentioned above, created with GeoPandas, another Python library focused on geospatial data. The top portion shows the dataset in GeoPandas with all its attributes, followed by two simple map plots illustrating the various GeoAreaNames by location.

Exploring the "proportion of bodies of water on earth with good ambient water quality (%) Indicator" dataset with GeoPandas
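A comparable exploration can be sketched as follows. This is not the code behind the figure: the features are hand-written placeholders, the "Value" property is an assumed name, and plot_areas is a hypothetical helper. For a real download you would more likely call geopandas.read_file on the GeoJSON file you saved.

```python
import geopandas as gpd

# Placeholder features standing in for a downloaded GeoJSON file.
FEATURES = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [2.35, 48.85]},
     "properties": {"GeoAreaName": "Area A", "Value": 10.0}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [100.5, 13.75]},
     "properties": {"GeoAreaName": "Area B", "Value": 20.0}},
]

# Build a GeoDataFrame; EPSG:4326 is the longitude/latitude system GeoJSON uses.
areas = gpd.GeoDataFrame.from_features(FEATURES, crs="EPSG:4326")

# Printing the frame shows every attribute alongside the geometry column.
print(areas)


def plot_areas(frame):
    """Draw one marker per feature, coloured by GeoAreaName (needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = frame.plot(column="GeoAreaName", legend=True, figsize=(8, 4))
    ax.set_title("GeoAreaName by location")
    plt.show()
```

Calling plot_areas(areas) opens the map. The plot is kept in a function so that the table can be inspected without a display.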


Another site that makes data available via GeoJSON is GreenSpin https://www.greenspin.de/, whose stated goal is to "digitize, quantify and monitor every single agricultural field on the planet".

Using data in an application

A simple way to access and explore any data that you have downloaded from one of the sites is to load the data in JSON format into the Cloudant JSON database and then access the data through Cloudant's HTTP API. The following videos explain how to do that. There are many easy tools for issuing HTTP requests, such as cURL, and you can find a blog introducing the HTTP tools.
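The same requests can also be prepared from Python. The sketch below only builds the HTTP requests and does not send them, so it runs without an account. The account URL, the database name "water", and the sample documents are placeholders, and authentication is left to the caller because it depends on how your Cloudant instance is set up. The _bulk_docs and _all_docs endpoints are part of the CouchDB-style API that Cloudant exposes.

```python
import json
import urllib.request


def bulk_insert_request(account_url, database, docs, auth_header=None):
    """Build a POST request that stores several JSON documents at once."""
    url = f"{account_url.rstrip('/')}/{database}/_bulk_docs"
    body = json.dumps({"docs": docs}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if auth_header:
        # For example a Basic or Bearer value, supplied by the caller.
        headers["Authorization"] = auth_header
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def all_docs_request(account_url, database):
    """Build a GET request that lists every document with its content."""
    url = f"{account_url.rstrip('/')}/{database}/_all_docs?include_docs=true"
    return urllib.request.Request(url, method="GET")


# Placeholder account, database, and documents for illustration only.
ACCOUNT_URL = "https://example-account.cloudant.example"
docs = [
    {"GeoAreaName": "Area A", "Value": 10.0},
    {"GeoAreaName": "Area B", "Value": 20.0},
]

insert_request = bulk_insert_request(ACCOUNT_URL, "water", docs)
list_request = all_docs_request(ACCOUNT_URL, "water")
print(insert_request.get_method(), insert_request.full_url)
print(list_request.get_method(), list_request.full_url)
```

To send a prepared request against a real account, pass it to urllib.request.urlopen and decode the JSON response.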
 


Using a model for water and agrarian solutions

AquaCrop-OS is a free, open-source version of AquaCrop, a crop water productivity model developed by the Food and Agriculture Organization of the United Nations (FAO), mentioned earlier in this blog. AquaCrop-OS efficiently simulates water-limited crop production across diverse environmental and agronomic conditions. It covers multiple crop types and is designed specifically for regions where water is a critical limiting factor in crop production. The model can be used from multiple programming languages and operating environments.


There are more useful open source models on the Model Asset Exchange, such as the Weather Forecaster https://developer.ibm.com/exchanges/models/all/max-weather-forecaster/, which takes hourly weather data as input and returns hourly weather predictions for variables such as temperature or wind speed. You can learn more about the Model Asset Exchange through this tutorial https://developer.ibm.com/series/create-model-asset-exchange/

Academic connections and journals
There are numerous articles that deal with water, agriculture, and related topics. Some universities, such as Wageningen University, specialize in these areas. See Multifunctional agriculture WUR-INRA https://www.wur.nl/en/Research-Results/Chair-groups/Plant-Sciences/Farming-Systems-Ecology-Group/Research/Multifunctional-agriculture-WUR-INRA-the-Netherlands.htm, which is a long-running project of the Farming Systems Ecology Group at the university. It aims to provide scientific support for continuous and sustainable development of agro-ecosystems.

You will also find a lot of papers about weather forecasting, such as "Weather Forecasting Using Sliding Window Algorithm" at https://www.hindawi.com/journals/isrn/2013/156540/

Please also see the following article, which incorporates a description of quantifying risks from flooding for insurance purposes, from the IBM Journal of Research and Development, Volume 64, Jan-Feb 2020:

  • Creating a water risk index to improve community resilience, by K. Klima; L. El Gammal; W. Kong; D. Prosdocimi

    Abstract: Flood risk reduction is an existent discourse and agenda in policy and insurance. Existing approaches such as linking hydrological models to economic loss models may be highly inequitable between areas of different socio-economic vulnerability. To our knowledge, no one has tried to adapt the more advanced known heat risk theory by first informing flood risk with the socio-economic vulnerability, and then investigating the sensitivity of risk reduction policies to that flood risk. In this article, we demonstrate two methods to combine water hazard data with a derived water vulnerability index to characterize water risk. We then compare the costs of two potential government policies: buyout of the home versus funding for foundation elevation. We use the case study area of Pittsburgh, PA, which faces severe precipitation and riverine flooding hazards. We find that while small differences in characterizing flood risk can result in large differences between flood risk maps, the cost of the flood risk reduction policy is not sensitive to the method of representing the socio-economic vulnerability. This suggests that while validation of flood risk incorporating socio-economic data is needed, for some policies, policymakers can prioritize environmental justice with little to no additional cost.
You can download a full copy of the article, IJRD-64-18-Creating_a_water_risk_index_to_improve_community_resilience.pdf, and read more about the other articles in the journal.

Increasingly, articles are accompanied by datasets, so it is always worth looking at recent academic publications on water and agriculture.
