# Merge, analyse and visualize OGGM GCM runs#

In this notebook we want to show:

• How to merge the output of different OGGM GCM projections into one dataset

• How to calculate regional values (e.g. volume) using the merged dataset

• How to use matplotlib or seaborn to visualize the projections and plot different statistical estimates (median, interquartile range, mean, std, …)

• How to use HoloViews and Panel to visualize the outcome. This part uses advanced plotting capabilities which are not necessary to understand the rest of the notebook.

This notebook is intended to explain the postprocessing steps, rather than the OGGM workflow itself. Therefore some code (especially conducting the GCM projection runs) does not have many explanations. If you are more interested in these steps you should check out the notebook Run OGGM with GCM data.

## GCM projection runs#

The first step is to conduct the GCM projection runs. We choose two different glaciers by their rgi_ids and conduct the GCM projections. Again if you do not understand all of the following code you should check out the Run OGGM with GCM data notebook.

# Libs
from time import gmtime, strftime
import xarray as xr
import numpy as np

# Locals
from oggm import utils, workflow, tasks
import oggm.cfg as cfg
from oggm.shop import gcm_climate


### Pre-processed directories#

# Initialize OGGM and set up the default run parameters
cfg.initialize(logging_level='WARNING')

# change border around the individual glaciers
cfg.PARAMS['border'] = 80

# Use Multiprocessing
cfg.PARAMS['use_multiprocessing'] = True

# For hydro output
cfg.PARAMS['store_model_geometry'] = True

# Local working directory (where OGGM will write its output)
cfg.PATHS['working_dir'] = utils.gettempdir('OGGM_merge_gcm_runs', reset=True)

# RGI glaciers: Ngojumba and Khumbu
rgi_ids = utils.get_rgi_glacier_entities(['RGI60-15.03473', 'RGI60-15.03733'])

# Go - get the pre-processed glacier directories
# in OGGM v1.6 you have to explicitly indicate the url from where you want to start from
# we will use here the elevation band flowlines which are much simpler than the centerlines
base_url = 'https://cluster.klima.uni-bremen.de/~oggm/gdirs/oggm_v1.6/L3-L5_files/2023.1/elev_bands/W5E5/'
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=5,
prepro_base_url=base_url)

2023-03-10 19:21:43: oggm.cfg: Reading default parameters from the OGGM params.cfg configuration file.
2023-03-10 19:21:43: oggm.cfg: Multiprocessing switched OFF according to the parameter file.
2023-03-10 19:21:43: oggm.cfg: Multiprocessing: using all available processors (N=2)
2023-03-10 19:21:43: oggm.cfg: Multiprocessing switched ON after user settings.
2023-03-10 19:21:43: oggm.cfg: PARAMS['store_model_geometry'] changed from False to True.
2023-03-10 19:21:50: oggm.workflow: init_glacier_directories from prepro level 5 on 2 glaciers.
2023-03-10 19:21:50: oggm.workflow: Execute entity tasks [gdir_from_prepro] on 2 glaciers


In this notebook, we will use the bias-corrected ISIMIP3b GCM files. You can also use directly CMIP5 or CMIP6 (how to do that is explained in run_with_gcm).

all_GCM = ['gfdl-esm4_r1i1p1f1', 'ipsl-cm6a-lr_r1i1p1f1',
'mpi-esm1-2-hr_r1i1p1f1',
'mri-esm2-0_r1i1p1f1', 'ukesm1-0-ll_r1i1p1f2']

# define the SSP scenarios
all_scenario = ['ssp126', 'ssp370', 'ssp585']

for GCM in all_GCM:
for ssp in all_scenario:
# we will pretend that 'mpi-esm1-2-hr_r1i1p1f1' is missing for ssp370
# to later show how to deal with missing values,
# if you want to use this
# code you can of course remove the "if" and just download all GCMs and SSPS
if (ssp == 'ssp370') & (GCM=='mpi-esm1-2-hr_r1i1p1f1'):
pass
else:
ssp = ssp,
# gcm ensemble -> you can choose another one
member=GCM,
# recognize the climate file for later
output_filesuffix=f'_{GCM}_{ssp}'
);

# you could create a similar workflow with CMIP5 or CMIP6

2023-03-10 19:21:51: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:21:53: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:21:54: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:21:56: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:21:57: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:21:59: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:00: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:02: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:04: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:05: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:07: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:08: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:10: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers
2023-03-10 19:22:11: oggm.workflow: Execute entity tasks [process_monthly_isimip_data] on 2 glaciers


Here we defined, downloaded and processed all the GCM models and scenarios (here 5 GCMs and three SSPs). We pretend that one GCM is missing for one scenario (this is sometimes the case in for example CMIP5 GCMs, see e.g. this table). We deal with possible missing GCMs for specific SCENARIOs by by including a try/except in the code below and by taking care that the missing values are filled with NaN values (and stay NaN when doing sums).

### Actual projection Runs#

Now we conduct the actual projection runs. Again handling the case that for a certain (GCM, SCENARIO) combination no data is available with a try/except.

for GCM in all_GCM:
for scen in all_scenario:
rid = '_{}_{}'.format(GCM, scen)
try:  # check if (GCM, scen) combination exists
ys=2020,  # star year of our projection runs
climate_filename='gcm_data',  # use gcm_data, not climate_historical
climate_input_filesuffix=rid,  # use the chosen GCM and scenario
init_model_filesuffix='_historical',  # this is important! Start from 2020 glacier
output_filesuffix=rid,  # the filesuffix of the resulting file, so we can find it later
store_monthly_hydro=True
);

except FileNotFoundError:
# if a certain scenario is not available for a GCM we land here
# and we inidcate this by printing a message so the user knows
# this scenario is missing
# (in this case of course, the file actually is available, but we just pretend that it is not...)
print('No ' + GCM +' run with scenario ' + scen + ' available!')

No mpi-esm1-2-hr_r1i1p1f1 run with scenario ssp370 available!

2023-03-10 19:22:13: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:14: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:15: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:15: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:16: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:17: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:18: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:19: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:19: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:20: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:21: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:21: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:22: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:23: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers
2023-03-10 19:22:24: oggm.workflow: Execute entity tasks [run_with_hydro] on 2 glaciers


## Merge datasets together#

Now that we have conducted the projection Runs we can start merging all outputs into one dataset. The individual files can be opened by providing the output_filesuffix of the projection runs as the input_filesuffix. Let us have a look at one resulting dataset!

file_id  = '_{}_{}'.format(all_GCM[0], all_scenario[0])
ds = utils.compile_run_output(gdirs, input_filesuffix=file_id)
ds

2023-03-10 19:22:25: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:25: oggm.utils: Applying compile_run_output on 2 gdirs.

<xarray.Dataset>
Dimensions:                       (time: 82, rgi_id: 2, month_2d: 12)
Coordinates:
* time                          (time) float64 2.02e+03 ... 2.101e+03
* rgi_id                        (rgi_id) <U14 'RGI60-15.03473' 'RGI60-15.03...
hydro_year                    (time) int64 2020 2021 2022 ... 2099 2100 2101
hydro_month                   (time) int64 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4
calendar_year                 (time) int64 2020 2021 2022 ... 2099 2100 2101
calendar_month                (time) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
* month_2d                      (month_2d) int64 1 2 3 4 5 6 7 8 9 10 11 12
calendar_month_2d             (month_2d) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables: (12/26)
volume                        (time, rgi_id) float64 7.042e+09 ... 1.451e+09
volume_m3_min_h               (time, rgi_id) float64 7.042e+09 ... 1.451e+09
volume_bsl                    (time, rgi_id) float64 0.0 0.0 0.0 ... 0.0 0.0
volume_bwl                    (time, rgi_id) float64 0.0 0.0 0.0 ... 0.0 0.0
area                          (time, rgi_id) float64 6.046e+07 ... 1.628e+07
area_m2_min_h                 (time, rgi_id) float64 6.046e+07 ... 1.628e+07
...                            ...
liq_prcp_on_glacier_monthly   (time, month_2d, rgi_id) float64 0.0 ... nan
snowfall_off_glacier_monthly  (time, month_2d, rgi_id) float64 1.072e+05 ...
snowfall_on_glacier_monthly   (time, month_2d, rgi_id) float64 2.203e+08 ...
water_level                   (rgi_id) float64 0.0 0.0
glen_a                        (rgi_id) float64 1.624e-23 1.624e-23
fs                            (rgi_id) float64 0.0 0.0
Attributes:
description:    OGGM model output
oggm_version:   1.5.4.dev78+g6d305a4
calendar:       365-day no leap
creation_date:  2023-03-10 19:22:25

We see that as a result, we get a xarray.Dataset. We also see that this dataset has three dimensions time, rgi_id and month_2d. When we look at the Data variables (click to expand) we can see that for each of these variables a certain combination of the dimensions is given. For example, for the volume we see the dimensions (time, rgi_id) which indicates that the underlying data is separated by each year and each glacier. Let’s have a look at the volume in 2030 and 2040 at the glacier ‘RGI60-15.03473’ (Ngojumba):

ds['volume'].loc[{'time': [2030, 2040], 'rgi_id': ['RGI60-15.03473' ]}]

<xarray.DataArray 'volume' (time: 2, rgi_id: 1)>
array([[6.49251786e+09],
[5.87132702e+09]])
Coordinates:
* time            (time) float64 2.03e+03 2.04e+03
* rgi_id          (rgi_id) <U14 'RGI60-15.03473'
hydro_year      (time) int64 2030 2040
hydro_month     (time) int64 4 4
calendar_year   (time) int64 2030 2040
calendar_month  (time) int64 1 1
Attributes:
description:  Total glacier volume
unit:         m 3

A more complete overview of how to access the data of xarray Dataset can be found here.

This is quite useful, but unfortunately, we have one xarray Dataset for each (GCM, SCENARIO) pair. For an easier calculation and comparison between different GCMs and Scenarios(here SSPs), it would be desirable to combine all individual Datasets into one Dataset with the additional dimensions (GCM, SCENARIO). Fortunately, xarray provides different functions to merge Datasets (here you can find an Overview).

In our case, we want to merge all individual datasets with two additional coordinates GCM and SCENARIO. Additional coordinates can be added with the comment .coords[], also you can include a description with .coords[].attrs['description']. For example, let us add GCM as a new coordinate:

ds.coords['GCM'] = all_GCM[0]
ds.coords['GCM'].attrs['description'] = 'Global Circulation Model'
ds

<xarray.Dataset>
Dimensions:                       (time: 82, rgi_id: 2, month_2d: 12)
Coordinates:
* time                          (time) float64 2.02e+03 ... 2.101e+03
* rgi_id                        (rgi_id) <U14 'RGI60-15.03473' 'RGI60-15.03...
hydro_year                    (time) int64 2020 2021 2022 ... 2099 2100 2101
hydro_month                   (time) int64 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4
calendar_year                 (time) int64 2020 2021 2022 ... 2099 2100 2101
calendar_month                (time) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
* month_2d                      (month_2d) int64 1 2 3 4 5 6 7 8 9 10 11 12
calendar_month_2d             (month_2d) int64 1 2 3 4 5 6 7 8 9 10 11 12
GCM                           <U18 'gfdl-esm4_r1i1p1f1'
Data variables: (12/26)
volume                        (time, rgi_id) float64 7.042e+09 ... 1.451e+09
volume_m3_min_h               (time, rgi_id) float64 7.042e+09 ... 1.451e+09
volume_bsl                    (time, rgi_id) float64 0.0 0.0 0.0 ... 0.0 0.0
volume_bwl                    (time, rgi_id) float64 0.0 0.0 0.0 ... 0.0 0.0
area                          (time, rgi_id) float64 6.046e+07 ... 1.628e+07
area_m2_min_h                 (time, rgi_id) float64 6.046e+07 ... 1.628e+07
...                            ...
liq_prcp_on_glacier_monthly   (time, month_2d, rgi_id) float64 0.0 ... nan
snowfall_off_glacier_monthly  (time, month_2d, rgi_id) float64 1.072e+05 ...
snowfall_on_glacier_monthly   (time, month_2d, rgi_id) float64 2.203e+08 ...
water_level                   (rgi_id) float64 0.0 0.0
glen_a                        (rgi_id) float64 1.624e-23 1.624e-23
fs                            (rgi_id) float64 0.0 0.0
Attributes:
description:    OGGM model output
oggm_version:   1.5.4.dev78+g6d305a4
calendar:       365-day no leap
creation_date:  2023-03-10 19:22:25

You can see that now GCM is added as Coordinate, but when you look at the Data variables you see that this new Coordinate is not used by any of them. So we must add this new Coordinate to the Data variables using the .expand_dims() comment:

ds = ds.expand_dims('GCM')
ds

<xarray.Dataset>
Dimensions:                       (time: 82, rgi_id: 2, month_2d: 12, GCM: 1)
Coordinates:
* time                          (time) float64 2.02e+03 ... 2.101e+03
* rgi_id                        (rgi_id) <U14 'RGI60-15.03473' 'RGI60-15.03...
hydro_year                    (time) int64 2020 2021 2022 ... 2099 2100 2101
hydro_month                   (time) int64 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4
calendar_year                 (time) int64 2020 2021 2022 ... 2099 2100 2101
calendar_month                (time) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
* month_2d                      (month_2d) int64 1 2 3 4 5 6 7 8 9 10 11 12
calendar_month_2d             (month_2d) int64 1 2 3 4 5 6 7 8 9 10 11 12
* GCM                           (GCM) <U18 'gfdl-esm4_r1i1p1f1'
Data variables: (12/26)
volume                        (GCM, time, rgi_id) float64 7.042e+09 ... 1...
volume_m3_min_h               (GCM, time, rgi_id) float64 7.042e+09 ... 1...
volume_bsl                    (GCM, time, rgi_id) float64 0.0 0.0 ... 0.0
volume_bwl                    (GCM, time, rgi_id) float64 0.0 0.0 ... 0.0
area                          (GCM, time, rgi_id) float64 6.046e+07 ... 1...
area_m2_min_h                 (GCM, time, rgi_id) float64 6.046e+07 ... 1...
...                            ...
liq_prcp_on_glacier_monthly   (GCM, time, month_2d, rgi_id) float64 0.0 ....
snowfall_off_glacier_monthly  (GCM, time, month_2d, rgi_id) float64 1.072...
snowfall_on_glacier_monthly   (GCM, time, month_2d, rgi_id) float64 2.203...
water_level                   (GCM, rgi_id) float64 0.0 0.0
glen_a                        (GCM, rgi_id) float64 1.624e-23 1.624e-23
fs                            (GCM, rgi_id) float64 0.0 0.0
Attributes:
description:    OGGM model output
oggm_version:   1.5.4.dev78+g6d305a4
calendar:       365-day no leap
creation_date:  2023-03-10 19:22:25

Now we see that GCM was added to the Dimensions and all Data variables now use this new coordinate. The same can be done with SCENARIO. As a standalone, this is not very useful. But if we add these coordinates to all datasets, it becomes quite handy for merging. Therefore we now open all datasets and add to each one the two coordinates GCM and SCENARIO:

ds_all = []  # in this array all datasets going to be stored with additional coordinates GCM and SCENARIO
creation_date = strftime("%Y-%m-%d %H:%M:%S", gmtime())  # here add the current time for info
for GCM in all_GCM:  # loop through all GCMs
for scen in all_scenario:  # loop through all SSPs
try:  # check if GCM, SCENARIO combination exists
rid = '_{}_{}'.format(GCM, scen)  # put together the same filesuffix which was used during the projection runs

ds_tmp = utils.compile_run_output(gdirs, input_filesuffix=rid)  # open one model run

ds_tmp.coords['GCM'] = GCM  # add GCM as a coordinate
ds_tmp.coords['GCM'].attrs['description'] = 'used Global circulation Model'  # add a description for GCM
ds_tmp = ds_tmp.expand_dims("GCM")  # add GCM as a dimension to all Data variables

ds_tmp.coords['SCENARIO'] = scen  # add scenario (here ssp) as a coordinate
ds_tmp.coords['SCENARIO'].attrs['description'] = 'used scenario (here SSPs)'
ds_tmp = ds_tmp.expand_dims("SCENARIO")  # add SSO as a dimension to all Data variables

ds_tmp.attrs['creation_date'] = creation_date  # also add todays date for info
ds_all.append(ds_tmp)  # add the dataset with extra coordinates to our final ds_all array

except RuntimeError as err:  # here we land if an error occured
if str(err) == 'Found no valid glaciers!':  # This is the error message if the GCM, SCENARIO (here ssp) combination does not exist
print(f'No data for GCM {GCM} with SCENARIO {scen} found!')  # print a descriptive message
else:
raise RuntimeError(err)  # if an other error occured we just raise it

No data for GCM mpi-esm1-2-hr_r1i1p1f1 with SCENARIO ssp370 found!

2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:26: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:26: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:27: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:27: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:27: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:27: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:27: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:27: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:27: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:27: oggm.utils: Applying compile_run_output on 2 gdirs.
2023-03-10 19:22:27: oggm.utils: Applying global task compile_run_output on 2 glaciers
2023-03-10 19:22:27: oggm.utils: Applying compile_run_output on 2 gdirs.


### Actual merging#

ds_all now contains all outputs with the additionally added coordinates. Now we can merge them with the very convenient xarray function xr.combine_by_coords():

ds_merged = xr.combine_by_coords(ds_all, fill_value=np.nan)  # define how the missing GCM, SCENARIO combinations should be filled


Great! You see that the result is one Dataset, but we still can identify the individual runs with the two additionally added Coordinates GCM and SCENARIO (SSP). If you want to save this new Dataset for later analysis you can use for example .to_netcdf() (here you can find an Overview of xarray writing options):

ds_merged.to_netcdf('merged_GCM_projections.nc')


And to open it again later and automatically close the file after reading you can use:

with xr.open_dataset('merged_GCM_projections.nc') as ds:
ds_merged = ds


## Calculate regional values#

Now we have our merged dataset and we can start with the analysis. To do this xarray gives us a ton of different mathematical functions which can be used directly with our dataset. Here you can find a great overview of what is possible.

As an example, we could calculate the mean and std of the total regional glacier volume for different scenarios of our merged dataset. Here our added coordinates come in very handy. However, for a small amount of GCMs (for ISIMIP3b e.g. 5 GCMs), it is often better to compute instead more robust estimates, such as the median, the 25th and 75th percentile or the total range. We will show different options later:

Let us start by calculating the total “regional” glacier volume using .sum() (of course, here it is just the sum over two glaciers…):

# the missing values were filled with np.NaN values for all time points and glaciers
assert np.all(np.isnan(ds_merged['volume'].sel(SCENARIO='ssp370').sel(GCM='mpi-esm1-2-hr_r1i1p1f1')))

# So, what happens to the missing GCM and ssp if we just do the sum?
ds_total_volume_wrong = ds_merged['volume'].sum(dim='rgi_id',  # over which dimension the sum should be taken, here we want to sum up over all glacier ids
keep_attrs=True)  # keep the variable descriptions
ds_total_volume_wrong.sel(SCENARIO='ssp370').sel(GCM='mpi-esm1-2-hr_r1i1p1f1')

<xarray.DataArray 'volume' (time: 82)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
* time            (time) float64 2.02e+03 2.021e+03 ... 2.1e+03 2.101e+03
SCENARIO        <U6 'ssp370'
GCM             <U22 'mpi-esm1-2-hr_r1i1p1f1'
hydro_year      (time) int64 2020 2021 2022 2023 ... 2098 2099 2100 2101
hydro_month     (time) int64 4 4 4 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4 4 4 4 4
calendar_year   (time) int64 2020 2021 2022 2023 ... 2098 2099 2100 2101
calendar_month  (time) int64 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1
Attributes:
description:  Total glacier volume
unit:         m 3

At least in my xarray version, this creates a volume of “0”. That is not desirable, the sum over NaN, should give a NaN. To be sure that this happens, you should add the keyword arguments skipna=True and min_count=amount_of_rgi_ids:

ds_total_volume = ds_merged['volume'].sum(dim='rgi_id',  # over which dimension the sum should be taken, here we want to sum up over all glacier ids
skipna=True,  # ignore nan values
# important, we need values for every glacier (here 2) to do the sum
# if some have NaN, the sum should also be NaN (if you don't set min_count, the sum will be 0, which is bad)
min_count=len(ds_merged.rgi_id),
keep_attrs=True)  # keep the variable descriptions
# perfect, now, the NaN is preserved!
assert np.all(np.isnan(ds_total_volume.sel(SCENARIO='ssp370').sel(GCM='mpi-esm1-2-hr_r1i1p1f1')))


To conclude, for this calculation:

• we told xarray over which dimension we want to sum up dim='rgi_id' (all glaciers),

• that missing values should be ignored skipna=True (as some GCM/SCENARIO combinations are not available),

• defined the minimum amount of required observations to do the operation (i.e. here we need at least two observations that are not NaN),

• and that the attributes (the description of the variables) should be preserved keep_attrs=True. You can see that the result has three dimensions left (GCM, time, SCENARIO) and the dimension rgi_id disappeared as we have summed all values up along this dimension.

Likewise, we can now calculate for example the median of all GCM runs:

ds_total_volume_median = ds_total_volume.median(dim='GCM',  # over which dimension the median should be calculated
skipna=True,  # ignore nan values
keep_attrs=True)  # keep all variable descriptions
ds_total_volume_median

<xarray.DataArray 'volume' (SCENARIO: 3, time: 82)>
array([[8.93141868e+09, 8.86627546e+09, 8.86750097e+09, 8.79975730e+09,
8.74290799e+09, 8.69525104e+09, 8.62209421e+09, 8.53689107e+09,
8.45209258e+09, 8.36166500e+09, 8.27133038e+09, 8.20964098e+09,
8.14563135e+09, 8.06941917e+09, 8.02291101e+09, 7.91913642e+09,
7.85969914e+09, 7.75565675e+09, 7.66260179e+09, 7.54986935e+09,
7.40818831e+09, 7.33130806e+09, 7.28928025e+09, 7.23947050e+09,
7.16855322e+09, 7.06570971e+09, 6.96239494e+09, 6.89020276e+09,
6.81102847e+09, 6.76072014e+09, 6.67617822e+09, 6.59492628e+09,
6.50765480e+09, 6.45392001e+09, 6.44913669e+09, 6.36763079e+09,
6.32396442e+09, 6.25902552e+09, 6.14982832e+09, 6.09462762e+09,
6.01644813e+09, 5.91769503e+09, 5.84395968e+09, 5.77525701e+09,
5.70460104e+09, 5.63126931e+09, 5.53449785e+09, 5.48841182e+09,
5.41175224e+09, 5.45268854e+09, 5.36046082e+09, 5.31220617e+09,
5.25735802e+09, 5.17622302e+09, 5.16114145e+09, 5.13565980e+09,
5.11278017e+09, 5.08373028e+09, 5.06400762e+09, 5.04480906e+09,
5.03511494e+09, 5.02440252e+09, 4.99000766e+09, 4.94002300e+09,
4.91246516e+09, 4.87141456e+09, 4.87787016e+09, 4.88830525e+09,
4.86124367e+09, 4.82410088e+09, 4.81265542e+09, 4.82311281e+09,
4.86614599e+09, 4.85912621e+09, 4.82342451e+09, 4.86558720e+09,
4.93660698e+09, 4.97378117e+09, 4.97138153e+09, 4.97838170e+09,
...
8.68514176e+09, 8.61135324e+09, 8.56704164e+09, 8.48503549e+09,
8.41325809e+09, 8.29299141e+09, 8.24316169e+09, 8.19880324e+09,
8.14926839e+09, 8.07683719e+09, 8.01090822e+09, 7.92686171e+09,
7.84919542e+09, 7.77427557e+09, 7.70514623e+09, 7.59061382e+09,
7.55527058e+09, 7.49044390e+09, 7.27014520e+09, 7.07003428e+09,
6.93788372e+09, 6.75505521e+09, 6.58245715e+09, 6.44345162e+09,
6.32843508e+09, 6.24641566e+09, 6.17555491e+09, 6.07064456e+09,
5.95682116e+09, 5.87553893e+09, 5.72355600e+09, 5.59831264e+09,
5.50672134e+09, 5.38398744e+09, 5.27073365e+09, 5.11903361e+09,
4.94723853e+09, 4.83652612e+09, 4.64146627e+09, 4.50144248e+09,
4.39883903e+09, 4.27333420e+09, 4.15559350e+09, 4.03733967e+09,
3.96493481e+09, 3.85336948e+09, 3.74637002e+09, 3.66706492e+09,
3.56472183e+09, 3.47200104e+09, 3.39800699e+09, 3.30321344e+09,
3.22333683e+09, 3.17994865e+09, 3.12840389e+09, 3.07829065e+09,
3.00726319e+09, 3.00134862e+09, 2.99324037e+09, 2.97524140e+09,
2.91440860e+09, 2.84212496e+09, 2.79504385e+09, 2.77727601e+09,
2.69076033e+09, 2.63402835e+09, 2.57898977e+09, 2.54212919e+09,
2.49490416e+09, 2.43934526e+09, 2.43712167e+09, 2.39755985e+09,
2.41220711e+09, 2.33231989e+09, 2.26526901e+09, 2.23414616e+09,
2.21803367e+09, 2.20598386e+09]])
Coordinates:
* time            (time) float64 2.02e+03 2.021e+03 ... 2.1e+03 2.101e+03
* SCENARIO        (SCENARIO) <U6 'ssp126' 'ssp370' 'ssp585'
hydro_year      (time) int64 2020 2021 2022 2023 ... 2098 2099 2100 2101
hydro_month     (time) int64 4 4 4 4 4 4 4 4 4 4 4 ... 4 4 4 4 4 4 4 4 4 4 4
calendar_year   (time) int64 2020 2021 2022 2023 ... 2098 2099 2100 2101
calendar_month  (time) int64 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 1
Attributes:
description:  Total glacier volume
unit:         m 3

And you see that now we are left with two dimensions (SCENARIO, time). This means we have calculated the median total volume for all different scenarios (here SSPs) and along the projection period. The mean, standard deviation (std) or percentiles can be also calculated in the same way as the median. Again, bare in mind, that for small sample sizes of GCM ensembles (around 15), which is almost always the case, it is often much more robust to use median and the interquartile range or other percentile ranges.

## Visualisation with matplotlib or seaborn#

We will first use matplotlib, where we need to estimate the statistical estimates ourselves. This is good to understand what we are doing.

import matplotlib.pyplot as plt

# estimate the minimum volume for each time step and scenario over the GCMs
ds_total_volume_min = ds_total_volume.min(dim='GCM', keep_attrs=True,skipna=True)
# estimate the 25th percentile volume for each time step and scenario over the GCMs
ds_total_volume_p25 = ds_total_volume.quantile(0.25, dim='GCM', keep_attrs=True,skipna=True)
# estimate the 50th percentile volume for each time step and scenario over the GCMs
ds_total_volume_p50 = ds_total_volume.quantile(0.5, dim='GCM', keep_attrs=True,skipna=True)
# this is the same as the median -> Let's check
np.testing.assert_allclose(ds_total_volume_median,ds_total_volume_p50)
# estimate the 75th percentile volume for each time step and scenario over the GCMs
ds_total_volume_p75 = ds_total_volume.quantile(0.75, dim='GCM', keep_attrs=True,skipna=True)
# estimate the maximum volume for each time step and scenario over the GCMs
ds_total_volume_max = ds_total_volume.max(dim='GCM', keep_attrs=True,skipna=True)

# Think twice if it is appropriate to compute a mean/std over your GCM sample, is it Gaussian distributed?
# Otherwise use instead median and percentiles or total range
ds_total_volume_mean = ds_total_volume.mean(dim='GCM', keep_attrs=True,
skipna=True)
ds_total_volume_std = ds_total_volume.std(dim='GCM', keep_attrs=True,
skipna=True)

color_dict={'ssp126':'blue', 'ssp370':'orange', 'ssp585':'red'}

fig, axs = plt.subplots(1,2, figsize=(12,6),
sharey=True # we want to share the y axis betweeen the subplots
)
for scenario in color_dict.keys():
# get amount of GCMs per Scenario to add it to the legend:
n = len(ds_total_volume.sel(SCENARIO=scenario).dropna(dim='GCM').GCM)
axs[0].plot(ds_total_volume_median.time,
ds_total_volume_median.sel(SCENARIO=scenario)/1e9,  # m3 -> km3
label=f'{scenario}: n={n} GCMs',
color=color_dict[scenario],lw=3)
axs[0].fill_between(ds_total_volume_median.time,
ds_total_volume_p25.sel(SCENARIO=scenario)/1e9,
ds_total_volume_p75.sel(SCENARIO=scenario)/1e9,
color=color_dict[scenario],
alpha=0.5,
label='interquartile range\n(75th-25th percentile)')
axs[0].fill_between(ds_total_volume_median.time,
ds_total_volume_min.sel(SCENARIO=scenario)/1e9,
ds_total_volume_max.sel(SCENARIO=scenario)/1e9,
color=color_dict[scenario],
alpha=0.1,
label='total range')

for scenario in color_dict.keys():
axs[1].plot(ds_total_volume_mean.time,
ds_total_volume_mean.sel(SCENARIO=scenario)/1e9,  # m3 -> km3
color=color_dict[scenario],
label='mean',lw=3)
axs[1].fill_between(ds_total_volume_mean.time,
ds_total_volume_mean.sel(SCENARIO=scenario)/1e9 - ds_total_volume_std.sel(SCENARIO=scenario)/1e9,
ds_total_volume_mean.sel(SCENARIO=scenario)/1e9 + ds_total_volume_std.sel(SCENARIO=scenario)/1e9,
alpha=0.3,
color=color_dict[scenario],
label='standard deviation')

for ax in axs:
# get all handles and labels and then create two different legends
handles, labels = ax.get_legend_handles_labels()
if ax == axs[0]:
# we want to have two legends, let's save the first one
# which just shows the colors and the different SSPs here
leg1 = ax.legend(handles[:3], labels[:3], title='Scenario')
ax.set_ylabel(r'Volume (km$^3$)')
# create the second one, that shows the different statistical estimates
ax.legend([handles[0],handles[3],handles[4]],
['median',labels[3],labels[4]], loc='lower left')
# we need to allow for two legend, this is done like that
else:
# for the second plot, we only want to have the legend for mean and std
ax.legend(handles[::3], labels[::3], loc='lower left')
ax.set_xlabel('Year');
ax.grid()


You have to choose which statistical estimate is best in your case. It is also a good possibility to just plot all the scenarios, and then to decide which statistical estimate describes best the spread in your case:

# for example, if there are just 5 GCMs, maybe you can just show all of them?
for gcm in all_GCM:
plt.plot(ds_total_volume.sel(SCENARIO=scenario).time,
ds_total_volume.sel(SCENARIO=scenario, GCM=gcm),
color='grey')
plt.title(scenario)

Text(0.5, 1.0, 'ssp585')


You could create similar plots even easier with seaborn, which has a lot of statistical estimation tools directly included (specifically in seaborn>=v0.12). You can for example check out this errorbar seaborn tutorial. Note that the outcome can be slightly different even for the same statistical estimate as e.g. seaborn and xarray might use different methods to compute the same thing.

import seaborn as sns
# this code might only work with seaborn v>=0.12
# seaborn always likes to have pandas dataframes
pd_total_volume = ds_total_volume.to_dataframe('volume_m3').reset_index()
sns.lineplot(x='time', y='volume_m3',
hue='SCENARIO', # these are the dimension for the different colors
estimator='median', # here you could also choose the mean
data=pd_total_volume,
lw=2, # increase the linewidth a bit
palette= color_dict, # let's use the same colors as before
# for errorbar you could also choose another percentile range
# see: https://seaborn.pydata.org/tutorial/error_bars.html
# "90" means the range goes from the 5th to the 95th percentile
# (50 would be the interquartile range, i.e., the same as two plots above)
errorbar=('pi', 90)
);


## Interactive visualisation with HoloViews and Panel#

• the following code only works if you install holoviews and panel

We can also visualize the data using tools from the HoloViz framework (namely HoloViews and Panel). For an introduction to HoloViz, you can have a look at the Small overview of HoloViz capability of data exploration notebook.

# Plotting
import holoviews as hv
import panel as pn
hv.extension('bokeh')
pn.extension()