Merge, analyse and visualize OGGM GCM runs#

In this notebook we want to show:

  • How to merge the output of different OGGM GCM projections into one dataset

  • How to calculate regional values (e.g. volume) using the merged dataset

  • How to use HoloViews and Panel to visualize the outcome. This part uses advanced plotting capabilities which are not necessary to understand the rest of the notebook.

This notebook is intended to explain the postprocessing steps, rather than the OGGM workflow itself. Therefore, some of the code (especially the part conducting the GCM projection runs) is not explained in detail. If you are more interested in these steps, you should check out the notebook Run OGGM with GCM data.

GCM projection runs#

The first step is to conduct the GCM projection runs. We choose two different glaciers by their rgi_ids and conduct the GCM projections. Again, if you do not understand all of the following code, you should check out the Run OGGM with GCM data notebook.

# Libs
from time import gmtime, strftime
import geopandas as gpd
import shapely.geometry as shpg
import xarray as xr
import numpy as np

# Plotting
import holoviews as hv
import panel as pn

hv.extension('bokeh')
pn.extension()

# Locals
from oggm import utils, workflow, tasks
import oggm.cfg as cfg
from oggm.shop import gcm_climate

Pre-processed directories#

# Initialize OGGM and set up the default run parameters
cfg.initialize(logging_level='WARNING')

# change border around the individual glaciers
cfg.PARAMS['border'] = 40

# Use Multiprocessing
cfg.PARAMS['use_multiprocessing'] = True

# For hydro output
cfg.PARAMS['store_model_geometry'] = True

# Local working directory (where OGGM will write its output)
cfg.PATHS['working_dir'] = utils.gettempdir('OGGM_merge_gcm_runs', reset=True)

# RGI glaciers: Ngojumba and Khumbu
rgi_ids = utils.get_rgi_glacier_entities(['RGI60-15.03473', 'RGI60-15.03733'])

# Go - get the pre-processed glacier directories
gdirs = workflow.init_glacier_directories(rgi_ids, from_prepro_level=5)

Download and process GCM data#

Here we define the CMIP5 GCM models and the RCP scenarios we want to use. Here you can get an overview of all available options. As you can see in the overview, the rcp60 scenario is only available for two of the three GCMs! We deal with this issue by including a try/except in the code below.

# define the GCM models you want to use
all_GCM = ['CCSM4',
           'CNRM-CM5',
           'CSIRO-Mk3-6-0',
          ]

# define the rcp scenarios to use
all_rcp = ['rcp26',
           'rcp45',
           'rcp60',
           'rcp85']

# download locations for precipitation and temperature
bp = 'https://cluster.klima.uni-bremen.de/~oggm/cmip5-ng/pr/pr_mon_{}_{}_r1i1p1_g025.nc'
bt = 'https://cluster.klima.uni-bremen.de/~oggm/cmip5-ng/tas/tas_mon_{}_{}_r1i1p1_g025.nc'

for GCM in all_GCM:
    for rcp in all_rcp:
        # Download the files
        ft = utils.file_downloader(bt.format(GCM, rcp))
        fp = utils.file_downloader(bp.format(GCM, rcp))

        try:
            # bias correct them
            workflow.execute_entity_task(gcm_climate.process_cmip_data, gdirs, 
                                         filesuffix='_{}_{}'.format(GCM, rcp),  # recognize the climate file for later
                                         fpath_temp=ft,  # temperature projections
                                         fpath_precip=fp,  # precip projections
                                         );
        except ValueError:
            # if a certain scenario is not available for a GCM we land here
            # and indicate this by printing a message so the user knows
            # this scenario is missing
            print('No ' + GCM + ' run with scenario ' + rcp + ' available!')

Actual projection runs#

Now we conduct the actual projection runs. Again, we handle the case that no data is available for a certain (GCM, RCP) combination with a try/except.

for GCM in all_GCM:
    for rcp in all_rcp:
        rid = '_{}_{}'.format(GCM, rcp)
        try:  # check if GCM, RCP combination exists
            workflow.execute_entity_task(tasks.run_with_hydro, gdirs,
                                         run_task=tasks.run_from_climate_data,
                                         ys=2020,  # start year of our projection runs
                                         climate_filename='gcm_data',  # use gcm_data, not climate_historical
                                         climate_input_filesuffix=rid,  # use the chosen GCM and scenario
                                         init_model_filesuffix='_historical',  # this is important! Start from 2020 glacier
                                         output_filesuffix=rid,  # the filesuffix of the resulting file, so we can find it later
                                         store_monthly_hydro=True
                                        );

        except FileNotFoundError:
            # if a certain scenario is not available for a GCM we land here
            # and indicate this by printing a message so the user knows
            # this scenario is missing
            print('No ' + GCM + ' run with scenario ' + rcp + ' available!')

Merge datasets together#

Now that we have conducted the projection runs, we can start merging all outputs into one dataset. The individual files can be opened by providing the output_filesuffix of the projection runs as the input_filesuffix. Let us have a look at one resulting dataset!

file_id  = '_{}_{}'.format(all_GCM[0], all_rcp[0])
ds = utils.compile_run_output(gdirs, input_filesuffix=file_id)
ds

We see that as a result, we get an xarray.Dataset. We also see that this dataset has three dimensions: time, rgi_id and month_2d. When we look at the Data variables (click to expand) we can see that each of these variables uses a certain combination of the dimensions. For example, for the volume we see the dimensions (time, rgi_id), which indicates that the underlying data is separated by year and by glacier. Let’s have a look at the volume in 2030 and 2040 of the glacier ‘RGI60-15.03473’ (Ngojumba):

ds['volume'].loc[{'time': [2030, 2040], 'rgi_id': ['RGI60-15.03473' ]}]

A more complete overview of how to access the data of an xarray Dataset can be found here.
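
As a small aside, the same selection can also be written with .sel() instead of .loc[] (a minimal sketch, using the ds we opened above):

# equivalent selection with .sel(): volume in 2030 and 2040 for Ngojumba
ds['volume'].sel(time=[2030, 2040], rgi_id=['RGI60-15.03473'])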

This is quite useful, but unfortunately, we have one xarray Dataset for each (GCM, RCP) pair. For an easier calculation and comparison between different GCMs and RCPs, it would be desirable to combine all individual Datasets into one Dataset with the additional dimensions (GCM, RCP). Fortunately, xarray provides different functions to merge Datasets (here you can find an Overview).

In our case, we want to merge all individual datasets with two additional coordinates GCM and RCP. Additional coordinates can be added with the command .coords[], and you can include a description with .coords[].attrs['description']. For example, let us add GCM as a new coordinate:

ds.coords['GCM'] = all_GCM[0]
ds.coords['GCM'].attrs['description'] = 'Global Circulation Model'
ds

You can see that GCM is now added as a Coordinate, but when you look at the Data variables you see that this new Coordinate is not used by any of them. So we must add this new Coordinate to the Data variables using the .expand_dims() command:

ds = ds.expand_dims('GCM')
ds

Now we see that GCM was added to the Dimensions and all Data variables now use this new coordinate. The same can be done with RCP. On its own this is not very useful, but if we add these coordinates to all datasets, it becomes quite handy for merging. Therefore, we now open all datasets and add the two coordinates GCM and RCP to each of them:

Add new coordinates#

ds_all = []  # list in which all datasets with the additional coordinates GCM and RCP will be stored
creation_date = strftime("%Y-%m-%d %H:%M:%S", gmtime())  # here add the current time for info
for GCM in all_GCM:  # loop through all GCMs
    for RCP in all_rcp:  # loop through all RCPs
        try:  # check if GCM, RCP combination exists
            rid = '_{}_{}'.format(GCM, RCP)  # put together the same filesuffix which was used during the projection runs

            ds_tmp = utils.compile_run_output(gdirs, input_filesuffix=rid)  # open one model run

            ds_tmp.coords['GCM'] = GCM  # add GCM as a coordinate
            ds_tmp.coords['GCM'].attrs['description'] = 'used Global circulation Model'  # add a description for GCM
            ds_tmp = ds_tmp.expand_dims("GCM")  # add GCM as a dimension to all Data variables

            ds_tmp.coords['RCP'] = RCP  # add RCP as a coordinate
            ds_tmp.coords['RCP'].attrs['description'] = 'used Representative Concentration Pathway'  # add a description for RCP
            ds_tmp = ds_tmp.expand_dims("RCP")  # add RCP as a dimension to all Data variables

            ds_tmp.attrs['creation_date'] = creation_date  # also add today's date for info
            ds_all.append(ds_tmp)  # add the dataset with extra coordinates to our final ds_all array

        except RuntimeError as err:  # here we land if an error occurred
            if str(err) == 'Found no valid glaciers!':  # this is the error message if the GCM, RCP combination does not exist
                print(f'No data for GCM {GCM} with RCP {RCP} found!')  # print a descriptive message
            else:
                raise RuntimeError(err)  # if another error occurred, we just re-raise it

Actual merging#

ds_all now contains all outputs with the additionally added coordinates. Now we can merge them with the very convenient xarray function xr.combine_by_coords():

ds_merged = xr.combine_by_coords(ds_all,
                                 fill_value=np.nan)  # define how the missing GCM, RCP combinations should be filled
ds_merged

Great! You see that the result is one Dataset, but we can still identify the individual runs with the two additionally added Coordinates GCM and RCP. If you want to save this new Dataset for later analysis, you can use for example .to_netcdf() (here you can find an Overview of xarray writing options):

ds_merged.to_netcdf('merged_GCM_projections.nc')

And to open it again later and automatically close the file after reading you can use:

with xr.open_dataset('merged_GCM_projections.nc') as ds:
    ds_merged = ds.load()  # load the data into memory before the file is closed

Calculate regional values#

Now that we have our merged dataset, we can start with the analysis. To do this, xarray gives us a ton of different mathematical functions which can be used directly with our dataset. Here you can find a great overview of what is possible.

As an example, we want to calculate the mean and std of the total regional glacier volume for different scenarios of our merged dataset. Here our added coordinates come in very handy.

Let us start by calculating the total regional glacier volume using .sum():

ds_total_volume = ds_merged['volume'].sum(dim='rgi_id',  # over which dimension the sum should be taken, here we want to sum up over all glacier ids
                                          skipna=True,  # ignore nan values
                                          keep_attrs=True)  # keep the variable descriptions
ds_total_volume

For this calculation, we told xarray over which dimension we want to sum up, dim='rgi_id' (all glaciers), that missing values should be ignored, skipna=True (as some GCM/RCP combinations are not available), and that the attributes (the descriptions of the variables) should be preserved, keep_attrs=True. You can see that the result has three dimensions left (GCM, time, RCP) and that the dimension rgi_id disappeared, as we have summed all values up along this dimension.
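
As a quick, optional sanity check, we can look at the remaining dimensions of the result (rgi_id should no longer appear):

# rgi_id should no longer appear among the dimensions after the sum
print(ds_total_volume.dims)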

Likewise, we can now calculate the mean of all GCM runs:

ds_total_volume_mean = ds_total_volume.mean(dim='GCM',  # over which dimension the mean should be calculated
                                            skipna=True,  # ignore nan values
                                            keep_attrs=True)  # keep all variable descriptions
ds_total_volume_mean

And you see that we are now left with two dimensions (RCP, time). This means we have calculated the mean total volume for all RCP scenarios along the projection period. The standard deviation (std) can also be calculated in the same way as the mean, as shown in the short sketch below.
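
For completeness, here is the analogous std calculation (a minimal sketch, reusing the ds_total_volume computed above):

# standard deviation across the GCMs, analogous to the mean above
ds_total_volume_std = ds_total_volume.std(dim='GCM',  # over which dimension the std should be calculated
                                          skipna=True,  # ignore nan values
                                          keep_attrs=True)  # keep all variable descriptions
ds_total_volume_std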

If you want to calculate the total mean and std values for different variables it is convenient to define a small function:

def calculate_total_mean_and_std(ds, variable):
    mean = ds[variable].sum(dim='rgi_id',  # first sum up over all glaciers
                            skipna=True,
                            keep_attrs=True,
                           ).mean(dim='GCM',  # afterwards calculate the mean of all GCMs
                                  skipna=True,
                                  keep_attrs=True,
                                 )
    std = ds[variable].sum(dim='rgi_id',  # first sum up over all glaciers 
                           skipna=True,
                           keep_attrs=True,
                          ).std(dim='GCM',  # afterwards calculate the std of all GCMs
                                skipna=True,
                                keep_attrs=True,
                               )
    return mean, std

This function takes a dataset ds and a variable name variable for which the mean and the std should be calculated. It will come in handy in the next section dealing with the visualisation of the calculated values; a small usage example follows below.
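
For example, the regional mean and std of the glacier area could be computed like this (a small sketch; 'area' is one of the variables in the compiled output):

# mean and std of the total regional glacier area across all GCMs
area_mean, area_std = calculate_total_mean_and_std(ds_merged, 'area')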

Visualisation#

Now that we can calculate the regional values of our projection runs, it is time to visualize them. In the following, we show one way to visualize the data using tools from the HoloViz framework (namely HoloViews and Panel). For an introduction to HoloViz, you can have a look at the Small overview of HoloViz capability of data exploration notebook.

First, we create a single curve for a single RCP scenario:

# calculate mean and std with the previously defined function
total_volume_mean, total_volume_std = calculate_total_mean_and_std(ds_merged, 'volume')

# select only one RCP scenario
total_volume_mean_rcp85 = total_volume_mean.loc[{'RCP': 'rcp85'}]

# plot a curve
x = total_volume_mean_rcp85.coords['time']
y = total_volume_mean_rcp85
hv.Curve((x, y),
         kdims=x.name,
         vdims=y.name,
        ).opts(xlabel=x.attrs['description'],
               ylabel=f"{y.attrs['description']} in {y.attrs['unit']}")

We used a HoloViews Curve and defined kdims and vdims. This definition is not so important for a single plot, but as soon as we compose different plots, the axes of all plots sharing the same kdims or vdims are connected. This means, for example, that whenever you zoom in on one plot, all other plots zoom in as well. Further, you can see that we defined xlabel and ylabel using the variable descriptions of the dataset; this is why it was useful to set keep_attrs=True when we calculated the total values (see above). A minimal sketch of this axis linking follows below.
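
As an illustration of the linking (a minimal sketch reusing total_volume_mean from above, not part of the original workflow), two curves sharing the kdims 'time' are linked when placed side by side in a layout:

# two curves with the same kdims ('time') zoom and pan together when composed
curve_26 = hv.Curve((total_volume_mean.coords['time'], total_volume_mean.loc[{'RCP': 'rcp26'}]),
                    kdims='time', vdims='volume', label='rcp26')
curve_85 = hv.Curve((total_volume_mean.coords['time'], total_volume_mean.loc[{'RCP': 'rcp85'}]),
                    kdims='time', vdims='volume', label='rcp85')
curve_26 + curve_85  # zooming in one panel also zooms the other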

As a next step, we can add the std as a shaded area (HoloViews Area) and wrap the whole plot (mean curve plus std area) into a new function:

Create single mean curve with std area#

def get_single_curve(mean, std, rcp):
    
    mean_use = mean.loc[{'RCP': rcp}]  # read out the mean of the RCP to plot 
    std_use = std.loc[{'RCP': rcp}]  # read out the std of the RCP to plot
    time = mean.coords['time']  # get the time for the x axis
    
    return (hv.Area((time,  # plot std as an area
                     mean_use + std_use,  # upper boundary of the area
                     mean_use - std_use),  # lower boundary of the area
                    vdims=[mean_use.name, 'y2'],  # vdims for both boundaries
                    kdims='time',
                    label=rcp,
                   ).opts(alpha=0.2,
                          line_color=None,
                         ) *
            hv.Curve((time, mean_use),
                     vdims=mean_use.name,
                     kdims='time',
                     label=rcp,
                    )
           ).opts(width=400,  # width of the total plot
                  height=400,  # height of the total plot
                  xlabel=time.attrs['description'],
                  ylabel=f"{mean_use.attrs['description']} in {mean_use.attrs['unit']}",
                 )

Overlay different scenarios#

We can put the single curves of the different scenarios together in a HoloViews HoloMap, which is comparable to a dictionary. This can then easily be used to create a nice overlay of all curves.

def overlay_scenarios(ds, variable):
    hmap = hv.HoloMap(kdims='Scenarios')   # create a HoloMap
    mean, std = calculate_total_mean_and_std(ds, variable)  # calculate mean and std for all RCPs using our previously defined function
    for rcp in all_rcp:
        hmap[rcp] = get_single_curve(mean, std, rcp)  # add a curve for each RCP to the HoloMap, using the RCP as the key (comparable to a dictionary key)
    return hmap.overlay().opts(title=variable)  # create an overlay of all curves

Show different variables in one figure and save as an html file#

Now that we have defined how our plots should look, we can compose the different variables we want to explore into one figure. To do so, we can use Panel Column and Row to customize the plot layout:

all_plots = pn.Column(pn.Row(overlay_scenarios(ds_merged, 'volume'),
                             overlay_scenarios(ds_merged, 'area'),
                            ),
                      overlay_scenarios(ds_merged, 'melt_on_glacier'))
all_plots

When you start exploring the plots by dragging them around or zooming in (using the tools of the toolbox in the upper right corner of each plot), you see that the x-axes are connected. This makes it very convenient, for example, to look at different periods for all variables interactively.

You can also open the plots in a new browser tab by using

all_plots.show()

or save them as an html file for sharing:

plots_to_save = pn.panel(all_plots)
plots_to_save.save('GCM_runs.html', embed=True)

What’s next?#