Basics: GridRad Severe
======================

This demo/tutorial illustrates the basics of THUNER by tracking and visualizing mesoscale convective system (MCS) objects in `GridRad Severe `__ data. See `Short et al. (2023) `__ for methodological details. By the end of the notebook, you should be able to generate the animation below.

.. figure:: https://raw.githubusercontent.com/THUNER-project/THUNER/refs/heads/main/gallery/mcs_gridrad_20100120.gif
   :alt: Animation depicting tracked MCSs.

   Animation depicting tracked MCSs.

Setup
-----

First, import the requisite modules.

.. code-block:: python3
   :linenos:

   """GridRad Severe demo/test."""

   %load_ext autoreload
   %autoreload 2

   import shutil
   import yaml
   import numpy as np
   import xarray as xr
   import thuner.data as data
   import thuner.option as option
   import thuner.analyze as analyze
   import thuner.parallel as parallel
   import thuner.visualize as visualize
   import thuner.attribute as attribute
   import thuner.default as default
   import thuner.config as config
   import thuner.utils as utils

.. code-block:: text

   Welcome to the Thunderstorm Event Reconnaissance (THUNER) package
   v0.0.16! This package is still in testing and development. Please visit
   github.com/THUNER-project/THUNER for examples, and to report issues or contribute.

   THUNER is a flexible toolkit for performing multi-feature detection,
   tracking, tagging and analysis of events within meteorological datasets.
   The intended application is to convective weather events. For examples
   and instructions, see https://github.com/THUNER-project/THUNER and
   https://thuner.readthedocs.io/en/latest/. If you use THUNER in your research, consider
   citing the following papers;

   Short et al. (2023), doi: 10.1175/MWR-D-22-0146.1
   Raut et al. (2021), doi: 10.1175/JAMC-D-20-0119.1
   Fridlind et al. (2019), doi: 10.5194/amt-12-2979-2019
   ...

Next, specify the folders where THUNER outputs will be saved.
Note that THUNER stores a fallback output directory in a config file, accessible via the functions :func:`thuner.config.set_outputs_directory` and :func:`thuner.config.get_outputs_directory`. By default, this fallback directory is ``Path.home() / "THUNER_output"``.

.. code-block:: python3
   :linenos:

   # Set a flag for whether or not to remove existing output directories
   remove_existing_outputs = False

   # Parent directory for saving outputs
   base_local = config.get_outputs_directory()
   output_parent = base_local / "runs/gridrad/gridrad_demo"
   options_directory = output_parent / "options"
   visualize_directory = output_parent / "visualize"

   # Delete the output directory for the run if it already exists
   if output_parent.exists() and remove_existing_outputs:
       shutil.rmtree(output_parent)

Next, download the demo data for the tutorial, if you haven’t already.

.. code-block:: python3
   :linenos:

   # Download the demo data
   remote_directory = "s3://thuner-storage/THUNER_output/input_data/raw/d841006"
   data.get_demo_data(base_local, remote_directory)
   remote_directory = "s3://thuner-storage/THUNER_output/input_data/raw/"
   remote_directory += "era5_monthly_39N_102W_27N_89W"
   data.get_demo_data(base_local, remote_directory)

.. code-block:: text

   2025-07-09 17:02:45,218 - thuner.data._utils - INFO - Syncing directory /home/ewan/THUNER_output/input_data/raw/d841006. Please wait.
   2025-07-09 17:02:46,368 - thuner.data._utils - INFO - Syncing directory /home/ewan/THUNER_output/input_data/raw/era5_monthly_39N_102W_27N_89W. Please wait.

Options
-------

We now specify the options for the THUNER run. Options classes in THUNER are built on :class:`pydantic.BaseModel`, which provides a simple way to describe and validate options. Options objects are initialized using keyword-value pairs. Below we specify the options for a GridRad Severe dataset.

.. code-block:: python3
   :linenos:

   # Uncomment the line below to download the demo data if not already present
   # data.get_demo_data()
   event_directories = data.gridrad.get_event_directories(year=2010, base_local=base_local)
   event_directory = event_directories[0]  # Take the first event from 2010 for the demo
   # Get the start and end times of the event, and the date of the event start
   start, end, event_start = data.gridrad.get_event_times(event_directory)
   times_dict = {"start": start, "end": end}
   gridrad_dict = {"event_start": event_start}
   gridrad_options = data.gridrad.GridRadSevereOptions(**times_dict, **gridrad_dict)

.. code-block:: text

   2025-07-09 17:02:48,733 - thuner.data.gridrad - INFO - Generating GridRad filepaths.

Options instances can be examined using the ``model_dump`` method, which converts the instance to a dictionary.

.. code-block:: python3
   :linenos:

   gridrad_options.model_dump()

.. code-block:: text

   {'type': 'GridRadSevereOptions',
    'name': 'gridrad',
    'start': '2010-01-20T18:00:00',
    'end': '2010-01-21T03:30:00',
    'fields': ['reflectivity'],
    'parent_remote': 'https://data.rda.ucar.edu',
    'parent_local': '/home/ewan/THUNER_output/input_data/raw',
    'converted_options': {'type': 'ConvertedOptions',
     'save': False,
     'load': False,
     'parent_converted': '/home/ewan/THUNER_output/input_data/converted'},
    'filepaths': ['/home/ewan/THUNER_output/input_data/raw/d841006/volumes/2010/20100120/nexrad_3d_v4_2_20100120T180000Z.nc',
     '/home/ewan/THUNER_output/input_data/raw/d841006/volumes/2010/20100120/nexrad_3d_v4_2_20100120T181000Z.nc',
     '/home/ewan/THUNER_output/input_data/raw/d841006/volumes/2010/20100120/nexrad_3d_v4_2_20100120T182000Z.nc',
     '/home/ewan/THUNER_output/input_data/raw/d841006/volumes/2010/20100120/nexrad_3d_v4_2_20100120T183000Z.nc',
   ...

The ``model_summary()`` method of an options instance returns a string summary of the fields in the model. Note the ``parent_local`` field, which provides the parent directory on local disk containing the dataset.
Analogously, ``parent_remote`` specifies the remote location of the data, which is useful when one wants to access data from a remote location during the tracking run. Note also the ``filepaths`` field, which provides a list of the dataset’s absolute filepaths. The idea is that for standard datasets, ``filepaths`` can be populated automatically by looking in the ``parent_local`` directory, assuming the same sub-directory structure as in the dataset’s original location. If the dataset is nonstandard, the ``filepaths`` list can be explicitly provided by the user. For datasets that do not yet have convenience classes in THUNER, the :class:`thuner.utils.BaseDatasetOptions` class can be used. Note also the ``use`` field, which tells THUNER whether the dataset will be used to ``track`` or ``tag`` objects. Tracking in THUNER means detecting objects in a dataset, and matching those objects across time. Tagging means attaching attributes from potentially different datasets to detected objects.

.. code-block:: python3
   :linenos:

   print(gridrad_options.model_summary())

.. code-block:: text

   Field Name: Type, Description
   -------------------------------------
   type: typing.Literal['GridRadSevereOptions'], None
   name: , Name of the dataset.
   start: str | numpy.datetime64, Tracking start time.
   end: str | numpy.datetime64, Tracking end time.
   fields: list[str] | None, List of dataset fields, i.e. variables, to use. Fields should be given using their thuner, i.e. CF-Conventions, names, e.g. 'reflectivity'.
   parent_remote: str | None, Parent directory of the dataset on remote storage.
   parent_local: str | pathlib.Path | None, Parent directory of the dataset on local storage.
   converted_options: , Options for saving and loading converted data.
   filepaths: list[str] | dict, List of filepaths for the dataset.
   attempt_download: , Whether to attempt to download the data.
   deque_length: , Number of current/previous grids from this dataset to keep in memory. Most tracking algorithms require a 'next' grid, 'current' grid, and at least two previous grids.
   use: typing.Literal['track', 'tag', 'both'], Whether this dataset will be used for tagging, tracking or both.
   start_buffer: , Minutes before interval start time to include. Useful for tagging when one wants to record pre-storm ambient profiles.
   ...

We will also create dataset options for ERA5 single-level and pressure-level data, which we use for tagging the storms detected in the GridRad Severe dataset with other attributes, e.g. ambient winds and temperature.

.. code-block:: python3
   :linenos:

   era5_dict = {"latitude_range": [27, 39], "longitude_range": [-102, -89]}
   era5_pl_options = data.era5.ERA5Options(**times_dict, **era5_dict)
   era5_dict.update({"data_format": "single-levels"})
   era5_sl_options = data.era5.ERA5Options(**times_dict, **era5_dict)

.. code-block:: text

   2025-07-09 17:02:53,390 - thuner.data.era5 - INFO - Generating era5 filepaths.
   2025-07-09 17:02:53,392 - thuner.data.era5 - INFO - Generating era5 filepaths.

All the dataset options are grouped into a single :class:`thuner.option.data.DataOptions` object, which is passed to the THUNER tracking function. We also save these options as a YAML file.

.. code-block:: python3
   :linenos:

   datasets = [gridrad_options, era5_pl_options, era5_sl_options]
   data_options = option.data.DataOptions(datasets=datasets)
   data_options.to_yaml(options_directory / "data.yml")

Now create and save options describing the grid. If ``regrid`` is ``False`` and grid properties like ``altitude_spacing`` or ``geographic_spacing`` are set to ``None``, THUNER will attempt to infer these from the tracking dataset.

.. code-block:: python3
   :linenos:

   # Create and save the grid options
   kwargs = {"name": "geographic", "regrid": False, "altitude_spacing": None}
   kwargs.update({"geographic_spacing": None})
   grid_options = option.grid.GridOptions(**kwargs)
   grid_options.to_yaml(options_directory / "grid.yml")

.. code-block:: text

   2025-07-09 17:02:55,334 - thuner.option.grid - WARNING - altitude_spacing not specified. Will attempt to infer from input.
   2025-07-09 17:02:55,335 - thuner.option.grid - WARNING - shape not specified. Will attempt to infer from input.

Finally, we create options describing how the tracking should be performed. In multi-feature tracking, some objects, like mesoscale convective systems (MCSs), can be defined in terms of others, like convective and stratiform echoes. THUNER’s approach is to first specify object options separately for each object type, e.g. convective echoes, stratiform echoes, mesoscale convective systems, and so forth. Object options are specified using ``pydantic`` models which inherit from :class:`thuner.option.track.BaseObjectOptions`. Related objects are then grouped together into :class:`thuner.option.track.LevelOptions` models. The final :class:`thuner.option.track.TrackOptions` model, which is passed to the tracking function, then contains a list of :class:`thuner.option.track.LevelOptions` models. The idea is that “lower level” objects can comprise the building blocks of “higher level” objects, with THUNER processing the former before the latter.

In this tutorial, level 0 objects are the convective, middle and stratiform echo regions, and level 1 objects are mesoscale convective systems defined by grouping the level 0 objects. Because :class:`thuner.option.track.TrackOptions` models can be complex to construct, a function for creating a default :class:`thuner.option.track.TrackOptions` model matching the approach of `Short et al. (2023) `__ is defined in the module :mod:`thuner.default`.

.. code-block:: python3
   :linenos:

   # Create the default track options
   track_options = default.track(dataset_name="gridrad")
   # Show the names of the level 0 objects
   print(f"Level 0 objects list: {track_options.levels[0].object_names}")
   # Show the names of the level 1 objects
   print(f"Level 1 objects list: {track_options.levels[1].object_names}")

.. code-block:: text

   Level 0 objects list: ['convective', 'middle', 'anvil']
   Level 1 objects list: ['mcs']

Note that a core component of the options for each object is the ``attributes`` field, which describes how object attributes like position, velocity and area are to be retrieved and stored. In THUNER, the code for collecting object attributes is separated from the core tracking code, allowing different attributes for different objects to be swapped in and out as needed. Individual attributes are described by the :class:`thuner.option.attribute.Attribute` model, where each :class:`thuner.option.attribute.Attribute` will form a column of an output CSV file. Sometimes multiple :class:`thuner.option.attribute.Attribute` models are grouped into a :class:`thuner.option.attribute.AttributeGroup` model, in which all attributes in the group are retrieved at once using the same method. For instance, attributes based on ellipse fitting, like major and minor axis, eccentricity and orientation, form a :class:`thuner.option.attribute.AttributeGroup`. Note however that each member of the group will still form a separate column in the output CSV file.

Finally, collections of attributes and attribute groups are organized into :class:`thuner.option.attribute.AttributeType` models. Each attribute type corresponds to related attributes that will be stored in a single CSV file. This keeps the number of columns in each file small, and makes THUNER outputs easier to manage and inspect directly. To illustrate, below we print the MCS object’s “core” attribute type options.

.. code-block:: python3
   :linenos:

   # Show the options for the MCS core attributes
   mcs_attributes = track_options.object_by_name("mcs").attributes
   core_mcs_attributes = mcs_attributes.attribute_type_by_name("core")
   core_mcs_attributes.model_dump()

.. code-block:: text

   {'type': 'AttributeType',
    'name': 'core',
    'description': 'Core attributes of tracked object, e.g. position and velocities.',
    'attributes': [{'type': 'Attribute',
      'name': 'time',
      'retrieval': {'type': 'Retrieval',
       'function': ,
       'keyword_arguments': {}},
      'data_type': numpy.datetime64,
      'precision': None,
      'description': 'Time taken from the tracking process.',
      'units': 'yyyy-mm-dd hh:mm:ss'},
     {'type': 'Attribute',
      'name': 'universal_id',
      'retrieval': {'type': 'Retrieval',
   ...

The default :class:`thuner.option.track.TrackOptions` use “local” and “global” cross-correlations to measure object velocities, as described by `Raut et al. (2021) `__ and `Short et al. (2023) `__. For GridRad Severe, we modify this approach slightly so that “global” cross-correlations are calculated using boxes encompassing each object, with a margin of 70 km around the object. Note that pydantic models are automatically validated when first created. Because we are changing the model instance after creation, we should revalidate the object options model to check we haven’t broken anything.

.. code-block:: python3
   :linenos:

   track_options.levels[1].objects[0].tracking.unique_global_flow = False
   track_options.levels[1].objects[0].tracking.global_flow_margin = 70
   track_options.levels[1].objects[0].revalidate()
   track_options.to_yaml(options_directory / "track.yml")

Users can also specify visualization options for generating figures during a tracking run. Uncomment the lines below to generate figures that visualize the matching algorithm. Note that this makes a tracking run much slower.

.. code-block:: python3
   :linenos:

   visualize_options = None
   # visualize_options = default.runtime(visualize_directory=visualize_directory)
   # visualize_options.to_yaml(options_directory / "visualize.yml")

Tracking
--------

To perform the tracking run, we need an iterable of the times at which objects will be detected and tracked. The convenience function :func:`thuner.utils.generate_times` creates a generator from the dataset options for the tracking dataset. We can then pass this generator, and the various options, to the tracking function :func:`thuner.parallel.track`. During the tracking run, outputs will be created in the ``output_parent`` directory, within the subfolders ``interval_0``, ``interval_1`` etc., which represent subintervals of the time period being tracked. At the end of the run, these outputs are stitched together.

.. code-block:: python3
   :linenos:

   times = utils.generate_times(data_options.dataset_by_name("gridrad").filepaths)
   args = [times, data_options, grid_options, track_options, visualize_options]
   num_processes = 4  # If visualize_options is not None, num_processes must be 1
   kwargs = {"output_directory": output_parent, "num_processes": num_processes}
   # In parallel tracking runs, we need to tell the tracking function which dataset to use
   # for tracking, so the subinterval data_options can be generated correctly
   kwargs.update({"dataset_name": "gridrad"})
   parallel.track(*args, **kwargs)

.. code-block:: text

   2025-07-09 17:03:08,621 - thuner.parallel - INFO - Beginning parallel tracking with 4 processes.
   2025-07-09 17:03:15,233 - thuner.track.track - INFO - Beginning thuner tracking. Saving output to /home/ewan/THUNER_output/runs/gridrad/gridrad_demo/interval_0.
   2025-07-09 17:03:15,538 - thuner.track.track - INFO - Beginning thuner tracking. Saving output to /home/ewan/THUNER_output/runs/gridrad/gridrad_demo/interval_1.
   2025-07-09 17:03:15,623 - thuner.track.track - INFO - Beginning thuner tracking. Saving output to /home/ewan/THUNER_output/runs/gridrad/gridrad_demo/interval_2.
   2025-07-09 17:03:15,732 - thuner.track.track - INFO - Beginning thuner tracking. Saving output to /home/ewan/THUNER_output/runs/gridrad/gridrad_demo/interval_3.
   2025-07-09 17:03:16,005 - thuner.track.track - INFO - Processing 2010-01-20T18:00:00.
   2025-07-09 17:03:16,006 - thuner.utils - INFO - Updating gridrad input record for 2010-01-20T18:00:00.
   2025-07-09 17:03:16,198 - thuner.track.track - INFO - Processing 2010-01-20T20:20:00.
   2025-07-09 17:03:16,199 - thuner.utils - INFO - Updating gridrad input record for 2010-01-20T20:20:00.
   2025-07-09 17:03:16,460 - thuner.track.track - INFO - Processing 2010-01-20T22:40:00.
   2025-07-09 17:03:16,462 - thuner.utils - INFO - Updating gridrad input record for 2010-01-20T22:40:00.
   2025-07-09 17:03:16,817 - thuner.track.track - INFO - Processing 2010-01-21T01:00:00.
   2025-07-09 17:03:16,819 - thuner.utils - INFO - Updating gridrad input record for 2010-01-21T01:00:00.
   2025-07-09 17:03:22,065 - thuner.utils - INFO - Grid options not set. Inferring from dataset.
   2025-07-09 17:03:22,069 - thuner.utils - WARNING - Altitude spacing not uniform.
   ...

The outputs of the tracking run are saved in the ``output_parent`` directory. The options for the run are saved in human-readable YAML files within the ``options`` directory. For reproducibility, Python objects can be rebuilt from these YAML files by reading the YAML into a dictionary and passing it to the appropriate ``pydantic`` model.

.. code-block:: python3
   :linenos:

   with open(options_directory / "data.yml", "r") as f:
       # Note yaml.safe_load(f) is a dictionary.
       # Prepending with ** unpacks the dictionary into keyword/argument pairs.
       data_options = option.data.DataOptions(**yaml.safe_load(f))
   data_options.model_dump()

The convenience function ``thuner.analyze.utils.read_options`` reloads all options in the above way, storing the different options in a dictionary.

.. code-block:: python3
   :linenos:

   all_options = analyze.utils.read_options(output_parent)
   all_options["data"].model_dump()

Object attributes, e.g. MCS position, area and velocity, are saved as CSV files in nested subfolders. Attribute metadata is recorded in YAML files. One can load attribute data directly using ``pandas.read_csv``, or create an appropriately formatted :class:`pandas.DataFrame` using the convenience function :func:`thuner.attribute.utils.read_attribute_csv`.

.. code-block:: python3
   :linenos:

   core = attribute.utils.read_attribute_csv(output_parent / "attributes/mcs/core.csv")
   print(core.head(20).to_string())

Records of the filepaths corresponding to each time of the tracking run are saved in the ``records`` folder. These records are useful for generating figures after a tracking run.

.. code-block:: python3
   :linenos:

   filepath = output_parent / "records/filepaths/gridrad.csv"
   records = attribute.utils.read_attribute_csv(filepath)
   print(records.head(20).to_string())

Object masks are saved as Zarr files, which can be read using :mod:`xarray`.

.. code-block:: python3
   :linenos:

   xr.open_dataset(output_parent / "masks/mcs.zarr").info()

Analysis and Visualization
--------------------------

We can then perform analysis on the tracking run outputs. Below we perform the MCS classifications discussed by `Short et al. (2023) `__.

.. code-block:: python3
   :linenos:

   analysis_options = analyze.mcs.AnalysisOptions()
   analyze.mcs.process_velocities(output_parent, profile_dataset="era5_pl")
   analyze.mcs.quality_control(output_parent, analysis_options)
   analyze.mcs.classify_all(output_parent, analysis_options)
   filepath = output_parent / "analysis/classification.csv"
   classifications = attribute.utils.read_attribute_csv(filepath)
   print("\n" + classifications.head(20).to_string())

We can also generate figures and animations from the output.
Below we visualize the convective and stratiform regions of each MCS, displaying each system’s velocity and stratiform offset, and the boundaries of the radar mosaic domain, as discussed by `Short et al. (2023) `__. By default, figures and animations are saved in the ``visualize`` folder of the ``output_parent`` directory. The code below should generate an animation ``mcs_gridrad_20100120.gif``, matching the animation provided at the start of the notebook.

.. code-block:: python3
   :linenos:

   name = f"mcs_gridrad_{event_start.replace('-', '')}"
   style = "presentation"
   attribute_handlers = default.grouped_attribute_handlers(output_parent, style)
   kwargs = {"name": name, "object_name": "mcs", "style": style}
   kwargs.update({"attribute_handlers": attribute_handlers, "dt": 7200})
   figure_options = option.visualize.GroupedHorizontalAttributeOptions(**kwargs)
   args = [output_parent, start, end, figure_options, "gridrad"]
   args_dict = {"parallel_figure": True, "by_date": False, "num_processes": 4}
   visualize.attribute.series(*args, **args_dict)

Relabelling
-----------

Sometimes we need to define new objects based on the split-merge history of the objects tracked during a THUNER run.
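
To make the idea concrete, here is a minimal, generic sketch of relabelling by split-merge history. It is not THUNER's own relabelling routine, and the ``parents`` mapping, the function ``relabel_by_family`` and the example IDs are all hypothetical. The sketch assumes each object records the IDs of the objects it split from or merged out of, and uses a small union-find to give every connected "family" of objects one shared label.

.. code-block:: python3

   def relabel_by_family(parents):
       """Map each object ID to the smallest ID in its split-merge family.

       ``parents`` is a hypothetical mapping from an object ID to the list
       of object IDs it split from or merged out of.
       """
       root = {}

       def find(i):
           root.setdefault(i, i)
           # Walk up to the family representative, shortening the path as we go
           while root[i] != i:
               root[i] = root[root[i]]
               i = root[i]
           return i

       def union(a, b):
           ra, rb = find(a), find(b)
           if ra != rb:
               # Keep the smaller ID as the family representative
               root[max(ra, rb)] = min(ra, rb)

       for child, parent_ids in parents.items():
           find(child)  # Register objects that have no parents
           for parent in parent_ids:
               union(child, parent)
       return {i: find(i) for i in root}


   # Hypothetical history: 3 split from 1, 4 merged from 2 and 3, 5 is unrelated
   parents = {1: [], 2: [], 3: [1], 4: [2, 3], 5: []}
   family_labels = relabel_by_family(parents)
   print(family_labels)

In this toy history, objects 1 to 4 are linked through splits and mergers and so share one label, while object 5 keeps its own. Using the smallest ID as the family label is an arbitrary choice made for this illustration.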