4. Aggregation

Aggregation is an option that can be applied to MET stat output (in the appropriate format) to calculate aggregation statistics and confidence intervals. Input data must first be reformatted using the METdataio METreformat module to label all the columns with the corresponding statistic name specified in the MET User’s Guide for point-stat, grid-stat, or ensemble-stat .stat output data.

4.1. Python Requirements

The required third-party Python packages and their version numbers are listed in the requirements.txt file (for non-NCO systems) and the nco_requirements.txt file (for NCO systems).

4.2. Retrieve Code

Refer to the Installation Guide for instructions.

4.3. Retrieve Sample Data

The sample data for this example is located in the $METCALCPY_BASE/test directory, where $METCALCPY_BASE is the full path to the METcalcpy source code (e.g. /User/my_dir/METcalcpy). The example data file is rrfs_ecnt_for_agg.data. This data was reformatted from the MET .stat output using the METdataio METreformat module. The reformatting step labels the columns with the corresponding statistic names, based on the MET tool (point-stat, grid-stat, or ensemble-stat). In this file, the ECNT linetype of the MET ensemble-stat output has been reformatted to include the statistic names for all ECNT-specific columns.

Input data must be in this format prior to using the aggregation module, agg_stat.py.

The example data can be copied to a working directory, or left in this directory. The location of the data will be specified in the YAML configuration file.

Please refer to the METdataio User’s Guide for instructions on reformatting MET .stat files: https://metdataio.readthedocs.io/en/develop/Users_Guide/reformat_stat_data.html

4.4. Aggregation

The agg_stat module, agg_stat.py, is used to calculate aggregated statistics and confidence intervals. This module can be run as a script at the command line or imported into another Python script.

A required YAML configuration file, config_agg_stat.yaml, defines the location of the input data and the name and location of the output file.

The agg_stat module supports the ECNT linetype that is output by the MET ensemble-stat tool.

The input to the agg_stat module must have the appropriate format. The ECNT linetype must first be reformatted via the METdataio METreformat module by following the instructions under the "Reformatting for computing aggregation statistics with METcalcpy agg_stat" heading.

4.4.1. Modify the YAML configuration file

The config_agg_stat.yaml file is required to calculate aggregation statistics. This configuration file is located in the $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config directory, where $METCALCPY_BASE is the directory in which the METcalcpy source code is saved (e.g. /Users/my_acct/METcalcpy). Change directory to $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config and modify the config_agg_stat.yaml file.

  1. Specify the input and output files:

agg_stat_input: /path-to/test/data/rrfs_ecnt_for_agg.data
agg_stat_output: /path-to/ecnt_aggregated.data

Replace path-to in the two settings above with the location where the input data is stored (either a working directory or the $METCALCPY_BASE/test directory) and the location where the output file should be written. NOTE: Use full paths for the input and output files (no environment variables).

  2. Specify the meteorological variable and the statistics:

fcst_var_val_1:
  TMP:
    - ECNT_RMSE
    - ECNT_SPREAD_PLUS_OERR

  3. Specify the selected models/members:

series_val_1:
  model:
   - RRFS_GEFS_GF.SPP.SPPT

  4. Specify the statistics to be aggregated. In this case, the RMSE and SPREAD_PLUS_OERR statistics from the ECNT linetype of the ensemble-stat tool output are aggregated. The aggregated statistics are named ECNT_RMSE and ECNT_SPREAD_PLUS_OERR (the original statistic name prefixed with the linetype):

list_stat_1:
  - ECNT_RMSE
  - ECNT_SPREAD_PLUS_OERR

The full config_agg_stat.yaml file is shown below:

agg_stat_input: /path-to-METcalcpy-base/test/data/rrfs_ecnt_for_agg.data
agg_stat_output: /path-to/ecnt_aggregated.data
alpha: 0.05
append_to_file: null
circular_block_bootstrap: True
derived_series_1: []
derived_series_2: []
event_equal:  False
fcst_var_val_1:
  TMP:
  - ECNT_RMSE
  - ECNT_SPREAD_PLUS_OERR
fcst_var_val_2: {}
indy_vals:
- '30000'
- '60000'
- '90000'
- '120000'
- '150000'
- '160000'
- '170000'
- '180000'
- '200000'
- '240000'
- '270000'
- '300000'
- '330000'
- '340000'
- '360000'
indy_var: fcst_lead
line_type: ecnt
list_stat_1:
 - ECNT_RMSE
 - ECNT_SPREAD_PLUS_OERR
list_stat_2: []
method: perc
num_iterations: 1
num_threads: -1
random_seed: null
series_val_1:
  model:
  - RRFS_GEFS_GF.SPP.SPPT
series_val_2: {}

NOTE: Use full directory paths when specifying the location of the input file and output file.
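The path requirement above is easy to get wrong, so a small pre-flight check can help. The following is a minimal sketch, not part of METcalcpy: the function names check_agg_stat_paths and check_stat_lists are hypothetical, and the configuration is assumed to be already loaded into a Python dictionary with the same keys as config_agg_stat.yaml. The second check reflects the pattern seen in the example configuration (statistics in list_stat_1 also appear under fcst_var_val_1 and carry the linetype prefix); it is an assumption, not a documented rule.

```python
import os


def check_agg_stat_paths(config):
    """Return a list of problems found in the input/output path settings."""
    problems = []
    for key in ("agg_stat_input", "agg_stat_output"):
        value = config.get(key)
        if not value:
            problems.append(f"{key} is missing")
        elif "$" in value or value.startswith("~"):
            # Environment variables and ~ are not expanded in the YAML file
            problems.append(f"{key} contains an unexpanded variable: {value}")
        elif not os.path.isabs(value):
            problems.append(f"{key} is not a full path: {value}")
    return problems


def check_stat_lists(config):
    """Return a list of problems found in list_stat_1 (assumed conventions)."""
    problems = []
    prefix = str(config.get("line_type", "")).upper() + "_"
    per_variable = config.get("fcst_var_val_1") or {}
    # Collect every statistic named under any forecast variable
    declared = {stat for stats in per_variable.values() for stat in stats}
    for stat in config.get("list_stat_1") or []:
        if stat not in declared:
            problems.append(f"{stat} is not listed under fcst_var_val_1")
        if not stat.startswith(prefix):
            problems.append(f"{stat} does not start with {prefix}")
    return problems
```

Calling both functions on the loaded configuration and printing any returned messages before running the aggregation gives an early warning about relative paths or mismatched statistic names.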

4.4.2. Set the Environment and PYTHONPATH

bash shell:

export METCALCPY_BASE=/path-to-METcalcpy

csh shell:

setenv METCALCPY_BASE /path-to-METcalcpy

where path-to-METcalcpy is the full path to where the METcalcpy source code is located (e.g. /User/my_dir/METcalcpy).

bash shell:

export PYTHONPATH=$METCALCPY_BASE/:$METCALCPY_BASE/metcalcpy

csh shell:

setenv PYTHONPATH $METCALCPY_BASE/:$METCALCPY_BASE/metcalcpy

where $METCALCPY_BASE is the full path to where the METcalcpy code resides (e.g. /User/my_dir/METcalcpy).
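When working inside a Python session or notebook where the shell's PYTHONPATH cannot easily be changed, the same two directories can be added to sys.path at run time. The sketch below is one way to do this; the function names metcalcpy_search_paths and add_metcalcpy_to_sys_path are hypothetical helpers and are not provided by METcalcpy.

```python
import os
import sys


def metcalcpy_search_paths(base_dir):
    """Return the two directories named in the PYTHONPATH setting above."""
    base_dir = os.path.abspath(base_dir)
    return [base_dir, os.path.join(base_dir, "metcalcpy")]


def add_metcalcpy_to_sys_path(base_dir=None):
    """Put the METcalcpy directories at the front of sys.path."""
    if base_dir is None:
        # Fall back to the environment variable set in the shell
        base_dir = os.environ.get("METCALCPY_BASE")
    if not base_dir:
        raise RuntimeError("METCALCPY_BASE is not set")
    paths = metcalcpy_search_paths(base_dir)
    # Insert in reverse so the base directory ends up first
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)
    return paths
```

Calling add_metcalcpy_to_sys_path() with no argument reads METCALCPY_BASE from the environment, so the shell setting remains the single source of truth.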

4.4.3. Run the Python script

The following are instructions for performing aggregation from the command-line:

python $METCALCPY_BASE/metcalcpy/agg_stat.py $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config/config_agg_stat.yaml

This generates the file ecnt_aggregated.data (the name given in the agg_stat_output setting), which contains the aggregated statistics data.
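The same command-line run can be launched from a Python driver script, for example as one step in a larger workflow. The following is a minimal sketch using only the standard library; build_agg_stat_command and run_agg_stat are hypothetical names, and the configuration file path is supplied by the caller.

```python
import os
import subprocess
import sys


def build_agg_stat_command(base_dir, config_file):
    """Build the argument list for running agg_stat.py as a script."""
    script = os.path.join(base_dir, "metcalcpy", "agg_stat.py")
    # sys.executable reuses the interpreter that runs this driver script
    return [sys.executable, script, config_file]


def run_agg_stat(base_dir, config_file):
    """Run agg_stat.py and raise an error if it exits with a failure."""
    command = build_agg_stat_command(base_dir, config_file)
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(command, check=True)
```

A caller would typically pass os.environ["METCALCPY_BASE"] as base_dir and the full path to the edited YAML configuration file as config_file. PYTHONPATH must still be set as described above, because the child process inherits it from the environment.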

Additionally, the agg_stat.py module can be used from another script or module by importing the AggStat class:

from metcalcpy.agg_stat import AggStat

AGG_STAT = AggStat(PARAMS)
AGG_STAT.calculate_stats_and_ci()

where PARAMS is a dictionary containing the parameters indicating the location of input and output data. The structure is similar to the original Rscript template from which this Python implementation was derived.

NOTE: Remember to use the same PYTHONPATH defined above to ensure that the agg_stat module is found by the Python import process.
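As an illustration of what PARAMS might contain, the sketch below builds a dictionary with the same keys and values as the example config_agg_stat.yaml file shown earlier. This assumes that AggStat accepts a dictionary mirroring the YAML settings; build_params and run_aggregation are hypothetical helper names, not part of METcalcpy.

```python
def build_params(input_file, output_file):
    """Return a dictionary mirroring the example config_agg_stat.yaml."""
    stats = ["ECNT_RMSE", "ECNT_SPREAD_PLUS_OERR"]
    return {
        "agg_stat_input": input_file,    # full path, no environment variables
        "agg_stat_output": output_file,  # full path, no environment variables
        "alpha": 0.05,
        "append_to_file": None,
        "circular_block_bootstrap": True,
        "derived_series_1": [],
        "derived_series_2": [],
        "event_equal": False,
        "fcst_var_val_1": {"TMP": list(stats)},
        "fcst_var_val_2": {},
        "indy_vals": [
            "30000", "60000", "90000", "120000", "150000",
            "160000", "170000", "180000", "200000", "240000",
            "270000", "300000", "330000", "340000", "360000",
        ],
        "indy_var": "fcst_lead",
        "line_type": "ecnt",
        "list_stat_1": list(stats),
        "list_stat_2": [],
        "method": "perc",
        "num_iterations": 1,
        "num_threads": -1,
        "random_seed": None,
        "series_val_1": {"model": ["RRFS_GEFS_GF.SPP.SPPT"]},
        "series_val_2": {},
    }


def run_aggregation(params):
    """Run the aggregation with the calls shown above."""
    # Imported here so build_params can be used without METcalcpy installed
    from metcalcpy.agg_stat import AggStat

    agg_stat = AggStat(params)
    agg_stat.calculate_stats_and_ci()
```

Building the dictionary in code makes it straightforward to loop over several input files or models, changing only the entries that differ between runs.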