# Start Training your Model
After all files are preprocessed, we can start the training. In the following sections, the global settings and the neural network settings are explained in more detail. If you want to train completely from scratch, you can simply adapt one of the example config files. The creation of the model folder is taken care of when we start the actual training!
## Global Settings
The global settings are the base settings needed for every training. Here we define general things like the model name and which preprocessing config file was used to produce the train dataset.
```yaml
# Set modelname and path to Pflow preprocessing config file
model_name: dips_lr_0.001_bs_15000_epoch_200_nTrainJets_Full
preprocess_config: examples/preprocessing/PFlow-Preprocessing.yaml

# Add here a pretrained model to start with.
# Leave empty for a fresh start
model_file:

# Add training file
train_file: <path_place_holder>/PFlow-hybrid-preprocessed_shuffled.h5

# Defining templates for the variable cuts
.variable_cuts_ttbar: &variable_cuts_ttbar
  variable_cuts:
    - pt_btagJes:
        operator: "<="
        condition: 2.5e5

.variable_cuts_zpext: &variable_cuts_zpext
  variable_cuts:
    - pt_btagJes:
        operator: ">"
        condition: 2.5e5

# Add validation files
validation_files:
  ttbar_r21_val:
    path: <path_place_holder>/inclusive_validation_ttbar_PFlow.h5
    label: "$t\\bar{t}$ Release 21"
    <<: *variable_cuts_ttbar

  zprime_r21_val:
    path: <path_place_holder>/inclusive_validation_zprime_PFlow.h5
    label: "$Z'$ Release 21"
    <<: *variable_cuts_zpext

test_files:
  ttbar_r21:
    path: <path_place_holder>/inclusive_testing_ttbar_PFlow.h5
    <<: *variable_cuts_ttbar

  ttbar_r22:
    path: <path_place_holder>/inclusive_testing_ttbar_PFlow_r22.h5
    <<: *variable_cuts_ttbar

  zpext_r21:
    path: <path_place_holder>/inclusive_testing_zprime_PFlow.h5
    <<: *variable_cuts_zpext

  zpext_r22:
    path: <path_place_holder>/inclusive_testing_zpext_PFlow_r22.h5
    <<: *variable_cuts_zpext

exclude: null

# Tracks dataset name
tracks_name: "tracks"
```
| Options | Data Type | Necessary/Optional | Explanation |
|---|---|---|---|
| `model_name` | `str` | Necessary | Name of the model you want to train. This is also the name of the folder where all results etc. are saved. The folder is created automatically if it does not exist. |
| `preprocess_config` | `str` | Necessary | Path to the preprocessing config you used to produce the training datasets. When the training starts and the model folder is created, this file is copied to the `metadata/` folder inside the model folder, and the path in the train config is updated to point to this copy. |
| `model_file` | `str` | Optional | If you already have a model and want to use its weights as a starting point (e.g. you have an R21-trained model and want to use its weights as initial weights for your R22 training), give the path to that model here. It will be loaded instead of initialising a new one. If `load_optimiser` is not set in `nn_structure`, the optimiser state is reset. If you just want to continue a specific training, use `continue_training` and leave this option empty (a sketch of both options is shown after this table). |
| `train_file` | `str` | Necessary | Path to the training sample produced by the Umami preprocessing. You can also train on a file in the TDD format, which is available after the resampling but before the writing step; the scaling and shifting dict is then taken automatically from the path given in the preprocessing config. If you want to train with the TFRecords format, this must be the path to the folder where the TFRecords files are saved. |
| `continue_training` | `bool` | Optional | If your training stopped due to the time constraints of batch jobs and you just want to continue the training from the latest point, set this value to `True` and restart the training. |
| `exclude` | `list` | Necessary | List of jet variables that are excluded from training. Only compatible with DL1* training! To include all variables, set this option to `null`. If you are not training DL1*, also set this to `null`. |
| `tracks_name` | `str` | Necessary | Name of the tracks dataset to use for training and evaluation; the default is `"tracks"`. If you are training DL1*, just remove this option. This option is necessary when using tracks, but when working with old preprocessed files (before January 2022, Tag 05 or older) it has to be removed from the config file to ensure compatibility. |
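For example, a minimal sketch of the two ways to pick up an existing model (the file name below is a hypothetical placeholder, not a real output of the framework):

```yaml
# Warm start: use the weights of a previously trained model as initial weights.
# If load_optimiser is not set in nn_structure, the optimiser state is reset.
model_file: <path_place_holder>/old_model/model_file.h5

# Or: resume an interrupted training of this very model instead.
# In that case, leave model_file empty and set
continue_training: True
```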
Two options not covered in the table above are `validation_files` and `test_files`. Here we simply define the files which are later used for the validation/evaluation step. It is possible to define multiple validation files using the dict structure shown in the example. Focusing on `validation_files` first, each entry needs a unique name, which is the internal identifier for this specific file. The options defined in each entry are explained here:
| Options | Data Type | Necessary/Optional | Explanation |
|---|---|---|---|
| `path` | `str` | Necessary | Path to the validation/test file which is to be used. Using wildcards is possible. |
| `label` | `str` | Necessary | Label which is used for this file when plotting the validation plots. |
| `variable_cuts` | `dict` | Optional | Dict of cuts which are applied when loading the different test files. Only jet variables can be cut on. In this example, these are defined as templates for the different sample types; an expanded sketch of one entry is shown below. |
Focusing now on the `test_files`, the options are the same except that no `label` is needed, because the plotting of the evaluation results is an extra step covered separately in the documentation. Both options can also be added after the training is finished. For the training itself, you can leave these options blank.
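The YAML anchors (`&variable_cuts_ttbar` etc.) are only a shorthand. As a sketch, one `test_files` entry with the cuts from the ttbar template written out inline would look like this:

```yaml
test_files:
  ttbar_r21:
    path: <path_place_holder>/inclusive_testing_ttbar_PFlow.h5
    # Equivalent to merging in the .variable_cuts_ttbar template
    variable_cuts:
      - pt_btagJes:
          operator: "<="
          condition: 2.5e5
```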
## Network Settings

The next section in the train config is the `nn_structure`. Here we define all the information needed to build the model, such as which tagger we want to use and the number of hidden layers and nodes per hidden layer. The general options are shown next, while the tagger-dependent options are shown in their respective subsections.
```yaml
nn_structure:
  # Decide, which tagger is used
  tagger: "dips"

  # NN Training parameters
  learning_rate: 0.001
  batch_size: 15000
  epochs: 200

  # Number of jets used for training
  # To use all: Fill nothing
  n_jets_train:

  # Define which classes are used for training
  # These are defined in the global_config
  class_labels: ["ujets", "cjets", "bjets"]

  # Main class which is to be tagged
  main_class: "bjets"

  # Decide if Batch Normalisation is used
  batch_normalisation: True

  # Structure of the dense layers after summing up the track outputs + respective
  # dropout rates
  dense_sizes: [100, 100, 100, 30]
  dropout_rate: [0.1, 0.1, 0.1, 0.1]

  # Options for the Learning Rate reducer
  lrr: True

  # Option if you want to use sample weights for training
  use_sample_weights: False
```
| Options | Data Type | Necessary/Optional | Explanation |
|---|---|---|---|
| `tagger` | `str` | Necessary | Name of the tagger that is used/to be trained. The currently supported taggers are `dips`, `dips_attention`, `cads`, `dl1`, `umami` and `umami_cond_att`. Note: all versions of DL1* (like DL1r or DL1d) use the tagger `dl1`! |
| `learning_rate` | `float` | Necessary | Learning rate which is used for training. |
| `batch_size` | `int` | Necessary | Batch size which is used for training. |
| `epochs` | `int` | Necessary | Number of epochs of the training. |
| `n_jets_train` | `int` | Necessary | Number of jets used for training. Leave empty to use all. |
| `class_labels` | `list` | Necessary | List of flavours used in training. NEEDS TO BE THE SAME AS IN THE `preprocess_config`. Even the ordering needs to be the same! |
| `main_class` | `str` or `list` of `str` | Necessary | Main class which is to be tagged. Needs to be in `class_labels`. This can either be one single class (`str`) or multiple classes (`list` of `str`). |
| `batch_normalisation` | `bool` | Necessary | Decide if batch normalisation is used in the network. (Look in the model files to see where this is used for the specific models.) |
| `dense_sizes` | `list` | Necessary | List of nodes per layer of the network. Every entry is one layer. The numbers need to be `int`! For DL1r/DL1d, this is the number of nodes per layer. For DIPS/DIPS Attention/Umami/CADS, this is the number of nodes per layer of the F network. |
| `dropout_rate` | `list` | Necessary | List of dropout rates for the layers defined via `dense_sizes`. Has to be of the same length as the `dense_sizes` list. |
| `load_optimiser` | `bool` | Optional | When loading a model (via `model_file`), you can load the optimiser state to continue a training (`True`) or initialise a new optimiser to use the model as a starting point for a fresh training (`False`). |
| `use_sample_weights` | `bool` | Optional | Applies the weights you calculated with the `--weighting` flag in the preprocessing to the training loss function. |
| `nfiles_tfrecord` | `int` | Optional | Number of files that are loaded at the same time when using TFRecords for training. |
| `lrr` | `bool` | Optional | Decide if a learning rate reducer (LRR) is used or not. If yes, the following options can be added (they are written out with their defaults in the sketch after this table). |
| `lrr_monitor` | `str` | Optional | Quantity to be monitored. Default: `"loss"` |
| `lrr_factor` | `float` | Optional | Factor by which the learning rate will be reduced: `new_lr = lr * factor`. Default: `0.8` |
| `lrr_patience` | `int` | Optional | Number of epochs with no improvement after which the learning rate will be reduced. Default: `3` |
| `lrr_verbose` | `int` | Optional | `0`: quiet, `1`: update messages. Default: `1` |
| `lrr_mode` | `str` | Optional | One of `{"auto", "min", "max"}`. In `"min"` mode, the learning rate will be reduced when the monitored quantity has stopped decreasing; in `"max"` mode, it will be reduced when the monitored quantity has stopped increasing; in `"auto"` mode, the direction is automatically inferred from the name of the monitored quantity. Default: `"auto"` |
| `lrr_cooldown` | `int` | Optional | Number of epochs to wait before resuming normal operation after the learning rate has been reduced. Default: `5` |
| `lrr_min_learning_rate` | `float` | Optional | Lower bound on the learning rate. Default: `0.000001` |
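As a sketch, the learning rate reducer options can also be written out explicitly inside `nn_structure`; the values below are simply the defaults quoted in the table above:

```yaml
nn_structure:
  # ... other options from the example above ...

  # Learning Rate reducer options, written out with their default values
  lrr: True
  lrr_monitor: "loss"
  lrr_factor: 0.8
  lrr_patience: 3
  lrr_verbose: 1
  lrr_mode: "auto"
  lrr_cooldown: 5
  lrr_min_learning_rate: 0.000001
```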
### DIPS
```yaml
# Structure of the dense layers for each track + respective dropout rates
ppm_sizes: [100, 100, 128]
dropout_rate_phi: [0, 0, 0]
```
| Options | Data Type | Necessary/Optional | Explanation |
|---|---|---|---|
| `ppm_sizes` | `list` | Necessary | List of nodes per layer of the ϕ network. Every entry is one layer. The numbers need to be `int`! |
| `dropout_rate_phi` | `list` | Necessary | List of dropout rates in the ϕ network. Has to be of the same length as the `ppm_sizes` list. |
### DL1*
```yaml
# Activations of the layers. Starting with first dense layer.
activations: ["relu", "relu", "relu", "relu", "relu", "relu", "relu", "relu"]
```
| Options | Data Type | Necessary/Optional | Explanation |
|---|---|---|---|
| `activations` | `list` | Necessary | List of activations per layer defined in `dense_sizes`. Every entry is the activation for one hidden layer. The entries must be `str` and activations supported by `Keras` (see the sketch after this table). |
| `repeat_end` | `list` | Optional | Experimental. List of input variables that are folded into the output of the penultimate layer. This is then fed into the last layer. |
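Putting this together, a hedged sketch of a DL1*-style `nn_structure` fragment; the layer sizes and dropout rates are purely illustrative, the only point being that `activations` and `dropout_rate` need one entry per layer in `dense_sizes`:

```yaml
nn_structure:
  tagger: "dl1"
  # Illustrative sizes: eight dense layers, hence eight activations and dropout rates
  dense_sizes: [256, 128, 60, 48, 36, 24, 12, 6]
  activations: ["relu", "relu", "relu", "relu", "relu", "relu", "relu", "relu"]
  dropout_rate: [0, 0, 0, 0, 0, 0, 0, 0]
```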
### Umami
```yaml
# DIPS structure
dips_ppm_units: [100, 100, 128]
dips_dense_units: [100, 100, 100, 30]

# These are the layers that will be concatenated with the last layer of dips_dense_units
intermediate_units: [72]

# DL1 structure
dl1_units: [57, 60, 48, 36, 24, 12, 6]

# total loss = loss(umami) + dips_loss_weight * loss(dips)
dips_loss_weight: 1.0

# Define which classes are used for training
# These are defined in the global_config
class_labels: ["ujets", "cjets", "bjets"]

# Main class which is to be tagged
main_class: "bjets"

# Options for the Learning Rate reducer
lrr: True

# Option if you want to use sample weights for training
use_sample_weights: False
```
| Options | Data Type | Necessary/Optional | Explanation |
|---|---|---|---|
| `dips_ppm_units` | `list` | Necessary | Similar to the DIPS `ppm_sizes`. List of nodes per layer of the ϕ network. Every entry is one layer. The numbers need to be `int`! |
| `dips_dense_units` | `list` | Necessary | Similar to the DIPS `dense_sizes`. List of nodes per layer of the F network. Every entry is one layer. The numbers need to be `int`! |
| `intermediate_units` | `list` | Necessary | These are the layers that will be concatenated with the last layer of `dips_dense_units`. Every entry is one layer. The numbers need to be `int`! |
| `dl1_units` | `list` | Necessary | Similar to the DL1* `dense_sizes`. List of nodes per layer of the DL1-like network. Every entry is one layer. The numbers need to be `int`! |
| `dips_loss_weight` | `float` or `int` | Necessary | Loss weight $w_{\text{DIPS}}$ for the DIPS loss. While training Umami, two losses are obtained: the final Umami loss and the DIPS loss. This value is the factor determining how important the DIPS loss is for the total model loss: $\text{Loss}_{\text{Total}} = \text{Loss}_{\text{Umami}} + w_{\text{DIPS}} \cdot \text{Loss}_{\text{DIPS}}$ (a short numeric example follows below). |
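For illustration with purely hypothetical loss values: with `dips_loss_weight: 1.0`, an Umami loss of 0.50 and a DIPS loss of 0.45 give $\text{Loss}_{\text{Total}} = 0.50 + 1.0 \cdot 0.45 = 0.95$.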
### DIPS Attention/CADS

The difference between these taggers lies in the `*_condition` options. If all `*_condition` options are `False`, the tagger to use is `dips_attention`, while if at least one `*_condition` option is `True`, the tagger to use is `cads`. A sketch of the `dips_attention` variant is shown after the options table below.
```yaml
ppm_sizes: [100, 100, 128]

# Decide, if the pT and eta info is folded into the deep sets input
ppm_condition: True

# Structure of the dense layers after summing up the track outputs
dense_sizes: [100, 100, 100, 30]

# Decide, if the pT and eta info is folded into the F network input
dense_condition: False

# Number of conditions for conditional deep sets
n_conditions: 2

# Decide which pooling should be used
pooling: "attention"

# Number of attention nodes
attention_sizes: [128, 128]

# Decide, if the pT and eta info is folded into the attention network input
attention_condition: True

# Options for the Learning Rate reducer
lrr: True

# Option if you want to use sample weights for training
use_sample_weights: False
```
| Options | Data Type | Necessary/Optional | Explanation |
|---|---|---|---|
| `ppm_sizes` | `list` | Necessary | Similar to the DIPS `ppm_sizes`. List of nodes per layer of the ϕ network. Every entry is one layer. The numbers need to be `int`! |
| `ppm_condition` | `bool` | Necessary | If you want to use/fold the conditional information into the input of the ϕ network. |
| `dense_sizes` | `list` | Necessary | Similar to the DIPS `dense_sizes`. List of nodes per layer of the F network. Every entry is one layer. The numbers need to be `int`! |
| `dense_condition` | `bool` | Necessary | If you want to use/fold the conditional information into the input of the F network. |
| `n_conditions` | `int` | Necessary | Number of conditional jet input variables to use for CADS. |
| `pooling` | `str` | Necessary | Pooling method that is used to pool the output of the ϕ and A networks. |
| `attention_sizes` | `list` | Necessary | Similar to `ppm_sizes`. List of nodes per layer of the A network. Every entry is one layer. The numbers need to be `int`! |
| `attention_condition` | `bool` | Necessary | If you want to use/fold the conditional information into the input of the A network. |
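For comparison, a sketch of the same block configured as plain `dips_attention`, i.e. with every `*_condition` switch set to `False` (all other values copied from the CADS example above; whether `n_conditions` still needs to be set in this case is not covered here):

```yaml
tagger: "dips_attention"
ppm_sizes: [100, 100, 128]
ppm_condition: False
dense_sizes: [100, 100, 100, 30]
dense_condition: False
pooling: "attention"
attention_sizes: [128, 128]
attention_condition: False
```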
## Running the Training

After the global and network settings are prepared, you can start training your model. To start the training, switch to the `umami/umami` folder and run the following command:

```bash
train.py -c <path to train config file> --prepare
```

This command will not directly start the training, but will prepare the model folder with all needed configs/scale dicts etc. Before starting the actual training, you should check all config files again. The new folder will have the name given in `model_name`. Inside it is a folder called `metadata` into which all configs/scale dicts etc. are copied. Also, all paths in the config are adapted to point to the metadata folder, like the path to the preprocessing config.
After you have checked everything, you can run the actual training via the command:

```bash
train.py -c <path to train config file>
```

This will start the actual training. Another command line argument available for `train.py` is `-o`, which overwrites the configs/dicts in `metadata` when you run the training.
Note: When training, the callback methods of the different taggers validate the training on the fly, which can lead to memory issues. To deactivate the on-the-fly validation, which we recommend, set the `n_jets` option in the `validation_settings` section of the train config to `0`, as sketched below.
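A sketch of the corresponding snippet in the train config (other keys of the `validation_settings` section are omitted here):

```yaml
validation_settings:
  # Deactivate the on-the-fly validation during training
  n_jets: 0
```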