# Plotting the evaluation results

The evaluation results can be plotted with different scripts: `plotting_umami.py`, `plotting_epoch_performance.py` and `plot_input_variables.py`. Each plotting script is explained in its dedicated section.
## plotting_umami.py

The `plotting_umami.py` script is used to plot the results of the evaluation script. It can produce different, fully customizable plots. All plots are defined in a `plotting_umami_config_X.yaml` file. The `X` here denotes the tagger, but it is just a name: all config files are usable with the `plotting_umami.py` script.
### Yaml Config File

**Important:** The indentation in this `.yaml` file matters, due to the way the files are read by the script.

A fully written config file can be found here.

The name of your freshly trained tagger, the `tagger_name` here in the config, is always the name of the model you have trained. The name is the value of `tagger` from the `nn_structure` section of the train config.
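For illustration, a minimal sketch of the corresponding train config entries (the values shown are placeholders, not a recommendation):

```yaml
# Sketch of the relevant train config entries (values are placeholders)
model_name: dips_Loose_lr_0.001_bs_15000_epoch_200_nTrainJets_Full

nn_structure:
  tagger: "dips" # This value is what you refer to as tagger_name in the plotting config
```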
The config file starts with the `Eval_parameters`. Here the `Path_to_models_dir` is set, where the models are saved, together with the `model_name` and the `epoch` which is to be plotted. A boolean parameter, `epoch_to_name`, can be set here to add the epoch to the end of the plot name. For example, this can look like this:
```yaml
# Evaluation parameters
Eval_parameters:
  Path_to_models_dir: <path_place_holder>/umami/umami
  model_name: dips_Loose_lr_0.001_bs_15000_epoch_200_nTrainJets_Full
  epoch: 59
  epoch_to_name: True
```
Most of the available plots share a set of common options, which are explained next. For plot-specific options, have a look at the comments in the section of the corresponding plot.

| Options | Explanation |
|---------|-------------|
| `Name_of_the_plot` | All plots start with no indentation and the name of the plot. This will be the output name of the plot file and has no impact on the plot itself. |
| `type` | This option specifies the plot function that is used. |
| `data_set_name` | Decides which evaluated dataset (or file) is used. The `data_set_name` values are set in the train config yaml file which is used in the evaluation of the model. There, the different files get their own `data_set_name`, which needs to be the same as here! |
| `class_labels` | List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important! (Possible entries are defined in the `global_config.yaml`) |
| `models_to_plot` | The models which are to be plotted need to be defined in here. You can add as many models as you want. For example, this can be used to plot the results of different taggers in one plot (e.g. for score or ROC curves). Each model can be assigned an `evaluation_file` to point to the results file you have created with `evaluate_model.py`, e.g. `evaluation_file: YOURMODEL/results/results-rej_per_eff-229.h5` |
| `plot_settings` | In this section, all optional plotting settings are defined. They don't need to be defined, but they can be. For the specific options available in each function, look in the corresponding section. |
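Putting these common options together, a generic plot entry has the following shape (an illustrative skeleton, not a runnable config; names and values are placeholders):

```yaml
# Illustrative skeleton of a plot entry (names and values are placeholders)
Name_of_the_plot: # Also the output file name of the plot
  type: "scores" # Plot function to use, e.g. "scores" or "ROC"
  models_to_plot:
    my_model: # Internal, unique name of the model
      data_set_name: "ttbar_r21" # Must match the name set in the train config
      label: "My model" # Legend label
      tagger_name: "dips"
      class_labels: ["ujets", "cjets", "bjets"] # Same as in preprocessing/training
  plot_settings: {} # Optional plotting settings, see below
```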
In `plot_settings`, some general options can be set which are used in all of the available plots. These are:
| Parameter | Type | Description |
|-----------|------|-------------|
| `title` | `str`, optional | Title of the plot, by default `""` |
| `draw_errors` | `bool`, optional | Draw statistical uncertainty on the lines, by default `True` |
| `xmin` | `float`, optional | Minimum value of the x-axis, by default `None` |
| `xmax` | `float`, optional | Maximum value of the x-axis, by default `None` |
| `ymin` | `float`, optional | Minimum value of the y-axis, by default `None` |
| `ymax` | `float`, optional | Maximum value of the y-axis, by default `None` |
| `ymin_ratio` | `list`, optional | Set the lower y limit of each of the ratio subplots, by default `None` |
| `ymax_ratio` | `list`, optional | Set the upper y limit of each of the ratio subplots, by default `None` |
| `y_scale` | `float`, optional | Scaling up the y-axis, e.g. to fit the ATLAS tag. Applied if `ymax` is not defined, by default `1.3` |
| `xlabel` | `str`, optional | Label of the x-axis, by default `None` |
| `ylabel` | `str`, optional | Label of the y-axis, by default `None` |
| `ylabel_ratio` | `list`, optional | List of labels for the y-axis in the ratio plots, by default `"Ratio"` |
| `label_fontsize` | `int`, optional | Fontsize used in the labels, by default `12` |
| `fontsize` | `int`, optional | Fontsize used, by default `10` |
| `n_ratio_panels` | `int`, optional | Number of ratio panels, between 0 and 2, by default `0` |
| `figsize` | `(float, float)`, optional | Tuple of figure size (width, height) in inches, by default `(8, 6)` |
| `dpi` | `int`, optional | DPI used for plotting, by default `400` |
| `transparent` | `bool`, optional | Specify if the background of the plot should be transparent, by default `False` |
| `grid` | `bool`, optional | Set the grid for the plots. |
| `leg_fontsize` | `int`, optional | Fontsize of the legend, by default `10` |
| `leg_loc` | `str`, optional | Position of the legend in the plot, by default `"upper right"` |
| `leg_ncol` | `int`, optional | Number of legend columns, by default `1` |
| `leg_linestyle_loc` | `str`, optional | Position of the linestyle legend in the plot, by default `"upper center"` |
| `apply_atlas_style` | `bool`, optional | Apply the ATLAS style for matplotlib, by default `True` |
| `use_atlas_tag` | `bool`, optional | Use the ATLAS tag in the plots, by default `True` |
| `atlas_first_tag` | `str`, optional | First row of the ATLAS tag (i.e. the first row is "ATLAS <atlas_first_tag>") |
| `atlas_second_tag` | `str`, optional | Second row of the ATLAS tag, by default `""` |
| `atlas_fontsize` | `float`, optional | Fontsize of the ATLAS label, by default `10` |
| `atlas_vertical_offset` | `float`, optional | Vertical offset of the ATLAS tag, by default `7` |
| `atlas_horizontal_offset` | `float`, optional | Horizontal offset of the ATLAS tag, by default `8` |
| `atlas_brand` | `str`, optional | `brand` argument handed to `atlasify`. If you want to remove it, just use an empty string or `None`, by default `"ATLAS"` |
| `atlas_tag_outside` | `bool`, optional | `outside` argument handed to `atlasify`. Decides if the ATLAS logo is plotted outside of the plot (on top), by default `False` |
| `atlas_second_tag_distance` | `float`, optional | Distance between the `atlas_first_tag` and `atlas_second_tag` text in units of line spacing, by default `0` |
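For instance, a `plot_settings` block combining a few of these general options could look like this (the values are purely illustrative):

```yaml
plot_settings:
  draw_errors: True
  xmin: 0.5
  ymax: 1000
  n_ratio_panels: 1
  figsize: [8, 6] # (width, height) in inches
  leg_loc: "upper right"
  apply_atlas_style: True
  atlas_first_tag: "Simulation Internal"
  atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets"
```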
The following plots are available:

#### Confusion Matrix

Plot a confusion matrix. For example:
```yaml
confusion_matrix_Dips_ttbar:
  type: "confusion_matrix"
  data_set_name: "ttbar_r21"
  tagger_name: "dips"
  class_labels: ["ujets", "cjets", "bjets"]
  plot_settings:
    colorbar: True
```
| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `colorbar` | `bool` | Optional | Define if the colourbar on the side is shown or not. |
#### Probability
Plotting the DNN probability output for a specific class. For example:
```yaml
Dips_prob_pb:
  type: "probability"
  prob_class: "bjets"
  models_to_plot:
    dips_r22:
      data_set_name: "ttbar_r21"
      label: "DIPS"
      tagger_name: "dips"
      class_labels: ["ujets", "cjets", "bjets"]
  plot_settings:
    logy: True
    bins: 50
    y_scale: 1.5 # Increase the y axis so the plots don't collide with labels (mainly atlas_first_tag)
    use_atlas_tag: True # Enable/Disable atlas_first_tag
    atlas_first_tag: "Simulation Internal"
    atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample"
```
| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `type` | `str` | Necessary | This gives the type of plot function used. Must be `"probability"` here. |
| `prob_class` | `str` | Necessary | Class whose probability is to be plotted. |
| `dips_r22` | `None` | Necessary | Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here. |
| `data_set_name` | `str` | Necessary | Name of the dataset that is used. This is the name of the `test_file` which you want to use. |
| `label` | `str` | Necessary | Legend label of the model. |
| `tagger_name` | `str` | Necessary | Name of the tagger which is to be plotted. This is the name of the tagger either from the `.h5` files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT:** If you want to use a tagger from the `.h5` files, you must run the `evaluate_model.py` script with the names of the taggers given in the `evaluation_settings` section of the train config. There you need to add the name to the `tagger` list and the fraction values to the `frac_values_comp` dict, where the key is the name of the tagger. |
| `class_labels` | `list` | Necessary | List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important! |
#### Scores

Plotting the b-tagging discriminant scores for the different jet flavours. For example:
```yaml
scores_Dips_ttbar:
  type: "scores"
  main_class: "bjets"
  models_to_plot:
    dips_r21:
      data_set_name: "ttbar_r21"
      tagger_name: "dips"
      class_labels: ["ujets", "cjets", "bjets"]
      label: "$t\\bar{t}$"
  plot_settings:
    working_points: [0.60, 0.70, 0.77, 0.85] # Set working point lines in the plot
    bins: 50 # Number of bins
    y_scale: 1.3 # Increase the y axis so the plots don't collide with labels (mainly atlas_first_tag)
    use_atlas_tag: True # Enable/Disable atlas_first_tag
    atlas_first_tag: "Simulation Internal"
    atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample"
```
| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `type` | `str` | Necessary | This gives the type of plot function used. Must be `"scores"` here. |
| `main_class` | `str` | Necessary | Class which is to be tagged. |
| `dips_r21` | `None` | Necessary | Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here. |
| `data_set_name` | `str` | Necessary | Name of the dataset that is used. This is the name of the `test_file` which you want to use. |
| `tagger_name` | `str` | Necessary | Name of the tagger which is to be plotted. This is the name of the tagger either from the `.h5` files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT:** If you want to use a tagger from the `.h5` files, you must run the `evaluate_model.py` script with the names of the taggers given in the `evaluation_settings` section of the train config. There you need to add the name to the `tagger` list and the fraction values to the `frac_values_comp` dict, where the key is the name of the tagger. |
| `class_labels` | `list` | Necessary | List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important! |
| `label` | `str` | Necessary | Legend label of the model. |
| `working_points` | `list` | Optional | The specified WPs are calculated, and at the corresponding b-tagging discriminant value a vertical line is drawn, with a small label on top stating the WP. |
#### ROC Curves
Plotting the ROC Curves of the rejection rates against the b-tagging efficiency. For example:
```yaml
Dips_light_flavour_ttbar:
  type: "ROC"
  models_to_plot:
    dips_r21_u:
      data_set_name: "ttbar_r21"
      label: "DIPS"
      tagger_name: "dips"
      rejection_class: "ujets"
  plot_settings:
    draw_errors: True
    xmin: 0.5
    ymax: 1000000
    figsize: [7, 6] # [width, height]
    working_points: [0.60, 0.70, 0.77, 0.85]
    use_atlas_tag: True # Enable/Disable atlas_first_tag
    atlas_first_tag: "Simulation Internal"
    atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ validation sample, fc=0.018"
```
| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `type` | `str` | Necessary | This gives the type of plot function used. Must be `"ROC"` here. |
| `main_class` | `str` | Necessary | Class which is to be tagged. |
| `dips_r21_u` | `None` | Necessary | Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here. |
| `data_set_name` | `str` | Necessary | Name of the dataset that is used. This is the name of the `test_file` which you want to use. |
| `label` | `str` | Necessary | Legend label of the model. |
| `tagger_name` | `str` | Necessary | Name of the tagger which is to be plotted. This is the name of the tagger either from the `.h5` files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT:** If you want to use a tagger from the `.h5` files, you must run the `evaluate_model.py` script with the names of the taggers given in the `evaluation_settings` section of the train config. There you need to add the name to the `tagger` list and the fraction values to the `frac_values_comp` dict, where the key is the name of the tagger. |
| `rejection_class` | `str` | Necessary | Class which the main flavour is plotted against. |
| `draw_errors` | `bool` | Optional | Plot binomial errors on the curves. |
| `xmin` | `float` | Optional | Set the minimum b-efficiency in the plot (which is the `xmin` limit). |
| `ymax` | `float` | Optional | Maximum value of the y-axis. |
| `working_points` | `list` | Optional | The specified WPs are calculated, and at the corresponding b-tagging discriminant value a vertical line is drawn, with a small label on top stating the WP. |
You can plot two rejections at the same time, with two ratio subplots, one for each rejection. An example of this can be seen here:
```yaml
Dips_Comparison_flavour_ttbar:
  type: "ROC"
  models_to_plot:
    dips_r21_u:
      data_set_name: "ttbar_r21"
      label: "DIPS"
      tagger_name: "dips"
      rejection_class: "ujets"
    dips_r21_c:
      data_set_name: "ttbar_r21"
      label: "DIPS"
      tagger_name: "dips"
      rejection_class: "cjets"
  plot_settings:
    draw_errors: True
    xmin: 0.5
    ymax: 1000000
    figsize: [9, 9] # [width, height]
    working_points: [0.60, 0.70, 0.77, 0.85]
    use_atlas_tag: True # Enable/Disable atlas_first_tag
    atlas_first_tag: "Simulation Internal"
    atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ validation sample, fc=0.018"
```
#### Variable vs Efficiency

Plot the b-efficiency/c-rejection/light-rejection against the pT. For example:
```yaml
Dips_pT_vs_beff:
  type: "pT_vs_eff"
  models_to_plot:
    dips:
      data_set_name: "ttbar_r21"
      label: "DIPS"
      tagger_name: "dips"
  plot_settings:
    bin_edges: [0, 20, 30, 40, 60, 85, 110, 140, 175, 250, 400, 1000]
    flavour: "cjets"
    variable: "pt"
    class_labels: ["ujets", "cjets", "bjets"]
    main_class: "bjets"
    working_point: 0.77
    working_point_line: True
    fixed_eff_bin: False
    figsize: [7, 5]
    logy: False
    use_atlas_tag: True
    atlas_first_tag: "Simulation Internal"
    atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample"
    y_scale: 1.3
```
| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `type` | `str` | Necessary | This gives the type of plot function used. Must be `"pT_vs_eff"` here, as in the example above. |
| `dips` | `None` | Necessary | Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here. |
| `data_set_name` | `str` | Necessary | Name of the dataset that is used. This is the name of the `test_file` which you want to use. |
| `label` | `str` | Necessary | Legend label of the model. |
| `tagger_name` | `str` | Necessary | Name of the tagger which is to be plotted. This is the name of the tagger either from the `.h5` files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT:** If you want to use a tagger from the `.h5` files, you must run the `evaluate_model.py` script with the names of the taggers given in the `evaluation_settings` section of the train config. There you need to add the name to the `tagger` list and the fraction values to the `frac_values_comp` dict, where the key is the name of the tagger. |
| `bin_edges` | `list` | Necessary | Setting the edges of the bins. Don't forget the first/last edge! |
| `flavour` | `str` | Necessary | Flavour class whose rejection/efficiency is to be plotted. |
| `variable` | `str` | Necessary | Variable against which the efficiency/rejection is plotted. |
| `class_labels` | `list` | Necessary | List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important! |
| `main_class` | `str` | Necessary | Class which is to be tagged. |
| `working_point` | `float` | Necessary | Working point that will be used. |
| `working_point_line` | `bool` | Optional | Draw a horizontal line at the working point efficiency. |
| `fixed_eff_bin` | `bool` | Optional | Calculate the WP cut on the discriminant per bin. |
#### Saliency Plots

The impact of the track variables on the final b-tagging discriminant can't be evaluated using SHAPley. To make this impact visible (for each track of the jet), so-called saliency maps are used. These maps are calculated when evaluating the model you have trained (if the option is activated). A lot of different options can be set. An example is given here:
```yaml
Dips_saliency_b_WP77_passed_ttbar:
  type: "saliency"
  data_set_name: "ttbar_r21"
  target_eff: 0.77
  jet_flavour: "bjets"
  PassBool: True
  nFixedTrks: 8
  plot_settings:
    title: "Saliency map for $b$ jets from \n $t\\bar{t}$ which passed WP = 77% \n with exactly 8 tracks"
    use_atlas_tag: True # Enable/Disable atlas_first_tag
    atlas_first_tag: "Simulation Internal"
    atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets"
```
| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `type` | `str` | Necessary | This gives the type of plot function used. Must be `"saliency"` here. |
| `data_set_name` | `str` | Necessary | Name of the dataset that is used. This is the name of the `test_file` which you want to use. |
| `target_eff` | `float` | Necessary | Efficiency of the target flavour you want to use (i.e. which WP you want to use). The value is given between 0 and 1. |
| `jet_flavour` | `str` | Necessary | Name of the flavour you want to plot. |
| `PassBool` | `bool` | Necessary | Decide if the jets need to pass the working point discriminant cut or not. `False` would give you, for example, truth b-jets which do not pass the working point discriminant cut and are therefore not tagged as b-jets. |
| `nFixedTrks` | `int` | Necessary | The saliency maps can only be calculated for jets with a fixed number of tracks. This number of tracks can be set with this parameter. For example, if this value is `8`, then only jets which have exactly 8 tracks are used for the saliency maps. This value needs to be set in the train config when you run the evaluation! If you run the evaluation with, for example, `5`, you can't plot the saliency map for `8`. |
#### Fraction Contour Plot

Plot two rejections against each other for a given working point with different fraction values. This is very helpful when you want to tune the fraction values of the different background classes for your model.

Note: This is a 2D plot, so you can only plot two different rejections, one per axis. If you have a training with more than two background classes (for example a training with tau jets), you need to fix the fraction value of one class to a certain value and vary the other two. This can be done with the `fixed_rejections` option (see the options table and the sketch below).
```yaml
contour_fraction_ttbar:
  type: "fraction_contour"
  rejections: ["ujets", "cjets"]
  models_to_plot:
    dips:
      tagger_name: "dips"
      colour: "b"
      linestyle: "--"
      label: "DIPS"
      data_set_name: "ttbar_r21"
      marker:
        cjets: 0.1
        ujets: 0.9
        marker_style: "x"
    rnnip:
      tagger_name: "rnnip"
      colour: "r"
      linestyle: "--"
      label: "RNNIP"
      data_set_name: "ttbar_r21"
  plot_settings:
    y_scale: 1.3 # Increase the y axis so the plots don't collide with labels (mainly atlas_first_tag)
    use_atlas_tag: True # Enable/Disable atlas_first_tag
    atlas_first_tag: "Simulation Internal"
    atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample, WP = 77 %"
```
| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `rejections` | `list` | Necessary | List with two items. These are the rejections that are plotted against each other. Only background classes can be plotted like this. Note: If you have more than two background classes, you need to fix one to a certain value. This needs to be done for every model you define in the plot. Have a closer look at `fixed_rejections` for that. |
| `tagger_name` | `str` | Necessary | Name of the tagger which is to be plotted. This is the name of the tagger either from the `.h5` files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT:** If you want to use a tagger from the `.h5` files, you must run the `evaluate_model.py` script with the names of the taggers given in the `evaluation_settings` section of the train config. There you need to add the name to the `tagger` list and the fraction values to the `frac_values_comp` dict, where the key is the name of the tagger. |
| `fixed_rejections` | `dict` | Optional | Dict with the fractions you want to fix to a certain value. The flavour is the key and the fraction value is the value. |
| `colour` | `str` | Optional | Give a specific colour to the tagger. |
| `linestyle` | `str` | Optional | Give a specific linestyle to the tagger. |
| `label` | `str` | Necessary | Give a label for the tagger that will be printed in the legend. |
| `data_set_name` | `str` | Necessary | The dataset to use from the dataframe, as specified in the evaluation. |
| `marker` | `dict` | Optional | You can set a marker (an "x" or something similar) at a certain fraction combination if you want to. All information needed for that is added here. |
| `rejection` | `float` | Necessary (if `marker` is used) | Give two fraction values for your selected rejections. This is the position where the marker will be plotted. In the example, these are `cjets` and `ujets`. |
| `marker_style` | `str` | Optional | Give a marker style that is used for the marker. Default is `"x"`. |
| `marker_label` | `str` | Optional | Give a custom marker legend label. Default is the tagger label + the fraction values. |
| `markersize` | `int` | Optional | Size of the marker. Default is `15`. |
| `markeredgewidth` | `int` | Optional | Size of the lines of the marker. Default is `2`. |
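If your training has a third background class (for example tau jets), a model entry using `fixed_rejections` could look like the following sketch (the flavour name and fraction value here are placeholders, not a recommendation):

```yaml
# Sketch: fix the tau-jet fraction so that only the ujets/cjets fractions are varied
dips_taus_fixed:
  tagger_name: "dips"
  label: "DIPS"
  data_set_name: "ttbar_r21"
  fixed_rejections:
    taujets: 0.01 # Flavour is the key, the fixed fraction value is the value
```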
### Executing the Script

The script can be executed using the following command:

```bash
plotting_umami.py -c ${EXAMPLES}/plotting_umami_config_dips.yaml -o dips_eval_plots
```
The `-o` option defines the name of the output directory. It will be added to the model folder where the results are also saved. You can also set the output filetype using the `-f` option. For example:

```bash
plotting_umami.py -c ${EXAMPLES}/plotting_umami_config_dips.yaml -o dips_eval_plots -f png
```

The output plots will now be `.png` files. The default is `.pdf`.