Plotting Input Variables#
The input variables for different files can also be plotted using the plot_input_variables.py
script. It is also steered by a yaml file. An example for such a file can be found here. The structure is close to the one from plotting_umami,
but differs in some details.
To start the plotting of the input variables, run the following command
plot_input_vars.py -c <path/to/config> --tracks
or
plot_input_vars.py -c <path/to/config> --jets
The --tracks option produces all plots defined with track variables, the --jets option all plots defined with jet variables. You can also pass the -f or --format option to choose the file format of the plots. The default is pdf.
Yaml File#
In the following, the possible configuration parameters are listed with a brief description.
Number of jets#
Here you can define the number of jets that are used.
Eval_parameters:
  # Number of jets which are used
  n_jets: 3e4
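How n_jets interacts with the wildcard in the files option (described in the tables further down) can be pictured with a short Python sketch. It is only an illustration: the function name select_files and the per-file jet counts are hypothetical, and the actual loader reads the jets from the h5 files rather than from a dictionary.

```python
def select_files(jets_per_file, n_jets):
    """Pick files in sorted order until at least n_jets jets are available.

    jets_per_file: dict mapping a (hypothetical) file name to its jet count.
    Returns the chosen file names and the number of jets that will be used.
    """
    chosen, available = [], 0
    for name in sorted(jets_per_file):
        if available >= n_jets:
            break
        chosen.append(name)
        available += jets_per_file[name]
    return chosen, min(available, n_jets)


# "n_jets: 3e4" in the config corresponds to 30000 jets
n_jets = int(float("3e4"))
files = {"a.h5": 12000, "b.h5": 12000, "c.h5": 12000, "d.h5": 12000}
chosen, used = select_files(files, n_jets)
print(chosen, used)
```

In this made-up case three files are enough, so the fourth one is never opened.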
Number of Tracks per Jet#
The number of tracks per jet can be plotted for all different files. This can be given like this:
nTracks:
  variables: "tracks"
  folder_to_save: nTracks
  nTracks: True
  Datasets_to_plot:
    R21:
      files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
      label: "R21 Loose"
      tracks_name: "tracks"
    R22:
      files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
      label: "R22 Loose"
      tracks_name: "tracks"
  <<: *ttbar_cuts
  plot_settings:
    <<: *default_plot_settings
    ymin_ratio: [0.5]
    ymax_ratio: [2]
  class_labels: ["bjets", "cjets", "ujets"]
Options | Data Type | Necessary/Optional | Explanation |
---|---|---|---|
nTracks | str | Necessary | Name of the plot, used as the top-level key of the block. This does not affect the plots themselves. |
variables | str | Necessary | Must be set to "tracks" for this function. Decides which plotting functions are used. |
folder_to_save | str | Necessary | Path where the plots should be saved. This is a relative path. Add a folder name as path. |
nTracks | bool | Necessary | MUST BE TRUE HERE! Decides if the number of tracks per jet is plotted or the input variables. |
Datasets_to_plot | None | Necessary | Start of the category that lists the datasets to be plotted. |
R21 | None | Necessary | Name of the fileset which is to be plotted. Does not affect anything! |
files | str | Necessary | Path to the files which are used for plotting. Wildcards are supported. The function loads as many files as needed to reach the number of jets given in Eval_parameters. |
label | str | Necessary | Plot label for the plot legend. |
tracks_name | str | Necessary | Name of the tracks inside the h5 files you want to plot. |
cut_vars_dict | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Technically, this is a list of dict entries. Each entry has as key the name of the variable used for the cut (e.g. pt_btagJes) and as sub-entries the operator used for the cut (operator) and the condition used for the cut (condition). |
plot_settings | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
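The structure of cut_vars_dict (a list of single-key dicts, each with an operator and a condition) can be illustrated with a minimal Python sketch. This is not the implementation used by the script: the helper passes_cuts and the jets-as-dicts representation are hypothetical, and only the ">" operator appears in the example config, so the other operator strings are assumptions.

```python
import operator

# Map the operator strings used in the config to Python functions.
# Only ">" appears in the example config; the others are assumptions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def passes_cuts(jet, cut_vars_dict):
    """Return True if the jet (a dict of variables) passes every cut."""
    for cut in cut_vars_dict:
        for variable, spec in cut.items():
            compare = OPERATORS[spec["operator"]]
            if not compare(jet[variable], spec["condition"]):
                return False
    return True


# Same structure as the ttbar_cuts anchor: pt_btagJes > 2.0e4
cut_vars_dict = [{"pt_btagJes": {"operator": ">", "condition": 2.0e4}}]

# Made-up jets, not taken from any real file
jets = [{"pt_btagJes": 1.5e4}, {"pt_btagJes": 2.5e4}, {"pt_btagJes": 8.0e4}]
selected = [jet for jet in jets if passes_cuts(jet, cut_vars_dict)]
print(len(selected))
```

Because the cuts are a list, several conditions on the same or on different variables can be chained, and a jet is kept only if it passes all of them.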
Input Variables Tracks#
To plot the track input variables, the following options are used.
tracks_input_vars:
  variables: "tracks"
  folder_to_save: tracks_input_vars
  Datasets_to_plot:
    R21:
      files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
      label: "R21 Loose"
      tracks_name: "tracks"
    R22:
      files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
      label: "R22 Loose"
      tracks_name: "tracks"
  plot_settings:
    <<: *default_plot_settings
    sorting_variable: "ptfrac"
    n_leading: [None, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    ymin_ratio: [0.5]
    ymax_ratio: [1.5]
  <<: *ttbar_cuts
  var_dict:
    IP3D_signed_d0_significance: 100
    IP3D_signed_z0_significance: 100
    numberOfInnermostPixelLayerHits: [0, 4, 1]
    numberOfNextToInnermostPixelLayerHits: [0, 4, 1]
    numberOfInnermostPixelLayerSharedHits: [0, 4, 1]
    numberOfInnermostPixelLayerSplitHits: [0, 4, 1]
    numberOfPixelSharedHits: [0, 4, 1]
    numberOfPixelSplitHits: [0, 9, 1]
    numberOfSCTSharedHits: [0, 4, 1]
    ptfrac: [0, 5, 0.05]
    dr: 100
    numberOfPixelHits: [0, 11, 1]
    numberOfSCTHits: [0, 19, 1]
    btagIp_d0: 100
    btagIp_z0SinTheta: 100
    number_nPix_nSCT:
      variables: ["numberOfPixelHits", "numberOfSCTHits"]
      binning: [0, 19, 1]
      operator: "+"
  class_labels: ["bjets", "cjets", "ujets"]
Options | Data Type | Necessary/Optional | Explanation |
---|---|---|---|
tracks_input_vars | str | Necessary | Name of the plots, used as the top-level key of the block. This does not affect the plots themselves. |
variables | str | Necessary | Must be set to "tracks" for this function. Decides which plotting functions are used. |
folder_to_save | str | Necessary | Path where the plots should be saved. This is a relative path. Add a folder name as path. |
nTracks | bool | Necessary | To plot the input variable distributions, this must be False. |
Datasets_to_plot | None | Necessary | Start of the category that lists the datasets to be plotted. |
R21 | None | Necessary | Name of the fileset which is to be plotted. Does not affect anything! |
files | str | Necessary | Path to the files which are used for plotting. Wildcards are supported. The function loads as many files as needed to reach the number of jets given in Eval_parameters. |
label | str | Necessary | Plot label for the plot legend. |
tracks_name | str | Necessary | Name of the tracks inside the h5 files you want to plot. |
plot_settings | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
var_dict | dict | Necessary | A dict with all the variables you want to plot. The key of each entry is the name of the variable (as it is named in the files) and the value is the binning. If you give an int, you get that number of equidistant bins. You can also give a three-element list, which is passed to the numpy.arange function: the first element is the start, the second the stop and the third the step, i.e. the bin width (not the number of bins). The resulting numbers are bin edges, not bins! If no value is given, the default value is 100. To plot a combination of variables, for example the sum of numberOfPixelHits and numberOfSCTHits, the entry needs to be a dict itself with three entries: variables, a list of the variables to combine; operator, the operation used to merge them (available are "+", "-", "*" and "/"); and binning, which works as explained above for the int and the list. An example is given in the config above, where the variable is named number_nPix_nSCT. You can also apply the logarithm to a single variable by defining only one variable in the dict and setting the operator to "log". |
cut_vars_dict | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Technically, this is a list of dict entries. Each entry has as key the name of the variable used for the cut (e.g. pt_btagJes) and as sub-entries the operator used for the cut (operator) and the condition used for the cut (condition). |
xlabels | dict | Optional | Dict with custom xlabels |
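The two binning forms accepted in var_dict can be made concrete with a small Python sketch. It mirrors the semantics of numpy.arange (start, stop, step, with the stop value excluded) in plain Python, as the ptfrac: [0, 5, 0.05] entry in the example config suggests. The helper bin_edges and its value_min/value_max arguments are hypothetical: they stand in for the range of the data, which the real script determines itself.

```python
import math


def bin_edges(binning, value_min, value_max):
    """Turn a var_dict binning entry into a list of bin edges.

    int  -> that many equidistant bins between value_min and value_max
    list -> [start, stop, step], following numpy.arange (stop excluded)
    None -> fall back to the default of 100 bins
    """
    if binning is None:
        binning = 100
    if isinstance(binning, int):
        width = (value_max - value_min) / binning
        return [value_min + i * width for i in range(binning + 1)]
    start, stop, step = binning
    n_edges = math.ceil((stop - start) / step)
    return [start + i * step for i in range(n_edges)]


# [0, 4, 1] gives the edges 0, 1, 2, 3 and therefore three bins
edges = bin_edges([0, 4, 1], value_min=0, value_max=4)
print(edges, len(edges) - 1)

# An int gives that many bins, i.e. one more edge than bins
print(len(bin_edges(100, value_min=-5.0, value_max=5.0)))
```

Since the list entries are edges and the stop value is excluded, a variable with integer values 0 to 3 needs [0, 5, 1] if the value 3 should end up in a bin of its own.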
Input Variables Jets#
To plot the jet input variables, the following options are used.
jets_input_vars:
  variables: "jets"
  folder_to_save: jets_input_vars
  Datasets_to_plot:
    R21:
      files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
      label: "R21 Loose"
      # class_labels can also be defined for a specific dataset (the way it is done here,
      # it doesn't change anything since it's the same as the globally defined class_labels)
      class_labels: ["bjets", "cjets", "ujets"]
    R22:
      files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
      label: "R22 Loose"
      # If you want to specify the `class_labels` per dataset you can add it here
      # If you don't specify anything here, the overall defined `class_labels` will be
      # used
      # class_labels: ["bjets", "cjets", "ujets"]
  plot_settings:
    <<: *default_plot_settings
  class_labels: ["bjets", "cjets", "ujets"]
  <<: *ttbar_cuts
  special_param_jets:
    SV1_NGTinSvx:
      lim_left: 0
      lim_right: 19
    JetFitterSecondaryVertex_nTracks:
      lim_left: 0
      lim_right: 17
    JetFitter_nTracksAtVtx:
      lim_left: 0
      lim_right: 19
    JetFitter_nSingleTracks:
      lim_left: 0
      lim_right: 18
    JetFitter_nVTX:
      lim_left: 0
      lim_right: 6
    JetFitter_N2Tpair:
      lim_left: 0
      lim_right: 200
  xlabels:
    # here you can define xlabels, if a variable is not in this dict, the variable name
    # will be used (i.e. for pT this would be 'pt_btagJes')
    pt_btagJes: "$p_T$ [MeV]"
  var_dict:
    JetFitter_mass: 100
    JetFitter_energyFraction: 100
    JetFitter_significance3d: 100
    JetFitter_deltaR: 100
    JetFitter_nVTX: 7
    JetFitter_nSingleTracks: 19
    JetFitter_nTracksAtVtx: 20
    JetFitter_N2Tpair: 201
    JetFitter_isDefaults: 2
    JetFitterSecondaryVertex_minimumTrackRelativeEta: 11
    JetFitterSecondaryVertex_averageTrackRelativeEta: 11
    JetFitterSecondaryVertex_maximumTrackRelativeEta: 11
    JetFitterSecondaryVertex_maximumAllJetTrackRelativeEta: 11
    JetFitterSecondaryVertex_minimumAllJetTrackRelativeEta: 11
    JetFitterSecondaryVertex_averageAllJetTrackRelativeEta: 11
    JetFitterSecondaryVertex_displacement2d: 100
    JetFitterSecondaryVertex_displacement3d: 100
    JetFitterSecondaryVertex_mass: 100
    JetFitterSecondaryVertex_energy: 100
    JetFitterSecondaryVertex_energyFraction: 100
    JetFitterSecondaryVertex_isDefaults: 2
    JetFitterSecondaryVertex_nTracks: 18
    pt_btagJes: 100
    absEta_btagJes: 100
    SV1_Lxy: 100
    SV1_N2Tpair: 8
    SV1_NGTinSvx: 20
    SV1_masssvx: 100
    SV1_efracsvx: 100
    SV1_significance3d: 100
    SV1_deltaR: 10
    SV1_L3d: 100
    SV1_isDefaults: 2
    rnnip_pb: 50
    rnnip_pc: 50
    rnnip_pu: 50
    combined_rnnip:
      variables: ["rnnip_pc", "rnnip_pu"]
      binning: 50
      operator: "+"
  flavours:
    b: 5
    c: 4
    u: 0
    tau: 15
Options | Data Type | Necessary/Optional | Explanation |
---|---|---|---|
jets_input_vars | str | Necessary | Name of the plots, used as the top-level key of the block. This does not affect the plots themselves. |
variables | str | Necessary | Must be set to "jets" for this function. Decides which plotting functions are used. |
folder_to_save | str | Necessary | Path where the plots should be saved. This is a relative path. Add a folder name as path. |
Datasets_to_plot | None | Necessary | Start of the category that lists the datasets to be plotted. |
R21 | None | Necessary | Name of the fileset which is to be plotted. Does not affect anything! |
files | str | Necessary | Path to the files which are used for plotting. Wildcards are supported. The function loads as many files as needed to reach the number of jets given in Eval_parameters. |
label | str | Necessary | Plot label for the plot legend. |
special_param_jets | None | Necessary | Start of the special x-axis limits for individual variables. If you want to set the x range by hand, add the variable here together with lim_left for xmin and lim_right for xmax. |
var_dict | dict | Necessary | A dict with all the variables you want to plot. The key of each entry is the name of the variable (as it is named in the files) and the value is the binning. If you give an int, you get that number of equidistant bins. You can also give a three-element list, which is passed to the numpy.arange function: the first element is the start, the second the stop and the third the step, i.e. the bin width (not the number of bins). The resulting numbers are bin edges, not bins! If no value is given, the default value is 100. To plot a combination of variables, for example the sum of rnnip_pc and rnnip_pu, the entry needs to be a dict itself with three entries: variables, a list of the variables to combine; operator, the operation used to merge them (available are "+", "-", "*" and "/"); and binning, which works as explained above for the int and the list. An example is given in the config above, where the variable is named combined_rnnip. You can also apply the logarithm to a single variable by defining only one variable in the dict and setting the operator to "log". |
cut_vars_dict | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Technically, this is a list of dict entries. Each entry has as key the name of the variable used for the cut (e.g. pt_btagJes) and as sub-entries the operator used for the cut (operator) and the condition used for the cut (condition). |
plot_settings | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
xlabels | dict | Optional | Dict with custom xlabels |
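How a combined entry such as combined_rnnip could be evaluated is sketched below in plain Python. This is a hedged illustration, not the code of the script: the function combine is hypothetical, the binary operators are restricted to exactly two variables, and "log" is taken to be the natural logarithm, both of which are assumptions.

```python
import math


def combine(values_by_name, entry):
    """Evaluate a var_dict entry that merges variables with an operator."""
    columns = [values_by_name[name] for name in entry["variables"]]
    op = entry["operator"]
    if op == "log":
        # "log" takes exactly one variable (natural log assumed here)
        (column,) = columns
        return [math.log(value) for value in column]
    binary = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }[op]
    first, second = columns
    return [binary(a, b) for a, b in zip(first, second)]


# Made-up per-jet values, not taken from any real file
jets = {"rnnip_pc": [0.25, 0.5], "rnnip_pu": [0.5, 0.25]}
combined_rnnip = {
    "variables": ["rnnip_pc", "rnnip_pu"],
    "binning": 50,
    "operator": "+",
}
print(combine(jets, combined_rnnip))
```

The binning key is not used by the combination itself; it only describes how the resulting values are histogrammed afterwards.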
Plot settings#
The plot_settings section is similar for all three cases described above.
In order to define some settings you want to apply to all plots, use yaml anchors as shown here:
.default_plot_settings: &default_plot_settings
  logy: True
  use_atlas_tag: True
  atlas_first_tag: "Simulation Internal"
  atlas_second_tag: "$\\sqrt{s}$ = 13 TeV, $t\\bar{t}$ PFlow jets \n30000 jets"
  y_scale: 2
  figsize: [7, 5]

.ttbar_cuts: &ttbar_cuts
  cut_vars_dict:
    - pt_btagJes:
        operator: ">"
        condition: 2.0e4
Most of the plot settings are valid for all types of input variable plots (i.e. jet variables, track variables and the n_tracks plot). If a parameter is only valid for a certain type of plot, this is listed below.
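What the "<<: *anchor" merge keys do once the config is loaded can be mimicked with plain Python dicts. The sketch below is an analogy rather than the loader itself: the anchors are written out as ordinary dicts (abbreviated, atlas_second_tag is left out), and dict unpacking stands in for the YAML merge, where keys written explicitly in a block take precedence over merged ones.

```python
# Plain-dict stand-ins for the two anchors defined in the config
default_plot_settings = {
    "logy": True,
    "use_atlas_tag": True,
    "atlas_first_tag": "Simulation Internal",
    "y_scale": 2,
    "figsize": [7, 5],
}
ttbar_cuts = {
    "cut_vars_dict": [{"pt_btagJes": {"operator": ">", "condition": 2.0e4}}],
}

# Equivalent of the nTracks entry after "<<: *..." has been resolved:
# merged keys come first, keys written out explicitly are added on top
# and would win if the same key appeared in both places.
ntracks_entry = {
    **ttbar_cuts,
    "variables": "tracks",
    "folder_to_save": "nTracks",
    "nTracks": True,
    "plot_settings": {
        **default_plot_settings,
        "ymin_ratio": [0.5],
        "ymax_ratio": [2],
    },
}

print(sorted(ntracks_entry["plot_settings"]))
print(ntracks_entry["cut_vars_dict"][0]["pt_btagJes"]["condition"])
```

This is why a block only needs to list the settings in which it differs from the defaults, such as ymin_ratio and ymax_ratio.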
Plot settings#
You can specify some parameters for the plots themselves. You can use the following parameters. Note that some parameters are not supported for all types of plots.
Options | Plot Type | Data Type | Necessary/Optional | Explanation |
---|---|---|---|---|
xlabels | | dict | Optional | Dict with custom xlabels |
sorting_variable | Track variables | str | Optional | Name of the variable by which the tracks are sorted. |
n_leading | Track variables | list | Optional | List of the n-th leading tracks to plot. If None, all tracks are plotted. If 0, the leading track according to the sorting variable is plotted. You can give, for example, None, 0 and 1, and all three will be plotted, each in its own folder with corresponding labelling. This must be a list, even if only one option is given. |
track_origins | Track variables and n_tracks plot | list | Optional | List of the desired track origins to plot. |
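The interplay of sorting_variable and n_leading can be sketched for the tracks of a single jet. The helper nth_leading is hypothetical, and sorting in descending order (largest ptfrac first) is an assumption about what "leading" means here.

```python
def nth_leading(tracks, sorting_variable, n_leading):
    """Select tracks of one jet according to an n_leading entry.

    None -> all tracks of the jet
    int  -> only the track at that position after sorting in
            descending order (assumed here), 0 being the leading track
    """
    if n_leading is None:
        return list(tracks)
    ordered = sorted(tracks, key=lambda trk: trk[sorting_variable], reverse=True)
    return ordered[n_leading : n_leading + 1]


# Made-up tracks of a single jet
jet_tracks = [{"ptfrac": 0.1}, {"ptfrac": 0.6}, {"ptfrac": 0.3}]

for n_leading in [None, 0, 1]:
    picked = nth_leading(jet_tracks, "ptfrac", n_leading)
    print(n_leading, [trk["ptfrac"] for trk in picked])
```

A jet with fewer tracks than the requested position simply contributes nothing in this sketch, which is why the slice is used instead of direct indexing.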
All remaining plot settings are parameters which are handed to puma (Plotting UMami API), more specifically to the HistogramPlot class.
Therefore, all parameters supported by the HistogramPlot class can be specified there.
List of puma parameters#
Parameter | Type | Description |
---|---|---|
discrete_vals | list, optional | List of values if a variable only has discrete values. If discrete_vals is specified, only the bins containing these values are plotted. By default None. |
norm | bool, optional | Specify if the histograms are normalised, i.e. divided by the total number of counts. The sum of the bin counts is then equal to one, but NOT the area under the curve, which would be sum(bin_counts * bin_width). By default True. |
logy | bool, optional | Set log scale on y-axis, by default False. |
bin_width_in_ylabel | bool, optional | Specify if the bin width should be added to the ylabel, by default False |
underoverflow | bool, optional | Option to include under- and overflow values in outermost bins. |
grid | bool, optional | Set the grid for the plots, by default False |
stacked | bool, optional | Decide if all histograms (which are not data) are stacked, by default False |
histtype | str, optional | If stacked is used, define the type of histogram you would like to have, default is "bar" |
title | str, optional | Title of the plot, by default "" |
draw_errors | bool, optional | Draw statistical uncertainty on the lines, by default True |
xmin | float, optional | Minimum value of the x-axis, by default None |
xmax | float, optional | Maximum value of the x-axis, by default None |
ymin | float, optional | Minimum value of the y-axis, by default None |
ymax | float, optional | Maximum value of the y-axis, by default None |
ymin_ratio | list, optional | Set the lower y limit of each of the ratio subplots, by default None. |
ymax_ratio | list, optional | Set the upper y limit of each of the ratio subplots, by default None. |
y_scale | float, optional | Scaling up the y axis, e.g. to fit the ATLAS Tag. Applied if ymax is not defined, by default 1.3 |
xlabel | str, optional | Label of the x-axis, by default None |
ylabel | str, optional | Label of the y-axis, by default None |
ylabel_ratio | list, optional | List of labels for the y-axis in the ratio plots, by default "Ratio" |
label_fontsize | int, optional | Used fontsize in label, by default 12 |
fontsize | int, optional | Used fontsize, by default 10 |
n_ratio_panels | int, optional | Number of ratio panels between 0 and 2, by default 0 |
figsize | (float, float), optional | Tuple of figure size (width, height) in inches, by default (8, 6) |
dpi | int, optional | DPI used for plotting, by default 400 |
transparent | bool, optional | Specify if the background of the plot should be transparent, by default False |
leg_fontsize | int, optional | Fontsize of the legend, by default 10 |
leg_loc | str, optional | Position of the legend in the plot, by default "upper right" |
leg_ncol | int, optional | Number of legend columns, by default 1 |
leg_linestyle_loc | str, optional | Position of the linestyle legend in the plot, by default "upper center" |
apply_atlas_style | bool, optional | Apply ATLAS style for matplotlib, by default True |
use_atlas_tag | bool, optional | Use the ATLAS Tag in the plots, by default True |
atlas_first_tag | str, optional | First row of the ATLAS tag (i.e. the first row is "ATLAS <atlas_first_tag>") |
atlas_second_tag | str, optional | Second row of the ATLAS tag, by default "" |
atlas_fontsize | float, optional | Fontsize of ATLAS label, by default 10 |
atlas_vertical_offset | float, optional | Vertical offset of the ATLAS tag, by default 7 |
atlas_horizontal_offset | float, optional | Horizontal offset of the ATLAS tag, by default 8 |
atlas_brand | str, optional | brand argument handed to atlasify. If you want to remove it just use an empty string or None, by default "ATLAS" |
atlas_tag_outside | bool, optional | outside argument handed to atlasify. Decides if the ATLAS logo is plotted outside of the plot (on top), by default False |
atlas_second_tag_distance | float, optional | Distance between the atlas_first_tag and atlas_second_tag text in units of line spacing, by default 0 |
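The remark on the norm parameter in the table above (the bin contents sum to one, the area under the curve does not) is easiest to see with numbers. The following Python lines are a small worked example with made-up counts and unequal bin widths; they do not call puma.

```python
# Made-up bin edges and counts for a histogram with unequal bin widths
bin_edges = [0.0, 1.0, 2.0, 4.0]
counts = [2, 4, 2]

total = sum(counts)
normalised = [count / total for count in counts]
widths = [high - low for low, high in zip(bin_edges[:-1], bin_edges[1:])]

# With this normalisation the bin contents add up to one ...
print(sum(normalised))
# ... while the area under the histogram generally does not
area = sum(value * width for value, width in zip(normalised, widths))
print(area)
```

Here the bin contents are 0.25, 0.5 and 0.25, while the area is 1.25 because the last bin is twice as wide as the others.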