Namespace: Event

namespace event

Functions

template<typename ...Args> inline auto CombineFlags(ROOT::RDF::RNode df, const std::string &outputname, Args... args)

This function combines multiple boolean flags into a single boolean value based on the selected mode (“any_of”, “all_of”, or “none_of”). The mode determines how the flags are evaluated:

"any_of": Returns true if at least one of the flags is true
"all_of": Returns true if all flags are true
"none_of": Returns true if none of the flags are true

Note

The mode ("any_of", "all_of", or "none_of") is extracted as the last argument in the args parameter pack, and the rest of the arguments are treated as individual flag columns.

Template Parameters:

Args – variadic template parameter pack representing the flag columns plus mode

Parameters:

df – input dataframe
outputname – name of the output column containing the combined flag
args – parameter pack of column names that contain the considered flags of type bool, with the last argument being the mode ("any_of", "all_of", or "none_of")

Returns:

a dataframe with a new column
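
The three modes map naturally onto the standard algorithms of the same name. The following is a minimal sketch of the per-event logic only, not the library's implementation; the helper name combine_flags is hypothetical, and the flags are passed as a plain vector instead of dataframe columns.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): reduces the flag values of
// one event to a single boolean according to the requested mode.
inline bool combine_flags(const std::vector<bool> &flags, const std::string &mode) {
    const auto is_true = [](bool flag) { return flag; };
    if (mode == "any_of") {
        return std::any_of(flags.begin(), flags.end(), is_true);
    }
    if (mode == "all_of") {
        return std::all_of(flags.begin(), flags.end(), is_true);
    }
    if (mode == "none_of") {
        return std::none_of(flags.begin(), flags.end(), is_true);
    }
    // Rejecting unknown modes is a choice of this sketch.
    throw std::invalid_argument("unknown mode: " + mode);
}
```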

namespace filter

Functions

ROOT::RDF::RNode GoldenJSON(ROOT::RDF::RNode df, correctionManager::CorrectionManager &correction_manager, const std::string &filtername, const std::string &run, const std::string &luminosity, const std::string &json_path)

This function applies a filter to the input dataframe using a Golden JSON file, which contains a mapping of valid run-luminosity pairs. The dataframe is filtered by checking if the run and luminosity values for each row match the entries in the Golden JSON. Rows with invalid run-luminosity pairs are removed.

The Golden JSON files are taken from the CMS recommendations.

Run2: https://twiki.cern.ch/twiki/bin/view/CMS/LumiRecommendationsRun2

Run3: https://twiki.cern.ch/twiki/bin/view/CMS/LumiRecommendationsRun3 (not added yet)

Parameters:

df – input dataframe
correction_manager – correction manager responsible for loading the Golden JSON
filtername – name of the filter to be applied (used in the dataframe report)
run – name of the run column
luminosity – name of the luminosity column
json_path – path to the Golden JSON file

Returns:

a filtered dataframe
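
A Golden JSON file maps each certified run number to a list of inclusive luminosity block ranges. The check applied to each row can be sketched as below. This is an illustration under assumptions: the type aliases and the function is_certified are hypothetical, and the parsed JSON content is represented by a plain std::map instead of being loaded through the correction manager.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical in-memory form of a Golden JSON:
// run number -> list of inclusive [first, last] luminosity block ranges.
using LumiRanges = std::vector<std::pair<unsigned int, unsigned int>>;
using GoldenMap = std::map<unsigned int, LumiRanges>;

// Returns true if the (run, lumi) pair falls inside one of the certified ranges.
inline bool is_certified(const GoldenMap &golden, unsigned int run, unsigned int lumi) {
    const auto entry = golden.find(run);
    if (entry == golden.end()) {
        return false; // run is not listed at all
    }
    for (const auto &range : entry->second) {
        if (lumi >= range.first && lumi <= range.second) {
            return true;
        }
    }
    return false;
}
```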

inline ROOT::RDF::RNode Flag(ROOT::RDF::RNode df, const std::string &filtername, const std::string &flagname)

This function applies a filter to the input dataframe based on a boolean flag column. It returns only the rows where the flag value is true.

Use case examples are the noise filters recommended by the CMS JetMET group (https://twiki.cern.ch/twiki/bin/viewauth/CMS/MissingETOptionalFiltersRun2).

Parameters:

df – input dataframe
filtername – name of the filter to be applied (used in the dataframe report)
flagname – name of the boolean flag column to use for filtering

Returns:

a filtered dataframe

inline ROOT::RDF::RNode InvertedFlag(ROOT::RDF::RNode df, const std::string &filtername, const std::string &flagname)

This function applies a filter to the input dataframe based on a boolean flag column. It returns only the rows where the flag value is false.

Parameters:

df – input dataframe
filtername – name of the filter to be applied (used in the dataframe report)
flagname – name of the boolean flag column to use for filtering

Returns:

a filtered dataframe

template<typename ...Args> inline auto Flags(ROOT::RDF::RNode df, const std::string &filtername, Args... args)

This function filters the rows of the input dataframe by evaluating multiple boolean flags according to a specified mode. The filtering mode can be “any_of”, “all_of”, or “none_of”:

"any_of": Keeps the rows where at least one flag is true
"all_of": Keeps the rows where all flags are true
"none_of": Keeps the rows where none of the flags are true

Note

The last argument must be the mode, while the preceding arguments are the boolean flag columns to be evaluated.

Template Parameters:

Args – variadic template parameter pack representing the flag columns plus mode

Parameters:

df – input dataframe
filtername – name of the filter to be applied (used in the dataframe report)
args – parameter pack of column names that contain the considered flags of type bool, with the last argument being the mode ("any_of", "all_of", or "none_of")

Returns:

a filtered dataframe

template<typename T> inline ROOT::RDF::RNode Quantity(ROOT::RDF::RNode df, const std::string &filtername, const std::string &quantity, const std::vector<T> &selection)

This function filters the rows of the input dataframe by checking if a specified quantity exists in the provided selection vector. Rows where the quantity is found in the selection vector are kept, while others are removed.

Template Parameters:

T – type of the input column values

Parameters:

df – input dataframe
filtername – name of the filter to be applied (used in the dataframe report)
quantity – name of the quantity column in the dataframe of type T
selection – a vector containing the selection of values of type T to filter the quantity against

Returns:

a filtered dataframe
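
The membership test behind this filter can be illustrated with a short sketch. The helper is_selected is hypothetical and only shows the per-row decision; it is not the library's code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true if the value of the quantity appears in the selection.
template <typename T>
bool is_selected(const T &value, const std::vector<T> &selection) {
    return std::find(selection.begin(), selection.end(), value) != selection.end();
}
```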

namespace quantity

Functions

ROOT::RDF::RNode GenerateSeed(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lumi, const std::string &run, const std::string &event, const UInt_t &master_seed = 42)

This function defines a new column in the dataframe with seeds for a random number generator for each event.

The seed value for each event is calculated by concatenating the master seed and the event index variables into the string {seed}_{lumi}_{run}_{event}. A SHA256 hash of this string is then calculated, and the first four bytes of the hash are used to create a 32-bit unsigned integer, which serves as the event seed.

Parameters:

df – input dataframe
outputname – name of the new column containing the generated event seeds
lumi – name of the column containing the luminosity block number
run – name of the column containing the run number
event – name of the column containing the event number
master_seed – master seed value to be added to the hash used for event seed generation

Returns:

a dataframe with the new column
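
The two steps around the hash can be sketched as follows. The SHA256 computation itself is omitted here because it needs a cryptographic library; the helper names are hypothetical, the integer types of the index variables are assumptions, and the big-endian byte order used to pack the four bytes is an assumption of this sketch, not something stated by the documentation.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Builds the string that is hashed: {seed}_{lumi}_{run}_{event}.
inline std::string seed_string(std::uint32_t master_seed, std::uint32_t lumi,
                               std::uint32_t run, std::uint64_t event) {
    return std::to_string(master_seed) + "_" + std::to_string(lumi) + "_" +
           std::to_string(run) + "_" + std::to_string(event);
}

// Packs the first four bytes of a 32-byte digest into a 32-bit unsigned integer.
// Assumption: big-endian order (first byte is the most significant one).
inline std::uint32_t first_four_bytes(const std::array<std::uint8_t, 32> &digest) {
    return (static_cast<std::uint32_t>(digest[0]) << 24) |
           (static_cast<std::uint32_t>(digest[1]) << 16) |
           (static_cast<std::uint32_t>(digest[2]) << 8) |
           static_cast<std::uint32_t>(digest[3]);
}
```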

template<typename T> inline ROOT::RDF::RNode EvenOddFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity)

This function creates a flag column based on a quantity. The flag is set to true if the quantity value is even and false if it is odd. This can be useful for splitting datasets into two subsets.

Template Parameters:

T – type of the quantity (e.g. ULong64_t, int)

Parameters:

df – input dataframe
outputname – name of the new flag column
quantity – name of the column containing a quantity that can be used to define the flag (e.g., event ID)

Returns:

a dataframe with the new flag column
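
The per-event decision is a parity check. A minimal sketch, with the hypothetical helper name is_even, could look like this:

```cpp
// Hypothetical helper: true for even values, false for odd values.
template <typename T>
bool is_even(const T &value) {
    return value % 2 == 0;
}
```

Applied to an event number, this splits a dataset into two disjoint subsets of roughly equal size.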

template<typename T> inline ROOT::RDF::RNode MinFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a minimum threshold requirement. The flag is created by comparing the value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:

df – input dataframe
outputname – name of the new column containing the selected event flag
quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T
threshold – minimum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T> inline ROOT::RDF::RNode AbsMinFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a minimum threshold requirement. The flag is created by comparing the absolute value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:

df – input dataframe
outputname – name of the new column containing the selected event flag
quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T
threshold – minimum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T> inline ROOT::RDF::RNode MaxFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a maximum threshold requirement. The flag is created by comparing the value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:

df – input dataframe
outputname – name of the new column containing the selected event flag
quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T
threshold – maximum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T> inline ROOT::RDF::RNode AbsMaxFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a maximum threshold requirement. The flag is created by comparing the absolute value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:

df – input dataframe
outputname – name of the new column containing the selected event flag
quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T
threshold – maximum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T> inline ROOT::RDF::RNode EqualFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy an exact threshold requirement. The flag is created by comparing the value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:

df – input dataframe
outputname – name of the new column containing the selected event flag
quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T
threshold – exact threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T> inline ROOT::RDF::RNode AbsEqualFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy an exact threshold requirement. The flag is created by comparing the absolute value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:

df – input dataframe
outputname – name of the new column containing the selected event flag
quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T
threshold – exact threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T> inline ROOT::RDF::RNode Rename(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity)

This function creates a new column in the dataframe with the specified outputname, copying the values from an existing quantity column. The original column remains unchanged.

Template Parameters:

T – type of the input quantity values

Parameters:

df – input dataframe
outputname – name of the new column
quantity – name of the existing column to copy values from

Returns:

a dataframe with the new column

template<typename T> inline ROOT::RDF::RNode Define(ROOT::RDF::RNode df, const std::string &outputname, T const &value)

This function adds a new column to the dataframe, assigning it a constant value for all entries.

Template Parameters:

T – type of the value to be assigned

Parameters:

df – input dataframe
outputname – name of the new column
value – constant value to be assigned to the new column

Returns:

a dataframe with the new column

template<typename T> inline ROOT::RDF::RNode GenerateRandomVector(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const int seed = 42)

This function defines a new column in the dataframe, where each element is a randomly generated number. The random values are generated using TRandom3, seeded with a user-specified value and uniformly distributed in the range [0,1]. The number of generated values matches the size of the input column vector.

Template Parameters:

T – type of the input column values

Parameters:

df – input dataframe
outputname – name of the new column containing the generated random vector
quantity – name of the input column whose size determines the length of the random vector
seed – seed value for the random number generator (default: 42)

Returns:

a dataframe with the new column
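
The idea can be sketched without ROOT by using std::mt19937 as a stand-in for TRandom3 (both are Mersenne Twister generators, but their output sequences are not expected to agree). The helper random_vector is hypothetical, and constructing the generator inside the function is a simplification of this sketch.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in: returns `size` uniformly distributed random numbers
// between 0 and 1, generated from the given seed.
inline std::vector<double> random_vector(std::size_t size, unsigned int seed = 42) {
    std::mt19937 generator(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::vector<double> values(size);
    for (auto &value : values) {
        value = uniform(generator);
    }
    return values;
}
```

Because the generator is seeded explicitly, two calls with the same seed and size give identical vectors, which keeps the output reproducible.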

template<typename T> inline ROOT::RDF::RNode Negate(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity)

This function creates a new column in the dataframe by applying element-wise negation to an existing quantity column.

Template Parameters:

T – type of the input quantity values

Parameters:

df – input dataframe
outputname – name of the new column
quantity – name of the existing column to be negated

Returns:

a dataframe with the new column

template<typename T> inline ROOT::RDF::RNode Take(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const std::string &index_vector)

This function extracts values from the given quantity column at the indices specified in the index_vector column. The order of the output values reflects the order of the indices. The function uses ROOT::VecOps::Take internally, leading to the following behavior:

auto values = ROOT::RVec<float>({0.1, 0.2, 0.3, 0.4});
auto index = ROOT::RVec<int>({2, 3, 1});
auto result = ROOT::VecOps::Take(values, index);
result
// (ROOT::VecOps::RVec<float>) {0.3, 0.4, 0.2}

The column index_vector must contain the indices for which values should be extracted, and the quantity column must contain the values of the quantity.

Note that T is the type of the values stored in the RVec containers in the quantity column, e.g., if the column has type RVec<float>, you must use T = float.

Note

If the index is out of range, a default value of type T is returned.

Template Parameters:

T – underlying type of the input column values

Parameters:

df – input dataframe
outputname – name of the new column containing the extracted value
quantity – name of the column from which the value is retrieved
index_vector – index list for values to be extracted

Returns:

a dataframe with the new column
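
The documented behavior, including the fallback for out-of-range indices, can be mimicked with standard containers. This is a sketch with std::vector instead of ROOT::RVec; the helper take_or_default is hypothetical, and using T() as the default fallback is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: picks values at the given indices, in index order.
// Indices outside the valid range yield the fallback value.
template <typename T>
std::vector<T> take_or_default(const std::vector<T> &values,
                               const std::vector<int> &indices,
                               const T &fallback = T()) {
    std::vector<T> result;
    result.reserve(indices.size());
    for (const int index : indices) {
        const bool in_range =
            index >= 0 && static_cast<std::size_t>(index) < values.size();
        result.push_back(in_range ? values[static_cast<std::size_t>(index)] : fallback);
    }
    return result;
}
```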

template<typename T> inline ROOT::RDF::RNode Get(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const int &index)

This function extracts a value from the given column at a specified index.

Note

If the index is out of range, a default value of type T is returned.

Template Parameters:

T – type of the input column values

Parameters:

df – input dataframe
outputname – name of the new column containing the extracted value
quantity – name of the column from which the value is retrieved
index – fixed index position used to extract the value

Returns:

a dataframe with the new column

template<typename T> inline ROOT::RDF::RNode Get(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const std::string &index_vector, const int &position)

This function extracts a value from the given column based on an index stored in another column.

Note

If the index is out of range, a default value of type T is returned.

Template Parameters:

T – type of the input column values

Parameters:

df – input dataframe
outputname – name of the new column containing the extracted value
quantity – name of the column from which the value is retrieved
index_vector – name of the column containing index values
position – position within the index vector used to retrieve the index

Returns:

a dataframe with the new column
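
Both Get overloads boil down to a bounds-checked lookup, performed once for a fixed index and twice when the index is read from another column first. A minimal sketch with hypothetical helper names, in which the fallback value is passed explicitly because the actual default value is not specified here:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bounds-checked element access with a fallback value.
template <typename T>
T get_or_default(const std::vector<T> &values, int index, const T &fallback) {
    const bool in_range =
        index >= 0 && static_cast<std::size_t>(index) < values.size();
    return in_range ? values[static_cast<std::size_t>(index)] : fallback;
}

// Hypothetical helper: first reads the index stored at `position` in the
// index vector, then uses that index to read from `values`.
template <typename T>
T get_via_index_vector(const std::vector<T> &values,
                       const std::vector<int> &index_vector, int position,
                       const T &fallback) {
    const int index = get_or_default(index_vector, position, -1);
    return get_or_default(values, index, fallback);
}
```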

template<typename T> ROOT::RDF::RNode GetGenJetForJet(ROOT::RDF::RNode df, const std::string &outputname, const std::string &genjet_quantity, const std::string &jet_genjet_index, const std::string &index_vector, const int &position)

This function gets the gen. jet quantity for a given jet. It finds the gen. jet associated with a reconstructed jet via indices that are present in nanoAODs.

If the generator-level jet cannot be accessed, the function returns a default value.

Example: Let the column "good_jet_indices" contain the indices of selected AK4 jets. For the Jet collection, the column "Jet_genJetIdx" contains the index of the matched generator-level jet in the GenJet collection. To define the generator-level p_T of the leading reconstructed AK4 jet, one needs to call:

event::quantity::GetGenJetForJet<float>(
    df,
    "jet_gen_pt_1",
    "GenJet_pt",
    "Jet_genJetIdx",
    "good_jet_indices",
    0
);

Template Parameters:

T – type of the input gen. jet column values

Parameters:

df – input dataframe
outputname – name of the output column containing the gen. jet quantity value
genjet_quantity – name of the column containing the gen. jet quantity vector
jet_genjet_index – name of the column containing the association (via index) between the jet and the gen. jet collection
index_vector – name of the column containing the vector with the relevant jet indices
position – position in the index vector that specifies which jet in the jet vector should be used to get its associated gen. jet quantity

Returns:

a dataframe with the new column

template<typename T> ROOT::RDF::RNode GetGenJetForObject(ROOT::RDF::RNode df, const std::string &outputname, const std::string &genjet_quantity, const std::string &jet_genjet_index, const std::string &object_jet_index, const std::string &object_index_vector, const int &position)

This function gets the gen. jet quantity for a given object. All objects are usually also reconstructed as jets. This function finds the corresponding jet and the associated gen. jet via indices that are present in nanoAODs.

If the generator-level jet cannot be accessed, the function returns a default value.

Template Parameters:

T – type of the input gen. jet column values

Parameters:

df – input dataframe
outputname – name of the output column containing the gen. jet quantity value
genjet_quantity – name of the column containing the gen. jet quantity vector
jet_genjet_index – name of the column containing the association (via index) between the jet and the gen. jet collection
object_jet_index – name of the column containing the association (via index) between the object and the jet collection
object_index_vector – name of the column containing the vector with the relevant object indices
position – position in the index vector that specifies which object in the object vector should be used to get its associated gen. jet quantity

Returns:

a dataframe with the new column
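
The lookup follows a chain of three indices: position in the object index vector, then object to jet, then jet to gen. jet. The sketch below illustrates this chain with plain vectors. All helper names are hypothetical, the index columns are assumed to hold int values with negative entries meaning "no match", and the fallback value is passed explicitly.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bounds-checked element access with a fallback value.
template <typename T>
T at_or(const std::vector<T> &values, int index, const T &fallback) {
    const bool in_range =
        index >= 0 && static_cast<std::size_t>(index) < values.size();
    return in_range ? values[static_cast<std::size_t>(index)] : fallback;
}

// Hypothetical sketch of the index chain:
// position -> object index -> jet index -> gen. jet index -> quantity.
template <typename T>
T gen_jet_for_object(const std::vector<T> &genjet_quantity,
                     const std::vector<int> &jet_genjet_index,
                     const std::vector<int> &object_jet_index,
                     const std::vector<int> &object_index_vector,
                     int position, const T &fallback) {
    const int object = at_or(object_index_vector, position, -1);
    const int jet = at_or(object_jet_index, object, -1);
    const int genjet = at_or(jet_genjet_index, jet, -1);
    // Any broken link in the chain ends up here with an invalid index.
    return at_or(genjet_quantity, genjet, fallback);
}
```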

template<typename T> ROOT::RDF::RNode GetJetForObject(ROOT::RDF::RNode df, const std::string &outputname, const std::string &jet_quantity, const std::string &object_jet_index, const std::string &object_index_vector, const int &position)

This function gets the jet quantity for a given object. All objects are usually also reconstructed as jets. This function finds the corresponding jet via indices that are present in nanoAODs.

If the reconstruction-level jet cannot be accessed, the function returns a default value.

Template Parameters:

T – type of the input jet column values

Parameters:

df – input dataframe
outputname – name of the output column containing the jet quantity value
jet_quantity – name of the column containing the jet quantity vector
object_jet_index – name of the column containing the association (via index) between the object and the jet collection
object_index_vector – name of the column containing the vector with the relevant object indices
position – position in the index vector that specifies which object in the object vector should be used to get its associated jet quantity

Returns:

a dataframe with the new column

template<typename T> inline ROOT::RDF::RNode Sum(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T zero = T(0))

This function computes the sum of the elements in the quantity column for each event. If no elements are selected, a default value (provided by zero) is used as the sum for that event.

Template Parameters:

T – type of the input column values

Parameters:

df – input dataframe
outputname – name of the new column containing the summed values
quantity – name of the column containing the vector of values to be summed
zero – default value to use in ROOT::VecOps::Sum (default is T(0))

Returns:

a dataframe with the new column

template<typename T> inline ROOT::RDF::RNode Sum(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const std::string &index_vector, const T zero = T(0))

This function computes the sum of the elements in the quantity column that are selected by the indices from the index_vector column. The sum is computed per event, and a default value (provided by zero) is used if no elements are selected.

Template Parameters:

T – type of the input column values

Parameters:

df – input dataframe
outputname – name of the new column containing the summed values
quantity – name of the column containing the vector of values to be summed
index_vector – name of the column containing the indices used to select values from quantity
zero – default value to use in ROOT::VecOps::Sum (default is T(0))

Returns:

a dataframe with the new column
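
The index-based sum can be sketched with standard containers as follows. The helper sum_selected is hypothetical, and skipping out-of-range indices is a choice made for this sketch; the library may treat them differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the values selected by the indices, starting from
// `zero`. With an empty index list the result is `zero`.
template <typename T>
T sum_selected(const std::vector<T> &values, const std::vector<int> &indices,
               const T zero = T(0)) {
    T sum = zero;
    for (const int index : indices) {
        // Assumption of this sketch: out-of-range indices are ignored.
        if (index >= 0 && static_cast<std::size_t>(index) < values.size()) {
            sum += values[static_cast<std::size_t>(index)];
        }
    }
    return sum;
}
```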

template<typename ...Quantities> inline ROOT::RDF::RNode ScalarSum(ROOT::RDF::RNode df, const std::string &outputname, Quantities... quantities)

This function calculates the scalar sum of an arbitrary set of quantities of type float.

Template Parameters:

Quantities – variadic template parameter pack representing the quantity columns

Parameters:

df – input dataframe
outputname – name of the output column containing the scalar sum
quantities – parameter pack of column names that contain the considered quantities

Returns:

a dataframe with a new column
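
For an arbitrary number of inputs, such a sum can be written compactly with a C++17 fold expression. The sketch below operates on values instead of column names, and the helper name scalar_sum is hypothetical.

```cpp
// Hypothetical helper: adds up an arbitrary number of values as float.
// Requires C++17 for the fold expression; an empty call returns 0.
template <typename... Quantities>
float scalar_sum(Quantities... quantities) {
    return (0.0f + ... + static_cast<float>(quantities));
}
```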

template<typename T> inline ROOT::RDF::RNode Unroll(ROOT::RDF::RNode df, const std::vector<std::string> &outputnames, const std::string &quantity, const size_t &index = 0)

This function recursively unrolls a vector (std::vector<T>) from the quantity column into individual columns in the dataframe. Each element of the vector is stored in a separate column with names provided in the outputnames vector. The function works recursively to define a new column for each element in the vector.

Note

The function is recursive and will create one column for each element of the vector in quantity. If outputnames has fewer entries than the number of elements in the vector, the function will stop at the end of outputnames. The index should not be set outside this function.

Warning

The length of the quantity vector has to be the same for each event.

Template Parameters:

T – type of the input column values

Parameters:

df – input dataframe
outputnames – a vector of names for the new columns where the individual elements of the vector will be stored
quantity – name of the column containing the vector of values to unroll
index – index of the current element to unroll (defaults to 0).

Returns:

a dataframe with the new columns containing each individual element of the vector from the quantity column
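
The recursion pattern can be illustrated for a single event with a map from column name to value standing in for the dataframe. The helper unroll below is a hypothetical sketch, not the library function; it additionally stops at the end of the value vector to stay within bounds.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: stores element `index` of `values` under the name
// outputnames[index], then recurses with index + 1.
template <typename T>
void unroll(std::map<std::string, T> &columns,
            const std::vector<std::string> &outputnames,
            const std::vector<T> &values, const std::size_t index = 0) {
    if (index >= outputnames.size() || index >= values.size()) {
        return; // recursion ends when the names (or values) are exhausted
    }
    columns[outputnames[index]] = values[index];
    unroll(columns, outputnames, values, index + 1);
}
```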

namespace reweighting

Functions

ROOT::RDF::RNode Pileup(ROOT::RDF::RNode df, correctionManager::CorrectionManager &correction_manager, const std::string &outputname, const std::string &true_pileup_number, const std::string &corr_file, const std::string &corr_name, const std::string &variation)

This function is used to correct Monte Carlo (MC) simulations for differences in the pileup distribution compared to the one measured in data. It retrieves a per-event weight from a correction file based on the true number of pileup interactions in an event.

The correction files are provided by the Luminosity POG and more information about the pileup reweighting can be found here: https://twiki.cern.ch/twiki/bin/view/CMS/PileupJSONFileforData

Parameters:

df – input dataframe
correction_manager – correction manager responsible for loading the pileup weights file
outputname – name of the output column containing the pileup event weight
true_pileup_number – name of the column containing the true mean of the Poisson distribution from which the number of interactions in each bunch crossing has been sampled for an event
corr_file – path to the file with the pileup weights
corr_name – name of the pileup correction in the file, e.g. “Collisions18_UltraLegacy_goldenJSON”
variation – name of the pileup weight variation, options are “nominal”, “up” and “down”

Returns:

a new dataframe containing the new column

ROOT::RDF::RNode PUWeightROOT(ROOT::RDF::RNode df, const std::string &outputname, const std::string &truePUMean, const std::string &datafilename, const std::string &mcfilename, const std::string &histname)

This function is used to read out pileup weights from ROOT files.

Note

This function is intended only for cases where the pileup weights are not available in the correction files.

Parameters:

df – input dataframe
outputname – name of the derived weight
truePUMean – name of the column containing the true PU mean of simulated events
datafilename – path to the data ROOT file
mcfilename – path to the MC ROOT file
histname – name of the histogram stored in the ROOT file

Returns:

a new dataframe containing the new column

ROOT::RDF::RNode PartonShower(ROOT::RDF::RNode df, const std::string &outputname, const std::string &ps_weights, const float isr, const float fsr)

This function is used to evaluate the parton shower (PS) weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation}\) / \(w_{nominal}\). Since the nominal weight is already applied, the main use of this function is to get the initial state radiation (ISR) and final state radiation (FSR) variations relative to the nominal PS weight.

Depending on the selected ISR and FSR value, a specific index has to be identified. The mapping between the index and the ISR and FSR values is:

ISR	FSR	index
2.0	1.0	0
1.0	2.0	1
0.5	1.0	2
1.0	0.5	3

Note

For some simulated samples this mapping might be defined differently. If issues occur, it is advisable to check the documentation of the PSWeight branch in the nanoAOD files of the samples.

Parameters:

df – input dataframe
outputname – name of the output column containing the ISR/FSR event weight
ps_weights – name of the column containing the parton shower (ISR/FSR) weights
isr – value of the ISR variation, possible values are 0.5, 1.0, 2.0
fsr – value of the FSR variation, possible values are 0.5, 1.0, 2.0

Returns:

a new dataframe containing the new column
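
The mapping table above translates directly into a small lookup. The function ps_weight_index is a hypothetical sketch; returning -1 for combinations that are not in the table (including the nominal choice ISR = FSR = 1.0) is a convention of this sketch only.

```cpp
// Hypothetical helper: index into the PS weight vector for a given ISR/FSR
// variation, following the default mapping table. Returns -1 if the
// combination is not part of the table.
inline int ps_weight_index(const float isr, const float fsr) {
    if (isr == 2.0f && fsr == 1.0f) return 0;
    if (isr == 1.0f && fsr == 2.0f) return 1;
    if (isr == 0.5f && fsr == 1.0f) return 2;
    if (isr == 1.0f && fsr == 0.5f) return 3;
    return -1;
}
```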

ROOT::RDF::RNode LHEscale(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lhe_scale_weights, const float mu_r, const float mu_f)

This function is used to evaluate the LHE scale weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation}\) / \(w_{nominal}\). Since the nominal weight is already applied, the main use of this function is to get the factorization and renormalization scale variations relative to the nominal scale weight.

Depending on the selected \(\mu_R\) and \(\mu_F\) value, a specific index has to be identified. The mapping between the index and the \(\mu_R\) and \(\mu_F\) values is:

mu_f	mu_r	index
0.5	0.5	0
1.0	0.5	1
2.0	0.5	2
0.5	1.0	3
1.0	1.0	4 (not always included)
2.0	1.0	5 (4)
0.5	2.0	6 (5)
1.0	2.0	7 (6)
2.0	2.0	8 (7)

Note

For some simulated samples this mapping might be defined differently. If issues occur, it is advisable to check the documentation of the LHEScaleWeight branch in the nanoAOD files of the samples.

Parameters:

df – input dataframe
outputname – name of the output column containing the LHE scale event weight
lhe_scale_weights – name of the column containing the LHE scale weights
mu_r – value of \(\mu_R\) variation, possible values are 0.5, 1.0, 2.0
mu_f – value of \(\mu_F\) variation, possible values are 0.5, 1.0, 2.0

Returns:

a new dataframe containing the new column
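
In the table above, the index for a weight vector with nine entries equals three times the step of \(\mu_R\) plus the step of \(\mu_F\), where the steps 0, 1, 2 correspond to the values 0.5, 1.0, 2.0. For vectors with eight entries the nominal entry is missing and all later indices shift down by one. A sketch of this lookup, with hypothetical helper names and -1 as the marker for "not available":

```cpp
#include <cstddef>

// Hypothetical helper: maps a scale factor to its step (0.5 -> 0, 1.0 -> 1,
// 2.0 -> 2) or -1 for unsupported values.
inline int variation_step(const float value) {
    if (value == 0.5f) return 0;
    if (value == 1.0f) return 1;
    if (value == 2.0f) return 2;
    return -1;
}

// Hypothetical helper: index into the LHE scale weight vector following the
// mapping table, for vectors with 9 entries or with 8 entries (no nominal).
inline int lhe_scale_index(const float mu_r, const float mu_f,
                           const std::size_t n_weights) {
    const int step_r = variation_step(mu_r);
    const int step_f = variation_step(mu_f);
    if (step_r < 0 || step_f < 0) return -1;
    const int index = 3 * step_r + step_f;
    if (n_weights == 9) return index;
    if (n_weights == 8) {
        if (index == 4) return -1;            // nominal entry is not stored
        return index > 4 ? index - 1 : index; // later entries shift down by one
    }
    return -1; // other vector lengths are not covered by the table
}
```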

ROOT::RDF::RNode LHEpdf(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lhe_pdf_weights, const std::string &variation)

This function is used to evaluate the LHE PDF weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation}\) / \(w_{nominal}\). Since the nominal weight is already applied, the main use of this function is to get the variation of the PDF weights relative to the nominal PDF weight.

The PDF weights consist of 101 weights, where the first weight is the nominal weight and the remaining 100 weights correspond to alternative PDF sets.

Note

The proper procedure is to use each alternative PDF set as an independent systematic variation. However, in case of this function, a simplified approach is used to calculate a single PDF weight variation. The standard deviation of the 100 alternative PDF weights is calculated and used to define the up and down variations as follows: \(w_{up/down} = 1 \pm \sqrt{\sum_{i=1}^{100} (w_i - 1)^2}\)

Parameters:

df – input dataframe
outputname – name of the output column containing the LHE PDF event weight
lhe_pdf_weights – name of the column containing the LHE PDF weights
variation – name of the variation that should be evaluated, possible values are “nominal”, “up”, “down”

Returns:

a new dataframe containing the new column
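
The formula in the note can be evaluated as sketched below. The helper pdf_variation is hypothetical; it treats entry 0 as the nominal weight and sums over all remaining entries, and returning 1 for the "nominal" variation is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: evaluates w_up/down = 1 +- sqrt(sum_i (w_i - 1)^2),
// where the sum runs over all entries except the nominal one at index 0.
inline double pdf_variation(const std::vector<float> &weights,
                            const std::string &variation) {
    if (variation == "nominal") {
        return 1.0; // assumption: the nominal variation leaves the event unchanged
    }
    double sum_of_squares = 0.0;
    for (std::size_t i = 1; i < weights.size(); ++i) {
        const double deviation = static_cast<double>(weights[i]) - 1.0;
        sum_of_squares += deviation * deviation;
    }
    const double shift = std::sqrt(sum_of_squares);
    if (variation == "up") return 1.0 + shift;
    if (variation == "down") return 1.0 - shift;
    throw std::invalid_argument("unknown variation: " + variation);
}
```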

ROOT::RDF::RNode LHEalphaS(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lhe_pdf_weights, const std::string &variation)

This function is used to evaluate the LHE \(\alpha_S\) weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation}\) / \(w_{nominal}\). Since the nominal weight is already applied, the main use of this function is to get the variation of the \(\alpha_S\) weight relative to the nominal weight.

For some samples the \(\alpha_S\) weight is included in the PDF weights vector. In that case the full PDF weights vector is expected to contain 103 entries, where the first 101 entries are PDF weights and the last two entries correspond to the up and down varied \(\alpha_S\) weight.

Parameters:

df – input dataframe
outputname – name of the output column containing the LHE \(\alpha_S\) event weight
lhe_pdf_weights – name of the column containing the LHE \(\alpha_S\) weights (it is part of the LHE PDF weights)
variation – name of the variation that should be evaluated, possible values are “nominal”, “up”, “down”

Returns:

a new dataframe containing the new column

ROOT::RDF::RNode TopPt(ROOT::RDF::RNode df, const std::string &outputname, const std::string &genparticles_pdg_id, const std::string &genparticles_status_flags, const std::string &genparticles_pt)

This function is used to calculate an event weight to correct the top quark \(p_T\) mismodeling in simulated \(t\bar{t}\) events. The correction is provided by the Top POG and in case of this function the calculated weight corrects NLO simulation (POWHEG+Pythia8) to data.

For reference: https://twiki.cern.ch/twiki/bin/viewauth/CMS/TopPtReweighting

The weight is calculated as \(w=\sqrt{SF(t)\cdot SF(\bar{t})}\)

with \(SF= \exp(0.0615-0.0005\cdot p_T)\)

Note

The Top POG also provides other reweighting functions, e.g. for NNLO to data or NLO to NNLO which could be preferred depending on the use case.

Parameters:

df – input dataframe
outputname – name of the output column containing the derived event weight
genparticles_pdg_id – name of the column containing the PDG IDs of the generator particles
genparticles_status_flags – name of the column containing the status flags of the generator particles, where bit 13 contains the isLastCopy flag
genparticles_pt – name of the column containing the pt of the generator particles

Returns:

a new dataframe containing the new column
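
The weight formula above can be sketched for one top quark pair as follows. This is an illustration under stated assumptions, not the library's implementation: the helper names `TopPtSF`, `TopPtWeight` and `IsLastCopy` are hypothetical, \(p_T\) is assumed to be given in GeV, and the selection of the generator-level top quarks (PDG ID ±6 with the isLastCopy flag set) is left to the caller.

```cpp
#include <cmath>

// Hypothetical helpers (not part of the documented API).

// Per-quark scale factor SF = exp(0.0615 - 0.0005 * p_T), p_T assumed in GeV.
double TopPtSF(double pt) { return std::exp(0.0615 - 0.0005 * pt); }

// Event weight w = sqrt(SF(t) * SF(tbar)).
double TopPtWeight(double pt_top, double pt_antitop) {
    return std::sqrt(TopPtSF(pt_top) * TopPtSF(pt_antitop));
}

// Checks bit 13 of the generator status flags, which holds isLastCopy.
bool IsLastCopy(int status_flags) { return ((status_flags >> 13) & 1) != 0; }
```

With this parametrization the scale factor crosses 1 at \(p_T = 123\) GeV, so events with softer top quarks are weighted up and events with harder top quarks are weighted down.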

ROOT::RDF::RNode TopPtRun3(ROOT::RDF::RNode df, const std::string &outputname, const std::string &genparticles_pdg_id, const std::string &genparticles_status_flags, const std::string &genparticles_pt)

This function is used to calculate an event weight to correct the top quark \(p_T\) mismodeling in simulated \(t\bar{t}\) events. The correction is provided by the Top POG; the weight calculated by this function corrects NLO simulation (POWHEG+Pythia8) to NNLO.

For reference: https://twiki.cern.ch/twiki/bin/viewauth/CMS/TopPtReweighting

The weight is calculated as \(w=\sqrt{SF(t)\cdot SF(\bar{t})}\)

with \(SF= 0.103\cdot\exp(-0.0118\cdot p_T)-0.000134\cdot p_T+0.973.\)

Further, this weight is multiplied by a correction factor that accounts for the change in center-of-mass energy between Run 2 and Run 3. The correction factor is calculated as

\(SF_{corr}=0.991+0.000075\cdot p_T\)

and, like the first weight, is applied to each top quark individually. The final weight is then calculated as

\(w_{total}=w\cdot w_{corr}\)

Note

This follows the new recommendations from TOP PAG in AN-2025-050 for Run 3.

Parameters:

df – input dataframe
outputname – name of the output column containing the derived event weight
genparticles_pdg_id – name of the column containing the PDG IDs of the generator particles
genparticles_status_flags – name of the column containing the status flags of the generator particles, where bit 13 contains the isLastCopy flag
genparticles_pt – name of the column containing the pt of the generator particles

Returns:

a new dataframe containing the new column
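
The two-step Run 3 weight described above can be sketched as follows. This is a minimal illustration, not the library's implementation: all helper names are hypothetical, \(p_T\) is assumed to be given in GeV, and combining the per-quark energy correction as a geometric mean (in the same way as the first weight) is an assumption based on the statement that it is applied to each top quark individually.

```cpp
#include <cmath>

// Hypothetical helpers (not part of the documented API).

// NLO-to-NNLO scale factor per top quark, p_T assumed in GeV.
double TopPtRun3SF(double pt) {
    return 0.103 * std::exp(-0.0118 * pt) - 0.000134 * pt + 0.973;
}

// Per-quark correction for the change in center-of-mass energy.
double TopPtEnergyCorrSF(double pt) { return 0.991 + 0.000075 * pt; }

// w_total = w * w_corr, each built as sqrt(SF(t) * SF(tbar)).
// Assumption: w_corr is combined the same way as w.
double TopPtRun3Weight(double pt_top, double pt_antitop) {
    const double w =
        std::sqrt(TopPtRun3SF(pt_top) * TopPtRun3SF(pt_antitop));
    const double w_corr = std::sqrt(TopPtEnergyCorrSF(pt_top) *
                                    TopPtEnergyCorrSF(pt_antitop));
    return w * w_corr;
}
```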

ROOT::RDF::RNode ZBosonPt(ROOT::RDF::RNode df, correctionManager::CorrectionManager &correction_manager, const std::string &outputname, const std::string &gen_boson, const std::string &corr_file, const std::string &corr_name, const std::string &order, const std::string &variation)

This function is used to calculate an event weight to correct the Z boson \(p_T\). These corrections are recommended especially for LO Drell-Yan samples, where the \(p_T\) and mass of the Z boson are mismodeled compared to data. This function is defined for the corrections provided by the CMS HLepRare group. More details can be found here: https://cms-higgs-leprare.docs.cern.ch/htt-common/DY_reweight/.

Note

HLepRare only provides corrections for Run3. For Run2 see event::reweighting::ZPtMass.

Parameters:

df – input dataframe
correction_manager – correction manager responsible for loading the correction file
outputname – name of the output column containing the derived event weight
gen_boson – name of the column containing the Lorentz vector of the generator-level boson
corr_file – path to the correction file containing the Z boson \(p_T\) corrections
corr_name – name of the correction in the json file
order – order of the used DY samples: “LO” for madgraph, “NLO” for amcatnlo, “NNLO” for powheg
variation – name of the variation that should be evaluated, options are “nom”, “up”, “down” or “upX”, “downX”. For “up” and “down” the uncertainty is defined by the envelope of all provided uncertainty sources in the correction file. Otherwise the specific uncertainty source “X” is used (where X is a number e.g. 1,2,3,…).

Returns:

a new dataframe containing the new column
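
The naming scheme of the variation argument ("nom", "up", "down", "upX", "downX") can be illustrated with a small parser. This is a sketch only: the struct `ParsedVariation`, the function `ParseVariation` and the use of -1 to mark the envelope case are hypothetical choices, and how the library itself interprets the string is not shown here.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical representation of a parsed variation string.
struct ParsedVariation {
    std::string direction;  // "nom", "up" or "down"
    int source;             // uncertainty source X, or -1 for nominal/envelope
};

// Hypothetical parser for "nom", "up", "down", "upX" and "downX".
ParsedVariation ParseVariation(const std::string &variation) {
    if (variation == "nom") {
        return {"nom", -1};
    }
    const std::string directions[] = {"down", "up"};
    for (const std::string &dir : directions) {
        if (variation.rfind(dir, 0) != 0) {
            continue;  // variation does not start with this direction
        }
        const std::string suffix = variation.substr(dir.size());
        if (suffix.empty()) {
            return {dir, -1};  // plain "up"/"down": envelope of all sources
        }
        const bool all_digits =
            std::all_of(suffix.begin(), suffix.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; });
        if (all_digits) {
            return {dir, std::stoi(suffix)};  // specific source X
        }
    }
    throw std::invalid_argument("unknown variation: " + variation);
}
```

In this sketch "down3" is parsed as direction "down" with source 3, while a plain "up" yields source -1, standing for the envelope of all provided uncertainty sources.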

ROOT::RDF::RNode ZPtMass(ROOT::RDF::RNode df, const std::string &outputname, const std::string &gen_boson, const std::string &workspace_file, const std::string &functor_name, const std::string &argset)

This function is used to calculate an event weight based on Z boson \(p_T\) and mass corrections. These corrections are recommended especially for LO Drell-Yan samples, where the \(p_T\) and mass of the Z boson are mismodeled compared to data.

Note

The function is intended for Run 2 analyses. In Run 3, Z boson \(p_T\) corrections are handled through correctionlib, see event::reweighting::ZBosonPt.

Warning

This function is based on workspaces and functions that were derived for the legacy \(H(\tau\tau)\) analysis and are therefore no longer up to date for UL or Run 3.

Parameters:

df – input dataframe
outputname – name of the output column containing the derived event weight
gen_boson – name of the column containing the Lorentz vector of the generator-level boson
workspace_file – path to the file which contains the workspace that should be used
functor_name – name of the function in the workspace that should be used
argset – additional arguments that are needed for the function

Returns:

a new dataframe containing the new column