Namespace: Event

namespace event

Functions

template<typename ...Args>
inline auto CombineFlags(ROOT::RDF::RNode df, const std::string &outputname, Args... args)

This function combines multiple boolean flags into a single boolean value based on the selected mode ("any_of", "all_of", or "none_of"). The mode determines how the flags are evaluated:

  • "any_of": Returns true if at least one of the flags is true

  • "all_of": Returns true if all flags are true

  • "none_of": Returns true if none of the flags are true

Note

The mode ("any_of", "all_of", or "none_of") is extracted as the last argument in the args parameter pack, and the rest of the arguments are treated as individual flag columns.

Template Parameters:

Args – variadic template parameter pack representing the flag columns plus mode

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the combined flag

  • args – parameter pack of column names that contain the considered flags of type bool, with the last argument being the mode ("any_of", "all_of", or "none_of")

Returns:

a dataframe with a new column
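The three modes correspond to the standard-library algorithms of the same names. The following is a minimal sketch of the per-event logic only, not the actual implementation: `combine_flags` is a hypothetical helper, the flag values are passed as a `std::vector<bool>` instead of dataframe columns, and throwing on an unknown mode is an assumption.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the documented API): reduces the flag
// values of one event to a single boolean according to the selected mode.
inline bool combine_flags(const std::vector<bool> &flags, const std::string &mode) {
    const auto is_true = [](bool flag) { return flag; };
    if (mode == "any_of")
        return std::any_of(flags.begin(), flags.end(), is_true);
    if (mode == "all_of")
        return std::all_of(flags.begin(), flags.end(), is_true);
    if (mode == "none_of")
        return std::none_of(flags.begin(), flags.end(), is_true);
    // Assumption: an unknown mode is treated as an error in this sketch.
    throw std::invalid_argument("unknown mode: " + mode);
}
```

For the flag values {true, false}, this sketch yields true for "any_of" and false for both "all_of" and "none_of".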

namespace filter

Functions

ROOT::RDF::RNode GoldenJSON(ROOT::RDF::RNode df, correctionManager::CorrectionManager &correction_manager, const std::string &filtername, const std::string &run, const std::string &luminosity, const std::string &json_path)

This function applies a filter to the input dataframe using a Golden JSON file, which contains a mapping of valid run-luminosity pairs. The dataframe is filtered by checking if the run and luminosity values for each row match the entries in the Golden JSON. Rows with invalid run-luminosity pairs are removed.

The Golden JSON files are taken from the CMS recommendations.

Run2: https://twiki.cern.ch/twiki/bin/view/CMS/LumiRecommendationsRun2

Run3: https://twiki.cern.ch/twiki/bin/view/CMS/LumiRecommendationsRun3 (not added yet)

Parameters:
  • df – input dataframe

  • correction_manager – correction manager responsible for loading the Golden JSON

  • filtername – name of the filter to be applied (used in the dataframe report)

  • run – name of the run column

  • luminosity – name of the luminosity column

  • json_path – path to the Golden JSON file

Returns:

a filtered dataframe
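The run-luminosity check can be pictured as a lookup in a map of certified ranges. The sketch below assumes the common Golden JSON layout in which each run number maps to a list of inclusive luminosity-block ranges. The type aliases and `is_certified` are hypothetical names for illustration; the documented function reads the file through the correction manager instead.

```cpp
#include <map>
#include <utility>
#include <vector>

// Assumed layout: run number -> list of inclusive [first, last] lumi ranges.
using LumiRanges = std::vector<std::pair<unsigned int, unsigned int>>;
using GoldenMap = std::map<unsigned int, LumiRanges>;

// Hypothetical helper: true if the (run, lumi) pair lies in a certified range.
inline bool is_certified(const GoldenMap &golden, unsigned int run, unsigned int lumi) {
    const auto entry = golden.find(run);
    if (entry == golden.end())
        return false;  // the run is not listed at all
    for (const auto &range : entry->second) {
        if (lumi >= range.first && lumi <= range.second)
            return true;
    }
    return false;  // run is listed, but this luminosity block is not
}
```

Rows for which such a check fails are the ones removed by the filter.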

inline ROOT::RDF::RNode Flag(ROOT::RDF::RNode df, const std::string &filtername, const std::string &flagname)

This function applies a filter to the input dataframe based on a boolean flag column. It returns only the rows where the flag value is true.

Use case examples are the noise filters recommended by the CMS JetMET group (https://twiki.cern.ch/twiki/bin/viewauth/CMS/MissingETOptionalFiltersRun2).

Parameters:
  • df – input dataframe

  • filtername – name of the filter to be applied (used in the dataframe report)

  • flagname – name of the boolean flag column to use for filtering

Returns:

a filtered dataframe

inline ROOT::RDF::RNode InvertedFlag(ROOT::RDF::RNode df, const std::string &filtername, const std::string &flagname)

This function applies a filter to the input dataframe based on a boolean flag column. It returns only the rows where the flag value is false.

Parameters:
  • df – input dataframe

  • filtername – name of the filter to be applied (used in the dataframe report)

  • flagname – name of the boolean flag column to use for filtering

Returns:

a filtered dataframe

template<typename ...Args>
inline auto Flags(ROOT::RDF::RNode df, const std::string &filtername, Args... args)

This function filters the rows of the input dataframe by evaluating multiple boolean flags according to a specified mode. The filtering mode can be "any_of", "all_of", or "none_of":

  • "any_of": Keeps the rows where at least one flag is true

  • "all_of": Keeps the rows where all flags are true

  • "none_of": Keeps the rows where none of the flags are true

Note

The last argument must be the mode, while the preceding arguments are the boolean flag columns to be evaluated.

Template Parameters:

Args – variadic template parameter pack representing the flag columns plus mode

Parameters:
  • df – input dataframe

  • filtername – name of the filter to be applied (used in the dataframe report)

  • args – parameter pack of column names that contain the considered flags of type bool, with the last argument being the mode ("any_of", "all_of", or "none_of")

Returns:

a filtered dataframe

template<typename T>
inline ROOT::RDF::RNode Quantity(ROOT::RDF::RNode df, const std::string &filtername, const std::string &quantity, const std::vector<T> &selection)

This function filters the rows of the input dataframe by checking if a specified quantity exists in the provided selection vector. Rows where the quantity is found in the selection vector are kept, while others are removed.

Template Parameters:

T – type of the input column values

Parameters:
  • df – input dataframe

  • filtername – name of the filter to be applied (used in the dataframe report)

  • quantity – name of the quantity column in the dataframe of type T

  • selection – a vector containing the selection of values of type T to filter the quantity against

Returns:

a filtered dataframe
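The membership test behind this filter can be sketched with `std::find`. The helper name `is_selected` is hypothetical, and the sketch covers only the per-row decision, not the dataframe handling.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true if the value appears in the selection vector,
// i.e. the row would be kept by the filter.
template <typename T>
inline bool is_selected(const T &value, const std::vector<T> &selection) {
    return std::find(selection.begin(), selection.end(), value) != selection.end();
}
```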

namespace quantity

Functions

ROOT::RDF::RNode GenerateSeed(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lumi, const std::string &run, const std::string &event, const UInt_t &master_seed = 42)

This function defines a new column in the dataframe with seeds for a random number generator for each event.

The seed for each event is calculated by concatenating the master seed and the event identifiers into a string of the form {seed}_{lumi}_{run}_{event}. A SHA256 hash of this string is calculated, and the first four bytes of the hash are used to create a 32-bit unsigned integer, which serves as the event seed.

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the generated event seeds

  • lumi – name of the column containing the luminosity block number

  • run – name of the column containing the run number

  • event – name of the column containing the event number

  • master_seed – master seed value that is included in the hashed string used for event seed generation (default: 42)

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode EvenOddFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity)

This function creates a flag column based on a quantity. The flag is set to true if the quantity value is even and false if it is odd. This can be useful for splitting datasets into two subsets.

Template Parameters:

T – type of the quantity (e.g. ULong64_t, int)

Parameters:
  • df – input dataframe

  • outputname – name of the new flag column

  • quantity – name of the column containing a quantity that can be used to define the flag (e.g., event ID)

Returns:

a dataframe with the new flag column

template<typename T>
inline ROOT::RDF::RNode MinFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a minimum threshold requirement. The flag is created by comparing the value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the selected event flag

  • quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T

  • threshold – minimum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T>
inline ROOT::RDF::RNode AbsMinFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a minimum threshold requirement. The flag is created by comparing the absolute value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the selected event flag

  • quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T

  • threshold – minimum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T>
inline ROOT::RDF::RNode MaxFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a maximum threshold requirement. The flag is created by comparing the value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the selected event flag

  • quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T

  • threshold – maximum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T>
inline ROOT::RDF::RNode AbsMaxFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy a maximum threshold requirement. The flag is created by comparing the absolute value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the selected event flag

  • quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T

  • threshold – maximum threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T>
inline ROOT::RDF::RNode EqualFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy an exact threshold requirement. The flag is created by comparing the value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the selected event flag

  • quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T

  • threshold – exact threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T>
inline ROOT::RDF::RNode AbsEqualFlag(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T &threshold)

This function defines a flag for event quantities that satisfy an exact threshold requirement. The flag is created by comparing the absolute value in the specified quantity column with the given threshold, marking elements as true if they pass the cut and false otherwise.

Template Parameters:

T – type of the threshold and input quantity (e.g. float, int)

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the selected event flag

  • quantity – name of the quantity column for which the cut should be evaluated, expected to be of type T

  • threshold – exact threshold value of type T

Returns:

a dataframe containing the new flag as a column

template<typename T>
inline ROOT::RDF::RNode Rename(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity)

This function creates a new column in the dataframe with the specified outputname, copying the values from an existing quantity column. The original column remains unchanged.

Template Parameters:

T – type of the input quantity values

Parameters:
  • df – input dataframe

  • outputname – name of the new column

  • quantity – name of the existing column to copy values from

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode Define(ROOT::RDF::RNode df, const std::string &outputname, T const &value)

This function adds a new column to the dataframe, assigning it a constant value for all entries.

Template Parameters:

T – type of the value to be assigned

Parameters:
  • df – input dataframe

  • outputname – name of the new column

  • value – constant value to be assigned to the new column

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode GenerateRandomVector(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const int seed = 42)

This function defines a new column in the dataframe, where each element is a randomly generated number. The random values are generated using TRandom3, seeded with a user-specified value and uniformly distributed in the range [0,1]. The number of generated values matches the size of the input column vector.

Template Parameters:

T – type of the input column values

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the generated random vector

  • quantity – name of the input column whose size determines the length of the random vector

  • seed – seed value for the random number generator (default: 42)

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode Negate(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity)

This function creates a new column in the dataframe by applying element-wise negation to an existing quantity column.

Template Parameters:

T – type of the input quantity values

Parameters:
  • df – input dataframe

  • outputname – name of the new column

  • quantity – name of the existing column to be negated

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode Take(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const std::string &index_vector)

This function extracts values from the given quantity at the indices specified in a collection index. The order of the output values reflects the order of the indices. The function uses ROOT::VecOps::Take internally, leading to the following behavior:

auto values = ROOT::RVec<float>({0.1, 0.2, 0.3, 0.4});
auto index = ROOT::RVec<int>({2, 3, 1});
auto result = ROOT::VecOps::Take(values, index);
result
// (ROOT::VecOps::RVec<float>) {0.3, 0.4, 0.2}

The column index_vector must contain the indices for which values should be extracted, and the quantity column must contain the values of the quantity.

Note that T is the type of the values stored in the RVec containers in the quantity column, e.g., if the column has type RVec<float>, you must use T = float.

Note

If the index is out of range, a default value of type T is returned.

Template Parameters:

T – underlying type of the input column values

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the extracted value

  • quantity – name of the column from which the value is retrieved

  • index_vector – index list for values to be extracted

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode Get(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const int &index)

This function extracts a value from the given column at a specified index.

Note

If the index is out of range, a default value of type T is returned.

Template Parameters:

T – type of the input column values

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the extracted value

  • quantity – name of the column from which the value is retrieved

  • index – fixed index position used to extract the value

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode Get(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const std::string &index_vector, const int &position)

This function extracts a value from the given column based on an index stored in another column.

Note

If the index is out of range, a default value of type T is returned.

Template Parameters:

T – type of the input column values

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the extracted value

  • quantity – name of the column from which the value is retrieved

  • index_vector – name of the column containing index values

  • position – position within the index vector used to retrieve the index

Returns:

a dataframe with the new column
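The two-step lookup (position in the index vector, then index in the quantity) can be sketched as follows. This is an illustration under assumptions: `std::vector` stands in for the RVec columns, `get_via_index` is a hypothetical name, and the fallback value is passed explicitly because the concrete default is not stated here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reads index_vector[position] and uses it to access
// the quantity; returns default_value if either access is out of range.
template <typename T>
inline T get_via_index(const std::vector<T> &quantity,
                       const std::vector<int> &index_vector,
                       int position, const T &default_value) {
    if (position < 0 || static_cast<std::size_t>(position) >= index_vector.size())
        return default_value;  // no entry at this position
    const int index = index_vector[position];
    if (index < 0 || static_cast<std::size_t>(index) >= quantity.size())
        return default_value;  // index does not point into the quantity
    return quantity[index];
}
```

With quantity values {40, 30, 20} and an index vector {2, 0}, position 0 selects the value 20, while position 5 falls back to the default.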

template<typename T>
ROOT::RDF::RNode GetGenJetForJet(ROOT::RDF::RNode df, const std::string &outputname, const std::string &genjet_quantity, const std::string &jet_genjet_index, const std::string &index_vector, const int &position)

This function gets the gen. jet quantity for a given jet. This function finds the associated gen. jet to a reconstructed jet via indices that are present in nanoAODs.

If the generator-level jet cannot be accessed, the function returns a default value.

Example: Let the column "good_jet_indices" contain the indices of selected AK4 jets. For the Jet collection, the column "Jet_genJetIdx" contains the index of the matched generator-level jet in the GenJet collection. To define the generator-level pT of the leading reconstructed AK4 jet, one needs to call:

event::quantity::GetGenJetForJet<float>(
    df,
    "jet_gen_pt_1",
    "GenJet_pt",
    "Jet_genJetIdx",
    "good_jet_indices",
    0
);

Template Parameters:

T – type of the input gen. jet column values

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the gen. jet quantity value

  • genjet_quantity – name of the column containing the gen. jet quantity vector

  • jet_genjet_index – name of the column containing the association (via index) between the jet and the gen. jet collection

  • index_vector – name of the column containing the vector with the relevant jet indices

  • position – position in the index vector that specifies which jet in the jet vector should be used to get its associated gen. jet quantity

Returns:

a dataframe with the new column

template<typename T>
ROOT::RDF::RNode GetGenJetForObject(ROOT::RDF::RNode df, const std::string &outputname, const std::string &genjet_quantity, const std::string &jet_genjet_index, const std::string &object_jet_index, const std::string &object_index_vector, const int &position)

This function gets the gen. jet quantity for a given object. All objects are usually also reconstructed as jets. This function finds the corresponding jet and the associated gen. jet via indices that are present in nanoAODs.

If the generator-level jet cannot be accessed, the function returns a default value.

Template Parameters:

T – type of the input gen. jet column values

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the gen. jet quantity value

  • genjet_quantity – name of the column containing the gen. jet quantity vector

  • jet_genjet_index – name of the column containing the association (via index) between the jet and the gen. jet collection

  • object_jet_index – name of the column containing the association (via index) between the object and the jet collection

  • object_index_vector – name of the column containing the vector with the relevant object indices

  • position – position in the index vector that specifies which object in the object vector should be used to get its associated gen. jet quantity

Returns:

a dataframe with the new column
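The chain of index lookups described above (object, then jet, then gen. jet) can be sketched as below. The names `in_range` and `genjet_for_object` are hypothetical, `std::vector<int>` stands in for the index columns, and the explicit `default_value` argument is an assumption, since the concrete default is not stated here. A negative index is treated as a missing match.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bounds check that also rejects negative indices.
template <typename T>
inline bool in_range(int index, const std::vector<T> &values) {
    return index >= 0 && static_cast<std::size_t>(index) < values.size();
}

// Hypothetical sketch of the lookup chain:
// position -> object index -> jet index -> gen. jet index -> quantity.
template <typename T>
inline T genjet_for_object(const std::vector<T> &genjet_quantity,
                           const std::vector<int> &jet_genjet_index,
                           const std::vector<int> &object_jet_index,
                           const std::vector<int> &object_index_vector,
                           int position, const T &default_value) {
    if (!in_range(position, object_index_vector))
        return default_value;
    const int object = object_index_vector[position];
    if (!in_range(object, object_jet_index))
        return default_value;
    const int jet = object_jet_index[object];
    if (!in_range(jet, jet_genjet_index))
        return default_value;
    const int genjet = jet_genjet_index[jet];
    if (!in_range(genjet, genjet_quantity))
        return default_value;
    return genjet_quantity[genjet];
}
```

Any broken link in the chain produces the default value, which matches the documented behavior for an inaccessible generator-level jet.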

template<typename T>
ROOT::RDF::RNode GetJetForObject(ROOT::RDF::RNode df, const std::string &outputname, const std::string &jet_quantity, const std::string &object_jet_index, const std::string &object_index_vector, const int &position)

This function gets the jet quantity for a given object. All objects are usually also reconstructed as jets. This function finds the corresponding jet via indices that are present in nanoAODs.

If the reconstruction-level jet cannot be accessed, the function returns a default value.

Template Parameters:

T – type of the input jet column values

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the jet quantity value

  • jet_quantity – name of the column containing the jet quantity vector

  • object_jet_index – name of the column containing the association (via index) between the object and the jet collection

  • object_index_vector – name of the column containing the vector with the relevant object indices

  • position – position in the index vector that specifies which object in the object vector should be used to get its associated jet quantity

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode Sum(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const T zero = T(0))

This function computes the sum of the elements in the quantity column for each event. If the vector is empty, the value provided by zero is used as the sum for that event.

Template Parameters:

T – type of the input column values

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the summed values

  • quantity – name of the column containing the vector of values to be summed

  • zero – default value to use in ROOT::VecOps::Sum (default is T(0))

Returns:

a dataframe with the new column

template<typename T>
inline ROOT::RDF::RNode Sum(ROOT::RDF::RNode df, const std::string &outputname, const std::string &quantity, const std::string &index_vector, const T zero = T(0))

This function computes the sum of the elements in the quantity column that are selected by the indices in the index_vector column. The sum is computed per event, and the value provided by zero is used if no elements are selected.

Template Parameters:

T – type of the input column values

Parameters:
  • df – input dataframe

  • outputname – name of the new column containing the summed values

  • quantity – name of the column containing the vector of values to be summed

  • index_vector – name of the column containing the indices used to select values from quantity

  • zero – default value to use in ROOT::VecOps::Sum (default is T(0))

Returns:

a dataframe with the new column
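A minimal sketch of the index-based sum for a single event is given below. It uses `std::vector` in place of the RVec columns, and `sum_selected` is a hypothetical name. The bounds-checked `at()` access is a choice made for this sketch and not a statement about the real implementation.

```cpp
#include <vector>

// Hypothetical helper: sums the elements of quantity selected by
// index_vector, starting from zero, so an empty selection yields zero.
template <typename T>
inline T sum_selected(const std::vector<T> &quantity,
                      const std::vector<int> &index_vector, T zero = T(0)) {
    T sum = zero;
    for (const int index : index_vector)
        sum += quantity.at(index);  // at() throws on an invalid index
    return sum;
}
```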

template<typename ...Quantities>
inline ROOT::RDF::RNode ScalarSum(ROOT::RDF::RNode df, const std::string &outputname, Quantities... quantities)

This function calculates the scalar sum of an arbitrary set of quantities of type float.

Template Parameters:

Quantities – variadic template parameter pack representing the quantity columns

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the scalar sum

  • quantities – parameter pack of column names that contain the considered quantities

Returns:

a dataframe with a new column

template<typename T>
inline ROOT::RDF::RNode Unroll(ROOT::RDF::RNode df, const std::vector<std::string> &outputnames, const std::string &quantity, const size_t &index = 0)

This function recursively unrolls a vector (std::vector<T>) from the quantity column into individual columns in the dataframe. Each element of the vector is stored in a separate column with names provided in the outputnames vector. The function works recursively to define a new column for each element in the vector.

Note

The function is recursive and will create one column for each element of the vector in quantity. If outputnames has fewer entries than the number of elements in the vector, the function will stop at the end of outputnames. The index should not be set outside this function.

Warning

The length of the quantity vector has to be the same for each event.

Template Parameters:

T – type of the input column values

Parameters:
  • df – input dataframe

  • outputnames – a vector of names for the new columns where the individual elements of the vector will be stored

  • quantity – name of the column containing the vector of values to unroll

  • index – index of the current element to unroll (defaults to 0).

Returns:

a dataframe with the new columns containing each individual element of the vector from the quantity column

namespace reweighting

Functions

ROOT::RDF::RNode Pileup(ROOT::RDF::RNode df, correctionManager::CorrectionManager &correction_manager, const std::string &outputname, const std::string &true_pileup_number, const std::string &corr_file, const std::string &corr_name, const std::string &variation)

This function is used to correct Monte Carlo (MC) simulations for differences in the pileup distribution compared to the one measured in data. It retrieves a per-event weight from a correction file based on the true number of pileup interactions in an event.

The correction files are provided by the Luminosity POG and more information about the pileup reweighting can be found here: https://twiki.cern.ch/twiki/bin/view/CMS/PileupJSONFileforData

Parameters:
  • df – input dataframe

  • correction_manager – correction manager responsible for loading the pileup weights file

  • outputname – name of the output column containing the pileup event weight

  • true_pileup_number – name of the column containing the true mean of the Poisson distribution from which the number of interactions per bunch crossing was sampled for the event

  • corr_file – path to the file with the pileup weights

  • corr_name – name of the pileup correction in the file, e.g. “Collisions18_UltraLegacy_goldenJSON”

  • variation – name of the pileup weight variation, options are “nominal”, “up” and “down”

Returns:

a new dataframe containing the new column

ROOT::RDF::RNode PUWeightROOT(ROOT::RDF::RNode df, const std::string &outputname, const std::string &truePUMean, const std::string &datafilename, const std::string &mcfilename, const std::string &histname)

This function reads pileup weights from ROOT files.

Note

This function is intended only for cases where the pileup weights are not available in the correction files.

Parameters:
  • df – input dataframe

  • outputname – name of the derived weight

  • truePUMean – name of the column containing the true PU mean of simulated events

  • datafilename – path to the data ROOT file

  • mcfilename – path to the MC ROOT file

  • histname – name of the histogram stored in the ROOT file

Returns:

a new dataframe containing the new column

ROOT::RDF::RNode PartonShower(ROOT::RDF::RNode df, const std::string &outputname, const std::string &ps_weights, const float isr, const float fsr)

This function is used to evaluate the parton shower (PS) weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation} / w_{nominal}\). Because the nominal weight is already applied, the main use of this function is to obtain the initial state radiation (ISR) and final state radiation (FSR) variations relative to the nominal PS weight.

Depending on the selected ISR and FSR value, a specific index has to be identified. The mapping between the index and the ISR and FSR values is:

ISR    FSR    index
2.0    1.0    0
1.0    2.0    1
0.5    1.0    2
1.0    0.5    3

Note

For some simulated samples this mapping might be defined differently. If issues occur, it is advisable to check the documentation of the PSWeight branch in the nanoAOD files of the samples.

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the ISR/FSR event weight

  • ps_weights – name of the column containing the parton shower (ISR/FSR) weights

  • isr – value of the ISR variation, possible values are 0.5, 1.0, 2.0

  • fsr – value of the FSR variation, possible values are 0.5, 1.0, 2.0

Returns:

a new dataframe containing the new column
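The mapping between the ISR/FSR values and the index can be written as a small lookup. `ps_weight_index` is a hypothetical helper that only mirrors the mapping given above. How the documented function treats other combinations, such as the nominal (1.0, 1.0), is not specified here, so this sketch throws for them.

```cpp
#include <stdexcept>

// Hypothetical helper: returns the index in the PS weight vector for a given
// ISR/FSR variation, following the documented mapping.
// The values 0.5, 1.0 and 2.0 are exactly representable as float, so the
// exact comparisons are safe.
inline int ps_weight_index(float isr, float fsr) {
    if (isr == 2.0f && fsr == 1.0f) return 0;
    if (isr == 1.0f && fsr == 2.0f) return 1;
    if (isr == 0.5f && fsr == 1.0f) return 2;
    if (isr == 1.0f && fsr == 0.5f) return 3;
    // Assumption: any other combination is rejected in this sketch.
    throw std::invalid_argument("unsupported ISR/FSR combination");
}
```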

ROOT::RDF::RNode LHEscale(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lhe_scale_weights, const float mu_r, const float mu_f)

This function is used to evaluate the LHE scale weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation} / w_{nominal}\). Because the nominal weight is already applied, the main use of this function is to obtain the factorization and renormalization scale variations relative to the nominal scale weight.

Depending on the selected \(\mu_R\) and \(\mu_F\) value, a specific index has to be identified. The mapping between the index and the \(\mu_R\) and \(\mu_F\) values is:

mu_f   mu_r   index
0.5    0.5    0
1.0    0.5    1
2.0    0.5    2
0.5    1.0    3
1.0    1.0    4 (not always included)
2.0    1.0    5 (4)
0.5    2.0    6 (5)
1.0    2.0    7 (6)
2.0    2.0    8 (7)

Note

For some simulated samples this mapping might be defined differently. If issues occur, it is advisable to check the documentation of the LHEScaleWeight branch in the nanoAOD files of the samples.

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the LHE scale event weight

  • lhe_scale_weights – name of the column containing the LHE scale weights

  • mu_r – value of \(\mu_R\) variation, possible values are 0.5, 1.0, 2.0

  • mu_f – value of \(\mu_F\) variation, possible values are 0.5, 1.0, 2.0

Returns:

a new dataframe containing the new column
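The index mapping for the scale variations follows a regular pattern: \(\mu_F\) varies fastest, so the index equals three times the step of \(\mu_R\) plus the step of \(\mu_F\). The sketch below encodes this pattern. The helper names are hypothetical, and it is an assumption of this sketch that a weight vector without the nominal entry has exactly 8 elements, in which case the indices above 4 shift down by one.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: maps a scale factor to its step (0.5 -> 0, 1.0 -> 1, 2.0 -> 2).
inline int variation_step(float value) {
    if (value == 0.5f) return 0;
    if (value == 1.0f) return 1;
    if (value == 2.0f) return 2;
    throw std::invalid_argument("scale variation must be 0.5, 1.0 or 2.0");
}

// Hypothetical helper: index into the LHE scale weight vector.
// n_weights is the length of that vector (assumed to be 9, or 8 if the
// nominal entry is missing).
inline int lhe_scale_index(float mu_r, float mu_f, std::size_t n_weights) {
    const int index = 3 * variation_step(mu_r) + variation_step(mu_f);
    if (n_weights == 9)
        return index;
    if (n_weights == 8) {
        if (index == 4)
            throw std::invalid_argument("nominal weight is not stored");
        return index > 4 ? index - 1 : index;  // skip the missing nominal entry
    }
    throw std::invalid_argument("unexpected number of LHE scale weights");
}
```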

ROOT::RDF::RNode LHEpdf(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lhe_pdf_weights, const std::string &variation)

This function is used to evaluate the LHE PDF weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation} / w_{nominal}\). Because the nominal weight is already applied, the main use of this function is to obtain the variation of the PDF weights relative to the nominal PDF weight.

The PDF weights consist of 101 weights, where the first weight is the nominal weight and the remaining 100 weights correspond to alternative PDF sets.

Note

The proper procedure is to use each alternative PDF set as an independent systematic variation. This function instead uses a simplified approach to calculate a single PDF weight variation. The standard deviation of the 100 alternative PDF weights is calculated and used to define the up and down variations as follows: \(w_{up/down} = 1 \pm \sqrt{\sum_{i=1}^{100} (w_i - 1)^2}\)

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the LHE PDF event weight

  • lhe_pdf_weights – name of the column containing the LHE PDF weights

  • variation – name of the variation that should be evaluated, possible values are “nominal”, “up”, “down”

Returns:

a new dataframe containing the new column
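The formula for the up and down variations can be sketched for a single event as follows. `lhe_pdf_weight` is a hypothetical helper, `std::vector<float>` stands in for the weight column, and returning 1.0 for "nominal" as well as throwing on invalid input are assumptions of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: evaluates w_up/down = 1 +- sqrt(sum_i (w_i - 1)^2)
// over the 100 alternative PDF weights (entries 1 to 100 of the vector).
inline double lhe_pdf_weight(const std::vector<float> &lhe_pdf_weights,
                             const std::string &variation) {
    if (variation == "nominal")
        return 1.0;  // assumption: the nominal weight is already applied
    if (variation != "up" && variation != "down")
        throw std::invalid_argument("unknown variation: " + variation);
    if (lhe_pdf_weights.size() < 101)
        throw std::invalid_argument("expected at least 101 PDF weights");
    double sum_of_squares = 0.0;
    for (std::size_t i = 1; i <= 100; ++i) {
        const double deviation = static_cast<double>(lhe_pdf_weights[i]) - 1.0;
        sum_of_squares += deviation * deviation;
    }
    const double spread = std::sqrt(sum_of_squares);
    return variation == "up" ? 1.0 + spread : 1.0 - spread;
}
```

For example, if only two alternative weights differ from one, with values 1.03 and 0.96, the spread is \(\sqrt{0.03^2 + 0.04^2} = 0.05\), giving variations of about 1.05 and 0.95.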

ROOT::RDF::RNode LHEalphaS(ROOT::RDF::RNode df, const std::string &outputname, const std::string &lhe_pdf_weights, const std::string &variation)

This function is used to evaluate the LHE \(\alpha_S\) weight of an event. The weights are stored in the nanoAOD files and defined as \(w_{variation} / w_{nominal}\). Because the nominal weight is already applied, the main use of this function is to obtain the variation of the \(\alpha_S\) weight relative to the nominal weight.

For some samples the \(\alpha_S\) weight is included in the PDF weights vector. In that case the full PDF weights vector is expected to contain 103 entries, where the first 101 entries are PDF weights and the last two entries correspond to the up and down varied \(\alpha_S\) weight.

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the LHE \(\alpha_S\) event weight

  • lhe_pdf_weights – name of the column containing the LHE \(\alpha_S\) weights (it is part of the LHE PDF weights)

  • variation – name of the variation that should be evaluated, possible values are “nominal”, “up”, “down”

Returns:

a new dataframe containing the new column
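
The 103-entry layout described above can be sketched as follows. The helper `alpha_s_weight_variation` is hypothetical and not the code behind LHEalphaS. The text does not state which of the two trailing entries is the up variation, so the sketch takes both indices as arguments instead of fixing them. Falling back to a weight of 1 when the vector does not have 103 entries is also an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: picks the alpha_S variation out of the PDF weights
// vector. up_index and down_index are supplied by the caller because the
// ordering of the two trailing entries is not specified in this sketch.
double alpha_s_weight_variation(const std::vector<double> &weights,
                                const std::string &variation,
                                std::size_t up_index,
                                std::size_t down_index) {
    if (variation == "nominal") {
        return 1.0; // assumption: the nominal weight is already applied
    }
    if (variation != "up" && variation != "down") {
        throw std::invalid_argument("unknown variation: " + variation);
    }
    if (weights.size() != 103) {
        return 1.0; // assumption: no alpha_S entries stored for this sample
    }
    // at() throws std::out_of_range if an index is outside the vector.
    return weights.at(variation == "up" ? up_index : down_index);
}
```

A caller would pass the two indices that match the sample at hand, for example `alpha_s_weight_variation(weights, "up", 101, 102)` if entry 101 holds the up variation.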

ROOT::RDF::RNode TopPt(ROOT::RDF::RNode df, const std::string &outputname, const std::string &genparticles_pdg_id, const std::string &genparticles_status_flags, const std::string &genparticles_pt)

This function calculates an event weight that corrects the top quark \(p_T\) mismodeling in simulated \(t\bar{t}\) events. The correction is provided by the Top POG. The weight calculated by this function corrects NLO simulation (POWHEG+Pythia8) to data.

For reference: https://twiki.cern.ch/twiki/bin/viewauth/CMS/TopPtReweighting

The weight is calculated as \(w=\sqrt{SF(t)\cdot SF(\bar{t})}\)

with \(SF= \exp(0.0615-0.0005\cdot p_T)\)

Note

The Top POG also provides other reweighting functions, e.g. for NNLO to data or NLO to NNLO, which could be preferred depending on the use case.

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the derived event weight

  • genparticles_pdg_id – name of the column containing the PDG IDs of the generator particles

  • genparticles_status_flags – name of the column containing the status flags of the generator particles, where bit 13 contains the isLastCopy flag

  • genparticles_pt – name of the column containing the pt of the generator particles

Returns:

a new dataframe containing the new column
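
The weight formula above can be sketched on plain vectors of generator particle properties. The helper names are hypothetical and this is not the code behind TopPt. Two assumptions are made: bit 13 is counted from zero (mask `1 << 13`), and a missing top or antitop contributes a scale factor of 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scale factor for a single top quark: SF = exp(0.0615 - 0.0005 * pT).
double top_pt_sf(double pt) { return std::exp(0.0615 - 0.0005 * pt); }

// Assumption: bits are counted from zero, so isLastCopy is mask 1 << 13.
bool is_last_copy(int status_flags) { return ((status_flags >> 13) & 1) != 0; }

// Hypothetical helper: w = sqrt(SF(t) * SF(tbar)) using the last copy of
// the top (PDG ID 6) and antitop (PDG ID -6) in the event.
double top_pt_weight(const std::vector<int> &pdg_id,
                     const std::vector<int> &status_flags,
                     const std::vector<float> &pt) {
    assert(pdg_id.size() == status_flags.size() && pdg_id.size() == pt.size());
    double sf_top = 1.0;     // assumption: no top found means SF = 1
    double sf_antitop = 1.0; // assumption: no antitop found means SF = 1
    for (std::size_t i = 0; i < pdg_id.size(); ++i) {
        if (!is_last_copy(status_flags[i])) {
            continue; // skip intermediate copies of the same particle
        }
        if (pdg_id[i] == 6) {
            sf_top = top_pt_sf(pt[i]);
        } else if (pdg_id[i] == -6) {
            sf_antitop = top_pt_sf(pt[i]);
        }
    }
    return std::sqrt(sf_top * sf_antitop);
}
```

For a top with \(p_T = 100\) and an antitop with \(p_T = 200\), the weight is \(\sqrt{e^{0.0115} \cdot e^{-0.0385}} = e^{-0.0135}\).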

ROOT::RDF::RNode ZBosonPt(ROOT::RDF::RNode df, correctionManager::CorrectionManager &correction_manager, const std::string &outputname, const std::string &gen_boson, const std::string &corr_file, const std::string &corr_name, const std::string &order, const std::string &variation)

This function is used to calculate an event weight to correct the Z boson \(p_T\). These corrections are recommended especially for LO Drell-Yan samples, where the \(p_T\) and mass of the Z boson are mismodeled compared to data. This function is defined for the corrections provided by the CMS HLepRare group. More details can be found here: https://cms-higgs-leprare.docs.cern.ch/htt-common/DY_reweight/.

Note

HLepRare only provides corrections for Run3. For Run2 see event::reweighting::ZPtMass.

Parameters:
  • df – input dataframe

  • correction_manager – correction manager responsible for loading the correction file

  • outputname – name of the output column containing the derived event weight

  • gen_boson – name of the column containing the Lorentz vector of the generator-level boson

  • corr_file – path to the correction file containing the Z boson \(p_T\) corrections

  • corr_name – name of the correction in the json file

  • order – order of the used DY samples: “LO” for madgraph, “NLO” for amcatnlo, “NNLO” for powheg

  • variation – name of the variation that should be evaluated, options are “nom”, “up”, “down” or “upX”, “downX”. For “up” and “down” the uncertainty is defined by the envelope of all provided uncertainty sources in the correction file. Otherwise the specific uncertainty source “X” is used (where X is a number e.g. 1,2,3,…).

Returns:

a new dataframe containing the new column
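
The accepted forms of the variation argument ("nom", "up", "down", "upX", "downX") can be illustrated with a small parser. This is a sketch only: `Direction`, `ParsedVariation`, and `parse_variation` are hypothetical names, and the way ZBosonPt interprets the string internally may differ.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

enum class Direction { Nominal, Up, Down };

struct ParsedVariation {
    Direction direction;
    int source; // -1: no specific source (nominal, or envelope of all sources)
};

// Hypothetical parser for the variation string.
ParsedVariation parse_variation(const std::string &variation) {
    if (variation == "nom") {
        return {Direction::Nominal, -1};
    }
    Direction direction;
    std::string suffix;
    if (variation.compare(0, 2, "up") == 0) {
        direction = Direction::Up;
        suffix = variation.substr(2);
    } else if (variation.compare(0, 4, "down") == 0) {
        direction = Direction::Down;
        suffix = variation.substr(4);
    } else {
        throw std::invalid_argument("unknown variation: " + variation);
    }
    if (suffix.empty()) {
        return {direction, -1}; // plain "up"/"down": envelope of all sources
    }
    // The remainder must be the number X of a specific uncertainty source.
    for (unsigned char c : suffix) {
        if (!std::isdigit(c)) {
            throw std::invalid_argument("unknown variation: " + variation);
        }
    }
    return {direction, std::stoi(suffix)};
}
```

Under these assumptions, "down3" parses to the down direction with source 3, while "up" parses to the up direction with no specific source.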

ROOT::RDF::RNode ZPtMass(ROOT::RDF::RNode df, const std::string &outputname, const std::string &gen_boson, const std::string &workspace_file, const std::string &functor_name, const std::string &argset)

This function is used to calculate an event weight based on Z boson \(p_T\) and mass corrections. These corrections are recommended especially for LO Drell-Yan samples, where the \(p_T\) and mass of the Z boson are mismodeled compared to data.

Note

The function is intended for Run 2 analyses. In Run 3, Z \(p_T\) corrections are handled through correctionlib, see event::reweighting::ZBosonPt.

Warning

This function is based on workspaces and functions that were derived for the legacy \(H(\tau\tau)\) analysis and are therefore no longer up to date for UL or Run3.

Parameters:
  • df – input dataframe

  • outputname – name of the output column containing the derived event weight

  • gen_boson – name of the column containing the Lorentz vector of the generator-level boson

  • workspace_file – path to the file which contains the workspace that should be used

  • functor_name – name of the function in the workspace that should be used

  • argset – additional arguments that are needed for the function

Returns:

a new dataframe containing the new column