Commit f70af83

maadcoen, JulesVandenbroeck, juvanden, riga, and mafrahm authored
Ghent analysis/merge upstream (#112)
* Fix to jer application on jec variations (columnflow#665) * initial commit, fix to jer application to jec variations * change smearing factor to a callable function and calculate smearing factor for each jec variation. * update jets and mets definitions to ensure deep copy of original event array is taken * add jec-specfic columns to uses * Vectorized jer application over jec variations (#92) * Simplify jer init. * Overhaul vectorized jer processing. * Minor sources fix in jec. * move jec_variations, jer_variations, and postfixes to jer_init. Also include jec_ prefix to jec_variations as jec_variations is only used for registering uses and produces and storing jer variations in a dictionary. * change jer_random_normal variable name to random_normal --------- Co-authored-by: juvanden <jules.vandenbroeck@cern.ch> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Refactoring for 0.3 release (columnflow#628) * Fix task inehritance, adjust store parts. * Typo. * Revert stray changes. * Add store_part_anchor. * Re-purpose store part anchor for config part only. * Define config_store_anchor on ConfigTask for subclasses. * Fix inheritance order in datacard task. * TAF init refactoring draft. * Adapt template analysis. * Add comment. * Add review comments by @mafrahm. * Start. * Minor cleanup. * Port ConfigTask and ShiftTask. * Propagate ConfigTask changes to mixins and other tasks. * Update inference interface and tasks for multi config inputs. * Update hist hook handling. * Fix hist hook lookup. * Typo. * Update docstring. * Add union and intersection modes to default resolution. * Overhaul find_config_objects. * Update columnflow/tasks/framework/base.py Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> * Port config lookup to variables, review comment. * Typo. * Use dicts. * Update readme, fix typo in config task. * Update DatasetsProcessesMixin and ShiftSourcesMixin. * Improve loop over configs in shift validation. 
* Merge MultiConfigPlotting into refactor/taf_init (columnflow#630) * implement MultiConfigTask * disable TaskArrayFunction init when there is not config inst * fix MLEvaluation reqs * update template and CSPM parameter description * fix tests * add warning when cspm defaults are set in config_inst * tmp * fixes of PlotShiftMixin task * fixes in PlotShiftMixin * modify PlotVariables1d run method for multi config * reintroduce MLModelsMixin to plotting * add analysis_inst instead of config_inst in preparation_producer * resolve processes and variables per config * add PlotVariablesPerConfig wrapper tasks * split between DatasetsProcessesMixin and MultiConfigDatasetsProcessesMixin + cleanup * make ShiftSourcesMixin work with multiple configs * add PlotShiftedVariablesPerConfig1D mixin * fix bug when dataset is missing in first config * fix mixins from CreateDatacards * add default config to MultiConfigTask * move defaults to analysis_inst in AnalysisTask * move process and variable settings back to config inst * fixes to the two previous commits... 
* move multi config resolving function to AnalysisTask * review comments * set weight_producer in analysis inst in template * fix lint in template * decouple ShiftTask and ConfigTask and remove PlotShiftMixin * remove checking shifts for all reqs * move default categories, variables, and inference model to analysis inst * cleanup in VariablesMixin * remove config_inst from get_known_shifts signature * resolve shifts per config * store shift and category names instead of ids in histograms * fix hist tests * fix shifted variable plot func * handle missing shift bins in plot_shifted_variables * fix missing shift bins in plotting task * fill category as int and transform to str later * add growth to translated axis * fix and extend hist_util tests * loop over variables when switching to strcat * same for cutflow (+lint) * cleanup * fix resolving of ml model insts * fix process order in plotting * fix HistogramsUser task class inheritance * allow resolving selector step defaults * move selector_steps default to analysis_inst * fix bug in obtaining unique category ids * fir MRO in CreateHistograms * Update columnflow/tasks/plotting.py Co-authored-by: Philip Keicher <26219567+pkausw@users.noreply.github.com> * feature/MultiConfigPlotting * cleanup and reintroduce ML mixins from MultiConfigPlotting * fix murmuf_envelope Producer * remove config_inst from get_known_shifts signature * fix WeightProducerClassMixin inheritance and add MultiConfigDatasetsProcessesShiftSourcesMixin * decouple ShiftTask from ConfigTask * bugfixes and linting * fix PlotShiftedVariables * centralize definition of CSPW representations * fixes in ML tasks * fix inconsistencies after merging * add tests for default and group resolving * remove single_config tag from VariablesMixin * move CSPM groups to analysis inst in template * fix category/variable resolving and add resolving tests * streamline tests * cleanup and fix param resolving * add tests for process resolving * extend resolving tests and 
fix dataset/process resolving * remove duplicate lines * include shift inst tests --------- Co-authored-by: localusers user <juvanden@m0.iihe.ac.be> Co-authored-by: Philip Keicher <26219567+pkausw@users.noreply.github.com> * Refactor/taf init simplified shift validation (columnflow#641) * revert changes to ShiftSourcesMixin * simplify shift resolving as much as possible * streamline resolve_shifts function * Refactor/taf init (reorganized resolution + fix ml pipeline) (columnflow#643) * first draft for reordered TAF initialization and param resolution * remove shift bins that were not requested from branch map * fix single config tasks (yields) and cleanup * reintroduce ML training pipeline * switch to DatasetsProcessesMixin * recreate dependencies in run_post_init * Cleanup * perform shift resolution only if not yet done * fix single config datasets/processes resolving * move DatasetsProcessesMixin to PlotProcessBase and fallback to nominal shift in reqs * move logger messages into debug mode * revert 5c515bb * fallback branch to -1 if not existent * move default CSPs back to cofig inst * fix param resolving in wrapper_factory * update resolution class and function names * fix bug (pass shift name instead of inst in reqs) * Apply suggestions from code review Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * add resolve_instances for InferenceModelMixin * minor refactoring --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Improve known_shifts caching between workflow and branches. * Fixes edge cases. * Fix default resolution. * Refactor/taf init fixes (columnflow#645) * add missing MLEvaluation reqs * add producer_inst to ProduceColumns.reqs in ML pipeline * load ML columns in histograms and union tasks * locate shift name instead of id in histograms * Typo. * Adjust inference model tests. (columnflow#646) * Fix TAF post init order (columnflow#647) * Correct taf post init order. * Fix selector steps default. * Fix typo. 
* Add reducer interface. (columnflow#648) * Add reducer interface. * Additional reducer fallback to cf_default. * Add hist prodcer interface. (columnflow#650) * Cleanup top pt weight producer. (columnflow#625) * Cleanup top pt weight producer. * Add TopPtWeightConfig. * Update columnflow/production/cms/top_pt_weight.py Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> --------- Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> * Documentation update for refactoring (columnflow#652) * Start docs update. * Update README. * Add TAF docs. * Finish TAF docs, start transition. * Finish tafs in transition guide. * Finish changed task names docs. * Add multi-config update instructions. * Finish transition guide for reducers. * Finish inference model transition docs. * Finish transition docs. * Lint. * Systematic shift plotting (columnflow#649) * Update shift plots. * Fix id/name handling. * Address review comments by @mafrahm. * Update variable names, add comments. * Update sandboxes. * Update law. * Code harmonization * Apply review comment. * Use process names in hist axes. (columnflow#657) * Use process names in hist axes. * Apply axes conversion to remaining spots. * Add configurable string representations. * Add missing docstring. * Optimize hist filling, code alignment. * Feature: Add mechanism to transform hist into version with equally spaced bins (columnflow#627) * added mechanism to transform hist into version with equally spaced bins, also added keyword to rotate xticks label * linter * added forgotten keyword argument in the preration of the config * correct typo, add new arguments to kwargs and change default x_ticks * Refactor rebinning function. * Simplify axis settings. * Feedback process and variable updates to style config. * Move x axis transformations to 'apply_variable_settings'. 
--------- Co-authored-by: Nathan Prouvost <nathan.prouvost@gmail.com> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Marcel R. <github.riga@icloud.com> * Add and use only_local_env decorator. * Make lumi in normalization weight producer configurable. * Minor fixes and consistency. * Fix config lookup for taf classes mixins. (columnflow#669) * CMS jet id producer (columnflow#661) * Add cms-related jet id producer. * Fix bit check. * Allow subpaths in external files. (columnflow#663) * Allow subpaths in external files. * Minor de-nesting. * Maintain subpaths type. * Eager taf teardown when call function fails. (columnflow#662) * Eager taf teardown when call function fails. * Gracefully trigger teardown via decorator. * minor fixes and streamlining (columnflow#671) * Unambiguous hashing. * fix plotting with single varied shift (columnflow#672) * remove flag from MergeHistograms * fix plotting single varied shift in PlotVariables1D * Update columnflow/tasks/histograms.py --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Update law. 
* adding dy weights producer (columnflow#622) * adding dy weights producer * redifining masks and adding dy_weights_init * adding dy_order input * adding order to DrellYanConfig * adding order to DrellYanConfig * adding check for dy order in cfg * add missing self.dy_unc_corrector * update dy weight producer * linting dy recoil producer * remove duplicate entry in dy recoil weights * fix logic in DY recoil vis dilepton selection * format with black * passed flake8 * linting * update recoil corrections by removing helper functions * fix linter * fix bug with import InsertableDict * Apply suggestions from code review Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * add suggestions from review to DY producer --------- Co-authored-by: Paul Philipp Gadow <paul.philipp.gadow@cern.ch> Co-authored-by: philippgadow <philipp.gadow@mytum.de> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * fix PlotCutflow task and requirements * Update columnflow/tasks/selection.py Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Shift-conform dy outputs. * fix ml_model repr * Rename recoil_corrections to recoil_corrected_met. * Apply new recommendation for egamma calibration (columnflow#674) * added more kwargs for config, that are necessary to handle run2 and run3 recommendation at the same time. * added more variables for variable maps, switched application of smearing to a standardized version, that results in the same result but is more robust. 
* removed comments * removed version check * change rand_func to separate normal_up, down variant * rewrap docstring and point to EGammaPog recommendation and example file * switched to concrete arguments in config and feedforward this change * add example into docstring about how to use the calibrator in combination with the config * Apply suggestions from code review --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * append scale label when not passing placeholder * implement own errorbar calculation (columnflow#675) * implement own errorbar calculation * make poisson error calculation independent of histogram shape * Apply suggestions from code review Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * change function name * Apply comments from review Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> * Fix flow handling for fake data in datacards. * Allow skipping histogram checks. * Fix used columns of btag weight producer. * minor plotting fixes * Fix norm_weight_producer_inst in MergeSelectionMasks. * Improve transition guide. * parton shower weights (columnflow#676) * init commit. To see commit history check scalefactor-development branch in GhentAnalysis fork * remove the cmsGhent folder and add parton_shower.py to production/cms * Update columnflow/production/cms/parton_shower.py Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Update columnflow/production/cms/parton_shower.py Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * add parton_shower to columnflow-cms specific production modules --------- Co-authored-by: juvanden <jules.vandenbroeck@cern.ch> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Minor alignment. * Minor cleanup of electron code. * Fix typos in egamma calibrators. 
* Enable jet_id producer for data. * Hotfix ps weights when variations are missing. * Add cf_remove_tmp tool. * fix typo. * Fix shift selection for plotting. * Fixes for docs (pdf figures not displayed) (columnflow#679) * docs: evince-previewer -> evince evince-previewer is the print preview of evince * added filter to upload svg files to lfs * docs: converted all pdf plots to (additional) svg using `for f in *.pdf; do pdf2svg $f ${f%.pdf}.svg; done` uploaded to lfs * docs: using wildcard extensions for plot file names such that the html generation uses svg, while others (e.g. latex) may still use pdf Before, the image display in the browser was broken and only a link to the pdf file was shown (supposedly the alt text). * Rename histograming -> histogramming. (columnflow#680) * Add missing local_env check to BundleExternalFiles task. * allow diverging producers in MLEvaluation (columnflow#681) * hotfix: update producer_insts based on evaluation_producers * hotfix: update hists with remove_residual_axis function * allow passing mask to apply JER smearing only to a subset of jets * update faulty import in cms_minimal template * hotfix: allow running ml pipeline without preparation_producer * fix padding when ak.max returns None * update jer_horn_handling calibrator * cast undefined_category_ids to str before raising the error to avoid TypeError * Improve tmp file removal. * Update law. * Add preparation producer post init. * Avoid full config copy in plotting. * More verbose leaf category check exceptions. * set scale_factor to 1 instead of 0 (columnflow#685) * allow skipping preparation_producer in MLEvaluation (columnflow#686) Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Save lepton pair pdg id in gen_dilepton producer. * Add structure for category groups. * Add warning. * Typo. * Add warn flag to CategoryGroup. 
--------- Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> Co-authored-by: localusers user <juvanden@m0.iihe.ac.be> Co-authored-by: Philip Keicher <26219567+pkausw@users.noreply.github.com> Co-authored-by: Bogdan-Wiederspan <79155113+Bogdan-Wiederspan@users.noreply.github.com> Co-authored-by: Nathan Prouvost <nathan.prouvost@gmail.com> Co-authored-by: Ana Andrade <99343616+aalvesan@users.noreply.github.com> Co-authored-by: Paul Philipp Gadow <paul.philipp.gadow@cern.ch> Co-authored-by: philippgadow <philipp.gadow@mytum.de> Co-authored-by: Mathis Frahm <mathisfrahm@gmx.de> Co-authored-by: Nathan Prouvost <49162277+nprouvost@users.noreply.github.com> Co-authored-by: JulesVandenbroeck <93740577+JulesVandenbroeck@users.noreply.github.com> Co-authored-by: juvanden <jules.vandenbroeck@cern.ch> Co-authored-by: Johannes Lange <jolange@users.noreply.github.com> Co-authored-by: Philip Daniel Keicher <philip.daniel.keicher@cern.ch> * Add tmp dir checks, add cf_setup_post_install hook. * don't consider empty axis as "missing" * correct json file extension * Hotfix category flattening. * Improve tmp file check. * Update law. * Make jet collection used in DY weights more flexible * Hotfix missing xsecs for stitched weight producer. * Add dfs lookup pattern negation. * Allow skipping parts of post setup. * allow evaluating multiple working points with single electron_weights Producer (columnflow#694) * allow evaluating multiple working points with single electron_weights Producer * Move import. --------- Co-authored-by: Marcel R. <github.riga@icloud.com> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Fix task key lookup. (columnflow#697) * Fix scope issue in seed producer. * typo. 
* add run to inputs and remove double underscore * improve readability of make_jme_keys * implement data_per_era tag to jec config * docs: add LuSchaller as a contributor for code (columnflow#701) * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] --------- Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * added sourcing of cms setup to ensure scram is available (columnflow#699) Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Fix typo in cf_inspect. * Hotfix validation check in stitched normalization weight production. * Improve categorizer calls. (columnflow#702) * Add ml model task pinning. (columnflow#703) * add exception when era aux is missing * apply blinding threshold before process scaling * merge workflow reqs of different variables in CreateDatacards (columnflow#689) * Revert process id check in normalization producer. * [cms] Add note on TEC-to-MET propagation. * Control row group merging in MergeReducedEvents. * Update law. * Improve reduced events merging. * Updata law. * Store dataset_info_inst in params, cap files for reduction stats. * Make merging chunk size configurable. (columnflow#707) * Bump version in __version__ file. * add confirm message to cf_remove_tmp * Use return code in tmp removal. * hotfix: constistent branches reqs between MergeReducedEvents and MergeSelectionStats * lint * fix handling of non_zero_mask in murf_envelope (columnflow#704) Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Fix reduction chunk size control. * Fix/asymmetric syst unc (columnflow#710) * consistent shift diffs * consistent cms label configs --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Update law. * check and remove overlaping processes (columnflow#712) * sort configs by ids for the multi config representation * Optionally bypass branch-level plot requirements. 
(columnflow#716) * inference model caching (columnflow#714) * inference model caching * Generalize caching of derivables. --------- Co-authored-by: Marcel R. <github.riga@icloud.com> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * pad with nominal if shift source missing in config (columnflow#715) * pad with nominal if shift source missing in config * Update columnflow/tasks/cms/inference.py Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Update columnflow/tasks/cms/inference.py --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Add option for hook to update dataset_selection_stats in norm weight prod. * fix missing datasets in MultiConfig * Update law. * Add local directory check to cf_remove_tmp. * Hotfix typo. * update met_phi Calibrator to new format (columnflow#719) * update met_phi Calibrator to new format * use npvsGood * add npvsGood to uses as well... * Minor adjustments, apply mask to all inputs. --------- Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> Co-authored-by: Marcel R. <github.riga@icloud.com> * Update law. * Generalize normalization weight producer. (columnflow#718) * Generalize normalization weight producer. * Add pull warning. * Add per-dataset weight norm. * Update. * Optionally log brs. * Improve combinatoric treatment, fix single br calculation. * Helper to fill weight table. * Minor adjustments before review. * Hotfix inclusive dataset lookup in norm producer. * Update boost-histogram version. * Hotfix norm weight logging. * Revert boost-histogram update. * Hotfix combined jets calibrator. * Hotfix inclusive dataset attribute in norm weight producer. * make plotting faster using non-interactive backend * [cms] Make datacard writer class configurable in task. * Hotfix typo in norm weight producer. * Adapt norm weight producer to more generic cases. * Hotfix cf_inspect for root files. * Optimize UniteColumns compression for root files. 
* Improve treepath detection in cf_insepct. * Hotfix fowarding of known_shifts for instance caching. * Minor consistency change. * Fix brace expansion in ProduceColumnsWrapper. * adapting dy weight producer for custom weights (columnflow#724) * Consistent handling of kwargs in teardown functions. * Hotfix TAF instance method defaults. * Refactor and fix met phi calibration. * add plot function for efficeincy plots (columnflow#723) Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Hotfix parameter group cleaning in inference model. * Hotfix: allow brace patterns in TAF shifts. * Remove year from intrinsic btag weight names. (columnflow#726) * Forward remote claw sandbox. * Add pilot option to MergeShiftedHistograms. * Forward known values to hist hooks. * Hook column union, update law. * Hotfix default version injection into tasks with same family. * add ParameterTransformation for ratifying + envelope if one-sided * allow removing negative contributions per process (columnflow#730) * allow removing negative contributions per process * Update columnflow/plotting/plot_util.py Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> --------- Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> * Apply suggestion from @riga * Apply suggestion from @riga * Sequentialize and optimize datacard writing. (columnflow#728) * Sequentialize and optimize datacard writing. * Adjust tests. * Update law. * Minor nuisance group fix. * Loop over variables, eagerly clear memory. * Refactor datacard parameter transformations. * Update tests. * Add flip_(smaller|larger)_if_one_sided transormations. * Typos. * allow multiple processes per dataset in datacard writer (columnflow#733) * allow multiple processes per dataset in datacard writer * cleanup * add arguments to modify_process_hist hook * Disentangle process-to-dataset mapping. 
* Update inference model instance cache key. * hotfix flip_if_one_sided * Drop effect_from_shape_if_small in favor of effect_from_shape_if_flat. * Typo. --------- Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Marcel R. <github.riga@icloud.com> * Hotfix process object selection for multi-config datacards. * Hotfix variable shape? type in combine datacard writer. * Hotfix abs eta in cms muon weight producer. * Make default_remote_claw_sandbox configurable via law.cfg. * Hotfix version lookup. * Raise explicit error in reduction on option type masks. * Add req helpers to mixins. (columnflow#738) * Accelerate loading time and imports (columnflow#737) * Accelerate imports, allowing only 3rd party np and ak. * Fix missing sub imports. * Add missing plt import. * Adjust imports in tests. * Add missing coffea sub import. * Add parent_mode flag to create_category_combinations, add change tracking. (columnflow#736) * Add skip_parents flag to create_category_combinations. * Refactor category combinations, add category tracker. * Update columnflow/config_util.py Co-authored-by: Ana Andrade <99343616+aalvesan@users.noreply.github.com> * Update columnflow/config_util.py Co-authored-by: Ana Andrade <99343616+aalvesan@users.noreply.github.com> * Typo. --------- Co-authored-by: Ana Andrade <99343616+aalvesan@users.noreply.github.com> * Update sandboxes, avoid dask_awkward in IO (columnflow#735) * Update sandboxes, add ChunkedParquetReader. * Update docstring. * Remove dask-awkward from columnar requirements. * Adjustable sandbox in cf_inspect. * Avoid using np.bool. * Cleanup venv setup. 
* Update columnflow/columnar_util.py Co-authored-by: Nathan Prouvost <49162277+nprouvost@users.noreply.github.com> * Update columnflow/columnar_util.py Co-authored-by: Nathan Prouvost <49162277+nprouvost@users.noreply.github.com> --------- Co-authored-by: Nathan Prouvost <49162277+nprouvost@users.noreply.github.com> * Hotfix ChunkedParquetReader. * Hotfix bad import in plot utils. * fix deprecated use of masked_sorted_indices * fix missing make_plot_2d * fix missing make_plot_2d * refactor import of matplotlib, hist, coffea, correctionlib * linting fixes * addition of met_uncertainty_sources parameter for jer calibrator that propagates the jet energy resolution smearing to the met uncertainty variations given to jer. * cleaning of jets.py * add jer return * fix linting * Upstream changing to upstream merge request (#113) * Extend dy weight application to use btag multiplicity. (columnflow#739) * Extend dy weight application to use btag multiplicity. * Update docstring. * Hotfix nbtags variable in dy weight producer. * fix skipping data in CreateDatacards * Add objects for interacting with CMS CAT meta data. (columnflow#740) * Add objects for interacting with CAT meta data. * Remove namespace for now. * Cleanup. * Update fixed law. * Use cf.cms task namespace. * Add CMSDatasetInfo. * Allow pathlib input. * Add dc pog to CATSnapshot. * More flexible POG overrides. * Typo. * Simplify. * Hotfix CAT metadata update check for missing POG dirs. * add subplots_cfg in plot_all (columnflow#742) Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> * Update law. * Refactor generator-level top and top decay product lookup (columnflow#741) * Refactor gen top lookup. * Add theory-based top pt weight method. * Comments. * Comments. * Rename field wDecay -> wChildren. * Update kept fields in gen_particles.py Removed 'status' and 'statusFlags' from kept generator particle fields. * Fix gen part field transformations. * Add suggestion by @jolange * Add gen_higgs_lookup. 
* Hotfix saving of columns in gen_particle lookups. * Hotfix depth limit of gen particles. * Add gen_dy_lookup. * Hotfix multi-config lookup via patterns. * Hotfix reduction to skip empty chunks. * Hotfix higgs gen lookup, considering effective gluon/photon decays. * Hotfix single shift selection in plotting. * Allow patterns in get_shifts_from_sources. * Hotfix save_div in plot scale factor. * [cms] Update log in CheckCATUpdates task. * Skip string columns in finiteness checks, fixes columnflow#743. * Hotfix repo bunlding, add missing user config. * [cms] Refactor egamma calibrators. (columnflow#745) * docs: add Bogdan-Wiederspan as a contributor for review (columnflow#746) * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] --------- Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * docs: add aalvesan as a contributor for review (columnflow#747) * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] --------- Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * Add t->w->tau children in gen_top_lookup. * Hotfix typo in gen_top lookup. * Add and use sum_hists helper. * Extend tes versions. --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Marcel R. <github.riga@icloud.com> Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * Get upstream changes (#114) * Extend dy weight application to use btag multiplicity. (columnflow#739) * Extend dy weight application to use btag multiplicity. * Update docstring. * Hotfix nbtags variable in dy weight producer. * fix skipping data in CreateDatacards * Add objects for interacting with CMS CAT meta data. (columnflow#740) * Add objects for interacting with CAT meta data. 
* Remove namespace for now. * Cleanup. * Update fixed law. * Use cf.cms task namespace. * Add CMSDatasetInfo. * Allow pathlib input. * Add dc pog to CATSnapshot. * More flexible POG overrides. * Typo. * Simplify. * Hotfix CAT metadata update check for missing POG dirs. * add subplots_cfg in plot_all (columnflow#742) Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> * Update law. * Refactor generator-level top and top decay product lookup (columnflow#741) * Refactor gen top lookup. * Add theory-based top pt weight method. * Comments. * Comments. * Rename field wDecay -> wChildren. * Update kept fields in gen_particles.py Removed 'status' and 'statusFlags' from kept generator particle fields. * Fix gen part field transformations. * Add suggestion by @jolange * Add gen_higgs_lookup. * Hotfix saving of columns in gen_particle lookups. * Hotfix depth limit of gen particles. * Add gen_dy_lookup. * Hotfix multi-config lookup via patterns. * Hotfix reduction to skip empty chunks. * Hotfix higgs gen lookup, considering effective gluon/photon decays. * Hotfix single shift selection in plotting. * Allow patterns in get_shifts_from_sources. * Hotfix save_div in plot scale factor. * [cms] Update log in CheckCATUpdates task. * Skip string columns in finiteness checks, fixes columnflow#743. * Hotfix repo bunlding, add missing user config. * [cms] Refactor egamma calibrators. (columnflow#745) * docs: add Bogdan-Wiederspan as a contributor for review (columnflow#746) * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] --------- Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * docs: add aalvesan as a contributor for review (columnflow#747) * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] --------- Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> * Add t->w->tau children in gen_top_lookup. * Hotfix typo in gen_top lookup. 
* Add and use sum_hists helper. * Extend tes versions. * [cms] Hotfix tau energy calibration, skip e-fake mask. * [cms] Hotfix egamma calibrator, use same random numbers for all smearing variations. * Add option to skip auto categories in track_category_changes. * Add n_chunks entry to ChunkPosition. * mutliple fixes regarding empty files or (almost) empty chunks (columnflow#750) * mutliple fixes regarding empty files or (almost) empty chunks * move chunk skip out of variable loop * add AbsScEta to variable_map for backwards compatibility * use last instead of first chunk for empty outputs * Fix broadcasting with empty egamma collection. --------- Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> Co-authored-by: Marcel R. <github.riga@icloud.com> * Add simple column selection to UniteColumns. * Remove unneeded columns in cms tec calibrator. * Add variabble_repr to control paths. (columnflow#751) * Hotfix tec, add back charge. * Log broken parquet file paths. * Cleanup of e/mu id, update law. * Fix cf_inspect script after coffea update. (columnflow#753) * Hotfix electron weight producer with nested working points. * Hotfix attributes added by taf decorators. * Rename max-runtime -> {htcondor,slurm}-runtime. (columnflow#755) * Simplify requiring producers. (columnflow#756) * Simplify requiring producers. * Add same mechanism for calibrators. * Revert pilot decisions. * Add muon_sr calibrator. (columnflow#754) * Hotfix version resolution from config. * Hotfix required producers/calibrators for workflows. * Persistent local files of BundleExternalFiles. (columnflow#752) * Presistent local files of BundleExternalFiles. * Fix files_dir property. * Better caching. * Preserve types. * Ensure clean dir. * Allow unpacking in remote envs. * Pass-through workflow requirements in CreateHistograms. 
* Feature/histogram user multiconfig (columnflow#709) * make HistogramsUserBase compatible with multi-config * backwards compatibility to single-config * improve flexibility & runtime of helper functions * make shifts a set * add inputs as argument to load_histograms --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> * update hist axis labels during histogram merging (columnflow#705) * update labels during histogram merging * move update_ax_labels to hist_util.py * Linting --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> * Fix variance of fake data in datacard writer, better logs. * Update law. * Fix mamba setup. --------- Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Marcel R. <github.riga@icloud.com> Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de> Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> Co-authored-by: jomatthi <82223346+jomatthi@users.noreply.github.com> Co-authored-by: juvanden <jules.vandenbroeck@cern.ch> * updated law module * missing import of hist * fix pt overflow slowing down b-tagging scale factors --------- Co-authored-by: JulesVandenbroeck <93740577+JulesVandenbroeck@users.noreply.github.com> Co-authored-by: juvanden <jules.vandenbroeck@cern.ch> Co-authored-by: Marcel Rieger <riga@users.noreply.github.com> Co-authored-by: Mathis Frahm <49306645+mafrahm@users.noreply.github.com> Co-authored-by: localusers user <juvanden@m0.iihe.ac.be> Co-authored-by: Philip Keicher <26219567+pkausw@users.noreply.github.com> Co-authored-by: Bogdan-Wiederspan <79155113+Bogdan-Wiederspan@users.noreply.github.com> Co-authored-by: Nathan Prouvost <nathan.prouvost@gmail.com> Co-authored-by: Ana Andrade <99343616+aalvesan@users.noreply.github.com> 
Co-authored-by: Paul Philipp Gadow <paul.philipp.gadow@cern.ch>
Co-authored-by: philippgadow <philipp.gadow@mytum.de>
Co-authored-by: Mathis Frahm <mathisfrahm@gmx.de>
Co-authored-by: Nathan Prouvost <49162277+nprouvost@users.noreply.github.com>
Co-authored-by: Johannes Lange <jolange@users.noreply.github.com>
Co-authored-by: Philip Daniel Keicher <philip.daniel.keicher@cern.ch>
Co-authored-by: Marcel R. <github.riga@icloud.com>
Co-authored-by: Lara <lara.markus@gmx.de>
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Lukas Schaller <30951523+LuSchaller@users.noreply.github.com>
Co-authored-by: Anas Haddad <hhhaaanas@gmail.com>
Co-authored-by: Mathis Frahm <mathis.frahm@uni-hamburg.de>
Co-authored-by: jomatthi <82223346+jomatthi@users.noreply.github.com>
1 parent 055cb34 commit f70af83

102 files changed

Lines changed: 5406 additions & 2713 deletions

.all-contributorsrc

Lines changed: 15 additions & 3 deletions
@@ -71,7 +71,8 @@
       "profile": "https://github.com/Bogdan-Wiederspan",
       "contributions": [
         "code",
-        "test"
+        "test",
+        "review"
       ]
     },
     {
@@ -153,16 +154,27 @@
       "avatar_url": "https://avatars.githubusercontent.com/u/99343616?v=4",
       "profile": "https://github.com/aalvesan",
       "contributions": [
-        "code"
+        "code",
+        "review"
       ]
-    }, {
+    },
+    {
       "login": "philippgadow",
       "name": "philippgadow",
       "avatar_url": "https://avatars.githubusercontent.com/u/6804366?v=4",
       "profile": "https://github.com/philippgadow",
       "contributions": [
         "code"
       ]
+    },
+    {
+      "login": "LuSchaller",
+      "name": "Lukas Schaller",
+      "avatar_url": "https://avatars.githubusercontent.com/u/30951523?v=4",
+      "profile": "https://github.com/LuSchaller",
+      "contributions": [
+        "code"
+      ]
     }
   ],
   "commitType": "docs"

README.md

Lines changed: 3 additions & 2 deletions
@@ -138,7 +138,7 @@ For a better overview of the tasks that are triggered by the commands below, che
       <td align="center" valign="top" width="14.28%"><a href="https://github.com/dsavoiu"><img src="https://avatars.githubusercontent.com/u/17005255?v=4?s=100" width="100px;" alt="Daniel Savoiu"/><br /><sub><b>Daniel Savoiu</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=dsavoiu" title="Code">💻</a> <a href="https://github.com/columnflow/columnflow/pulls?q=is%3Apr+reviewed-by%3Adsavoiu" title="Reviewed Pull Requests">👀</a></td>
       <td align="center" valign="top" width="14.28%"><a href="https://github.com/pkausw"><img src="https://avatars.githubusercontent.com/u/26219567?v=4?s=100" width="100px;" alt="pkausw"/><br /><sub><b>pkausw</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=pkausw" title="Code">💻</a> <a href="https://github.com/columnflow/columnflow/pulls?q=is%3Apr+reviewed-by%3Apkausw" title="Reviewed Pull Requests">👀</a></td>
       <td align="center" valign="top" width="14.28%"><a href="https://github.com/nprouvost"><img src="https://avatars.githubusercontent.com/u/49162277?v=4?s=100" width="100px;" alt="nprouvost"/><br /><sub><b>nprouvost</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=nprouvost" title="Code">💻</a> <a href="https://github.com/columnflow/columnflow/commits?author=nprouvost" title="Tests">⚠️</a></td>
-      <td align="center" valign="top" width="14.28%"><a href="https://github.com/Bogdan-Wiederspan"><img src="https://avatars.githubusercontent.com/u/79155113?v=4?s=100" width="100px;" alt="Bogdan-Wiederspan"/><br /><sub><b>Bogdan-Wiederspan</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=Bogdan-Wiederspan" title="Code">💻</a> <a href="https://github.com/columnflow/columnflow/commits?author=Bogdan-Wiederspan" title="Tests">⚠️</a></td>
+      <td align="center" valign="top" width="14.28%"><a href="https://github.com/Bogdan-Wiederspan"><img src="https://avatars.githubusercontent.com/u/79155113?v=4?s=100" width="100px;" alt="Bogdan-Wiederspan"/><br /><sub><b>Bogdan-Wiederspan</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=Bogdan-Wiederspan" title="Code">💻</a> <a href="https://github.com/columnflow/columnflow/commits?author=Bogdan-Wiederspan" title="Tests">⚠️</a> <a href="https://github.com/columnflow/columnflow/pulls?q=is%3Apr+reviewed-by%3ABogdan-Wiederspan" title="Reviewed Pull Requests">👀</a></td>
       <td align="center" valign="top" width="14.28%"><a href="https://github.com/kramerto"><img src="https://avatars.githubusercontent.com/u/18616159?v=4?s=100" width="100px;" alt="Tobias Kramer"/><br /><sub><b>Tobias Kramer</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=kramerto" title="Code">💻</a> <a href="https://github.com/columnflow/columnflow/pulls?q=is%3Apr+reviewed-by%3Akramerto" title="Reviewed Pull Requests">👀</a></td>
     </tr>
     <tr>
@@ -151,8 +151,9 @@ For a better overview of the tasks that are triggered by the commands below, che
       <td align="center" valign="top" width="14.28%"><a href="https://github.com/JulesVandenbroeck"><img src="https://avatars.githubusercontent.com/u/93740577?v=4?s=100" width="100px;" alt="JulesVandenbroeck"/><br /><sub><b>JulesVandenbroeck</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=JulesVandenbroeck" title="Code">💻</a></td>
     </tr>
     <tr>
-      <td align="center" valign="top" width="14.28%"><a href="https://github.com/aalvesan"><img src="https://avatars.githubusercontent.com/u/99343616?v=4?s=100" width="100px;" alt="Ana Andrade"/><br /><sub><b>Ana Andrade</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=aalvesan" title="Code">💻</a></td>
+      <td align="center" valign="top" width="14.28%"><a href="https://github.com/aalvesan"><img src="https://avatars.githubusercontent.com/u/99343616?v=4?s=100" width="100px;" alt="Ana Andrade"/><br /><sub><b>Ana Andrade</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=aalvesan" title="Code">💻</a> <a href="https://github.com/columnflow/columnflow/pulls?q=is%3Apr+reviewed-by%3Aaalvesan" title="Reviewed Pull Requests">👀</a></td>
       <td align="center" valign="top" width="14.28%"><a href="https://github.com/philippgadow"><img src="https://avatars.githubusercontent.com/u/6804366?v=4?s=100" width="100px;" alt="philippgadow"/><br /><sub><b>philippgadow</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=philippgadow" title="Code">💻</a></td>
+      <td align="center" valign="top" width="14.28%"><a href="https://github.com/LuSchaller"><img src="https://avatars.githubusercontent.com/u/30951523?v=4?s=100" width="100px;" alt="Lukas Schaller"/><br /><sub><b>Lukas Schaller</b></sub></a><br /><a href="https://github.com/columnflow/columnflow/commits?author=LuSchaller" title="Code">💻</a></td>
     </tr>
   </tbody>
 </table>

analysis_templates/cms_minimal/__cf_module_name__/plotting/example.py

Lines changed: 9 additions & 4 deletions
@@ -14,14 +14,16 @@
     apply_variable_settings,
     apply_process_settings,
 )
+from columnflow.types import TYPE_CHECKING
 
-hist = maybe_import("hist")
 np = maybe_import("numpy")
-mpl = maybe_import("matplotlib")
-plt = maybe_import("matplotlib.pyplot")
-mplhep = maybe_import("mplhep")
 od = maybe_import("order")
 
+# import hist, matplotlib... for type checking only like this! import them then also locallu.
+if TYPE_CHECKING:
+    hist = maybe_import("hist")
+    plt = maybe_import("matplotlib.pyplot")
+
 
 def my_plot1d_func(
     hists: OrderedDict[od.Process, hist.Hist],
@@ -45,6 +47,9 @@ def my_plot1d_func(
             --plot-function __cf_module_name__.plotting.example.my_plot1d_func \
             --general-settings example_param=some_text
     """
+    import mplhep
+    import matplotlib.pyplot as plt
+
     # we can add arbitrary parameters via the `general_settings` parameter to access them in the
     # plotting function. They are automatically parsed either to a bool, float, or string
     print(f"the example_param has been set to '{example_param}' (type: {type(example_param)})")
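
The hunks above move heavy plotting imports out of module scope: names needed only for annotations sit behind `TYPE_CHECKING`, and the function imports what it uses at call time. A minimal sketch of that pattern follows, under stated assumptions: it uses the standard library's `typing.TYPE_CHECKING` (whether `columnflow.types.TYPE_CHECKING` is the same object is not visible in this diff), the stdlib `decimal` module stands in for `hist` and `matplotlib`, and `to_decimal` is a hypothetical function name.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # evaluated by static type checkers only; skipped at runtime
    from decimal import Decimal


def to_decimal(text: str) -> "Decimal":
    # the annotation is a string, so the name is not needed when the module loads;
    # the real import happens here, on the first call
    from decimal import Decimal

    return Decimal(text)
```

With this layout, importing the module stays cheap and the cost of the dependency is paid only by callers that actually run the function.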

analysis_templates/cms_minimal/law.cfg

Lines changed: 25 additions & 12 deletions
@@ -27,10 +27,10 @@ default_analysis: __cf_module_name__.config.analysis___cf_short_name_lc__.analys
 default_config: run2_2017_nano_v9
 default_dataset: st_tchannel_t_4f_powheg
 
-calibration_modules: columnflow.calibration.cms.{jets,met,tau}, __cf_module_name__.calibration.example
+calibration_modules: columnflow.calibration.cms.{jets,met,tau,egamma,muon}, __cf_module_name__.calibration.example
 selection_modules: columnflow.selection.empty, columnflow.selection.cms.{json_filter,met_filters}, __cf_module_name__.selection.example
 reduction_modules: columnflow.reduction.default, __cf_module_name__.reduction.example
-production_modules: columnflow.production.{categories,matching,normalization,processes}, columnflow.production.cms.{btag,electron,jet,matching,mc_weight,muon,pdf,pileup,scale,parton_shower,seeds}, __cf_module_name__.production.example
+production_modules: columnflow.production.{categories,matching,normalization,processes}, columnflow.production.cms.{btag,electron,jet,matching,mc_weight,muon,pdf,pileup,scale,parton_shower,seeds,gen_particles}, __cf_module_name__.production.example
 categorization_modules: __cf_module_name__.categorization.example
 hist_production_modules: columnflow.histogramming.default, __cf_module_name__.histogramming.example
 ml_modules: columnflow.ml, __cf_module_name__.ml.example
@@ -56,12 +56,16 @@ default_create_selection_hists: False
 # wether or not the ensure_proxy decorator should be skipped, even if used by task's run methods
 skip_ensure_proxy: False
 
+# the name of a sandbox to use for tasks in remote jobs initially (invoked with claw when set)
+default_remote_claw_sandbox: None
+
 # some remote workflow parameter defaults
 # (resources like memory and disk can also be set in [resources] with more granularity)
 htcondor_flavor: $CF_HTCONDOR_FLAVOR
 htcondor_share_software: False
 htcondor_memory: -1
 htcondor_disk: -1
+htcondor_runtime: 3h
 slurm_flavor: $CF_SLURM_FLAVOR
 slurm_partition: $CF_SLURM_PARTITION
 
@@ -70,6 +74,9 @@ chunked_io_chunk_size: 100000
 chunked_io_pool_size: 2
 chunked_io_debug: False
 
+# settings for merging parquet files in several locations
+merging_row_group_size: 50000
+
 # csv list of task families that inherit from ChunkedReaderMixin and whose output arrays should be
 # checked (raising an exception) for non-finite values before saving them to disk
 check_finite_output: cf.CalibrateEvents, cf.SelectEvents, cf.ReduceEvents, cf.ProduceColumns
@@ -98,8 +105,8 @@ lfn_sources: wlcg_fs_t2b_redirector, wlcg_fs_infn_redirector, wlcg_fs_global_red
 # output locations per task family
 # the key can consist of multple underscore-separated parts, that can each be patterns or regexes
 # these parts are used for the lookup from within tasks and can contain (e.g.) the analysis name,
-# the config name, the task family, the dataset name, or the shift name
-# (see AnalysisTask.get_config_lookup_keys() - and subclasses - for the exact order)
+# the config name, the task family, the dataset name, or the shift name, for more info, see
+# https://columnflow.readthedocs.io/en/latest/user_guide/best_practices.html#selecting-output-locations
 # values can have the following format:
 # for local targets : "local[, LOCAL_FS_NAME or STORE_PATH][, store_parts_modifier]"
 # for remote targets : "wlcg[, WLCG_FS_NAME][, store_parts_modifier]"
@@ -108,22 +115,22 @@ lfn_sources: wlcg_fs_t2b_redirector, wlcg_fs_infn_redirector, wlcg_fs_global_red
 # the "store_parts_modifiers" can be the name of a function in the "store_parts_modifiers" aux dict
 # of the analysis instance, which is called with an output's store parts of an output to modify them
 # example:
-; run3_2023__cf.CalibrateEvents__nomin*: local
-; cf.CalibrateEvents: wlcg
+; cfg_run3_2023__task_cf.CalibrateEvents__shift_nomin*: local
+; task_cf.CalibrateEvents: wlcg
 
 
 [versions]
 
 # default versions of specific tasks to pin
 # the key can consist of multple underscore-separated parts, that can each be patterns or regexes
 # these parts are used for the lookup from within tasks and can contain (e.g.) the analysis name,
-# the config name, the task family, the dataset name, or the shift name
-# (see AnalysisTask.get_config_lookup_keys() - and subclasses - for the exact order)
+# the config name, the task family, the dataset name, or the shift name, for more info, see
+# https://columnflow.readthedocs.io/en/latest/user_guide/best_practices.html#pinned-versions-in-the-analysis-config-or-law-cfg-file
 # note:
 # this lookup is skipped if the lookup based on the config instance's auxiliary data succeeded
 # example:
-; run3_2023__cf.CalibrateEvents__nomin*: prod1
-; cf.CalibrateEvents: prod2
+; cfg_run3_2023__task_cf.CalibrateEvents__shift_nomin*: prod1
+; task_cf.CalibrateEvents: prod2
 
 
 [resources]
@@ -135,8 +142,8 @@ lfn_sources: wlcg_fs_t2b_redirector, wlcg_fs_infn_redirector, wlcg_fs_global_red
 # by the respective parameter instance at runtime
 # same as for [versions], the order of options is important as it defines the resolution order
 # example:
-; run3_2023__cf.CalibrateEvents__nomin*: htcondor_memory=5GB
-; run3_2023__cf.CalibrateEvents: htcondor_memory=2GB
+; cfg_run3_2023__task_cf.CalibrateEvents__shift_nomin*: htcondor_memory=5GB
+; cfg_run3_2023__task_cf.CalibrateEvents: htcondor_memory=2GB
 
 
 [job]
@@ -159,6 +166,12 @@ remote_lcg_setup_el9: /cvmfs/grid.cern.ch/alma9-ui-test/etc/profile.d/setup-alma
 remote_lcg_setup_force: False
 
 
+[target]
+
+# when removing target collections, use multi-threading
+collection_remove_threads: 2
+
+
 [local_fs]
 
 base: /
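
The example keys in the `law.cfg` hunks above now carry explicit prefixes (`cfg_`, `task_`, `shift_`), and the comments state that the order of options defines the resolution order. The following is only a simplified illustration of how such prefixed, pattern-based keys could be resolved. It is not columnflow's lookup code: the function `resolve`, the `__` split, and the ordered subsequence matching are all assumptions made for this sketch.

```python
from fnmatch import fnmatchcase


def resolve(options, task_parts):
    """Return the value of the first option whose key parts all match, else None.

    options:    list of (key, value) pairs in file order (hypothetical structure)
    task_parts: prefixed parts describing one task, e.g. ["cfg_...", "task_...", "shift_..."]
    """
    for key, value in options:
        patterns = key.split("__")
        remaining = iter(task_parts)
        # each pattern has to match some later task part, so order is preserved
        if all(any(fnmatchcase(part, pat) for part in remaining) for pat in patterns):
            return value
    return None


options = [
    ("cfg_run3_2023__task_cf.CalibrateEvents__shift_nomin*", "local"),
    ("task_cf.CalibrateEvents", "wlcg"),
]
nominal = ["cfg_run3_2023", "task_cf.CalibrateEvents", "shift_nominal"]
shifted = ["cfg_run3_2023", "task_cf.CalibrateEvents", "shift_jec_up"]
location_nominal = resolve(options, nominal)
location_shifted = resolve(options, shifted)
```

In this sketch the more specific key is listed first, so the nominal shift resolves to `local` while any other shift of the same task falls through to `wlcg`.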

analysis_templates/ghent_template/__cf_module_name__/plotting/example.py

Lines changed: 9 additions & 4 deletions
@@ -14,14 +14,16 @@
     apply_variable_settings,
     apply_process_settings,
 )
+from columnflow.types import TYPE_CHECKING
 
-hist = maybe_import("hist")
 np = maybe_import("numpy")
-mpl = maybe_import("matplotlib")
-plt = maybe_import("matplotlib.pyplot")
-mplhep = maybe_import("mplhep")
 od = maybe_import("order")
 
+# import hist, matplotlib... for type checking only like this! import them then also locallu.
+if TYPE_CHECKING:
+    hist = maybe_import("hist")
+    plt = maybe_import("matplotlib.pyplot")
+
 
 def my_plot1d_func(
     hists: OrderedDict[od.Process, hist.Hist],
@@ -45,6 +47,9 @@ def my_plot1d_func(
             --plot-function __cf_module_name__.plotting.example.my_plot1d_func \
             --general-settings example_param=some_text
     """
+    import mplhep
+    import matplotlib.pyplot as plt
+
     # we can add arbitrary parameters via the `general_settings` parameter to access them in the
     # plotting function. They are automatically parsed either to a bool, float, or string
     print(f"The example_param has been set to '{example_param}' (type: {type(example_param)})")

analysis_templates/ghent_template/__cf_module_name__/production/default.py

Lines changed: 4 additions & 2 deletions
@@ -16,8 +16,10 @@
 
 np = maybe_import("numpy")
 ak = maybe_import("awkward")
-coffea = maybe_import("coffea")
-maybe_import("coffea.nanoevents.methods.nanoaod")
+
+# do not import coffea globally! Do this inside the function
+# coffea = maybe_import("coffea")
+# maybe_import("coffea.nanoevents.methods.nanoaod")
 
 
 @producer(

analysis_templates/ghent_template/__cf_module_name__/selection/default.py

Lines changed: 8 additions & 2 deletions
@@ -28,15 +28,21 @@
 from __cf_short_name_lc__.selection.stats import __cf_short_name_lc___increment_stats
 from __cf_short_name_lc__.selection.trigger import trigger_selection
 
+# only numpy and awkward are okay to import globally
 np = maybe_import("numpy")
 ak = maybe_import("awkward")
-coffea = maybe_import("coffea")
-maybe_import("coffea.nanoevents.methods.nanoaod")
+
+# do not import coffea globally! Do this inside the function
+# coffea = maybe_import("coffea")
+# maybe_import("coffea.nanoevents.methods.nanoaod")
 
 logger = law.logger.get_logger(__name__)
 
 
 def TetraVec(arr: ak.Array) -> ak.Array:
+    import coffea
+    import coffea.nanoevents.methods.nanoaod
+
     TetraVec = ak.zip({"pt": arr.pt, "eta": arr.eta, "phi": arr.phi, "mass": arr.mass},
                       with_name="PtEtaPhiMLorentzVector",
                       behavior=coffea.nanoevents.methods.vector.behavior)

analysis_templates/ghent_template/__cf_module_name__/selection/objects.py

Lines changed: 4 additions & 4 deletions
@@ -10,7 +10,7 @@
 from columnflow.util import maybe_import, four_vec
 from columnflow.columnar_util import set_ak_column
 from columnflow.selection import Selector, SelectionResult, selector
-from columnflow.reduction.util import masked_sorted_indices
+from columnflow.columnar_util import sorted_indices_from_mask
 
 ak = maybe_import("awkward")
 
@@ -53,7 +53,7 @@ def muon_object(
         steps={},
         objects={
             "Muon": {
-                "Muon": masked_sorted_indices(mu_mask, muon.pt)
+                "Muon": sorted_indices_from_mask(mu_mask, muon.pt)
             }
         },
     )
@@ -108,7 +108,7 @@ def electron_object(
         steps={},
         objects={
             "Electron": {
-                "Electron": masked_sorted_indices(e_mask, electron.pt)
+                "Electron": sorted_indices_from_mask(e_mask, electron.pt)
             }
         },
     )
@@ -142,7 +142,7 @@ def jet_object(
         (dR_mask)
     )
 
-    jet_indices = masked_sorted_indices(jet_mask, events.Jet.pt)
+    jet_indices = sorted_indices_from_mask(jet_mask, events.Jet.pt)
     n_jets = ak.sum(jet_mask, axis=-1)
 
     return events, SelectionResult(
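
The hunks above only swap `masked_sorted_indices` for `sorted_indices_from_mask` and move the import to `columnflow.columnar_util`; the call sites keep passing a mask and a sort variable such as `pt`. To illustrate the concept a helper with this call shape presumably implements, here is a pure-Python sketch for a single event. The function name `sorted_indices_sketch`, the `ascending` keyword, and the descending default are assumptions of this example and are not taken from columnflow, which operates on jagged awkward arrays instead.

```python
def sorted_indices_sketch(mask, sort_values, ascending=False):
    """Indices of objects passing `mask`, ordered by `sort_values` (one event only)."""
    # order all object indices by the sort variable first ...
    order = sorted(
        range(len(sort_values)),
        key=lambda i: sort_values[i],
        reverse=not ascending,
    )
    # ... then drop the ones that fail the selection mask, keeping the order
    return [i for i in order if mask[i]]


# four muons in one event: pt values and a per-object selection mask
mu_pt = [20.0, 50.0, 35.0, 10.0]
mu_mask = [True, False, True, True]
selected = sorted_indices_sketch(mu_mask, mu_pt)
```

Sorting before masking means the returned indices still refer to positions in the original collection, which is what an index-based `objects` entry like the ones in the diff would need.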

bin/cf_inspect

Lines changed: 11 additions & 1 deletion
@@ -1,15 +1,25 @@
 #!/bin/sh
 action () {
+    # local variables
     local shell_is_zsh="$( [ -z "${ZSH_VERSION}" ] && echo "false" || echo "true" )"
     local this_file="$( ${shell_is_zsh} && echo "${(%):-%x}" || echo "${BASH_SOURCE[0]}" )"
     local this_dir="$( cd "$( dirname "${this_file}" )" && pwd )"
 
+    # check arguments
     # [ "$#" -eq 0 ] && {
     # echo "ERROR: at least one file must be provided"
     # return 1
     # }
 
-    cf_sandbox venv_columnar_dev python "${this_dir}/cf_inspect.py" "$@"
+    # determine the sandbox to use
+    local cf_inspect_sandbox="${CF_INSPECT_SANDBOX:-venv_columnar_dev}"
+
+    # run the inspection script, potentially switching to the sandbox if not already in it
+    if [ "${CF_VENV_NAME}" = "${cf_inspect_sandbox}" ]; then
+        python "${this_dir}/cf_inspect.py" "$@"
+    else
+        cf_sandbox "${cf_inspect_sandbox}" python "${this_dir}/cf_inspect.py" "$@"
+    fi
 }
 
 action "$@"
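
The rewritten `cf_inspect` wrapper picks its sandbox from `CF_INSPECT_SANDBOX` (falling back to `venv_columnar_dev`) and skips the `cf_sandbox` indirection when `CF_VENV_NAME` already names that sandbox. The same decision can be written out as a small Python function, shown here purely as an illustration of the control flow; `build_command` is a hypothetical helper and is not part of columnflow.

```python
def build_command(env, script, args):
    """Mirror the wrapper's choice between a direct and a sandboxed python call."""
    # `${VAR:-default}` in the shell falls back when VAR is unset *or* empty
    sandbox = env.get("CF_INSPECT_SANDBOX") or "venv_columnar_dev"
    direct = ["python", script, *args]
    if env.get("CF_VENV_NAME", "") == sandbox:
        # already inside the requested sandbox, so run the script directly
        return direct
    # otherwise let cf_sandbox enter the sandbox first
    return ["cf_sandbox", sandbox, *direct]


inside = build_command({"CF_VENV_NAME": "venv_columnar_dev"}, "cf_inspect.py", ["a.parquet"])
outside = build_command({}, "cf_inspect.py", ["a.parquet"])
custom = build_command({"CF_INSPECT_SANDBOX": "venv_custom"}, "cf_inspect.py", [])
```

The effect of the change is that a caller who is already in the right environment avoids a nested sandbox start, while everyone else gets the previous behavior with a configurable sandbox name.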
