Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hist2workspace regression (or change) #2570

Open
1 task done
kratsg opened this issue Jan 22, 2025 · 9 comments
Open
1 task done

hist2workspace regression (or change) #2570

kratsg opened this issue Jan 22, 2025 · 9 comments
Labels
bug Something isn't working needs-triage Needs a maintainer to categorize and assign

Comments

@kratsg
Copy link
Contributor

kratsg commented Jan 22, 2025

Summary

When one uses a simple workspace/model like so

{
  "channels": [
    {
      "name": "singlechannel",
      "samples": [
        {
          "name": "signal",
          "data": [12.0, 11.0],
          "modifiers": [
            {
              "name": "mu",
              "type": "normfactor",
              "data": null
            }
          ]
        },
        {
          "name": "background",
          "data": [50.0, 52.0],
          "modifiers": [
            {
              "name": "uncorr_bkguncrt",
              "type": "shapesys",
              "data": [3.0, 7.0]
            }
          ]
        }
      ]
    }
  ],
  "measurements": [
    {
      "config": {
        "parameters": [],
        "poi": "mu"
      },
      "name": "measurement"
    }
  ],
  "observations": [
    {
      "data": [51.0, 48.0],
      "name": "singlechannel"
    }
  ],
  "version": "1.0.0"
}

and converts to XML via pyhf xml2json, the output for the singlechannel channel is

<!DOCTYPE Channel SYSTEM '../HistFactorySchema.dtd'>

<Channel Name="singlechannel" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root">
  <Data HistoName="histsinglechannel_data" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root">
    </Data>
  <Sample Name="signal" HistoName="histsinglechannel_signal" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root" NormalizeByTheory="False">
    <NormFactor Name="mu" Val="1" Low="0" High="10">
     </NormFactor>
    </Sample>
  <Sample Name="background" HistoName="histsinglechannel_background" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root" NormalizeByTheory="False">
    <ShapeSys Name="uncorr_bkguncrt" ConstraintType="Poisson" HistoName="histsinglechannel_background_uncorr_bkguncrt">
      </ShapeSys>
    </Sample>
  </Channel>

however, ROOT complains in hist2workspace about some of the above having "declared content" even though it's meant to be empty (weird XML validation going on in ROOT) to the point that one needs to modify the above to

<!DOCTYPE Channel SYSTEM '../HistFactorySchema.dtd'>

<Channel Name="singlechannel" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root">
  <Data HistoName="histsinglechannel_data" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root"></Data>
  <Sample Name="signal" HistoName="histsinglechannel_signal" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root" NormalizeByTheory="False">
    <NormFactor Name="mu" Val="1" Low="0" High="10"></NormFactor>
    </Sample>
  <Sample Name="background" HistoName="histsinglechannel_background" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root" NormalizeByTheory="False">
    <ShapeSys Name="uncorr_bkguncrt" ConstraintType="Poisson" HistoName="histsinglechannel_background_uncorr_bkguncrt"></ShapeSys>
    </Sample>
  </Channel>

where the newlines are just removed -- which is rather odd, as XML is not sensitive to newlines typically... I wonder if ROOT has a particular regression here or somehow became even more strict.

OS / Environment

$ system_profiler -detailLevel mini SPSoftwareDataType | head -n 6 | pbcopy
Software:

    System Software Overview:

      System Version: macOS 14.6.1 (23G93)
      Kernel Version: Darwin 23.6.0

$ root --version
ROOT Version: 6.32.06
Built for macosxarm64 on Sep 21 2024, 18:21:53
From tags/6-32-06@6-32-06

Steps to Reproduce

.

File Upload (optional)

No response

Expected Results

.

Actual Results

.

pyhf Version

.

Code of Conduct

  • I agree to follow the Code of Conduct
@kratsg kratsg added bug Something isn't working needs-triage Needs a maintainer to categorize and assign labels Jan 22, 2025
@matthewfeickert
Copy link
Member

@kratsg Do you know if ROOT v6.32.06 is the first ROOT to have this happen?

cc @guitargeek in the event that the underlying cause of this is something he knows off the top of his head.

@guitargeek
Copy link
Contributor

Hi! There is nothing that happened to the relevant ConfigParser class between 6.30 and 6.32 besides some technical changes to fix compiler warnings:
https://github.com/root-project/root/commits/v6-32-00-patches/roofit/histfactory/src/ConfigParser.cxx

Therefore, it would indeed be useful to know if it works with 6.30.

@matthewfeickert
Copy link
Member

Thanks very much for the super fast feedback, @guitargeek!

@matthewfeickert
Copy link
Member

matthewfeickert commented Jan 22, 2025

With the following pixi manifest pixi.toml:

[project]
authors = ["Matthew Feickert <[email protected]>"]
channels = ["conda-forge"]
description = "Add a short description here"
name = "pyhf-issue-2570"
platforms = ["linux-64", "osx-arm64"]
version = "0.1.0"

[tasks]
convert = """
rm -rf output && \
mkdir -p output && \
pyhf json2xml --output-dir output workspace.json
"""

hist2workspace = """
hist2workspace output/FitConfig.xml
"""

start = { depends-on = ["convert", "hist2workspace"] }

[dependencies]
pyhf = "0.7.6.*"
root_base = "6.30.*"
uproot = ">=5.5.1,<6"

which gives

$ pixi list pyhf
Package  Version  Build         Size       Kind   Source
pyhf     0.7.6    pyhff2d567_5  108.1 KiB  conda  pyhf
$ pixi list root_base
Package    Version  Build            Size       Kind   Source
root_base  6.30.4   py311h1329b8c_0  192.8 MiB  conda  root_base

using the given JSON from #2570 (comment) as workspace.json

pixi run start

gives valid output.

Full output:
$ pixi run start
✨ Pixi task (convert): rm -rf output && mkdir -p output && pyhf json2xml --output-dir output workspace.json


✨ Pixi task (hist2workspace): hist2workspace output/FitConfig.xml

[#2] INFO:HistFactory -- hist2workspace is less verbose now. Use -v and -vv for more details.

[#2] PROGRESS:HistFactory -- Getting histogram /tmp/pyhf-issue-2570/output/data/data.root:/histsinglechannel_data
[#2] PROGRESS:HistFactory -- Getting histogram /tmp/pyhf-issue-2570/output/data/data.root:/histsinglechannel_signal
[#2] PROGRESS:HistFactory -- Getting histogram /tmp/pyhf-issue-2570/output/data/data.root:/histsinglechannel_background
[#2] PROGRESS:HistFactory -- Getting histogram /tmp/pyhf-issue-2570/output/data/data.root:/histsinglechannel_background_uncorr_bkguncrt
[#2] PROGRESS:HistFactory -- Starting to process channel: singlechannel
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Starting to process 'singlechannel' channel with 1 observables
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	import model into workspace
-----------------------------------------

[#2] PROGRESS:HistFactory -- Writing sample: signal
[#2] PROGRESS:HistFactory -- Writing sample: background
[#2] PROGRESS:HistFactory -- Saved all histograms
[#2] PROGRESS:HistFactory -- Saved Measurement
[#2] PROGRESS:HistFactory -- Successfully wrote channel to file
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Entering combination
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Importing combined model
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	create toy data for channelCat[singlechannel]
-----------------------------------------

[#2] PROGRESS:HistFactory -- Writing combined workspace to file: output/config/FitConfig_combined_measurement_model.root
[#2] PROGRESS:HistFactory -- Writing combined measurement to file: output/config/FitConfig_combined_measurement_model.root
[#2] PROGRESS:HistFactory -- Writing sample: signal
[#2] PROGRESS:HistFactory -- Writing sample: background
[#2] PROGRESS:HistFactory -- Saved all histograms
[#2] PROGRESS:HistFactory -- Saved Measurement

I get the same output if I bump this to

pixi add 'pyhf=0.7.6' uproot 'root_base=6.32'

resulting in

$ pixi list root_base
Package    Version  Build            Size       Kind   Source
root_base  6.32.2   py312h714f7df_3  214.9 MiB  conda  root_base

As the conda-forge ROOT feedstock isn't currently supported by the ROOT team (I would love for this to change as it is currently blocking multiple of the tools I maintain on conda-forge), this is as recent of a ROOT release as we can test with conda-forge.

If there are up-to-date Linux containers for ROOT those can be used to test this, but I'll need to look at that later, or let someone else beat me to it.


@kratsg did I misunderstand what you were trying to do here? Were you just converting with json2xml and then using hist2workspace? Or were you also roundtripping back to JSON with xml2sjson?

@kratsg
Copy link
Contributor Author

kratsg commented Jan 22, 2025

This is what I see if I use

$ cat config/FitConfig_singlechannel.xml <!DOCTYPE Channel SYSTEM '../HistFactorySchema.dtd'>

<Channel Name="singlechannel" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root">
  <Data HistoName="histsinglechannel_data" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root">
    </Data>
  <Sample Name="signal" HistoName="histsinglechannel_signal" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root" NormalizeByTheory="False">
    <NormFactor Name="mu" Val="1" Low="0" High="10">
      </NormFactor>
    </Sample>
  <Sample Name="background" HistoName="histsinglechannel_background" InputFile="/Users/kratsg/pyhs3/tests/test_hifa/foo/data/data.root" NormalizeByTheory="False">
    <ShapeSys Name="uncorr_bkguncrt" ConstraintType="Poisson" HistoName="histsinglechannel_background_uncorr_bkguncrt">
      </ShapeSys>
    </Sample>
  </Channel>

output

$ hist2workspace FitConfig.xml
[#2] INFO:HistFactory -- hist2workspace is less verbose now. Use -v and -vv for more details.

/Users/kratsg/pyhs3/tests/test_hifa/foo/config/FitConfig_singlechannel.xml:5: element Data: validity error : Element Data was declared EMPTY this one has content
    </Data>
           ^
/Users/kratsg/pyhs3/tests/test_hifa/foo/config/FitConfig_singlechannel.xml:8: element NormFactor: validity error : Element NormFactor was declared EMPTY this one has content
      </NormFactor>
                   ^
/Users/kratsg/pyhs3/tests/test_hifa/foo/config/FitConfig_singlechannel.xml:12: element ShapeSys: validity error : Element ShapeSys was declared EMPTY this one has content
      </ShapeSys>
                 ^
[#2] ERROR:HistFactory -- Loading of xml document "/Users/kratsg/pyhs3/tests/test_hifa/foo/config/FitConfig_singlechannel.xml" failed
libc++abi: terminating due to uncaught exception of type RooStats::HistFactory::hf_exc: HistFactory - Exception
Abort trap: 6

no round-tripping. I'm going in this direction to get HS3 files out.

@kratsg
Copy link
Contributor Author

kratsg commented Jan 22, 2025

It's likely due to the EMPTY declarations in the DTD, but this hadn't been an issue in the past as presumably new lines are not considered content... or so I thought. Maybe this behavior changed with one of ROOT's underlying parsers. We're really just relying on xml.etree.ElementTree to convert the XML to string for dumping to file.. so I'm not sure how one solves this tension...

@matthewfeickert
Copy link
Member

@kratsg How are you getting your ROOT distribution? Using the rootproject/root:latest Docker image I can't reproduce your problem with ROOT v6.34.00:

$ docker run --rm -ti rootproject/root:latest bash
# curl -fsSL https://pixi.sh/install.sh | bash
# . /root/.bashrc
# root --version
ROOT Version: 6.34.00
Built for linuxx8664gcc on Nov 14 2024, 06:30:57
From tags/v6-34-00-rc1@v6-34-00-rc1
# mkdir pyhf-issue-2570 && cd pyhf-issue-2570
# cat pixi.toml 
[project]
authors = ["Matthew Feickert <[email protected]>"]
channels = ["conda-forge"]
description = "Add a short description here"
name = "pyhf-issue-2570"
platforms = ["linux-64", "osx-arm64"]
version = "0.1.0"

[tasks]
convert = """
rm -rf output && \
mkdir -p output && \
pyhf json2xml --output-dir output workspace.json
"""

hist2workspace = """
hist2workspace output/FitConfig.xml
"""

start = { depends-on = ["convert", "hist2workspace"] }

[dependencies]
pyhf = "0.7.6.*"
uproot = ">=5.5.1,<6"
# test -f workspace.json
# pixi run start
✨ Pixi task (convert): rm -rf output && mkdir -p output && pyhf json2xml --output-dir output workspace.json


✨ Pixi task (hist2workspace): hist2workspace output/FitConfig.xml

[#2] INFO:HistFactory -- hist2workspace is less verbose now. Use -v and -vv for more details.

[#2] PROGRESS:HistFactory -- Getting histogram /opt/example/output/data/data.root:/histsinglechannel_data
[#2] PROGRESS:HistFactory -- Getting histogram /opt/example/output/data/data.root:/histsinglechannel_signal
[#2] PROGRESS:HistFactory -- Getting histogram /opt/example/output/data/data.root:/histsinglechannel_background
[#2] PROGRESS:HistFactory -- Getting histogram /opt/example/output/data/data.root:/histsinglechannel_background_uncorr_bkguncrt
[#2] PROGRESS:HistFactory -- Starting to process channel: singlechannel
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Starting to process 'singlechannel' channel with 1 observables
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	import model into workspace
-----------------------------------------

[#2] PROGRESS:HistFactory -- Writing sample: signal
[#2] PROGRESS:HistFactory -- Writing sample: background
[#2] PROGRESS:HistFactory -- Saved all histograms
[#2] PROGRESS:HistFactory -- Saved Measurement
[#2] PROGRESS:HistFactory -- Successfully wrote channel to file
[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Entering combination
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	Importing combined model
-----------------------------------------

[#2] PROGRESS:HistFactory -- 
-----------------------------------------
	create toy data
-----------------------------------------

[#2] PROGRESS:HistFactory -- Writing combined workspace to file: output/config/FitConfig_combined_measurement_model.root
[#2] PROGRESS:HistFactory -- Writing combined measurement to file: output/config/FitConfig_combined_measurement_model.root
[#2] PROGRESS:HistFactory -- Writing sample: signal
[#2] PROGRESS:HistFactory -- Writing sample: background
[#2] PROGRESS:HistFactory -- Saved all histograms
[#2] PROGRESS:HistFactory -- Saved Measurement

@kratsg
Copy link
Contributor Author

kratsg commented Jan 22, 2025

Homebrewed root.

Lord Stark:~/pyhs3 (main)$ root --version
ROOT Version: 6.32.06
Built for macosxarm64 on Sep 21 2024, 18:21:53
From tags/6-32-06@6-32-06
Lord Stark:~/pyhs3 (main)$ which root
/opt/homebrew/bin/root
Lord Stark:~/pyhs3 (main)$ which hist2workspace
/opt/homebrew/bin/hist2workspace

@matthewfeickert
Copy link
Member

There are no Linux containers for 6.32.06 (at least provided by rootproject/root)

$ crane ls rootproject/root | grep '6\.32'
6.32.00-ubuntu22.04
6.32.00-ubuntu24.04
6.32.02-ubuntu22.04
6.32.02-ubuntu24.04
6.32.04-ubuntu24.04

@kratsg Do you have any way of testing with a different version of ROOT to see if this is a ROOT issue vs. a Homebrew issue? Can you reproduce this with a homebrew installed v6.32.2 version of ROOT (which we've already established in #2570 (comment) works fine with the conda-forge distribution)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs-triage Needs a maintainer to categorize and assign
Projects
None yet
Development

No branches or pull requests

3 participants