# physics_benchmarks issues
https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/issues

## Fix the DVMP pipeline
https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/issues/21
Whitney Armstrong, updated 2021-10-06

The `run_many.py` script is a pattern I do not want to see repeated. It is much clearer if the [parallel feature](https://docs.gitlab.com/ee/ci/yaml/#parallel) is used.

More importantly, don't redirect any output, because the job log is the only way we can debug via the CI. Alternatively, [always upload the artifacts](https://docs.gitlab.com/ee/ci/yaml/#artifactswhen).
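A minimal sketch of what this could look like in `.gitlab-ci.yml`; the job name, script path, and artifact paths are hypothetical, but `parallel`, `CI_NODE_INDEX`, and `artifacts:when` are standard GitLab CI features:

```yaml
dvmp:generate:
  # Run 10 instances of this job instead of looping inside run_many.py;
  # each instance sees CI_NODE_INDEX (1..10) and CI_NODE_TOTAL.
  parallel: 10
  script:
    # hypothetical script; output goes to the job log, not a redirected file
    - bash benchmarks/dvmp/run.sh --index "${CI_NODE_INDEX}"
  artifacts:
    when: always          # upload results even when the job fails
    paths:
      - results/
```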
Job [#280481](https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/jobs/280481) failed for 00b686316e115a87ca78a582259addef2d20cf77:

Assignee: Sylvester Joosten

## Benchmark definition standard
https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/issues/8
Whitney Armstrong, updated 2020-12-23

How can we define each benchmark and the metric on which it succeeds?

For example, a detection efficiency benchmark might detect 80% of events passing some Q2 cut, while we want it to fail whenever the efficiency drops below 95%. Could we just have a JSON file like the following?
```
{ "name": "My Q2 cut",
"description":"Some Q2 cut that we expect high eff.",
"quantity":"efficiency",
"benchmark":"0.95",
"value":"0.80"
}
```
Should we think of this as a "benchmark" or a "test"?
I guess a "benchmark" could be comprised of one or more of these "tests"
```
{ "benchmark": "DVCS in central",
  "test_results": [
    { "name": "My Q2 cut",
      "description": "Some Q2 cut that we expect high eff.",
      "quantity": "efficiency",
      "goal_threshold": "0.95",
      "value": "0.80",
      "weight": "1.0"
    },
    { "name": "Coplanarity analysis",
      ...
    },
    ...
  ],
  "performance_limit": "4.5",
  "performance_goal": "4",
  "performance": "4.1",
  "successful_goals": "5",
  "total_goals": "6"
}
```
where `performance_limit` is computed from the weights:
```math
P_{\text{limit}} = \sum_{i \in \text{tests}} w_i
```
and the actual performance includes only passing tests:
```math
P = \sum_{i \in \text{tests passed}} w_i
```
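Under those definitions, the aggregate numbers in the JSON sketch above could be computed as follows (a sketch; the helper name is hypothetical and tests are assumed to be pass/fail):

```python
def score_benchmark(test_results: list[dict]) -> dict:
    """Aggregate weighted pass/fail test results into benchmark-level numbers.

    Each record needs "value", "goal_threshold", and "weight" fields,
    stored as strings as in the JSON sketch above.
    """
    passed = [t for t in test_results
              if float(t["value"]) >= float(t["goal_threshold"])]
    return {
        # P_limit: sum of all weights (best achievable performance)
        "performance_limit": sum(float(t["weight"]) for t in test_results),
        # P: sum of weights of passing tests only
        "performance": sum(float(t["weight"]) for t in passed),
        "successful_goals": len(passed),
        "total_goals": len(test_results),
    }

tests = [
    {"value": "0.80", "goal_threshold": "0.95", "weight": "1.0"},  # fails
    {"value": "0.99", "goal_threshold": "0.95", "weight": "2.0"},  # passes
]
print(score_benchmark(tests))
# {'performance_limit': 3.0, 'performance': 2.0, 'successful_goals': 1, 'total_goals': 2}
```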
This assumes all tests are pass/fail; that could probably be relaxed to a measure in [0, 1].
Thoughts? @sly2j @cpeng @jihee.kim @Polakovic

Whitney Armstrong, 2020-12-01