Benchmark definition standard

How can we define each benchmark and the metric on which it succeeds?

For example, a detection-efficiency test might find that we detect 80% of events passing some Q2 cut, while we want the test to fail if the efficiency drops below 95%. Could we just have a JSON file like the following?

{ "name": "My Q2 cut",
  "description":"Some Q2 cut that we expect high eff.",
  "quantity":"efficiency",
  "benchmark":"0.95",
  "value":"0.80"
}
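
As a rough sketch of how such a file might be consumed (the file name q2_cut.json and the "value >= benchmark means pass" rule are just assumptions here, using the field names from the JSON above):

import json

def test_passes(test: dict) -> bool:
    # Assumes both fields are stored as strings, as in the JSON above,
    # and that "higher is better" for the quantity being tested.
    return float(test["value"]) >= float(test["benchmark"])

with open("q2_cut.json") as f:
    test = json.load(f)

print(test["name"], "passed" if test_passes(test) else "failed")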

Should we think of this as a "benchmark" or a "test"?

I guess a "benchmark" could be composed of one or more of these "tests":

{ "benchmark": "DVCS in central",
  "test_results": [
         { "name": "My Q2 cut",
           "description": "Some Q2 cut that we expect high eff.",
           "quantity": "efficiency",
           "goal_threshold": "0.95",
           "value": "0.80",
           "weight": "1.0"
         },
         { "name": "Coplanarity analysis",
            ...
         },
         ...
         ],
  "performance_limit": "4.5",
  "performance_goal": "4",
  "performance": "4.1",
  "successful_goals": "5",
  "total_goals": "6"
}

where performance_limit is computed from the weights:

 P_\text{limit} = \sum_{i \in \text{tests}} w_i

and the actual performance includes only passing tests:

 P = \sum_{i \in \text{passed tests}} w_i

This assumes all tests are pass/fail; that could probably be relaxed to a per-test score in [0, 1]. A minimal sketch of the scoring is shown below.
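
For concreteness, a minimal sketch of how the weighted score could be computed from a benchmark JSON like the one above (the file name dvcs_central.json is hypothetical, and the pass/fail rule "value >= goal_threshold" is an assumption):

import json

def compute_performance(benchmark: dict) -> dict:
    # P_limit is the sum of all test weights; P counts only passing tests.
    # For the [0, 1] relaxation, the full weight credited to a passing test
    # would be replaced by weight * score with score in [0, 1].
    tests = benchmark["test_results"]
    passed = [t for t in tests
              if float(t["value"]) >= float(t["goal_threshold"])]
    return {
        "performance_limit": sum(float(t["weight"]) for t in tests),
        "performance": sum(float(t["weight"]) for t in passed),
        "successful_goals": len(passed),
        "total_goals": len(tests),
    }

with open("dvcs_central.json") as f:
    print(compute_performance(json.load(f)))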

Thoughts? @sly2j @cpeng @jihee.kim @Polakovic