Closed · Opened Nov 25, 2020 by Whitney Armstrong (@whit), Owner

Benchmark definition standard

How can we define each benchmark and the metric on which it succeeds?

For example, a detection-efficiency test with some Q2 cut might detect 80% of events, and we want the benchmark to fail when the efficiency drops below 95%. Could we just have a JSON file like the following?

{ "name": "My Q2 cut",
  "description":"Some Q2 cut that we expect high eff.",
  "quantity":"efficiency",
  "benchmark":"0.95",
  "value":"0.80"
}
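As a sketch of how CI could consume such a file, assuming the definition is written to a hypothetical q2_cut.json by the analysis step (Python, using the value/benchmark fields above):

import json
import sys

# Load the test definition produced by the analysis step.
# "q2_cut.json" is a hypothetical file name for illustration.
with open("q2_cut.json") as f:
    test = json.load(f)

# The test passes when the measured quantity reaches the benchmark threshold.
passed = float(test["value"]) >= float(test["benchmark"])

print(f'{test["name"]}: {test["quantity"]} = {test["value"]} '
      f'(threshold {test["benchmark"]}) -> {"PASS" if passed else "FAIL"}')

# A non-zero exit code fails the CI job.
sys.exit(0 if passed else 1)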

Should we think of this as a "benchmark" or a "test"?

I guess a "benchmark" could comprise one or more of these "tests":

{
  "benchmark": "DVCS in central",
  "test_results": [
    {
      "name": "My Q2 cut",
      "description": "Some Q2 cut that we expect high eff.",
      "quantity": "efficiency",
      "goal_threshold": 0.95,
      "value": 0.80,
      "weight": 1.0
    },
    {
      "name": "Coplanarity analysis",
      ...
    },
    ...
  ],
  "performance_limit": 4.5,
  "performance_goal": 4.0,
  "performance": 4.1,
  "successful_goals": 5,
  "total_goals": 6
}

where performance_limit is computed from the weights:

 P_{\text{limit}} = \sum_{i \in \text{tests}} w_i

and the actual performance includes only passing tests:

 P = \sum_{i \in \text{passed}} w_i

This assumes all tests are pass/fail; it can probably be relaxed to a score in [0, 1].
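To make the aggregation concrete, here is a minimal Python sketch, assuming each entry in test_results carries the value, goal_threshold, and weight fields from the JSON above:

def aggregate(benchmark):
    """Compute the summary fields of a benchmark from its test_results."""
    tests = benchmark["test_results"]

    # A test passes when its measured value reaches its goal threshold;
    # relaxing pass/fail to a score in [0, 1] would replace this predicate.
    def passed(t):
        return float(t["value"]) >= float(t["goal_threshold"])

    return {
        # P_limit = sum of w_i over all tests
        "performance_limit": sum(float(t["weight"]) for t in tests),
        # P = sum of w_i over passing tests only
        "performance": sum(float(t["weight"]) for t in tests if passed(t)),
        "successful_goals": sum(1 for t in tests if passed(t)),
        "total_goals": len(tests),
    }

Presumably the benchmark as a whole would then pass when the computed performance reaches performance_goal.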

Thoughts? @sly2j @cpeng @jihee.kim @Polakovic

Due date: Dec 1, 2020
Reference: EIC/benchmarks/physics_benchmarks#8