
Provenance BOF 2009-07-24 Notes

Notes from a BOF on provenance in component environments at the Summer 2009 CCA Forum
Jeff Daly: Problem solving environment (general, target subsurface)
  • Input files
  • Output files
  • Execution machine
  • When run?
  • Minimum necessary to reproduce same executable
  • Capture paths, names for executable
  • Not version information
  • Best effort only
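A minimal sketch of what such a best-effort run record might look like, using only the Python standard library. The function name `capture_run_record` and the dictionary keys are hypothetical, chosen for illustration; they are not part of any CCA tool.

```python
import json
import os
import platform
import shutil
from datetime import datetime, timezone


def capture_run_record(executable, input_files, output_files):
    """Best-effort run record: paths and names only, no version information."""
    # Resolve the executable on PATH if possible, otherwise treat it as a path.
    resolved = shutil.which(executable) or os.path.abspath(executable)
    return {
        "executable_name": os.path.basename(resolved),
        "executable_path": resolved,
        "input_files": [os.path.abspath(p) for p in input_files],
        "output_files": [os.path.abspath(p) for p in output_files],
        "execution_machine": platform.node(),
        "run_started": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # "my_sim", "input.dat" and "output.dat" are placeholder names.
    record = capture_run_record("my_sim", ["input.dat"], ["output.dat"])
    print(json.dumps(record, indent=2))
```

The record deliberately stops at names and paths, matching the "not version information" and "best effort only" points above.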

Ben Allan: Current practice: archive binaries (whole operating
systems) for later execution in VMs. Examples: weapons, environmental
management

Ian Gorton: heading towards versioning of models, inputs, etc.

Need: identify which components (and versions) constituted the
application at any moment in time. Track over time since application
might be dynamic.

Jeff: It is sufficient for a CCA app to dump a file w/ provenance info for later ingestion

Jeff: Data model: triple store w/ "standard" terms
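One way to picture the dump-file-plus-triples idea is the sketch below. It assumes Dublin Core terms as the "standard" vocabulary (Dublin Core comes up later in these notes, but the notes do not fix a vocabulary), writes literals only, and uses a made-up subject URI. The helper names are hypothetical.

```python
# Assumption: Dublin Core terms namespace serves as the "standard" vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def make_triples(subject, properties):
    """Turn a {term: value} mapping into (subject, predicate, object) triples."""
    return [(subject, DCTERMS + term, value)
            for term, value in sorted(properties.items())]


def to_ntriples(triples):
    """Serialize triples with string-literal objects as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes and double quotes inside the literal.
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "urn:example:run:1" is an illustrative subject identifier.
    triples = make_triples("urn:example:run:1",
                           {"title": "demo run", "created": "2009-07-24"})
    with open("provenance.nt", "w", encoding="utf-8") as handle:
        handle.write(to_ntriples(triples))
```

A flat text file like this can be produced by the application without any dependency on the store that later ingests it.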

Ben: BuilderService events are not sufficient, because an application
can also create SIDL objects

David: BuilderService events give the equivalent of the current
"application" identity, assuming it consists entirely of CCA
components. We recognize the limitations, but accept them. An
instrumented loader would have a much harder time giving info that can
be connected to things the user/developer can recognize.

David: Some groups collect much more detailed info about the build
(compiler flags, versions of key OS elements). Want to have enough
detail to rebuild the same executables. Want to be able to identify if
built code is modified from something they've captured in an SVN
repo. Example: Tech-X
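As a sketch of the "is the built code modified from what is in SVN" check, the code below interprets the output of the `svnversion` command, which appends `M` to the revision when the working copy has local modifications. The function names and the record layout are hypothetical, and this is not a description of what Tech-X or any other group actually does.

```python
import re
import subprocess

# svnversion prints e.g. "4168", "4123:4168", "4168M", or "4123:4168MS".
_SVNVERSION = re.compile(r"^(?:(\d+):)?(\d+)([MSP]*)$")


def parse_svnversion(text):
    """Interpret svnversion output; return None if it is not a revision."""
    match = _SVNVERSION.match(text.strip())
    if match is None:
        return None  # e.g. an exported or unversioned directory
    low, high, flags = match.groups()
    return {
        "min_revision": int(low or high),
        "max_revision": int(high),
        "locally_modified": "M" in flags,
        "mixed_revisions": low is not None,
    }


def build_record(compiler, flags, working_copy="."):
    """Collect compiler details plus source state for one build (sketch)."""
    try:
        result = subprocess.run(["svnversion", working_copy],
                                capture_output=True, text=True, check=False)
        source_state = parse_svnversion(result.stdout)
    except FileNotFoundError:
        source_state = None  # svnversion is not installed on this machine
    return {
        "compiler": compiler,
        "compiler_flags": list(flags),
        "source_state": source_state,
    }
```

A build system could call `build_record` once per component build and store the result alongside the binaries.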

Need a mechanism for component (build) to record provenance details,
and make them available

Dynamically assembled application use case (Wael's approach)
  • Component path gives all available components
  • Runtime provenance/version statement from components as instantiated appears in output stream
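The output-stream approach could look roughly like the sketch below: each component writes one recognizable line when it is instantiated, and a downstream tool filters those lines out of the run's output. The `PROVENANCE` prefix, the `key=value` layout, and the class are all assumptions made for illustration; real CCA components are not plain Python classes.

```python
import io
import sys

PREFIX = "PROVENANCE"  # hypothetical marker for provenance lines


class Component:
    """Stand-in for a component that announces itself when instantiated."""
    name = "example.Solver"
    version = "0.0.0"

    def __init__(self, stream=sys.stdout):
        stream.write(f"{PREFIX} component={self.name} version={self.version}\n")


def collect_statements(output_text):
    """Pick provenance statements out of an application's output stream."""
    records = []
    for line in output_text.splitlines():
        if line.startswith(PREFIX + " "):
            # Split "key=value" fields that follow the prefix.
            records.append(dict(item.split("=", 1) for item in line.split()[1:]))
    return records


if __name__ == "__main__":
    buffer = io.StringIO()
    Component(buffer)
    buffer.write("ordinary solver output\n")
    print(collect_statements(buffer.getvalue()))
```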

Dynamically assembled application use case (David's approach)
  • BuilderService provides info about components in application
  • Provenance service collects provenance info and reports it in unified form
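A rough sketch of the service-based approach, which also covers the need to track the application's composition over time. These are plain Python stand-ins: `ComponentInfo` and `ProvenanceService` are hypothetical names, and the component list is passed in directly rather than obtained through the real BuilderService interface.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentInfo:
    """What the framework is assumed to report about one component."""
    instance_name: str
    class_name: str
    version: str = "unknown"


@dataclass
class ProvenanceService:
    """Keeps a history of application compositions and reports it uniformly."""
    history: list = field(default_factory=list)

    def snapshot(self, timestamp, components):
        """Record the current composition if it differs from the last one."""
        current = sorted((c.instance_name, c.class_name, c.version)
                         for c in components)
        if not self.history or self.history[-1]["components"] != current:
            self.history.append({"time": timestamp, "components": current})

    def report(self):
        """One line per component per recorded composition."""
        lines = []
        for entry in self.history:
            for instance, cls, version in entry["components"]:
                lines.append(f"{entry['time']} {instance} {cls} {version}")
        return "\n".join(lines)
```

Calling `snapshot` whenever a component is added or removed would yield a timeline from which the composition at any moment can be recovered.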

Kosta's work on data provenance: in HPC contexts probably gives too
much information.

Idea: extract provenance information during build (probably part of
bocca-generated build system), stuff into (a) component metadata (.cca
file), and (b) auto-generated provenance port.

Provenance info
  • Version number/string
  • Some Dublin Core info

Consider XML markup which can be parsed downstream (provenance service
or info management app) to pick out desired info
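To make the XML idea concrete, here is a small sketch that writes a version string plus a few Dublin Core elements and then picks selected fields back out, as a downstream provenance service might. The `provenance` and `version` element names are invented for this example; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)


def build_provenance_xml(version, dublin_core):
    """Build a provenance document from a version and {dc-name: value} pairs."""
    root = ET.Element("provenance")          # hypothetical root element
    ET.SubElement(root, "version").text = version
    for name, value in dublin_core.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def extract(xml_text, wanted):
    """Downstream side: pull out the version and selected Dublin Core fields."""
    root = ET.fromstring(xml_text)
    found = {"version": root.findtext("version")}
    for name in wanted:
        found[name] = root.findtext(f"{{{DC}}}{name}")
    return found


if __name__ == "__main__":
    text = build_provenance_xml("1.2.3", {"title": "Example solver",
                                          "creator": "someone"})
    print(text)
    print(extract(text, ["creator"]))
```

Because consumers ask only for the fields they want, additional elements captured by other groups can be added without breaking existing readers.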

Look at what other groups are capturing: Open Speedshop, PERI
participants. Want to be sure we can accommodate it in any CCA
approach.

Created by: bernhold last modification: Monday 03 of August, 2009 [12:37:28 UTC] by bernhold

