Jeff Daly: Problem solving environment (general, target subsurface)
- Input files
- Output files
- Execution machine
- When run?
- Minimum necessary to reproduce same executable
- Capture paths, names for executable
- Not version information
- Best effort only
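A minimal sketch of the kind of best-effort run record listed above, assuming Python. The `RunRecord` name, its fields, and the file names are hypothetical illustrations, not part of any existing tool.

```python
import json
import platform
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """Best-effort record of one run (hypothetical structure)."""
    executable: str     # path/name of the executable, no version info
    input_files: list
    output_files: list
    # Execution machine, filled in from the host name at creation time.
    machine: str = field(default_factory=platform.node)
    # "When run?" as a UTC timestamp in ISO 8601 form.
    started: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self):
        """Serialize the record so it can be dumped next to the outputs."""
        return json.dumps(asdict(self), indent=2)


# Placeholder file names, for illustration only.
record = RunRecord(executable=sys.executable,
                   input_files=["input.dat"],
                   output_files=["output.dat"])
print(record.to_json())
```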
Ben Allan: Current practice: archive binaries (whole operating
systems) for later execution in VMs. Examples: weapons, environmental
management
Ian Gorton: Heading towards versioning of models, inputs, etc.
Need: identify which components (and versions) constituted the
application at any moment in time. Track over time since application
might be dynamic.
Jeff: Sufficient for CCA app to dump file w/ provenance info for ingestion
Jeff: Data model: triple store w/ "standard" terms
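One way to picture the triple-store data model is the sketch below, assuming an in-memory set of (subject, predicate, object) tuples. The `TripleStore` class, the `urn:example:` vocabulary, and the identifiers are hypothetical; only the Dublin Core terms namespace and its `hasPart` term are real "standard" terms.

```python
DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
EX = "urn:example:provenance#"          # hypothetical vocabulary


class TripleStore:
    """Tiny in-memory triple store (illustrative only)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o))


store = TripleStore()
store.add("app:run42", DCTERMS + "hasPart", "component:Solver")
store.add("app:run42", DCTERMS + "hasPart", "component:Mesh")
store.add("component:Solver", EX + "version", "1.2.0")

# Which components made up the application?
parts = [o for _, _, o in store.match(s="app:run42", p=DCTERMS + "hasPart")]
print(parts)
```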
Ben: BuilderService events are not sufficient because SIDL objects
can also be created
David: BuilderService events give the equivalent of the current
"application" identity, assuming it consists entirely of CCA
components. Recognize the limitations, but accept them. An instrumented
loader would have a much harder time giving info that can be connected
up to things the user/developer can recognize.
David: Some groups collect much more detailed info about the build
(compiler flags, versions of key OS elements). Want to have enough
detail to rebuild the same executables. Want to be able to identify if
built code is modified from something they've captured in an SVN
repo. Example: Tech-X
Need a mechanism for component (build) to record provenance details,
and make them available
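A rough sketch of what a build-time record might hold, assuming Python and assuming the build system passes in the output of the `svnversion` command as a string. The function names and the compiler and flag values are hypothetical placeholders. A trailing `M` in `svnversion` output marks a working copy with local modifications, which addresses the "modified from the repo" question.

```python
import re

# svnversion prints e.g. "4168", "4123:4168" (mixed revisions),
# "4168M" (modified), "4123S" (switched), "4123P" (partial).
_SVNVERSION = re.compile(r"^(\d+)(?::(\d+))?([MSP]*)$")


def parse_svnversion(text):
    """Interpret svnversion output; unknown forms yield revision None."""
    m = _SVNVERSION.match(text.strip())
    if m is None:
        # e.g. an exported or unversioned tree: nothing can be said.
        return {"revision": None, "mixed": False, "modified": None}
    low, high, flags = m.groups()
    return {
        "revision": int(high or low),   # highest revision present
        "mixed": high is not None,
        "modified": "M" in flags,
    }


def build_record(compiler, flags, svnversion_text):
    """Bundle build details into one dict (hypothetical layout)."""
    return {
        "compiler": compiler,
        "flags": list(flags),
        "source": parse_svnversion(svnversion_text),
    }


# Placeholder values, for illustration only.
record = build_record("example-cc 1.0", ["-O2", "-g"], "4168M")
print(record)
```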
Dynamically assembled application use case (Wael's approach)
- Component path gives all available components
- Runtime provenance/version statement from components as instantiated appears in output stream
Dynamically assembled application use case (David's approach)
- BuilderService provides info about components in application
- Provenance service collects provenance info and reports it in unified form
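The collect-and-report idea in David's approach could look roughly like the sketch below, assuming Python. `Component`, `ProvenanceService`, and `on_instantiated` are hypothetical stand-ins and are not CCA or BuilderService interfaces; the point is only that a single service gathers per-component statements and emits them in one form.

```python
class Component:
    """Stand-in for a component that can state its own provenance."""

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def provenance(self):
        return {"name": self.name, "version": self.version}


class ProvenanceService:
    """Collects provenance from components as they are instantiated."""

    def __init__(self):
        self._components = []

    def on_instantiated(self, component):
        # In a real framework this would be driven by an event
        # announcing the new component instance (assumption).
        self._components.append(component)

    def report(self):
        """Return one unified, name-ordered list of provenance records."""
        ordered = sorted(self._components, key=lambda c: c.name)
        return [c.provenance() for c in ordered]


service = ProvenanceService()
for comp in (Component("Solver", "1.2.0"), Component("Mesh", "0.9")):
    service.on_instantiated(comp)

report = service.report()
print(report)
```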
Kosta's work on data provenance: in HPC contexts probably gives too
much information.
Idea: extract provenance information during build (probably part of
bocca-generated build system), stuff into (a) component metadata (.cca
file), and (b) auto-generated provenance port.
Provenance info
- Version number/string
- Some Dublin Core info
Consider XML markup which can be parsed downstream (provenance service
or info management app) to pick out desired info
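A small sketch of the downstream parsing step, assuming Python's standard `xml.etree.ElementTree`. The `provenance` root element, the `version` element, and the sample values are hypothetical; the `dc:` elements use the real Dublin Core element set namespace.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set

# Hypothetical provenance document a component might emit.
SAMPLE = """<provenance xmlns:dc="http://purl.org/dc/elements/1.1/">
  <version>1.2.0</version>
  <dc:title>Solver</dc:title>
  <dc:creator>example-team</dc:creator>
</provenance>"""


def pick(xml_text, wanted):
    """Return only the requested child elements that are present."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in wanted:
        elem = root.find(tag, {"dc": DC})   # resolve the dc: prefix
        if elem is not None:
            found[tag] = elem.text
    return found


# A consumer picks out just the fields it cares about.
info = pick(SAMPLE, ["version", "dc:title", "dc:date"])
print(info)
```

Fields that are absent (here `dc:date`) are simply skipped, so different consumers can ask for different subsets of the same markup.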
Look at what other groups are capturing: Open Speedshop, PERI
participants. Want to be sure we can accommodate it in any CCA
approach.