shaunc
02/21/2022, 4:03 PM
antony.milne
02/21/2022, 4:55 PM
a) `SessionStore` is definitely meant to be a configurable piece with a defined API. The way you would customise this is through `SESSION_STORE_CLASS` and `SESSION_STORE_ARGS` in your project's `settings.py` file: https://github.com/kedro-org/kedro/blob/develop/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/settings.py. The fact that `SQLiteStore` is defined in kedro-viz is a temporary convenience while the feature is developed - at some point in the future, it should become part of core kedro.
b) That would be https://github.com/kedro-org/kedro/blob/326450b78e676fea440bde645c32637136d1d4cd/kedro/framework/session/store.py#L11, although as you can see from `SQLiteSessionStore`, that's not a restrictive API that stops you from adding other pieces: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/integrations/kedro/sqlite_store.py
c) Basically the version timestamp is used to uniquely identify a `kedro run`. This is recorded in the session store and is the same as the version used for versioning datasets. If a user has specified to load different versions using `load_version`, then this should also be available in the session store because it's part of the run command arguments.
shaunc
02/21/2022, 5:05 PM
1. `class BaseSessionStore(UserDict):` -- will this work with duck typing, or will I need to inherit from this? (Any chance of using a `typing.Protocol` here?)
2. `read -> Dict[str, Any]` is a little opaque as to what the expectations are... 🙂
3. Apropos IDs, could I use the content-based hashes from DVC instead of a timestamp? (Are you using the timestamp as something other than an ID? Could you make that configurable if so, perhaps? -- I can also add a timestamp, but to coordinate it would be nice if the two were optionally not the same thing.)
Is `SessionRepository` also going to be configurable?
antony.milne
02/21/2022, 9:28 PM
`SQLiteSessionStore` is really the first "proper" session store we've ever had, and it was developed specifically for experiment tracking and is pretty new. So basically some of the behaviour here hasn't necessarily been fully figured out or has been left deliberately vague and open-ended, to be determined by future requirements and user feedback. Outside experiment tracking I'm sure you're the first person who has considered writing a custom session store. So you’re sort of hitting the limits of what kedro has well-defined here, and any thoughts you have are very much welcomed!
We actually have a ticket in this sprint to try and figure out whether we should have session_id == run_id == dataset save version (the timestamp), as is currently the case - you should definitely take a look here and leave a comment 🙂 e.g. if it would be useful to be able to set a custom run_id or control these properties independently. https://github.com/kedro-org/kedro/issues/1273
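To make the distinction between these IDs concrete, here is a minimal sketch, not Kedro or DVC code. `timestamp_id` and `content_id` are hypothetical helper names, the timestamp layout is only assumed to resemble Kedro's dataset version strings, and SHA-256 stands in for whatever content hash a tool like DVC would supply.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def timestamp_id(now: Optional[datetime] = None) -> str:
    """Hypothetical helper: a millisecond-resolution UTC timestamp ID."""
    now = now or datetime.now(tz=timezone.utc)
    # %f yields microseconds (6 digits); drop the last 3 to keep milliseconds.
    return now.strftime("%Y-%m-%dT%H.%M.%S.%f")[:-3] + "Z"


def content_id(payload: bytes) -> str:
    """Hypothetical helper: an ID derived from content rather than time."""
    return hashlib.sha256(payload).hexdigest()


# Keeping the two as separate fields lets a run be identified by content
# while its datasets are still versioned by time.
run_record = {
    "run_id": content_id(b"params + code + data"),
    "save_version": timestamp_id(),
}
```

With the fields separated like this, two runs over identical inputs share a `run_id` but still get distinct save versions.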
1. Currently this works with duck typing, but it's maybe not obvious what methods you need to provide, so it's probably best to just inherit from `BaseSessionStore`. Here's the only file where the session store is used, I think: https://github.com/kedro-org/kedro/blob/main/kedro/framework/session/session.py. It seems that the requirements for a valid session store are that it can be initialised with certain arguments (see `def _init_store`) and it exposes certain methods, some of which are only in `UserDict` and not explicitly in `BaseSessionStore` or any classes below that (like `update`). Note that `SQLiteSessionStore` doesn't actually define a `read` method (not sure why actually). Defining the requirements properly using `typing.Protocol` sounds like a good idea, but I guess we'd have to figure out exactly what those requirements are first...
The session repository (`RunsRepository`) is defined here: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/data_access/repositories/runs.py . I don’t think this is likely to become configurable any time soon since it’s part of the data access layer of Kedro-Viz, which doesn’t have a system for injecting custom components like Kedro’s `settings.py`.
shaunc
02/22/2022, 5:55 AM
Does `RunsRepository` really need to live in Kedro-Viz? I don't know what other things Kedro-Viz is keeping track of persistently, but could at least this be made into a separate service? Kedro could provide a default, but `kedro-dvc` could override it, and Kedro-Viz wouldn't have to care which it was using.
datajoely
02/22/2022, 9:01 AM
limdauto
02/22/2022, 9:16 AM
`RunsRepository` etc. are all quite specific to Kedro-Viz's needs at the moment.
What we are hoping to achieve is that, if the SQLite-based session store proves to be useful, we will iterate and stabilise the interface in Kedro-Viz and backport it into Kedro in the future as a first-class citizen. You are welcome to use it, but expect it to change.
shaunc
02/22/2022, 2:10 PM
(1) via `dvc repro` - which updates the state in the working tree, without "marking", but storing results in the "run cache" (https://dvc.org/doc/user-guide/experiment-management#run-cache-automatic-log-of-stage-runs) and (2) via `dvc exp run` - (https://dvc.org/doc/user-guide/experiment-management/running-experiments#running-the-pipelines) which associates results with a git reference (utilizing `git stash` machinery).
@User -- I'm hoping you don't have to go very deep at all, but will be able to rely on DVC! 🙂 You have concentrated on visualization, which isn't part of DVC (being in their premium offering, DVC Studio). If you can establish appropriate hooks, we can use Kedro-DVC to store experiments, and also to fork, publish, share, collect in one repo, etc. -- while still using Kedro to visualize them. But I would presume that, in order to visualize, Kedro-Viz needs a mechanism to access stage metrics and plots across experiments. I'd rather not have to hack in a duplicate history, but would prefer a (pluggable) API in order to keep things clean and DRY. For you, this might have the benefit of separation of concerns, regardless of DVC, keeping Kedro-Viz from getting bloated.
limdauto
02/22/2022, 3:21 PM
`kedro run` -- and in this entrypoint it's 1 session, 1 run. However, people can also start a session in other entrypoints such as a Jupyter notebook and in theory can do many runs per session. Thanks for the link. Let me check it out tonight and I will circle back.
shaunc
02/22/2022, 3:31 PM
`SESSION_STORE_CLASS` as well? Do you have any plans with respect to the rest of Kedro on how it uses what it consumes from this API? I would guess that, if I override it to present a view of DVC experiments, I'll end up breaking Kedro-Viz at the moment. (?)
limdauto
02/22/2022, 4:06 PM
`read` and `save` interface.
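As a rough illustration of a store exposing just `read` and `save`, here is a minimal sketch, not the actual Kedro or Kedro-Viz implementation. `SessionStoreLike` and `JSONFileStore` are hypothetical names, and the `(path, session_id)` constructor arguments are an assumption modelled on `BaseSessionStore`; check `_init_store` for what Kedro really passes.

```python
import json
import tempfile
from collections import UserDict
from pathlib import Path
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class SessionStoreLike(Protocol):
    """Hypothetical protocol: only the two methods named above."""

    def read(self) -> Dict[str, Any]: ...

    def save(self) -> None: ...


class JSONFileStore(UserDict):
    """Hypothetical store keeping one JSON file per session ID."""

    def __init__(self, path: str, session_id: str):
        self._file = Path(path) / f"{session_id}.json"
        # Start from whatever an earlier session with this ID persisted.
        super().__init__(self.read())

    def read(self) -> Dict[str, Any]:
        if self._file.exists():
            return json.loads(self._file.read_text())
        return {}

    def save(self) -> None:
        self._file.parent.mkdir(parents=True, exist_ok=True)
        self._file.write_text(json.dumps(self.data))


with tempfile.TemporaryDirectory() as tmp:
    store = JSONFileStore(tmp, "example-session")
    store.update({"command": "kedro run"})  # update() comes from UserDict
    store.save()
    reloaded = JSONFileStore(tmp, "example-session")
    restored_command = reloaded["command"]
```

Because the protocol is structural, the store passes an `isinstance` check against `SessionStoreLike` without inheriting from it, which is the duck-typing behaviour discussed earlier in the thread.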
Re compatibility with `Kedro-Viz`: yes, if you provide a custom implementation of the session store and ask your users to use it, the experimentation tracking tab on Kedro-Viz won't work but other features should still be fine. But I think this will be the case at a product level, right? Why would people use Kedro-Viz experimentation tracking if they choose to go with DVC?
shaunc
02/22/2022, 4:10 PM
limdauto
02/22/2022, 4:16 PM
shaunc
02/22/2022, 4:30 PM
`trace` with DVC data dependencies.... So actually writing a Kedro-compatible session store for us probably is an issue for May.)
I'm hoping that thinking about it in this manner will also help you in your design.
[Just took a brief look at the Neptune integration; it seems to be pre-experiment-tracking -- not surprising 🙂 -- which offers comparisons between different pipelines. I also hope for a purely hook-based implementation... which is why I'm asking for hooks.]
idanov
02/22/2022, 5:36 PM
`KedroSession` and the store, so it's hard for us to give you many useful details. We mainly mean to use the store as a way to save the details from each run, so we can visualise it in Viz or simply for investigative reasons. You pairing it with DVC sounds about right, and we'd love to keep in touch with what you do with it. As it's fairly free-form at the moment, probably just go with what seems most reasonable for you and then share it back with us. We'll consider your use case when we are advancing the design and hopefully not make too many breaking changes on the way. At the moment Kedro-Viz is making some assumptions, which may not hold in the future. We did that just to progress the Experiment Tracking work, but if changes arise, we'll update Viz to work with the new format.
shaunc
02/22/2022, 5:49 PM
`RunsRepository` but I presume that it may be a bit more of a challenge to figure out how to abstract. If you are open to it, I would be happy to brainstorm at some point on what the requirements and interface should be.
limdauto
02/22/2022, 5:54 PM
shaunc
02/22/2022, 5:58 PM
`dvc.repo.Repo` instance to query a git repo and some internal DVC stuff. One question will be whether the interface will be returning -- e.g. -- paths to files with metrics, or the metrics themselves. (I guess the former?)