# beginners-need-help
a
Hello 🙂 I'm using Kedro to train a model and then predict values with it. The model is saved as a versioned dataset after training, and for prediction I usually use the latest version. But if I want to use a specific version, I have to specify it on the command line when calling `kedro run` (as described in the Kedro documentation). Is there a way to access the specified version from within the project? I want to write a log for my predictions to keep things reproducible. Thanks, Anna
a
Hello! The best way to do this is by defining an `after_catalog_created` hook, a bit like this one: https://kedro.readthedocs.io/en/stable/extend_kedro/hooks.html#hook-implementation
`load_versions` will be a dictionary of the form `{dataset_name: load_version}`, which you can then log by using `self._logger.info(load_versions)`.
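A minimal, Kedro-free sketch of what the hook could do with `load_versions` before logging it. The helper `describe_load_versions` is hypothetical (not part of Kedro), and it assumes that any dataset without an entry in the mapping falls back to the latest version, as described in the question. It treats a missing (`None`) or empty mapping the same way.

```python
import logging
from typing import Dict, List, Optional


def describe_load_versions(
    load_versions: Optional[Dict[str, str]], datasets: List[str]
) -> List[str]:
    """Return one log line per dataset, noting pinned vs. latest versions.

    Hypothetical helper: `datasets` is a list of versioned dataset names
    you care about (e.g. ["model"]); it is not provided by Kedro.
    """
    pinned = load_versions or {}  # treat None and {} the same way
    lines = []
    for name in datasets:
        version = pinned.get(name)
        if version is None:
            # Assumption: no pinned version means the latest one is loaded
            lines.append(f"{name}: latest (no --load-version given)")
        else:
            lines.append(f"{name}: {version}")
    return lines


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("DataCatalogHooks")
    for line in describe_load_versions({"model": "2022-05-15T05.24.31.017Z"}, ["model"]):
        logger.info(line)
```

Keeping the formatting in a plain function like this makes it easy to unit-test separately from the hook itself.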
You might also like to check out the experiment tracking functionality, which is designed for exactly this sort of thing: https://kedro.readthedocs.io/en/stable/tutorial/set_up_experiment_tracking.html
This way you'd save the prediction as a `MetricsDataSet`. Each time you do a `kedro run`, the run command you used is saved as part of the information that can be shown in Kedro-Viz, so you can ensure reproducibility that way.
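As a sketch, assuming `MetricsDataSet` expects a flat dictionary of float values (check the experiment-tracking docs for your Kedro version), a prediction node could return summary numbers rather than the raw predictions. The node name `summarize_predictions` and the chosen statistics are hypothetical:

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_predictions(predictions: Sequence[float]) -> Dict[str, float]:
    """Hypothetical node: reduce raw predictions to a flat dict of floats.

    A dict like this is the kind of output you would map to a
    MetricsDataSet entry in catalog.yml so that each run is tracked.
    """
    if not predictions:
        raise ValueError("need at least one prediction to summarize")
    return {
        "n_predictions": float(len(predictions)),
        "mean_prediction": float(mean(predictions)),
        "min_prediction": float(min(predictions)),
        "max_prediction": float(max(predictions)),
    }
```

Every value is cast to `float` explicitly, so an integer count does not slip through as an `int`.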
a
Thanks a lot! I will try to do this 😊
Hi again, I tried different things but it's still not working for me. I used `after_catalog_created`, but I don't understand how to access the load version. I do have a catalog.yml, and some of the information specified in it is written to a log file. I also have a hooks.py, where I added `after_catalog_created` with several parameters, one of which is `load_versions: Dict[str, str]`, and I return `DataCatalog.from_config(..., load_versions, ...)`. In catalog.yml I define a `model` entry (with type `PickleDataSet`, a file path, `versioned: True`, and a layer). I start the pipeline with `kedro run --pipeline pr --load-version="model:2022-05-15T05.24.31.017Z"`. How can I now access this load version (2022-05-15...)?
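To see where that value ends up, here is a hypothetical sketch (not Kedro's actual CLI code) of how a `--load-version` argument of the form `dataset:version` maps onto the `load_versions` dictionary the hook receives. It splits on the first colon only, which works here because the version string in the question uses dots, not colons, in its time part:

```python
from typing import Dict, Iterable


def parse_load_versions(values: Iterable[str]) -> Dict[str, str]:
    """Hypothetical parser: ["model:2022-05-15T05.24.31.017Z"] -> {"model": "..."}."""
    load_versions: Dict[str, str] = {}
    for value in values:
        # Split only on the first colon: "dataset:version"
        name, sep, version = value.partition(":")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"expected 'dataset:version', got {value!r}")
        load_versions[name.strip()] = version.strip()
    return load_versions
```

For example, `parse_load_versions(["model:2022-05-15T05.24.31.017Z"])` gives `{"model": "2022-05-15T05.24.31.017Z"}`, which is the shape `load_versions` has inside the hook.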
a
You'd need to do something like this:
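```python
# hooks.py
import logging
from typing import Dict

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog


class DataCatalogHooks:
    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def after_catalog_created(
        self, catalog: DataCatalog, load_versions: Dict[str, str]
    ) -> None:
        # load_versions maps dataset names to the versions passed via --load-version
        self._logger.info(load_versions)
```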
And then make sure `DataCatalogHooks()` is in `HOOKS` in settings.py 🙂
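`self._logger.info(...)` only ends up in a file if some handler writes that logger's records to one. In a Kedro project this is typically configured in the project's logging configuration; as a plain-stdlib sketch (the helper name `log_to_file` and the log path are hypothetical), you can attach a `logging.FileHandler` to the logger named after the hook class:

```python
import logging


def log_to_file(logger_name: str, path: str) -> logging.Logger:
    """Attach a FileHandler so INFO records from `logger_name` are written to `path`."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # "DataCatalogHooks" matches self.__class__.__name__ in the hook above
    logger = log_to_file("DataCatalogHooks", "prediction_versions.log")
    logger.info({"model": "2022-05-15T05.24.31.017Z"})
```

Because the logger name comes from `self.__class__.__name__`, the handler has to be attached to `"DataCatalogHooks"` (or a parent logger) for the hook's messages to reach the file.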
a
It worked, thanks a lot! 😊 I just wasn't writing `load_versions` to the log file… 😅