Kastakin
05/13/2022, 7:20 AM
`kedro docker run` command. This command mounts the required data/conf/logs folders as volumes and then runs Kedro inside the container.
All good! But now let's say I would like to migrate my finalised project from my development machine to the machine in my lab, where we would run the pipeline directly. What steps are needed to use the dockerized pipeline there?
The documentation suggests pushing the built Docker image to the registry and then pulling it in the "production" env, but that brings with it neither the catalog, the folder structure of the data folder, nor the Kedro CLI itself.
antony.milne
05/13/2022, 8:00 AM
`kedro docker init` to first generate the Dockerfile and .dockerignore, modify them as you please, and then run `kedro docker build`.
As for not including the Kedro CLI itself, you have two options. Either `pip install kedro-docker` on the machine where you want to do `kedro docker run`, or just run the raw `docker` command itself. `kedro docker run` is just a thin wrapper for `docker run`: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-docker/kedro_docker/plugin.py#L210
Kastakin
05/13/2022, 9:12 AM
`data` and `logs` folders myself if I wanted to decouple the container and the Python env containing Kedro.
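For reference, the raw `docker run` call mentioned above might look roughly like the sketch below. It only builds the argument list; the folder names follow the data/conf/logs layout from this thread, while the image name and the `/home/kedro_docker` container path are assumptions that should be checked against the generated Dockerfile.

```python
from pathlib import Path
import shlex

# Assumption: these are the folders that need to be mounted, following the
# data/conf/logs layout mentioned in the thread.
MOUNTED_FOLDERS = ("conf", "data", "logs")


def build_docker_run_command(project_dir, image, container_root="/home/kedro_docker"):
    """Return the argv list for a raw `docker run` that bind-mounts project folders.

    `container_root` is a hypothetical default; use the working directory
    that your own Dockerfile sets up.
    """
    project_dir = Path(project_dir).resolve()
    command = ["docker", "run", "--rm"]
    for folder in MOUNTED_FOLDERS:
        host_path = project_dir / folder
        # -v <host path>:<container path> bind-mounts the host folder.
        command += ["-v", f"{host_path}:{container_root}/{folder}"]
    # The image name is followed by the command to run inside the container.
    command += [image, "kedro", "run"]
    return command


if __name__ == "__main__":
    # "my-registry/my-kedro-project:latest" is a placeholder image name.
    argv = build_docker_run_command(".", "my-registry/my-kedro-project:latest")
    print(shlex.join(argv))
```

The printed command can be pasted into a shell on the lab machine, or the list can be passed to `subprocess.run(argv, check=True)`.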
I will probably try to write a Python script for my "production" deployment that automatically pulls the image with the Docker SDK for Python and creates the required folders before running the container.
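A minimal sketch of what such a script could look like, assuming the Docker SDK for Python (`pip install docker`) is installed on the lab machine. The folder list, the `/home/kedro_docker` container path, and the function names are illustrative assumptions, not something kedro-docker provides.

```python
from pathlib import Path

# Assumption: folders the container expects, per the data/conf/logs layout
# mentioned in the thread.
REQUIRED_FOLDERS = ("conf", "data", "logs")


def prepare_folders(base_dir, container_root="/home/kedro_docker", folders=REQUIRED_FOLDERS):
    """Create the host folders and return a volumes mapping for the Docker SDK.

    `container_root` is a hypothetical path; match it to your Dockerfile.
    """
    base_dir = Path(base_dir).resolve()
    volumes = {}
    for folder in folders:
        host_path = base_dir / folder
        host_path.mkdir(parents=True, exist_ok=True)
        volumes[str(host_path)] = {"bind": f"{container_root}/{folder}", "mode": "rw"}
    return volumes


def pull_and_run(image, base_dir, container_root="/home/kedro_docker"):
    """Pull the image, create the folders, and run `kedro run` in a container."""
    import docker  # third-party package, imported lazily

    client = docker.from_env()
    client.images.pull(image)
    volumes = prepare_folders(base_dir, container_root)
    # With detach left at its default (False), run() blocks and returns the logs.
    logs = client.containers.run(image, command="kedro run", volumes=volumes, remove=True)
    return logs.decode()
```

Calling something like `pull_and_run("my-registry/my-kedro-project:latest", "/srv/kedro")` (both values are placeholders) would then cover the pull, folder setup, and run in one step. Note that `conf` is created empty here, so the catalog and other config files would still need to be copied over or baked into the image.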
I must admit that the deployment section is probably the most confusing part of the docs. I know that DevOps is a difficult and diverse topic, but I think it would be better to show how a start-to-finish deployment on a single machine could be carried out with Docker/Packaging/CLI, like the Airflow example.
antony.milne
05/16/2022, 9:21 AM
`kedro deploy airflow/prefect/docker` command, but this may be some time off. As you say, it's a tricky area because there are so many different tools out there and we're not necessarily well-versed in them! Even the relatively simple case of kedro-docker isn't very actively maintained and is probably ripe for a refresh. It's also a bit arbitrary which deployments get an official plugin vs. a 3rd party plugin vs. a blog post somewhere vs. a documentation page.