# beginners-need-help
I've got a question regarding the deployment/portability of a pipeline created with Kedro from one machine to another. I don't really know if this is the correct channel for it, but here it goes: following the corresponding section in the docs and the documentation specific to the Kedro-Docker plugin, I've been able to create a Docker image of my Kedro project that I can run with the
kedro docker run
command. This command mounts the required data/conf/logs folders as volumes and then runs kedro inside the container. All good! But now let's say I would like to migrate my finalised project from my development machine to the machine in my lab where we would run the pipeline directly: what steps are needed to use the dockerized pipeline there? The documentation suggests pushing the built Docker image to the registry and then pulling it in the "production" env, but that brings with it neither the catalog, nor the folder structure of the data folder, nor the Kedro CLI itself.
Hello! There are some things that are deliberately not added to the docker image. These are described in the .dockerignore file: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-docker/kedro_docker/template/.dockerignore If you wish to include them then the right flow would be to run
kedro docker init
to first generate the Dockerfile and .dockerignore, modify them as you please, and then run
kedro docker build
. As for not including the kedro CLI itself, you have two options. Either
pip install kedro-docker
on the machine where you want to do
kedro docker run
or just run the raw
docker run
command itself.
kedro docker run
is just a thin wrapper for `docker run`: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-docker/kedro_docker/plugin.py#L210
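Since the wrapper ends up calling `docker run`, the raw command can be assembled by hand on a machine that has Docker but no Kedro. Below is a minimal sketch of that idea, not the plugin's actual code. The mounted folders (`conf/local`, `data`, `logs`) and the in-container path `/home/kedro_docker` are assumptions here, and `build_docker_run_cmd` is a hypothetical helper name; check both values against your own Dockerfile and the plugin source linked above.

```python
from pathlib import Path, PurePosixPath
import shlex

# Assumptions, not taken from the plugin: which folders get mounted and
# where the project lives inside the container. Verify against your Dockerfile.
ASSUMED_VOLUMES = ("conf/local", "data", "logs")
ASSUMED_CONTAINER_HOME = "/home/kedro_docker"


def build_docker_run_cmd(image, project_dir, volumes=ASSUMED_VOLUMES,
                         container_home=ASSUMED_CONTAINER_HOME):
    """Return the argv list for a plain `docker run` that executes `kedro run`."""
    project_dir = Path(project_dir)
    cmd = ["docker", "run", "--rm"]
    for vol in volumes:
        host = project_dir / vol                          # folder on the host
        target = PurePosixPath(container_home) / vol      # folder in the container
        cmd += ["-v", f"{host}:{target}"]
    # Kedro lives inside the image, so the host only needs Docker.
    cmd += [image, "kedro", "run"]
    return cmd


if __name__ == "__main__":
    # "my-kedro-project" is a placeholder image name.
    print(shlex.join(build_docker_run_cmd("my-kedro-project", Path.cwd())))
```

Printing the command rather than executing it makes it easy to compare with what the wrapper runs on the development machine before relying on it elsewhere.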
Thanks for the tips! I already had a go at modifying the default Dockerfile and .dockerignore, and I was fairly sure that I would need to provide the required folder structure for the
folders myself if I wanted to decouple the container from the Python env containing Kedro. I will probably try to write a Python script for my "production" deployment that automatically pulls the image with the Python Docker SDK and creates the required folders before running the container. I must admit that the deployment section is probably the most confusing part of the docs. I know that DevOps is a difficult and diverse topic, but I think it would probably be better to show how a start-to-finish deployment on a single machine could be carried out with Docker/Packaging/CLI, like the Airflow example.
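A script along the lines described above could look roughly like this. It is only a sketch under stated assumptions: the folder list, the `/home/kedro_docker` mount point, and the helper names `prepare_folders`, `volume_spec` and `deploy` are all hypothetical, and the image is assumed to have Kedro and the project installed so that `kedro run` works inside it. The Docker SDK for Python (`pip install docker`) is imported lazily, so the folder helpers work without it.

```python
from pathlib import Path

# Hypothetical defaults: adjust to the folders your catalog actually uses.
REQUIRED_DIRS = ("conf/local", "data", "logs")
CONTAINER_HOME = "/home/kedro_docker"  # assumption, check your Dockerfile


def prepare_folders(root, dirs=REQUIRED_DIRS):
    """Create the host folders that will be mounted into the container."""
    root = Path(root)
    created = []
    for rel in dirs:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(path)
    return created


def volume_spec(root, dirs=REQUIRED_DIRS, container_home=CONTAINER_HOME):
    """Build the `volumes` mapping accepted by the SDK's containers.run()."""
    root = Path(root).resolve()  # Docker needs absolute host paths
    return {
        str(root / rel): {"bind": f"{container_home}/{rel}", "mode": "rw"}
        for rel in dirs
    }


def deploy(image, root):
    """Create the folders, pull the image, and run the pipeline once."""
    import docker  # Docker SDK for Python, imported only when deploying

    prepare_folders(root)
    client = docker.from_env()
    client.images.pull(image)
    # With detach left at its default, run() blocks and returns the logs.
    logs = client.containers.run(
        image, "kedro run", volumes=volume_spec(root), remove=True
    )
    return logs.decode()
```

Calling something like `deploy("registry.example.com/my-kedro-project:latest", "/srv/pipeline")` (placeholder names) would then be the whole "production" entry point. Configuration and input data still have to be copied into the created folders separately, since they are not part of the image.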
@Kastakin thanks for the feedback - I definitely agree. Ultimately we're looking to improve the deployment model as a whole, e.g. maybe through some
kedro deploy airflow/prefect/docker
command, but this may be some time off. As you say, it's a tricky area because there are so many different tools out there and we're not necessarily well-versed in them! Even the relatively simple case of kedro-docker isn't very actively maintained and is probably ripe for a refresh. It's also a bit arbitrary which deployments get an official plugin vs. a 3rd party plugin vs. a blog post somewhere vs. a documentation page.