We currently use Kedro to build the pipelines for our ML models.
The Kedro documentation explains how to deploy a Kedro pipeline on AWS Step Functions, where every Kedro node runs as an AWS Lambda function, or how to run the entire pipeline in SageMaker.
But our team wanted to deploy the pipeline to AWS with a mix of compute targets: some nodes running as Lambda functions, some (like the node that trains the model) as SageMaker training jobs, and some long-running nodes as ECS/SageMaker processing jobs.
Our team wrote a plugin that manages to do this. The developer just adds a tag to a node (like "lambda", "ECS", or "Sagemaker_train"). Then, when the dev runs a command that the plugin adds, we parse the pipeline and, based on the tags, use CDK to deploy it. The plugin contains the CDK code needed to deploy the pipeline on AWS Step Functions.
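The tag-routing idea above can be sketched roughly like this. This is a hypothetical illustration, not the actual plugin code: a plain dataclass stands in for a Kedro node (real Kedro nodes already carry a `tags` set), and in the real plugin each bucket would be fed into the matching CDK construct (Lambda, ECS task, SageMaker training job):

```python
from dataclasses import dataclass, field
from collections import defaultdict

# Stand-in for a Kedro node; in the plugin these would come from the
# parsed Kedro pipeline, each with the tags the developer added.
@dataclass(frozen=True)
class Node:
    name: str
    tags: frozenset = field(default_factory=frozenset)

# Compute-target tags the plugin recognises (names taken from the post).
COMPUTE_TAGS = {"lambda", "ECS", "Sagemaker_train"}

def group_nodes_by_compute(nodes, default="lambda"):
    """Bucket pipeline nodes by their compute tag so each bucket can be
    handed to the matching CDK construct. Untagged nodes fall back to a
    default target (an assumption made here for illustration)."""
    groups = defaultdict(list)
    for node in nodes:
        targets = node.tags & COMPUTE_TAGS
        if len(targets) > 1:
            raise ValueError(
                f"node {node.name!r} has conflicting compute tags: {sorted(targets)}"
            )
        target = next(iter(targets)) if targets else default
        groups[target].append(node.name)
    return dict(groups)
```

For example, a pipeline with a tagged `train` node and an untagged `evaluate` node would put `train` in the `Sagemaker_train` bucket and `evaluate` in the default `lambda` bucket.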
Before the team puts more effort into optimizing and adding features to the plugin, I wanted to check whether a solution already exists to do this, and whether our approach to deploying the Kedro pipeline to AWS is sound.
01/26/2022, 1:19 PM
Yes yes yes, a million times yes. Utilising tags to signal custom compute for a node is a brilliant idea. If you could open source this or even blog about this, it would be immensely helpful.
02/01/2022, 3:32 PM
Thanks! Our team is mostly ML engineers focused on building models, so dedicating time to open-sourcing this will be difficult. But I'll definitely discuss it with my teammates, and we'll try to blog about it.