Hello guys, I am working on reformatting an ML project to Kedro #advanced-need-help


Rjify

04/20/2022, 7:01 PM

Hello guys, I am working on reformatting an ML project to Kedro. The project has three pipelines: data engineering, data science and prediction. Along with the main nodes for these pipelines, I also have a lot of helper/utility functions that need to be moved into Kedro somewhere. I am unsure how to structure these helper functions: should I turn them into sub-pipelines, or keep them as they are in the form of helper scripts? I would like to know what the Kedro standard is for this use case. TIA

datajoely

04/20/2022, 7:02 PM

So this is more of an art than a science

datajoely

04/20/2022, 7:02 PM

My view is that you should have very little business logic in your nodes

datajoely

04/20/2022, 7:03 PM

And simply call other packages within them
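A minimal sketch of "little business logic in the node": the helper functions, column names and module path below are hypothetical, and the point is only the split between plain, reusable helpers and a thin node function that calls them.

```python
from __future__ import annotations


# Hypothetical helper module (e.g. src/<package>/utils/features.py): plain,
# tested business logic with no Kedro imports.
def drop_missing(rows: list[dict], column: str) -> list[dict]:
    """Keep only rows where `column` is present and not None."""
    return [row for row in rows if row.get(column) is not None]


def add_ratio(rows: list[dict], numerator: str, denominator: str, name: str) -> list[dict]:
    """Return copies of `rows` with a new `name` column = numerator / denominator.

    Assumes the denominator is non-zero.
    """
    return [{**row, name: row[numerator] / row[denominator]} for row in rows]


# Node function: thin glue that only strings the helpers together, so the
# pipeline graph stays readable and the logic stays testable on its own.
def prepare_features(raw_rows: list[dict]) -> list[dict]:
    rows = drop_missing(raw_rows, "price")
    return add_ratio(rows, "price", "area", "price_per_area")
```

Only the thin function would be registered in the pipeline, e.g. `node(prepare_features, inputs="raw_rows", outputs="features")`; the helpers stay ordinary imports.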

datajoely

04/20/2022, 7:04 PM

Happy to help you think it through in more detail

Rjify

04/20/2022, 7:10 PM

So basically I should keep the helper scripts as they are and only put the relevant logic in the nodes. Does it make sense to convert the helper functions to nodes as well and have a sort of helper pipeline?

datajoely

04/20/2022, 7:10 PM

So it does depend on the complexity and contents of your helper scripts

datajoely

04/20/2022, 7:10 PM

If they're pure Python functions that don't do any I/O, then they're ready

datajoely

04/20/2022, 7:11 PM

Especially if they're already tested!
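A minimal sketch of what "pure, no I/O" means in practice, using hypothetical function names and a hypothetical CSV column `value`: the first helper reads its own input and is awkward as a node, while the second takes data in and returns data out, so it could be passed straight to Kedro's `node()` with the loading left to the Data Catalog.

```python
import csv


# Not node-ready: the helper opens a file itself, so the data dependency is
# hidden from Kedro's Data Catalog and the function is harder to test.
def load_and_filter(path: str, min_value: float) -> list:
    with open(path, newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]
    return [v for v in values if v >= min_value]


# Node-ready: the same logic as a pure function. The data arrives as an
# argument (in Kedro it would be loaded by the catalog) and is returned.
def filter_values(values: list, min_value: float) -> list:
    return [v for v in values if v >= min_value]


# An existing unit test keeps working unchanged once the function becomes a node.
def test_filter_values() -> None:
    assert filter_values([1.0, 5.0, 3.0], 3.0) == [5.0, 3.0]
```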

Rjify

04/20/2022, 7:12 PM

Yeah, they are mostly pure Python functions

Rjify

04/20/2022, 7:12 PM

and they are already tested

datajoely

04/20/2022, 7:12 PM

So then I'd focus on readability and maintainability

datajoely

04/20/2022, 7:14 PM

Kedro nodes should be simple and in general just string together logic defined in other places

datajoely

04/20/2022, 7:14 PM

So it sounds like you're in a good place

datajoely

04/20/2022, 7:15 PM

Other bits of advice: feel free to `kedro pipeline create` many single-purpose pipelines; they can be combined easily and namespaced for both your mental model and visualisation

Rjify

04/20/2022, 7:17 PM

It makes sense. Thanks for your suggestion @datajoely.

datajoely

04/20/2022, 7:17 PM

Good luck

datajoely

04/20/2022, 7:17 PM

Shout if you need a sense check

datajoely

04/20/2022, 7:17 PM

Rjify

04/26/2022, 8:09 PM

Something like this @Mirko
