# advanced-need-help
Hello guys, I am working on reformatting an ML project to Kedro. Basically, the project has three pipelines: data engineering, data science, and prediction. Along with the main nodes for these pipelines, I also have a lot of helper/utility functions that need to fit into the Kedro structure somewhere. I am unsure how I should structure these helper functions: should I turn them into sub-pipelines, or keep them as is in the form of helper scripts? I would like to know the Kedro standard for this use case. TIA
So this is more of an art than a science
My view is that you should have very little business logic in your nodes
And simply call other packages within them
Happy to help you think through in more detail
So basically I should keep the helper scripts as is and only put the relevant logic in the nodes. Does it make sense to convert the helper functions to nodes as well and have a sort of helper pipeline?
So it does depend on the complexity and contents of your helper scripts
If they're pure python functions which don't do any IO then they're ready
Especially if they're already tested!
Yeah, they are mostly pure python functions
and they are already tested
So then I'd focus on readability and maintainability
Kedro nodes should be simple and in general just string together logic defined in other places
So it sounds like you're in a good place
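For example, a node function can be a thin wrapper that just composes your helpers. A minimal sketch (the helper names here are made up, standing in for your tested utility functions):

```python
# Sketch only: `clean_text` and `add_length` are hypothetical stand-ins
# for your existing pure, tested helper functions (no I/O).
def clean_text(records):
    # stand-in helper: strip whitespace from every value
    return [{k: v.strip() for k, v in row.items()} for row in records]

def add_length(records):
    # stand-in helper: derive a simple feature
    return [{**row, "name_len": len(row["name"])} for row in records]

def preprocess(records):
    """Node function: no business logic of its own, just composition."""
    return add_length(clean_text(records))
```

You'd then wire `preprocess` into a pipeline with something like `node(preprocess, inputs="raw_data", outputs="preprocessed_data")` — the node stays readable because all the real logic lives in the helpers.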
Other bits of advice: feel free to `kedro pipeline create` many single-purpose pipelines — they can be combined easily and namespaced, which helps both your mental model and visualisation
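Combining them might look something like this in your `pipeline_registry.py` — a sketch assuming Kedro >= 0.18, with a hypothetical package name `my_project` and pipeline modules generated by `kedro pipeline create`:

```python
# Sketch of src/my_project/pipeline_registry.py (package name is hypothetical).
from kedro.pipeline import Pipeline, pipeline

from my_project.pipelines import data_engineering, data_science, prediction

def register_pipelines() -> dict:
    # Namespacing keeps each sub-pipeline's nodes/datasets grouped,
    # both mentally and in kedro-viz.
    de = pipeline(data_engineering.create_pipeline(), namespace="data_engineering")
    ds = pipeline(data_science.create_pipeline(), namespace="data_science")
    pred = pipeline(prediction.create_pipeline(), namespace="prediction")
    return {
        "de": de,
        "ds": ds,
        "prediction": pred,
        "__default__": de + ds + pred,  # pipelines compose with `+`
    }
```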
It makes sense. Thanks for your suggestion @datajoely.
Good luck
Shout if you need any sense check
Something like this @Mirko