Hello Everyone,
I'm new to Kedro and have had success with one use case: tracking the performance of a traditional ML model with variable architectures, making changes to input parameters, and saving the reporting results.
I'd now like to use the same data with fundamentally different architectures and different evaluation criteria.
Specifically, I'd like to be able to use deep learning frameworks and hand-crafted algorithms.
The hand-crafted algorithms are simple mathematical operations wrapped in a class, intended for deployment to firmware.
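To illustrate what I mean by a hand-crafted algorithm, here is a minimal sketch. The class name, the moving-average logic, and the window and threshold values are all hypothetical stand-ins, not my actual firmware code; the point is only that there is no training step, just fixed arithmetic behind a small interface.

```python
from collections import deque


class MovingAverageThreshold:
    """Hypothetical hand-crafted detector (illustrative only).

    Flags a sample when the mean of the last `window` samples
    is strictly greater than `threshold`.
    """

    def __init__(self, window: int, threshold: float) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        # A bounded deque drops the oldest sample automatically.
        self._buffer = deque(maxlen=window)

    def update(self, sample: float) -> bool:
        """Add one sample and return whether the moving average exceeds the threshold."""
        self._buffer.append(sample)
        mean = sum(self._buffer) / len(self._buffer)
        return mean > self.threshold


def run_detector(samples, window=3, threshold=1.0):
    """Apply the detector to a sequence and collect one flag per sample."""
    detector = MovingAverageThreshold(window, threshold)
    return [detector.update(s) for s in samples]
```

Something like this has no fit step, which is part of why I'm unsure it belongs in the same "model training" pipeline as the ML models.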
What are the best practices in Kedro for setting this up so that it
1) scales easily
2) integrates easily with kedro-mlflow in the future?
My current data flow is as follows:
data load -> preprocessing -> feature calculations -> model training -> evaluation
where model training contains the model specifications
As I understand it, I have the following options:
1) route which nodes to use within the model training pipeline using parameters, e.g. a parameter called architecture_type that routes the data flow accordingly
2) determine node logic via parameters which specify the architecture (similar to above)
3) each fundamentally different architecture (traditional ML, deep learning, hand-crafted algorithms) gets its own pipeline, selected at the pipeline registry level
4) implement modular pipelines for these 3 cases
My judgement is that options 1 and 2 do not scale well, are not good practice, and seem ridiculous.
Option 4 is attractive, but I don't know whether the modular pipeline framework will be sufficiently flexible. Furthermore, from reading other posts here, it seems that this may complicate tracking runs with mlflow (multiple models being saved within the same run).
Thus, I'm leaning towards option 3 to start; if I need additional granularity, I can build modular pipelines within those 3 categories.
Would really appreciate any kind of feedback, clarification or advice.
Thanks!