Agreed with all of @User's points above. For a while I've wanted to compile a sort of kedro best practice guide which would cover this sort of stuff. Just to add some points on pipeline and directory structure:
* use modular pipelines and the directory/file structure they give you
* for a sufficiently complex modular pipeline, your nodes.py will grow too big to be maintainable. In this case you should split it into multiple files
* one way to organise this is to have one module (python file) per node. Each node module should define its node function at the top of the file. Any helper functions specific to that node should be defined in the same file but kept private (prefix the function name with `_`)
* any helper functions shared between nodes in the same modular pipeline go in `utils.py` within that modular pipeline
* any helper functions shared between nodes in multiple modular pipelines go in `utils.py` (or even a directory `utils`) in `src/project_name`