07/06/2021, 9:37 AM
@User thanks for that question! I'd give five tips (or at least things that worked quite well for me):
1. Prefer the code API over the YAML API wherever possible. It lets you leverage all the features of PyCharm and gives you access to certain features the YAML API doesn't provide. The YAML API has readability advantages, but being able to dynamically generate pipelines based on configuration files is quite powerful. Errors are also easier to find with the code API.
2. Use templated configuration files wherever possible! Jinja syntax in the config files is really helpful. It's also good to have conventions for parameter and catalog files --> better to have many small, understandable configs than a few huge ones. An understandable directory structure helps too!
3. Master debugging Kedro pipelines and reproducing errors --> a colleague may see an error on their side that you can't reproduce. Things to check:
   - the current state of their code (git working state): maybe a fix was pushed that they haven't pulled yet
   - the current state of their pipeline: maybe they pulled the fix but didn't rerun the affected nodes --> partial pipeline runs are your friend
4. Use Kedro starters for recurring projects! Developing these templates has been a tricky learning curve, since you're constantly extending the starter and then generating a project from it to check that everything still works. I've been using some hacky solutions that work for me, but there are probably smarter workflows that I'm not aware of.
5. When using notebooks in a Kedro project, the first few cells should record the git commit SHA and any versioning configuration for the catalog entries. Otherwise it becomes difficult to reproduce notebook results, since catalog entries keep changing as the project evolves.
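To make tip 1 a bit more concrete, here's a rough sketch of generating pipeline nodes from a config instead of writing each one by hand. It's plain Python so it runs without Kedro: the config, the model names and the `train_model`/`evaluate_model` functions are all hypothetical placeholders. Each spec uses the same keyword names that `kedro.pipeline.node` accepts (`func`, `inputs`, `outputs`, `name`), and the `params:` prefix follows Kedro's convention for parameter inputs.

```python
# Hypothetical config; in a real project this might come from parameters.yml
CONFIG = {"models": ["linear_regression", "random_forest"]}


def train_model(features, params):
    """Placeholder training function (illustrative only)."""
    return {"features": features, "params": params}


def evaluate_model(model, test_data):
    """Placeholder evaluation function (illustrative only)."""
    return {"model": model, "n_test": len(test_data)}


def build_node_specs(config):
    """Generate one train/evaluate pair of node specs per configured model.

    The dict keys match the keyword arguments of kedro.pipeline.node, so in a
    Kedro project you could do: Pipeline([node(**s) for s in specs]).
    """
    specs = []
    for model_name in config["models"]:
        specs.append(
            {
                "func": train_model,
                "inputs": ["model_input", f"params:{model_name}"],
                "outputs": f"{model_name}_model",
                "name": f"train_{model_name}",
            }
        )
        specs.append(
            {
                "func": evaluate_model,
                "inputs": [f"{model_name}_model", "test_data"],
                "outputs": f"{model_name}_metrics",
                "name": f"evaluate_{model_name}",
            }
        )
    return specs


if __name__ == "__main__":
    for spec in build_node_specs(CONFIG):
        print(spec["name"], spec["inputs"], "->", spec["outputs"])
```

Adding a third model then only means adding one entry to the config; the code stays the same.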
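On the partial-run point in tip 3: Kedro already does this for you (e.g. `kedro run --from-nodes=...` or `Pipeline.from_nodes(...)`), so you don't need to write it yourself. The toy sketch below just illustrates the idea of "which nodes do I have to rerun after a fix": the changed node plus everything that consumes its outputs, directly or transitively. The node and dataset names are made up.

```python
from collections import deque

# Toy pipeline with hypothetical names: node -> (input datasets, output datasets)
NODES = {
    "preprocess": (["raw_data"], ["clean_data"]),
    "featurize": (["clean_data"], ["features"]),
    "train": (["features", "params:model"], ["model"]),
    "report": (["clean_data"], ["data_report"]),
    "evaluate": (["model", "features"], ["metrics"]),
}


def nodes_to_rerun(nodes, changed):
    """Return the changed nodes plus every node downstream of them."""
    # Map each dataset to the nodes that consume it.
    consumers = {}
    for name, (inputs, _) in nodes.items():
        for dataset in inputs:
            consumers.setdefault(dataset, []).append(name)

    # Breadth-first walk from the changed nodes through their outputs.
    to_rerun = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dataset in nodes[current][1]:
            for consumer in consumers.get(dataset, []):
                if consumer not in to_rerun:
                    to_rerun.add(consumer)
                    queue.append(consumer)
    return to_rerun


if __name__ == "__main__":
    print(sorted(nodes_to_rerun(NODES, ["featurize"])))
```

Here a fix in `featurize` means rerunning `featurize`, `train` and `evaluate`, while `report` can be left alone since it only depends on `clean_data`.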