What's the naming convention in catalogs when using modular Kedro #beginners-need-help


RRoger

02/09/2022, 4:48 AM

What's the naming convention in catalogs when using modular pipelines? In https://github.com/datajoely/modular-spaceflights:

* catalog_01_raw.yml has "companies", "reviews" (no prefixes)
* catalog_02_int.yml has "data_ingestion.int_typed_companies", "data_ingestion.int_typed_shuttles" (prefix with dot)
* catalog_03_prm.yml has "prm_shuttle_company_reviews", "prm_spine_table" (prefix with underscore)

Or is it a matter of taste?

ChainYo

02/09/2022, 5:46 AM

You can group everything inside the same catalog; it's easier to have only one file.
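Keeping one file or several is largely an organisational choice. As far as I know, Kedro's config loader collects every file matching its catalog patterns (so catalog_01_raw.yml, catalog_02_int.yml and catalog_03_prm.yml all count) and merges them into one dictionary, raising an error if two files define the same dataset name. The sketch below imitates that merge with plain dicts standing in for parsed YAML. merge_catalogs is a hypothetical helper, not Kedro's API, and the config values are placeholders rather than copies of the repo's entries.

```python
from typing import Any, Dict, Mapping


def merge_catalogs(files: Mapping[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-file catalog dicts into one, rejecting duplicate dataset names.

    Hypothetical helper: it imitates the merge behaviour described above
    and is not Kedro's implementation.
    """
    merged: Dict[str, Any] = {}
    seen_in: Dict[str, str] = {}  # dataset name -> file that first defined it
    for filename, entries in files.items():
        for name, config in entries.items():
            if name in merged:
                raise ValueError(
                    f"Duplicate dataset '{name}' in {seen_in[name]} and {filename}"
                )
            merged[name] = config
            seen_in[name] = filename
    return merged


# Dataset names follow the thread; the config values are placeholders.
catalog = merge_catalogs({
    "catalog_01_raw.yml": {
        "companies": {"type": "pandas.CSVDataSet"},
        "reviews": {"type": "pandas.CSVDataSet"},
    },
    "catalog_02_int.yml": {
        "data_ingestion.int_typed_companies": {"type": "pandas.ParquetDataSet"},
    },
    "catalog_03_prm.yml": {
        "prm_spine_table": {"type": "pandas.ParquetDataSet"},
    },
})
print(sorted(catalog))
# ['companies', 'data_ingestion.int_typed_companies', 'prm_spine_table', 'reviews']
```

Whether these entries live in one file or three, the merged result is the same set of names, so the split only matters to the people reading the config.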

ChainYo

02/09/2022, 5:48 AM

I personally prefer to give every dataset a name and, where relevant, a path to the local file.

datajoely

02/09/2022, 10:02 AM

If you ever want the data engineering convention explained, check out this article I wrote a while back: https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71

antony.milne

02/09/2022, 10:09 AM

There's not really a Kedro-defined convention for naming datasets. Personally, I try to avoid repetition: if "prm" is in the name of the catalog file, I wouldn't use it in the name of the dataset. But it does depend on how your project is structured, so repeating the layer in the dataset name might make things easier for you. Note that the "prefix with dot" is actually a pipeline namespace, which has real meaning rather than just being part of the dataset's name. Doing layer.dataset_name is not comparable to doing layer_dataset_name.
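To make the dot-versus-underscore difference concrete: as I understand it, when a modular pipeline is wrapped with a namespace (the namespace argument of Kedro's pipeline() helper), each dataset it uses is renamed to namespace.name unless it is explicitly exposed as an input or output, and nested namespaces stack as outer.inner.name. The sketch below imitates that renaming with plain strings. namespace_datasets and split_namespace are hypothetical helpers, not Kedro functions, and treating "companies" as an exposed input is an assumption made for illustration, not a claim about how the repo is wired.

```python
from typing import AbstractSet, Iterable, List, Optional, Tuple


def namespace_datasets(
    names: Iterable[str], namespace: str, keep: AbstractSet[str]
) -> List[str]:
    """Prefix each dataset name with `namespace.` unless it is in `keep`.

    Hypothetical simplification of namespacing; `keep` stands in for the
    datasets a modular pipeline exposes as its inputs/outputs.
    """
    return [name if name in keep else f"{namespace}.{name}" for name in names]


def split_namespace(name: str) -> Tuple[Optional[str], str]:
    """Split a dataset name into (namespace, base name) at the last dot.

    Names without a dot have no namespace, whatever prefix they carry.
    """
    namespace, dot, base = name.rpartition(".")
    return (namespace if dot else None), base


# Assumed wiring: "companies" is exposed, "int_typed_companies" is internal.
inner = namespace_datasets(
    ["companies", "int_typed_companies"], "data_ingestion", keep={"companies"}
)
print(inner)
# ['companies', 'data_ingestion.int_typed_companies']

print(split_namespace("data_ingestion.int_typed_companies"))
# ('data_ingestion', 'int_typed_companies')
print(split_namespace("prm_spine_table"))
# (None, 'prm_spine_table')
```

The underscore in prm_spine_table never triggers anything like this: prm_ is simply part of the name, whereas the data_ingestion. prefix records which namespaced pipeline the dataset belongs to.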

datajoely

02/09/2022, 10:23 AM

It's personal/team style to an extent, but I think the rule I try to follow is: "write code for someone else to read... even if that person is future you".
