What's the naming convention in catalogs when usin...
# beginners-need-help
r
What's the naming convention in catalogs when using modular pipelines? In https://github.com/datajoely/modular-spaceflights
* catalog_01_raw.yml has "companies", "reviews" (no prefixes)
* catalog_02_int.yml has "data_ingestion.int_typed_companies", "data_ingestion.int_typed_shuttles" (prefix with dot)
* catalog_03_prm.yml has "prm_shuttle_company_reviews", "prm_spine_table" (prefix with underscore)
Or is it a matter of taste?
c
You can group everything inside the same catalog; it’s easier to have only one file.
I personally prefer to give every dataset a name and, where needed, a path to the local file.
d
If you ever want to have the data engineering convention explained, check out this article I wrote a while back: https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71
a
There's not really a Kedro-defined convention here for naming datasets. Personally I try to avoid repetition, i.e. if prm is in the name of the catalog file I wouldn't use it in the name of the dataset. But it does depend on how your project is structured, so it might be that repeating the layer in the dataset name makes things easier for you. Note that the "prefix with dot" is actually a pipeline namespace, which has "real meaning" rather than just being the name of the dataset. Doing layer.dataset_name is not comparable to doing layer_dataset_name.
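To make the dot-versus-underscore difference concrete, here is a rough plain-Python sketch. It is not Kedro's API: `apply_namespace` and `keep_unprefixed` are hypothetical names that only mimic the renaming effect a pipeline namespace has on dataset names, assuming the raw inputs are mapped explicitly so they keep their original names.

```python
# Plain-Python illustration only; this is NOT Kedro's API.
# `apply_namespace` is a hypothetical helper that mimics the renaming effect.

def apply_namespace(namespace, dataset_names, keep_unprefixed=()):
    """Prefix each dataset name with `namespace.` unless it is explicitly kept as-is."""
    return {
        name: name if name in keep_unprefixed else f"{namespace}.{name}"
        for name in dataset_names
    }


# Dataset names as written inside the pipeline's own code.
pipeline_datasets = ["companies", "int_typed_companies"]

# Wrapping the pipeline in a namespace renames its datasets for you;
# inputs that are mapped explicitly keep their original names.
renamed = apply_namespace(
    "data_ingestion", pipeline_datasets, keep_unprefixed={"companies"}
)
print(renamed["int_typed_companies"])  # data_ingestion.int_typed_companies
print(renamed["companies"])            # companies

# An underscore prefix is just part of a name someone typed by hand;
# nothing in the pipeline structure produces or depends on it.
manual_name = "prm_spine_table"
print("." in manual_name)  # False
```

So the catalog entry for a namespaced dataset has to match the dotted name exactly, while an underscore prefix can be renamed or dropped freely as long as the pipeline and catalog agree.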
d
It's personal/team style to an extent - but I think the rule I try to follow is: write code for someone else to read... even if that person is future you