#beginners-need-help
RRoger

02/09/2022, 4:48 AM
What's the naming convention in catalogs when using modular pipelines? In https://github.com/datajoely/modular-spaceflights:
* catalog_01_raw.yml has "companies", "reviews" (no prefixes)
* catalog_02_int.yml has "data_ingestion.int_typed_companies", "data_ingestion.int_typed_shuttles" (prefix with dot)
* catalog_03_prm.yml has "prm_shuttle_company_reviews", "prm_spine_table" (prefix with underscore)
Or is it a matter of taste?
ChainYo

02/09/2022, 5:46 AM
You can group everything inside the same catalog; it's easier to have only one file
5:48 AM
I personally prefer to give a name to all my data and, where needed, a path to the local file
datajoely

02/09/2022, 10:02 AM
If you ever want the data engineering convention explained, check out this article I wrote a while back: https://towardsdatascience.com/the-importance-of-layered-thinking-in-data-engineering-a09f685edc71
antony.milne

02/09/2022, 10:09 AM
There's not really a kedro-defined convention here for naming datasets. Personally I try to avoid repetition, i.e. if "prm" is in the name of the catalog file I wouldn't use it in the name of the dataset. But it does depend on how your project is structured, so it might be that repeating the layer in the dataset name makes things easier for you. Note that the "prefix with dot" is actually a pipeline namespace, which has "real meaning" rather than just being the name of the dataset. Doing "layer.dataset_name" is not comparable to doing "layer_dataset_name".
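To make the namespace point concrete, here is a minimal plain-Python sketch. It does not call Kedro; "apply_namespace" and "split_namespace" are hypothetical helpers that only mimic how a namespace is assumed to prefix dataset names, using the dataset names from the question above.

```python
def apply_namespace(dataset_names, namespace, keep=()):
    """Prefix each name with '<namespace>.' unless it is listed in `keep`.

    Hypothetical helper: an illustration of the assumed prefixing behaviour,
    not Kedro's implementation.
    """
    return [
        name if name in keep else f"{namespace}.{name}"
        for name in dataset_names
    ]


def split_namespace(dataset_name):
    """Split a dataset name into (namespace, short_name) at the last dot."""
    namespace, _, short_name = dataset_name.rpartition(".")
    return namespace, short_name


# Dataset names as the nodes inside the pipeline refer to them.
node_datasets = ["companies", "int_typed_companies", "int_typed_shuttles"]

# "companies" is treated as a shared raw input, so it stays un-prefixed.
namespaced = apply_namespace(node_datasets, "data_ingestion", keep={"companies"})
print(namespaced)
# ['companies', 'data_ingestion.int_typed_companies', 'data_ingestion.int_typed_shuttles']

# A dot carries structure; an underscore is just part of the name.
print(split_namespace("data_ingestion.int_typed_companies"))
# ('data_ingestion', 'int_typed_companies')
print(split_namespace("prm_spine_table"))
# ('', 'prm_spine_table')
```

The catalog entry has to match the full prefixed name, which is why catalog_02_int.yml lists "data_ingestion.int_typed_companies", while "prm_spine_table" is purely a naming choice.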
datajoely

02/09/2022, 10:23 AM
It's personal/team style to an extent, but I think the rule I try to follow is: write code for someone else to read... even if that person is future you