anhoang
08/19/2021, 5:40 PM
…file_A, file_B, file_C). I want the folder that this pipeline runs in to have its own dynamically generated data catalog, so other people can go in and inspect the results from the pipeline easily by just…
Taking the example from https://kedro.readthedocs.io/en/latest/05_data/01_data_catalog.html#configuring-a-data-catalog, is it possible to do this:
```python
from kedro.io import DataCatalog
from kedro.extras.datasets.pandas import (
    CSVDataSet,
    ParquetDataSet,
    SQLQueryDataSet,
    SQLTableDataSet,
)

io = DataCatalog(
    {
        "bikes": CSVDataSet(filepath="../data/01_raw/bikes.csv"),
        "cars": CSVDataSet(filepath="../data/01_raw/cars.csv", load_args=dict(sep=",")),
        "cars_table": SQLTableDataSet(
            table_name="cars", credentials=dict(con="sqlite:///kedro.db")
        ),
        "scooters_query": SQLQueryDataSet(
            sql="select * from cars where gear=4",
            credentials=dict(con="sqlite:///kedro.db"),
        ),
        "ranked": ParquetDataSet(filepath="ranked.parquet"),
    }
)
```
…and then do io.to_config()? We have io.from_config() but not io.to_config() to generate a YAML file from the DataCatalog object.
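At the time there was no io.to_config(). A minimal hand-rolled sketch of that missing direction: build the same mapping that DataCatalog.from_config() accepts and dump it with PyYAML. The db_credentials key below is a hypothetical entry that would live in credentials.yml rather than in the catalog file itself:

```python
import yaml

# Hand-built mapping mirroring the DataCatalog above; this is the same
# structure DataCatalog.from_config() consumes, so dumping it yields a
# usable catalog.yml. Credentials are referenced by key, never inlined.
catalog_config = {
    "bikes": {
        "type": "pandas.CSVDataSet",
        "filepath": "../data/01_raw/bikes.csv",
    },
    "cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "../data/01_raw/cars.csv",
        "load_args": {"sep": ","},
    },
    "cars_table": {
        "type": "pandas.SQLTableDataSet",
        "table_name": "cars",
        "credentials": "db_credentials",  # hypothetical key in credentials.yml
    },
    "scooters_query": {
        "type": "pandas.SQLQueryDataSet",
        "sql": "select * from cars where gear=4",
        "credentials": "db_credentials",
    },
    "ranked": {
        "type": "pandas.ParquetDataSet",
        "filepath": "ranked.parquet",
    },
}

with open("catalog.yml", "w") as f:
    yaml.safe_dump(catalog_config, f, sort_keys=False)
```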
datajoely
08/19/2021, 5:41–5:47 PM
[six replies; text not captured in the export]
anhoang
08/19/2021, 5:49 PM
[text not captured in the export]
anhoang
08/19/2021, 5:51 PM
…bikes, cars, cars_table, scooters_query and ranked to YAML from Python
datajoely
08/19/2021, 5:51 PM
[text not captured in the export]

datajoely
08/19/2021, 5:52 PM
…DataCatalog object you've created directly
anhoang
08/19/2021, 5:57 PM
…param1_X_param2_Y with files [bikes.csv, cars.csv, etc.] and a data catalog that documents these datasets. Another folder param1_A_param2_B with the same set of files, but the contents of the files are different.
This way, another person can go into a folder and explore these datasets for each parameter combination and do subsequent analyses without worrying about the filepaths.

anhoang
08/19/2021, 5:59 PM
…kedro jupyter or initialize a DataCatalog object that points to the folder param1_X_param2_Y and load one set of files
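One sketch of that workflow without any YAML at all: a small factory that builds the same catalog rooted at a given folder. catalog_for is a hypothetical helper, not Kedro API, and the dataset list is abbreviated:

```python
from pathlib import Path

from kedro.io import DataCatalog
from kedro.extras.datasets.pandas import CSVDataSet, ParquetDataSet


def catalog_for(run_folder: str) -> DataCatalog:
    """Hypothetical helper: same dataset names for every run,
    with filepaths rooted at one parameter-combination folder."""
    root = Path(run_folder)
    return DataCatalog(
        {
            "bikes": CSVDataSet(filepath=str(root / "bikes.csv")),
            "cars": CSVDataSet(filepath=str(root / "cars.csv")),
            "ranked": ParquetDataSet(filepath=str(root / "ranked.parquet")),
        }
    )


# Same dataset name, different folder, hence different file contents.
bikes_x_y = catalog_for("runs/param1_X_param2_Y").load("bikes")
bikes_a_b = catalog_for("runs/param1_A_param2_B").load("bikes")
```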
datajoely
08/19/2021, 5:59 PM
[two replies; text not captured in the export]
anhoang
08/19/2021, 6:00 PM
…DataCatalog pointing to param1_A_param2_B will load the same dataset names but different files

anhoang
08/19/2021, 6:00 PM
[text not captured in the export]
datajoely
08/19/2021, 6:00 PM
[text not captured in the export]

anhoang
08/19/2021, 6:00 PM
[text not captured in the export]

datajoely
08/19/2021, 6:00–6:01 PM
[three replies; text not captured in the export]
datajoely
08/19/2021, 6:01 PM
export KEDRO_ENV=test
datajoely
08/19/2021, 6:01–6:02 PM
[two replies; text not captured in the export]

datajoely
08/19/2021, 6:02 PM
…base and local

datajoely
08/19/2021, 6:02 PM
[text not captured in the export]
anhoang
08/19/2021, 6:04 PM
…base and local, for example when you work in the cloud you need to output additional files? I thought that the number of datasets and what they are have to be exactly the same in every environment
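For reference, configuration environments work by merging conf/base/ with conf/<env>/, and same-named entries in the environment folder win. A rough sketch using the 0.17-era ConfigLoader API; the folder paths and the local fallback are assumptions:

```python
import os

from kedro.config import ConfigLoader
from kedro.io import DataCatalog

# Whichever environment is selected (e.g. export KEDRO_ENV=test) decides
# which folder is layered on top of conf/base; entries in conf/<env>/
# override same-named entries in base.
env = os.environ.get("KEDRO_ENV", "local")
conf_loader = ConfigLoader(["conf/base", f"conf/{env}"])
conf_catalog = conf_loader.get("catalog*", "catalog*/**")
catalog = DataCatalog.from_config(conf_catalog)
```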
datajoely
08/19/2021, 6:04 PM
[text not captured in the export]

anhoang
08/19/2021, 6:04 PM
…DataCatalog in the example above into YAML?
datajoely
08/19/2021, 6:05 PM
[three replies; text not captured in the export]

datajoely
08/19/2021, 6:05 PM
…to_yaml mechanism

datajoely
08/19/2021, 6:05 PM
[two replies; text not captured in the export]
anhoang
08/19/2021, 6:06 PM
[text not captured in the export]

datajoely
08/19/2021, 6:06–6:09 PM
[four replies; text not captured in the export]
anhoang
08/19/2021, 6:09 PM
…A, B, C when run in environment A but outputs datasets A, B, D, E when run in environment B?
datajoely
08/19/2021, 6:09–6:10 PM
[three replies; text not captured in the export]
anhoang
08/19/2021, 6:10 PM
…MemoryDataSet for every missing dataset not in the catalog, right?
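That matches Kedro's default behaviour: anything a pipeline produces with no catalog entry is held in memory for the duration of the run. A tiny illustration of what the runner does implicitly; the dataset name and payload are made up:

```python
from kedro.io import DataCatalog, MemoryDataSet

catalog = DataCatalog({})  # nothing declared

# Roughly what the runner does for an undeclared dataset:
catalog.add("intermediate", MemoryDataSet())
catalog.save("intermediate", {"rows": 42})
assert catalog.load("intermediate") == {"rows": 42}
```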
datajoely
08/19/2021, 6:10 PM
[two replies; text not captured in the export]

anhoang
08/19/2021, 6:11 PM
[text not captured in the export]
anhoang
08/19/2021, 6:13 PM
…DataCatalog.to_yaml somewhere, so I thought it was a beginner question lol 😆
datajoely
08/19/2021, 6:14 PM
[two replies; text not captured in the export]