datajoely
08/11/2021, 9:00 AM

Bertozzo
08/11/2021, 5:14 PM

Bertozzo
08/11/2021, 5:16 PM

WolVez
08/11/2021, 8:17 PM

Ignacio
08/12/2021, 7:11 AM
…a `register_config_loader` hook under `src/<package_name>/hooks.py`.
__**Example**__
```python
from pathlib import Path
from typing import Iterable

from kedro.config import ConfigLoader, TemplatedConfigLoader
from kedro.framework.hooks import hook_impl

CONF_ROOT = "conf"  # usually defined in your project's settings.py

class ProjectHooks:
    """Project hooks."""

    @hook_impl
    def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
        # Force local as the ultimate override for params, regardless of env chosen.
        conf_paths.append(str(Path(CONF_ROOT) / "local"))
        return TemplatedConfigLoader(conf_paths, globals_pattern="*globals.yml")
```
In this case, the inheritance will be `base` -> `<custom_env>` -> `local`. You can append other envs to `conf_paths` to customize this behavior.
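To make the precedence concrete, here is a tiny self-contained sketch of how the search paths line up with the hook above; `CONF_ROOT` and the env name `staging` are assumptions for illustration, not from the thread:

```python
from pathlib import Path

CONF_ROOT = "conf"  # assumption: Kedro's default conf directory

# Paths Kedro would pass for a hypothetical custom env "staging"
conf_paths = [str(Path(CONF_ROOT) / "base"), str(Path(CONF_ROOT) / "staging")]

# As in the hook above: append "local" last so it wins every conflict
conf_paths.append(str(Path(CONF_ROOT) / "local"))

# Later paths override earlier ones: base -> staging -> local
print(conf_paths)
```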
WolVez
08/17/2021, 2:38 PM

waylonwalker
08/17/2021, 4:02 PM
…a `0.14.x` project that had a very deeply nested set of pipelines. You can easily change up the keys in your register pipelines module; you can make up rules that make the most sense for your team. The core of what `find-kedro` does to create pipelines based on name (quite similar to how pytest picks up tests) will still be there, you will just make a slight tweak to make it cleaner for your project.

waylonwalker
08/17/2021, 4:13 PM

WolVez
08/17/2021, 4:16 PM

WolVez
08/17/2021, 4:17 PM

waylonwalker
08/18/2021, 1:24 PM

JacobJeppesen
08/19/2021, 7:44 AM

datajoely
08/19/2021, 8:03 AM

anhoang
08/19/2021, 5:40 PM
…(`file_A`, `file_B`, `file_C`). I want the folder that this pipeline runs in to have its own dynamically generated data catalog, so other people can go in and inspect the results from the pipeline easily. Taking the example from https://kedro.readthedocs.io/en/latest/05_data/01_data_catalog.html#configuring-a-data-catalog , is it possible to do this:
```python
io = DataCatalog(
    {
        "bikes": CSVDataSet(filepath="../data/01_raw/bikes.csv"),
        "cars": CSVDataSet(filepath="../data/01_raw/cars.csv", load_args=dict(sep=",")),
        "cars_table": SQLTableDataSet(
            table_name="cars", credentials=dict(con="sqlite:///kedro.db")
        ),
        "scooters_query": SQLQueryDataSet(
            sql="select * from cars where gear=4",
            credentials=dict(con="sqlite:///kedro.db"),
        ),
        "ranked": ParquetDataSet(filepath="ranked.parquet"),
    }
)
```
and then do `io.to_config()`? We have `io.from_config()` but not `io.to_config()` to generate a YAML file from the data catalog object.

waylonwalker
08/19/2021, 8:20 PM

waylonwalker
08/19/2021, 8:22 PM
`python gen_catalog.py > catalog.yml`. At the end of the day, YAML is still all that goes into the project; it's just a quick shortcut for me to generate a bunch of entries quickly.
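No such generator ships with Kedro; `gen_catalog.py` is a user script. A minimal sketch of what it might look like, emitting one `pandas.CSVDataSet` entry per CSV file (the helper names and the `data/01_raw` path are assumptions; a real script would likely use PyYAML's `safe_dump` instead of the hand-rolled `to_yaml` below):

```python
# gen_catalog.py -- hypothetical helper, not part of Kedro itself
from pathlib import Path


def build_catalog(data_dir: str) -> dict:
    """One pandas.CSVDataSet entry per *.csv file found in data_dir."""
    return {
        path.stem: {"type": "pandas.CSVDataSet", "filepath": str(path)}
        for path in sorted(Path(data_dir).glob("*.csv"))
    }


def to_yaml(catalog: dict) -> str:
    """Render the entries as catalog.yml text (stdlib-only; entries here are flat)."""
    lines = []
    for name, entry in catalog.items():
        lines.append(f"{name}:")
        for key, value in entry.items():
            lines.append(f"  {key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml(build_catalog("data/01_raw")))
```

Redirecting stdout, as in `python gen_catalog.py > catalog.yml`, then hand-editing the result matches the workflow described above.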
anhoang
08/19/2021, 8:31 PM

anhoang
08/19/2021, 8:32 PM
…`DataCatalog` like the example above into YAML? Would greatly appreciate it!!!

waylonwalker
08/19/2021, 8:38 PM

waylonwalker
08/19/2021, 8:39 PM

anhoang
08/19/2021, 8:39 PM
…`DataCatalog` object YAML, and parameterize the data-catalog-generating script to generate a different number of datasets (potentially with no intersection between the two sets) in different environments

anhoang
08/19/2021, 8:41 PM
…`kedro create_catalog` as a starting point then manually edit) 🙂

waylonwalker
08/19/2021, 8:58 PM

anhoang
08/19/2021, 9:13 PM
…the `mini-kedro` minimal starter and have not needed pipelines yet.

anhoang
08/19/2021, 9:16 PM
```python
import inspect

from kedro.extras.datasets.pandas import CSVDataSet

# getclasstree returns the class object itself, not a string
full_class_name = inspect.getclasstree([CSVDataSet])[-1][0][0]
full_class_name_str = str(full_class_name)  # "<class 'kedro.extras.datasets.pandas.csv_dataset.CSVDataSet'>"
yaml_class_str = full_class_name_str.partition("kedro.extras.datasets.")[-1].strip("'>")
print(yaml_class_str)  # pandas.csv_dataset.CSVDataSet
```
WolVez
08/19/2021, 9:21 PM
…`__main__.py` file attempting to configure the pipelines prior to the session setting the Context with `configure_project(Path(__file__).parent.name)`. Because the data files are not necessarily saved within the cwd, it is causing the pipeline registration to fail. I tried to create a `ConfigLoader` with the correct location if a session wasn't present, but this just seems to make the entire pipeline hang. Any idea how to get around `configure_project`?

Lorena
08/20/2021, 9:11 AM
`configure_project` can't/shouldn't be bypassed, as that's where settings and pipelines are (lazily) configured, in order to a) be able to import them anywhere in a project, and b) use them in the framework code. If you really, really need the parameters, I suggest recreating the config-loader logic of fetching the parameters in a helper function that you can call in the node. But generally, dynamically generated pipelines are to be avoided if you can; I'm curious what your use case is, maybe there's an alternative?
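A sketch of the helper Lorena describes, assuming the default `conf/base` + `conf/<env>` layout and the `ConfigLoader` API of that era (~0.17.x); the Kedro import is deferred so the path logic stays importable anywhere:

```python
def param_conf_paths(conf_root: str = "conf", env: str = "local") -> list:
    """Config search paths in precedence order: base first, env overrides last."""
    return [f"{conf_root}/base", f"{conf_root}/{env}"]


def load_parameters(conf_root: str = "conf", env: str = "local") -> dict:
    """Re-fetch parameters the way the context would, callable from inside a node."""
    from kedro.config import ConfigLoader  # deferred: needs a Kedro install

    return ConfigLoader(param_conf_paths(conf_root, env)).get(
        "parameters*", "parameters*/**"
    )
```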
WolVez
08/20/2021, 2:06 PM

user
08/22/2021, 11:06 PM