user

08/22/2021, 11:06 PM
Hi, I am very new to this. I might be missing something, but it seems we can only input raw data file by file: each entry in the catalog seems to point to a single file. However, my raw data is an entire directory from which I need to load data individually. Is there a way to pass a directory into a pipeline as input, instead of separate catalog entries that each refer to one file? Can we have a directory in the data catalog instead of a file? Sorry if this seems completely obvious...

datajoely

08/23/2021, 8:22 AM
Hi @User - if I understand correctly, you have many files which need to be unioned together?
Our Spark datasets do this by default; you can use a * wildcard in the filepath if you want.
Are all the files of a predictable naming convention?
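For reference, a catalog entry along those lines might look something like this (the dataset name, path and file format are placeholders, so adjust them to your data):

```yaml
# conf/base/catalog.yml
raw_events:
  type: spark.SparkDataSet
  filepath: data/01_raw/events/*.csv   # the * glob picks up every matching file
  file_format: csv
  load_args:
    header: true
```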

user

08/23/2021, 8:28 AM
Yes, that is correct. There is some information in the filename (e.g. the date of the data collection) and the files all follow a predictable naming convention. I am currently using the file name to choose which files I want to load (for example, between two given dates). I want to try using Kedro for my analysis and still be able to specify such things.

datajoely

08/23/2021, 8:29 AM
Okay, so there are two ways to do this:
1. We support Jinja2 in YAML, so if you want to autogenerate the catalog entries you can use a loop and essentially replicate the catalog definition for every dataset.
2. You could define a custom dataset that uses something like glob to find all the files in a directory and combine them together (see the sketch after this list).
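A minimal sketch of option 2, assuming the files are CSVs you want concatenated with pandas (the class name, constructor arguments and pattern are all placeholders):

```python
from pathlib import Path
from typing import Any, Dict

import pandas as pd
from kedro.io import AbstractDataSet


class GlobbedCSVDataSet(AbstractDataSet):
    """Loads every CSV matching a glob pattern and concatenates the results."""

    def __init__(self, path: str, pattern: str = "*.csv"):
        self._path = Path(path)
        self._pattern = pattern

    def _load(self) -> pd.DataFrame:
        files = sorted(self._path.glob(self._pattern))
        # Keep the source filename so information encoded in it (e.g. the
        # collection date) stays available for filtering downstream.
        frames = [pd.read_csv(f).assign(source_file=f.name) for f in files]
        return pd.concat(frames, ignore_index=True)

    def _save(self, data: pd.DataFrame) -> None:
        raise NotImplementedError("This dataset is read-only.")

    def _describe(self) -> Dict[str, Any]:
        return dict(path=str(self._path), pattern=self._pattern)
```

You would then point a catalog entry at it using its full dotted path as the type, the same way as the built-in datasets.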

user

08/23/2021, 8:34 AM
I'll look into this, thank you!
Another question: I can't seem to load my custom dataset. I am following https://kedro.readthedocs.io/en/stable/07_extend_kedro/03_custom_datasets.html and when I load it in the Python console with context.catalog.load I get "name 'context' is not defined". What do you use to load a custom dataset in the Python console?

datajoely

08/23/2021, 5:22 PM
Is this after running kedro ipython?
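For reference, once the shell is started that way, a minimal sketch of the loading step (the dataset name is a placeholder):

```python
# Start the project shell first:
#   kedro ipython
# Kedro injects `context` and `catalog` into the session, so either works:
df = catalog.load("my_custom_dataset")
df = context.catalog.load("my_custom_dataset")
```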

user

08/23/2021, 5:51 PM
Oh, that's it. Sorry, I was just running ipython on its own.
Thanks!

datajoely

08/23/2021, 5:56 PM
👌