
nickolas da rocha machado

10/10/2022, 2:47 PM
Is anyone having problems with adlfs + PartitionedDataSet + ParallelRunner? With this combination, the dataset apparently can't retrieve its partitions from blob storage. From my tests, it may be related to the asyncio calls inside the adlfs glob function.
python
# pipeline_registry.py
from kedro.pipeline import Pipeline, node

pipelines["partitioned"] = Pipeline([node(print, "partitioned", None)])  # one node that prints the loaded partitions
yml
# catalog.yml
partitioned:
  type: PartitionedDataSet
  dataset: pandas.CSVDataSet
  path: abfs://...dfs.core.windows/...
  credentials: lab
  filename_suffix: .csv
log
[10/10/22 14:39:54] INFO     Kedro project
[10/10/22 14:39:55] INFO     Loading data from 'partitioned' (PartitionedDataSet)...
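
One way to narrow this down outside Kedro might be to run the same glob call once in the parent process and once in a forked child, with a timeout. The sketch below is only a diagnostic harness, not a fix. It assumes the runner's worker processes are forked (POSIX only); `run_in_child` and `_call` are hypothetical helper names, and the commented usage lines use placeholder paths and credentials that would need to be filled in.

```python
import multiprocessing as mp


def _call(conn, fn, args):
    """Run fn(*args) in the child and send the outcome back through the pipe."""
    try:
        conn.send(("ok", fn(*args)))
    except Exception as exc:  # report the failure instead of dying silently
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_in_child(fn, *args, timeout=60.0):
    """Call fn(*args) in a forked child process.

    Returns ("ok", result), ("error", message), or ("timeout", None) if the
    child produced nothing within `timeout` seconds (for example, if it hangs).
    """
    ctx = mp.get_context("fork")  # assumption: workers are forked; not available on Windows
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_call, args=(child_conn, fn, args))
    proc.start()
    child_conn.close()  # the parent only reads
    try:
        if not parent_conn.poll(timeout):
            return ("timeout", None)
        try:
            return parent_conn.recv()
        except EOFError:
            return ("error", "child exited without sending a result")
    finally:
        if proc.is_alive():
            proc.terminate()  # do not leave a hung child behind
        proc.join()


# Hypothetical usage (placeholders, not taken from the original message):
# import fsspec
# fs = fsspec.filesystem("abfs", **credentials)   # same credentials as the catalog entry
# print(fs.glob("<container>/<prefix>/*.csv"))    # in the parent process
# print(run_in_child(fs.glob, "<container>/<prefix>/*.csv", timeout=30))
```

If the parent call returns the partitions but the child call reports `"timeout"`, that would point toward the filesystem object not surviving the fork rather than toward the dataset configuration.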