#beginners-need-help
Daehyun Kim

12/21/2021, 10:09 PM
saving CachedDataSet in S3
10:09 PM
Hi there, I have a question about saving a dataset to S3. This is part of my `catalog.yml`:
train_raw_data:
  type: CachedDataSet
  versioned: true
  dataset:
    type: pickle.PickleDataSet
    filepath: s3://test/data/train_raw_data.pickle
If I have proper AWS credentials in `~/.aws/credentials`, it works well (the dataset is saved to the S3 path I set). But I need to set the credentials in `conf/local/credentials.yml` for some reason, so I removed `~/.aws/credentials` and created `conf/local/credentials.yml` like this:
aws_access_key_id: AAA
aws_secret_access_key: BBB
aws_session_token: XXX
It doesn't work, and I think boto3 prints an `Unable to locate credentials` message. I also tried changing the format of `credentials.yml`, together with a modified `catalog.yml`, like this:
dev_s3:
  aws_access_key_id: AAA
  aws_secret_access_key: BBB
  aws_session_token: XXX
```
train_raw_data:
  type: CachedDataSet
  versioned: true
  credentials: dev_s3
  dataset:
    type: pickle.PickleDataSet
    filepath: s3://test/data/train_raw_data.pickle
```
It doesn't work either, and it shows `DataSet 'train_raw_data' must only contain arguments valid for the constructor of kedro.io.cached_dataset.CachedDataSet.`
datajoely

12/22/2021, 11:27 AM
So there are two things here:
1. You can use the `.aws` credentials or environment variables ahead of the Kedro approach; we just expose it so you have a way of doing it consistently.
2. The cached dataset is a wrapper, so you need to push the `credentials` key down one level, under `dataset`.
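For anyone reading later, a minimal sketch of what "one level down" looks like, reusing the `train_raw_data` entry and the `dev_s3` credentials name from the question. The comments are illustrative additions, and the exact keys your S3 filesystem accepts inside `dev_s3` may differ from the ones shown in the question:

```yaml
# catalog.yml
train_raw_data:
  type: CachedDataSet        # wrapper: does not take a credentials argument itself
  versioned: true
  dataset:
    type: pickle.PickleDataSet
    filepath: s3://test/data/train_raw_data.pickle
    credentials: dev_s3      # moved here, onto the wrapped dataset that talks to S3
```

The `must only contain arguments valid for the constructor` error goes away because `credentials` is no longer passed to `CachedDataSet`; it is passed to the inner `pickle.PickleDataSet`, which is the dataset that actually reads and writes the file.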
Daehyun Kim

12/22/2021, 7:26 PM
thank you
7:26 PM
it works!!
8:31 PM
One more quick question: is there a way to set default credentials in `conf/local/credentials`? If we could set default credentials instead of named credentials such as `dev_s3`, we wouldn't need to specify `credentials: dev_s3` for every dataset.
datajoely

12/23/2021, 12:05 PM
There isn't, but you can do a couple of different things:
- For S3 stuff, environment variables can simplify this for you outside of Kedro.
- In Kedro YAML, you can use the anchor syntax to reuse the same structure over and over: https://blog.daemonl.com/2016/02/yaml.html
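A rough sketch of the anchor approach, assuming the same `dev_s3` credentials entry as above. The `_s3_pickle` template name and the `test_raw_data` dataset are hypothetical, made up for illustration; the leading underscore is there so that Kedro treats the entry as a reusable template and not as a dataset:

```yaml
# catalog.yml
_s3_pickle: &s3_pickle         # hypothetical template holding the shared settings
  type: pickle.PickleDataSet
  credentials: dev_s3

train_raw_data:
  type: CachedDataSet
  versioned: true
  dataset:
    <<: *s3_pickle             # merges type and credentials from the template
    filepath: s3://test/data/train_raw_data.pickle

test_raw_data:                 # hypothetical second dataset reusing the template
  type: CachedDataSet
  dataset:
    <<: *s3_pickle
    filepath: s3://test/data/test_raw_data.pickle
```

This still names `dev_s3` once, inside the template, so it is not a true default, but each dataset only needs the merge line plus its own `filepath`.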
Daehyun Kim

12/23/2021, 8:32 PM
thank you!
datajoely

12/23/2021, 8:33 PM
No problem!