beginners-need-help
  • d

    datajoely

    04/28/2022, 9:45 AM
    My gut feeling is that this should work - so anonymised YAML would help
  • z

    Zemeio

    04/28/2022, 9:46 AM
    (reproducing, giving an example with enough information is what I meant by asking properly).
  • z

    Zemeio

    04/28/2022, 9:46 AM
    I will give you the anonymised YAML later today then; I have to go for now.
  • d

    datajoely

    04/28/2022, 9:46 AM
    You don't need to reproduce the error, just the YAML structure would be helpful
  • z

    Zemeio

    04/28/2022, 9:47 AM
    (my work PC is shut down, which is why I can't give it to you now)
  • d

    datajoely

    04/28/2022, 9:47 AM
    👍
  • z

    Zemeio

    04/28/2022, 11:49 AM
    @datajoely Some versions:
    s3fs==2022.1.0 
    boto3==1.20.24 
    aiobotocore==2.1.2
    And the catalog:
    yaml
    my.sample.image.int: 
        type: PartitionedDataSet 
        dataset: kedro.extras.datasets.pillow.ImageDataSet 
        path: ${my.bucket}/${folder.int}/sample-images/ 
        filename_suffix: ".jpg"
  • z

    Zemeio

    04/28/2022, 11:50 AM
    Path something like: s3://mybucket/02_intermediate/sample-images
  • d

    datajoely

    04/28/2022, 11:50 AM
    and you get no error?
  • d

    datajoely

    04/28/2022, 11:51 AM
    I actually think you're not configuring that correctly
  • z

    Zemeio

    04/28/2022, 11:51 AM
    I get an error when trying to save, saying that the extension is unknown (the extension is being sent empty to pillow, for some reason)
  • d

    datajoely

    04/28/2022, 11:51 AM
    yaml
    type: PartitionedDataSet
    path: ${my.bucket}/${folder.int}/sample-images/
    dataset:
      type: kedro.extras.datasets.pillow.ImageDataSet
    filename_suffix: '.jpg'
  • d

    datajoely

    04/28/2022, 11:51 AM
    don't you need two types?
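The two catalog entries posted above differ only in how `dataset` is spelled: a bare class path versus a mapping with a `type` key. As far as the Kedro documentation for `PartitionedDataSet` describes it, both spellings are accepted. A minimal sketch of that equivalence follows; `normalise_dataset_config` is a hypothetical helper written for illustration, not a Kedro function.

```python
def normalise_dataset_config(dataset):
    """Hypothetical helper: expand the string shorthand into the dict form."""
    if isinstance(dataset, str):
        return {"type": dataset}
    return dict(dataset)


# Shorthand form, as in the first catalog entry.
short_form = {
    "type": "PartitionedDataSet",
    "dataset": "kedro.extras.datasets.pillow.ImageDataSet",
    "path": "${my.bucket}/${folder.int}/sample-images/",
    "filename_suffix": ".jpg",
}

# Nested form, as in the suggested rewrite.
long_form = {
    "type": "PartitionedDataSet",
    "dataset": {"type": "kedro.extras.datasets.pillow.ImageDataSet"},
    "path": "${my.bucket}/${folder.int}/sample-images/",
    "filename_suffix": ".jpg",
}

same = normalise_dataset_config(short_form["dataset"]) == normalise_dataset_config(
    long_form["dataset"]
)
print(same)
```

If the two forms really are equivalent, switching between them would not be expected to change the behaviour.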
  • z

    Zemeio

    04/28/2022, 11:59 AM
    Tried like that, same problem
  • d

    datajoely

    04/28/2022, 12:00 PM
    can you post the error?
  • d

    datajoely

    04/28/2022, 12:00 PM
    I'm also going to create a thread
  • z

    Zemeio

    04/28/2022, 12:00 PM
    Exception:
    Exception has occurred: DataSetError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
    Failed while saving data to data set ImageDataSet(filepath=<hidden>.jpg, protocol=s3, save_args={}). unknown file extension:
    File "\usr\local\lib\python3.8\site-packages\PIL\Image.py", line 2278, in save
        format = EXTENSION[ext]
    The above exception was the direct cause of the following exception:
    File "\usr\local\lib\python3.8\site-packages\kedro\io\core.py", line 210, in save
        self._save(data)
    File "\usr\local\lib\python3.8\site-packages\kedro\extras\datasets\pillow\image_dataset.py", line 120, in _save
        data.save(fs_file, **self._save_args)
    File "\usr\local\lib\python3.8\site-packages\PIL\Image.py", line 2280, in save
        raise ValueError(f"unknown file extension: {ext}") from e
  • z

    Zemeio

    04/28/2022, 12:02 PM
    Thanks!
    Traceback (most recent call last):  
    
    File "/usr/local/lib/python3.8/site-packages/PIL/Image.py", line 2278, in save  
    
    format = EXTENSION[ext] KeyError: ''
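The `KeyError: ''` suggests Pillow could not find a filename to take the extension from. A simplified sketch of that lookup is below. It is a paraphrase for illustration, not Pillow's actual source: `infer_format` and the two-entry `EXTENSION` table are assumptions, and `io.BytesIO` stands in for any file object without a usable `.name`. Whether the S3 file object in this thread behaves that way is also an assumption.

```python
import io
import os

# Illustrative subset of an extension-to-format table (assumption).
EXTENSION = {".jpg": "JPEG", ".png": "PNG"}


def infer_format(fp, fmt=None):
    """Hypothetical paraphrase of how a save format might be chosen."""
    if fmt:
        # An explicit format skips the extension lookup entirely.
        return fmt
    filename = ""
    if isinstance(fp, (str, os.PathLike)):
        filename = os.fspath(fp)
    elif isinstance(getattr(fp, "name", None), str):
        # File objects are only useful here if they expose a string name.
        filename = fp.name
    ext = os.path.splitext(filename)[1].lower()
    try:
        return EXTENSION[ext]
    except KeyError as e:
        raise ValueError(f"unknown file extension: {ext}") from e


print(infer_format("sample-images/cat.jpg"))
print(infer_format(io.BytesIO(), fmt="JPEG"))
try:
    infer_format(io.BytesIO())
except ValueError as err:
    print(repr(str(err)))
```

Under these assumptions, an explicit format (Pillow's `Image.save` accepts a `format` argument) would bypass the lookup, so setting it through the dataset's `save_args` may be worth trying. That is untested against this setup.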
  • d

    datajoely

    04/28/2022, 12:51 PM
    I'm still looking into it, I'm not sure why it's not working. I don't have an S3 bucket to test against. Can you use a debugger to work out what happens in \usr\local\lib\python3.8\site-packages\PIL\Image.py?
  • d

    datajoely

    04/28/2022, 12:53 PM
    because it's hitting this part of Image.py in PIL
  • z

    Zemeio

    04/28/2022, 12:55 PM
    On this one the ext is ''
  • z

    Zemeio

    04/28/2022, 12:56 PM
    I debugged already = )
  • d

    datajoely

    04/28/2022, 12:56 PM
    are you forgetting the S3 protocol here: ${my.bucket}/${folder.int}/sample-images/ ?
  • z

    Zemeio

    04/28/2022, 12:57 PM
    No, the my.bucket is s3://. It even says protocol=s3 on the error
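For reference, a minimal sketch of how the templated path would resolve and why the protocol comes out as `s3`. The `resolve` helper and the two values are hypothetical: the values are guessed from the example path given earlier in the thread, and Kedro's own template resolution is not reproduced here.

```python
import re
from urllib.parse import urlsplit


def resolve(template, values):
    """Hypothetical helper: substitute ${key} placeholders from a dict."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: values[m.group(1)], template)


# Assumed values, based on the example path s3://mybucket/02_intermediate/sample-images
values = {"my.bucket": "s3://mybucket", "folder.int": "02_intermediate"}

path = resolve("${my.bucket}/${folder.int}/sample-images/", values)
protocol = urlsplit(path).scheme

print(path)
print(protocol)
```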
  • z

    Zemeio

    04/28/2022, 12:58 PM
    I was able to reproduce the error. If you want I can create a user for you and delete it after in my personal aws.
  • d

    datajoely

    04/28/2022, 12:58 PM
    so the only thing I can ask is whether you can work out what happens here in this line of the PartitionedDataSet: https://github.com/kedro-org/kedro/blob/676600c4b63eec53c13fc4e2536d0a990dac77ce/kedro/io/partitioned_dataset.py#L244
  • d

    datajoely

    04/28/2022, 12:59 PM
    and maybe try dropping the '.' in the suffix just in case
  • z

    Zemeio

    04/28/2022, 1:00 PM
    Wait, _load? Don't you mean _save?
  • d

    datajoely

    04/28/2022, 1:00 PM
    sorry
  • d

    datajoely

    04/28/2022, 1:00 PM
    either