beginners-need-help
  • n

    noklam

    04/18/2022, 9:27 PM
    For example, say your file is called data.csv. If you turn on versioning, it will be saved as data.csv/timestamp/data.csv instead.
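A minimal sketch of the versioned layout described in this message, using only the standard library. The function name `versioned_path` and the `data/01_raw` folder are hypothetical and chosen for illustration; this is not Kedro's own code, only a model of the `filename/timestamp/filename` pattern.

```python
from pathlib import PurePosixPath


def versioned_path(filepath: str, version: str) -> str:
    """Build <filepath>/<version>/<filename>, the layout described above."""
    path = PurePosixPath(filepath)
    # The original file name becomes a directory, and the file name is
    # repeated as the last path component under the version folder.
    return str(path / version / path.name)


# Hypothetical location; the timestamp is the one quoted later in the thread.
print(versioned_path("data/01_raw/data.csv", "2022-04-18T21.43.00.910Z"))
# prints data/01_raw/data.csv/2022-04-18T21.43.00.910Z/data.csv
```

Note that `data.csv` appears twice in the result: once as a directory and once as the actual file.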
  • b

    Burn1n9m4n

    04/18/2022, 9:28 PM
    Also, where is it fetching the timestamp from?
  • n

    noklam

    04/18/2022, 9:28 PM
    The timestamp is generated by default: each time you run a Kedro pipeline, it generates a new timestamp.
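For illustration, here is a rough stdlib sketch that produces a timestamp in the same shape as the one in the error message later in this thread (`2022-04-18T21.43.00.910Z`). The helper name `make_version_timestamp` is hypothetical, and the use of UTC and millisecond precision is an assumption inferred from that string, not taken from Kedro's source.

```python
from datetime import datetime, timezone
from typing import Optional


def make_version_timestamp(now: Optional[datetime] = None) -> str:
    """Format a timezone-aware datetime like 2022-04-18T21.43.00.910Z."""
    if now is None:
        # Assumption: the version string is based on the current UTC time.
        now = datetime.now(timezone.utc)
    millis = now.microsecond // 1000  # keep three digits of sub-second precision
    return now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{millis:03d}Z"


fixed = datetime(2022, 4, 18, 21, 43, 0, 910000, tzinfo=timezone.utc)
print(make_version_timestamp(fixed))  # prints 2022-04-18T21.43.00.910Z
```

Because the value comes from the clock at run time, two runs started at different moments get different version strings.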
  • n

    noklam

    04/18/2022, 9:29 PM
    You just need to remove the existing data.csv, which causes the name collision.
  • b

    Burn1n9m4n

    04/18/2022, 9:31 PM
    So if I make no change to my code, the node I run will have a name collision with versioning on. But if I change the code, the version time stamp changes?
  • n

    noklam

    04/18/2022, 9:32 PM
    Not sure I follow. What code change are you referring to?
  • b

    Burn1n9m4n

    04/18/2022, 9:34 PM
    Ok…let me back up. So I have a node that I’ve made some changes to. I save those and have versioning on. When I run the node the first time it runs fine. But if I make no change and rerun it, I get the error.
  • b

    Burn1n9m4n

    04/18/2022, 9:35 PM
    Perhaps "node" is the wrong word as well. I'm running a segment of the pipeline using to_nodes
  • n

    noklam

    04/18/2022, 9:35 PM
    So this is probably because when you ran the pipeline the first time, versioning was off. It generated a file called path/filename
  • b

    Burn1n9m4n

    04/18/2022, 9:37 PM
    That’s true I didn’t. I enabled it after the fact
  • n

    noklam

    04/18/2022, 9:37 PM
    And now that you've turned on versioning, it tries to generate a path called path/filename/timestamp/filename, but that path collides with the existing file
  • n

    noklam

    04/18/2022, 9:38 PM
    The filesystem does not allow the same name to be reused
  • b

    Burn1n9m4n

    04/18/2022, 9:38 PM
    But the timestamp should change with each run right?
  • n

    noklam

    04/18/2022, 9:38 PM
    So simply removing the existing file will make it run again
  • n

    noklam

    04/18/2022, 9:38 PM
    Yes
  • b

    Burn1n9m4n

    04/18/2022, 9:39 PM
    So I need to remove anything that is outside of the timestamp(s) directory?
  • n

    noklam

    04/18/2022, 9:39 PM
    What files are there now?
  • n

    noklam

    04/18/2022, 9:40 PM
    An ls or a screenshot will help
  • b

    Burn1n9m4n

    04/18/2022, 9:41 PM
    So I have:
    - data.parquet
    - data.parquet/
  • b

    Burn1n9m4n

    04/18/2022, 9:41 PM
    And there are timestamps below the data.parquet/
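The situation described here, where the same name exists both as a plain file and as a folder holding timestamped versions, can be detected from a listing of paths. The sketch below is a hedged illustration: `find_file_dir_collisions` is a hypothetical helper, and the example paths are made up to mirror the ones in this thread.

```python
def find_file_dir_collisions(keys):
    """Return every path that is both a stored object and a parent of others."""
    keys = set(keys)
    collisions = set()
    for key in keys:
        parts = key.split("/")
        # Check every proper parent prefix of this key.
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in keys:
                collisions.add(prefix)
    return sorted(collisions)


listing = [
    "layer/data.parquet",  # unversioned file left over from an earlier run
    "layer/data.parquet/2022-04-18T21.43.00.910Z/data.parquet",  # versioned save
]
print(find_file_dir_collisions(listing))  # prints ['layer/data.parquet']
```

Any path it reports is a candidate for the leftover unversioned file that noklam suggests removing.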
  • n

    noklam

    04/18/2022, 9:42 PM
    Can you also post the error you get?
  • b

    Burn1n9m4n

    04/18/2022, 9:44 PM
    It’s from work so let me strip out the sensitive stuff
  • b

    Burn1n9m4n

    04/18/2022, 9:47 PM
    @noklam DataSetError: Save path
    s3bucketname/project_folder/data/layer/data.parquet/2022-04-18T21.43.00.910Z/data.parquet
    for ParquetDataSet(filepath=s3bucketname/project_folder/data/layer/data.parquet, load_args={}, protocol=s3, save_args={}, version=Version(load=None, save='2022-04-18T21.43.00.910Z')) must not exist if versioning is enabled.
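Going by the wording of this error, the check is made against the full versioned save path, not against the base file name. The following toy model shows that kind of check under stated assumptions: `resolve_save_path`, `SavePathExistsError`, and the in-memory `existing_paths` set are all hypothetical stand-ins. Kedro's real check goes through the filesystem backend (S3 here), whose existence semantics this sketch does not model.

```python
from pathlib import PurePosixPath


class SavePathExistsError(Exception):
    """Hypothetical stand-in for the DataSetError shown above."""


def resolve_save_path(filepath, save_version, existing_paths):
    """Return the versioned save path, refusing to overwrite an existing one."""
    path = PurePosixPath(filepath)
    target = str(path / save_version / path.name)
    if target in existing_paths:
        raise SavePathExistsError(
            f"Save path '{target}' must not exist if versioning is enabled."
        )
    return target


existing = {"layer/data.parquet/2022-04-18T21.43.00.910Z/data.parquet"}
# A new version string yields a path that is not in the set, so this succeeds.
print(resolve_save_path("layer/data.parquet", "2022-04-18T21.50.00.000Z", existing))
```

In this simplified model, a fresh timestamp never trips the check; only reusing an existing version string does.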
  • b

    Burn1n9m4n

    04/18/2022, 9:47 PM
    That's a pretty good approximation with some of the path stuff abstracted
  • b

    Burn1n9m4n

    04/18/2022, 9:49 PM
    The thing is...shouldn't the timestamp advance and automatically take care of the error?
  • n

    noklam

    04/18/2022, 9:50 PM
    Is there now a file (not a dir) called
    s3bucketname/project_folder/data/layer/data.parquet
    ?
  • b

    Burn1n9m4n

    04/18/2022, 9:51 PM
    yes
    s3bucketname/project_folder/data/layer/data.parquet/
  • n

    noklam

    04/18/2022, 9:51 PM
    Can you try removing the file? If versioning is turned on, it should always be
    filename/timestamp/filename
  • b

    Burn1n9m4n

    04/18/2022, 9:52 PM
    that structure exists
  • b

    Burn1n9m4n

    04/18/2022, 9:53 PM
    if you go further it becomes
    s3bucketname/project_folder/data/layer/data.parquet/timestamp/data.parquet