beginners-need-help
  • d

    datajoely

    12/09/2021, 3:02 PM
    So I'm not sure we have any examples, but all we do is delegate the generic **_fs_args in our read/write options
  • d

    datajoely

    12/09/2021, 3:02 PM
    Taken from CSVDataSet
    self._fs = fsspec.filesystem(self._protocol, **_credentials, **_fs_args)
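
A minimal sketch of what that line implies, using only the standard library. The helpers `split_protocol` and `filesystem_kwargs` are hypothetical names for illustration and are not Kedro or fsspec functions. The sketch assumes the protocol comes from the URL scheme of `filepath`, and that credentials and fs_args are unpacked into one set of keyword arguments, so a key present in both would be rejected.

```python
from urllib.parse import urlsplit


def split_protocol(filepath):
    """Hypothetical helper: return (protocol, path) for a catalog filepath.

    Simplified on purpose: Windows drive letters and other edge cases
    are not handled here.
    """
    parts = urlsplit(filepath)
    protocol = parts.scheme or "file"  # no scheme means a local file
    path = parts.netloc + parts.path
    return protocol, path


def filesystem_kwargs(credentials=None, fs_args=None):
    """Hypothetical helper: build the kwargs that a call such as
    f(protocol, **credentials, **fs_args) would receive.

    Python raises TypeError when the same keyword is supplied twice,
    so the same check is made explicitly here.
    """
    credentials = dict(credentials or {})
    fs_args = dict(fs_args or {})
    overlap = credentials.keys() & fs_args.keys()
    if overlap:
        raise TypeError(f"duplicate filesystem arguments: {sorted(overlap)}")
    return {**credentials, **fs_args}


protocol, path = split_protocol("sftp:///path/to/remote_cluster/cool_data.csv")
kwargs = filesystem_kwargs(
    credentials={"username": "my_username", "password": "password"},
    fs_args={"host": "host_address", "port": 22},
)
```

With real dependencies installed, `protocol` and `kwargs` are the values that would be handed to `fsspec.filesystem`.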
  • d

    datajoely

    12/09/2021, 3:13 PM
    Dask also uses fsspec behind the scenes so you should be able to do it this way: https://stackoverflow.com/a/56736195/2010808
  • d

    datajoely

    12/09/2021, 3:13 PM
    I think you can do everything in your URL or use the fs_args if you want to follow best practice in terms of credential management
  • i

    Isaac89

    12/09/2021, 3:14 PM
    So, I would write "sftp://path_to_cool_stuff.csv" and pass to the fs_args all the key:value parameters I would pass to the **kwargs that fsspec sftp requires like host username password etc., am I right?
  • d

    datajoely

    12/09/2021, 3:14 PM
    Exactly!
  • d

    datajoely

    12/09/2021, 3:14 PM
    And you can use the credentials stuff that Kedro does by default in the catalog
  • d

    datajoely

    12/09/2021, 3:15 PM
    fsspec is a cool project - big fans of how they've approached this
  • i

    Isaac89

    12/09/2021, 3:17 PM
    fsspec, the Kedro Catalog and Kedro in general: I'm becoming a fan too!
  • d

    datajoely

    12/09/2021, 3:17 PM
    Before fsspec we used to have to implement S3CSVDataSet, AzureCSVDataSet, etc. It really is a game changer
  • d

    datajoely

    12/09/2021, 3:18 PM
    If you get it working - I'd love to include an example in the documentation
  • d

    datajoely

    12/09/2021, 3:18 PM
    we've started including YAML examples in the API docs so it would fit in well https://kedro.readthedocs.io/en/latest/kedro.extras.datasets.pandas.CSVDataSet.html
  • i

    Isaac89

    12/09/2021, 3:51 PM
    # requires paramiko -> pip install paramiko
    # in conf/local/catalog.yml
    test_fsspec:
      type: pandas.CSVDataSet
      filepath: "sftp:///path/to/remote_cluster/cool_data.csv"
      credentials: cluster_credentials
      load_args:
        sep: ","
        index_col: 0
      save_args:
        index: True
        encoding: "utf-8"
    
    # in conf/local/credentials.yml   
    cluster_credentials:
      username: my_username
      host: host_address
      port: 22
      password: password
      
    # in jupyter lab
    catalog.load("test_fsspec")
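
A rough model of how the `credentials: cluster_credentials` entry above ties the two files together. This is a sketch with plain dictionaries standing in for the parsed YAML; `resolve_credentials` is a hypothetical function and is not Kedro's actual implementation. It assumes the string under `credentials` is used as a key into the credentials configuration and replaced by the mapping found there.

```python
import copy


def resolve_credentials(catalog_conf, credentials_conf):
    """Hypothetical helper: swap each credentials name for its mapping."""
    resolved = copy.deepcopy(catalog_conf)  # leave the input untouched
    for name, entry in resolved.items():
        key = entry.get("credentials")
        if isinstance(key, str):
            if key not in credentials_conf:
                raise KeyError(
                    f"Dataset '{name}' references unknown credentials '{key}'"
                )
            entry["credentials"] = dict(credentials_conf[key])
    return resolved


# Stand-ins for the parsed catalog.yml and credentials.yml shown above
catalog_conf = {
    "test_fsspec": {
        "type": "pandas.CSVDataSet",
        "filepath": "sftp:///path/to/remote_cluster/cool_data.csv",
        "credentials": "cluster_credentials",
    }
}
credentials_conf = {
    "cluster_credentials": {
        "username": "my_username",
        "host": "host_address",
        "port": 22,
        "password": "password",
    }
}

resolved = resolve_credentials(catalog_conf, credentials_conf)
```

Keeping the secrets under a separate key means the catalog entry can be shared without exposing the username or password.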
  • d

    datajoely

    12/09/2021, 4:21 PM
    Awesome!!!!!
  • d

    datajoely

    12/09/2021, 4:21 PM
    Do you fancy submitting a PR and becoming a contributor or shall I do it for you?
  • d

    datajoely

    12/09/2021, 4:21 PM
    It's super easy now with the handy "." character on GitHub
  • i

    Isaac89

    12/09/2021, 4:26 PM
    I will submit a PR later this evening 👍
  • d

    datajoely

    12/09/2021, 4:28 PM
    Probably makes sense to do Example 15 on this page: https://kedro.readthedocs.io/en/stable/05_data/01_data_catalog.html
  • z

    Zemeio

    12/10/2021, 9:37 AM
    How do you guys document your data (with a description/explanation of the fields/etc..., metadata)? I want to have an explanation of each dataset, and I want it to show up in Kedro-Viz if possible, but I have no idea how to do it.
  • d

    datajoely

    12/10/2021, 9:37 AM
    It's not possible today - it's on the backlog as something I'd like to add. Today I'd recommend using something like Great Expectations for data docs
  • d

    datajoely

    12/10/2021, 9:38 AM
    If we were to allow you to store arbitrary metadata in the catalog, would you find that useful?
  • z

    Zemeio

    12/10/2021, 9:39 AM
    Only if it was retrievable somehow in some sort of docs or something. If I can retrieve it in code that should already be good enough I guess.
  • z

    Zemeio

    12/10/2021, 9:39 AM
    Maybe the AbstractDataset could have some sort of additional_info or something
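
One way to picture the idea, as a sketch only. `DescribedDataset`, its `additional_info` attribute and `describe_catalog` are hypothetical names invented for illustration; none of them are part of Kedro's API. The sketch assumes each dataset carries a free-form dictionary that code can read back and render as documentation.

```python
class DescribedDataset:
    """Hypothetical dataset wrapper that carries descriptive metadata."""

    def __init__(self, load_func, additional_info=None):
        self._load_func = load_func
        # Free-form metadata; it is never used when loading the data
        self.additional_info = dict(additional_info or {})

    def load(self):
        return self._load_func()


def describe_catalog(datasets):
    """Render the metadata of every dataset as simple text docs."""
    lines = []
    for name in sorted(datasets):
        info = datasets[name].additional_info
        lines.append(f"## {name}")
        lines.append(info.get("description", "(no description)"))
        for field, meaning in info.get("fields", {}).items():
            lines.append(f"- `{field}`: {meaning}")
    return "\n".join(lines)


datasets = {
    "cool_data": DescribedDataset(
        load_func=lambda: [],
        additional_info={
            "description": "Example readings.",
            "fields": {"ts": "timestamp of the reading"},
        },
    )
}
docs = describe_catalog(datasets)
```

Because the metadata is plain data on the object, it can be retrieved in code today and shown in a UI later without changing how the dataset loads.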
  • z

    Zemeio

    12/10/2021, 9:39 AM
    Thank you for the lightning speed reply, btw
  • d

    datajoely

    12/10/2021, 9:39 AM
    Yeah that's what I was thinking
  • d

    datajoely

    12/10/2021, 9:42 AM
    We also have a cool prototype where the new FastAPI backend could allow you to add additional pages to Viz. It's not a priority at the moment, but it's something I really hope to get out once the Experiment Tracking work is finished
  • d

    datajoely

    12/10/2021, 9:43 AM
    So in theory we could get to a point where 3rd parties could build Viz plugins which would be super cool
  • z

    Zemeio

    12/10/2021, 9:45 AM
    Oh, that would be cool!
  • z

    Zemeio

    12/10/2021, 9:52 AM
    Is it easy to extend the properties you can define on the dataset? If it is maybe I can try something
  • z

    Zemeio

    12/10/2021, 9:53 AM
    Btw, if you ever do implement this, I recommend that the documentation be in markdown, but it is just my opinion