advanced-need-help
  • w

    Wit

    05/10/2022, 8:01 PM
    OK. That's not obvious from the docs :). I think hooks are much simpler.
  • d

    datajoely

    05/10/2022, 8:02 PM
    Hooks will change your life
  • w

    Wit

    05/10/2022, 8:02 PM
    Yep 🙂
  • w

    Wit

    05/10/2022, 8:03 PM
    Hooks are a much better design, as you are not bound to specific implementation details
  • w

    Wit

    05/10/2022, 8:03 PM
    Thank you for the hints
  • d

    datajoely

    05/10/2022, 8:07 PM
    Yeah it's via a library called pluggy by the pytest folks so it's really robust
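
    For readers unfamiliar with pluggy, here is a toy sketch of the spec/implementation mechanism Kedro's hooks are built on; the names below are illustrative, not Kedro's own hook specs:

        import pluggy

        hookspec = pluggy.HookspecMarker("demo")
        hookimpl = pluggy.HookimplMarker("demo")

        class DemoSpec:
            @hookspec
            def after_run(self, result):
                """Specification: called after a run completes."""

        class DemoPlugin:
            @hookimpl
            def after_run(self, result):
                # Implementations are discovered and called by the manager.
                print(f"after_run received: {result}")

        pm = pluggy.PluginManager("demo")
        pm.add_hookspecs(DemoSpec)
        pm.register(DemoPlugin())
        pm.hook.after_run(result=42)  # pluggy requires keyword arguments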
  • m

    marioFeynman

    05/11/2022, 12:59 AM
    Hey! I'm trying to migrate one of my projects that currently runs on Kedro 0.16.x to 0.18.0, but I'm having a lot of trouble where I need to use the old load_context method... I used it when I wanted to run a Jupyter notebook template and load the Kedro context there... how should I continue this mission?
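
    (For context: in 0.18 the old load_context entry point is replaced by KedroSession. A minimal notebook sketch, assuming you start from the project root:)

        from pathlib import Path

        from kedro.framework.session import KedroSession
        from kedro.framework.startup import bootstrap_project

        bootstrap_project(Path.cwd())  # reads pyproject.toml and registers the project
        with KedroSession.create(project_path=Path.cwd()) as session:
            context = session.load_context()  # the old load_context() equivalent
            catalog = context.catalog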
  • m

    marioFeynman

    05/11/2022, 1:01 AM
    And another question... how can I use the run parameters (like KEDRO_ENV) in this new version? In the past I was able to use the now-deprecated hooks and journal, which helped me with that task...
  • r

    Rjify

    05/11/2022, 5:36 AM
    Hello all, I am at the stage where I have to deploy a DS project built on the Kedro template to Databricks. I am wondering what the different ways of achieving this are? I believe there is a way using a notebook as per the documentation, but that's not suggested for productionisation. I am looking for options to deploy a Kedro pipeline on a Databricks cluster.
  • i

    inigohrey

    05/11/2022, 8:49 AM
    In 0.18.1 there's an after_context_created hook which might be interesting for you, as it allows you to access the context without needing to create it yourself. My team has been stuck on 0.17.1 because we were using the context for a few things, but with this hook we might finally be able to move past it.
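
    A minimal sketch of that hook (Kedro >= 0.18.1); the class name is a placeholder, and it would be registered via HOOKS in settings.py:

        from kedro.framework.hooks import hook_impl

        class ProjectHooks:
            @hook_impl
            def after_context_created(self, context) -> None:
                # The framework hands over the ready-made KedroContext,
                # so there is no need to construct one yourself.
                print(f"Active environment: {context.env}")

        # in settings.py:
        # HOOKS = (ProjectHooks(),)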
  • t

    Tsakagur

    05/11/2022, 9:44 AM
    Thanks, I'll have a look!
  • y

    Yetunde

    05/11/2022, 10:25 AM
    Hi @Rjify! We're so excited to see you mention this. We're actually setting up a project to work with the Databricks team to build out their IDE support and figure out best-practice ways of developing Kedro projects on Databricks. In our current sprint, we're fixing some of the bugs we've found while using Kedro on Databricks. We have suggested some workflows and will update our documentation. You can work with Databricks and Kedro by:
    - packaging a Kedro project with kedro package and publishing the package using the Databricks DBFS API
    - using Databricks Repos functionality and doing a pipeline run through a Databricks notebook
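
    A hedged sketch of the second option, i.e. running a Repos-hosted project from a Databricks notebook cell (the repo path is a placeholder):

        from pathlib import Path

        from kedro.framework.session import KedroSession
        from kedro.framework.startup import bootstrap_project

        project_root = Path("/Workspace/Repos/<user>/<project>")  # placeholder
        bootstrap_project(project_root)
        with KedroSession.create(project_path=project_root) as session:
            session.run()  # executes the default pipeline on the cluster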
  • x

    xxavier

    05/11/2022, 12:25 PM
    Hi everyone, I am trying to use the APIDataSet in the catalog.yml file but am failing to load some credentials into the headers. What I have tried:
    run_histograms:
      type: api.APIDataSet
      url: https://xxx/
      headers:
        Authorization: Token <token>
    Works without error (which is nice, but the token is somewhat sensitive information). I tried to fill the headers using credentials but failed to do so. credentials.yml
    dqm_playground_token:
      - Content-Type: application/json
      - Authorization: Token <token>
    catalog.yml
    run_histograms:
      type: api.APIDataSet
      url: https://xxx/
      # Test 1
      headers: dqm_playground_token
      # Test 2
      headers:
        - dqm_playground_token
      # More tests
    It seems to boil down to the fact that headers is read as Dict[str, Any] and not Union[Iterable[str], AuthBase]: https://kedro.readthedocs.io/en/stable/_modules/kedro/extras/datasets/api/api_dataset.html#APIDataSet I could probably modify the APIDataSet definition to solve it by having headers = auth, but I guess there is a better way. 🙂 Sorry about the naive question. Any help is appreciated.
  • d

    datajoely

    05/11/2022, 12:53 PM
    So the easiest way to debug this is to jump into a Jupyter/IPython session, import the APIDataSet in Python, and get the .load() method working. It will then be simple to work out what the YAML should be.
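
    For example, something along these lines in an IPython session (the URL and token are placeholders from the thread):

        from kedro.extras.datasets.api import APIDataSet

        ds = APIDataSet(
            url="https://xxx/",
            headers={"Authorization": "Token <token>"},  # headers is a dict, not a list
        )
        response = ds.load()  # returns a requests.Response once the arguments are right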
  • d

    datajoely

    05/11/2022, 12:54 PM
    Oh, I don't think your solution is bad, by the way
  • d

    datajoely

    05/11/2022, 12:55 PM
    Improving credentials in general is on the roadmap
  • m

    marioFeynman

    05/11/2022, 1:30 PM
    So, do you think that maybe exposing it using this method could be the right way?
  • d

    datajoely

    05/11/2022, 1:31 PM
    I need to think about it more; you only get the ConfigLoader at that point, before the catalog is created. So I'm leaning towards no rather than yes.
  • x

    xxavier

    05/11/2022, 2:15 PM
    Thanks for the feedback! I should make it more generic, but since the solution was not too bad, I just created a custom dataset based on the APIDataSet (so as not to mess with Kedro's code). catalog.yml
    run_histograms:
      type: dqm_playground_ds.extras.datasets.tuned_API_dataset.TunedAPIDataSet
      url: https://xxx/
      credentials: dqm_playground_token
      headers: credentials
    credentials.yml
    dqm_playground_token:
      - Authorization: Token <token>
    tuned_API_dataset.py (similar to kedro's api_dataset.py)
            auth = credentials or auth
    
            # Added the following three lines :)
            if headers == "credentials":
                auth = None
                headers = credentials[0]
    Not great, not terrible. 🙂 Thanks again!
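
    For reference, a fuller sketch of what such a subclass might look like; this is a sketch only, assuming Kedro 0.18's APIDataSet signature, with names mirroring the thread:

        from typing import Any, Dict, Iterable, Union

        from requests.auth import AuthBase

        from kedro.extras.datasets.api import APIDataSet

        class TunedAPIDataSet(APIDataSet):
            """APIDataSet variant that can route `credentials` into the headers."""

            def __init__(
                self,
                url: str,
                headers: Dict[str, Any] = None,
                credentials: Union[Iterable[str], AuthBase] = None,
                **kwargs: Any,
            ) -> None:
                if headers == "credentials" and credentials:
                    # credentials.yml holds a one-element list of header mappings,
                    # so use it as the headers and skip requests' auth handling.
                    headers = dict(credentials[0])
                    credentials = None
                super().__init__(url=url, headers=headers, credentials=credentials, **kwargs)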
  • m

    marioFeynman

    05/11/2022, 3:51 PM
    Oh, OK, I was trying to use that hook, but yes, it is not providing the data I was looking for. Thanks anyway!
  • r

    Rjify

    05/11/2022, 4:26 PM
    Hi @Yetunde, thanks for replying and providing the options. I am more inclined towards the option of "using Databricks Repos functionality and doing a pipeline run through a Databricks notebook". I felt the documentation for deploying the project using this method was not sufficient. Is there a better example available that you can share with me? It would be of great help. Thanks
  • r

    Rjify

    05/11/2022, 9:04 PM
    Another question: does Kedro have any plans to visualize hooks associated with nodes in the visualization generated by Kedro-Viz?
  • d

    datajoely

    05/11/2022, 9:05 PM
    The hooks aren't really node-based, so it would be difficult to pair them to a single node. If you have any ideas on how this could look, please raise a GH issue.
  • d

    datajoely

    05/11/2022, 9:06 PM
    Regarding your Databricks question, the current docs are all we have to share as of now, but we're hard at work on their overhaul.
  • u

    user

    05/12/2022, 2:33 AM
    Kedro SunPy - Writing Custom Data Set to S3 https://stackoverflow.com/questions/72209505/kedro-sunpy-writing-custom-data-set-to-s3
  • m

    marioFeynman

    05/13/2022, 1:18 AM
    Hey guys! Quick question: how can I dynamically add the Kedro environment to my parameters so I can run specific things during the pipeline run? I'm using Kedro 0.18.0.
  • a

    antony.milne

    05/13/2022, 7:45 AM
    env in parameters
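
    One possible approach in 0.18 (a sketch; the class name is a placeholder): a hook that copies the env out of run_params into the catalog, so that nodes can declare "params:env" as an input:

        from typing import Any, Dict

        from kedro.framework.hooks import hook_impl

        class EnvToParamsHooks:
            @hook_impl
            def before_pipeline_run(self, run_params: Dict[str, Any], pipeline, catalog) -> None:
                # run_params carries the run metadata, including the active "env".
                catalog.add_feed_dict({"params:env": run_params["env"]}, replace=True)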
  • k

    Kastakin

    05/14/2022, 10:55 AM
    For my use case I have created a custom dataset for importing and exporting data in the mzML format, an open-source format for proteomics/metabolomics analysis, using pyOpenMS (https://pyopenms.readthedocs.io/en/latest/index.html). Looking at issues and PRs in the GitHub repo, I've noticed that a decoupling of the main package and the datasets is in the works. Should I open an issue + PR to add the new dataset now, or am I better off waiting for the aforementioned decoupling first?
  • k

    Kastakin

    05/14/2022, 10:56 AM
    On the same note: is adding a new dataset considered a breaking change?
  • n

    noklam

    05/15/2022, 5:49 PM
    We just released 0.18.1 last week, so it will probably take a few more weeks to release a new version. I think it is OK to open an issue and PR in the current repository and migrate it to the new package later. The decoupling is mainly to speed up releases for the datasets, since these third-party package dependencies move much faster than Kedro core.