beginners-need-help
  • pallavi (07/15/2022, 11:30 AM)
    I want to read the data folder files of a Kedro project in Django (i.e. serve them through a Django REST API).
  • pallavi (07/15/2022, 11:30 AM)
    sure
  • datajoely (07/15/2022, 11:31 AM)
    So you can:
    - Talk to an API via APIDataSet, which is a thin wrapper of the requests library
    - Read filepaths in most of the other datasets
    - Read SQL via Pandas or Spark
    But there isn't any formal integration with Django out of the box.
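A minimal sketch of the first option, assuming kedro 0.18.x (where APIDataSet lives in kedro.extras.datasets.api); the endpoint and params here are placeholders:

```python
# A minimal sketch, assuming kedro 0.18.x; the endpoint is a placeholder.
from kedro.extras.datasets.api import APIDataSet

dataset = APIDataSet(
    url="https://api.example.com/data",  # placeholder endpoint
    params={"format": "json"},           # passed through to requests
)
response = dataset.load()  # returns a requests.Response
print(response.json())
```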
  • pallavi (07/15/2022, 11:32 AM)
    OK... is there any example code available?
  • datajoely (07/15/2022, 11:39 AM)
    I would encourage you to run through the Spaceflights tutorial if this is your first time using Kedro: https://kedro.readthedocs.io/en/stable/tutorial/spaceflights_tutorial.html
  • LawrenceS (07/17/2022, 1:44 PM)
    Hi everyone, I have a pipeline (Current_Kedro_Pipeline.png) that is pulling from an SQL table and then filtering by a start and end datetime. However, I don't really want to pull the entire table just in order to immediately filter it. So I'm wondering:
    - What is the best way to create an SQL dataset that accepts inputs which you can then use to query?
    - I see that there is a "load_args" option to pass data into a dataset, but it doesn't seem as though this can be used to parameterise the query string itself (please do correct me if I'm wrong about this)?
    - Is the best alternative therefore to create a "CustomDataset" that utilises raw SQLAlchemy?
    - Additionally, I don't know how feasible this is, but one of the things I really like about Kedro is the traceability of the datasets. If I solve the aforementioned problem, there will be no association between "Patient Session Datetime Filtered" and the "Patient Sessions" dataset. I'm wondering if it would be possible to create manual links between datasets to show that one is derived from another, just to keep that traceability? I mocked up an example using a dotted line (Kedro_Pipeline_Dependant_Database_Example.png) to try and illustrate what I mean.
    Any help is much appreciated! Thank you, Lawrence
  • datajoely (07/17/2022, 1:46 PM)
    So the way to do this is to pass dummy data between the nodes; they don't need to have catalog entries, but they will drive the execution order too.
  • LawrenceS (07/17/2022, 3:33 PM)
    Thank you, could you possibly elaborate a bit more? Or point me to an example that does this? I don't really know how passing dummy data between nodes is achieved.
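A minimal sketch of the dummy-data pattern datajoely describes (all names here are illustrative): the first node returns a throwaway value, the second declares it as an input, and Kedro infers the execution order without any catalog entry:

```python
from kedro.pipeline import Pipeline, node


def write_filtered_sessions(start: str, end: str) -> str:
    # ... run the parameterised SQL here and persist the result ...
    return "done"  # dummy value; "sessions_written" needs no catalog entry


def build_report(_sessions_written: str) -> None:
    # The unused input exists only to force this node to run second.
    ...


pipeline = Pipeline(
    [
        node(write_filtered_sessions, ["params:start", "params:end"], "sessions_written"),
        node(build_report, "sessions_written", None),
    ]
)
```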
  • PetitLepton (07/17/2022, 7:25 PM)
    Hi, I had to do something like that in the past. To reuse Kedro's SQLDataSet, I split the query into two steps. In the first, I used a TextDataSet holding a query with parameters that were replaced in one node (for example with a .format). In the second, a node took the filled SQL query and the connection string as inputs, instantiated a SQLDataSet and returned the result of the load. This second node is in fact generic. I liked this implementation because it was very flexible and allowed me to put all my SQL queries in a dedicated folder and to run a linter over the queries.
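A sketch of that two-step pattern; the names are illustrative, and pandas.SQLQueryDataSet is the kedro 0.18.x class assumed here:

```python
import pandas as pd
from kedro.extras.datasets.pandas import SQLQueryDataSet


def fill_query(query_template: str, params: dict) -> str:
    # Step 1: "query_template" is loaded via a TextDataSet catalog entry, e.g.
    # "SELECT * FROM sessions WHERE ts BETWEEN '{start}' AND '{end}'"
    return query_template.format(**params)


def run_query(query: str, connection_string: str) -> pd.DataFrame:
    # Step 2: a generic node that instantiates the dataset and loads the result.
    dataset = SQLQueryDataSet(sql=query, credentials={"con": connection_string})
    return dataset.load()
```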
  • datajoely (07/18/2022, 8:20 AM)
    @PetitLepton @LawrenceS this is very much possible from a custom implementation point of view. The philosophical reasons why we don't make this native functionality are as follows: https://github.com/kedro-org/kedro/issues/904#issuecomment-925969655
  • datajoely (07/18/2022, 8:21 AM)
    Perhaps as more dataframe-like abstractions for SQL engines (e.g. Snowpark) emerge, we will be able to approach things that way.
  • brewski (07/21/2022, 7:09 PM)
    Hiya! I'm using Kedro to visualize data pipelines with Dask and ran into a small hiccup. Setting up dataframes and running them locally is great, but when I want to use something like Coiled, I'm a little confused as to how to let Kedro know I'd like to run the Coiled init function before a pipeline starts running. Could I get some pointers?
  • datajoely (07/21/2022, 7:35 PM)
    You could adapt how our Spark set-up works: https://github.com/kedro-org/kedro-starters/blob/main/pyspark/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/context.py
    In fact, @deepyaman expanded on this in these docs recently: https://kedro.readthedocs.io/en/latest/deployment/dask.html
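A sketch of one way to do this with a lifecycle hook rather than context.py, assuming kedro >= 0.18.1 (where the after_context_created hook exists) and Coiled's Cluster API; register it in src/&lt;package&gt;/settings.py with HOOKS = (DaskHooks(),):

```python
from dask.distributed import Client
from kedro.framework.hooks import hook_impl


class DaskHooks:
    @hook_impl
    def after_context_created(self, context):
        import coiled  # assumes the coiled package is installed

        # Illustrative parameters; see Coiled's docs for the full set.
        cluster = coiled.Cluster(n_workers=4)
        self.client = Client(cluster)  # becomes Dask's default client
```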
  • brewski (07/21/2022, 8:31 PM)
    is context.py the only entry point needed?
  • brewski (07/21/2022, 8:31 PM)
    e.g. if I hardcode credentials there, will I have something that will run?
  • datajoely (07/21/2022, 8:32 PM)
    I think so - but I don't endorse hardcoding creds!
  • brewski (07/21/2022, 8:32 PM)
    I know, I just need something running asap
  • brewski (07/21/2022, 8:32 PM)
    it doesn't look like it's running the context.py file
  • brewski (07/21/2022, 8:41 PM)
    reload_kedro is giving me: TypeError: __init__() missing 2 required positional arguments: 'config_loader' and 'hook_manager'
  • brewski (07/21/2022, 8:41 PM)
    is this some newbie mistake I'm making?
  • brewski (07/21/2022, 8:42 PM)
    oh alright then
  • brewski (07/21/2022, 8:43 PM)
    I just removed the kwargs from the constructor and it worked
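For anyone hitting the same TypeError: in kedro 0.18.x, KedroContext.__init__ gained config_loader and hook_manager arguments, so a subclass that spells out the old argument list breaks. A sketch of a version-tolerant constructor (the Dask set-up helper is a hypothetical placeholder):

```python
from kedro.framework.context import KedroContext


class ProjectContext(KedroContext):
    def __init__(self, *args, **kwargs):
        # Forward everything so signature changes in KedroContext don't break us.
        super().__init__(*args, **kwargs)
        self._init_dask()

    def _init_dask(self):
        ...  # hypothetical: start the Dask/Coiled client here
```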
  • brewski (07/21/2022, 8:50 PM)
    it works!
  • brewski (07/21/2022, 8:50 PM)
    tysm
  • datajoely (07/21/2022, 8:54 PM)
    Amazing!
  • badcollector (07/22/2022, 9:38 PM)
    Hello all! When attempting to run kedro viz I get the attached error. The kedro versions I'm running are below, and I did pip install kedro[all]; however, I still experience the error.
    kedro==0.18.2
    kedro-telemetry==0.2.1
    kedro-viz==4.7.1
  • badcollector (07/22/2022, 11:44 PM)
    Disregard, it was a typo; I needed to capitalize the "S" in DataSet 😅
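For anyone else tripping on this: in kedro 0.18.x the dataset class names end in "DataSet" (capital S), and the catalog "type:" string must match the class name exactly:

```python
# The catalog "type:" string must match the class name exactly:
#   type: pandas.CSVDataSet   # resolves
#   type: pandas.CSVDataset   # fails to resolve
from kedro.extras.datasets.pandas import CSVDataSet  # note the capital "S"
```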
  • noklam (07/25/2022, 10:51 AM)
    You got it right!
  • brewski (07/26/2022, 3:05 AM)
    Is there a way to make a variable declared in a ProjectContext accessible during runtime? I noticed that in the documentation all of the parameters provided (https://kedro.readthedocs.io/en/latest/kedro.framework.context.KedroContext.html#kedro.framework.context.KedroContext) are read-only, but I need to pass a reference to my session object to code in nodes in my pipeline...
  • datajoely (07/26/2022, 5:55 AM)
    Hi @brewski, hooks are the right way to access the session, context, etc. during the lifecycle of the run. That being said, we expect nodes to be pure Python functions, so, by design, it's not really possible to do this. My general rule is: "if you're trying to create the context directly, you've gone too far". What are you trying to do exactly? We can help come up with a more kedrific solution.
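A sketch of the hooks approach, assuming kedro 0.18.x: rather than reaching for the context inside a node, register the runtime object in the catalog from a hook and declare it as an ordinary node input (all names here are illustrative, and make_session is a hypothetical factory):

```python
from kedro.framework.hooks import hook_impl
from kedro.io import MemoryDataSet


def make_session():
    # Hypothetical stand-in for however the session object is created.
    return object()


class RuntimeObjectHooks:
    @hook_impl
    def after_catalog_created(self, catalog):
        # Register the object so nodes can declare "my_session" as an input.
        catalog.add("my_session", MemoryDataSet(data=make_session()))
```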