advanced-need-help
  • d

    datajoely

    04/12/2022, 4:00 PM
    Multiprocessing error
  • f

    FlorianGD

    04/13/2022, 11:46 AM
    Hello, I am migrating an internal lib that we developed to kedro==0.18.0. We use kedro.framework.session.get_current_session to get the current session, in order to either create a new session if it is None or use it directly. This function was removed in 0.18.0 (with https://github.com/kedro-org/kedro/pull/1138). What is the new way to find the current active session?
  • u

    user

    04/13/2022, 11:57 AM
    jsonschema 4.4.0 does not provide the extra 'isoduration' https://stackoverflow.com/questions/71857090/jsonschema-4-4-0-does-not-provide-the-extra-isoduration
  • d

    datajoely

    04/13/2022, 12:06 PM
    Deprecation of Get Current Session
  • g

    gui42

    04/14/2022, 1:20 AM
    Folks, has anyone ever built a catalog object for publishing to Kafka topics?
  • d

    datajoely

    04/14/2022, 4:16 AM
    I haven't seen one but we'd love a PR if you make one!
  • p

    praveenbisht

    04/14/2022, 2:38 PM
    Hello everyone, I am trying to launch a Kedro pipeline from another Python script. Load Context seems to be deprecated in version 0.18. Does anyone know a workaround?
  • n

    noklam

    04/14/2022, 3:03 PM
    session will be the way to go
  • p

    praveenbisht

    04/14/2022, 3:04 PM
    Any guide, tutorial, or article on its usage would be really helpful. Also, can KedroContext be used in place of Load Context for the same purpose?
  • n

    noklam

    04/14/2022, 3:08 PM
    https://kedro.readthedocs.io/en/stable/tutorial/package_a_project.html?highlight=package%20a%20project#:~:text=once%20you%20have%20your%20project%20installed%2C%20you%20can%20run%20your%20pipelines%20from%20any%20python%20code%20by%20simply%20importing%20it%20as%20follows%3A Does this help? You may also have a look at the RELEASE notes here. https://github.com/kedro-org/kedro/releases/tag/0.18.0
  • n

    noklam

    04/14/2022, 3:09 PM
    If you are just trying to run a pipeline, the default way in 0.18.x is to create a session and call session.run(); you don't need to access the context to execute a pipeline. In fact, context.run has been removed.
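A minimal sketch of the session-based approach described above, assuming Kedro 0.18.x and a project that lives at a known path. The helper name run_kedro_pipeline and its arguments are hypothetical; KedroSession.create, bootstrap_project, and session.run are the Kedro calls it relies on.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def run_kedro_pipeline(
    project_path: str,
    pipeline_name: Optional[str] = None,
    extra_params: Optional[Dict[str, Any]] = None,
) -> Any:
    """Run a pipeline of the Kedro project at `project_path` from plain Python.

    Hypothetical helper, written against Kedro 0.18.x.
    """
    # Imported lazily so this module can be imported where Kedro is absent.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    root = Path(project_path).resolve()
    # Reads the project metadata and configures settings and pipelines.
    bootstrap_project(root)

    with KedroSession.create(project_path=root, extra_params=extra_params) as session:
        # pipeline_name=None runs the project's default pipeline.
        return session.run(pipeline_name=pipeline_name)
```

Called as run_kedro_pipeline("/path/to/project", pipeline_name="pipe", extra_params={"alpha": 5}), this still goes through the project's catalog, configuration, and credentials, because the session loads them itself; no direct access to the context is needed.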
  • p

    praveenbisht

    04/14/2022, 4:36 PM
    The entire goal is to use the structure of a Kedro project, thereby utilizing the decoupling of input parameters from the actual source code. I also wish to use other Kedro features such as the data catalogue, configurations, and credentials.yml. But I am just trying to trigger the pipelines from another Python script. What would you suggest is the best way to do this?
  • u

    user

    04/18/2022, 6:33 AM
    Kedro documentation does not show up all functions after compiling kedro build-docs https://stackoverflow.com/questions/71908281/kedro-documentation-does-not-show-up-all-functions-after-compiling-kedro-build-d
  • d

    deepyaman

    04/19/2022, 10:52 AM
    A colleague and I looked into this a while back, and created a POC based on spark-streaming. You can check it out on https://github.com/deepyaman/kedro-streaming/blob/develop/conf/base/catalog.yml#L11. No guarantees it works with latest Kedro. 🙂 We also explored using faust as a backend for this (and being more Python-native), but faust isn't really maintained anymore (and was lacking some other necessary functionality, like joining streams). There's an overall question of how to better support streaming workflows with Kedro, as Kedro is notoriously batch-oriented.
  • u

    user

    04/20/2022, 5:32 PM
    VersionNotFoundError when writing S3 bucket from a Custom Dataset https://stackoverflow.com/questions/71943716/versionnotfounderror-when-writing-s3-bucket-from-a-custom-dataset
  • r

    Rjify

    04/20/2022, 7:01 PM
    Hello guys, I am working on reformatting an ML project to Kedro. The project has three pipelines: data engineering, data science, and prediction. Along with the main nodes for these pipelines, I also have a lot of helper/utility functions which need to be fitted into Kedro somewhere. I am unsure how I should structure these helper functions: should I turn them into sub-pipelines, or use them as is in the form of helper scripts? I would like to know what the Kedro standard is for this use case. TIA
  • o

    Onéira

    04/21/2022, 7:29 AM
    Hello folks! I have a question regarding the IncrementalDataSet catalog entry. The documentation specifies that a checkpoint file will be created at the location of the dataset to remember which entries have already been processed. My question is: if multiple data scientists run the pipeline from different computers, will the checkpoint file remember which computer has already processed which entries? Or will one user miss entries that another user has already processed on their side?
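A simplified, Kedro-free model of the checkpoint question above, for illustration only. It assumes one checkpoint stored next to the data (as the question states) and assumes that a partition counts as new when its ID sorts after the checkpoint; the names new_partitions, confirm, and CHECKPOINT_KEY are hypothetical stand-ins, not Kedro API.

```python
from typing import Dict, List, Optional

CHECKPOINT_KEY = "CHECKPOINT"  # stand-in for the checkpoint file in the dataset folder


def new_partitions(store: Dict[str, str]) -> List[str]:
    """Return the partition IDs that sort after the stored checkpoint."""
    checkpoint: Optional[str] = store.get(CHECKPOINT_KEY)
    ids = sorted(key for key in store if key != CHECKPOINT_KEY)
    return [pid for pid in ids if checkpoint is None or pid > checkpoint]


def confirm(store: Dict[str, str], processed: List[str]) -> None:
    """Record the last processed partition ID as the new checkpoint."""
    if processed:
        store[CHECKPOINT_KEY] = processed[-1]


# One storage location shared by two users, plus a private copy of the same data.
shared = {"2022-04-19.csv": "...", "2022-04-20.csv": "..."}
local_copy = dict(shared)

alice = new_partitions(shared)       # Alice sees both partitions
confirm(shared, alice)               # and moves the shared checkpoint forward

bob_shared = new_partitions(shared)     # Bob, same location: nothing left to process
bob_local = new_partitions(local_copy)  # Bob, own copy: still sees both partitions
```

In this model the checkpoint knows nothing about who ran the pipeline. It belongs to the storage location, so users pointing at the same location share progress, while users with separate copies each track their own.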
  • u

    user

    04/21/2022, 8:50 AM
    Versioned Datasets in Kedro https://stackoverflow.com/questions/71951381/versioned-datasets-in-kedro
  • d

    datajoely

    04/21/2022, 9:11 AM
    Incremental Checkpoints
  • r

    Rafał

    04/21/2022, 10:48 AM
    Hello, I have a question about creating a pipeline that collects the outputs of another pipeline for a given list of parameter values. Say I have a pipeline pipe which takes the input param params:alpha and gives the output output. I would like to run this pipeline with different values of the parameter alpha. I know I can pass extra params using the CLI: kedro run --pipeline pipe --params "alpha:5". My problem is that I would like to create the following pipeline:
    * run pipe with alpha=x for each x in params:bunch_of_alphas and name the output output_x
    * collect all the outputs from the previous step and run a node which takes that collection and creates a single report output.
    I am afraid I do not know how to do that in Kedro. I am using Kedro 0.18. My only idea so far is to create a pipeline which takes params:bunch_of_alphas and calls a function which runs a for loop and creates a fake params:alpha in the current Kedro session... and that is where I stop, since it gets too complicated. Is there a clean solution?
  • d

    datajoely

    04/21/2022, 11:10 AM
    I think you want to adapt a before_pipeline_run hook here
  • r

    Rafał

    04/21/2022, 11:30 AM
    But how? Should I overwrite the pipeline's input params? And how do I do that using a Python iterator over params:bunch_of_alphas? The only thing I have in mind is to create a fake Kedro catalog item with a new "temporary_alpha_param". Actually, I have no idea how to override the pipeline params in a before_pipeline_run hook. Could you point me to some example code?
  • r

    Rafał

    04/21/2022, 11:39 AM
    By the way, I think one should not modify a pipeline's input params in a before_pipeline_run hook, right?
  • d

    datajoely

    04/21/2022, 11:47 AM
    So in truth you're fighting this because we've designed Kedro to generate reproducible results, so this dynamism isn't natively supported
  • d

    datajoely

    04/21/2022, 11:49 AM
    I don't think for loops are necessarily bad, our view on dynamism is that you should be able to do dynamic inputs/outputs, but the pipeline structure should be static
  • d

    datajoely

    04/21/2022, 11:50 AM
    The modular pipeline constructor also allows you to instantiate the same pipeline with overridden inputs/outputs; perhaps this is all you need? https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/modular_pipelines.html
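A rough sketch of how the modular pipeline constructor could cover the alpha sweep while keeping the pipeline structure static, assuming Kedro 0.18.x (where the constructor is importable from kedro.pipeline.modular_pipeline). The list of alphas is fixed when the pipeline is built, and each parameter alpha_<x> is assumed to exist in parameters.yml. The names variant_names, build_report, and create_sweep_pipeline are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def variant_names(alphas: Iterable[Any]) -> List[Dict[str, str]]:
    """Per-alpha names used to remap the base pipeline (all hypothetical)."""
    return [
        {
            "namespace": f"alpha_{x}",
            "parameter": f"params:alpha_{x}",  # assumed to exist in parameters.yml
            "output": f"output_{x}",
        }
        for x in alphas
    ]


def build_report(*outputs: Any) -> Dict[str, Any]:
    """Placeholder collector: receives one positional argument per run."""
    return {"n_runs": len(outputs), "outputs": list(outputs)}


def create_sweep_pipeline(base_pipe: Any, alphas: Iterable[Any]) -> Any:
    """Copy `base_pipe` once per alpha, then add a node that collects the outputs."""
    # Imported lazily so the helpers above can be used without Kedro.
    from kedro.pipeline import Pipeline, node
    from kedro.pipeline.modular_pipeline import pipeline

    names = variant_names(alphas)
    copies = [
        pipeline(
            base_pipe,
            namespace=n["namespace"],  # keeps node names unique across copies
            parameters={"params:alpha": n["parameter"]},
            outputs={"output": n["output"]},
        )
        for n in names
    ]
    report = node(
        build_report,
        inputs=[n["output"] for n in names],
        outputs="report",
        name="build_report",
    )
    # Pipelines passed in the node list are expanded into their nodes.
    return Pipeline([*copies, report])
```

Any other free inputs of the base pipeline would be prefixed with the namespace unless they are also passed through the constructor's inputs argument. The loop runs when the pipeline is created, not during the run, which matches the view that inputs and outputs may vary while the structure stays static.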
  • a

    avan-sh

    04/21/2022, 2:47 PM
    Is there a doc on the order of hook execution, or on how a pipeline is actually executed? E.g.: session creation --> catalog created --> pipeline run triggered --> dataset load ...
  • d

    datajoely

    04/21/2022, 2:50 PM
    https://kedro.readthedocs.io/en/stable/faq/architecture_overview.html and there are sequence diagrams in this article https://medium.com/quantumblack/introducing-kedro-hooks-fd5bc4c03ff5
  • n

    noklam

    04/21/2022, 3:14 PM
    Not in the docs, but I have this one from recently (ignore the on_xxx_error hooks, they're not in the right order)
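One way to check the execution order empirically is a small hooks class that records each call. This is a sketch assuming Kedro 0.18.x hook names; the class name OrderLoggingHooks is hypothetical, and only a few hooks are shown. The fallback decorator exists only so the snippet can be imported without Kedro installed.

```python
from typing import Any, List

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch can be imported without Kedro
    def hook_impl(func):
        return func


class OrderLoggingHooks:
    """Records the name of each hook in the order it is called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    @hook_impl
    def after_catalog_created(self, catalog: Any) -> None:
        self.calls.append("after_catalog_created")

    @hook_impl
    def before_pipeline_run(self, run_params: Any) -> None:
        self.calls.append("before_pipeline_run")

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        self.calls.append(f"before_dataset_loaded({dataset_name})")

    @hook_impl
    def after_dataset_saved(self, dataset_name: str) -> None:
        self.calls.append(f"after_dataset_saved({dataset_name})")

    @hook_impl
    def after_pipeline_run(self, run_params: Any) -> None:
        self.calls.append("after_pipeline_run")
        # Print the recorded sequence once the run has finished.
        print("\n".join(self.calls))
```

Registering an instance in the project's settings.py with HOOKS = (OrderLoggingHooks(),) and running any pipeline prints the observed sequence for that project and Kedro version.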
  • n

    noklam

    04/21/2022, 3:15 PM
    And we have a new hook candidate coming. https://github.com/kedro-org/kedro/issues/1458