beginners-need-help
  • a

    Ashwin_11

    05/05/2022, 3:20 PM
    Hey, you got anything?
  • i

    inigohrey

    05/06/2022, 3:07 PM
    From this stack trace it looks like you might not have installed all the Kedro dependencies for Spark: https://kedro.readthedocs.io/en/stable/kedro_project_setup/dependencies.html?highlight=spark#install-dependencies-at-a-group-level
  • a

    Ashwin_11

    05/06/2022, 3:13 PM
    We are using SparkDataSet; I don't think it requires any dependencies to be installed
  • j

    JA_next

    05/11/2022, 12:23 AM
    node(
        func=split_data,
        inputs=["model_input_table", "params:model_options"],
        outputs=["X_train", "X_test", "y_train", "y_test"],
        name="split_data_node",
    )
    I have a question here: 'model_input_table' is a data frame from the data catalog, so how can I tell that it is not just a string?
  • c

    Carlos Bonilla

    05/11/2022, 3:43 AM
    Hello does anyone have an example of a Plotly Dash web app integrated with Kedro?
  • n

    noklam

    05/11/2022, 4:31 PM
    It will always be a dataset. The inputs/outputs of nodes are always datasets; if one is not defined in catalog.yml, it will just be a variable -> MemoryDataSet. The string is the name of the dataset.
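    (A minimal sketch of what that means, using the Python DataCatalog API instead of conf/base/catalog.yml; the filepath and dataset names here are hypothetical. A name declared in the catalog resolves to that dataset and the node function receives the loaded data, while any undeclared name falls back to a MemoryDataSet.)
        from kedro.extras.datasets.pandas import CSVDataSet
        from kedro.io import DataCatalog, MemoryDataSet

        catalog = DataCatalog(
            {
                # "model_input_table" is declared, so split_data receives the loaded dataframe.
                "model_input_table": CSVDataSet(filepath="data/03_primary/model_input_table.csv"),
            }
        )

        # "X_train" is not declared anywhere, so Kedro hands the node's output to an
        # in-memory dataset instead of persisting it to disk.
        catalog.add("X_train", MemoryDataSet())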
  • j

    JA_next

    05/11/2022, 6:21 PM
    Thanks!
  • j

    JA_next

    05/11/2022, 10:00 PM
    Well, I am confused again. Here, in "uk_input1": "uk_input1", the first one is a string and the second one is the name of the dataset, right?
  • n

    noklam

    05/11/2022, 10:05 PM
    Correct, in this case the value is a dataset, the key is just a string literal.
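    (A minimal sketch of that dict form of inputs: the key is the function's parameter name, the value is the name of the dataset Kedro loads and passes in. The process_uk function and the uk_output1 dataset are hypothetical.)
        from kedro.pipeline import node

        def process_uk(uk_input1):
            # uk_input1 arrives here as the loaded data, not as the string "uk_input1"
            return uk_input1

        uk_node = node(
            process_uk,
            inputs={"uk_input1": "uk_input1"},  # parameter name <- dataset name
            outputs="uk_output1",
        )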
  • n

    noklam

    05/11/2022, 10:13 PM
    What do you mean by integrating with Kedro? What kind of use case are you looking for?
  • j

    JA_next

    05/11/2022, 11:59 PM
    pipeline_steps.append(
        node(
            merging_features_to_main_data,
            inputs=dict(main_df='main_df', feature_1_df='feature_df1', feature_2_df='feature_df2'),
            outputs='main_feature_df',
        )
    )
    This is one pipeline node, merging two feature_dfs into main_df. But what if I don't know how many feature_dfs there are in advance? How can I do that?
  • d

    datajoely

    05/12/2022, 5:31 AM
    You can use kwargs
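    (A minimal sketch of the kwargs idea, assuming the feature dataset names are known when the pipeline is assembled and that the frames share a hypothetical id join key; the function gathers however many feature dataframes it is given via **feature_dfs, and the dict of inputs is built dynamically.)
        from functools import reduce

        from kedro.pipeline import node

        def merging_features_to_main_data(main_df, **feature_dfs):
            # Merge every feature dataframe that was passed in, one after another.
            return reduce(lambda left, right: left.merge(right, on="id"), feature_dfs.values(), main_df)

        feature_names = ["feature_df1", "feature_df2", "feature_df3"]  # any length

        pipeline_steps = []
        pipeline_steps.append(
            node(
                merging_features_to_main_data,
                # dict keys become keyword arguments, values are the dataset names in the catalog
                inputs={"main_df": "main_df", **{name: name for name in feature_names}},
                outputs="main_feature_df",
            )
        )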
  • k

    Kastakin

    05/13/2022, 7:20 AM
    I've got a question regarding the deployment/portability of a pipeline created with Kedro from one machine to another. I don't really know if this is the correct channel for it, but here it goes: following the corresponding section in the docs and the documentation specific to the Kedro-Docker plugin, I've been able to create a Docker image of my Kedro project that I can run with the kedro docker run command. This command mounts the required data/conf/logs folders as volumes and then runs Kedro inside the container. All good! But now let's say I would like to migrate my finalised project from my development machine to the machine in my lab where we would run the pipeline directly; what are the steps needed to use the dockerized pipeline there? The documentation suggests pushing the built Docker image to the registry and then pulling it on the "production" env, but that doesn't bring with it the catalog, the folder structure of the data folder, or the Kedro CLI itself.
  • a

    antony.milne

    05/13/2022, 8:00 AM
    kedro docker run
  • a

    anna-lea

    05/13/2022, 12:55 PM
    Hi Kedro experts, I have a question for which I can't yet find an answer. In the definition of the saved models (TensorFlowModelDataset) in the catalog, I use the option version=True. The filepath is something along the lines of 06_models/trained_models. What I then see is the following file structure: 06_models/trained_models/2022-05-13T11:12:13/trained_models/, which is versioned but not super handy. What I would expect is something more in this direction: 06_models/trained_models/2022-05-13T11:12:13/. Do you have an idea how I can get to the second file structure, or is it a "feature"? :p Thanks!
  • v

    Valentin DM.

    05/13/2022, 5:45 PM
    Hello, this issue must be simple to fix, but at this stage I don't quite understand why kedro can't find the viz command (I did pip install -r src/requirements.txt). Kedro version: 0.18.1. Do you have any idea?
  • n

    noklam

    05/13/2022, 5:47 PM
    Did you do pip install kedro-viz?
  • v

    Valentin DM.

    05/13/2022, 5:48 PM
    Well, that was a quick fix, thanks ❤️ Should kedro-viz be added to requirements.txt? Or to the documentation: https://kedro.readthedocs.io/en/0.18.1/tutorial/create_pipelines.html ?
  • k

    Kastakin

    05/13/2022, 6:40 PM
    I think the rationale behind not having it in the default requirements is that it's not a mandatory requirement to get a Kedro project up and running. If you are following the spaceflights tutorial, it's added to the requirements during the first steps: https://kedro.readthedocs.io/en/0.18.1/tutorial/tutorial_template.html#install-dependencies
  • d

    datajoely

    05/13/2022, 7:15 PM
    Yes, this is exactly right. Viz is technically a plug-in, just a first-party one
  • w

    wwliu

    05/16/2022, 9:04 PM
    Hello, I am trying to implement MLflow using Hooks in Kedro. I got this code snippet:
    from typing import Any, Dict

    import mlflow
    import mlflow.sklearn
    from kedro.framework.hooks import hook_impl
    from kedro.pipeline.node import Node

    class ModelTrackingHooks:
        @hook_impl
        def after_node_run(self, node: Node, outputs: Dict[str, Any], inputs: Dict[str, Any]) -> None:
            if node._func_name == "train_model":
                model = outputs["example_model"]
                mlflow.sklearn.log_model(model, "model")
                mlflow.log_params(inputs["parameters"])
    My question is: I only need to log metrics in this specific train_model node, while based on my understanding this function will run every time a node finishes, and there could be a lot of nodes in the whole pipeline. Is there a way I could specify which node this hook is hooked to?
  • n

    noklam

    05/16/2022, 9:06 PM
    Are you aware of the kedro-mlflow plugin created by our community? If the logic is specific to just one node, it may be better to include it in that node's logic.
  • w

    wwliu

    05/16/2022, 9:12 PM
    Thanks a lot for your reply. I am aware of the kedro-mlflow plug-in; I haven't explored much on that side, but I will. Regarding Hooks, I am curious about the design philosophy behind them: what would you suggest be included in Hooks instead of in the nodes?
  • d

    datajoely

    05/16/2022, 9:16 PM
    So hooks leverage the same library that the pytest folks designed to support their plugin ecosystem, so a lot of the thinking is the same. In terms of the split between what's node logic and what's hook logic... I think you should remember that nodes are about dataflow/transformation and hooks are about tapping into different parts of the run lifecycle
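    (To make "different parts of the run lifecycle" concrete, here is a minimal illustrative sketch of a hooks class that taps three of them; the logging calls are placeholders, and a hook implementation may declare only the spec arguments it needs.)
        import logging
        from typing import Any, Dict

        from kedro.framework.hooks import hook_impl

        logger = logging.getLogger(__name__)

        class RunLifecycleHooks:
            @hook_impl
            def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
                # Runs once before any node executes, e.g. open an experiment run here.
                logger.info("Starting session %s", run_params.get("session_id"))

            @hook_impl
            def after_dataset_loaded(self, dataset_name: str, data: Any) -> None:
                # Runs every time the catalog loads a dataset, e.g. validate a schema here.
                logger.info("Loaded dataset %s", dataset_name)

            @hook_impl
            def after_pipeline_run(self, run_params: Dict[str, Any]) -> None:
                # Runs once after the last node finishes, e.g. close the experiment run here.
                logger.info("Finished session %s", run_params.get("session_id"))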
  • w

    wwliu

    05/16/2022, 9:36 PM
    Thanks, @datajoely. Could you elaborate more on the different parts of the run lifecycle that could possibly use hooks? I am looking at this article https://medium.com/quantumblack/introducing-kedro-hooks-fd5bc4c03ff5, which uses MLflow as an example, and there are two if statements in the script:
    if node._func_name == "split_data":
        mlflow.log_params(
            {"split_data_ratio": inputs["params:example_test_data_ratio"]}
        )

    elif node._func_name == "train_model":
        model = outputs["example_model"]
        mlflow.sklearn.log_model(model, "model")
        mlflow.log_params(inputs["parameters"])
    These are node-specific functions. Do you think these are better put in the node logic itself instead of hooks, or is this a proper use case for hooks? Or, as @noklam suggested, should I use kedro-mlflow?
  • d

    datajoely

    05/16/2022, 9:38 PM
    The plugin is well supported and popular. That example is overly explicit; in practice you would likely make the logging dynamic rather than node specific
  • d

    datajoely

    05/16/2022, 9:38 PM
    It would be quite fragile if you wanted to change the node name down the line
  • w

    wwliu

    05/16/2022, 10:13 PM
    I agree that it would be fragile if we make explicit statements like this. Could you provide an example of how to log dynamically, if it is not too much to ask? Thanks.
  • d

    datajoely

    05/16/2022, 10:23 PM
    I guess I mean you could just log every node._func_name and run the log_model method if the object is of the right type
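    (A minimal sketch of that dynamic style, assuming scikit-learn models and MLflow; instead of matching node names, it logs whatever params: inputs it sees and any output that happens to be a fitted sklearn estimator.)
        from typing import Any, Dict

        import mlflow
        import mlflow.sklearn
        from kedro.framework.hooks import hook_impl
        from kedro.pipeline.node import Node
        from sklearn.base import BaseEstimator

        class DynamicMlflowHooks:
            @hook_impl
            def after_node_run(self, node: Node, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
                # Log every parameter input, whichever node it belongs to.
                params = {
                    name.replace("params:", ""): value
                    for name, value in inputs.items()
                    if name.startswith("params:")
                }
                if params:
                    mlflow.log_params(params)

                # Log any output that is an sklearn model, so renaming a node
                # does not silently break the tracking.
                for name, obj in outputs.items():
                    if isinstance(obj, BaseEstimator):
                        mlflow.sklearn.log_model(obj, artifact_path=name)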
  • w

    wwliu

    05/16/2022, 10:38 PM
    But in the example, the object required to be logged is node specific: in the split_data node I need to log train_split_ratio, and in the train_model node I need to log the model object. Does this mean this scenario goes somewhat against the hooks usage guide?