Ashwin_11
05/05/2022, 3:20 PM

inigohrey
05/06/2022, 3:07 PM

Ashwin_11
05/06/2022, 3:13 PM

JA_next
05/11/2022, 12:23 AM
node(
func=split_data,
inputs=["model_input_table", "params:model_options"],
outputs=["X_train", "X_test", "y_train", "y_test"],
name="split_data_node",
)
I have a question here: 'model_input_table' is a data frame from the data catalog, but how can I know this is not a string?

Carlos Bonilla
05/11/2022, 3:43 AM

noklam
05/11/2022, 4:31 PM
If it is not defined in catalog.yml, then it will just be a variable -> MemoryDataSet.
The string is the name of the DataSet.
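For illustration, that name maps onto an entry in conf/base/catalog.yml like the sketch below (the dataset type and filepath are assumptions, not taken from this thread); if no entry with that name exists, the data is simply passed between nodes in memory as a MemoryDataSet.

# conf/base/catalog.yml (hypothetical entry)
model_input_table:
  type: pandas.ParquetDataSet
  filepath: data/03_primary/model_input_table.pq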

JA_next
05/11/2022, 6:21 PM

JA_next
05/11/2022, 10:00 PM

noklam
05/11/2022, 10:05 PM

noklam
05/11/2022, 10:13 PM

JA_next
05/11/2022, 11:59 PM
pipeline_steps.append(node(merging_features_to_main_data,
                           inputs=dict(main_df='main_df', feature_1_df='feature_df1', feature_2_df='feature_df2'),
                           outputs='main_feature_df'))
This is one pipeline, merging 2 feature_dfs into main_df. But what if I don't know how many feature_dfs there are in advance, how can I do that?
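One possible sketch, assuming the feature dataset names are known when the pipeline is created (for example from a configurable list); merge_all_features and feature_dataset_names are made-up names, not Kedro API:

feature_dataset_names = ["feature_df1", "feature_df2", "feature_df3"]

def merge_all_features(main_df, *feature_dfs):
    # Inputs passed as a list arrive positionally, so the extra dataframes land in *feature_dfs.
    for feature_df in feature_dfs:
        main_df = main_df.merge(feature_df, how="left")
    return main_df

pipeline_steps.append(
    node(merge_all_features,
         inputs=["main_df"] + feature_dataset_names,
         outputs="main_feature_df"))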

datajoely
05/12/2022, 5:31 AM

Kastakin
05/13/2022, 7:20 AM
kedro docker run command. This command mounts the required data/conf/logs folders as volumes and then runs Kedro inside the container.
All good! But now let's say I would like to migrate my finalised project from my development machine to the machine in my lab where we would run the pipeline directly. What are the steps needed to use the dockerized pipeline there?
The documentation suggests pushing the built docker image to the registry and then pulling it on the "production" env, but that brings with it neither the catalog, the folder structure of the data folder, nor the Kedro CLI itself.
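For illustration only, one way to use the pushed image on the lab machine is to pull it and mount the host conf/ and data/ folders as volumes, similar to what kedro docker run does locally; the registry name and the in-container paths below are assumptions and depend on the Dockerfile your project generated:

docker pull registry.example.com/my-kedro-project:latest
docker run --rm \
  -v "$PWD/conf:/home/kedro/conf" \
  -v "$PWD/data:/home/kedro/data" \
  -v "$PWD/logs:/home/kedro/logs" \
  registry.example.com/my-kedro-project:latest kedro run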

antony.milne
05/13/2022, 8:00 AM

anna-lea
05/13/2022, 12:55 PM
06_models/trained_models. What I then see is the following file structure:
06_models/trained_models/2022-05-13T11:12:13/trained_models/
which is versioned, but not super handy.
What I would expect is more something in this direction:
06_models/trained_models/2022-05-13T11:12:13/
Do you have an idea how I can get to the second file structure, or is it a "feature" :p
Thanks!
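For context, that nesting is Kedro's standard versioning layout: a timestamped folder is created under the configured filepath and the file inside keeps the filepath's basename, i.e. <filepath>/<version>/<basename>. A minimal sketch of a catalog entry that produces the first structure (the dataset type and path are assumptions):

trained_models:
  type: pickle.PickleDataSet
  filepath: data/06_models/trained_models
  versioned: true
# saved as data/06_models/trained_models/<version>/trained_models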

Valentin DM.
05/13/2022, 5:45 PM
viz command
(I did pip install src/requirements.txt)
Kedro version: 0.18.1
Do you have any idea?

noklam
05/13/2022, 5:47 PM

Valentin DM.
05/13/2022, 5:48 PM
Should kedro-viz be added to requirements.txt?
Or in the documentation: https://kedro.readthedocs.io/en/0.18.1/tutorial/create_pipelines.html ?
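For reference, kedro-viz is a separate package from kedro itself, so the viz command is only available once the plugin is installed; one fix is to add kedro-viz to src/requirements.txt and reinstall, or install it directly:

pip install kedro-viz
kedro viz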

Kastakin
05/13/2022, 6:40 PM

datajoely
05/13/2022, 7:15 PM

wwliu
05/16/2022, 9:04 PM
from typing import Any, Dict
import mlflow
from kedro.framework.hooks import hook_impl
from kedro.pipeline.node import Node

class ModelTrackingHooks:
    @hook_impl
    def after_node_run(self, node: Node, outputs: Dict[str, Any], inputs: Dict[str, Any]) -> None:
        if node._func_name == "train_model":
            model = outputs["example_model"]
            mlflow.sklearn.log_model(model, "model")
            mlflow.log_params(inputs["parameters"])
My question is, I only need to log metrics in this specific train_model node, while based on my understanding, this function will run every time a node finishes, and there could be a lot of nodes in the whole pipeline. Is there a way I could specify which node this hook is hooked to?
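Hooks always fire for every node, so the filtering has to happen inside the hook body. As one possible alternative to matching on the private function name, the sketch below filters on a node tag; the tag name and the node definition here are made up for illustration:

# Tag the node you care about when defining the pipeline...
node(train_model, inputs=["X_train", "y_train"], outputs="example_model",
     name="train_model_node", tags=["mlflow"])

# ...and in the hook, only act on nodes carrying that tag.
@hook_impl
def after_node_run(self, node, outputs, inputs):
    if "mlflow" in node.tags:
        mlflow.sklearn.log_model(outputs["example_model"], "model")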

noklam
05/16/2022, 9:06 PM

wwliu
05/16/2022, 9:12 PM

datajoely
05/16/2022, 9:16 PM

wwliu
05/16/2022, 9:36 PM
if statements in the scripts.
if node._func_name == "split_data":
    mlflow.log_params(
        {"split_data_ratio": inputs["params:example_test_data_ratio"]}
    )
elif node._func_name == "train_model":
    model = outputs["example_model"]
    mlflow.sklearn.log_model(model, "model")
    mlflow.log_params(inputs["parameters"])
These are node-specific functions. Do you think these are better put in the node logic itself instead of hooks, or is it a proper use case for hooks? Or, as @noklam suggested, use kedro-mlflow?
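For comparison, putting the logging in the node function itself would look roughly like this; the model type and function body are assumptions, only meant to show where the mlflow calls would move:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

def train_model(X_train, y_train, parameters):
    # Hypothetical training step; the mlflow calls now sit next to the logic they describe.
    model = LinearRegression().fit(X_train, y_train)
    mlflow.log_params(parameters)
    mlflow.sklearn.log_model(model, "model")
    return model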

datajoely
05/16/2022, 9:38 PM

datajoely
05/16/2022, 9:38 PM

wwliu
05/16/2022, 10:13 PM

datajoely
05/16/2022, 10:23 PM

wwliu
05/16/2022, 10:38 PM
In split_data node I need to log train_split_ratio, and in train_model node, I need to log the model object. Does this mean this scenario is kind of against the hooks usage guide?