brewski
08/14/2022, 10:39 PM
brewski
08/14/2022, 11:28 PM
datajoely
08/15/2022, 9:57 AM
javier.16
08/15/2022, 12:09 PM
datajoely
08/15/2022, 12:20 PM
javier.16
08/15/2022, 1:17 PM
datajoely
08/15/2022, 1:43 PM
antheas
08/15/2022, 2:29 PM
%pipe tab_adult.ingest
for e1 in (0.3, 0.5, 0.9, 1.5):
    pipe("tab_adult.privbayes.synth", {"alg.e1": e1})
It also converts "alg.e1": e1 to {alg: {e1: e1}} for you
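For reference, a minimal sketch of the dotted-key expansion antheas describes (the helper name expand_dotted is hypothetical; his actual implementation isn't shown in the thread):

def expand_dotted(params: dict) -> dict:
    # Convert {"alg.e1": 0.3} into {"alg": {"e1": 0.3}}
    nested: dict = {}
    for key, value in params.items():
        *parents, leaf = key.split(".")
        cursor = nested
        for part in parents:
            cursor = cursor.setdefault(part, {})
        cursor[leaf] = value
    return nested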
datajoely
08/15/2022, 2:33 PM
antheas
08/15/2022, 2:34 PM
antheas
08/15/2022, 2:37 PM
I use a closure factory (mlflow_log_model_closure) to pin the name for the version of the function that will run.
What's the better solution for this?
import mlflow
import pandas as pd

def mlflow_log_model_results(name: str, res: pd.DataFrame):
    if not mlflow.active_run():
        return
    ...

def mlflow_log_model_closure(name: str):
    def closure(res: pd.DataFrame):
        return mlflow_log_model_results(name, res)
    # Rename the closure so each model gets a distinct node/function name
    closure.__name__ = f"log_{name}_model_results"
    return closure
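For context, a hedged sketch of how such a closure factory might be wired into a pipeline (the dataset and model names here are made up, not from the thread):

from kedro.pipeline import Pipeline, node

def create_pipeline(**kwargs) -> Pipeline:
    # Each closure carries its own __name__, so node names stay distinct
    return Pipeline(
        [
            node(
                mlflow_log_model_closure("privbayes"),
                inputs="privbayes_results",
                outputs=None,
            ),
        ]
    )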
datajoely
08/15/2022, 3:05 PM
brewski
08/16/2022, 11:34 PM
roman
08/17/2022, 8:05 AM
roman
08/17/2022, 8:06 AM
datajoely
08/17/2022, 10:30 AM
roman
08/18/2022, 7:17 AM
datajoely
08/18/2022, 2:29 PM
datajoely
08/18/2022, 2:29 PM
datajoely
08/18/2022, 2:29 PM
roman
08/18/2022, 5:01 PM
datajoely
08/18/2022, 5:05 PM
waylonwalker
08/18/2022, 8:16 PM
ithomp
08/19/2022, 5:51 PM
I'm trying to use runtime parameters (kedro run --params ...) with templated configuration of my data catalog so I can specify a project/site name to use as a prefix (subdirectory) on my file paths in the catalog. I'm able to achieve this if I specify the parameter in my globals config, but it appears that runtime parameters provided through the CLI are not available to the TemplatedConfigLoader. My goal is to run the pipeline on different raw datasets while preserving the previous dataset's data directory, and without requiring the user to edit the global config file. Is this possible, or is there another way I should go about this? Any advice would be greatly appreciated 😀
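For reference, a sketch of the kind of templated catalog entry being described, assuming a ${site} variable supplied via globals (the dataset and variable names are illustrative):

# conf/base/catalog.yml -- ${site} is resolved by TemplatedConfigLoader
raw_events:
  type: pandas.CSVDataSet
  filepath: data/01_raw/${site}/events.csv

# conf/base/globals.yml
# site: site_a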
PetitLepton
08/20/2022, 12:12 PM
You can use os.environ in globals_dict to parametrize the paths, like in https://github.com/kedro-org/kedro/issues/403.
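A minimal sketch of that suggestion, assuming a Kedro 0.18-style settings.py (the SITE environment variable name is an assumption):

# settings.py
import os

from kedro.config import TemplatedConfigLoader

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    # Expose an environment variable as a template value, e.g. ${site}
    "globals_dict": {"site": os.environ.get("SITE", "default")},
}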
PetitLepton
08/21/2022, 1:53 PM
aggregates@query_template:
  type: text.TextDataSet
  filepath: data/01_raw/aggregates_query.sql
aggregates@query_string:
  type: text.TextDataSet
  filepath: data/02_intermediate/filled_aggregates_query.sql
aggregates@query:
  type: pandas.SQLQueryDataSet
  filepath: data/02_intermediate/filled_aggregates_query.sql
  credentials: aggregates_uri
and the pipeline
from kedro.pipeline import Pipeline, node

def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            node(
                parse_parameters,
                inputs=[
                    "params:start_date",
                    "params:end_date",
                    "params:metric",
                ],
                outputs="query_parameters",
            ),
            node(
                fill_template,
                inputs=["aggregates@query_template", "query_parameters"],
                outputs="aggregates@query_string",
            ),
            node(
                perform_query,
                inputs=["aggregates@query"],
                outputs="results",
            ),
        ]
    )
Transcoding ensures that the second node runs before the third node. I like using transcoding in this situation because it makes the link between nodes more transparent than using an extra output/input.
Please let me know what you think about it.
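For completeness, a hypothetical sketch of the two templating nodes (their bodies aren't shown in the thread, so treat these as assumptions):

def parse_parameters(start_date: str, end_date: str, metric: str) -> dict:
    # Gather the run parameters into one mapping for the template
    return {"start_date": start_date, "end_date": end_date, "metric": metric}

def fill_template(template: str, parameters: dict) -> str:
    # Substitute the parameters into the raw SQL template text
    return template.format(**parameters)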
datajoely
08/21/2022, 6:10 PM
datajoely
08/21/2022, 6:12 PM
antheas
08/21/2022, 7:55 PM
The config loader has a runtime_params property. You can extend the templated config loader and just add your path via the self._config_mapping in __init__().
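A minimal sketch of that approach, assuming the Kedro 0.18 AbstractConfigLoader signature where runtime_params is passed to the loader (treat the details as an assumption):

from kedro.config import TemplatedConfigLoader

class RuntimeTemplatedConfigLoader(TemplatedConfigLoader):
    def __init__(self, conf_source, env=None, runtime_params=None, **kwargs):
        super().__init__(conf_source, env=env, runtime_params=runtime_params, **kwargs)
        # Merge CLI --params values into the templating globals
        self._config_mapping.update(self.runtime_params or {})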
datajoely
08/21/2022, 7:56 PM