noklam
04/20/2022, 4:12 PM
`session.run` is actually the core API for Kedro's pipeline now. The debugging use case is a valid one, especially when you don't have access to a debugger. The current concept of "free outputs" is essentially the pipeline's outputs (node outputs that have no consumer, so intermediate outputs are not included) minus the set of catalog entries (anything defined in the catalog).
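The free-outputs definition can be sketched with plain Python sets. This is a toy model, not Kedro's API: the pipeline is a hypothetical dict mapping node names to `(inputs, outputs)`, and the catalog is a set of registered dataset names. As far as we understand it, it mirrors the `pipeline.outputs() - set(catalog.list())` computation the Kedro 0.18 runner uses to decide which datasets `session.run` hands back.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy pipeline: node name -> (inputs, outputs)
Pipeline = Dict[str, Tuple[List[str], List[str]]]


def pipeline_outputs(pipeline: Pipeline) -> Set[str]:
    """Outputs that no node consumes, so intermediate datasets are excluded."""
    produced = {out for _, outs in pipeline.values() for out in outs}
    consumed = {inp for ins, _ in pipeline.values() for inp in ins}
    return produced - consumed


def free_outputs(pipeline: Pipeline, catalog_entries: Set[str]) -> Set[str]:
    """Pipeline outputs that are not registered in the catalog."""
    return pipeline_outputs(pipeline) - catalog_entries


pipeline = {
    "preprocess": (["raw"], ["clean"]),
    "train": (["clean"], ["model", "metrics"]),
}
catalog = {"raw", "model"}
print(free_outputs(pipeline, catalog))  # {'metrics'}
```

`clean` is consumed by `train`, so it is never returned even though it is not in the catalog; `model` is dropped because the catalog persists it.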
In my experience this is inconvenient, because intermediate outputs can be useful for debugging, but I think the main reason is that Kedro tries to be memory efficient. I think it's one area where we could improve the interactive workflow.
Apoorva
04/21/2022, 12:43 PM
```python
from kedro.pipeline import node

hist_train_candidates = node(
    func=monitor_training_cand,
    inputs="training_candidates",
    outputs="training_candidates_hist",
    name="hist_train_candidates_node",
)
stitch_train_candidates = node(
    func=stitch_training_cand,
    inputs=["training_candidates_hist", "stitched_hist"],
    outputs="stitched_hist",
    name="stitch_train_candidates_node",
)
create_report = node(
    func=create_report_cand,
    inputs="stitch_train_candidates",
    outputs="stitch_report",
    name="create_report_node",
)
```
Having a node's output be the same as its input isn't supported, but I do need it for my use case. On top of that, I had to create a custom versioned dataset for `stitched_hist`, which (after applying a hack: a catalog entry with a different name that points to the same file location) leads to:
```
raise VersionNotFoundError(f"Did not find any versions for {self}")
kedro.io.core.VersionNotFoundError: Did not find any versions for HistogramDataSet(filepath=/Users/Project/data/08_reporting/stitch_train_candidates.json, protocol=file, version=Version(load=None, save='2022-04-21T12.17.11.537Z'))
```
Any suggestions on how to handle this scenario better?
datajoely
04/21/2022, 12:44 PM
Apoorva
04/21/2022, 1:17 PM
```python
stitch_train_candidates = node(
    func=stitch_training_cand,
    inputs=["training_candidates_hist", "stitched_hist"],
    outputs="stitch_train_candidates",
    name="stitch_train_candidates_node",
)
```
`stitched_hist` isn't available, and I am getting this error:
```
raise VersionNotFoundError(f"Did not find any versions for {self}")
kedro.io.core.VersionNotFoundError: Did not find any versions for HistogramDataSet(filepath=/Users/Project/data/08_reporting/stitch_train_candidates.json, protocol=file, version=Version(load=None, save='2022-04-21T12.17.11.537Z'))
```
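For context, Kedro's versioned datasets store each save at `<filepath>/<version>/<filename>` and, when no load version is pinned (`load=None` in the error), look up the newest one by globbing `<filepath>/*/<filename>`; if nothing matches, that lookup raises `VersionNotFoundError`. Below is a simplified, standalone re-implementation of that lookup for illustration only: `versioned_path`, `latest_version` and `VersionNotFound` are hypothetical names, not Kedro's API.

```python
import tempfile
from pathlib import Path


class VersionNotFound(Exception):
    """Hypothetical stand-in for kedro.io.core.VersionNotFoundError."""


def versioned_path(filepath: Path, version: str) -> Path:
    # Each save lands in <filepath>/<version>/<filename>
    return filepath / version / filepath.name


def latest_version(filepath: Path) -> str:
    # Timestamps like 2022-04-21T12.17.11.537Z sort lexicographically,
    # so the largest matching folder name is the most recent version.
    matches = sorted(filepath.glob(f"*/{filepath.name}"), reverse=True) if filepath.is_dir() else []
    if not matches:
        # e.g. nothing saved yet, or data written straight to <filepath> as a plain file
        raise VersionNotFound(f"Did not find any versions for {filepath}")
    return matches[0].parent.name


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "stitch_train_candidates.json"
    for version in ["2022-04-21T12.17.11.537Z", "2022-04-21T13.02.45.120Z"]:
        path = versioned_path(target, version)
        path.parent.mkdir(parents=True)
        path.write_text("{}")
    print(latest_version(target))  # 2022-04-21T13.02.45.120Z
```

A dataset that writes to the bare `<filepath>` instead of the versioned path leaves this glob empty, which is one way to end up with the error above.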
How can I fix that?
datajoely
04/21/2022, 1:20 PM
Apoorva
04/21/2022, 1:24 PMclass HistogramDataSet(AbstractVersionedDataSet):
def __init__(self, filepath: str, version: Version = None, credentials: Dict[str, Any] = None):
_credentials = deepcopy(credentials) or {}
protocol, path = get_protocol_and_path(filepath)
self._protocol = protocol
self._fs = fsspec.filesystem(self._protocol, **_credentials)
super().__init__(
filepath=PurePosixPath(path),
version=version,
exists_function=self._fs.exists,
glob_function=self._fs.glob, )
def _load(self):
load_path = get_filepath_str(self._filepath, self._protocol)
log.info(f'load_path: {load_path}')
try:
with self._fs.open(load_path) as f:
return json.load(f)
except FileNotFoundError:
return None
def _save(self, data) -> None:
"""Saves data to the specified filepath."""
save_path = get_filepath_str(self._filepath, self._protocol)
with self._fs.open(save_path, mode="w") as f:
json.dump(data, f, default=dumper)
self._invalidate_cache()
For versioning I am using Kedro's built-in functionality.
avan-sh
04/21/2022, 2:30 PM
`VersionNotFoundError` instead of `FileNotFoundError`.
LightMiner
04/21/2022, 7:52 PM
https://www.youtube.com/watch?v=CIRVpMqWEIs
datajoely
04/21/2022, 8:09 PM
nd0rf1n
04/25/2022, 2:32 PM
On `kedro run`, I get the following `ValueError`:
```
ValueError: Pipeline input(s) {'params:data_science.model_options_experimental', 'params:data_science.active_modelling_pipeline.model_options'} not found in the DataCatalog
```
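The `params:` names in this error are dotted paths into the parameters dictionary. A minimal sketch, assuming Kedro 0.18 registers every nested parameter recursively as `params:<parent>.<child>` (our reading of how the context builds its parameter entries; the helper `flatten_params` is hypothetical): with `model_options` at the top level, the namespaced key never appears, while nesting it under `data_science:` and `active_modelling_pipeline:` produces it.

```python
from typing import Any, Dict


def flatten_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: expose every (nested) parameter as a `params:<dotted.path>` key."""
    feed: Dict[str, Any] = {}

    def add(name: str, value: Any) -> None:
        feed[f"params:{name}"] = value
        if isinstance(value, dict):  # recurse so nested keys become addressable too
            for key, val in value.items():
                add(f"{name}.{key}", val)

    for name, value in params.items():
        add(name, value)
    return feed


model_options = {"test_size": 0.2, "random_state": 3}
# `model_options` at the top level of the parameters file
flat = {"model_options": model_options}
# The same options with the namespaces as top-level keys
nested = {"data_science": {"active_modelling_pipeline": {"model_options": model_options}}}

wanted = "params:data_science.active_modelling_pipeline.model_options"
print(wanted in flatten_params(flat))    # False
print(wanted in flatten_params(nested))  # True
```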
datajoely
04/25/2022, 2:33 PM
nd0rf1n
04/25/2022, 2:33 PM
datajoely
04/25/2022, 2:33 PM
nd0rf1n
04/25/2022, 2:33 PM
datajoely
04/25/2022, 2:33 PM
nd0rf1n
04/25/2022, 2:35 PM
datajoely
04/25/2022, 2:38 PM
nd0rf1n
04/25/2022, 2:42 PM
`conf/base/parameters/data_science.yml` is indeed populated with:
```yaml
model_options:
  test_size: 0.2
  random_state: 3
  features:
    - engines
    - passenger_capacity
    - crew
    - d_check_complete
    - moon_clearance_complete
    - iata_approved
    - company_rating
    - review_scores_rating
model_options_experimental:
  test_size: 0.2
  random_state: 8
  features:
    - engines
    - passenger_capacity
    - crew
    - review_scores_rating
```
I thought you were referring to `conf/base/parameters.yml`, which is empty.
datajoely
04/25/2022, 2:44 PM
`kedro -V`
I think you have accidentally upgraded to 0.18.x, which has breaking changes.
datajoely
04/25/2022, 2:44 PM
`namespace` as the top level
datajoely
04/25/2022, 2:45 PM
nd0rf1n
04/25/2022, 2:47 PM
datajoely
04/25/2022, 2:47 PM
nd0rf1n
04/25/2022, 2:49 PM
nd0rf1n
04/25/2022, 2:50 PM
nd0rf1n
04/25/2022, 3:07 PM
datajoely
04/25/2022, 3:08 PM
Rafał
04/26/2022, 10:08 AM
`kedro run -c config.yml`.
I see the official documentation says nothing about `--from-nodes`. I am afraid I have a case where Kedro `0.18.0` ignores the option I provided in `run.from-nodes`.
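One plausible cause, assuming Kedro 0.18's `-c/--config` option copies the `run:` section of the YAML into Click's `default_map` (our reading of the CLI source, not confirmed in this thread): Click looks defaults up by parameter name, and `--from-nodes` is stored as `from_nodes`, so a dashed `from-nodes:` key would be silently ignored. The standalone Click sketch below only demonstrates that Click behavior; the command and the node name `split_data_node` are made up.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--from-nodes", default="")  # Click derives the parameter name `from_nodes`
def run(from_nodes: str) -> None:
    click.echo(f"from_nodes={from_nodes!r}")


runner = CliRunner()
# default_map is looked up by parameter name, so a dashed key is ignored...
print(runner.invoke(run, [], default_map={"from-nodes": "split_data_node"}).output.strip())
# from_nodes=''
# ...while the underscored key is picked up.
print(runner.invoke(run, [], default_map={"from_nodes": "split_data_node"}).output.strip())
# from_nodes='split_data_node'
```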
noklam
04/26/2022, 10:13 AM
What does your `config.yml` look like?