Isaac89
02/10/2022, 3:47 PM
Isaac89
02/10/2022, 3:48 PM
Isaac89
02/10/2022, 3:55 PM
czix
02/11/2022, 1:16 PM
get_current_session() method is removed, how do I get the current session when running?
datajoely
02/11/2022, 1:30 PM
Isaac89
02/11/2022, 10:41 PM
RRoger
02/12/2022, 4:56 AM
DATASET_EXPECTATION_MAPPING is defined in the class itself:
class DataValidationHooks:
    # Map each dataset to its expectation
    DATASET_EXPECTATION_MAPPING = {
        "companies": "raw_companies_dataset_expectation",
        "preprocessed_companies": "preprocessed_companies_dataset_expectation",
    }
    ...
Is it possible to define this in the parameters yml? before_node_run and after_node_run don't seem to pass in the context.
datajoely
02/13/2022, 5:00 PM
catalog object
mjmare
02/15/2022, 11:15 AM
{% for table in openac_tables %}
{{ table }}:
  layer: primary
  type: pandas.ParquetDataSet
  filepath: data/03_primary/{{table}}.parquet
  save_args:
    from_pandas:
      preserve_index: False
{% endfor %}
{% for table in openac_tables %}
profile_{{ table }}:
  layer: qa
  type: ac_pipelines.datasets.ProfilingDataSet
  filepath: data/08_reporting/profiles/{{table}}.html
{% endfor %}
and then generate nodes in the pipeline:
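from kedro.config import ConfigLoader
from kedro.pipeline import Pipeline, node


def create_pipeline(**kwargs):
    # Read the table names from the globals file in conf/base or conf/local
    conf_paths = ["conf/base", "conf/local"]
    conf_loader = ConfigLoader(conf_paths)
    table_names = conf_loader.get('*globals.yml')['openac_tables']
    # One pass-through node per table; each output is a profile_<table> catalog entry
    return Pipeline([
        node(
            func=lambda x: x,
            inputs=tn,
            outputs=f'profile_{tn}',
            name=f'profile_{tn}',
        )
        for tn in table_names
    ])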
It works. But it feels hacky.
It could be improved if I could get the default config_loader from somewhere. I had some success with:
from kedro.framework.session import get_current_session
session = get_current_session()
context = session.load_context()
table_names = context.config_loader.get('*globals.yml')['openac_tables']
but that confuses Kedro viz (Error: There is no active Kedro session.)
A more substantial improvement would be if the Pipeline/Node could be dynamically parametrized (at runtime). I don't know if that is the right term. I want to feed a variable number of datasets to a pipeline (or node).
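For the "variable number of datasets" part, here is a minimal sketch, assuming Kedro's behaviour of passing a list of node inputs to the function as positional arguments. The names summarise_tables, summary_node_kwargs and the tables_summary output are hypothetical and only there for illustration. Kedro is imported lazily so the helper functions can be tried without it installed.

```python
from typing import Any, Dict, List


def summarise_tables(*tables: Any) -> Dict[str, int]:
    """Receive any number of loaded datasets as positional arguments."""
    return {"n_tables": len(tables)}


def summary_node_kwargs(table_names: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for kedro.pipeline.node from a list of dataset names."""
    return {
        "func": summarise_tables,
        "inputs": list(table_names),  # a list of inputs is passed to func positionally
        "outputs": "tables_summary",  # hypothetical output dataset name
        "name": "summarise_tables",
    }


def create_pipeline(table_names: List[str]):
    # Imported lazily so the helpers above work without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline([node(**summary_node_kwargs(table_names))])
```

Note that the list of table names is still fixed when the pipeline is built, not while it runs, so this only covers the "one node, many inputs" half of the question.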
I'm probably doing something wrong, so suggestions are welcome.
Isaac89
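02/15/2022, 12:28 PM
from kedro.framework.session import KedroSession, get_current_session

try:
    session = get_current_session()
except RuntimeError:
    # No active session (e.g. under Kedro viz), so create one
    session = KedroSession.create(package_name=package_name, project_path=package_path)
but I don't know whether this is the best way to achieve it
datajoely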
02/15/2022, 12:29 PM
user
02/16/2022, 1:50 PM
pip install kedro
but I get the following error:
ERROR: Could not find a version that satisfies the requirement kedro (from versions: none)
ERROR: No matching distribution found for kedro
Does anyone know what's causing this?
datajoely
02/16/2022, 1:52 PM
user
02/16/2022, 1:53 PM
user
02/16/2022, 1:53 PM
user
02/16/2022, 1:54 PM
datajoely
02/16/2022, 2:00 PM
pip install kedro-0.17.6.tar.gz locally
https://pypi.org/project/kedro/#files
datajoely
02/16/2022, 2:00 PM
user
02/16/2022, 2:01 PM
datajoely
02/16/2022, 2:03 PM
datajoely
02/16/2022, 2:04 PM
user
02/16/2022, 2:04 PM
datajoely
02/16/2022, 2:04 PM
datajoely
02/16/2022, 2:04 PM
datajoely
02/16/2022, 2:04 PM
user
02/16/2022, 2:05 PM
datajoely
02/16/2022, 2:06 PM
datajoely
02/16/2022, 2:06 PM
pandas.ExcelDataSet requiring openpyxl not xlrd
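As background for the openpyxl/xlrd remark: recent xlrd releases (2.0 and later) read only the legacy .xls format, so .xlsx files need openpyxl. A minimal sketch of picking the engine by file extension, assuming pandas is called directly rather than through the Kedro dataset. The helper names excel_engine_for and load_excel are hypothetical.

```python
from pathlib import Path


def excel_engine_for(filepath: str) -> str:
    """Pick a pandas Excel engine from the file extension."""
    suffix = Path(filepath).suffix.lower()
    if suffix == ".xls":
        return "xlrd"  # legacy binary format
    if suffix in {".xlsx", ".xlsm"}:
        return "openpyxl"  # Office Open XML formats
    raise ValueError(f"Unsupported Excel extension: {suffix!r}")


def load_excel(filepath: str):
    # Imported lazily so excel_engine_for can be used without pandas installed.
    import pandas as pd

    return pd.read_excel(filepath, engine=excel_engine_for(filepath))
```

Whichever engine is selected still has to be installed in the same environment as pandas.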
user
02/16/2022, 2:07 PM
user
02/16/2022, 2:09 PM
user
02/16/2022, 2:09 PM