datajoely
11/29/2021, 10:39 AM
SQLQueryDataSet + Jinja, or define your own dataset

Apoorva
11/29/2021, 10:52 AM

datajoely
11/29/2021, 11:54 AM
insert, upsert or overwrite. There isn't really a 'read only' mode; if you only plan on reading you can select any of those, and it will never write if you never save back to Hive. If you really want to block saves, you can inherit the dataset and override the save() method to raise NotImplementedError.
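To make the "block saves" idea concrete, here is a minimal sketch. The base class below is a stand-in for the real Kedro Hive dataset (e.g. SparkHiveDataSet), included only so the example runs without Kedro installed; in a project you would subclass the real dataset instead.

```python
from typing import Any


class SparkHiveDataSet:
    """Stand-in for kedro.extras.datasets.spark.SparkHiveDataSet,
    here only so the example is self-contained."""

    def load(self) -> Any:
        return "rows from Hive"

    def save(self, data: Any) -> None:
        print("writing to Hive")


class ReadOnlySparkHiveDataSet(SparkHiveDataSet):
    """Same dataset, but any attempt to save fails loudly."""

    def save(self, data: Any) -> None:
        raise NotImplementedError("This dataset is read-only")
```

Reads behave exactly as before; only save() is overridden, so a pipeline that accidentally writes to the dataset fails fast instead of silently mutating Hive.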
sri
11/29/2021, 5:34 PM

datajoely
11/29/2021, 5:36 PM

sri
11/29/2021, 7:01 PM
cook_breakfast_pipeline = pipeline(
    cook_pipeline, parameters={"params:recipe": "params:breakfastrecipe"})
cook_lunch_pipeline = pipeline(
    cook_pipeline, parameters={"params:recipe": "params:lunchrecipe"})
Now I want to run the cook_pipeline from the command line with two different parameters. The parameters are a long JSON with many key-value pairs.
I tried the --config option but it wasn't working.

j c h a r l e s
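For what it's worth, the --config route sri mentions is usually wired up like this (a sketch: the top-level run key mirroring the CLI flags is how Kedro 0.17.x documents it; the pipeline name and parameter string below are invented):

```yaml
# config.yml (sketch) — passed as: kedro run --config config.yml
run:
  pipeline: cook_breakfast_pipeline
  params: recipe:breakfastrecipe   # --params uses key:value[,key:value] syntax in 0.17.x
```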
11/30/2021, 8:44 AM
{
    "name": "Kedro: Run Project",
    "type": "python",
    "cwd": "${workspaceFolder}/<my-working-dir>",
    "request": "launch",
    "program": "${workspaceFolder}/venv39/bin/kedro",
    "args": [
        "run",
        "--runner=src.<my-package>.runner.DryRunner"
    ],
    "console": "integratedTerminal"
},
datajoely
11/30/2021, 9:22 AM
AbstractRunner, which looks like it has a third argument of is_async: bool.
https://kedro.readthedocs.io/en/stable/_modules/kedro/runner/sequential_runner.html#SequentialRunner
https://kedro.readthedocs.io/en/stable/kedro.runner.AbstractRunner.html
So I think you need to tweak the example code in the docs for DryRunner to look more like the constructor in SequentialRunner. I'll raise a ticket to fix the docs.

datajoely
11/30/2021, 9:23 AM

martinlarsalbert
12/01/2021, 11:37 AM

martinlarsalbert
12/01/2021, 11:57 AM

martinlarsalbert
12/01/2021, 1:19 PM
io = DataCatalog()
io.add_feed_dict({"params:<name>": <value>})

datajoely
12/01/2021, 1:19 PM

j c h a r l e s
12/01/2021, 11:36 PM

Rroger
12/02/2021, 2:10 AM
1. kedro.io.core.DataSetError: <class 'pandas.core.frame.DataFrame'> was not serialized due to: 'AioClientCreator' object has no attribute '_register_lazy_block_unknown_fips_pseudo_regions'. Not sure what's happening here. Maybe due to AWS credentials?
2. Should I use the local/credentials.yml file? I added three items:
roger-data-science:
  aws_access_key_id: XXXXXX
  aws_secret_access_key: XXXXXXX
  role_arn: arn:aws:iam::XXXXXXX:role/AmazonSageMaker-ExecutionRole
but Kedro still gives the warning UserWarning: Credentials not found in your Kedro project config.
Actually I just had to move the credentials.yml to the conf/sagemaker directory.
3. How does one specify which profile in .aws/credentials
to use with the Kedro project?
[default]
aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY
[project1]
aws_access_key_id = ANOTHER_AWS_ACCESS_KEY_ID
aws_secret_access_key = ANOTHER_AWS_SECRET_ACCESS_KEY
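One hedged option for question 3: S3-backed Kedro datasets go through fsspec/s3fs, and s3fs accepts a profile argument, so the profile name can be passed through the dataset's credentials. A sketch (the dataset name, bucket, and credentials key below are made up; check your s3fs version supports profile):

```yaml
# conf/local/credentials.yml (sketch)
dev_s3:
  profile: project1   # section name in ~/.aws/credentials

# conf/base/catalog.yml (sketch)
my_dataset:
  type: pandas.CSVDataSet
  filepath: s3://my-bucket/data.csv
  credentials: dev_s3
```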
Rroger
12/02/2021, 2:44 AM
The AioClientCreator error is due to the botocore version, so I downgraded to botocore==1.22.5. But then I get another error: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.
Rroger
12/02/2021, 3:33 AM
It looks like botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials overrides whatever is in conf/credentials.yml.

datajoely
12/02/2021, 9:56 AM

datajoely
12/02/2021, 9:57 AM

datajoely
12/02/2021, 9:57 AM

NC
12/02/2021, 3:29 PM
Does anyone have an example of how the on_node_error hook is implemented? I would like to use the Pandera decorator functions (@check_input and @check_output) to validate data inputs/outputs, and use the on_node_error hook to catch the SchemaError and save the failed cases if data validation fails. Is on_node_error the right approach for this? Thank you loads in advance!

datajoely
12/02/2021, 3:34 PM
One warning: you can't use ParallelRunner with Pandera decorators, as they can't be serialised.

NC
12/02/2021, 3:38 PM
Thanks! I would still like to see an example of how on_node_error is implemented.

NC
12/02/2021, 3:43 PM
Do you have an example of an on_node_error hook?

datajoely
12/02/2021, 3:44 PM
In hooks.py you do something like this:

```python
from typing import Any, Dict

from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog
from kedro.pipeline.node import Node


class PanderaCheckHooks:
    @hook_impl
    def on_node_error(
        self,
        error: Exception,
        node: Node,
        catalog: DataCatalog,
        inputs: Dict[str, Any],
        is_async: bool,
        run_id: str,
    ):
        pass  # do something
```
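For reference, Kedro (0.17.x and later) discovers hook classes through the HOOKS tuple in settings.py. A sketch, where the module path my_project.hooks is hypothetical:

```python
# settings.py (sketch) — register the hook instance so Kedro picks it up.
# "my_project.hooks" is a made-up module path; use your own package's.
from my_project.hooks import PanderaCheckHooks

HOOKS = (PanderaCheckHooks(),)
```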
datajoely
12/02/2021, 3:44 PM
Then in settings.py import it.

datajoely
12/02/2021, 3:45 PM
Put a breakpoint() in where it says pass and explore what you have available.

Piesky
12/02/2021, 3:45 PM
Even after editing conf/base/logging.yml I'm still getting an <repository-root>/info.log that isn't mentioned in the config file. I tried to search the code for it but no luck. Any idea where it comes from?

NC
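On Piesky's question, a likely source worth checking (hedged, from memory of the Kedro project template rather than the thread): the template's conf/base/logging.yml defines an info_file_handler roughly like the excerpt below, which writes info.log into the directory kedro is run from. Removing or editing that handler should make the file go away.

```yaml
# Approximate excerpt of Kedro's template logging.yml
handlers:
  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: info.log
```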
12/02/2021, 3:46 PM

datajoely
12/02/2021, 3:47 PM

datajoely
12/02/2021, 3:47 PM