beginners-need-help
  • d

    datajoely

    11/29/2021, 10:39 AM
    You can define it in the SQLQueryDataSet + Jinja, or define your own dataset.
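    For illustration, a minimal sketch of the SQLQueryDataSet + Jinja idea (the table, filter and connection string are hypothetical, not from this thread):
    from jinja2 import Template
    from kedro.extras.datasets.pandas import SQLQueryDataSet

    # Render the SQL from a Jinja template before handing it to the dataset.
    sql = Template(
        "SELECT * FROM {{ table }} WHERE updated_at >= '{{ since }}'"
    ).render(table="orders", since="2021-11-01")

    dataset = SQLQueryDataSet(
        sql=sql,
        credentials={"con": "postgresql://user:password@host:5432/db"},
    )
    df = dataset.load()  # returns a pandas DataFrame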
  • a

    Apoorva

    11/29/2021, 10:52 AM
    Hey Team, in my project pipeline I have to read data from a Hive table. I have been using the catalog type spark.SparkHiveDataSet, which needs a write_mode parameter that is not relevant for my requirement, and I can't find any documentation around a read-only mode. Any suggestion to work with this request?
  • d

    datajoely

    11/29/2021, 11:54 AM
    So the valid arguments are insert, upsert or overwrite. There isn't really a 'read only' mode; if you only plan on reading you can select any of those, and it will never matter as long as you never save back to Hive. If you really want to block saves, you can inherit the dataset and override the save() method to raise NotImplementedError.
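    A minimal sketch of that override (the subclass name is hypothetical; Kedro datasets implement the protected _save(), which the public save() delegates to):
    from kedro.extras.datasets.spark import SparkHiveDataSet

    class ReadOnlySparkHiveDataSet(SparkHiveDataSet):
        """A SparkHiveDataSet that refuses to write back to Hive."""

        def _save(self, data) -> None:
            raise NotImplementedError("This dataset is read-only.")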
  • s

    sri

    11/29/2021, 5:34 PM
    How do I execute a modular pipeline with 2 different config params with "kedro run" on the command line?
  • d

    datajoely

    11/29/2021, 5:36 PM
    kedro run --params x:y
    https://kedro.readthedocs.io/en/latest/09_development/03_commands_reference.html#modifying-a-kedro-run
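    Multiple parameters can be comma-separated, e.g. (names and values here are placeholders):
    kedro run --params "param1:value1,param2:value2"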
  • s

    sri

    11/29/2021, 7:01 PM
    Thanks for your responses today. I am trying to use modular pipelines as explained in https://kedro.readthedocs.io/en/stable/06_nodes_and_pipelines/03_modular_pipelines.html
    cook_breakfast_pipeline = pipeline(
        cook_pipeline, parameters={"params:recipe": "params:breakfastrecipe"})
    cook_lunch_pipeline = pipeline(
        cook_pipeline, parameters={"params:recipe": "params:lunchrecipe"})
    Now I want to run the cook_pipeline from the command line with two different parameters. The parameters are a long JSON with many key-value pairs. I tried the --config option but it wasn't working.
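    One hedged way to make both variants selectable from the CLI - a sketch using the pipeline_registry.py convention of Kedro 0.17.x, with an invented stand-in node:
    from kedro.pipeline import Pipeline, node
    from kedro.pipeline.modular_pipeline import pipeline

    def cook(recipe):
        # Stand-in for the real cooking logic.
        return recipe

    cook_pipeline = Pipeline([node(cook, inputs="params:recipe", outputs="meal")])

    def register_pipelines():
        # Register each wrapped pipeline under its own name so it can be
        # chosen with `kedro run --pipeline=<name>`.
        return {
            "cook_breakfast": pipeline(
                cook_pipeline, parameters={"params:recipe": "params:breakfastrecipe"}
            ),
            "cook_lunch": pipeline(
                cook_pipeline, parameters={"params:recipe": "params:lunchrecipe"}
            ),
        }
    Each variant then runs with kedro run --pipeline=cook_breakfast or kedro run --pipeline=cook_lunch.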
  • j

    j c h a r l e s

    11/30/2021, 8:44 AM
    Hi Kedro team, I am brand new to Kedro and have been trying to figure out what exactly I need to modify so that I only run the parts of the pipeline that are missing. I have seen the multiple pull requests for this feature, which indicated that one can modify the KedroContext to make this happen. Do you happen to have any good examples of someone modifying the KedroContext in this way? I also tried to create a custom runner to solve this issue, but when I follow the example for the DryRunner I get the following error: "_run() takes 3 positional arguments but 4 were given". I am using this version of Kedro: kedro @ git+https://github.com/quantumblacklabs/kedro.git@7c537f919b84d0cce7aa5a2343554700c28bb2bb My launch.json file looks like this:
    {
                "name": "Kedro: Run Project",
                "type": "python",
                "cwd": "${workspaceFolder}/my-working-dir>",
                "request": "launch",
                "program": "${workspaceFolder}/venv39/bin/kedro",
                "args": [
                    "run",
                    "--runner=src.<my-package>.runner.DryRunner"
                ],
                "console": "integratedTerminal"
            },
  • d

    datajoely

    11/30/2021, 9:22 AM
    Hi @User I think this is some outdated documentation. We recently moved our built-in runners to inherit from a class called AbstractRunner, whose constructor has a 3rd argument, is_async: bool. https://kedro.readthedocs.io/en/stable/_modules/kedro/runner/sequential_runner.html#SequentialRunner https://kedro.readthedocs.io/en/stable/kedro.runner.AbstractRunner.html So I think you need to tweak the example code in the docs for DryRunner to look more like the constructor in SequentialRunner. I'll raise a ticket to fix the docs.
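    A hedged sketch of what the tweaked DryRunner could look like, mirroring those signatures (the log message is illustrative):
    from kedro.io import AbstractDataSet, DataCatalog, MemoryDataSet
    from kedro.pipeline import Pipeline
    from kedro.runner import AbstractRunner

    class DryRunner(AbstractRunner):
        """Lists the nodes that would run, without executing them."""

        def __init__(self, is_async: bool = False):
            # The 3rd argument mentioned above, forwarded to AbstractRunner.
            super().__init__(is_async=is_async)

        def create_default_data_set(self, ds_name: str) -> AbstractDataSet:
            return MemoryDataSet()

        def _run(self, pipeline: Pipeline, catalog: DataCatalog, run_id: str = None) -> None:
            self._logger.info(
                "Actual run would execute %d nodes:\n%s",
                len(pipeline.nodes),
                pipeline.describe(),
            )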
  • d

    datajoely

    11/30/2021, 9:23 AM
    Modular pipeline param override
  • m

    martinlarsalbert

    12/01/2021, 11:37 AM
    What is the best way to run many datasets through the same pipeline?
  • m

    martinlarsalbert

    12/01/2021, 11:57 AM
    Is it possible to pass parameters to a SequentialRunner()? You can pass a catalog of datasets, but where do the parameters go?
  • m

    martinlarsalbert

    12/01/2021, 1:19 PM
    I read a bit and found the "add_feed_dict" method:
    io = DataCatalog()
    io.add_feed_dict({"params:<name>": <value>})
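    Putting that together, a minimal sketch of feeding a parameter to a runner directly (the node and names are invented for illustration):
    from kedro.io import DataCatalog
    from kedro.pipeline import Pipeline, node
    from kedro.runner import SequentialRunner

    def double(x):
        return x * 2

    pipe = Pipeline([node(double, inputs="params:x", outputs="result")])

    catalog = DataCatalog()
    catalog.add_feed_dict({"params:x": 21})  # parameters are just catalog entries

    print(SequentialRunner().run(pipe, catalog))  # {'result': 42}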
  • d

    datajoely

    12/01/2021, 1:19 PM
    Hi @User I'll answer all of your questions in this thread
  • j

    j c h a r l e s

    12/01/2021, 11:36 PM
    Are there any good examples of more complicated modular pipelines? Nonlinear pipelines that loop back and update data from prior steps?
  • r

    Rroger

    12/02/2021, 2:10 AM
    I'm trying to do the example here.
    1. Getting the error kedro.io.core.DataSetError: <class 'pandas.core.frame.DataFrame'> was not serialized due to: 'AioClientCreator' object has no attribute '_register_lazy_block_unknown_fips_pseudo_regions'. Not sure what's happening here. Maybe due to AWS credentials?
    2. Should I use the local/credentials.yml file? I added three items:
    roger-data-science:
      aws_access_key_id: XXXXXX
      aws_secret_access_key: XXXXXXX
      role_arn: arn:aws:iam::XXXXXXX:role/AmazonSageMaker-ExecutionRole
    but Kedro still gives the warning UserWarning: Credentials not found in your Kedro project config. βœ” Actually I just had to move the credentials.yml to the conf/sagemaker directory.
    3. How does one specify which profile in .aws/credentials to use with the Kedro project?
    [default]
    aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY

    [project1]
    aws_access_key_id = ANOTHER_AWS_ACCESS_KEY_ID
    aws_secret_access_key = ANOTHER_AWS_SECRET_ACCESS_KEY
  • r

    Rroger

    12/02/2021, 2:44 AM
    Actually, according to [SO](https://stackoverflow.com/questions/69994834/attributeerror-aioclientcreator-object-has-no-attribute-register-lazy-block/70028682), the AioClientCreator error is due to the botocore version, so I downgraded to botocore==1.22.5. But then I get another error: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.
  • r

    Rroger

    12/02/2021, 3:33 AM
    Seems like botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials overrides whatever is in conf/credentials.yml.
  • d

    datajoely

    12/02/2021, 9:56 AM
    Hi @User - your ENV variables will take precedence over any creds configured in Kedro. In fact I'd argue it's preferable to keep your creds outside of Kedro - we just provide a mechanism so that people don't fall into common traps like accidentally committing them to git.
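    On your earlier question 3 - picking a profile from .aws/credentials is also handled outside Kedro, e.g. via the standard AWS_PROFILE environment variable:
    export AWS_PROFILE=project1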
  • d

    datajoely

    12/02/2021, 9:57 AM
    I'd be interested to see what error you get if you remove the credentials from the Kedro side
  • d

    datajoely

    12/02/2021, 9:57 AM
    > The unspecified location constraint is incompatible for the region specific endpoint
    This feels like an AWS config error - are you able to talk to SageMaker using the credentials outside of Kedro?
  • n

    NC

    12/02/2021, 3:29 PM
    Hello! Are there any examples of how the on_node_error hook is implemented? I would like to use the Pandera decorator functions (@check_input and @check_output) to validate data inputs/outputs, and use the on_node_error hook to catch the SchemaError and save the failed cases if data validation fails. Is on_node_error the right approach for this? Thank you loads in advance!
  • d

    datajoely

    12/02/2021, 3:34 PM
    Hello @User this will work I think - I don't think we have any examples ready to go, so if you get something working we'd love to include it in the docs. One point of warning - you cannot use ParallelRunner with Pandera decorators, as they can't be serialised.
  • n

    NC

    12/02/2021, 3:38 PM
    Thank you for the reminder, I saw the GitHub issue about that! Good to know that this is a viable approach, but I don't know how to make it work πŸ˜†, which is why I would love to see an example of how on_node_error is implemented.
  • n

    NC

    12/02/2021, 3:43 PM
    Would you be able to provide an example of how an exception can trigger a certain behaviour in an on_node_error hook?
  • d

    datajoely

    12/02/2021, 3:44 PM
    So in your hooks.py you do something like this:
    from typing import Any, Dict

    from kedro.framework.hooks import hook_impl
    from kedro.io.data_catalog import DataCatalog
    from kedro.pipeline.node import Node

    class PanderaCheckHooks:
        @hook_impl
        def on_node_error(
            self,
            error: Exception,
            node: Node,
            catalog: DataCatalog,
            inputs: Dict[str, Any],
            is_async: bool,
            run_id: str,
        ):
            pass  # do something, e.g. inspect `error` and save the failing rows
  • d

    datajoely

    12/02/2021, 3:44 PM
    and then in settings.py import it
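    For example, assuming the project package is called my_project (a placeholder):
    # settings.py
    from my_project.hooks import PanderaCheckHooks

    HOOKS = (PanderaCheckHooks(),)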
  • d

    datajoely

    12/02/2021, 3:45 PM
    the other thing you can do is put breakpoint() in where it says pass and explore what you have available
  • p

    Piesky

    12/02/2021, 3:45 PM
    Hey Kedroids, quick question - I wanted to prevent creating any log files, but after disabling all relevant handlers in conf/base/logging.yml I'm still getting an <repository-root>/info.log that isn't mentioned in the config file. I tried to search the code for it but no luck. Any idea where it comes from?
  • n

    NC

    12/02/2021, 3:46 PM
    Thank you so much @User ! I will try this and report back. πŸ™‚
  • d

    datajoely

    12/02/2021, 3:47 PM
    I'm pretty sure this is a known issue - and fixed in the version about to be released. Let me check with the team.