beginners-need-help
  • FelicioV (03/21/2022, 12:02 PM)
    I believe I just found it. It's example 15 in the data_catalog docs.
  • FelicioV (03/21/2022, 12:03 PM)
    ```yaml
    dev_abs:
      account_name: accountname
      account_key: key
    ```
  • datajoely (03/21/2022, 12:04 PM)
    Yes something like that! This is the advice I gave last time this came up: https://stackoverflow.com/a/69941391/2010808
  • FelicioV (03/21/2022, 12:08 PM)
    I actually found that post and recognised your name on the accepted reply. It doesn't mention the credentials keys explicitly, if I'm not mistaken. Anyway, it all came down to my lack of attention when reading the docs (as happens more often than I'd like). Thanks for your time, again!
  • datajoely (03/21/2022, 12:10 PM)
    Yeah, the credentials (whilst we use S3 as an example) are documented here https://kedro.readthedocs.io/en/stable/05_data/01_data_catalog.html#feeding-in-credentials and, as you said, we just pass them through to fsspec, so at that point it's best to play around with their API directly.
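    For reference, a minimal sketch of how this wiring typically looks; the entry name dev_abs, the container and the file path are placeholders, and the credential keys are passed straight through to fsspec/adlfs:

    ```yaml
    # conf/local/credentials.yml (kept out of version control)
    dev_abs:
      account_name: accountname
      account_key: key

    # conf/base/catalog.yml -- the dataset references the credentials entry by name
    my_dataset:
      type: pandas.CSVDataSet
      filepath: abfs://my-container/my_file.csv
      credentials: dev_abs
    ```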
  • datajoely (03/21/2022, 12:10 PM)
    shout if you have any questions!
  • digitalsam (03/21/2022, 3:07 PM)
    Hi all, I feel this is a simple question about managing environments, but I can't seem to find a satisfying answer to it. The closest I've found is [this answer](https://github.com/kedro-org/kedro/discussions/909). In brief, if I want to start using Kedro for all my future projects, should I create only a single conda env `kedro` (with only Kedro installed) that would take care of all environment needs no matter the project, OR should I still create a separate virtual env for each project with only Kedro installed? (I understand the requirements.txt vs requirements.in distinction, but still can't wrap my head around the meta virtual env 🥵) Thanks!
  • Onéira (03/21/2022, 3:31 PM)
    Hello! Would someone know whether it is possible to create a PartitionedDataSet from an Azure location? A friend of mine is trying to do that and she is struggling a bit. She's not sure whether it is impossible or whether her Azure configuration is wrong...
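    For reference, a sketch of what such a catalog entry can look like, assuming the adlfs/fsspec abfs protocol; the path, underlying dataset type and credentials name are all placeholders:

    ```yaml
    my_partitions:
      type: PartitionedDataSet
      path: abfs://my-container/some/folder
      dataset: pandas.CSVDataSet
      credentials: dev_abs
      filename_suffix: ".csv"
    ```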
  • datajoely (03/21/2022, 3:33 PM)
    Hello - good question! You're also not alone: the Python venv (and wider packaging) world is confusing and opaque (obligatory XKCD: https://xkcd.com/1987/). I can speak to what I do:
    (1) I create a custom virtual environment for each Kedro project (usually with the same name as my project).
    (2) I use conda for a couple of reasons, but mostly because I basically have this command saved: `conda create -n "my-environment" python=3.8 -y`
    (3) The `build-reqs` and `install` steps are explained here; they're more about making sure you can 'fail fast' and ensure all of your project dependencies play nicely together: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/01_dependencies.html#project-specific-dependencies
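    Put together, a per-project setup might look like this (a sketch; the environment/project name is a placeholder):

    ```bash
    # one environment per project, named after the project
    conda create -n my-kedro-project python=3.8 -y
    conda activate my-kedro-project
    pip install kedro

    # inside the project: pin, then install, the project's own dependencies
    kedro build-reqs   # compiles src/requirements.in into src/requirements.txt
    kedro install      # installs the pinned requirements
    ```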
  • DIVINE (03/21/2022, 3:55 PM)
    hello, I tried to follow the guide to use the wrapper to connect two pipelines, but I got "Failed to map datasets and/or parameters". Is there a hidden requirement?
  • datajoely (03/21/2022, 3:58 PM)
    Are you just running the spaceflights tutorial or is this your own project?
  • DIVINE (03/21/2022, 3:59 PM)
    It's my own project
  • DIVINE (03/21/2022, 3:59 PM)
    basically I've got a few pipelines that are disconnected, with output O-1 of pipeline 1 being named differently from input I-2 of pipeline 2. Therefore I created a pipeline with `pipe=pipeline1` and `inputs={'I-2-name': 'O-1-name'}`
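    For context, a sketch of how the modular `pipeline()` wrapper is typically used for this (dataset and function names are placeholders). Note that the keys of the `inputs` mapping must be free inputs of the pipeline being wrapped, which is one common cause of the "Failed to map datasets and/or parameters" error:

    ```python
    from kedro.pipeline import Pipeline, node
    from kedro.pipeline.modular_pipeline import pipeline

    def produce(raw):   # placeholder node functions
        return raw

    def consume(data):
        return data

    pipeline_1 = Pipeline([node(produce, "raw", "O-1-name")])
    pipeline_2 = Pipeline([node(consume, "I-2-name", "final")])

    # Wrap the *consuming* pipeline and remap its free input onto
    # pipeline 1's output so the two pipelines connect.
    connected = pipeline_1 + pipeline(pipeline_2, inputs={"I-2-name": "O-1-name"})
    ```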
  • Daehyun Kim (03/21/2022, 6:08 PM)
    Hi team, I have a quick question about a partially parallel pipeline. I'm thinking of a pipeline: node_1 -> node_2 -> (in parallel) node_3_1, node_3_2. Is this possible to define in pipeline.py?
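    A sketch of how such a branching pipeline can be declared (all names are placeholders). Since node_3_1 and node_3_2 depend only on node_2's output and not on each other, a runner such as ParallelRunner is free to execute them concurrently:

    ```python
    from kedro.pipeline import Pipeline, node

    def step_1(raw):   # placeholder node functions
        return raw

    def step_2(a):
        return a

    def branch_a(b):
        return b

    def branch_b(b):
        return b

    pipeline = Pipeline(
        [
            node(step_1, "raw", "a", name="node_1"),
            node(step_2, "a", "b", name="node_2"),
            # both branches read "b" and are independent of each other
            node(branch_a, "b", "out_1", name="node_3_1"),
            node(branch_b, "b", "out_2", name="node_3_2"),
        ]
    )
    ```

    The parallel branches can then be exercised with, e.g., `kedro run --runner=ParallelRunner`.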
  • digitalsam (03/21/2022, 6:24 PM)
    Perfect, thank you! That was the confirmation I was looking for! 😄 I guess the confusion came from the fact that I usually install all dependencies at once but now it's a two-step process. Thanks again!
  • JensKrijgsman (03/22/2022, 8:30 AM)
    Hello, I've been working a bit with Kedro, installed on Python 3.9, and have had no problems so far. When loading custom datasets there is a warning: `DeprecationWarning: The transformer API will be deprecated in Kedro 0.18.0. Please use Dataset Hooks to customise the load and save methods.` I want to make my code future-proof, so I was looking into creating hooks for my custom datasets. I looked at https://kedro.readthedocs.io/en/stable/07_extend_kedro/02_hooks.html#use-hooks-to-customise-the-dataset-load-and-save-methods but it was not clear to me how this can override the save and load methods of multiple custom datasets. How would you implement a hook based on a specific datatype?
  • datajoely (03/22/2022, 9:26 AM)
    Hi @User, I think there is a chance we're over-warning in this case. Unless you're implementing a hook that overrides the `load` or `save` dataset methods (like the expanded example here https://github.com/kedro-org/kedro/blob/e78990c6b606a27830f0d502afa0f639c0830950/docs/source/07_extend_kedro/06_transformers.md), I don't think you have to worry.
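    For reference, a sketch of the hook-based replacement for a transformer, assuming Kedro's `before_dataset_loaded`/`after_dataset_loaded` hook specs; the timing logic here is illustrative only:

    ```python
    import logging
    import time
    from typing import Any

    from kedro.framework.hooks import hook_impl

    class DatasetTimingHooks:
        """Illustrative stand-in for a timing transformer."""

        def __init__(self):
            self._starts = {}

        @hook_impl
        def before_dataset_loaded(self, dataset_name: str) -> None:
            self._starts[dataset_name] = time.time()

        @hook_impl
        def after_dataset_loaded(self, dataset_name: str, data: Any) -> None:
            elapsed = time.time() - self._starts.pop(dataset_name, time.time())
            logging.getLogger(__name__).info("Loaded %s in %.3fs", dataset_name, elapsed)
    ```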
  • JensKrijgsman (03/22/2022, 9:33 AM)
    Ah ok, great. Thanks for the quick response! 😀
  • noestl (03/22/2022, 3:40 PM)
    hello y'all, I would like to change the path in the comment generated when running `kedro build-reqs`. In my requirements.txt, I would like the line to begin from my source root (src). Any suggestions so that line 5 looks like line 6 or 7?
  • datajoely (03/22/2022, 4:15 PM)
    Hello! Good question and I think there is probably scope for us to make this more configurable
  • datajoely (03/22/2022, 4:20 PM)
    Looking at the code here (https://github.com/kedro-org/kedro/blob/a839d6952a88414234d25d6df09422eaf2260011/kedro/framework/cli/project.py#L73) it looks like we just run the following command (`-q` puts it in less verbose mode), but the location is hard-coded: `pip-compile -q requirements.in`. I think it's easiest not to use the `kedro` command wrapper here and to use `pip-compile` directly; if you really want to, you can define your own `cli.py` and your local version of the command will take precedence.
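    For example (a sketch; the exact paths depend on your layout): pip-tools records the invocation in the generated file's header comment, so pointing it at src/ directly should make that comment start from src as well:

    ```bash
    # run pip-tools directly instead of `kedro build-reqs`,
    # controlling both the input and the output location
    pip-compile -q src/requirements.in --output-file src/requirements.txt
    ```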
  • noestl (03/22/2022, 5:27 PM)
    alright thank you for your answer
  • pypeaday (03/22/2022, 9:58 PM)
    Is there a well-defined pattern for bubbling Kedro pipelines up through dev/qa/prod environments? My specific use case is the intermediate steps where data is serialized after any given node. We save parquet files in S3; however, as we move pipelines out of dev and into a QA or prod environment, we'd have to change those S3 bucket paths. I'd like to be able to configure the bucket based on an environment variable, but it isn't immediately obvious to me how to do that without hijacking the config loader. Am I missing a simple answer?
  • datajoely (03/22/2022, 10:30 PM)
    Yes we have exactly that
  • datajoely (03/22/2022, 10:30 PM)
    https://kedro.readthedocs.io/en/latest/04_kedro_project_setup/02_configuration.html#additional-configuration-environments
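    Roughly, the pattern is one sub-folder per environment under conf/, with files there overriding conf/base; the environment names and bucket paths below are placeholders:

    ```yaml
    # conf/base/catalog.yml -- default (dev) definition
    model_input:
      type: pandas.ParquetDataSet
      filepath: s3://dev-bucket/data/model_input.parquet

    # conf/prod/catalog.yml -- same entry, prod bucket; overrides base
    model_input:
      type: pandas.ParquetDataSet
      filepath: s3://prod-bucket/data/model_input.parquet
    ```

    The environment is then selected at run time, e.g. `kedro run --env=prod`, or via the KEDRO_ENV environment variable.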
  • pypeaday (03/22/2022, 10:34 PM)
    I'll read the friendly manual! Thanks! Tagging @waylonwalker for work tomorrow
  • Burn1n9m4n (03/23/2022, 5:22 PM)
    So I have a PartitionedDataSet that's comprised of `.xls` files. I'm trying to parse a subset of the columns as datetime[ns] types, since they are coming in as objects. How would I denote that in the catalog? Presently, I have the dataset's load_args set with the following: `load_args: {dtype: {'a': 'datetime[ns]'}}`. I'm not getting an error, but the types aren't conforming to datetimes for the columns I've specified.
  • datajoely (03/23/2022, 5:25 PM)
    so `pd.read_excel` is expecting a python type, but Kedro is providing a `str`
  • datajoely (03/23/2022, 5:25 PM)
    we don't have a great/safe way of providing proper python types from YAML here
  • datajoely (03/23/2022, 5:26 PM)
    so you have two options: (1) you can convert the types in the node, or (2) you can define a custom dataset to do this in the YAML definition
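    For option (1), a sketch of a small conversion node, assuming pandas; the column names are placeholders, and `pd.to_datetime` tends to be more reliable than `dtype=` for datetime parsing:

    ```python
    import pandas as pd

    def parse_datetime_columns(df: pd.DataFrame) -> pd.DataFrame:
        """Convert object columns to datetime64[ns] after loading."""
        for col in ["a"]:  # placeholder column names
            df[col] = pd.to_datetime(df[col])
        return df
    ```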