Powered by Linen
beginners-need-help
  • b

    Burn1n9m4n

    03/23/2022, 5:27 PM
    I think I’ll go with the former for now. Thanks!
  • d

    datajoely

    03/23/2022, 5:28 PM
    You can use parameters here if you want to keep the definition declarative
  • b

    Burn1n9m4n

    03/23/2022, 7:21 PM
    So if I used the parameters.yml, would it be something like dtype: "params:datetimes"?
  • d

    datajoely

    03/23/2022, 7:26 PM
    If your datetimes key included a list of columns, I would map a dictionary of column name to date type pairs and pass those to df.astype
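That suggestion might look like this (a sketch; the column names and the shape of the parameters entry are illustrative, not from this project):

```python
import pandas as pd

# Hypothetical parameters.yml entry:
# datetimes:
#   order_date: "datetime64[ns]"
#   ship_date: "datetime64[ns]"

def coerce_dtypes(df: pd.DataFrame, datetimes: dict) -> pd.DataFrame:
    """Apply the column -> dtype mapping from parameters to the dataframe."""
    return df.astype(datetimes)

df = pd.DataFrame({"order_date": ["2022-03-23"], "qty": [1]})
out = coerce_dtypes(df, {"order_date": "datetime64[ns]"})
```

In a pipeline, the node would take "params:datetimes" as an input, which keeps the column-to-dtype mapping declarative in configuration.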
  • w

    waylonwalker

    03/23/2022, 7:45 PM
    Maybe it's me being too simple but I have never used load_args.... even before kedro I never used those arguments. It feels like one more thing to know rather than just using astype, which you are going to need to know and use anyways.
  • d

    datajoely

    03/23/2022, 7:46 PM
    So there is a performance benefit in many cases to doing it on load; given we're using xls, it's not going to be significant
  • w

    waylonwalker

    03/23/2022, 9:34 PM
    I could see that. Almost all of my pipelines run on a schedule, for pennies per run... maybe that privilege of never worrying about pipeline perf has me spoiled. Readability, maintainability >> perf for a pipeline that runs for a few minutes when no one is looking.
  • d

    datajoely

    03/23/2022, 9:44 PM
    I think that's valid, but I would also say there is a ton of functionality in the load_args and save_args that you should be aware of, and in some cases keep declarative and configurable in the catalog
  • i

    idriss__

    03/24/2022, 9:32 AM
    Hi! Actually I'm struggling with a resource issue: I have to process a lot of high-resolution images, and the processing node has to hold them all in RAM, which raises an OutOfMemory error. So I'm thinking of processing them sequentially in the node (the node processes an image then saves it, rather than, as now, all images being processed and then saved to the catalog). I found that this can be done with a custom runner, right? Any tips for that?
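One common pattern for this is a partitioned dataset with lazy loading: Kedro's PartitionedDataSet passes the node a dict of partition id to load callable, so each image is only read when its callable is invoked, and (in Kedro versions that support lazy saving) returning a dict of callables defers each save too. A minimal sketch, with the per-image work and the fake partitions purely illustrative:

```python
def downscale_images(partitioned_input: dict) -> dict:
    """Process one image at a time so only a single image lives in RAM.

    `partitioned_input` maps partition ids to zero-argument load functions,
    mimicking PartitionedDataSet's lazy-loading behaviour. The returned dict
    of callables lets a lazy-saving output dataset invoke them one by one.
    """
    def _deferred(load_func):
        def _inner():
            image = load_func()  # only now is this one image loaded
            # stand-in for real per-image work (e.g. resize/compress)
            return [[255 - px for px in row] for row in image]
        return _inner

    return {name: _deferred(load) for name, load in partitioned_input.items()}

# Simulate one lazily loaded "image" as a nested list of pixel values
fake_partitions = {"img1": lambda: [[1, 2], [3, 4]]}
out = downscale_images(fake_partitions)
result = out["img1"]()  # the save side would invoke this per partition
```

This keeps peak memory bounded by a single image rather than the whole dataset, without needing a custom runner.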
  • d

    datajoely

    03/24/2022, 9:51 AM
    Releasing memory deliberately
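Releasing memory deliberately inside a node can be as simple as dropping references as soon as you are done with them and triggering a collection pass; a sketch, where transform is a stand-in for the real per-image work:

```python
import gc

def transform(img):
    # placeholder for the real per-image processing
    return [px * 2 for px in img]

def process_batch(images):
    out = []
    for img in images:
        out.append(transform(img))
    del images    # drop the reference to the input batch once processed
    gc.collect()  # explicitly trigger a garbage-collection pass now
    return out

result = process_batch([[1, 2], [3, 4]])
```

Note that `gc.collect()` mainly helps with reference cycles; for plain objects, dropping the last reference is usually enough for CPython to free them immediately.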
  • n

    noestl

    03/24/2022, 1:48 PM
    Hello, I tried to install kedro-great but I get an error on a lib. Is it still maintained?
  • d

    datajoely

    03/24/2022, 1:55 PM
    sorry we don't maintain that and I don't think it is
  • d

    datajoely

    03/24/2022, 1:56 PM
    if you are only using pandas, I would recommend that you experiment with pandera, since that is super easy to investigate
  • n

    noestl

    03/24/2022, 2:53 PM
    I am working with pandas and pyspark dfs, I will try to install GE and configure it on kedro directly
  • d

    datajoely

    03/24/2022, 3:02 PM
    Yeah we're hoping someone from the community makes an open source plug-in like we have for MLFlow
  • j

    jcasanuevam

    03/28/2022, 8:28 AM
    Hello guys!! Is there a way to change the parameters.yml file dynamically based on some preprocessing/feature engineering nodes? I mean, imagine I want to remove certain columns based on the null values they have, but these columns will change over time, so I don't want to go to the parameters.yml file and rewrite the names of the columns to delete by hand.
  • a

    antony.milne

    03/28/2022, 8:57 AM
    Hi @User, there's not a good way to modify a parameters file directly (it's meant to be used strictly for input). The right approach here would be to make a dataset to store those column names in. If you set versioned: true, then you can keep track of the file over time as well. There are types like yaml.YAMLDataSet available for this sort of thing: https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.html
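A catalog entry for that approach might look like this (the dataset name and filepath are illustrative):

```yaml
# conf/base/catalog.yml
columns_to_drop:
  type: yaml.YAMLDataSet
  filepath: data/03_primary/columns_to_drop.yml
  versioned: true
```

The node that decides which columns to remove would declare columns_to_drop as its output; with versioned: true, each run's list is saved under a timestamped path, so you can see how the selection changed over time.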
  • j

    jcasanuevam

    03/28/2022, 9:04 AM
    Thanks! I'll give it a try
  • v

    vivecalindahl

    03/28/2022, 1:18 PM
    Hi! I previously asked about parameterizing the data catalog using an environment variable for the location of a data file: https://discord.com/channels/778216384475693066/846330075535769601/953594739008077825 . I solved that by adding an env var to the globals_dict in hooks.py and then accessing it in data_catalog.yml. Similarly, I'd now like to access metadata about the file in a node that processes the data. So something like:
    [...]
    node(
        func=process_fcn,
        name="process",
        inputs=dict(
            df="data_at_filepath",
            df_info="${DATA_INFO}",
        ),
    [...]
    )
    where DATA_INFO would be an environment variable. However, AFAICT I can't inject an environment variable like this; the globals dict is not available (?). The two solutions I see are 1) just using os.getenv inside the function process_fcn, or 2) instead making the data info a parameter, referring to it as params:data_info, and passing it in via kedro run --params data_info:<something>. Or is there a better way? This looks pretty similar to what I'm asking about: https://github.com/kedro-org/kedro/issues/1076
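Option 1 can be sketched like this (the function body and return shape are illustrative; only the os.getenv call is the point):

```python
import os

def process_fcn(rows, default_info="unknown"):
    """Read file metadata from an environment variable inside the node."""
    data_info = os.getenv("DATA_INFO", default_info)
    return {"n_rows": len(rows), "data_info": data_info}

# e.g. `export DATA_INFO=images_v2` in the shell before `kedro run`
os.environ["DATA_INFO"] = "images_v2"
result = process_fcn([1, 2, 3])
```

The trade-off versus option 2 is that the dependency on DATA_INFO becomes invisible to the pipeline definition, whereas a params: input keeps it declared and overridable per run.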
  • d

    Dhaval

    03/28/2022, 7:12 PM
    How can I load the context of a Kedro project on kedro 0.17.7? I am planning to use Kedro with Streamlit
  • d

    datajoely

    03/28/2022, 7:13 PM
    You shouldn't need to - the correct way is via hooks
  • d

    datajoely

    03/28/2022, 7:13 PM
    What are you trying to achieve?
  • d

    Dhaval

    03/28/2022, 7:13 PM

    https://www.youtube.com/watch?v=fYkVtzXUEBE

    I am trying to create a dashboard by using this tutorial
  • d

    datajoely

    03/28/2022, 7:13 PM
    Actually in this case you are correct
  • d

    datajoely

    03/28/2022, 7:14 PM
    This video is very old though
  • d

    Dhaval

    03/28/2022, 7:14 PM
    I know there's no load_context now, it has been replaced by KedroContext
  • d

    Dhaval

    03/28/2022, 7:15 PM
    But I don't know how to set the Kedro context so that I can use it on my streamlit application
  • d

    Dhaval

    03/28/2022, 7:15 PM
    I always get this error
    2022-03-29 00:25:09.405 Traceback (most recent call last):
      File "/home/thakkar/anaconda3/envs/basic_vis/lib/python3.8/site-packages/streamlit/scriptrunner/script_runner.py", line 443, in _run_script
        exec(code, module.__dict__)
      File "/home/thakkar/Work/ramp-zendesk/app.py", line 17, in <module>
        data = context.catalog.list()
      File "/home/thakkar/anaconda3/envs/basic_vis/lib/python3.8/site-packages/kedro/framework/context/context.py", line 320, in catalog
        return self._get_catalog()
      File "/home/thakkar/anaconda3/envs/basic_vis/lib/python3.8/site-packages/kedro/framework/context/context.py", line 356, in _get_catalog
        conf_catalog = self.config_loader.get("catalog*", "catalog*/**", "**/catalog*")
      File "/home/thakkar/anaconda3/envs/basic_vis/lib/python3.8/site-packages/kedro/framework/context/context.py", line 449, in config_loader
        return self._get_config_loader()
      File "/home/thakkar/anaconda3/envs/basic_vis/lib/python3.8/site-packages/kedro/framework/context/context.py", line 432, in _get_config_loader
        raise KedroContextError(
    kedro.framework.context.context.KedroContextError: Expected an instance of `ConfigLoader`, got `NoneType` instead.
  • d

    datajoely

    03/28/2022, 7:16 PM
    You can steal the way kedro-viz does it: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/integrations/kedro/data_loader.py
  • d

    Dhaval

    03/28/2022, 7:21 PM
    Done! Thanks @User