advanced-need-help
  • d

    DIVINE

    06/15/2022, 10:42 PM
    and to do that i want to use kedro to manage the credentials, though i know it is possible to use another library for that
  • n

    noklam

    06/15/2022, 10:46 PM
    The Kedro way to do so is to use a Dataset; the credentials should be arguments of it, and the node shouldn't need to know about the credentials.
  • n

    noklam

    06/15/2022, 10:47 PM
    For example https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.pandas.SQLQueryDataSet.html
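    For illustration, a minimal sketch of the pattern noklam describes, with the dataset owning the credentials and the node only ever seeing the loaded data; the query and connection string below are made up, and in a real project the dataset would normally be declared in catalog.yml with a credentials key resolved from credentials.yml:
    from kedro.extras.datasets.pandas import SQLQueryDataSet

    # The dataset holds the credentials; downstream nodes just receive the DataFrame.
    dataset = SQLQueryDataSet(
        sql="SELECT * FROM my_table",  # illustrative query
        credentials={"con": "postgresql://user:password@host:5432/db"},  # illustrative connection string
    )
    df = dataset.load()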
  • g

    greeeen

    06/17/2022, 3:46 PM
    hi there, i have a question about pipeline design. i am working on an nlp project where i built several text processing pipelines for english text from different data sources, for example i have:
    - [env: preprocess] fetch data from source 1
    - [env: preprocess] fetch data from source 2
    - [env: base] preprocess
    - [env: base] NER
    - [env: base] text summarization
    - ...
    now i would like to scale to more languages and more data sources. my initial thought is that i may need to duplicate my base env per language i support and manually update all the catalog/params myself (although they are just conventions like prefixing by "en" or "ja" or "fr" etc). is there a more "Kedro" way to accomplish my goal?
  • d

    datajoely

    06/17/2022, 3:48 PM
    This is a very good question - we tend to prefer explicit versus implicit. But if you have a stable pipeline, you could probably get hooks to work to ensure the catalog is tailored to the language in question
  • g

    greeeen

    06/17/2022, 4:02 PM
    thanks @datajoely for the reply. i do see the benefits of explicit over implicit when there are a manageable number of distinct pipelines. but in my case, making one pipeline for each different language/data source implies lots of nearly duplicated code in the repository, which i hope to avoid. i will look into your suggestion of getting hooks to mutate the state of the Kedro catalog.
  • d

    datajoely

    06/17/2022, 4:02 PM
    Also, make sure you're using the modern modular pipeline technique of overriding inputs/outputs
  • d

    datajoely

    06/17/2022, 4:03 PM
    https://kedro.readthedocs.io/en/stable/nodes_and_pipelines/modular_pipelines.html
  • d

    datajoely

    06/17/2022, 4:03 PM
    essentially you can use namespaces to do a lot of the heavy lifting
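    A minimal sketch of the namespace idea, assuming Kedro 0.17.x/0.18.x; the function, dataset and pipeline names below are illustrative:
    from kedro.pipeline import Pipeline, node, pipeline

    def preprocess(raw_text):
        ...  # shared text-processing logic

    base = Pipeline([node(preprocess, inputs="raw_text", outputs="clean_text")])

    # Reuse the same nodes per language; the namespace prefixes every dataset,
    # so the catalog sees "en.raw_text", "ja.raw_text", and so on.
    en_pipeline = pipeline(base, namespace="en")
    ja_pipeline = pipeline(base, namespace="ja")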
  • g

    greeeen

    06/17/2022, 4:07 PM
    this seems to be a feature introduced in 0.18.0, and i am still on 0.17.5. let me check the documentation and see how much work is needed to migrate to 0.18.x
  • d

    datajoely

    06/17/2022, 4:19 PM
    No it should be there: https://kedro.readthedocs.io/en/0.17.7/06_nodes_and_pipelines/03_modular_pipelines.html
  • d

    datajoely

    06/17/2022, 4:20 PM
    The 0.17.5 docs are uglier, but the functionality should be the same
  • d

    Deep

    06/20/2022, 3:37 PM
    Just encountered this error. Any fix?
  • d

    Deep

    06/20/2022, 3:43 PM
    Never mind found the solution
  • u

    user

    06/20/2022, 6:46 PM
    Rate limiting Kedro API requests https://stackoverflow.com/questions/72691571/rate-limiting-kedro-api-requests
  • d

    datajoely

    06/21/2022, 10:12 AM
    What was the solution? Would you mind adding it to that StackOverflow post?
  • d

    Deep

    06/21/2022, 10:22 AM
    I replied
  • d

    Deep

    06/21/2022, 10:23 AM
    I upgraded the protobuf package: pip install --upgrade "protobuf<=3.20.1"
  • d

    datajoely

    06/21/2022, 11:47 AM
    Amazing thanks
  • u

    user

    06/21/2022, 12:05 PM
    Access configuration in the pipelines definition (not only nodes) https://stackoverflow.com/questions/72700321/access-configuration-in-the-pipelines-definition-not-only-nodes
  • d

    debbyChan57

    06/24/2022, 6:34 AM
    Hi all. I'm migrating an old Kedro project to the new version. In the old version I updated my conf dynamically in the run.py file when I loaded the ConfigLoader. I know it's not good practice, but for the moment I need to do it. Can you tell me what is the "least bad" way to do this in kedro >0.18? Hooks? Thank you very much for your help.
  • a

    antony.milne

    06/24/2022, 7:49 AM
    Hello! This depends on exactly what the custom code was doing. Generally speaking there are two routes here:
    * hooks, especially the new after_context_created one
    * write a custom ConfigLoader and reference it in CONFIG_LOADER_CLASS in settings.py
    This is quite a common question, so if you can say a bit more about what your custom code does we can probably point you to examples of others who have done something very similar!
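    For reference, a skeleton of the hook route (this assumes Kedro >= 0.18.1, where after_context_created was added); what goes inside the hook depends entirely on the custom code being migrated:
    # src/<your_package>/hooks.py
    from kedro.framework.hooks import hook_impl

    class ProjectHooks:
        @hook_impl
        def after_context_created(self, context):
            # `context` is the freshly created KedroContext; adjust project-level
            # state here before anything runs.
            ...

    # and register it in settings.py:
    # HOOKS = (ProjectHooks(),)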
  • d

    debbyChan57

    06/24/2022, 8:34 AM
    Thank you for your reply! My code retrieves a date, creates a yyyymm variable from that date, and inserts this new variable into the globals.
  • d

    debbyChan57

    06/24/2022, 8:35 AM
    The yyyymm is used in the catalog for my outputs
  • a

    antony.milne

    06/24/2022, 4:27 PM
    ok, the simplest way to do this would be to go into settings.py and uncommnent and modify the bit of code about
    CONFIG_LOADER_CLASS
    . It should look something like this:
    # Class that manages how configuration is loaded.
    from kedro.config import TemplatedConfigLoader
    CONFIG_LOADER_CLASS = TemplatedConfigLoader
    # Keyword arguments to pass to the `CONFIG_LOADER_CLASS` constructor.
    CONFIG_LOADER_ARGS = {
         "globals_dict": {"date": get_date_in_yyyymm_format()},
    }
    and then you can use
    date
    as a variable in your catalog.yml file.
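    The get_date_in_yyyymm_format helper in the snippet above is not defined anywhere in this thread; one possible implementation, using only the standard library, would be:
    from datetime import date

    def get_date_in_yyyymm_format() -> str:
        # e.g. "202206" for June 2022
        return date.today().strftime("%Y%m")
    The value can then be referenced as ${date} in catalog.yml entries, for example as part of an output filepath.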
  • u

    user

    06/25/2022, 7:36 AM
    Kedro: load existing data catalog programmatically https://stackoverflow.com/questions/72752043/kedro-load-existing-data-catalog-programmatically
  • d

    debbyChan57

    06/25/2022, 11:48 AM
    Thanks !
  • f

    Flow

    07/06/2022, 11:08 AM
    Hi everyone, Is anyone running kedro + airflow in a "production" setting (e.g. batch prediction, automated retraining etc) and would be interested in a knowledge exchange maybe 30-60 min on their experience and lessons learned? 🙂
  • k

    kradja

    07/07/2022, 1:46 PM
    Could you point me towards the solution? I have the same read() not implemented for BaseSessionStore as well as save() not implemented for BaseSessionStore.
  • n

    noklam

    07/07/2022, 1:50 PM
    That is not an error but just INFO logging.