Powered by Linen
plugins-integrations
  • Galileo-Galilei (05/02/2022, 12:28 PM)
    A related discussion which may be useful: https://github.com/Galileo-Galilei/kedro-mlflow/issues/44
  • Flow (05/02/2022, 5:23 PM)
    Are there any known issues with the "Apache Airflow with Astronomer" guide? I followed the guide with minor adjustments (new astro cli) but when running the DAG I get:
    {standard_task_runner.py:92} ERROR - Failed to execute job 9 for task split (maximum recursion depth exceeded while calling a Python object; 337)
  • Downforu (05/03/2022, 8:44 AM)
    Thank you @Galileo-Galilei for the link to the discussion. The solution of exposing a run ID in an environment variable for MLflow is interesting. However, I presume it does not work like this for Kedro's run_params["run_id"] variable. Or is it OK if I force that variable with the run_id generated for the first node in the pipeline?
  • noklam (05/03/2022, 8:59 AM)
    Is the problem that each node now runs in a different session, and you want to have one ID for the entire Airflow DAG?
  • Downforu (05/03/2022, 9:20 AM)
    Yes, that's exactly the problem and what I want to achieve.
  • Flow (05/15/2022, 7:25 PM)
    In the course of using Airflow and Kedro I had a couple of ideas that I tried to outline in these two issues. Would be happy for some feedback from the community: https://github.com/kedro-org/kedro-plugins/issues/26 https://github.com/kedro-org/kedro-plugins/issues/27
  • noklam (05/16/2022, 8:21 AM)
    Thank you for your contribution! I will take a closer look later, but for now I would say #26 should be good; #27 is a change that may require more time to study. Airflow itself evolves quite a bit, and I know they have also released newer versions which get rid of these operator classes.
  • Downforu (05/17/2022, 8:57 AM)
    kedro-airflow unique session_id
  • noklam (06/15/2022, 10:28 AM)
    Hi Kedroids! There is an open issue on kedro-airflow that suggests adding a new Jinja template, and we would like to get some feedback from you! https://github.com/kedro-org/kedro-plugins/issues/27 There are many different operators/new APIs you can use with Airflow, and we would love to learn from your experience.
    * What's the current best practice to run Python with Airflow?
    * Do you run into similar problems with the default?
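    For context on what the template does: kedro-airflow turns a Kedro pipeline into a DAG file by rendering a template with one task per node, and the issue proposes making that template swappable. Here is a toy stand-in using the stdlib's `string.Template` (the real plugin uses Jinja2; `KedroOperator` and the template context here are simplified assumptions, not the plugin's actual template):

    ```python
    from string import Template

    # Toy stand-in for kedro-airflow's DAG template: render one Airflow
    # task per Kedro node. Illustrative only.
    DAG_TEMPLATE = Template('with DAG(dag_id="$dag_id") as dag:\n$tasks')
    TASK_TEMPLATE = Template(
        '    $name = KedroOperator(task_id="$name", node_name="$name")\n'
    )

    def render_dag(dag_id: str, node_names: list) -> str:
        """Render a DAG file body with one task per Kedro node."""
        tasks = "".join(TASK_TEMPLATE.substitute(name=n) for n in node_names)
        return DAG_TEMPLATE.substitute(dag_id=dag_id, tasks=tasks)
    ```

    Making the template pluggable would mean letting users supply their own `TASK_TEMPLATE`-equivalent, e.g. one that emits a `KubernetesPodOperator` per node instead.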
  • deepyaman (06/16/2022, 4:25 AM)
    No issues--just wanted to say that the little bit of kedro-mlflow that I've looked into tonight (mostly MlflowModelSaverDataSet) is really well-designed, and I appreciate it! I was going to go down the route of wrapping `save_model`/`load_model` and supporting flavors and everything, and I'm so glad a search while looking into it turned up this specific dataset implementation, since it's much better than I would have done myself. @Galileo-Galilei 🙂
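    For anyone curious, pointing the catalog at that dataset looks roughly like this; the entry name and path are illustrative, and the exact option names should be checked against the kedro-mlflow docs for your version:

    ```yaml
    # conf/base/catalog.yml (illustrative sketch)
    my_sklearn_model:
      type: kedro_mlflow.io.models.MlflowModelSaverDataSet
      flavor: mlflow.sklearn
      filepath: data/06_models/my_sklearn_model
    ```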
  • Galileo-Galilei (06/16/2022, 8:10 PM)
    Glad you like it! If you are interested in a custom model that wraps an entire Kedro pipeline (i.e. including pre- and post-processing, and artifacts other than the model, like an encoder or a tokenizer), have a look at the "kedro mlflow modelify" command and the "pipeline_ml_factory" function. I'd love to get feedback on this and find out whether people find it useful.
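    The idea behind wrapping a whole pipeline as one model can be shown with a stdlib-only sketch (this is the concept, not kedro-mlflow's actual API): the served artifact carries pre- and post-processing along with the estimator, so inference needs a single object.

    ```python
    def modelify(preprocess, predict, postprocess):
        """Bundle pre-processing, a model, and post-processing into one
        inference callable, so the whole thing can be saved and served
        as a single artifact."""
        def pipeline(raw_input):
            features = preprocess(raw_input)
            prediction = predict(features)
            return postprocess(prediction)
        return pipeline
    ```

    In the tokenizer example above, the tokenizer would live in `preprocess` and something like a label decoder in `postprocess`.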
  • datajoely (06/20/2022, 9:07 AM)
    ^ for anyone nosy https://github.com/Galileo-Galilei/kedro-mlflow/blob/master/kedro_mlflow/io/models/mlflow_model_saver_dataset.py
  • Anarpego (09/07/2022, 7:15 PM)
    Hello guys, I have this problem output when I run the command kedro kubeflow compile. Any ideas, please?
  • Anarpego (09/07/2022, 7:15 PM)
    message has been deleted
  • marrrcin (09/08/2022, 11:05 AM)
    The Kubeflow instance you're trying to reach is probably secured by some authorization mechanism (from the URL I believe it's a Google one). You need to authorize first.
  • Anarpego (09/08/2022, 2:18 PM)
    Thank you very much, I'll try
  • devintaylor03 (09/12/2022, 9:41 AM)
    Hey all! I was hoping someone who has used Kedro with Databricks and dbx could help out. When running a dbx execute job, we cannot get the Kedro logs to stream to the console. This seems to be an issue with dbx rather than Kedro, but I'm hoping someone on here has found a way around it! Link to the issue for more context: https://github.com/databrickslabs/dbx/issues/463. Thanks!
  • Goss (09/12/2022, 7:01 PM)
    I have an issue where a kedro docker build looks like it is running forever:
    Step 7/12 : RUN groupadd -f -g ${KEDRO_GID} kedro_group && useradd -d /home/kedro -s /bin/bash -g ${KEDRO_GID} -u ${KEDRO_UID} kedro
     ---> Running in eacc1a27e787
    But when I use docker ps -a, I notice that the container that is supposedly hanging during that build step actually shows that it already completed:
    eacc1a27e787   c18ff7793d12                          "/bin/sh -c 'groupad…"   2 minutes ago    Exited (0) 2 minutes ago
    Any ideas what is going on here?
  • Goss (09/12/2022, 7:24 PM)
    Actually, I figured out what is happening. That step is not hung. The build has moved on to some tar file operation that runs until it fills up the disk and then errors out:
    Step 7/12 : RUN groupadd -f -g ${KEDRO_GID} kedro_group && useradd -d /home/kedro -s /bin/bash -g ${KEDRO_GID} -u ${KEDRO_UID} kedro
     ---> Running in eacc1a27e787
    Error processing tar file(exit status 1): write /var/log/lastlog: no space left on device
    It's filling up over 300 GB of disk, so something is not working right. Filing an issue on GitHub.
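    A likely explanation, for anyone who hits the same wall: when KEDRO_UID is a very large number, useradd initialises /var/log/lastlog as a sparse file indexed by UID, and Docker's layer handling expands it to its full apparent size. If that is the cause here, adding --no-log-init (-l) to the useradd call, as the Docker docs recommend for large UIDs, avoids writing lastlog at all. A hypothetical variant of the generated Dockerfile step:

    ```dockerfile
    # -l / --no-log-init stops useradd from touching /var/log/lastlog
    # and /var/log/faillog, which otherwise balloon for very large UIDs.
    RUN groupadd -f -g ${KEDRO_GID} kedro_group && \
        useradd -l -d /home/kedro -s /bin/bash -g ${KEDRO_GID} -u ${KEDRO_UID} kedro
    ```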
  • Yetunde (09/13/2022, 2:30 PM)
    Hey all I was hoping someone who had
  • Goss (09/14/2022, 3:28 PM)
    kedro-kubeflow question... is there any way to use a K8s PVC for the storage?
  • em-pe (09/15/2022, 7:39 AM)
    Hi @Goss, check the volume and extra_volumes sections in the config: https://kedro-kubeflow.readthedocs.io/en/0.7.0/source/02_installation/02_configuration.html
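    For a PVC specifically, the relevant part of the kedro-kubeflow config would look something like the sketch below; the key names follow the 0.7.0 configuration page linked above, but the values are illustrative and worth double-checking there:

    ```yaml
    # kubeflow.yaml (illustrative sketch)
    run_config:
      volume:
        storageclass: standard        # omit to use the cluster default
        size: 1Gi
        access_modes: [ReadWriteOnce]
        keep: False                   # drop the PVC when the run finishes
    ```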
  • Rjify (09/15/2022, 9:54 PM)
    Hello everyone, I am looking to set up a pipeline which involves a Kedro project, Docker, and Databricks. I am curious to understand all the different ways we can use these three tools. I know one approach where we create a Docker image of a Kedro project and then run that image on a Databricks cluster by authenticating to a container registry. It would be great if others could also chime in on which approaches have worked and which did not. Thanks.
  • Goss (09/19/2022, 4:37 PM)
    Thanks! Looks like extra_volumes could work...
  • Goss (09/19/2022, 5:10 PM)
    kedro-kubeflow question... how do I use a k8s secret for MinIO? The data catalog docs show using a credentials.yml file, but the kedro-kubeflow .dockerignore prevents this file from going into the image (because it's insecure). The kedro-kubeflow docs show an example using kfp's aws module, but it's not clear if this is also supposed to work for MinIO. There already exists a MinIO instance in my namespace (Kubeflow provides this by default) and I'd like to use it just for simplicity. The secret is named mlpipeline-minio-artifact, with accesskey and secretkey. So how can I map the credentials stored in this secret through so that the data catalog can use them?
  • Goss (09/20/2022, 5:54 PM)
    Do I take the lack of response to mean that it is not possible to use MinIO with Kedro inside Kubeflow?
  • noklam (09/20/2022, 5:58 PM)
    @Goss I am not familiar with Kubeflow. Do you have to use an image, or is it possible to map a certain volume, like with Docker?
  • noklam (09/20/2022, 6:01 PM)
    I think this is a generic problem: even if you are not using Kedro, how would you pass these credentials into a Docker image and run it with Kubeflow? One way to go is using environment variables; AFAIK it's very similar to S3, and you can refer to their docs.
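    A stdlib sketch of that approach: mount the Kubernetes secret's keys as environment variables in the pod, then assemble an s3fs-style credentials mapping from them in code. The variable names and the helper are illustrative, not a Kedro or Kubeflow API:

    ```python
    import os

    def minio_credentials() -> dict:
        """Build S3-style credentials for a MinIO endpoint from environment
        variables injected into the container (e.g. from a k8s secret)."""
        return {
            "key": os.environ["MINIO_ACCESS_KEY"],
            "secret": os.environ["MINIO_SECRET_KEY"],
            "client_kwargs": {
                "endpoint_url": os.environ.get(
                    "MINIO_ENDPOINT", "http://minio-service.kubeflow:9000"
                ),
            },
        }
    ```

    A mapping of this shape could then be fed to the catalog's S3-backed datasets in place of a credentials.yml entry, keeping the secret values out of the image.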
  • Goss (09/20/2022, 6:32 PM)
    The issue is that with generic Kubeflow I could control how the secret stored in Kubernetes maps through to my S3 client--not so much with Kedro, where it is mediated by Kedro componentry. That said, it does look like the kfp.aws functionality can be used to access a MinIO S3 endpoint, and that kfp.aws.use_aws_secret maps the components of the existing MinIO secret through into the typical AWS credential environment variables... so, I think this can actually work.
  • noklam (09/20/2022, 7:01 PM)
    @Goss that makes sense; in that case I think environment variables will work.