beginners-need-help
  • fanzipei (12/19/2021, 3:36 AM)
    Hi everyone, I am new to Kedro. When I save a pandas.CSVDataSet with save_args of compression: gzip defined in the catalog, it doesn't seem to work (the output is not compressed at all and is saved as a plain text file). I tested df.to_csv('test.csv', compression='gzip') directly and it worked properly. Can anyone help? Thanks.
  • datajoely (12/19/2021, 10:03 AM)
    Can you post your yaml example?
  • fanzipei (12/19/2021, 1:15 PM)
    To reproduce my problem, please use the official iris example and add this to catalog.yml:
    example_iris_data_gz:
      type: pandas.CSVDataSet
      filepath: data/02_intermediate/iris.csv.gz
      load_args:
        header: null
        compression: gzip
      save_args:
        index: null
        compression: gzip
    and add a node which loads example_iris_data and exports example_iris_data_gz. Here is the new node I added:
    def compression(df):
        return df
    and I added it to the pipeline as:
    node(
        compression,
        'example_iris_data',
        'example_iris_data_gz',
        name='compression'
    )
    Then run
    kedro run --from-nodes='compression'
    There is a warning message:
    C:\Users\fanzi\anaconda3\envs\kedro\lib\site-packages\pandas\io\common.py:609: RuntimeWarning: compression has no effect when passing a non-binary object as input.
      ioargs = _get_filepath_or_buffer(
    and the resulting iris.csv.gz file is actually just a plain text file.
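A quick way to confirm the symptom described here: a real gzip file always starts with the magic bytes 0x1f 0x8b, so a short Python check (using the filepath from the catalog entry above) tells you whether the output was actually compressed.
    # Check the first two bytes of the saved file: gzip output always
    # starts with the magic number 0x1f 0x8b.
    with open("data/02_intermediate/iris.csv.gz", "rb") as f:
        magic = f.read(2)
    print("gzip-compressed" if magic == b"\x1f\x8b" else "plain text")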
  • datajoely (12/19/2021, 7:00 PM)
    Very interesting, that shouldn't be happening. I'll get to the bottom of it tomorrow.
  • antony.milne (12/20/2021, 10:29 AM)
    I can't remember the details, but this is due to some peculiarity with pandas and the way that it handles different sorts of arguments. Try using fs_args instead of save_args:
    fs_args:
      open_args_save:
        compression: gzip
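For reference, the same workaround can be expressed through the Python API rather than YAML; a minimal sketch, assuming Kedro 0.17.x, where fs_args are forwarded to fsspec's open() call:
    # Sketch: fs_args["open_args_save"] is passed to fsspec's open(),
    # which applies gzip compression at the file-handle level.
    from kedro.extras.datasets.pandas import CSVDataSet

    data_set = CSVDataSet(
        filepath="data/02_intermediate/iris.csv.gz",
        fs_args={
            "open_args_save": {"compression": "gzip"},
            "open_args_load": {"compression": "gzip"},
        },
    )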
  • antony.milne (12/20/2021, 10:29 AM)
    What you're doing now will work in Kedro 0.18 though 🙂
  • datajoely (12/20/2021, 10:46 AM)
    Just confirmed what @User wrote above; this works for me:
    example_iris_data_gzip:
      type: pandas.CSVDataSet
      filepath: data/01_raw/iris.csv.gzip
      fs_args:
        open_args_save:
          compression: gzip
        open_args_load:
          compression: gzip
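To sanity-check the entry above outside a full pipeline run, one option is to build it with DataCatalog.from_config and round-trip a DataFrame through it; a sketch assuming Kedro 0.17.x:
    import pandas as pd
    from kedro.io import DataCatalog

    # Same configuration as the YAML entry above, as a Python dict.
    catalog = DataCatalog.from_config({
        "example_iris_data_gzip": {
            "type": "pandas.CSVDataSet",
            "filepath": "data/01_raw/iris.csv.gzip",
            "fs_args": {
                "open_args_save": {"compression": "gzip"},
                "open_args_load": {"compression": "gzip"},
            },
        }
    })

    df = pd.DataFrame({"sepal_length": [5.1, 4.9]})
    catalog.save("example_iris_data_gzip", df)  # writes gzip-compressed CSV
    print(catalog.load("example_iris_data_gzip"))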
  • datajoely (12/20/2021, 10:46 AM)
    Not very user-friendly, and it will be fixed in the next version of Kedro.
  • fanzipei (12/20/2021, 11:46 AM)
    This works for me. Thanks!
  • RRoger (12/21/2021, 4:42 AM)
    I don't know what I did to my environment, but I'm now getting
    Class `pandas.SQLQueryDataSet` not found or one of its dependencies has not been installed.
    In requirements.in I have
    kedro[pandas,pickle,yaml,json]==0.17.5
    which should install all pandas DataSets, right? I do
    kedro build-reqs
    then
    pip install -r src\requirements.txt
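One way to narrow this down (assuming the missing dependency is SQLAlchemy, which pandas.SQLQueryDataSet relies on) is to test the imports in the active environment directly:
    # Quick check: if sqlalchemy is missing, the kedro[pandas] extras were
    # probably not installed into this environment.
    import importlib

    for pkg in ("pandas", "sqlalchemy"):
        try:
            importlib.import_module(pkg)
            print(f"{pkg}: OK")
        except ImportError as err:
            print(f"{pkg}: MISSING ({err})")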
  • datajoely (12/21/2021, 10:54 AM)
    Installing Kedro requirements
  • Daehyun Kim (12/21/2021, 10:09 PM)
    saving CachedDataSet in S3
  • Dhaval (12/22/2021, 10:03 AM)
    So, I'm using this file to run the same pipeline for train and test inputs. There's an error that pops up:
    Error: Failed to map datasets and/or parameters: train
    I don't know what to do on this front.
  • datajoely (12/22/2021, 11:28 AM)
    Modular pipeline issue
  • RRoger (12/23/2021, 3:42 AM)
    I followed the instructions in https://kedro.readthedocs.io/en/latest/09_development/02_set_up_pycharm.html#configuring-the-kedro-catalog-validation-schema. Does anyone know why PyCharm is showing a JSON schema error?
  • datajoely (12/23/2021, 12:01 PM)
    Can you give any more background on that error?
  • Daehyun Kim (12/23/2021, 8:34 PM)
    I'm following the https://kedro.readthedocs.io/en/stable/10_deployment/11_airflow_astronomer.html tutorial. I don't see the content of localhost:8080 after executing
    astro dev start
    https://kedro.readthedocs.io/en/stable/10_deployment/11_airflow_astronomer.html#step-4-launch-the-local-airflow-cluster-with-astronomer
    Here's the log:
    (kedro) kepricon@kepricon-G732LXS:~/git/kedro_test/kedro-airflow-iris$ astro dev logs
    Error checking feature flag no context set, have you authenticated to a cluster
    Error checking feature flag no context set, have you authenticated to a cluster
    scheduler_1 | Waiting for host: 0.0.0.0 5432
    webserver_1 | Waiting for host: 0.0.0.0 5432
  • Daehyun Kim (12/23/2021, 8:35 PM)
    I'm waiting for Astronomer's review of my account. Could that be causing this problem, or do you see any other issue?
  • RRoger (12/23/2021, 8:55 PM)
    I updated the original question to link to the "Configuring the Kedro catalog validation schema" instructions. Not much more to add, other than that there is no autocompletion/suggestions because PyCharm doesn't see https://github.com/quantumblacklabs/kedro/blob/main/static/jsonschema/kedro-catalog-0.17.json as a correct JSON schema.
  • datajoely (12/23/2021, 9:23 PM)
    Astro Airflow issue
  • Daehyun Kim (12/28/2021, 11:44 PM)
    Is there a quick and simple way to load a pickle file that was saved from the pipeline as a pickle.PickleDataSet, using the plain pickle module? For example, I have model_metrics.pickle, which is a PickleDataSet; how can I load it via pickle.load()?
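For what it's worth, PickleDataSet with its default backend writes the file using the standard pickle module, so plain pickle.load should read it back directly; a minimal sketch (the path is an assumption for illustration):
    import pickle

    # "data/06_models/model_metrics.pickle" is a hypothetical path; use the
    # filepath from your catalog entry.
    with open("data/06_models/model_metrics.pickle", "rb") as f:
        model_metrics = pickle.load(f)
    print(model_metrics)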
  • datajoely (12/29/2021, 11:20 AM)
    Loading pickles
  • j c h a r l e s (12/30/2021, 12:00 AM)
    Hi Kedro team, I have been doing development using the @develop branch of kedro (using Python 3.9) and I tried to update my kedro today and am getting the following error: "ERROR: Package 'kedro' requires a different Python: 3.9.0 not in '~=3.6'"
  • j c h a r l e s (12/30/2021, 12:00 AM)
    My install command is
    pip install -U git+https://github.com/quantumblacklabs/kedro.git@b10bb69775e519598f4344ed1d2be5cc05a22533
  • j c h a r l e s (12/30/2021, 12:04 AM)
    For context, I have been successfully developing with Python 3.9+ using this branch:
    git+https://github.com/quantumblacklabs/kedro.git@35e78cc5a5d7b64a034ce6561fc90ec579375569
  • j c h a r l e s (12/30/2021, 1:43 AM)
    Another question: is it possible to install kedro-viz using Python 3.9? Some "develop" branch? It seems pip install kedro-viz does not work for Python 3.9. When I try to pip install directly via pip install git+https://github.com/quantumblacklabs/kedro-viz.git@8da5a164637bfbd9c2f526b4f7a68f7a8a1114f2 I get this error:
    ERROR: git+https://github.com/quantumblacklabs/kedro-viz.git@8da5a164637bfbd9c2f526b4f7a68f7a8a1114f2 does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
  • j c h a r l e s (12/30/2021, 4:02 AM)
    Are there any "kedro office hours"? Or are there places to ask questions that would be more useful for the team & community?
  • j c h a r l e s (12/30/2021, 9:56 AM)
    I'm guessing that I am building a modular pipeline using an anti-pattern. I generate a base set of entities and entity attributes in conf/base/parameters/<pipeline-name>.yml. I use a custom hook that iterates through this yml file and uses catalog.add_all(new_entries, replace=True) to generate all the downstream datasets. Then in each pipeline, rather than mapping inputs and outputs, I again iterate through this same set of entities in conf/base/parameters/<pipeline-name>.yml and use functools.partial to create the node func, freezing the input values to the values found in this parameters file.
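A minimal sketch of the pattern described above, with illustrative names (the entity list, dataset paths, and functions are assumptions, not taken from the original message); it assumes Kedro 0.17.x hook and node APIs:
    from functools import partial

    from kedro.extras.datasets.pandas import CSVDataSet
    from kedro.framework.hooks import hook_impl
    from kedro.pipeline import node

    # Normally read from conf/base/parameters/<pipeline-name>.yml;
    # hard-coded here for the sketch.
    ENTITIES = ["customers", "orders"]


    class DynamicCatalogHooks:
        @hook_impl
        def after_catalog_created(self, catalog):
            # Register one downstream dataset per entity.
            new_entries = {
                f"{name}_features": CSVDataSet(
                    filepath=f"data/04_feature/{name}_features.csv"
                )
                for name in ENTITIES
            }
            catalog.add_all(new_entries, replace=True)


    def build_features(entity_name, df):
        # Placeholder transformation for the sketch.
        return df


    # functools.partial freezes the literal entity name, so the node
    # function only takes datasets as inputs. An explicit node name is
    # required because partial objects have no __name__.
    nodes = [
        node(
            partial(build_features, name),
            inputs=name,
            outputs=f"{name}_features",
            name=f"build_{name}_features",
        )
        for name in ENTITIES
    ]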
  • j c h a r l e s (12/30/2021, 10:07 AM)
    Creating pipelines with dynamic inputs has been slightly less intuitive than I expected; however, I have not used tools like kedro before. I am hoping that this set-up is a one-time cost that I will not have to incur again. I have also noticed that kedro's own validation of input types gives cryptic errors. I think if you pass a list of pipeline objects by accident, it will give an error like
    list object has no attribute filter
    when a potentially more useful error might be
    ValueError: expected a pipeline object and received list
    Another error I found was when I tried to pass literal values to a node function; the error was something like
    cannot split
    and could be improved to
    ValueError: inputs are not allowed to contain literal values (like integers). Please use functools.partial to create a node function with the desired literal argument specified.
    These errors are very hard to debug because they are thrown deep in the kedro library code. You could save a lot of hassle by having better validation errors.
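On the literal-values point: an idiomatic Kedro alternative to functools.partial is to put the literal in conf/base/parameters.yml and reference it as params:<key> in the node's inputs; a minimal sketch with illustrative names:
    from kedro.pipeline import node

    # parameters.yml (hypothetical):
    #   scale_factor: 10


    def scale(df, factor):
        return df * factor


    # "params:scale_factor" injects the literal from parameters.yml, so no
    # literal values need to appear in the node's inputs list.
    scaled = node(
        scale,
        inputs=["input_table", "params:scale_factor"],
        outputs="scaled_table",
        name="scale_by_factor",
    )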
  • datajoely (12/30/2021, 10:23 AM)
    j charles questions