beginners-need-help
  • i

    Isaac89

    02/10/2022, 3:47 PM
    ok I will look into it. thanks for confirming that is working in your case.
  • i

    Isaac89

    02/10/2022, 3:48 PM
    yes
  • i

    Isaac89

    02/10/2022, 3:55 PM
    @User If I run it with the sequential runner it works, but with the --parallel flag I get this problem. Is the parallel runner loading the project multiple times?
  • c

    czix

    02/11/2022, 1:16 PM
    On the develop branch, the `get_current_session()` method is removed. How do I get the current session when running?
  • d

    datajoely

    02/11/2022, 1:30 PM
    What do you want from the session? The answer to this question will likely be lifecycle hooks.
  • i

    Isaac89

    02/11/2022, 10:41 PM
    Hi! I'm running the same pipeline on different inputs and scheduling these runs with slurm. Sometimes I'm getting the same runid for different runs, maybe because they are started at the same time. Any suggestion on how to deal with this? Can one change the run id at runtime?
  • r

    RRoger

    02/12/2022, 4:56 AM
    In the data validation hooks example (https://kedro.readthedocs.io/en/stable/07_extend_kedro/02_hooks.html#add-data-validation), the `DATASET_EXPECTATION_MAPPING` is defined in the class itself:
    class DataValidationHooks:
    
        # Map expectation to dataset
        DATASET_EXPECTATION_MAPPING = {
            "companies": "raw_companies_dataset_expectation",
            "preprocessed_companies": "preprocessed_companies_dataset_expectation",
        }
        ...
    Is it possible to define this in the parameters yml? `before_node_run` and `after_node_run` don't seem to pass in the `context`.
  • d

    datajoely

    02/13/2022, 5:00 PM
    @User you should be able to retrieve the parameters from the `catalog` object
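
A minimal sketch of that suggestion, assuming the mapping is moved into the parameters file under a hypothetical key `dataset_expectation_mapping` (not a name from the docs example). Kedro registers parameters in the catalog, so a hook that receives `catalog` can load them with the `params:` prefix. Only the lookup is shown here; the actual validation step from the docs example is left out.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so this sketch can be run without Kedro installed.
    def hook_impl(func):
        return func


class DataValidationHooks:
    # Hypothetical parameter key; it would live in conf/base/parameters.yml.
    MAPPING_PARAM = "params:dataset_expectation_mapping"

    def __init__(self):
        # dataset name -> expectation name, recorded for inspection
        self.planned = {}

    @hook_impl
    def before_node_run(self, catalog, inputs):
        # Parameters are registered in the catalog, so load the mapping from there
        # instead of hard-coding it on the class.
        mapping = catalog.load(self.MAPPING_PARAM)
        for dataset_name in inputs:
            if dataset_name in mapping:
                self.planned[dataset_name] = mapping[dataset_name]
```

The hook only requests the `catalog` and `inputs` arguments; hook implementations may accept a subset of the arguments in the hook specification.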
  • m

    mjmare

    02/15/2022, 11:15 AM
    How does one go about it when one has a bunch of datasets that need the same treatment? Currently I use a template in catalog.yml to create the input and output Datasets, like so:
    {% for table in openac_tables %}
    {{ table }}:
      layer: primary
      type: pandas.ParquetDataSet
      filepath: data/03_primary/{{table}}.parquet
      save_args:
        from_pandas:
          preserve_index: False
    {% endfor %}
    
    {% for table in openac_tables %}
    profile_{{ table }}:
      layer: qa
      type: ac_pipelines.datasets.ProfilingDataSet
      filepath: data/08_reporting/profiles/{{table}}.html
    {% endfor %}
    and then generate nodes in the pipeline:
    def create_pipeline(**kwargs):
        from kedro.config import ConfigLoader
    
        conf_paths = ["conf/base", "conf/local"]
        conf_loader = ConfigLoader(conf_paths)
        table_names = conf_loader.get('*globals.yml')['openac_tables']
    
        return Pipeline([
            node(func=lambda x: x,
                 inputs=tn,
                 outputs=f'profile_{tn}',
                 name=f'profile_{tn}',
                 )
            for tn in table_names
        ])
    It works. But it feels hacky. It could be improved if I could get the default config_loader from somewhere. I had some success with:
    from kedro.framework.session import get_current_session

    session = get_current_session()
    context = session.load_context()
    table_names = context.config_loader.get('*globals.yml')['openac_tables']
    but that confuses Kedro viz (Error: There is no active Kedro session.) A more substantial improvement would be if the Pipeline/Node could be dynamically parametrized (at runtime). Don't know if that is the right term. I want to feed a variable number of Datasets to a pipeline (or node). I'm probably doing something wrong, so suggestions are welcome.
  • i

    Isaac89

    02/15/2022, 12:28 PM
    Hi @User ! I also tried to do something like you did and solved the problem with kedro viz in this way
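    from kedro.framework.session import KedroSession, get_current_session

    try:
        session = get_current_session()
    except RuntimeError:
        # No active session (e.g. when Kedro viz loads the pipeline), so create one
        session = KedroSession.create(package_name=package_name, project_path=package_path)
    but I don't know whether this is the best way to achieve it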
  • d

    datajoely

    02/15/2022, 12:29 PM
    Dynamic pipelines
  • u

    user

    02/16/2022, 1:50 PM
    Hi all, I'm trying to install kedro using `pip install kedro` but I get the following error:
    ERROR: Could not find a version that satisfies the requirement kedro (from versions: none)
    ERROR: No matching distribution found for kedro
    Does anyone know what's causing this?
  • d

    datajoely

    02/16/2022, 1:52 PM
    Are you in a corporate environment which may have firewalls etc?
  • u

    user

    02/16/2022, 1:53 PM
    ahh, that's probably the issue
  • u

    user

    02/16/2022, 1:53 PM
    Never had that problem when installing something
  • u

    user

    02/16/2022, 1:54 PM
    The team wants to adopt kedro as a standard. Are there any known workarounds?
  • d

    datajoely

    02/16/2022, 2:00 PM
    So two approaches: (1) Lobby your IT team to add Kedro to their JFrog-like mirror (if needed we can provide some info about how Kedro goes through Snyk/SonarQube scans). (2) Download the tar file from PyPI (where pip gets it from) at https://pypi.org/project/kedro/#files and just run `pip install kedro-0.17.6.tar.gz` locally.
  • d

    datajoely

    02/16/2022, 2:00 PM
    2 isn't the most scalable but may be the path of least resistance
  • u

    user

    02/16/2022, 2:01 PM
    Awesome, thank you. I'll start with #2 and if things go well and we want to scale it to the whole team I'll do #1. Cheers!
  • d

    datajoely

    02/16/2022, 2:03 PM
    💪 nice one - shout if you need any materials to make enterprise Infosec teams happy
  • d

    datajoely

    02/16/2022, 2:04 PM
    https://snyk.io/advisor/python/kedro
  • u

    user

    02/16/2022, 2:04 PM
    It might be a different issue. I tried installing the tar file and I got a version error. I'm on python 3.9.10
  • d

    datajoely

    02/16/2022, 2:04 PM
    ah sorry
  • d

    datajoely

    02/16/2022, 2:04 PM
    you need to be 3.8 or lower
  • d

    datajoely

    02/16/2022, 2:04 PM
    for like 1 month more
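
A quick way to check an interpreter before installing, as a sketch only: the upper bound of 3.8 is taken from the messages above for kedro 0.17.6 and will not hold for later releases. The helper name `python_is_supported` is made up for illustration.

```python
import sys

# Assumption from the thread: kedro 0.17.6 needs Python 3.8 or lower.
MAX_SUPPORTED = (3, 8)


def python_is_supported(version_info=sys.version_info, max_supported=MAX_SUPPORTED):
    """Return True if the (major, minor) version is at or below max_supported."""
    return tuple(version_info[:2]) <= max_supported
```

Calling `python_is_supported()` with no arguments checks the running interpreter; passing a tuple such as `(3, 9, 10)` checks a specific version.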
  • u

    user

    02/16/2022, 2:05 PM
    Okayy gotcha. I might revisit this in another month then! It'll be more of a pain to downgrade
  • d

    datajoely

    02/16/2022, 2:06 PM
    You can follow these instructions https://github.com/kedro-org/kedro/discussions/1117#discussioncomment-1822667
  • d

    datajoely

    02/16/2022, 2:06 PM
    the only thing that will break will be `pandas.ExcelDataSet` requiring `openpyxl` not `xlrd`
  • u

    user

    02/16/2022, 2:07 PM
    Ha, the funny thing is I use that for another project but I don't think I'll need it for this one. I'll follow those instructions
  • u

    user

    02/16/2022, 2:09 PM
    That worked! Thank you so much. Time to test it out!