Powered by Linen
beginners-need-help
  • d

    datajoely

    01/19/2022, 10:10 PM
    oh it looks like it fails at the point the catalog is being created - are you using any custom datasets?
  • d

    datajoely

    01/19/2022, 10:12 PM
    (it is getting quite late here in London, so I'll probably pick up your response in the morning)
  • r

    Rroger

    01/19/2022, 10:38 PM
    ParallelRunner error
  • m

    martinlarsalbert

    01/20/2022, 9:34 AM
    I see that `anyconfig.load` also has the parameter `ac_context=globals`, which would expose the globals to Jinja2, but it also seems that globals.yml and catalog.yml are loaded together in arbitrary order, so the globals are not known at the time of the Jinja2 rendering. I suspect (as mentioned) that `TemplatedConfigLoader` needs a major overhaul to change this.
  • m

    martinlarsalbert

    01/20/2022, 9:47 AM
    This was my hack to expose the globals to the jinja2:
    ```
    from pathlib import Path
    from typing import Any, Dict

    from kedro.config import TemplatedConfigLoader


    class TemplatedConfigLoaderGlobals(TemplatedConfigLoader):
        def _load_config_file(self, config_file: Path) -> Dict[str, Any]:
            """Load an individual config file using `anyconfig` as a backend.

            Args:
                config_file: Path to a config file to process.

            Returns:
                Parsed configuration.
            """
            # for performance reasons
            import anyconfig  # pylint: disable=import-outside-toplevel

            # Named globals_dict to avoid shadowing the built-in globals().
            if "globals" in str(config_file):
                # Globals files get no context, to avoid infinite recursion.
                globals_dict = {}
            else:
                globals_pattern = "*globals.yml"
                globals_dict = self.get(globals_pattern) if globals_pattern else {}

            return {
                k: v
                for k, v in anyconfig.load(
                    config_file, ac_template=True, ac_context=globals_dict
                ).items()
                if not k.startswith("_")
            }
    ```
  • d

    datajoely

    01/20/2022, 10:35 AM
    @User very nice is it working?
  • m

    martinlarsalbert

    01/20/2022, 12:17 PM
    The globals are not exposed to config files that define globals, as that gave an infinite recursion, which is why I added `if "globals" in str(config_file):`. This does not follow the globals_pattern, however, so it is a bit of a hack, but it works for my project at least.
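The two-pass idea behind this hack (read the globals first, then render everything else with them) can be sketched without Kedro at all. This is only an illustration under stated assumptions: `string.Template` stands in for Jinja2, `parse_flat` is a hypothetical stand-in for a real YAML parser, the files are in-memory strings, and `fnmatch` is used so the check follows the globals pattern instead of the substring test.

```python
from fnmatch import fnmatch
from string import Template
from typing import Dict

GLOBALS_PATTERN = "*globals.yml"  # assumed pattern, same as in the hack


def parse_flat(text: str) -> Dict[str, str]:
    """Parse flat `key: value` lines (a stand-in for a real YAML parser)."""
    pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def load_with_globals(files: Dict[str, str]) -> Dict[str, str]:
    """Render every non-globals file using values from the globals files."""
    # Pass 1: read globals files as-is, with no context, so a globals file
    # never depends on itself (this is what avoids the recursion).
    context: Dict[str, str] = {}
    for name, text in files.items():
        if fnmatch(name, GLOBALS_PATTERN):
            context.update(parse_flat(text))

    # Pass 2: render the remaining files with the collected globals.
    return {
        name: Template(text).substitute(context)
        for name, text in files.items()
        if not fnmatch(name, GLOBALS_PATTERN)
    }


# The catalog is listed first on purpose: file order no longer matters.
files = {
    "catalog.yml": "cars:\n  filepath: ${base_path}/cars.csv",
    "globals.yml": "base_path: data/01_raw",
}
rendered = load_with_globals(files)
print(rendered["catalog.yml"])
```

Because the globals are collected before any rendering starts, the arbitrary load order mentioned earlier stops being a problem in this sketch.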
  • d

    datajoely

    01/20/2022, 12:17 PM
    Really nice work - I'm making a note on how we support this natively
  • d

    datajoely

    01/20/2022, 12:18 PM
    If I have my way we'd use Jsonnet instead of Jinja https://jsonnet.org/
  • d

    datajoely

    01/20/2022, 12:18 PM
    Something about using Jinja with a whitespaced language like YAML gives me the ick
  • u

    user

    01/23/2022, 8:43 AM
    Hi everyone. I am new to this community, so I don't know about this feature yet. Does Kedro support Apache Hive?
  • j

    jaweiss2305

    01/23/2022, 11:06 AM
    Here are the out-of-the-box datasets: https://kedro.readthedocs.io/en/latest/kedro.extras.datasets.html Including: kedro.extras.datasets.spark.SparkHiveDataSet
  • d

    datajoely

    01/23/2022, 12:57 PM
    Thanks @User - yes there is Hive support via the `spark.SparkHiveDataSet`
  • u

    user

    01/23/2022, 3:16 PM
    Thanks @User and @User 🙂 Nice to meet you
  • l

    lucas.scholl

    01/23/2022, 9:26 PM
    Hello team!! Is there any way we could integrate it with the dbt framework?
  • d

    datajoely

    01/23/2022, 9:44 PM
    Hi, I've been putting some thought into it - today it's simply a matter of declaring Kedro datasets for the dbt models you want to read from. A question for you - how would you want it to work?
  • c

    ChainYo

    01/25/2022, 3:10 PM
    Is there a way to prioritise modular pipelines? Fetching data then preprocessing data runs in the right order, but the project tries to convert the model before the training pipeline has run. Maybe it's linked to the fact that the training pipeline and the converting pipeline have `None` as outputs?
  • d

    datajoely

    01/25/2022, 3:12 PM
    So you can either make them explicit dependencies (easiest), get fancy with your CLI commands (`kedro run --pipeline preprocessing && kedro run --pipeline model_train` will run them sequentially), or orchestrate these explicitly in a higher-level tool like Airflow/Prefect etc.
  • c

    ChainYo

    01/25/2022, 3:13 PM
    `explicit dependencies` are done with kedro directly in `pipeline_registry`?
  • c

    ChainYo

    01/25/2022, 3:16 PM
    Because I read this
    The order in which you add the pipelines together is not significant and data_science_pipeline + data_processing_pipeline will result in the same pipeline, since Kedro automatically detects the correct execution order for all the nodes in the resulting pipeline.
  • c

    ChainYo

    01/25/2022, 3:16 PM
    in the spaceflights tutorial, but I don't know how it's done actually 🙂
  • d

    datajoely

    01/25/2022, 3:18 PM
    Ah what that means is that Kedro works out the execution order via the node inputs and outputs.
  • d

    datajoely

    01/25/2022, 3:18 PM
    By explicit dependencies I mean you make sure you use a dataset outputted from pipeline 1 in pipeline 2
  • d

    datajoely

    01/25/2022, 3:18 PM
    that's the way Kedro knows how to make it happen in the right order
  • d

    datajoely

    01/25/2022, 3:18 PM
    it's called topological sorting
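To make the idea concrete, here is a minimal sketch of deriving an execution order from node inputs and outputs with the standard library's `graphlib`. It is not Kedro's implementation; the node and dataset names are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical nodes: name -> (input datasets, output datasets).
# Deliberately listed in the "wrong" order.
nodes = {
    "train_model": ({"model_input"}, {"model"}),
    "preprocess": ({"raw_data"}, {"model_input"}),
    "fetch_data": (set(), {"raw_data"}),
}

# Map each dataset to the node that produces it.
producer = {
    dataset: name
    for name, (_, outputs) in nodes.items()
    for dataset in outputs
}

# A node depends on whichever nodes produce its inputs.
graph = {
    name: {producer[dataset] for dataset in inputs if dataset in producer}
    for name, (inputs, _) in nodes.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['fetch_data', 'preprocess', 'train_model']
```

Two nodes that share no dataset have no edge between them in this graph, so nothing constrains their relative order. That is the situation with the training and converting pipelines described in this thread.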
  • c

    ChainYo

    01/25/2022, 3:20 PM
    OK, but in my case the 1st pipeline has no output because PyTorch Lightning handles saving the model checkpoint on its own, and the 2nd pipeline also has no output because ONNX handles the model conversion on its own. They both use a variable written in `parameters`. Maybe they could output something useful.
  • c

    ChainYo

    01/25/2022, 3:21 PM
    I mean, if I add a parameter as an output (something like `training_done=True`) that is required in the second pipeline, could that work?
  • d

    datajoely

    01/25/2022, 3:26 PM
    It absolutely would, it doesn't technically have to be a meaningful connection!
  • c

    ChainYo

    01/25/2022, 3:49 PM
    it worked fine