beginners-need-help
  • y

    youmaaz

    07/06/2022, 7:48 AM
    Thank you! Forgot to mention it's version 0.15.9 😅 but I saw this was possible in the latest version
  • d

    datajoely

    07/06/2022, 7:50 AM
    Here is the 0.15.9 version of the docs https://kedro.readthedocs.io/en/0.15.9/04_user_guide/04_data_catalog.html#transcoding-datasets
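    For reference, transcoding registers the same underlying file under two name@format entries; a rough sketch in current syntax (the dataset type paths differ in 0.15.x, so follow the linked docs for the exact class names):
    # conf/base/catalog.yml -- illustrative only
    my_dataframe@spark:
      type: spark.SparkDataSet
      filepath: data/02_intermediate/my_dataframe.parquet
      file_format: parquet

    my_dataframe@pandas:
      type: pandas.ParquetDataSet
      filepath: data/02_intermediate/my_dataframe.parquet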
  • a

    adrian

    07/06/2022, 8:33 PM
    Hello, I would like to know how I may access the catalog of my project from within a Python script. Providing more context: In a Jupyter notebook, if I use the appropriate kedro magic, I can type catalog.load('foo') to load the dataset named 'foo' which is registered in my project's catalog. I would like to do this in a standalone Python script (a .py file).
  • n

    noklam

    07/06/2022, 8:56 PM
    If you need to do this inside a kedro pipeline, check out the hooks for after_catalog_created or after_context_created
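    A minimal sketch of the first option, assuming kedro 0.18.x (the class name is made up; register the instance via HOOKS in settings.py):
    # hooks.py -- illustrative only
    from kedro.framework.hooks import hook_impl

    class CatalogAccessHooks:
        @hook_impl
        def after_catalog_created(self, catalog):
            # kedro calls this right after the DataCatalog is built;
            # stash a reference so other hook methods can load datasets
            self.catalog = catalog

    # settings.py
    # HOOKS = (CatalogAccessHooks(),)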
  • a

    adrian

    07/06/2022, 8:57 PM
    Thanks. I was hoping to do it outside of a kedro pipeline
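    For running outside a pipeline, a minimal sketch assuming kedro 0.18.x, a script executed from the project root, and a catalog entry named 'foo':
    # load_foo.py -- illustrative only
    from pathlib import Path

    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    project_path = Path.cwd()            # the kedro project root
    bootstrap_project(project_path)      # configures the project (reads pyproject.toml)

    with KedroSession.create(project_path=project_path) as session:
        context = session.load_context()
        foo = context.catalog.load("foo")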
  • k

    kradja

    07/11/2022, 1:17 AM
    s
  • w

    wise009

    07/11/2022, 11:15 AM
    Error when saving using pandas.CSVDataset
  • a

    adrian

    07/11/2022, 4:06 PM
    Greetings, I have the line app: abcdefgh in the conf/local/credentials.yml file of my kedro project. How do I pass the string 'abcdefgh' as an argument to a node from a modular pipeline? I tried setting the 'inputs' kwarg to ['credentials:app'], but kedro tries to find this in the data catalog.
  • g

    Goss

    07/11/2022, 4:47 PM
    I'm following along the experiment tracking example, but after running the pipeline, kedro viz says "You don't have any experiments" and there is no session_store.db. There are, however, metrics-related files under 09_tracking. Did I possibly misconfigure something? Also,
    $ cat src/settings.py
    from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore
    from pathlib import Path
    
    SESSION_STORE_CLASS = SQLiteStore
    SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}
  • n

    noklam

    07/11/2022, 4:51 PM
    Credentials should be declared as an attribute of the dataset in catalog.yml. Is there a specific reason why you want to pass credentials directly as an argument?
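    The documented pattern looks roughly like this (dataset name, bucket and keys are placeholders); kedro resolves the credentials entry and passes it to the dataset for you:
    # conf/base/catalog.yml
    motorbikes:
      type: pandas.CSVDataSet
      filepath: s3://your-bucket/data/02_intermediate/motorbikes.csv
      credentials: dev_s3

    # conf/local/credentials.yml
    dev_s3:
      client_kwargs:
        aws_access_key_id: YOUR_KEY
        aws_secret_access_key: YOUR_SECRET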
  • n

    noklam

    07/11/2022, 4:51 PM
    In general, your node shouldn't be aware of the I/O
  • n

    noklam

    07/11/2022, 4:53 PM
    Did you find your session_store.db? It should be a file stored locally
  • a

    adrian

    07/11/2022, 4:55 PM
    My node makes an API call, and the API is its own processing step. The credential I wish to access within the node is the API key. Thus, my credential is not about accessing data per se. Do you still recommend I add the credential to one of the input or output datasets?
  • g

    Goss

    07/11/2022, 5:05 PM
    No db files are present. I looked specifically for ./data/session_store.db and did a general find for any *.db files. Nothing was found. In case version info helps:
    kedro                         0.18.1
    kedro-telemetry               0.2.1
    kedro-viz                     4.7.1
  • n

    noklam

    07/11/2022, 5:12 PM
    Does the APIDataSet fit your use case? Or you could have a custom API dataset that wraps around your logic
  • a

    adrian

    07/11/2022, 5:21 PM
    Thanks! I didn't know about this dataset type. It doesn't fit my use case directly because I actually make a series of API calls within my node, and running the node alone takes a lot of time. I'll think of a way to wrap around my logic as you suggested.
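    A rough sketch of that wrapping idea, assuming kedro 0.18.x and the requests library (the class name, endpoint, paging and number of calls are made up for illustration):
    # src/<your_package>/extras/datasets/multi_call_api_dataset.py -- illustrative only
    import requests
    from kedro.io import AbstractDataSet

    class MultiCallAPIDataSet(AbstractDataSet):
        def __init__(self, base_url: str, credentials=None):
            # `credentials: app` in catalog.yml makes kedro pass the value
            # resolved from credentials.yml (e.g. "abcdefgh") in here
            self._base_url = base_url
            self._api_key = credentials

        def _load(self):
            headers = {"Authorization": f"Bearer {self._api_key}"}
            # the series of API calls lives inside the dataset rather than the node
            return [
                requests.get(f"{self._base_url}?page={page}", headers=headers).json()
                for page in range(3)
            ]

        def _save(self, data):
            raise NotImplementedError("This dataset is read-only")

        def _describe(self):
            return {"base_url": self._base_url}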
  • k

    kradja

    07/11/2022, 8:20 PM
    I have two pipelines named data_processing and edge_processing. I have one output that I want to use in edge_processing, but the pipeline automatically looks for the output as edge_processing.file, even though I specified the file as an output of data_processing. Is there any way around this?
  • n

    noklam

    07/11/2022, 9:14 PM
    What do you mean by specified the file as an output of data_processing? What's the command that you used?
  • k

    kradja

    07/12/2022, 2:32 AM
    When I run kedro with kedro run, my first pipeline data_processing has a node which outputs prokka_bins. This data is one of the outputs of the pipeline data_processing:
    namespace="data_processing",
    inputs=["partition_prokka_faa", "partition_prokka_gff"],
    outputs=["hypo_prot", "prokka_bins"]
    I was wondering how I can access that data from my new pipeline edge_processing via
    inputs="prokka_bins"
    When this is used in a node of edge_processing, I get the error:
    ValueError: Pipeline input(s) {'edge_processing.prokka_bins'} not found in the DataCatalog
    Other than essentially duplicating an entry in my DataCatalog as edge_processing.prokka_bins, which makes things messier, is there any way I can access this data in my second pipeline?
  • d

    datajoely

    07/12/2022, 6:05 AM
    This feels like a namespace issue
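    If it is, the usual fix is to list the shared dataset in the modular pipeline wrapper's inputs, which exempts it from the namespace prefix. A minimal sketch (base_edge_pipeline stands in for the existing Pipeline object):
    from kedro.pipeline import pipeline

    edge_processing = pipeline(
        base_edge_pipeline,
        namespace="edge_processing",
        inputs={"prokka_bins"},  # listed here, so it is NOT renamed to edge_processing.prokka_bins
    )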
  • s

    Simon W

    07/12/2022, 1:42 PM
    Hello, are there any publications/papers describing the core architecture of kedro and the main principles? And if so, where can I find them?
  • n

    noklam

    07/12/2022, 1:43 PM
    https://kedro.readthedocs.io/en/stable/faq/architecture_overview.html https://kedro.readthedocs.io/en/stable/faq/kedro_principles.html I don't think we have a formal paper on that; these docs may cover part of it.
  • n

    noklam

    07/12/2022, 1:45 PM
    If you are new to kedro, I suggest just going through the tutorials first and coming back to the individual components later
  • g

    Goss

    07/12/2022, 6:39 PM
    @noklam Case solved. Documentation bug: https://kedro.readthedocs.io/en/stable/tutorial/set_up_experiment_tracking.html#set-up-the-session-store ...says to modify ./src/settings.py but it should be ./src/{project name}/settings.py
  • n

    noklam

    07/12/2022, 6:44 PM
    Nice catch! The settings file should be generated from the template, though. Are you migrating from an older project?
  • n

    noklam

    07/12/2022, 6:45 PM
    Otherwise I can take care of it.
  • g

    Goss

    07/12/2022, 6:52 PM
    No, this was starting fresh; I just started editing the file with that name. It also didn't help that the addition in the docs looks like a complete file (imports and all), so it seemed less surprising that it was an empty file.
  • n

    noklam

    07/12/2022, 6:58 PM
    Sorry about the confusion; glad you found the source of the error in the end.
  • s

    Spikeyhog

    07/13/2022, 11:11 AM
    Hi all, I have created a namespace modular pipeline and specified some outputs which are consumed by other pipelines/nodes. My problem is that this output is also used within the namespace pipeline as an input to other internal nodes. As a result, when I look at kedro-viz, the output dataset is disjointed and not showing as an output of the namespace pipeline. Is this a bug, feature or am I doing something wrong here? Images attached illustrate my point.
  • d

    datajoely

    07/13/2022, 12:29 PM
    So I think you can fix this with nested namespaces, i.e. put everything in one namespace, then your two sub-parts in namespace.one and namespace.two
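    A rough sketch of that layout (part_one and part_two stand in for the existing sub-pipelines):
    from kedro.pipeline import pipeline

    # namespace the two sub-parts first, then wrap both in the outer namespace,
    # which yields node/dataset names under namespace.one and namespace.two
    combined = pipeline(
        pipeline(part_one, namespace="one") + pipeline(part_two, namespace="two"),
        namespace="namespace",
    )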