Powered by Linen
beginners-need-help
  • n

    noklam

    06/27/2022, 5:35 PM
    Kedro run with versioned dataset on S3 storage
  • s

    sjster

    06/27/2022, 5:51 PM
    Are there any examples of wrapping a Parquet dataset within a PartitionedDataSet?
  • d

    datajoely

    06/27/2022, 5:53 PM
    https://kedro.readthedocs.io/en/stable/kedro.io.PartitionedDataSet.html
  • d

    datajoely

    06/27/2022, 5:53 PM
    This has a CSV example, but it's the same thing.
  • s

    sjster

    06/27/2022, 6:14 PM
    I get the following error now: AttributeError: 'Series' object has no attribute 'to_parquet'
  • d

    datajoely

    06/27/2022, 6:17 PM
    That's because your node is returning a pandas Series, not a DataFrame.
  • s

    sjster

    06/27/2022, 6:21 PM
    That is strange, because printing out the resulting object tells me that it is a DataFrame:
    2022-06-27 14:11:10,487 - test_pandas_etl.pipelines.etl.nodes - INFO - Length of target is 7551152
  • d

    datajoely

    06/27/2022, 6:23 PM
    If you post the node function, we can try to work it out, but double-check that the object being returned is definitely a DataFrame.
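A likely cause, assuming Kedro 0.18-era `PartitionedDataSet` behaviour: on save it iterates over `data.items()` and writes each value as one partition. A pandas DataFrame also has `.items()`, which yields `(column_name, Series)` pairs, so returning a bare DataFrame hands the wrapped Parquet dataset one Series per column, which would fail with exactly this `to_parquet` error. A minimal sketch of a node that instead returns a `{partition_id: DataFrame}` dict; the node name, the `region` column and the catalog entry in the comment are hypothetical.

```python
import pandas as pd

# Assumed catalog entry (dataset name and path are hypothetical):
# my_partitioned_output:
#   type: PartitionedDataSet
#   path: data/02_intermediate/my_partitions
#   dataset: pandas.ParquetDataSet
#   filename_suffix: ".parquet"


def split_into_partitions(df: pd.DataFrame, key: str) -> dict:
    """Hypothetical node: return {partition_id: DataFrame}, one entry per partition."""
    return {str(value): group.reset_index(drop=True) for value, group in df.groupby(key)}


if __name__ == "__main__":
    df = pd.DataFrame({"region": ["a", "a", "b"], "value": [1, 2, 3]})
    # Treating a DataFrame like a dict yields (column_name, Series) pairs,
    # which is what a PartitionedDataSet would try to save if given a bare DataFrame.
    print([type(v).__name__ for _, v in df.items()])  # ['Series', 'Series']
    partitions = split_into_partitions(df, "region")
    print({k: len(v) for k, v in partitions.items()})  # {'a': 2, 'b': 1}
```

Each dict key becomes a file name under `path`, and each value is saved with the wrapped dataset.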
  • s

    sjster

    06/27/2022, 6:43 PM
    Kedro memory error
  • s

    sjster

    06/28/2022, 6:17 PM
    Hello, I get a segmentation fault when running faiss from within a Kedro node; however, this does not happen when I run it as a standalone script:
    2022-06-28 13:56:27,146 - kedro.pipeline.node - INFO - Running node: evaluate_mapping_in_embeddings_faiss: run_faiss([es_df,tu_df]) -> [res_faiss]
    zsh: segmentation fault  kedro run --tag run_faiss
  • d

    datajoely

    06/28/2022, 6:18 PM
    A segmentation fault has to be coming from a library that uses C underneath. All I can suggest is that you use a debugger to find out where it fails.
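One low-effort way to see where a C extension crashes, before reaching for gdb, is Python's standard-library `faulthandler` module. The same effect is available without code changes by setting the `PYTHONFAULTHANDLER` environment variable, e.g. `PYTHONFAULTHANDLER=1 kedro run --tag run_faiss`. A minimal sketch, assuming you can edit the module that defines the faiss node (that placement is hypothetical):

```python
import faulthandler
import sys

# Hypothetical placement: the top of the module that defines the faiss node,
# or any module imported before the node runs.
# If the process later receives a fatal signal such as SIGSEGV, faulthandler
# writes the Python traceback of every thread to stderr before the interpreter
# dies, which points at the Python line that made the crashing native call.
faulthandler.enable(file=sys.stderr, all_threads=True)
```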
  • n

    noklam

    06/29/2022, 11:42 AM
    I would add that sometimes the IDE can contribute to that too. Do you get it when running in a terminal?
  • s

    sjster

    06/29/2022, 2:51 PM
    Turns out it was a conda environment issue after all
  • s

    sjster

    06/29/2022, 2:51 PM
    Thanks to @datajoely for all of his help!
  • s

    s.hedayati

    06/30/2022, 11:15 AM
    Hi, how can I access the environment name given on the CLI from within my script? (This script is a customized data catalog that is later used in the project hooks.) Many thanks in advance.
  • d

    datajoely

    06/30/2022, 11:16 AM
    Could you explain a little more? Do you mean `kedro run --env=prod`?
  • s

    s.hedayati

    06/30/2022, 11:18 AM
    Thanks for your prompt response. I want to use the 'prod' name within this customized data catalog script before it gets loaded.
  • d

    datajoely

    06/30/2022, 11:19 AM
    As in, you want the env name in scope for the `TemplatedConfigLoader`?
  • s

    s.hedayati

    06/30/2022, 11:20 AM
    I am a little new to Kedro. Could you please post the code that can do that?
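A minimal sketch, assuming Kedro 0.18.1 or later (where the `after_context_created` hook exists) and that the `KedroContext` passed to it exposes the `--env` value as `context.env`. The class name and the place where the customized catalog would be built are hypothetical; the `try`/`except` only lets the sketch be imported without Kedro installed.

```python
from typing import Any, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch can be imported without Kedro installed
    def hook_impl(func):
        return func


class EnvAwareCatalogHooks:
    """Hypothetical hooks class that remembers the env chosen on the CLI."""

    def __init__(self) -> None:
        self.env: Optional[str] = None

    @hook_impl
    def after_context_created(self, context: Any) -> None:
        # Assumed: context.env holds the value of `kedro run --env=...` and is
        # None when the flag is omitted, in which case Kedro's default run
        # environment ("local") applies.
        self.env = context.env or "local"
        # Hypothetical: build your customized data catalog here from self.env,
        # e.g. with a helper of your own such as build_custom_catalog(self.env).
```

Registering an instance in `settings.py`, e.g. `HOOKS = (EnvAwareCatalogHooks(),)`, should let hooks on the same instance that fire later in the run read `self.env`.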
  • u

    user

    07/01/2022, 3:42 PM
    Hello everyone, I'm at an early stage of learning Kedro, and I am wondering what the best practice is for visualising data. Typically, when using notebooks, I'll use packages like matplotlib and seaborn, but I'm not entirely sure how they fit into the Kedro workflow. Any advice would be appreciated! Thank you, Lawrence
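One common pattern, sketched here assuming Kedro 0.18's `matplotlib.MatplotlibWriter` dataset from `kedro.extras.datasets`: keep the plotting code in a node that returns a matplotlib `Figure` instead of calling `plt.show()`, and let the catalog save it as an image. The node name, the `value` column and the catalog entry in the comment are hypothetical; seaborn's axes-level functions can draw onto the same `ax` via their `ax=` argument.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the node can run headless
import matplotlib.pyplot as plt
import pandas as pd

# Assumed catalog entry (dataset name and path are hypothetical):
# value_histogram:
#   type: matplotlib.MatplotlibWriter
#   filepath: data/08_reporting/value_histogram.png


def plot_value_histogram(df: pd.DataFrame) -> plt.Figure:
    """Hypothetical node: build a figure and return it for the catalog to save."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df["value"], bins=10)
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    return fig
```

Keeping the figure as a node output means it is versioned and written like any other dataset, rather than only living inside a notebook.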
  • a

    antony.milne

    07/01/2022, 5:03 PM
    Data viz in kedro
  • g

    gui42

    07/01/2022, 6:21 PM
    Folks, quick question. Is there a way to use a parameter from the YAML files to build a pipeline object? For example, if `do_step_a=True` in the YAML, add the `step_a` node to a pipeline object? My intuition says that this is an anti-pattern.
  • a

    antony.milne

    07/01/2022, 9:26 PM
    This is indeed a bit of an anti-pattern, although other people have done similar things. The best way to do it would be using an `after_context_created` hook like this: https://github.com/kedro-org/kedro/discussions/1436#discussioncomment-2789761
    If it suits your use case, it would be better to register two pipelines in `pipeline_registry.py`:
    pipeline_1 = pipeline(...)
    pipeline_2 = pipeline_1 + pipeline([node_a])
    and then run the one you want with `kedro run -p <pipeline_name>`.
    Another alternative is to keep the node in the pipeline but change the logic inside the node function itself to skip the work, using something like `if not do_step_a: return`.
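The last option above can be sketched as a node that receives the flag through Kedro's `params:` input syntax. One caveat, assuming Kedro 0.18 behaviour: saving `None` to a dataset raises an error, so a bare `return` only suits a node without outputs; this sketch passes its input through unchanged instead. The `step_a` function, the `do_step_a` parameter, the dataset names and the transformation are all hypothetical.

```python
import pandas as pd


def step_a(df: pd.DataFrame, do_step_a: bool) -> pd.DataFrame:
    """Hypothetical node, wired as:
    node(step_a, inputs=["input_df", "params:do_step_a"], outputs="step_a_output")
    """
    if not do_step_a:
        # Skip the work but still return something saveable: pass the input through.
        return df
    # Hypothetical transformation standing in for the real step A logic.
    return df.assign(value_doubled=df["value"] * 2)
```

The trade-off is that the node still appears in every run and in Kedro-Viz, whereas the two-pipeline approach removes it from the graph entirely.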
  • y

    youmaaz

    07/04/2022, 8:23 PM
    Hello guys, I'm facing a problem using Kedro through PyCharm over an SSH connection to the Cloudera platform. I created a project with Kedro 0.15.9 on the Cloudera platform using Spark 2.4, and it works fine there (Cloudera Data Science Workbench). Now I'm trying to use the project from PyCharm, connecting to CDSW through an SSH tunnel. When I run the Kedro project, I get this error:
    2022-07-04 20:02:03,165 - py4j.java_gateway - ERROR - An error occurred while trying to connect to the Java server (127.0.0.1:35327)
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/py4j/java_gateway.py", line 958, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/py4j/java_gateway.py", line 1096, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
  • d

    datajoely

    07/04/2022, 8:32 PM
    So this is a Spark configuration issue; you'll have to play around with `spark.yml` or environment variables to get it working. Additionally, we don't see many 0.15.x projects anymore, so we would encourage you to upgrade.
  • y

    youmaaz

    07/04/2022, 8:35 PM
    I also thought it was a configuration problem. I have a restriction on the Spark version, and the only version of Kedro that works with Spark 2.4 is 0.15.
  • d

    datajoely

    07/04/2022, 8:35 PM
    Ah gotcha
  • y

    youmaaz

    07/05/2022, 11:43 AM
    It was an executor memory size problem; after increasing it, the code works perfectly.
  • y

    youmaaz

    07/06/2022, 7:35 AM
    Hello guys, I have a question: is it possible to save the output of a pipeline as both a pandas and a Spark DataFrame at the same time, given that the input of the pipeline is a Spark DataFrame?
  • d

    datajoely

    07/06/2022, 7:43 AM
    Read about transcoding here! https://kedro.readthedocs.io/en/stable/data/data_catalog.html#transcoding-datasets