778216384475693066 #beginners-need-help

williamc

06/23/2022, 4:31 PM

Let's say I just cloned my kedro project repo to another machine, and its datasets are versioned and configured to use S3 for storage. If I try to run a pipeline that depends on those datasets, I get the infamous kedro.io.core.VersionNotFoundError. The bucket has versions all the way up to 2022-06-07T22.04.39.460Z/ and the error says 2022-06-23T16.20.52.945Z. Is this the intended behavior? Thanks

noklam

06/23/2022, 8:56 PM

What does the file look like before your changes?

noklam

06/23/2022, 8:58 PM

What command did you run? Did you specify a version? If not, it should just grab the latest version you have in the S3 store.
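To make "latest version" concrete: the version folders quoted above are named with timestamps, so the latest one is simply the most recent timestamp. A minimal sketch of that idea, assuming the folder names have already been listed from the bucket. The helper names and the first folder name are hypothetical, and this is not Kedro's actual resolution code:

```python
from datetime import datetime

# Timestamp layout seen in the version folder names quoted in this thread.
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def parse_version(stamp: str) -> datetime:
    """Turn a version folder name into a datetime (a trailing '/' is ignored)."""
    return datetime.strptime(stamp.rstrip("/"), VERSION_FORMAT)


def latest_version(stamps):
    """Return the most recent version name from a list of folder names."""
    if not stamps:
        raise ValueError("no versions found")
    return max(stamps, key=parse_version).rstrip("/")


# The first entry is made up for illustration; the second is from the thread.
available = ["2022-05-30T09.15.00.000Z/", "2022-06-07T22.04.39.460Z/"]
requested = "2022-06-23T16.20.52.945Z"

print("latest available:", latest_version(available))
print("requested exists:", requested in {s.rstrip("/") for s in available})
```

Comparing the two values this way shows whether the run is asking for a version that was never written to the bucket.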

lancechua

06/24/2022, 1:02 PM

I'm trying to upgrade from v0.17.4 to v0.18.1, and when I try to do kedro run, it's still trying to look for kedro.versioning

antony.milne

06/24/2022, 4:20 PM

What is the error exactly? If it's something about journals, I suspect it's because it's still mentioned in your logging.yml file. The easiest way to fix that is just to copy and paste this into your logging.yml file: https://github.com/kedro-org/kedro/blob/0.18.1/kedro/templates/project/%7B%7B%20cookiecutter.repo_name%20%7D%7D/conf/base/logging.yml

lancechua

06/25/2022, 2:47 AM

Yes, it was indeed about some journal logger. Will try that. Thanks!

inigohrey

06/26/2022, 6:49 PM

Hello. Where would I look to find where the params file is being read from disk? I'm having an issue with non-ASCII characters on Windows. I am saving my params.yml with UTF-8 encoding, but Python, using locale.getpreferredencoding(), attempts to read it as CP1252, which produces gibberish for characters like ñ, é etc. If I wanted to change the config of a YAMLDataSet I would change it like here: https://github.com/kedro-org/kedro/issues/772#issuecomment-847650332 but I don't know if there is a similar config for the parameters.yml file.
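The mismatch described here can be reproduced with plain Python, independent of Kedro. A minimal sketch, assuming a throwaway params.yml written to a temporary directory (the file contents are made up for illustration):

```python
import pathlib
import tempfile

text = "greeting: señor café\n"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "params.yml"
    # Save the file as UTF-8, as described in the question.
    path.write_text(text, encoding="utf-8")

    # Reading with CP1252 (a common Windows locale default) garbles ñ and é.
    garbled = path.read_text(encoding="cp1252")
    # Reading with an explicit UTF-8 encoding round-trips correctly.
    correct = path.read_text(encoding="utf-8")

print(repr(garbled))
print(repr(correct))
```

Whatever code ends up opening the file, passing an explicit encoding rather than relying on the locale default is what avoids the garbling.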

antony.milne

06/27/2022, 8:10 AM

params file encoding

williamc

06/27/2022, 5:01 PM

Sorry for the late response. I just tried kedro run, no version specified.

sjster

06/27/2022, 5:18 PM

Hello, running a Kedro pipeline results in my job getting killed. It looks like it is running out of memory as it tries to save the result of a node to a ParquetDataSet. My inputs are about 2 GB in size. Any solutions or suggestions?

datajoely

06/27/2022, 5:21 PM

So it would be great to get a stack trace to understand what's going on. Could you try wrapping your ParquetDataSet in a PartitionedDataSet? That would allow you to write smaller chunks and shouldn't fail.

sjster

06/27/2022, 5:33 PM

Will try that

noklam

06/27/2022, 5:35 PM

Kedro run with versioned dataset on S3 storage

sjster

06/27/2022, 5:51 PM

Are there any examples on wrapping a parquet dataset within a PartitionedDataSet?

datajoely

06/27/2022, 5:53 PM

https://kedro.readthedocs.io/en/stable/kedro.io.PartitionedDataSet.html

datajoely

06/27/2022, 5:53 PM

This has a CSV example but it's the same thing
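On the node side, a PartitionedDataSet is saved from a dictionary that maps partition names to the data for each partition, so the node has to split its result into chunks. A minimal sketch of that splitting step using plain lists; the function name, the "part" prefix and the chunk size are hypothetical, and with pandas the slices would be DataFrame row ranges instead:

```python
def split_into_partitions(rows, chunk_size, prefix="part"):
    """Split a sequence into a dict of {partition_name: chunk}."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        # Zero-padded index keeps partition names in order when sorted.
        f"{prefix}_{start // chunk_size:04d}": rows[start:start + chunk_size]
        for start in range(0, len(rows), chunk_size)
    }


partitions = split_into_partitions(list(range(10)), chunk_size=4)
for name, chunk in partitions.items():
    print(name, chunk)
```

A node that returns a dictionary shaped like this lets each chunk be written as its own file rather than as one large write.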

sjster

06/27/2022, 6:14 PM

I get the following error now: AttributeError: 'Series' object has no attribute 'to_parquet'

datajoely

06/27/2022, 6:17 PM

That's because you're returning a series not a pandas dataframe from your node

sjster

06/27/2022, 6:21 PM

That is strange because printing out the resultant object tells me that it is a DataFrame. 2022-06-27 14:11:10,487 - test_pandas_etl.pipelines.etl.nodes - INFO - Length of target is 7551152

datajoely

06/27/2022, 6:23 PM

If you post the node function syntax we can try and work it out, but double check that the object being returned is definitely a df

sjster

06/27/2022, 6:43 PM

Kedro memory error

sjster

06/28/2022, 6:17 PM

Hello, I get a segmentation fault when running faiss from within a Kedro node, however this does not happen when run as a standalone script

2022-06-28 13:56:27,146 - kedro.pipeline.node - INFO - Running node: evaluate_mapping_in_embeddings_faiss: run_faiss([es_df,tu_df]) -> [res_faiss]
zsh: segmentation fault  kedro run --tag run_faiss

datajoely

06/28/2022, 6:18 PM

Segmentation fault has to be coming from a library that uses C underneath. All I can suggest is that you use a debugger and find out where it fails?
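One low-effort way to see where it fails before reaching for a full debugger is Python's built-in faulthandler module, which dumps the Python traceback when the process receives a fatal signal such as a segmentation fault. A minimal sketch, assuming it is called early in the process (for example at the top of the module that defines the node); the helper name and log path are hypothetical:

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write Python tracebacks for all threads to log_path on a fatal signal."""
    # The file must stay open for as long as faulthandler is enabled,
    # so the handle is returned to the caller instead of being closed here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Setting the environment variable PYTHONFAULTHANDLER=1 before running enables the same behavior, with the traceback written to stderr, without touching any code.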

noklam

06/29/2022, 11:42 AM

I would add that sometimes the IDE may contribute to that too. Do you get it when running in a terminal?

sjster

06/29/2022, 2:51 PM

Turns out it was a conda environment issue after all

sjster

06/29/2022, 2:51 PM

Thanks to @datajoely for all of his help!

s.hedayati

06/30/2022, 11:15 AM

Hi, how can I access the environment name given in the CLI from within my script? (The script is a customized data catalog that will later be used in project hooks.) Many thanks in advance.

datajoely

06/30/2022, 11:16 AM

Could you explain a little more? Do you mean kedro run --env=prod

s.hedayati

06/30/2022, 11:18 AM

Thanks for your prompt response. I want to use the 'prod' name within this customized data catalog script before it gets loaded.

datajoely

06/30/2022, 11:19 AM

as in you want the env name in scope for the TemplatedConfigLoader