Greetings! I'm facing a very similar issue to this one that h... Kedro #beginners-need-help

Bertozzo

08/11/2021, 5:14 PM

Greetings! I'm facing a very similar issue to this one that happened in an older version: https://github.com/quantumblacklabs/kedro/issues/291 Basically, I can't save my dataset due to an encoding error.

datajoely

08/11/2021, 5:15 PM

Hi @User

datajoely

08/11/2021, 5:15 PM

have you tried this in your YAML entry?

fs_args:
    open_args_load:
        mode: "rb"

datajoely

08/11/2021, 5:17 PM

In general when this comes up this sort of approach tends to work

yaml
my_dataset:
  type: pandas.CSVDataSet
  filepath: xxxxx.csv
  fs_args:
    open_args_load:
      mode: "rb"
      encoding: "utf-8"
    open_args_save:
      mode: "w"
      encoding: "utf-8"
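
As a rough illustration of what the save and load arguments above ask for, here is a minimal standard-library sketch that writes and reads a CSV with an explicit encoding. It does not use Kedro; the helper names `write_csv` and `read_csv` and the sample rows are assumptions made for this example.

```python
import csv
import tempfile
from pathlib import Path

# Sample rows with non-ASCII characters (made up for this sketch).
rows = [["name", "city"], ["João", "São Paulo"], ["Zoë", "Kraków"]]


def write_csv(path, rows, encoding="utf-8"):
    # Text mode plus an explicit encoding, similar in spirit to open_args_save.
    with open(path, mode="w", encoding=encoding, newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_csv(path, encoding="utf-8"):
    # Text mode again, so the encoding argument is actually applied.
    with open(path, mode="r", encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.csv"
    write_csv(path, rows)
    round_tripped = read_csv(path)

print(round_tripped == rows)
```

Python's built-in `open` rejects an `encoding` argument in binary mode, so an encoding only takes effect with text modes such as "r" and "w". Whether the "rb" entry above is handled the same way depends on the filesystem layer behind the dataset, which this sketch does not cover.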

Bertozzo

08/11/2021, 5:17 PM

Hey there ! yes, that was my first attempt

datajoely

08/11/2021, 5:17 PM

Could you post your stack trace

Bertozzo

08/11/2021, 5:17 PM

i'll try this one

Bertozzo

08/11/2021, 5:19 PM

now i've a different output

datajoely

08/11/2021, 5:19 PM

Progress!

datajoely

08/11/2021, 5:19 PM

can you post it here

Bertozzo

08/11/2021, 5:20 PM

  File "yaml\_yaml.pyx", line 707, in yaml._yaml.CParser.get_single_node
  File "yaml\_yaml.pyx", line 725, in yaml._yaml.CParser._compose_document
  File "yaml\_yaml.pyx", line 776, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 890, in yaml._yaml.CParser._compose_mapping_node
  File "yaml\_yaml.pyx", line 776, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 892, in yaml._yaml.CParser._compose_mapping_node
  File "yaml\_yaml.pyx", line 905, in yaml._yaml.CParser._parse_next_event
yaml.scanner.ScannerError: mapping values are not allowed in this context
  in "C:\Users\lbertozz\Downloads\aut-ia-avaliador-de-materias\conf\base\catalog.yml", line 16, column 11

datajoely

08/11/2021, 5:20 PM

okay thats just a bad YAML file

datajoely

08/11/2021, 5:20 PM

there will be a bad indent somewhere

datajoely

08/11/2021, 5:20 PM

line 16 to be exact
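
The parser error reports a file, a line and a column, so one way to find the bad indent is to print that line with a marker under the reported column. The sketch below is a hypothetical helper (`show_line` is not part of Kedro or PyYAML), and the sample file content and position are invented for illustration.

```python
import tempfile
from pathlib import Path


def show_line(path, line, column, context=1):
    """Return the reported line and its neighbours, with a caret under the column.

    `line` and `column` are 1-based, as in the parser message.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    first = max(line - 1 - context, 0)
    last = min(line + context, len(lines))
    out = []
    for index in range(first, last):
        out.append(f"{index + 1:>4}: {lines[index]}")
        if index + 1 == line:
            # The "NNNN: " prefix is 6 characters wide.
            out.append(" " * (6 + column - 1) + "^")
    return "\n".join(out)


# Invented catalog fragment with an over-indented key on line 4.
example = (
    "my_dataset:\n"
    "  type: pandas.CSVDataSet\n"
    "  filepath: xxxxx.csv\n"
    "    fs_args:\n"
    "      open_args_load:\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "catalog.yml"
    path.write_text(example, encoding="utf-8")
    report = show_line(path, line=4, column=12)

print(report)
```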

Bertozzo

08/11/2021, 5:21 PM

mm, thats interesting

Bertozzo

08/11/2021, 5:21 PM

and where do i find this file ?

Bertozzo

08/11/2021, 5:22 PM

i only know where my xlsx and csv are

datajoely

08/11/2021, 5:22 PM

it's your catalog file

C:\Users\lbertozz\Downloads\aut-ia-avaliador-de-materias\conf\base\catalog.yml

datajoely

08/11/2021, 5:22 PM

line 16

Bertozzo

08/11/2021, 5:23 PM

ok, that was just an indent error 😅

Bertozzo

08/11/2021, 5:24 PM

now i am back to the original one, charmap encode etc etc
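
For context on the "charmap" encode error mentioned here: it typically appears when text is encoded with a single-byte code page that lacks one of the characters. The sketch below reproduces that situation with `cp1252` as an assumed example code page; the sample text and the helper `try_encode` are made up for this illustration.

```python
# Sample text: "ç" and "ã" exist in cp1252, the check mark does not.
text = "Avaliação ✓"


def try_encode(value, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


cp1252_result = try_encode(text, "cp1252")
utf8_result = try_encode(text, "utf-8")

print(type(cp1252_result).__name__)
print(type(utf8_result).__name__)
```

The error object carries the position of the offending character in its `start` attribute, which can help track down which value in a dataset triggers the failure.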

datajoely

08/11/2021, 5:24 PM

okay interesting

datajoely

08/11/2021, 5:24 PM

Can you post your YAML entry here?

datajoely

08/11/2021, 5:24 PM

for that dataset

Bertozzo

08/11/2021, 5:25 PM

one moment, i guess i found it

datajoely

08/11/2021, 5:25 PM

no problem

Bertozzo

08/11/2021, 5:25 PM

i need to put this in all csvs right ?

Bertozzo

08/11/2021, 5:26 PM

so i can read them all correctly

datajoely

08/11/2021, 5:26 PM

There is a way to re-use the same pattern over and over

datajoely

08/11/2021, 5:26 PM

but let's get it working for one

datajoely

08/11/2021, 5:27 PM

yaml
my_dataset:
  type: pandas.CSVDataSet
  filepath: xxxxx.csv
  load_args:
    on_bad_lines: skip
    encoding: 'utf-8'

datajoely

08/11/2021, 5:28 PM

the other option is to try different encodings like utf-8-sig and utf-16
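
Trying several encodings in turn can be sketched as below. This is a plain-Python illustration, not a Kedro or pandas feature; the function name `decode_with_candidates` and the sample data are assumptions. A decode that succeeds does not prove the encoding is the right one, so the result is still worth checking by eye.

```python
def decode_with_candidates(data, candidates=("utf-8-sig", "utf-16")):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode the data")


# Invented sample content with accented characters.
sample = "código;descrição\n1;avaliação\n"

from_bom_utf8 = decode_with_candidates(sample.encode("utf-8-sig"))
from_utf16 = decode_with_candidates(sample.encode("utf-16"))

print(from_bom_utf8[0])
print(from_utf16[0])
```

The order of candidates matters: permissive encodings accept almost any byte sequence, so stricter ones are better tried first.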

Bertozzo

08/11/2021, 5:28 PM

i did the same command to all df, and it looks like it advanced, but now it's saying that saving None to a dataset is not allowed

Bertozzo

08/11/2021, 5:29 PM

which is pretty logical tbh

datajoely

08/11/2021, 5:29 PM

which command are you talking about?

Bertozzo

08/11/2021, 5:29 PM

this one

datajoely

08/11/2021, 5:29 PM

You will get the None error if your node returns nothing

datajoely

08/11/2021, 5:30 PM

ah understood

datajoely

08/11/2021, 5:30 PM

So I'm a little confused, did we get it working?

datajoely

08/11/2021, 5:30 PM

or do you still need help

Bertozzo

08/11/2021, 5:31 PM

  File "c:\users\lbertozz\appdata\local\programs\python\python37\lib\site-packages\kedro\io\core.py", line 232, in save
    raise DataSetError("Saving None to a DataSet is not allowed")
kedro.io.core.DataSetError: Saving None to a DataSet is not allowed

Bertozzo

08/11/2021, 5:31 PM

that was the last message

datajoely

08/11/2021, 5:31 PM

Okay so can you show me the Node you use to process the dataset

datajoely

08/11/2021, 5:32 PM

because it works by Catalog -> load() -> Node(Python function) -> save() -> Catalog
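
The load, node, save flow described here can be pictured with a toy model. Everything in the sketch below (`ToyCatalog`, `run_node`, `strip_names`) is hypothetical and written for this illustration; it is not Kedro's API, only an imitation of the behaviour discussed in this thread, including the refusal to save None.

```python
class ToyCatalog:
    """Hypothetical in-memory stand-in for a data catalog (not Kedro's API)."""

    def __init__(self, data):
        self._data = dict(data)

    def load(self, name):
        return self._data[name]

    def save(self, name, value):
        # Imitate the behaviour discussed above: refuse to store None.
        if value is None:
            raise ValueError(f"Saving None to dataset '{name}' is not allowed")
        self._data[name] = value


def run_node(catalog, func, input_name, output_name):
    """Load the input, call the node function, save whatever it returns."""
    result = func(catalog.load(input_name))
    catalog.save(output_name, result)


def strip_names(rows):
    # A well-formed node: it returns its result.
    return [row.strip() for row in rows]


catalog = ToyCatalog({"raw_names": [" ana ", "bia "]})
run_node(catalog, strip_names, "raw_names", "clean_names")
clean = catalog.load("clean_names")
print(clean)
```

In this model the save step fails only after the node has already run, which matches the thread: the error surfaces at save time even though the cause sits in the node function.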

datajoely

08/11/2021, 5:33 PM

If you're getting the None error it means your node is returning a None value

datajoely

08/11/2021, 5:33 PM

not a DataFrame
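
Two common ways a function ends up returning None are returning the result of an in-place operation and missing a return on one branch. The sketch below shows both with plain lists; the function names are invented for this example and are not taken from the project in the thread.

```python
def sort_scores_bad(scores):
    # list.sort() sorts in place and returns None, so this returns None.
    return scores.sort()


def sort_scores_good(scores):
    # sorted() builds and returns a new list.
    return sorted(scores)


def keep_positive_bad(scores):
    if scores:
        return [s for s in scores if s > 0]
    # No return on the empty-input path, so Python returns None here.


def keep_positive_good(scores):
    # Every path returns a list, even for empty input.
    return [s for s in scores if s > 0]


bad_sorted = sort_scores_bad([3, 1, 2])
good_sorted = sort_scores_good([3, 1, 2])
bad_empty = keep_positive_bad([])
good_empty = keep_positive_good([])
print(bad_sorted, good_sorted, bad_empty, good_empty)
```

The same pattern shows up with pandas methods called with `inplace=True`, which return None instead of a DataFrame.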

Bertozzo

08/11/2021, 5:33 PM

what's the weirdest of all

Bertozzo

08/11/2021, 5:34 PM

is that all my colleagues that use ubuntu can run it

Bertozzo

08/11/2021, 5:34 PM

without any issues

datajoely

08/11/2021, 5:34 PM

hmm

Bertozzo

08/11/2021, 5:34 PM

they dont even need to put the encoding parameters
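
One plausible reason the same code behaves differently on Ubuntu and Windows is the default text encoding: when `open()` is called without an `encoding`, Python falls back to the locale's preferred encoding, which is commonly UTF-8 on Linux and often a code page such as cp1252 on Windows. The sketch below only reports what the current machine uses; the function name `describe_default_encoding` is an assumption, and this is a guess at the cause rather than a confirmed diagnosis for this thread.

```python
import locale
import sys


def describe_default_encoding():
    """Report the encoding open() falls back to when none is given."""
    return {
        "platform": sys.platform,
        "locale_encoding": locale.getpreferredencoding(False),
        # True when Python's UTF-8 mode is enabled (Python 3.7+).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


info = describe_default_encoding()
print(info)
```

Running this on both machines and comparing `locale_encoding` is a quick way to check whether the default differs.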

datajoely

08/11/2021, 5:34 PM

maybe its your Python environment

datajoely

08/11/2021, 5:34 PM

are you running in a virtual env?

Bertozzo

08/11/2021, 5:34 PM

nope

datajoely

08/11/2021, 5:35 PM

so I can't guarantee that will fix things

datajoely

08/11/2021, 5:35 PM

but it sometimes makes things easier not having to worry about multiple Python versions, multiple version of Kedro etc
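
To check which interpreter is actually running and whether it sits inside a virtual environment, a small standard-library sketch like the following can help. The helper name `in_virtual_env` is made up here, and the check only detects venv-style environments; conda environments, for example, are not recognised by it.

```python
import sys


def in_virtual_env():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment and
    # sys.base_prefix at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


env_info = {
    "executable": sys.executable,
    "version": tuple(sys.version_info[:3]),
    "in_venv": in_virtual_env(),
}
print(env_info)
```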

Bertozzo

08/11/2021, 5:35 PM

that's what i've done, just installed Python 3.7, set it as a global variable on my system

Bertozzo

08/11/2021, 5:36 PM

and then as the primary interpreter on vscode

datajoely

08/11/2021, 5:36 PM

And we're sure it's the same version of the code?

datajoely

08/11/2021, 5:36 PM

this is super weird

datajoely

08/11/2021, 5:36 PM

Can you post a screenshot of the catalog entry and the python node

datajoely

08/11/2021, 5:37 PM

I can't really work it out without seeing them

Bertozzo

08/11/2021, 5:37 PM

one moment pls

Bertozzo

08/11/2021, 5:40 PM

can i call u pls ? and then share the screen ?

datajoely

08/11/2021, 5:41 PM

I can't tonight (I'm also on calls 🤦‍♂️)

datajoely

08/11/2021, 5:41 PM

I could book some time tomorrow?

datajoely

08/11/2021, 5:41 PM

or do it async on here

Bertozzo

08/11/2021, 5:42 PM

ok, lets start over

datajoely

08/11/2021, 5:43 PM

Would you like to book some time in tomorrow?

datajoely

08/11/2021, 5:43 PM

as it's getting towards end of day here in London

Bertozzo

08/11/2021, 5:44 PM

thats my screen rn

datajoely

08/11/2021, 5:44 PM

okay and I need to see the .py file that has the nodes

datajoely

08/11/2021, 5:44 PM

if you scroll a bit further up on the terminal it should tell you the name of the node

Bertozzo

08/11/2021, 5:46 PM

like this ?

Bertozzo

08/11/2021, 5:46 PM

021-08-11 14:43:12,636 - kedro.runner.sequential_runner - WARNING - There are 2 nodes that have not run.
You can resume the pipeline run by adding the following argument to your previous command:
  --from-nodes "download_files,files_to_text"
Traceback (most recent call last):
  File "c:\users\lbertozz\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

datajoely

08/11/2021, 5:46 PM

Even further please

datajoely

08/11/2021, 5:47 PM

you can right click the terminal and select all -> copy

datajoely

08/11/2021, 5:50 PM

That's very helpful thank you

datajoely

08/11/2021, 5:50 PM

so let me explain how to look at the logs, as they explain what's going on

datajoely

08/11/2021, 5:51 PM

The node you need to look at is called full_data_clean

datajoely

08/11/2021, 5:51 PM

and that will be in a .py file somewhere in your project folder

datajoely

08/11/2021, 5:51 PM

It looks like there is a bad return somewhere in there where a None object is being returned

datajoely

08/11/2021, 5:52 PM

sorry, my screenshot was wrong

datajoely

08/11/2021, 5:53 PM

the node to look at is this one

datajoely

08/11/2021, 5:53 PM

download_files
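
The warning posted earlier in the thread lists the nodes that did not run in its --from-nodes hint, and download_files is the first name in that list. A small sketch for pulling those names out of a log line follows; the function `nodes_from_resume_hint` is hypothetical, and treating the first listed node as the one to inspect is an observation from this conversation, not a documented guarantee.

```python
import re


def nodes_from_resume_hint(log_text):
    """Extract node names from a '--from-nodes "a,b"' hint, if present."""
    match = re.search(r'--from-nodes\s+"([^"]*)"', log_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# The warning text as posted earlier in this thread.
log_line = (
    "There are 2 nodes that have not run. You can resume the pipeline run by "
    "adding the following argument to your previous command: "
    '--from-nodes "download_files,files_to_text"'
)

pending = nodes_from_resume_hint(log_line)
print(pending)
```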

Bertozzo

08/11/2021, 5:55 PM

ok, im checking for it

datajoely

08/11/2021, 5:55 PM

So this is no longer a CSV encoding issue

datajoely

08/11/2021, 5:56 PM

it looks like the encoding tweak fixed things

datajoely

08/11/2021, 5:56 PM

but now it's just a badly formed node

datajoely

08/11/2021, 5:56 PM

I'm going to log off for the evening - but feel free to post questions here and I'll pick them up when I'm next online 🙂

datajoely

08/11/2021, 5:56 PM

good luck!

Bertozzo

08/11/2021, 5:57 PM

i see, and why do you think its not working then ? i really thought it was about a windows issue or smt

Bertozzo

08/11/2021, 5:57 PM

sure, thank you so much !

Bertozzo

08/11/2021, 5:57 PM

really helped ! have a good day, see u soon 😄

datajoely

08/11/2021, 5:57 PM

💪

Bertozzo

08/11/2021, 8:58 PM

Passing by to close the issue here, all set and running smoothly now ! Thanks again for your support @datajoely ! Take care !

datajoely

08/11/2021, 8:58 PM

Nice! Well done
