#beginners-need-help

Bertozzo

08/11/2021, 5:14 PM
Greetings! I'm facing an issue very similar to this one, which happened in an older version: https://github.com/quantumblacklabs/kedro/issues/291 Basically, I can't save my dataset due to an encoding error

datajoely

08/11/2021, 5:15 PM
Hi @User
have you tried this in your YAML entry?
fs_args:
    open_args_load:
        mode: "rb"
In general when this comes up this sort of approach tends to work
my_dataset:
  type: pandas.CSVDataSet
  filepath: xxxxx.csv
  fs_args:
    open_args_load:
      mode: "rb"
      encoding: "utf-8"
    open_args_save:
      mode: "w"
      encoding: "utf-8"

Bertozzo

08/11/2021, 5:17 PM
Hey there ! yes, that was my first attempt

datajoely

08/11/2021, 5:17 PM
Could you post your stack trace

Bertozzo

08/11/2021, 5:17 PM
i'll try this one
now i've a different output

datajoely

08/11/2021, 5:19 PM
Progress!
can you post it here

Bertozzo

08/11/2021, 5:20 PM
  File "yaml\_yaml.pyx", line 707, in yaml._yaml.CParser.get_single_node
  File "yaml\_yaml.pyx", line 725, in yaml._yaml.CParser._compose_document
  File "yaml\_yaml.pyx", line 776, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 890, in yaml._yaml.CParser._compose_mapping_node
  File "yaml\_yaml.pyx", line 776, in yaml._yaml.CParser._compose_node
  File "yaml\_yaml.pyx", line 892, in yaml._yaml.CParser._compose_mapping_node
  File "yaml\_yaml.pyx", line 905, in yaml._yaml.CParser._parse_next_event
yaml.scanner.ScannerError: mapping values are not allowed in this context
  in "C:\Users\lbertozz\Downloads\aut-ia-avaliador-de-materias\conf\base\catalog.yml", line 16, column 11

datajoely

08/11/2021, 5:20 PM
okay that's just a bad YAML file
there will be a bad indent somewhere
line 16 to be exact

Bertozzo

08/11/2021, 5:21 PM
mm, that's interesting
and where do i find this file?
i only know where my xlsx and csv files are

datajoely

08/11/2021, 5:22 PM
it's your catalog file
C:\Users\lbertozz\Downloads\aut-ia-avaliador-de-materias\conf\base\catalog.yml
line 16
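
A YAML parser error of this kind reports a 1-based line and column, so the quickest check is to print that spot in the file. The sketch below uses only the standard library; `show_location` and the sample catalog fragment are hypothetical illustrations, not part of Kedro or PyYAML, and the fragment is not the actual `catalog.yml` from this thread.

```python
def show_location(text: str, line: int, column: int, context: int = 1) -> str:
    """Return the lines around a 1-based (line, column) position, with a caret under the column."""
    lines = text.splitlines()
    first = max(line - context, 1)
    last = min(line + context, len(lines))
    out = []
    for number in range(first, last + 1):
        # The prefix "NNNN | " is always 7 characters wide.
        out.append(f"{number:>4} | {lines[number - 1]}")
        if number == line:
            out.append(" " * (7 + column - 1) + "^")
    return "\n".join(out)


# Hypothetical catalog fragment: "filepath" is indented one space too far,
# so the colon after it lands where the parser does not expect a mapping value.
sample = (
    "my_dataset:\n"
    "  type: pandas.CSVDataSet\n"
    "   filepath: xxxxx.csv\n"
)

print(show_location(sample, line=3, column=12))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and the line and column taken from the error message.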

Bertozzo

08/11/2021, 5:23 PM
ok, that was just an indent error 😅
now i am back to the original one, charmap encode etc etc

datajoely

08/11/2021, 5:24 PM
okay interesting
Can you post your YAML entry here?
for that dataset

Bertozzo

08/11/2021, 5:25 PM
one moment, i guess i found it

datajoely

08/11/2021, 5:25 PM
no problem

Bertozzo

08/11/2021, 5:25 PM
i need to put this in all csvs right ?
so i can read them all correctly

datajoely

08/11/2021, 5:26 PM
There is a way to re-use the same pattern over and over
but let's get it working for one
my_dataset:
  type: pandas.CSVDataSet
  filepath: xxxxx.csv
  load_args:
    on_bad_lines: skip
    encoding: 'utf-8'
the other option is to try different encodings like
utf-8-sig
and
utf-16
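
Trying encodings one at a time can be scripted. The sketch below, using only the standard library, returns the first candidate that decodes the raw bytes without error; `first_working_encoding` is a hypothetical helper, and the candidate list simply follows the encodings suggested above. A clean decode does not prove the encoding is right (UTF-16 in particular accepts many byte sequences), so the decoded text should still be inspected.

```python
from typing import Iterable, Optional


def first_working_encoding(
    data: bytes,
    candidates: Iterable[str] = ("utf-8-sig", "utf-16"),
) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without raising."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeError:
            # This candidate cannot decode the bytes; try the next one.
            continue
        return name
    return None


# "utf-8-sig" reads UTF-8 with or without a byte order mark.
print(first_working_encoding("ação".encode("utf-8")))
# UTF-16 bytes start with a byte order mark that is invalid UTF-8.
print(first_working_encoding("ação".encode("utf-16")))
```

For a real file, `data` could come from `open(path, "rb").read()`, and the returned name could then be tried as the `encoding` value under `load_args`.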

Bertozzo

08/11/2021, 5:28 PM
i applied the same setting to all the dataframes, and it looks like it advanced, but now it's saying that saving None to a dataset is not allowed
which is pretty logical tbh

datajoely

08/11/2021, 5:29 PM
which command are you talking about?

Bertozzo

08/11/2021, 5:29 PM
this one

datajoely

08/11/2021, 5:29 PM
You will get the None error if your node returns nothing
ah understood
So I'm a little confused, did we get it working?
or do you still need help

Bertozzo

08/11/2021, 5:31 PM
  File "c:\users\lbertozz\appdata\local\programs\python\python37\lib\site-packages\kedro\io\core.py", line 232, in save
    raise DataSetError("Saving None to a DataSet is not allowed")
kedro.io.core.DataSetError: Saving None to a DataSet is not allowed
that was the last message

datajoely

08/11/2021, 5:31 PM
Okay so can you show me the Node you use to process the dataset
because it works by Catalog -> load() -> Node(Python function) -> save() -> Catalog
If you're getting the None error it means your node is returning a None value, not a DataFrame
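
A node function returns None whenever it ends without a `return` statement, which is easy to miss when the last step works in place. The sketch below shows that pattern in plain Python under stated assumptions: the function names are hypothetical, and `save` is only a stand-in that mimics the error message from the traceback, not Kedro's implementation.

```python
from typing import List, Optional


def clean_rows_broken(rows: List[str]) -> Optional[List[str]]:
    """Cleans the rows but forgets to return them, so the caller receives None."""
    cleaned = [row.strip() for row in rows if row.strip()]
    cleaned.sort()  # sorts in place; there is no return statement after this


def clean_rows_fixed(rows: List[str]) -> List[str]:
    """Same cleaning, with the missing return added."""
    cleaned = [row.strip() for row in rows if row.strip()]
    cleaned.sort()
    return cleaned


def save(data: Optional[List[str]]) -> List[str]:
    """Hypothetical stand-in for the save step: rejects None like the error above."""
    if data is None:
        raise ValueError("Saving None to a DataSet is not allowed")
    return data


print(clean_rows_broken([" b ", "a"]))     # the value that would reach save()
print(save(clean_rows_fixed([" b ", "", "a"])))
```

The same thing happens with any function whose final statement only mutates its input, so checking every exit path of the node function for a `return` is usually the first step.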

Bertozzo

08/11/2021, 5:33 PM
what's weirdest of all
is that all my colleagues who use Ubuntu can run it
without any issues

datajoely

08/11/2021, 5:34 PM
hmm

Bertozzo

08/11/2021, 5:34 PM
they don't even need to set the encoding parameters
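
One likely explanation for the Windows/Ubuntu difference, offered as an assumption rather than a diagnosis of this project: when no encoding is passed, Python's `open()` uses the locale's preferred encoding, which is typically UTF-8 on Ubuntu and often cp1252 on Western-locale Windows installs. The sketch below shows the 'charmap' failure with a character that cp1252 cannot represent; the sample text is made up.

```python
import locale

# Encoding that open() falls back to when no encoding argument is given.
print(locale.getpreferredencoding(False))

# Hypothetical sample text: "ç" and "ã" exist in cp1252, the arrow does not.
text = "avaliação → ok"

try:
    text.encode("cp1252")
except UnicodeEncodeError as error:
    message = str(error)
    print(message)

# Encoding explicitly as UTF-8 works on any platform.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))
```

Setting the encoding explicitly in the catalog entry removes the dependence on the platform default, which is why the same project can run unchanged on one machine and fail on another.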

datajoely

08/11/2021, 5:34 PM
maybe its your Python environment
are you running in a virtual env?

Bertozzo

08/11/2021, 5:34 PM
nope

datajoely

08/11/2021, 5:35 PM
so I can't guarantee that will fix things
but it sometimes makes things easier not having to worry about multiple Python versions, multiple versions of Kedro etc

Bertozzo

08/11/2021, 5:35 PM
that's what i've done, just installed Python 3.7, set it as a global variable on my system
and then as the primary interpreter in VS Code

datajoely

08/11/2021, 5:36 PM
And we're sure it's the same version of the code?
this is super weird
Can you post a screenshot of the catalog entry and the python node
I can't really work it out without seeing them

Bertozzo

08/11/2021, 5:37 PM
one moment pls
can i call u pls ? and then share the screen ?

datajoely

08/11/2021, 5:41 PM
I can't tonight (I'm also on calls 🤦‍♂️)
I could book some time tomorrow?
or do it async on here

Bertozzo

08/11/2021, 5:42 PM
np
ok, lets start over

datajoely

08/11/2021, 5:43 PM
Would you like to book some time in tomorrow?
as it's getting towards end of day here in London

Bertozzo

08/11/2021, 5:44 PM
thats my screen rn

datajoely

08/11/2021, 5:44 PM
okay and I need to see the .py file that has the nodes
if you scroll a bit further up on the terminal it should tell you the name of the node

Bertozzo

08/11/2021, 5:46 PM
like this ?
021-08-11 14:43:12,636 - kedro.runner.sequential_runner - WARNING - There are 2 nodes that have not run.
You can resume the pipeline run by adding the following argument to your previous command:
  --from-nodes "download_files,files_to_text"
Traceback (most recent call last):
  File "c:\users\lbertozz\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

datajoely

08/11/2021, 5:46 PM
Even further please
you can right click the terminal and select all -> copy
That's very helpful thank you
so let me explain how to look at the logs as it explains whats going on
The node you need to look at is called full_data_clean and that will be in a .py file somewhere in your project folder
It looks like there is a bad return somewhere in there where a None object is being returned
sorry my screenshot is run
the node to look at is this one: download_files

Bertozzo

08/11/2021, 5:55 PM
ok, im checking for it

datajoely

08/11/2021, 5:55 PM
So this is no longer a CSV encoding issue
it looks like the encoding tweak fixed things
but now it's just a badly formed node
I'm going to log off for the evening - but feel free to post questions here and I'll pick them up when I'm next online 🙂
good luck!

Bertozzo

08/11/2021, 5:57 PM
i see, and why do you think it's not working then? i really thought it was a Windows issue or smth
sure, thank you so much !
really helped ! have a good day, see u soon 😄

datajoely

08/11/2021, 5:57 PM
💪

Bertozzo

08/11/2021, 8:58 PM
Passing by to close the issue here, all set and running smoothly now ! Thanks again for your support @datajoely ! Take care !

datajoely

08/11/2021, 8:58 PM
Nice! Well done