# beginners-need-help
b
Greetings! I'm facing a very similar issue to this one that happened in an older version (https://github.com/quantumblacklabs/kedro/issues/291). Basically I can't save my dataset due to an encoding error
d
Hi @User
have you tried this in your YAML entry?
fs_args:
    open_args_load:
        mode: "rb"
In general when this comes up this sort of approach tends to work
my_dataset:
  type: pandas.CSVDataSet
  filepath: xxxxx.csv
  fs_args:
    open_args_load:
      mode: "rb"
      encoding: "utf-8"
    open_args_save:
      mode: "w"
      encoding: "utf-8"
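For context, here is a minimal Python sketch of why spelling out the encoding can matter. It assumes the error is the common Windows case where Python falls back to a legacy code page such as cp1252 when no encoding is given; the thread does not confirm which code page is in play, and the sample text is made up.

```python
# Sample text (hypothetical); the check mark (U+2713) has no slot in cp1252.
text = "avaliação ✓"


def can_encode(s: str, encoding: str) -> bool:
    """Return True if every character in s is representable in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        # For cp1252 this is the "'charmap' codec can't encode character" error.
        return False


print(can_encode(text, "cp1252"))
print(can_encode(text, "utf-8"))
```

UTF-8 can represent any Unicode character, which is why pinning `encoding: "utf-8"` in the catalog entry tends to make the save succeed regardless of the machine's default.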
b
Hey there ! yes, that was my first attempt
d
Could you post your stack trace
b
i'll try this one
now i've a different output
d
Progress!
can you post it here
b
File "yaml\_yaml.pyx", line 707, in yaml._yaml.CParser.get_single_node
File "yaml\_yaml.pyx", line 725, in yaml._yaml.CParser._compose_document
File "yaml\_yaml.pyx", line 776, in yaml._yaml.CParser._compose_node
File "yaml\_yaml.pyx", line 890, in yaml._yaml.CParser._compose_mapping_node
File "yaml\_yaml.pyx", line 776, in yaml._yaml.CParser._compose_node
File "yaml\_yaml.pyx", line 892, in yaml._yaml.CParser._compose_mapping_node
File "yaml\_yaml.pyx", line 905, in yaml._yaml.CParser._parse_next_event
yaml.scanner.ScannerError: mapping values are not allowed in this context
in "C:\Users\lbertozz\Downloads\aut-ia-avaliador-de-materias\conf\base\catalog.yml", line 16, column 11
d
okay thats just a bad YAML file
there will be a bad indent somewhere
line 16 to be exact
b
mm, thats interesting
and where do i find this file ?
i only know where my xlsx and csv are
d
it's your catalog file
C:\Users\lbertozz\Downloads\aut-ia-avaliador-de-materias\conf\base\catalog.yml
line 16
b
ok, that was just an indent error 😅
now i am back to the original one, charmap encode etc etc
d
okay interesting
Can you post your YAML entry here?
for that dataset
b
one moment, i guess i found it
d
no problem
b
i need to put this in all csvs right ?
so i can read them all correctly
d
There is a way to re-use the same pattern over and over
but let's get it working for one
my_dataset:
  type: pandas.CSVDataSet
  filepath: xxxxx.csv
  load_args:
    on_bad_lines: skip
    encoding: 'utf-8'
the other option is to try different encodings like `utf-8-sig` and `utf-16`
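If you want to check which of those encodings a file actually decodes with before editing the catalog, a rough sketch along these lines can help. The function name and candidate order are assumptions for illustration, and a successful decode only shows the bytes are valid in that encoding, not that it is the right one.

```python
def first_working_encoding(raw: bytes, candidates=("utf-8-sig", "utf-8", "utf-16")):
    """Return the first candidate encoding that decodes raw without error, else None."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            # This candidate cannot decode the bytes; try the next one.
            continue
    return None


# In-memory stand-ins for file contents; for a real file, read it with open(path, "rb").
print(first_working_encoding("olá".encode("utf-8")))
print(first_working_encoding("olá".encode("utf-16")))
```

`utf-8-sig` is tried first because it also accepts plain UTF-8 and strips a leading byte order mark when one is present.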
b
i did the same for all the dfs, and it looks like it advanced, but now it's saying that saving None to a dataset is not allowed
which is pretty logical tbh
d
which command are you talking about?
b
this one
d
You will get the `None` error if your `node` returns nothing
ah understood
So I'm a little confused, did we get it working?
or do you still need help
b
File "c:\users\lbertozz\appdata\local\programs\python\python37\lib\site-packages\kedro\io\core.py", line 232, in save
raise DataSetError("Saving `None` to a `DataSet` is not allowed")
kedro.io.core.DataSetError: Saving `None` to a `DataSet` is not allowed
that was the last message
d
Okay so can you show me the `Node` you use to process the dataset
because it works by Catalog -> `load()` -> `Node(Python function)` -> `save()` -> Catalog
If you're getting the `None` error it means your node is returning a None value, not a DataFrame
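As a rough illustration of how a node ends up returning `None`, here is a plain-Python sketch. The function names and the `save` stand-in are hypothetical and use a list of dicts in place of a DataFrame; this is not Kedro's actual code, only the shape of the problem.

```python
def clean_rows_broken(rows):
    # Bug: the cleaned result is built but never returned,
    # so Python implicitly returns None.
    cleaned = [r for r in rows if r.get("text")]


def clean_rows_fixed(rows):
    cleaned = [r for r in rows if r.get("text")]
    return cleaned  # the node output that the save step receives


def save(data):
    # Hypothetical stand-in for the check made before saving a node's output.
    if data is None:
        raise ValueError("Saving None to a DataSet is not allowed")
    return data


rows = [{"text": "ok"}, {"text": ""}]
print(save(clean_rows_fixed(rows)))
try:
    save(clean_rows_broken(rows))
except ValueError as err:
    print(err)
```

The same thing happens when a function assigns the result of an in-place operation, or returns only inside a branch that is not always taken.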
b
what's weirdest of all
is that all my colleagues that use ubuntu can run it
without any issues
d
hmm
b
they dont even need to put the encoding parameters
d
maybe its your Python environment
are you running in a virtual env?
b
nope
d
so I can't guarantee that will fix things
but it sometimes makes things easier not having to worry about multiple Python versions, multiple versions of Kedro etc
b
that's what i've done, just installed Python 3.7, set it as a global variable on my system
and then as the primary interpreter in vscode
d
And we're sure it's the same version of the code?
this is super weird
Can you post a screenshot of the catalog entry and the python node
I can't really work it out without seeing them
b
one moment pls
can i call u pls ? and then share the screen ?
d
I can't tonight (I'm also on calls 🤦‍♂️)
I could book some time tomorrow?
or do it async on here
b
np
ok, lets start over
d
Would you like to book some time in tomorrow?
as it's getting towards end of day here in London
b
thats my screen rn
d
okay and I need to see the `.py` file that has the nodes
if you scroll a bit further up on the terminal it should tell you the name of the node
b
like this ?
021-08-11 14:43:12,636 - kedro.runner.sequential_runner - WARNING - There are 2 nodes that have not run. You can resume the pipeline run by adding the following argument to your previous command: --from-nodes "download_files,files_to_text" Traceback (most recent call last): File "c:\users\lbertozz\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
d
Even further please
you can right click the terminal and select all -> copy
That's very helpful thank you
so let me explain how to look at the logs, as they explain what's going on
The node you need to look at is called `full_data_clean` and that will be in a `.py` file somewhere in your project folder
It looks like there is a bad return somewhere in there where a `None` object is being returned
sorry, my screenshot is wrong
the node to look at is this one: `download_files`
b
ok, im checking for it
d
So this is no longer a CSV encoding issue
it looks like the encoding tweak fixed things
but now it's just a badly formed node
I'm going to log off for the evening - but feel free to post questions here and I'll pick them up when I'm next online 🙂
good luck!
b
i see, and why do you think its not working then ? i really thought it was about a windows issue or smt
sure, thank you so much !
really helped ! have a good day, see u soon 😄
d
💪
b
Passing by to close the issue here, all set and running smoothly now ! Thanks again for your support @datajoely ! Take care !
d
Nice! Well done