Error when save using pandas CSVDataset Kedro #beginners-need-help

wise009

07/11/2022, 11:15 AM

Hello, I want to know the cause of the problem I get when I save the DataFrame output to CSV. When I save it using pandas.ExcelDataSet instead, everything works fine. Here is my catalog:

yaml
word_cloud_en:
  type: pandas.CSVDataSet
  filepath: data/05_model_input/word_cloud_en.csv

And here is the node:

python
from typing import Dict

import pandas as pd


def create_word_cloud(df: pd.DataFrame, params: Dict[str, str]) -> pd.DataFrame:
    title_col = params["title"]
    overview_col = params["overview"]
    mem_col = params["member_id"]
    lang = params["lang"]
    en_select = [title_col, overview_col, mem_col]
    # Copy the selection so the in-place dropna below does not act on a view.
    df = df[en_select].copy()
    memids = list()
    content = list()
    # Drop rows where both the title and the overview are missing.
    df.dropna(subset=[title_col, overview_col], how='all', inplace=True)
    for title, overview, mem_id in zip(df[title_col], df[overview_col], df[mem_col]):
        # Replace a missing title or overview (NaN or None) with an empty string.
        if pd.isna(title):
            title = ''
        if pd.isna(overview):
            overview = ''
        content.append(title + overview)
        memids.append(mem_id)
    freq = [0]*(len(content))
    df = pd.DataFrame({'content': content, 'memberID': memids, 'freq':freq})
    df = df.head(200)
    if lang == "en":
        df['content'] = df['content'].apply(en_clean)
    elif lang == "jp":
        df['content'] = df['content'].apply(jp_clean)
    df = df.explode('content')
    df['freq'] = df.groupby(['content', 'memberID'])['content'].transform('count')
    df = df.sort_values(by='freq', ascending=False)
    df.drop_duplicates(inplace=True)
    df.dropna(inplace=True)
    df['content'] = df['content'].apply(no_num)
    df.dropna(inplace=True)
    df = df.sort_values(by='freq', ascending=False)
    df.reset_index(drop=True, inplace=True)
    print(df.head())
    return df

wise009

07/11/2022, 11:24 AM

And this is the error that I got:

python
File "/Users/alifian/opt/miniconda3/envs/kedro-kobe/lib/python3.8/site-packages/kedro/io/core.py", line 217, in save
    raise DataSetError(message) from exc
kedro.io.core.DataSetError: Failed while saving data to data set CSVDataSet(filepath=/Users/alifian/Documents/GitHub/kobe-u-ml-maintenance/data/05_model_input/word_cloud_en.csv, load_args={}, protocol=file, save_args={'index': False}).
a bytes-like object is required, not 'str'

noklam

07/11/2022, 1:54 PM

@wise009 Hi, what version of Kedro are you using?

noklam

07/11/2022, 1:56 PM

If you put an extra df.to_csv(something) before the return df, does it work?
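The check suggested above can be sketched roughly as follows. This is only an illustration: the stand-in DataFrame, its values, and the temporary debug path are hypothetical and not taken from the thread. It writes the frame with plain pandas, bypassing the Kedro dataset, to see whether pandas alone can produce the CSV.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the DataFrame returned by create_word_cloud.
df = pd.DataFrame(
    {"content": ["kedro", "pandas"], "memberID": [1, 2], "freq": [3, 1]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical throwaway path; in the node this would be any writable location.
    debug_path = os.path.join(tmp_dir, "debug_word_cloud.csv")
    # Same save argument the error message reports for the dataset (index=False).
    df.to_csv(debug_path, index=False)
    with open(debug_path, encoding="utf-8") as handle:
        written = handle.read()

print(written)
```

If this succeeds while the catalog save fails, the DataFrame itself is probably fine and the difference lies in how the data is handed to pandas.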

wise009

07/11/2022, 7:47 PM

Thank you for the answer @noklam, I use Kedro 0.18.1.

wise009

07/11/2022, 7:48 PM

Oh alright, I will try it.

noklam

07/11/2022, 7:51 PM

I am asking because the error looks quite strange to me. The DataSet itself is a thin wrapper around df.to_csv, and CSVDataSet is one of the most popular ones.

wise009

07/11/2022, 8:12 PM

Yeah, me too, because in another node CSVDataSet works well, and there is not much data here, only around 500 rows.

noklam

07/11/2022, 8:14 PM

So it would be good to know whether calling the pandas API directly works or not.

wise009

07/11/2022, 8:26 PM

Oh, it works when I use df.to_csv.

wise009

07/11/2022, 8:28 PM

This is also part of the error above, maybe it helps:

python
Traceback (most recent call last):
  File "/Users/alifian/opt/miniconda3/envs/kedro-kobe/lib/python3.8/site-packages/kedro/io/core.py", line 210, in save
    self._save(data)
  File "/Users/alifian/opt/miniconda3/envs/kedro-kobe/lib/python3.8/site-packages/kedro/extras/datasets/pandas/csv_dataset.py", line 171, in _save
    data.to_csv(path_or_buf=buf, **self._save_args)
  File "/Users/alifian/opt/miniconda3/envs/kedro-kobe/lib/python3.8/site-packages/pandas/core/generic.py", line 3167, in to_csv
    formatter.save()
  File "/Users/alifian/opt/miniconda3/envs/kedro-kobe/lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 206, in save
    self._save()
  File "/Users/alifian/opt/miniconda3/envs/kedro-kobe/lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 314, in _save
    self._save_header()
  File "/Users/alifian/opt/miniconda3/envs/kedro-kobe/lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 283, in _save_header
    writer.writerow(encoded_labels)
TypeError: a bytes-like object is required, not 'str'

The above exception was the direct cause of the following exception:
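The last frame of the traceback is a csv writer's writerow call raising the TypeError. Python's standard csv module produces exactly this message when it is given a binary buffer instead of a text buffer. The sketch below reproduces that with the standard library only; it is not Kedro or pandas code, and the helper name write_header is hypothetical. Whether the buf in the traceback really is a binary buffer that the installed pandas version cannot write to is an assumption that the thread does not confirm.

```python
import csv
import io


def write_header(buffer, labels):
    """Write one CSV header row to the given buffer (hypothetical helper)."""
    writer = csv.writer(buffer)
    writer.writerow(labels)


labels = ["content", "memberID", "freq"]

# A text buffer accepts the str rows that csv.writer produces.
text_buf = io.StringIO()
write_header(text_buf, labels)

# A binary buffer rejects them, with the same message as in the traceback.
binary_buf = io.BytesIO()
error_message = None
try:
    write_header(binary_buf, labels)
except TypeError as exc:
    error_message = str(exc)

print(repr(text_buf.getvalue()))
print(error_message)
```

If that assumption holds, comparing the installed pandas version against the one the Kedro release expects would be a reasonable next check.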
