beginners-need-help
  • a

    amos

    02/24/2022, 6:12 PM
    Okay, I’ll keep playing around, was just wondering if there was a good way of doing it. I’m currently inlining all of my yamls as gigantic dicts and it doesn’t quite feel right. Thanks for your help πŸ™‚
  • d

    datajoely

    02/24/2022, 6:13 PM
    I agree we need to find a better way!
  • b

    beats-like-a-helix

    02/25/2022, 3:01 PM
    Can anyone link me to a project that uses partitioned datasets? Looking for some general examples, would be much appreciated.
  • b

    beats-like-a-helix

    02/25/2022, 5:48 PM
    Following the contents of this video:

    https://www.youtube.com/watch?v=mPPLk4IKu_s

    I see this gentleman creates a "singular" and a "plural" node so that he essentially has two separate pipelines: one for processing a specified file and one for processing the entire partitioned dataset. Is this the recommended way to design all projects that involve many files of the same type and structure?
  • d

    datajoely

    02/25/2022, 5:50 PM
    Hi @User, it's one pattern that users have been successful with.
  • d

    datajoely

    02/25/2022, 5:50 PM
    Maybe it would be best for you to tell us a little about what you're trying to build and we can think it through together?
  • b

    beats-like-a-helix

    02/25/2022, 6:31 PM
    @User Thanks for responding. The project is nothing serious, just trying to re-work a past project using Kedro for learning purposes. In this case it's ~20 files of timeseries data of cepheid variable star brightness, where the objective is to calculate some best-fit parameters such as the period of pulsation, etc. No ML, maybe 10 functions in total. In my previous implementation of the project, I just had a main() function that looped over all the files in the directory. Many of my Astro projects are of a similar format to this, so I've been wondering what the best design choices are for a Kedro implementation.
  • d

    datajoely

    02/25/2022, 6:35 PM
    I'm with you, I think the DE1 pattern above makes a lot of sense. The only thing that has changed since then is the introduction of modular pipelines, which facilitate more powerful reuse of logic: https://kedro.readthedocs.io/en/0.17.7/06_nodes_and_pipelines/03_modular_pipelines.html
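
    (For illustration, a minimal sketch of the "plural" node pattern discussed above: the node receives a PartitionedDataSet as a dict mapping partition IDs to load callables and loops over them, much like the old main() loop. The function name and result columns below are hypothetical.)
    python
    from typing import Callable, Dict

    import pandas as pd


    def fit_all_light_curves(partitions: Dict[str, Callable[[], pd.DataFrame]]) -> pd.DataFrame:
        """Process every partition of a PartitionedDataSet lazily."""
        results = []
        for partition_id, load_partition in partitions.items():
            df = load_partition()  # each value is a callable that loads one file
            # placeholder "fit": just record how many samples each light curve has
            results.append({"file": partition_id, "n_samples": len(df)})
        return pd.DataFrame(results)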
  • b

    beats-like-a-helix

    02/25/2022, 6:39 PM
    Thanks for the advice! @User PS. Is DE1 still around? Looks like he dropped off the face of the earth sometime last year
  • d

    datajoely

    02/25/2022, 6:42 PM
    Unfortunately not, he's a bit too successful to keep doing this in his spare time! We're actually about to start hiring a full-time DevRel, which should help here.
  • b

    beats-like-a-helix

    02/25/2022, 6:47 PM
    Ah, I assumed he was part of the team or something. That's great!
  • w

    wulfcrona

    02/28/2022, 12:43 PM
    I have more of a conceptual question: for my latest project, one of the features is scraped from a React web app. To do this I need a path to the Chrome driver in addition to some Python libraries. What is the Kedro way to solve this? Should I just add them to the catalog and feed them to the nodes, or create a special folder and have the path in the parameters?
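
    (One way to read the "path in the parameters" option above: keep the driver path in conf/base/parameters.yml and inject it into the node with the params: prefix. The key name and scraping function below are hypothetical.)
    yml
    # conf/base/parameters.yml
    chromedriver_path: /usr/local/bin/chromedriver
    python
    # in the pipeline definition, parameters are passed with the "params:" prefix
    from kedro.pipeline import node

    node(
        func=scrape_features,  # hypothetical scraping function
        inputs="params:chromedriver_path",
        outputs="scraped_features",
    )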
  • l

    lbonini

    02/28/2022, 2:19 PM
    Hello people! Could someone suggest a simple way to persist a SQLQueryDataSet into parquet, with a parameter to switch between the persisted and non-persisted dataset? (Without duplicating the entries in catalog.yml)
  • d

    datajoely

    02/28/2022, 2:23 PM
    So you would have to create an output dataset that does the persisting - a simple node that accepts the data and then outputs to a new dataset that gets persisted. What you can do is an after_pipeline_created hook that replaces the dataset with a MemoryDataSet dynamically, based on the parameter or env variable.
  • l

    lbonini

    02/28/2022, 2:26 PM
    Thank you for your response @User! Do you have any code example or video that I can use to understand it better?
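
    (A rough sketch of the dataset-swap idea described above. This version uses the after_catalog_created hook, which receives the catalog object, and an environment variable as the toggle; the dataset name "queried_table" and the variable name are hypothetical, and a parameter could be used instead.)
    python
    import os

    from kedro.framework.hooks import hook_impl
    from kedro.io import MemoryDataSet


    class SwapToMemoryHooks:
        """Replace the persisted dataset with an in-memory one when persistence is turned off."""

        @hook_impl
        def after_catalog_created(self, catalog):
            # KEDRO_PERSIST_SQL=false -> keep the query result in memory only
            if os.environ.get("KEDRO_PERSIST_SQL", "true").lower() != "true":
                catalog.add("queried_table", MemoryDataSet(), replace=True)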
  • b

    beats-like-a-helix

    03/04/2022, 7:17 PM
    I'm looking at the documentation for MatplotlibWriter regarding saving a dictionary of plots: https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.matplotlib.MatplotlibWriter.html However, when the images are saved, they do not have a format, even if a format is specified in the save_args dict as "format": "". I reckon I'm misunderstanding something. Anyone got any advice?
  • b

    beats-like-a-helix

    03/04/2022, 8:32 PM
    Found a workaround, which is to specify the format in the dictionary key, but this doesn't feel right, as it's not what the documentation suggests:
    python
    import matplotlib.pyplot as plt
    from kedro.extras.datasets.matplotlib import MatplotlibWriter

    plots_dict = dict()
    for colour in ["blue", "green", "red"]:
        plots_dict[f"{colour}.pdf"] = plt.figure()
        plt.plot([1, 2, 3], [4, 5, 6], color=colour)
    plt.close("all")
    dict_plot_writer = MatplotlibWriter(
        filepath="matplotlib_dict",
        save_args={
            # "format": "pdf",
            "dpi": 300,
            "bbox_inches": "tight",
        },
    )
    dict_plot_writer.save(plots_dict)
  • p

    pypeaday

    03/04/2022, 9:37 PM
    Would a PartitionedDataSet made up of MatplotlibWriter datasets work/make sense? I'm admittedly totally unfamiliar with the MatplotlibWriter one but we used Partitioned and IncrementalDataSets kind of a lot and they're super nice
  • d

    desrame

    03/04/2022, 10:14 PM
    very new to kedro - the first time I ran kedro new, it generated a Project folder, with a repo folder inside, and a package under source... almost every time since then as I explore and learn, it seems to be generating only the repo and package level
  • d

    desrame

    03/04/2022, 10:15 PM
    will this end up causing any issues?
  • b

    beats-like-a-helix

    03/04/2022, 10:17 PM
    Ah, I didn't know that MatplotlibWriter could be treated as a dataset in catalog.yml! That makes my job easier, since I'm actually trying to create plots for each file in an existing PartitionedDataset. But I'm still experiencing the same problem of not having a file format by default!
  • b

    beats-like-a-helix

    03/04/2022, 10:31 PM
    Crisis averted, just had to specify things properly in catalog.yml. In my case:
    yml
    power_spectrum_figures:
      type: PartitionedDataSet
      path: data/07_model_output/figures
      dataset:
        type: matplotlib.MatplotlibWriter
        save_args:
          format: pdf
          dpi: 300
          bbox_inches: tight
      filename_suffix: ".pdf"
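
    (For completeness, a node feeding a catalog entry like this would typically return a dict of figures keyed by the desired file names; the PartitionedDataSet then saves each value with a MatplotlibWriter and appends the filename_suffix. A rough sketch, assuming two-column time-series partitions; all names are hypothetical.)
    python
    import matplotlib.pyplot as plt


    def plot_power_spectra(partitions):
        """Return one figure per input partition; dict keys become the output file names."""
        figures = {}
        for partition_id, load_partition in partitions.items():
            df = load_partition()
            fig, ax = plt.subplots()
            ax.plot(df.iloc[:, 0], df.iloc[:, 1])  # assumes frequency vs. power columns
            ax.set_title(partition_id)
            figures[partition_id] = fig
            plt.close(fig)
        return figures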
  • b

    beats-like-a-helix

    03/04/2022, 10:37 PM
    The top level of a new project should always look something like this, I believe:
    sh
    .
    β”œβ”€β”€ README.md
    β”œβ”€β”€ conf
    β”œβ”€β”€ data
    β”œβ”€β”€ docs
    β”œβ”€β”€ info.log
    β”œβ”€β”€ logs
    β”œβ”€β”€ notebooks
    β”œβ”€β”€ pyproject.toml
    β”œβ”€β”€ setup.cfg
    └── src
  • b

    beats-like-a-helix

    03/04/2022, 10:42 PM
    Another general question, what is the accepted directory in which to place any generated figures? Do they "belong" in one of the later data layer folders, or should I just create a new damn folder? Not that it matters much, but I'm trying to learn the jedi way
  • d

    desrame

    03/04/2022, 10:44 PM
    oh cool, i must have mis-remembered something when i populated my first project
  • d

    desrame

    03/04/2022, 10:45 PM
    thanks beats πŸ™‚
  • d

    desrame

    03/05/2022, 3:25 AM
    Another newb question: I'm running into the following error, and after reading the docs I'm not sure what catalog and credentials mismatch exists.
  • d

    desrame

    03/05/2022, 3:26 AM
    KeyError: "Unable to find credentials 'sql1': check your data catalog and credentials configuration."
  • d

    desrame

    03/05/2022, 3:28 AM
    # catalog.yml definition
    lstm_base:
        type: pandas.SQLTableDataSet
        table_name: sometable
        credentials: sql1
    
    # credentials.yml in local
    sql1:
        con: mssql+pyodbc:///?odbc_connect=DRIVER={ODBC+Driver+17+for+SQL+Server};SERVER=someserver;DATABASE=somedatabase;UID=someuser;PWD=somepwd
    
    # .py version working in .py script
    conn = \
        'DRIVER={ODBC Driver 17 for SQL Server};SERVER=someserver;DATABASE=somedatabase;UID=someuser;PWD=somepassword'
    quoted = quote_plus(conn)
    new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)
    engine = create_engine(new_con, fast_executemany=True, connect_args={'timeout': 100})
  • d

    desrame

    03/05/2022, 3:28 AM
    based on the docs, it feels like catalog.yml should be able to reference the credentials in credentials.yml