Does the new catalog lazy loading somehow hit the ...
# beginners-need-help
w
Does the new catalog lazy loading somehow hit the describe functionality of a dataset now, prior to the dataset being initialized?
I have the following custom dataset:
d
I’m not entirely sure what you’re looking to achieve
Also you can use the regular SQL datasets in Kedro with snowflake as long as you install https://pypi.org/project/snowflake-sqlalchemy/
w
ya still trying to type everything out
the issue is the _describe function is being hit during the initialization prior to the self.sql = sql
d
And is the problem speed or something
w
No, it's that the run fails with a catalog load error because the dataset cannot be instantiated.
d
Oh got you
I mean you can return an empty dictionary in the describe method
But I would encourage you to use the regular pandas.SQLQueryDataSet with the snowflake engine library, as it's tested
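A minimal sketch of the "return an empty dictionary" idea, assuming the attribute may not exist yet when _describe runs. QueryDataSet is a hypothetical stand-in, not a real Kedro class, and the half-built instance is created by hand only to simulate the failure case.

```python
class QueryDataSet:
    """Hypothetical dataset used for illustration; not part of Kedro."""

    def __init__(self, sql):
        self.sql = sql

    def _describe(self):
        # getattr with a default avoids AttributeError when _describe is
        # called before __init__ has assigned self.sql.
        sql = getattr(self, "sql", None)
        return {} if sql is None else {"sql": sql}


# Simulate an instance whose __init__ has not run yet.
half_built = QueryDataSet.__new__(QueryDataSet)
empty_description = half_built._describe()
full_description = QueryDataSet("SELECT 1")._describe()
print(empty_description, full_description)
```

This hides the ordering problem rather than fixing it, so it is more of a stopgap than a real solution.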
w
@User we basically are doing what you are saying, but just managing credentials inside of a singleton in another package. Regardless, I am baffled as to why _describe is being hit prior to __init__, or even how that works.
d
Yeah it’s happening somewhere as the catalog is built up
w
Also, note this worked until upgrading from 0.17.2 to 0.17.5.
d
I need to finish for the day - but I will look into this first thing
w
Sounds good! Thanks for the help @User!
I found the problem. super() is hitting the AbstractDataSet class, which now has the following definition for __str__, which calls self._describe() and gets hit upon creation.
This seems a bit dangerous to me, given that AbstractDataSet is kinda designed to be inherited in the same way my dataset is created. Anyone including the input params in the same way that I did (super first, then self.x being created) will hit this same error.
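A rough reproduction of the failure described above, as a sketch only. Both base classes here are simplified stand-ins, not the real Kedro or Snowflake code, and it is an assumption that something in the parent constructor ends up calling str() on the instance.

```python
class AbstractDataSet:
    """Simplified stand-in for Kedro's AbstractDataSet (not the real class)."""

    def _describe(self):
        raise NotImplementedError

    def __str__(self):
        # Mirrors the behaviour described above: __str__ relies on _describe().
        return f"{type(self).__name__}({self._describe()})"


class Snowflake(AbstractDataSet):
    """Hypothetical parent; assumed to stringify the instance in __init__."""

    def __init__(self, credentials=None):
        self.credentials = credentials
        self.label = str(self)  # triggers __str__, and therefore _describe()


class SnowflakeQueryDataSet(Snowflake):
    def __init__(self, sql, credentials=None):
        super().__init__(credentials)  # _describe() runs in here...
        self.sql = sql  # ...before this attribute exists

    def _describe(self):
        return {"sql": self.sql}


def try_build():
    """Return the error message if construction fails, else None."""
    try:
        SnowflakeQueryDataSet("SELECT 1")
    except AttributeError as exc:
        return str(exc)
    return None


error = try_build()
print(error)
```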
l
Hi @User! AbstractDataSet doesn't have a constructor, so when extending it you don't need to call super().__init__. I assume you're trying to pass these to Snowflake, does that call super on AbstractDataSet or sth? I'm not sure why you'd hit str() before anything else happens, can you show us your Snowflake class?
Or I'd also make use of mro() for debugging when using multiple inheritance like here.
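For reference, a small sketch of what checking mro() looks like. The three classes are empty placeholders with assumed names and an assumed inheritance order, only there to show how the method resolution order is listed.

```python
class AbstractDataSet:
    """Empty placeholder, not the real Kedro class."""


class Snowflake:
    """Empty placeholder for the credentials-handling parent."""


class SnowflakeQueryDataSet(Snowflake, AbstractDataSet):
    """Hypothetical dataset inheriting from both parents."""


# mro() lists the classes in the order Python searches them for attributes,
# which is also the order super() walks through.
names = [cls.__name__ for cls in SnowflakeQueryDataSet.mro()]
print(names)
```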
w
@User, I am not entirely following what you are saying. From what I can tell, a custom dataset has to inherit from the abstract class so that the catalog can build it; if you remove AbstractDataSet from the inheritance, it no longer functions. __str__ is called, like __init__, during the instantiation of the class. So what is happening is that my class defines _describe with an element which is not defined yet, as it is defined after the super() call. Simply moving super() down resolves it. I suppose a viable solution would be to focus super() on the Snowflake class, but again I suspect a lot of people will run into this same issue with how it is currently set up.
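A sketch of the "move super() down" workaround mentioned above, under the same assumptions as before: the base classes are simplified stand-ins rather than the real Kedro or Snowflake code, and the parent constructor is assumed to stringify the instance.

```python
class AbstractDataSet:
    """Simplified stand-in for Kedro's AbstractDataSet (not the real class)."""

    def _describe(self):
        raise NotImplementedError

    def __str__(self):
        return f"{type(self).__name__}({self._describe()})"


class Snowflake(AbstractDataSet):
    """Hypothetical parent; assumed to stringify the instance in __init__."""

    def __init__(self, credentials=None):
        self.credentials = credentials
        self.label = str(self)


class SnowflakeQueryDataSet(Snowflake):
    def __init__(self, sql, credentials=None):
        # Assign everything _describe() needs first, then call the parent.
        self.sql = sql
        super().__init__(credentials)

    def _describe(self):
        return {"sql": self.sql}


dataset = SnowflakeQueryDataSet("SELECT 1")
print(dataset)
```

With the attribute set before the parent constructor runs, _describe() has what it needs whenever str() is called.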
d
Can we see the Snowflake class to make sense of what's going on?
w
@User There is more to these classes, but I have shortened them to what you would care about.
d
Thanks @User we will get back to you shortly