brewski
01/03/2022, 10:45 AMdatajoely
01/03/2022, 4:13 PMbrewski
01/04/2022, 2:08 PM@project_root>> ls conf/local/credentials/ client-cert.pem
client-key.pem server-ca.pem
credentials.yml
datajoely
01/04/2022, 2:08 PMbrewski
01/04/2022, 2:08 PMdatajoely
01/04/2022, 2:08 PMbrewski
01/04/2022, 2:22 PM(head -n1 && tail -n1) <client-key.pem -----BEGIN RSA PRIVATE KEY-----
<BASE64 ENCODED KEY HERE>
-----END RSA PRIVATE KEY-----%
(head -n1 && tail -n1) <client-cert.pem -----BEGIN CERTIFICATE-----
<MORE BASE64>
-----END CERTIFICATE-----%
(head -n1 && tail -n1) <server-ca.pem
-----BEGIN CERTIFICATE-----
<BASE64 AGAIN>
-----END CERTIFICATE-----%
datajoely
01/04/2022, 2:26 PMbrewski
01/04/2022, 2:26 PMfrom sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
import re
from sqlalchemy.sql.sqltypes import Text
from sqlalchemy import or_
global_link = "postgresql://postgres:passwd@ip/user"
ROOT = "/absolute/path/to/project/root"
engine = create_engine(
global_link,
connect_args={
"sslmode": "verify-ca",
"sslrootcert": f"{ROOT}/conf/local/server-ca.pem",
"sslcert": f"{ROOT}/conf/local/client-cert.pem",
"sslkey": f"{ROOT}/conf/local/client-key.pem",
},
)
Base = automap_base()
Base.prepare(engine, reflect=True)
session = Session(engine)
datajoely
01/04/2022, 2:27 PMbrewski
01/04/2022, 2:27 PMdatajoely
01/04/2022, 2:27 PMpd.read_sql_table
functionload()
methodpd.read_sql_table
method working outside of Kedro?brewski
01/04/2022, 2:31 PMdatajoely
01/04/2022, 2:32 PMbrewski
01/04/2022, 2:32 PMdatajoely
01/04/2022, 2:33 PMbrewski
01/04/2022, 2:33 PMpd.read_sql(query.statement, query.session.bind)
for any query via sqlalchemydatajoely
01/04/2022, 2:36 PMssl
librarybrewski
01/04/2022, 2:37 PMdatajoely
01/04/2022, 2:38 PMbrewski
01/04/2022, 2:39 PMdatajoely
01/04/2022, 2:40 PMbrewski
01/04/2022, 2:41 PMdatajoely
01/04/2022, 2:41 PMbrewski
01/04/2022, 2:42 PMantony.milne
01/04/2022, 3:27 PMSQLTableDataSet
and SQLQueryDataSet
are wrappers for pandas pd.read_sql_table(**load_args)
and pd.read_sql_query(**load_args)
. These call create_engine
at some point (which is why we don't explicitly in the kedro datasets).
These pandas functions take an argument con
, which as well being a string can be a SQLAlchemy connection https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html
Hence what you need to do is inject into load_args["con"]
the appropriate connection object. This would require a custom dataset but should be easy to implement. Subclass SQLQueryDataSet
and then do something like this:# catalog.yml
dataset_name:
type: path.to.your.CustomSQLQueryDataSet
credentials: ssl_credentials
# credentials.yml
ssl_credentials:
url: postgresql://postgres:passwd@ip/user
connect_args:
sslmode: verify-ca
sslrootcert: client-cert.pem
...
SQLQueryDataSet.__init__
currently does
self._load_args["con"] = self._save_args["con"] = credentials["con"]
Your custom one would instead need to do something like
._load_args["con"] = self._save_args["con"] = create_engine(**credentials)
sslrootcert:/absolute/path/to/project/root/conf/local/credentials/client-cert.pem
as you do in your example. The problem with this is that it's a bit hacky because:
1. it hard codes the environment to local
. This is where credentials should always reside anyway so not such a huge problem, just a bit ugly
2. it hard codes your project path in. This you can easily get around it by just specifying a relative path in your credentials yaml file and doing some path manipulation with something like pathlib.Path.cwd()
in your custom datasetssl
library instead of using kedro's credentials.yml file - this avoids the above dirtiness with filepathsbrewski
01/10/2022, 12:11 AMcreate_engine
to accept "ssl_context" as a arg for "con". I'm on sqlalchemy v1.4.29 and I'm getting a programming error. Could I get some pointers with regards to this?antony.milne
01/10/2022, 7:41 AM