datajoely
12/02/2021, 4:09 PM

datajoely
12/02/2021, 4:09 PM

Piesky
12/02/2021, 4:31 PM
info.log is not created until program termination, whether successful or not. It looks like the StreamHandler is flushed to this file at the end, maybe?

datajoely
12/02/2021, 4:33 PM

Piesky
12/02/2021, 4:34 PM

antony.milne
12/02/2021, 5:02 PM
When you run a kedro command, **before conf/base/logging.yml is read**, kedro sets some default logging according to this config: https://github.com/quantumblacklabs/kedro/blob/master/kedro/config/logging.yml. As you can see this includes info_file_handler, which is what writes to info.log.
If you're surprised by that, you're not the only one. I only just realised this a couple of weeks ago and I don't think anyone else on the team was aware of it either! See https://github.com/quantumblacklabs/kedro/pull/1024

datajoely
12/02/2021, 5:02 PM

antony.milne
12/02/2021, 5:07 PM
antony.milne
12/02/2021, 6:19 PM
One way to change that early logging setup is the kedro.init entrypoint, which means you'll need to make a pip-installable plugin.
Here's a minimal example: https://github.com/AntonyMilneQB/kedro-disable-logging
You can install it with pip install git+https://github.com/AntonyMilneQB/kedro-disable-logging.git
This example is very brute force in that it calls logging.disable. You can definitely make it less aggressive and instead just remove the handlers you don't want in plugin.disable_logging.
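A less aggressive variant of that hook might remove only the offending file handler rather than disabling logging wholesale. A rough stdlib-only sketch (the function name, the logger names, and the handler-matching logic are assumptions, not the plugin's actual code):

```python
import logging


def remove_info_file_handlers(logger_names=("", "kedro", "anyconfig")) -> int:
    """Remove any FileHandler writing to info.log from the given loggers.

    Returns the number of handlers removed.
    """
    removed = 0
    for name in logger_names:
        logger = logging.getLogger(name)
        for handler in list(logger.handlers):
            if isinstance(handler, logging.FileHandler) and handler.baseFilename.endswith("info.log"):
                logger.removeHandler(handler)
                handler.close()
                removed += 1
    return removed
```

Called from a plugin's kedro.init entrypoint, something like this would leave console logging intact while stopping info.log from being written.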
Ian Whalen
12/03/2021, 12:12 AM
I'm using APIDataSet and am having some issues getting the auth keyword argument working.
My definition looks like this:
```yaml
my_api:
  type: api.APIDataSet
  url: ${API_URL}
  auth:
    - "${USERNAME}"
    - "${PASSWORD}"
```
but requests expects auth to be a tuple or HTTPBasicAuth. Sending in a list like this gives back 'list' object is not callable.
I also tried giving auth !!python/tuple ["${USERNAME}, ${PASSWORD}"] but no dice there either, since pyyaml is using the safe loader and tuples aren't allowed.
Any ideas?
datajoely
12/03/2021, 8:58 AM

antony.milne
12/03/2021, 10:32 AM
You could make your own subclass of APIDataSet that handles that. If you don't want to subclass it then you could hack together a hook that converts dataset._request_args["auth"] to a tuple.
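The conversion such a hook would need is tiny. A sketch of the idea (coerce_auth is a hypothetical helper; a real hook would apply it to each dataset's _request_args after the catalog is created):

```python
def coerce_auth(request_args: dict) -> dict:
    """Convert a YAML-loaded list under 'auth' into the tuple requests expects.

    Other auth values (None, HTTPBasicAuth instances, ...) pass through untouched.
    """
    auth = request_args.get("auth")
    if isinstance(auth, list):
        request_args["auth"] = tuple(auth)
    return request_args
```

For example, coerce_auth({"auth": ["me", "pw"]}) returns {"auth": ("me", "pw")}.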
antony.milne
12/03/2021, 10:39 AM
```python
from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline

# func is any node function defined elsewhere
base_pipeline = Pipeline([node(func, "input_data", "output_data")])
# in reality base_pipeline would have many nodes

all_pipelines = {}
for year in range(2020, 2030):
    all_pipelines[f"year_{year}"] = pipeline(
        base_pipeline,
        outputs={"output_data": f"year_{year+1}.input_data"},
        namespace=f"year_{year}",
    )
all_pipelines["all_years"] = sum(all_pipelines.values())
```
dmb23
12/03/2021, 12:58 PM

antony.milne
12/03/2021, 2:22 PM
* (with ConfigLoader there's no reason to be restricted to yaml if you don't want to)
* environment variables are also quite good for this sort of thing, e.g. you'd do YEAR_START=2020 YEAR_END=2030 kedro run and then in the pipeline_registry you'd do range(int(os.getenv("YEAR_START")), int(os.getenv("YEAR_END")))
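Note that os.getenv returns strings (or None), so the registry code needs int casts and probably defaults. A hedged sketch of what pipeline_registry.py might do (the function name and default values are illustrative, not from the thread):

```python
import os


def year_range() -> range:
    """Read the year span from environment variables, with fallbacks."""
    start = int(os.getenv("YEAR_START", "2020"))
    end = int(os.getenv("YEAR_END", "2030"))
    return range(start, end)
```

The loop that builds the namespaced pipelines would then iterate over year_range() instead of a hard-coded range.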
dmb23
12/03/2021, 2:28 PM

dmb23
12/03/2021, 2:29 PM

antony.milne
12/03/2021, 2:41 PM
Ian Whalen
12/03/2021, 6:33 PM
```yaml
api:
  type: MyAPIDataSet
  url: ...
  auth:
    - ${username}
    - ${password}
```
where my credentials has:
```yaml
username: me
password: my_password
```
I also tried:
catalog.yml
```yaml
api:
  type: MyAPIDataSet
  url: ...
  auth: my_auth
```
credentials.yml
```yaml
my_auth:
  - me
  - my_password
```
Am I not understanding how credentials.yml is supposed to work? Or is it just wonky when working with APIDataSet?
Ian Whalen
12/03/2021, 7:07 PM
It turns out APIDataSet doesn't handle credentials keys. I fixed this by adding a credentials kwarg to my child class.
Here's my class if anyone is interested:
```python
from typing import Any, Dict, Iterable, List, Union

from requests.auth import AuthBase

from kedro.extras.datasets.api import APIDataSet


class AuthorizableAPIDataSet(APIDataSet):
    def __init__(
        self,
        url: str,
        method: str = "GET",
        data: Any = None,
        params: Dict[str, Any] = None,
        headers: Dict[str, Any] = None,
        auth: Union[Iterable[str], AuthBase] = None,
        json: Union[List, Dict[str, Any]] = None,
        timeout: int = 60,
        credentials: Union[Iterable[str], AuthBase] = None,
    ) -> None:
        if credentials is not None and auth is not None:
            raise ValueError("Cannot specify both auth and credentials.")
        auth = credentials or auth
        if isinstance(auth, Iterable):
            auth = tuple(auth)
        super().__init__(
            url=url,
            method=method,
            data=data,
            params=params,
            headers=headers,
            auth=auth,
            json=json,
            timeout=timeout,
        )
```
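With that class on the project's path, the second catalog attempt above should then work, since the catalog injects the matching credentials.yml entry into the credentials kwarg. A sketch (the import path is an assumption):

```yaml
# catalog.yml
api:
  type: my_project.extras.datasets.AuthorizableAPIDataSet
  url: https://example.com/endpoint
  credentials: my_auth

# credentials.yml
my_auth:
  - me
  - my_password
```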
T.Komikado
12/06/2021, 6:57 AM

antony.milne
12/06/2021, 9:42 AM
There's going to be an overwrite option for PartitionedDataSet which does exactly that. By default this will be set to false (current behaviour), but if you set it to true in your catalog.yml file it will delete all the old files before saving new data.

antony.milne
12/06/2021, 9:43 AM
You could make your own MyPartitionedDataSet class in the meantime, which has this code in it: https://github.com/quantumblacklabs/kedro/blob/ae80b129cb4f1973a554af964d48d8af0c355bb9/kedro/io/partitioned_data_set.py

antony.milne
12/06/2021, 9:43 AM

T.Komikado
12/06/2021, 11:25 AM
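Once that overwrite option is available, the catalog entry might look like this (a sketch; the path and wrapped dataset are illustrative, and availability depends on the kedro version):

```yaml
my_partitions:
  type: PartitionedDataSet
  path: data/07_model_output/partitions
  dataset: pandas.CSVDataSet
  overwrite: true  # delete existing partitions before each save
```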
Isaac89
12/09/2021, 2:49 PM

datajoely
12/09/2021, 3:02 PM
We expose _fs_args in our read/write options.

datajoely
12/09/2021, 3:02 PM
self._fs = fsspec.filesystem(self._protocol, **_credentials, **_fs_args)

datajoely
12/09/2021, 3:13 PM

datajoely
12/09/2021, 3:13 PM
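In practice that means storage-specific options flow from the catalog entry straight into fsspec.filesystem. A sketch of an S3-backed entry (the bucket, credentials name, and options are illustrative):

```yaml
my_data:
  type: pandas.CSVDataSet
  filepath: s3://my-bucket/data.csv
  credentials: s3_creds
  fs_args:
    anon: false
```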