# beginners-need-help
v
Hi! I previously asked about parameterizing the data catalog using an environment variable for the location of a data file: https://discord.com/channels/778216384475693066/846330075535769601/953594739008077825 . I solved that by adding an env var to the globals_dict in hooks.py and then accessing it in data_catalog.yml. Similarly, I'd now like to access metadata about the file in a node that processes the data. So something like:
[...]
node(
    name="process",
    func=process_fcn,
    inputs=dict(
        df="data_at_filepath",
        df_info="${DATA_INFO}",
    ),
    [...]
)
where `DATA_INFO` would be an environment variable. However, AFAICT I can't inject an environment variable like this, since the globals dict is not available here (?). The two solutions I see are 1) just using `os.getenv` inside the function `process_fcn`, or 2) making the data info a parameter instead, referring to it as `params:data_info`, and passing it in via `kedro run --params data_info:<something>`. Or is there a better way? This looks pretty similar to what I'm asking about: https://github.com/kedro-org/kedro/issues/1076
d
Hi @User, yes, the issue you raise is part of a bigger piece of work that we need to think through before releasing.
In terms of accessing this sort of information, I think hooks may get you what you need.
If you really want to, you can get hooks to update the catalog too.
v
Ok, great. Then I know I'm not missing something (like last time :)).
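For reference, here is a minimal sketch of workaround 1 from the question (reading the environment variable inside the node function), plus a variant that binds the value once when the pipeline is built. It is only an illustration under assumptions: the body of `process_fcn`, the helper names `process_fcn_env` and `bind_env`, and the `"unknown"` default are hypothetical, and plain lists of dicts stand in for the real dataset so the snippet runs without Kedro installed.

```python
import os
from functools import partial, update_wrapper


def process_fcn(df, df_info):
    """Hypothetical processing step: tag every row with the metadata."""
    return [{**row, "info": df_info} for row in df]


def process_fcn_env(df):
    """Workaround 1: look up DATA_INFO when the node actually runs."""
    # "unknown" is an assumed fallback for when the variable is unset.
    return process_fcn(df, os.getenv("DATA_INFO", "unknown"))


def bind_env(func, var="DATA_INFO", default="unknown"):
    """Variant: read the variable once, at pipeline-construction time."""
    bound = partial(func, df_info=os.getenv(var, default))
    # Copy __name__ and __doc__ over so the bound function stays identifiable.
    return update_wrapper(bound, func)
```

Either callable could then be passed as `func` to `node(...)` with only `data_at_filepath` as input. For workaround 2, `process_fcn` stays as it is and the node inputs become `dict(df="data_at_filepath", df_info="params:data_info")`, with the value supplied via `kedro run --params data_info:<something>`.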