bgereke
05/25/2022, 12:23 AM

datajoely
05/25/2022, 8:56 AM
kedro run --pipeline initial_pipeline --env local_cluster
and then
kedro run --pipeline second_pipeline --env emr_cluster
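
For context, each --env flag selects a Kedro run environment, i.e. a directory under conf/ whose files override conf/base. A layout along these lines (directory and file names are purely illustrative) keeps the cluster-specific Spark settings apart:

conf/base/spark.yml           # shared Spark defaults
conf/local_cluster/spark.yml  # e.g. spark.master: local[*]
conf/emr_cluster/spark.yml    # e.g. spark.master: yarn, plus EMR-specific options
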
antony.milne
05/25/2022, 9:29 AM
Now that we have the after_context_created hook I wonder if there's a better way of doing it... Here's a rough demo of how you could do it (sketched below): https://gist.github.com/AntonyMilneQB/792a748b0d921e2f9f78cc7dd9c13c97.
The advantages of this are:
* no need for a custom KedroContext at all, since all the spark stuff is done in hooks
* you can still use run environments as you currently do, no need to create a separate run environment for each spark config (although you still can do so if you like)
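
The hook in the gist is roughly along these lines. This is a minimal sketch, not a verbatim copy of the gist, assuming the Spark options live in a spark.yml picked up from the active run environment and that the class is registered in HOOKS in settings.py:

from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # Load spark.yml from the active environment (e.g. conf/emr_cluster),
        # falling back to conf/base as usual.
        parameters = context.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Build the session once here; nodes can later retrieve it with
        # SparkSession.builder.getOrCreate().
        SparkSession.builder.appName(context.project_path.name).config(
            conf=spark_conf
        ).getOrCreate()
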
There's also an open issue about how KedroContext is used for spark initialisation. See https://github.com/kedro-org/kedro/issues/1563

bgereke
05/25/2022, 7:32 PM

datajoely
05/25/2022, 7:37 PM

antony.milne
05/25/2022, 8:15 PM

bgereke
05/25/2022, 10:08 PM

antony.milne
05/25/2022, 10:17 PM
spark-submit --conf spark.hadoop.fs.s3.canned.acl=... --conf spark.sql.adaptive.enabled=...
bgereke
05/25/2022, 10:26 PM

antony.milne
05/25/2022, 10:34 PM
spark-submit feels very awkward to me and I'm surprised there's no --config-file option already where you can input some file in a standardised format of key-value pairs. Like doesn't your spark-submit command get huge if you want to specify 100 options? Or does that never really happen?
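
For what it's worth, spark-submit does have a --properties-file flag that loads extra settings from a file of key=value pairs (when omitted it looks for conf/spark-defaults.conf), which keeps the command itself short. A sketch, with hypothetical file and script names:

# emr.properties:
#   spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl
#   spark.sql.adaptive.enabled=true
spark-submit --properties-file emr.properties entrypoint.py
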
bgereke
05/25/2022, 11:47 PM

antony.milne
05/26/2022, 3:13 PM