datajoely
10/25/2022, 6:24 PM

datajoely
10/25/2022, 6:24 PM

datajoely
10/25/2022, 6:25 PM

Thiago Poletto
10/25/2022, 6:35 PM

datajoely
10/25/2022, 6:36 PM

datajoely
10/25/2022, 6:36 PM

Thiago Poletto
10/25/2022, 6:43 PM

Thiago Poletto
10/25/2022, 6:44 PM

rafael.gildin
10/25/2022, 7:05 PM

datajoely
10/25/2022, 7:37 PM

datajoely
10/25/2022, 7:38 PM

Thiago Poletto
10/25/2022, 7:43 PM

datajoely
10/25/2022, 7:59 PM

datajoely
10/25/2022, 7:59 PM

Thiago Poletto
10/25/2022, 8:01 PM

Thiago Poletto
10/25/2022, 8:04 PM

Seth
10/28/2022, 12:59 PM

noklam
10/28/2022, 1:24 PM
package.common or package.utils module. Reusing a function in multiple nodes is not uncommon.
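As a minimal sketch of noklam's suggestion (the module path, function, and node names below are hypothetical, not from the thread): define the helper once in a shared module such as src/my_package/utils.py, then import it from every node function that needs it.

```python
# Hypothetical src/my_package/utils.py -- a helper shared by several nodes.
def normalize_keys(record: dict) -> dict:
    """Lower-case and strip whitespace from every key of a record."""
    return {key.strip().lower(): value for key, value in record.items()}


# Hypothetical src/my_package/pipelines/nodes.py -- two node functions
# reuse the same helper instead of duplicating it.
# In a real project this file would start with:
#   from my_package.utils import normalize_keys
def preprocess_customers(records: list[dict]) -> list[dict]:
    return [normalize_keys(r) for r in records]


def preprocess_orders(records: list[dict]) -> list[dict]:
    return [normalize_keys(r) for r in records]
```

Both functions can then be wrapped as ordinary Kedro nodes; the shared logic lives in exactly one place.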
Seth
10/28/2022, 1:25 PM
Vici
11/01/2022, 8:59 AM

datajoely
11/01/2022, 9:00 AM

Vici
11/01/2022, 10:52 AM
PYDEV DEBUGGER WARNING:
sys.settrace() should not be used when the debugger is being used.
This may cause the debugger to stop working correctly.
If this is needed, please check:
http://pydev.blogspot.com/2007/06/why-cant-pydev-debugger-work-with.html
to see how to restore the debug tracing back correctly.
Call Location:
File "c:\Users\my_project_directory\env\lib\site-packages\coverage\collector.py", line 292, in _installation_trace
sys.settrace(None)
I'm in the process of googling this... A bit confused, though, about why this problem shows up.
datajoely
11/01/2022, 10:57 AM
Vici
11/01/2022, 12:37 PM
[tool.pytest.ini_options]
addopts=...
, it works like a charm 🥰
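For context on the exchange above: the warning is raised because coverage.py installs its own trace function via sys.settrace(), which clashes with the trace function the PyDev/VS Code debugger relies on. One common workaround (a sketch only; not necessarily the exact addopts value Vici used, which is elided in the log) is to disable the pytest-cov plugin for debug runs:

```toml
# pyproject.toml (sketch): turn coverage off so pytest-cov never calls
# sys.settrace() and the debugger keeps its own trace function installed.
# "--no-cov" is provided by the pytest-cov plugin.
[tool.pytest.ini_options]
addopts = "--no-cov"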
datajoely
11/01/2022, 1:49 PM

datajoely
11/01/2022, 1:50 PM

filpa
11/03/2022, 3:29 PM
dask.ParquetDataSet from s3 -> MemoryDataSet -> dask.ParquetDataSet to s3
I run this pipeline from my local workstation for testing purposes. My Dask cluster is deployed on AWS EC2 (scheduler + workers), and they communicate privately. I noticed that in the last node, the MemoryDataSet -> dask.ParquetDataSet to s3 step causes the data to be transferred to my local machine, where the Kedro pipeline is being run, and then transferred back to S3. Needless to say, this introduces cost and lag, and is not what I intended.
Can I tell the workers to write this data directly to S3? If not, what is the intended way to do this? I read through the documentation, and there is some very good information on running the pipeline as either Step Functions (https://kedro.readthedocs.io/en/stable/deployment/aws_step_functions.html) or on AWS Batch (https://kedro.readthedocs.io/en/stable/deployment/aws_batch.html), but this is not quite the deployment flow I had in mind. Is it intended for the pipeline to be run on the same infrastructure where the workers are deployed?
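One way to keep the write on the cluster side (a sketch only; dataset and bucket names are placeholders, and this assumes a Kedro version where dask.ParquetDataSet is available): persist the intermediate result as a dask.ParquetDataSet on S3 instead of holding it in a MemoryDataSet, so that saving is performed by Dask's own distributed Parquet writer on the workers rather than by materializing the data on the driver machine.

```yaml
# catalog.yml (sketch): replace the MemoryDataSet intermediate with a
# persisted dask.ParquetDataSet so the workers write partitions to S3
# directly, instead of routing the data through the local driver.
intermediate_output:                               # placeholder dataset name
  type: dask.ParquetDataSet
  filepath: s3://your-bucket/intermediate_output   # placeholder bucket
  save_args:
    write_index: false    # forwarded to dask's to_parquet (assumption)
```

With the intermediate persisted this way, only the task graph and metadata travel between the workstation and the cluster; the partition data itself stays on AWS.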
filpa
11/03/2022, 3:31 PM

AVallarino
11/06/2022, 5:41 PM

datajoely
11/06/2022, 5:43 PM