datajoely
05/21/2022, 3:04 AM
RRoger
05/21/2022, 6:16 AM
output to a list of length 2000, i.e. ["senate_2006-03-30", "senate_2006-03-31", ...], i.e. a 2000-line pipeline.py? Or is there some sort of clever templating?
datajoely
05/21/2022, 6:35 AM
RRoger
05/21/2022, 11:46 AM
Mackson
05/24/2022, 12:37 AM
datajoely
05/24/2022, 1:10 AM
Mackson
05/24/2022, 8:37 AM
Mackson
05/24/2022, 8:38 AM
datajoely
05/24/2022, 8:50 AM
Mackson
05/24/2022, 8:51 AM
Mackson
05/24/2022, 8:56 AM
noklam
05/24/2022, 10:02 AM
Mackson
05/24/2022, 10:54 AM
Mackson
05/24/2022, 10:55 AM
noklam
05/24/2022, 11:00 AM
noklam
05/24/2022, 11:02 AM
map, but a for loop is fine too.
noklam
05/24/2022, 11:04 AM
pandas is that it is memory hungry, especially during I/O and certain operations. Using the chunk args helps to mitigate this problem by only loading & processing a small batch of data at a time and stitching the batches together at the end.
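For illustration, a minimal sketch of that chunked approach, assuming the chunk args refer to the chunksize parameter of pandas.read_csv. The in-memory CSV, the column name value, and the process function are made up for this example:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))


def process(chunk: pd.DataFrame) -> pd.DataFrame:
    # Placeholder transformation; swap in the real per-chunk logic.
    return chunk[chunk["value"] % 2 == 0]


# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one small batch is loaded and processed at a time.
processed = [process(chunk) for chunk in pd.read_csv(csv_data, chunksize=4)]

# Stitch the processed batches back together at the end.
result = pd.concat(processed, ignore_index=True)
```

This only saves memory if each processed batch is smaller than the raw batch, or is written out instead of being collected in a list.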
If the new dataset already iterates through the entire dataset before you start applying any transformation logic, then it doesn't help your memory problem.
noklam
05/24/2022, 11:18 AM
datajoely
05/24/2022, 11:22 AM
Mackson
05/24/2022, 11:24 AM
datajoely
05/24/2022, 11:25 AM
datajoely
05/24/2022, 11:25 AM
noklam
05/24/2022, 11:29 AM
Mackson
05/24/2022, 11:30 AM
noklam
05/24/2022, 11:31 AM
Mackson
05/24/2022, 11:32 AM
datajoely
05/24/2022, 2:32 PM
noklam
05/24/2022, 2:44 PM
datajoely
05/24/2022, 2:46 PM
noklam
05/24/2022, 2:54 PM