noklam
04/26/2022, 11:00 AM
yaml
from_nodes: node1,node2
Rafał
04/26/2022, 11:00 AM
from_inputs:
- vision_test_bunch
yields Pipeline does not contain data_sets named ["['vision_test_bunch']"]
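The doubled quoting in that error looks like what you would get if a YAML list were turned into a string before being split on commas. A minimal sketch of that reading, using a hypothetical `split_names` helper (an assumption for illustration, not Kedro's actual code):

```python
from typing import List


def split_names(value: str) -> List[str]:
    """Hypothetical helper: split a comma-separated option string into names."""
    return [item.strip() for item in value.split(",") if item.strip()]


# A comma-separated string, as in `from_nodes: node1,node2`, splits cleanly.
as_string = split_names("node1,node2")

# A YAML list arrives as a Python list; if it is coerced to a string first,
# the whole repr is treated as a single (bogus) dataset name.
as_coerced_list = split_names(str(["vision_test_bunch"]))

print(as_string)
print(as_coerced_list)
```

If this is indeed what is happening, writing the value as a plain string (`from_inputs: vision_test_bunch`) rather than a YAML list may avoid the error.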
Rafał
04/26/2022, 11:01 AM
Rafał
04/26/2022, 11:01 AM
params section can be a YAML dict and not a string
Rafał
04/26/2022, 11:03 AM
lv structure? 😉
noklam
04/26/2022, 11:03 AM
kedro run --config=config.yml does not work as expected
Burn1n9m4n
04/26/2022, 6:27 PM
Burn1n9m4n
04/26/2022, 6:29 PM
noklam
04/26/2022, 6:32 PM
Burn1n9m4n
04/26/2022, 6:34 PM
datajoely
04/26/2022, 7:03 PM
Burn1n9m4n
04/26/2022, 7:04 PM
Burn1n9m4n
04/26/2022, 7:05 PM
datajoely
04/26/2022, 7:05 PM
datajoely
04/26/2022, 7:05 PM
Burn1n9m4n
04/26/2022, 7:12 PM
IncrementalDataSet
it'll pass nothing into the function, right?
Burn1n9m4n
04/26/2022, 7:12 PMnoklam
04/26/2022, 7:16 PMBurn1n9m4n
04/26/2022, 7:40 PMdatajoely
04/26/2022, 7:41 PMBurn1n9m4n
04/26/2022, 7:47 PM
IncrementalDataSet
(2) Run logic within the function. If the dataset is empty, it will just return an empty DataFrame.
(3) That DataFrame gets saved as a parquet file within S3.
(4) That parquet file gets loaded within the next node.
It's (4) that I'm not sure about. That would load an empty parquet file, which would require some subsequent handling, I suspect.
Burn1n9m4n
04/26/2022, 7:47 PM
datajoely
04/26/2022, 7:48 PM
Burn1n9m4n
04/26/2022, 7:54 PM
PartitionedDataSet, which we set up this way because it allows us to load the information that we need from the directory. Checks are performed to ensure that each file is the right one (checking the filename, etc.). At the end, after all the processing, the function adds each DataFrame to a list, which is then concatenated into a single DataFrame.
Burn1n9m4n
04/26/2022, 7:55 PM
DataFrame
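The concatenation step and the empty-DataFrame concern described above could be handled roughly as follows. This is only a sketch: it assumes the node receives a dict mapping partition IDs to load callables (as with a lazily loaded partitioned dataset), and the names `concat_partitions` and `process` are hypothetical, not taken from the actual project.

```python
from typing import Callable, Dict

import pandas as pd


def concat_partitions(
    partitions: Dict[str, Callable[[], pd.DataFrame]]
) -> pd.DataFrame:
    """Load every partition and concatenate them into a single DataFrame."""
    frames = [load() for _, load in sorted(partitions.items())]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def process(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical downstream node: skip the work when nothing arrived."""
    if df.empty:
        return df
    return df.assign(processed=True)


# No partitions: the result is an empty DataFrame that passes through safely.
empty = concat_partitions({})

# Two in-memory partitions standing in for files in the directory.
full = concat_partitions(
    {
        "a.csv": lambda: pd.DataFrame({"x": [1]}),
        "b.csv": lambda: pd.DataFrame({"x": [2]}),
    }
)
```

Guarding on `df.empty` at the start of each downstream node is one simple way to provide the "subsequent handling" mentioned for step (4); other designs are possible.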
datajoely
04/26/2022, 7:55 PM
datajoely
04/26/2022, 7:56 PM
Burn1n9m4n
04/26/2022, 8:00 PM
DataFrame is empty, could I set up logic to use the previous checkpoint instead?
datajoely
04/26/2022, 8:00 PM
datajoely
04/26/2022, 8:00 PM