williamc
03/04/2022, 11:16 PM
I'm using tf.data.experimental.make_csv_dataset. In this particular case I have a node at the end of one of my pipelines saving the dataframe (a spark.SparkDataFrame) to S3, and I have written a custom dataset (essentially copying most of the code from TensorFlowModelDataset) that does the reading at the beginning of the next pipeline. The maddening issue I haven't been able to solve is that if I run both pipelines with the --from-node option, the run fails because my call to self._fs.get() returns an empty result. I have verified that the dataframe is being written correctly to my S3 bucket, but a call to self._fs.ls(load_path) comes back empty as well.
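For context, here is a minimal, runnable sketch of the shape of dataset I mean. All names here (CSVDirDataSet, LocalFS, load_path) are hypothetical stand-ins: LocalFS mimics just the two fsspec calls I use (in my actual code self._fs is an s3fs filesystem), and _load does the self._fs.get() that comes back empty mid-run:

```python
import glob
import os
import shutil
import tempfile

class LocalFS:
    """Local stand-in for the fsspec filesystem (s3fs in my real setup)."""

    def get(self, rpath, lpath, recursive=False):
        # fsspec's get() copies remote files down to a local path.
        for name in os.listdir(rpath):
            shutil.copy(os.path.join(rpath, name), lpath)

    def ls(self, path):
        return [os.path.join(path, name) for name in os.listdir(path)]

class CSVDirDataSet:
    """Hypothetical sketch of my custom dataset, adapted from the
    TensorFlowModelDataset pattern."""

    def __init__(self, fs, load_path):
        self._fs = fs
        self._load_path = load_path

    def _load(self):
        # Pull the CSV part-files into a temp dir; this is the
        # self._fs.get() call that returns an empty result mid-run.
        tmp = tempfile.mkdtemp()
        self._fs.get(self._load_path, tmp, recursive=True)
        return sorted(glob.glob(os.path.join(tmp, "*.csv")))
```

With a local directory of part-files in place of the bucket, _load() returns the copied CSV paths as expected, which matches what I see when I run the second pipeline on its own.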
If, after my failed run, I run just the second pipeline, everything works as expected: self._fs.get() returns my CSV files and I'm able to load my data into a TF dataset and train my model without issue.
Does anybody have any idea what I'm doing wrong?