# advanced-need-help
Hi, I've held off on Petastorm because in their examples you have to work within a context manager (https://petastorm.readthedocs.io/en/latest/readme_include.html#spark-dataset-converter-api), whereas I'd like to encapsulate the loading logic inside a custom dataset and cleanly return the resulting `tf.data.Dataset` object. According to their docs, "when exiting the context, the reader of the dataset will be closed".

Re: breakpoints: unfortunately I'm working with an old version of Jupyter Lab and can't readily update it or install plugins. I'd rather use VS Code, but I've had some trouble setting up the SSH + Docker integration (my dev env is a Docker container running on an EC2 instance). I'll keep trying things to isolate the error further. Thanks for the pointers!
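One possible way around the `with` requirement, assuming `converter.make_tf_dataset(...)` returns an ordinary context manager as the linked docs suggest, is to enter it manually through `contextlib.ExitStack` and defer the close until you are done with the dataset. This is a minimal, library-agnostic sketch, not Petastorm's own API; `ManagedDataset` is a hypothetical name:

```python
import contextlib
from typing import Any, Callable, ContextManager


class ManagedDataset:
    """Hypothetical wrapper: keeps a context-managed dataset open beyond a
    single `with` block and closes it only when close() is called."""

    def __init__(self, open_fn: Callable[[], ContextManager[Any]]):
        self._stack = contextlib.ExitStack()
        # enter_context() calls __enter__ now and registers __exit__ for later,
        # so the underlying reader stays open while we hold the dataset.
        self.dataset = self._stack.enter_context(open_fn())

    def close(self) -> None:
        # Runs the registered __exit__ (e.g. closing the reader).
        # Calling close() a second time is a no-op.
        self._stack.close()

    def __enter__(self) -> "ManagedDataset":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()
```

Usage, assuming `converter` came from `make_spark_converter(df)`: `ds = ManagedDataset(lambda: converter.make_tf_dataset(batch_size=32))`, then pass `ds.dataset` to training and call `ds.close()` afterwards (or wrap the whole thing in `with ManagedDataset(...) as ds:`). The trade-off is that the caller now owns closing the reader; if `close()` is never called, the reader stays open.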