# advanced-need-help
Sounds like a partitioned dataset is the proper use case, provided you can figure out how you'd shard your initial dataset. If you use a compressed format afterwards, would you get a performance benefit? If so, I'd be keen to dump the partitions in that format so they're available locally and faster to load afterwards. Some food for thought.
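
In case it helps, here's a rough sketch of the idea using only the Python standard library: shard rows by a key, dump each shard to a local gzip-compressed CSV, and read a single shard back. It is not tied to any particular framework's partitioned dataset API, and the helper names (`write_partitions`, `read_partition`), the `region` key, and the sample rows are all made up for illustration. Whether compression actually buys you speed depends on your data and storage, so it's worth timing both.

```python
import csv
import gzip
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitions(rows, key, out_dir):
    """Shard `rows` by `key` and write each shard as a gzip-compressed CSV.

    Hypothetical helper for illustration; returns {shard_id: path}.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows into shards based on the value of the chosen key.
    shards = defaultdict(list)
    for row in rows:
        shards[row[key]].append(row)

    paths = {}
    for shard_id, shard_rows in shards.items():
        path = out_dir / f"{shard_id}.csv.gz"
        # "wt" opens the gzip stream in text mode so csv can write to it.
        with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(shard_rows[0].keys()))
            writer.writeheader()
            writer.writerows(shard_rows)
        paths[shard_id] = path
    return paths


def read_partition(path):
    """Load one compressed partition back into a list of dicts."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


# Made-up sample data: shard by "region" into a temporary local directory.
rows = [
    {"region": "eu", "value": "1"},
    {"region": "us", "value": "2"},
    {"region": "eu", "value": "3"},
]
out_dir = tempfile.mkdtemp()
paths = write_partitions(rows, key="region", out_dir=out_dir)
eu_rows = read_partition(paths["eu"])
```

Once the partitions are on disk, later runs can load only the shard they need instead of going back to the original source.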