user
03/11/2022, 7:38 AM
Walber Moreira
03/11/2022, 2:30 PM
datajoely
03/11/2022, 2:33 PM
Walber Moreira
03/11/2022, 2:34 PM
datajoely
03/11/2022, 2:35 PM
ThreadRunner is for Spark and other remote execution workloads
Walber Moreira
03/11/2022, 2:40 PM
Walber Moreira
03/11/2022, 2:40 PM
datajoely
03/11/2022, 2:44 PM
Deep
03/11/2022, 2:46 PM
datajoely
03/11/2022, 2:47 PM
Walber Moreira
03/11/2022, 3:44 PM
boazmohar
03/11/2022, 4:07 PM
datajoely
03/11/2022, 4:15 PM
boazmohar
03/11/2022, 4:21 PM
datajoely
03/11/2022, 4:23 PM
yaml
my_data:
  type: MyCustomImageClass
  data_location: path/to/files
  metadata_location: path/to/spec.xml
datajoely
03/11/2022, 4:23 PM
Does metadata_location need to be dynamic?
datajoely
03/11/2022, 4:25 PM
datajoely
03/11/2022, 4:26 PM
boazmohar
03/11/2022, 4:28 PM
data_location so there is not even a need for an XML path... Here is part of the class
python
# Imports this excerpt relies on
import glob
import logging
import os
from pathlib import PurePosixPath
from typing import Any, Dict

import dask
import dask.array as da
import fsspec
import numpy as np
from kedro.io import AbstractDataSet
from kedro.io.core import get_filepath_str, get_protocol_and_path

logger = logging.getLogger(__name__)

# get_meta_alpha3 and load_tiff_stack are project helpers that are not shown here.


class DaskAlpha3TifsDataset(AbstractDataSet):
    def __init__(self, filepath: str, params: Dict[str, Any] = None):
        # parse the path and protocol (e.g. file, http, s3, etc.)
        protocol, path = get_protocol_and_path(filepath)
        self._protocol = protocol
        self._filepath = PurePosixPath(path)
        self._fs = fsspec.filesystem(self._protocol)
        # using get_filepath_str ensures that the protocol and path are appended correctly for different filesystems
        load_path = get_filepath_str(self._filepath, self._protocol)
        # the metadata is looked up from the data path itself
        self.xml = get_meta_alpha3(load_path)
        self.xml['filters'] = params
        self.xml['ch_names'] = [params[i] for i in self.xml['filter_order']]

    def _load(self) -> da.Array:
        """Loads data from the image files.

        Returns:
            Data from the image files as a lazy dask array
        """
        file_shapes = self.xml['file_shapes']
        base_dir = self.xml['base_dir']
        files = glob.glob(os.path.join(base_dir, 'Raw', '*.tiff'))
        logger.info(f'Found {len(files)} files in {base_dir} Raw folder')
        # declared shape of the array loaded from each file
        sizes = [(file_shapes[3], file_shapes[4])] * len(files)
        delay = [dask.delayed(load_tiff_stack)(fn) for fn in files]
        both = list(zip(delay, sizes))
        # group consecutive files into runs of file_shapes[2] (stacked along Z below)
        slices = [slice(i, i + file_shapes[2]) for i in range(0, len(both), file_shapes[2])]
        lazy_arrays = [da.from_delayed(x, shape=y, dtype=np.uint16) for x, y in both]
        lazy_arrays_conZ = [da.stack(lazy_arrays[s], axis=0) for s in slices]
        lazy_arrays_conTileCh = da.stack(lazy_arrays_conZ, axis=0).reshape(file_shapes[[5, 0, 1, 2, 3, 4]])
        return lazy_arrays_conTileCh
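For anyone following the grouping step in _load above, here is a minimal sketch of just the slice logic in plain Python, with no dask involved. The file names and the planes-per-stack value are made up for illustration. Sorting the names is an assumption added here, since glob.glob returns results in arbitrary order and the grouping depends on file order.

```python
# Hypothetical file names standing in for the glob result; sorted so that
# the planes of one stack sit next to each other (an assumption, see above).
files = sorted(f"tile{t}_z{z}.tiff" for t in range(2) for z in range(3))

planes_per_stack = 3  # plays the role of file_shapes[2] in the class

# Same construction as in _load: one slice per run of consecutive files.
slices = [slice(i, i + planes_per_stack) for i in range(0, len(files), planes_per_stack)]
stacks = [files[s] for s in slices]

for stack in stacks:
    print(stack)
```

In the real dataset each group is passed to da.stack instead of being kept as a list of names.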
boazmohar
03/11/2022, 4:29 PM
get_meta_alpha3 knows how to find the XML based on the path
datajoely
03/11/2022, 4:29 PM
boazmohar
03/11/2022, 4:29 PM
node
datajoely
03/11/2022, 4:30 PM
params however would be in the catalog definition, NOT the node
boazmohar
03/11/2022, 4:30 PM
yml
gel3_round1:
  type: alpha3_expand.extra.datasets.dask_alpha3_tifs_dataset.DaskAlpha3TifsDataset
  filepath: /nrs/svoboda/moharb/ExM/Alpha3/20220310_YFP_ANM1_Gel3_R1_v2/Basal/
  params:
    2: YFP
    3: PSD95
    4: GluA1
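A small sketch of what the params block above becomes inside __init__. Plain integers such as 2: are loaded from YAML as int keys, so the constructor receives a dict like the one below. The filter_order value is invented for illustration; in the class it comes from the XML metadata.

```python
from typing import Dict, List

# The params mapping as it arrives from the catalog entry.
params: Dict[int, str] = {2: "YFP", 3: "PSD95", 4: "GluA1"}

# Hypothetical acquisition order of the filters (read from the XML in the class).
filter_order: List[int] = [3, 2, 4]

# Same expression as in __init__: channel names in acquisition order.
ch_names = [params[i] for i in filter_order]
print(ch_names)
```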
datajoely
03/11/2022, 4:30 PM
yaml
my_data:
  type: DaskAlpha3TifsDataset
  filepath: path/to/files
  params:
    xml: path/to/spec.xml
boazmohar
03/11/2022, 4:31 PM
DaskArray but how do I access self.xml
boazmohar
03/11/2022, 4:35 PM
DaskAlpha3TifsDataset._load could be a dict with the raw data and metadata?
boazmohar
03/11/2022, 4:39 PM
python
def _load(self) -> Dict[str, Any]:
    """Loads data from the image files.

    Returns:
        A dict with the lazy dask array under 'data' and the XML metadata under 'meta'
    """
    file_shapes = self.xml['file_shapes']
    base_dir = self.xml['base_dir']
    files = glob.glob(os.path.join(base_dir, 'Raw', '*.tiff'))
    logger.info(f'Found {len(files)} files in {base_dir} Raw folder')
    # declared shape of the array loaded from each file
    sizes = [(file_shapes[3], file_shapes[4])] * len(files)
    delay = [dask.delayed(load_tiff_stack)(fn) for fn in files]
    both = list(zip(delay, sizes))
    # group consecutive files into runs of file_shapes[2] (stacked along Z below)
    slices = [slice(i, i + file_shapes[2]) for i in range(0, len(both), file_shapes[2])]
    lazy_arrays = [da.from_delayed(x, shape=y, dtype=np.uint16) for x, y in both]
    lazy_arrays_conZ = [da.stack(lazy_arrays[s], axis=0) for s in slices]
    lazy_arrays_conTileCh = da.stack(lazy_arrays_conZ, axis=0).reshape(file_shapes[[5, 0, 1, 2, 3, 4]])
    # return the metadata alongside the data so that nodes can read it
    return {'data': lazy_arrays_conTileCh, 'meta': self.xml}
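With _load returning a dict like this, a node receives that dict as its input and can unpack both parts. A minimal sketch, assuming the 'data' and 'meta' keys from the snippet above; the function name and the stand-in values are hypothetical, and a plain nested list replaces the dask array to keep the example self-contained.

```python
from typing import Any, Dict, Tuple


def split_loaded(loaded: Dict[str, Any]) -> Tuple[Any, Dict[str, int]]:
    """Hypothetical node: unpack the array and index the channel names."""
    data = loaded["data"]  # the lazy dask array in the real dataset
    meta = loaded["meta"]  # the XML metadata (self.xml in the dataset)
    channel_index = {name: i for i, name in enumerate(meta["ch_names"])}
    return data, channel_index


# Stand-in for what the catalog would hand to the node.
loaded = {
    "data": [[0, 1], [2, 3]],
    "meta": {"ch_names": ["YFP", "PSD95", "GluA1"]},
}
data, channel_index = split_loaded(loaded)
print(channel_index)
```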
datajoely
03/11/2022, 6:39 PM
boazmohar
03/11/2022, 8:36 PM
boazmohar
03/11/2022, 8:36 PM