else:
    raise ValueError(f"Unsupported core beamline: {core_beamline}")

def _validate_h5_files(self, config, h5_paths: list[Path]) -> list[Path]:
This validation was previously in BufferFilePaths, and we had a discussion about moving it from there. I find this location better (it was also necessary due to the restructure).
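The else branch above raises for any beamline the loader does not know. A minimal sketch of that dispatch pattern, assuming a plain mapping from beamline name to dataframe class: only the "cfel" key comes from this PR; the "flash" key, the FlashDataFrameCreator and CfelDataFrameCreator classes and the select_dataframe_class helper are hypothetical placeholders, not the actual sed classes.

```python
from typing import Any


class FlashDataFrameCreator:  # hypothetical placeholder
    def __init__(self, config: dict[str, Any]) -> None:
        self.config = config


class CfelDataFrameCreator(FlashDataFrameCreator):  # hypothetical placeholder
    pass


# Hypothetical registry: core beamline name -> dataframe class
DATAFRAME_CLASSES: dict[str, type] = {
    "flash": FlashDataFrameCreator,
    "cfel": CfelDataFrameCreator,
}


def select_dataframe_class(config: dict[str, Any]) -> type:
    """Pick the dataframe class for config["core"]["beamline"] or raise."""
    core_beamline = config["core"]["beamline"]
    try:
        return DATAFRAME_CLASSES[core_beamline]
    except KeyError:
        raise ValueError(f"Unsupported core beamline: {core_beamline}") from None
```

Keeping the supported beamlines in one mapping means adding a new lab setup is a one-line change, and the error message lives in a single place.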
# TODO: move to config
MULTI_INDEX = ["trainId", "pulseId", "electronId"]
PULSE_ALIAS = MULTI_INDEX[1]
FORMATS = ["per_electron", "per_pulse", "per_train"]
These have now been moved to the config and the config model.
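With the constants moved into the config, the loader can read them from there instead of from module-level globals. A minimal sketch, assuming the config is a nested dict: the test config in this PR does have an index entry, but the "dataframe" and "formats" keys and the dataframe_settings helper used here are hypothetical.

```python
from typing import Any


def dataframe_settings(config: dict[str, Any]) -> tuple[list[str], str, list[str]]:
    """Return (multi_index, pulse_alias, formats) read from a config dict.

    The "dataframe" / "index" / "formats" keys are hypothetical.
    """
    df_cfg = config["dataframe"]
    multi_index = list(df_cfg["index"])
    # Previously hardcoded as MULTI_INDEX[1]: the pulse level is the second index level
    pulse_alias = multi_index[1]
    formats = list(df_cfg["formats"])
    return multi_index, pulse_alias, formats


config = {
    "dataframe": {
        "index": ["trainId", "pulseId", "electronId"],
        "formats": ["per_electron", "per_pulse", "per_train"],
    }
}
multi_index, pulse_alias, formats = dataframe_settings(config)
print(pulse_alias)  # pulseId
```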
c206130 to 788d189
Pull Request Test Coverage Report for Build 13419398366
💛 - Coveralls
Copilot reviewed 10 out of 11 changed files in this pull request and generated no comments.
Files not reviewed (1)
- .cspell/custom-dictionary.txt: Language not supported
Comments suppressed due to low confidence (2)
src/sed/loader/flash/dataframe.py:423
- The docstring for the df_train property refers to channels of type [per pulse], but the implementation uses 'per_train'. Please update the comment to match the code and maintain clarity.
Returns a pandas DataFrame for given channel names of type [per pulse]
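A sketch of a docstring that matches what the code selects, assuming channels are stored in a dict whose entries carry a "format" field. The channel names, the channels_of_format helper and the DataFrameSketch class are hypothetical and stand in for the real df_train logic.

```python
from typing import Any


def channels_of_format(channels: dict[str, dict[str, Any]], fmt: str) -> list[str]:
    """Return the names of all channels whose "format" entry equals fmt."""
    return [name for name, spec in channels.items() if spec.get("format") == fmt]


class DataFrameSketch:
    """Hypothetical stand-in for the loader's dataframe class."""

    def __init__(self, channels: dict[str, dict[str, Any]]) -> None:
        self.channels = channels

    @property
    def train_channels(self) -> list[str]:
        """Channel names of type [per train], i.e. the ones df_train is built from."""
        return channels_of_format(self.channels, "per_train")


channels = {
    "channel_a": {"format": "per_pulse"},
    "channel_b": {"format": "per_train"},
    "channel_c": {"format": "per_electron"},
}
print(DataFrameSketch(channels).train_channels)  # ['channel_b']
```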
tests/data/loader/flash/config.yaml:57
- [nitpick] For consistency and to avoid potential YAML parsing issues, consider quoting the index values as strings (e.g. ['trainId', 'pulseId', 'electronId']).
index: [trainId, pulseId, electronId]
… as it is not available right now anyways
zain-sohail left a comment
Just a few comments. Is this local metadata scheme also relevant for the flash loader? If so, the code needs to be updated there as well.
raw: DirectoryPath
processed: Optional[Union[DirectoryPath, NewPath]] = None
meta: Optional[Union[DirectoryPath, NewPath]] = None
Instead of adding a new entry to the config model, I'd suggest we just allow directory paths in sed/src/sed/core/config_model.py (line 327 in 4a6ec53). What do you think?
Fine by me. I just thought it would be one of the main folders inside the beamtime folder anyway.
    self._config["core"]["paths"].get("processed", raw_dir.joinpath("processed")),
)
meta_dir = Path(
    self._config["core"]["paths"].get("meta", raw_dir.joinpath("meta")),
The path logic is confusing right now, as there are too many possibilities. I'd put the default as archiver_url in the lab default config, plus one automatic option.
It's also not clear to me whether the meta path is currently 'meta/' or 'meta/fabtrack/'.
This part is also confusing to me, as I don't really see how you get from raw_dir to, e.g., processed_dir with raw_dir.joinpath("processed"): this gives beamtime_dir/raw_dir/processed instead of beamtime_dir/processed, doesn't it?
Currently, the meta path is 'meta/fabtrack/', as it comes from Fabiano's code, but it can probably be changed to just 'meta/' once this is accepted/generalized by the IT team.
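To make the path issue concrete: joinpath appends to the path it is called on, so defaults derived from raw_dir end up inside the raw folder. A minimal sketch, assuming raw_dir sits directly inside the beamtime folder; the directory names and the default_paths helper are hypothetical, and meta/fabtrack follows the current layout described above.

```python
from pathlib import PurePosixPath


def default_paths(raw_dir: PurePosixPath) -> dict[str, PurePosixPath]:
    """Derive sibling folders of raw_dir inside the beamtime folder (hypothetical layout)."""
    beamtime_dir = raw_dir.parent  # assumes raw_dir is beamtime_dir/raw
    return {
        "processed": beamtime_dir / "processed",
        "meta": beamtime_dir / "meta" / "fabtrack",
    }


raw_dir = PurePosixPath("/beamtime/raw")
print(raw_dir.joinpath("processed"))        # /beamtime/raw/processed (nested inside raw)
print(default_paths(raw_dir)["processed"])  # /beamtime/processed
```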
src/sed/loader/cfel/loader.py
    self.metadata.update(self.parse_local_metadata())
else:
    print("Metadata taken from SciCat")
    self.metadata.update(self.parse_scicat_metadata(token) if collect_metadata else {})
Not necessarily a big issue, but parse_scicat_metadata is called twice when SciCat metadata exists: once in the if condition and once in the else branch.
One way could be:
scicat_metadata = self.parse_scicat_metadata(token) if collect_metadata else {}
self.metadata.update(scicat_metadata)
if len(scicat_metadata) == 0:
    print("No SciCat metadata available, checking local folder")
    self.metadata.update(self.parse_local_metadata())
Fine by me. I just wanted to implement a check: if SciCat entries are available, use them; if not, check the local folder, to stay compatible with older beamtimes.
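A self-contained sketch of the suggested flow, in which SciCat is queried once and the local folder is only read when nothing comes back. The MetadataSketch class and its stubbed parse methods are hypothetical stand-ins for the loader's parse_scicat_metadata and parse_local_metadata; the call counter exists only to show that SciCat is queried a single time.

```python
from typing import Optional


class MetadataSketch:
    """Hypothetical stand-in for the loader's metadata handling."""

    def __init__(self, scicat_result: dict, local_result: dict) -> None:
        self.metadata: dict = {}
        self._scicat_result = scicat_result
        self._local_result = local_result
        self.scicat_calls = 0  # counts SciCat queries for illustration only

    def parse_scicat_metadata(self, token: Optional[str]) -> dict:
        self.scicat_calls += 1
        return dict(self._scicat_result)

    def parse_local_metadata(self) -> dict:
        return dict(self._local_result)

    def collect(self, token: Optional[str], collect_metadata: bool = True) -> dict:
        # Query SciCat at most once and reuse the result
        scicat_metadata = self.parse_scicat_metadata(token) if collect_metadata else {}
        self.metadata.update(scicat_metadata)
        if len(scicat_metadata) == 0:
            # Fallback for older beamtimes without SciCat entries
            print("No SciCat metadata available, checking local folder")
            self.metadata.update(self.parse_local_metadata())
        return self.metadata
```

As in the suggestion, collect_metadata=False still triggers the local fallback; if that is not intended, the whole block would need to sit under an if collect_metadata: guard.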
src/sed/loader/flash/metadata.py
return "{burl}{url}/%2F{npid}".format(
    burl=self.url,
    url="Datasets",
    url="datasets",#"Datasets",
Yes, all metadata was migrated to the generalized scicat.desy.de with a new API, where 'Datasets' was changed to 'datasets' :)
Hopefully it will also be available from outside DESY within the next few days.
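A sketch of the dataset URL with the lowercase collection name, reusing the "{burl}{url}/%2F{npid}" format string from the loader; %2F is the URL-encoded "/". The base URL, the PID value and the dataset_url helper are hypothetical.

```python
def dataset_url(base_url: str, npid: str) -> str:
    """Build a SciCat dataset URL using the lowercase "datasets" endpoint."""
    # Same format string as the loader; only the collection name changed
    return "{burl}{url}/%2F{npid}".format(burl=base_url, url="datasets", npid=npid)


print(dataset_url("https://scicat.example/api/", "abc123"))
# https://scicat.example/api/datasets/%2Fabc123
```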
This PR adds the lab loader requested in #503. I tried to make minimal changes to the FlashLoader to make this work. The only major addition is the loader-specific dataframe class; everything else stays approximately the same. So the lab data works with the flash loader, but with the beamline config set to cfel.
An example config is provided to make this work. Since I moved some hardcoded parameters (marked as TODO) into the config, I updated the config model slightly.
Test data for this loading configuration still needs to be set up. I'm asking @kutnyakhov to provide a public file for this. I'm not sure whether a tutorial is necessary.