feat(cozy-clisk):Implementing selective file dowload and fixing metadata dedupli…#930
feat(cozy-clisk):Implementing selective file dowload and fixing metadata dedupli…#930
Conversation
2c89c24 to
ecb3e4e
Compare
…data deduplication
ecb3e4e to
c61d290
Compare
|
|
||
| module.exports = getFileIfExists | ||
|
|
||
| async function getFileIfExists({ client, entry, options, folderPath, slug }) { |
There was a problem hiding this comment.
jsdoc please with //ts-check
There was a problem hiding this comment.
If we make a request to the cozy, then the method should be called fetchSomething (https://github.com/cozy/cozy-guidelines/#naming-of-functions)
Get should only be used to get something in memory
|
|
||
| async function getFileIfExists({ client, entry, options, folderPath, slug }) { | ||
| const fileIdAttributes = options.fileIdAttributes | ||
| const sourceAccountIdentifier = options.sourceAccountIdentifier |
There was a problem hiding this comment.
nit: const { fileIdAttributes, sourceAccountIdentifier } = options
| } else { | ||
| log.debug('Rolling back on detection by filename') | ||
| return getFileFromPath(client, entry, folderPath) | ||
| } |
There was a problem hiding this comment.
We prefer the early return, it's a lot easier to read:
if(!isReadyForFileMedata){
return getFileFromPath
} else {
...
...
}
| } | ||
| } | ||
|
|
||
| async function getFileFromMetaData( |
There was a problem hiding this comment.
same here for naming convention. Please follow https://github.com/cozy/cozy-guidelines/#naming-of-functions
| fileIdAttributes, | ||
| sourceAccountIdentifier, | ||
| slug | ||
| ) { |
|
|
||
| const isReadyForFileMetadata = | ||
| fileIdAttributes && slug && sourceAccountIdentifier | ||
| if (isReadyForFileMetadata) { |
There was a problem hiding this comment.
to me isReadyForFileMetadata means that this method will be called several times in a short period of time. But I don't think this is the meaning here. It's more hasEnoughtMetadata or something no?
There was a problem hiding this comment.
Yes, this is to decide if enough Metadata are available.
I can change some naming, but this kind of naming was inherited from the existing saveFiles
Same for functions name that you point.
| 'cozyMetadata.sourceAccountIdentifier', | ||
| 'cozyMetadata.createdByApp' | ||
| ]) | ||
| ) |
There was a problem hiding this comment.
1/ We should really avoid the queryAll() method. This method can return a lot of documents and can be pretty long to execute. Can't we restrict the returned data by querying only the last 12 months or else?
2/ I don't know if you need all the fields from this io.cozy.files but you can reduce the size of the returned data by selecting them by adding : .select(['name', 'cozyMetadata.createdAt'])
3/ You queryAll() but after that, you only return the first result. You should use something like this instead:
client.query(Q('io.cozy.files').where...limitBy(1))
be careful by changing to client.query() since query() will return an object with data.
There was a problem hiding this comment.
Yes, the queryAll is here only to know if more than one document satisfies the request, to log a waring if more than on file does it.
My mistake in the first place.
The get the same feature, we should then use client.query with limitBy(2)
| ) { | ||
| log.debug( | ||
| `Checking existence of ${calculateFileKey(entry, fileIdAttributes)}` | ||
| ) |
There was a problem hiding this comment.
const fileKey = calculateFileKey(entry, fileIdAttributes)
and use fileKey later instead of calculating it 3 times
| 'metadata.fileIdAttributes', | ||
| 'trashed', | ||
| 'cozyMetadata.sourceAccountIdentifier', | ||
| 'cozyMetadata.createdByApp' |
There was a problem hiding this comment.
Since the order is important (https://docs.cozy.io/en/tutorials/data/advanced/#indexes-performances-and-design) I'll changed to:
[
'metadata.fileIdAttributes',
'cozyMetadata.sourceAccountIdentifier',
'cozyMetadata.createdByApp'
'trashed'
]
There was a problem hiding this comment.
I'd also suggest to use a partial index for the trashed attribute:
Q('io.cozy.files')
.where({
metadata: {
fileIdAttributes: calculateFileKey(entry, fileIdAttributes)
},
cozyMetadata: {
sourceAccountIdentifier,
createdByApp: slug
}
})
.indexFields([
'metadata.fileIdAttributes',
'cozyMetadata.sourceAccountIdentifier',
'cozyMetadata.createdByApp'
])
.partialIndex({
trashed: false
})There was a problem hiding this comment.
Why that? I though the same at first (putting it in a partial index) but then I checked trashed attribute and it seems that is always there. So I don't understand why to put it in a partial index. :'(
There was a problem hiding this comment.
Since it's always there, it will always be indexed, but we only need the trashed: false docs. So the partial index will allow a lighter index with only non-trashed files
| } | ||
| } | ||
|
|
||
| async function getFileFromPath(client, entry, folderPath) { |
|
|
||
| async function getFileFromPath(client, entry, folderPath) { | ||
| try { | ||
| log.debug(`Checking existence of ${getFilePath({ entry, folderPath })}`) |
There was a problem hiding this comment.
store the result of getFilePath and use it later 🙏
| return files[0] | ||
| } else { | ||
| log.debug('File not found') | ||
| return false |
There was a problem hiding this comment.
should return null. Not boolean
| return result.data | ||
| } catch (err) { | ||
| log.debug(err.message) | ||
| return false |
|
Y a un truc qui me dérange avec ça, c'est qu'on fait les requêtes fichier par fichier. Et si on en a beaucoup, ça va vraiment pas être terrible. |
|
Et aussi, on est censé avoir un context sur les clisk des fichiers déjà récupérés etc, on devrait le passer à cet endroit là, non ? |
On avait enlevé la récupération du contexte lorsque les fichier n'avaient pas le bon createdByApp. Et on ne l'a pas remis depuis. Je pense qu'on peut faire une carte pour ça |
Avec un index ça n'est pas sensé être suffisamment performant ? Parce que sinon, on va de toutes façons devoir se reconstruire un inde x en mémoire pour savoir si un fichier existe déjà. En même temps, c'est vrai qu'ici on est sur un téléphone et le temps de réponse du réseau risque d'être plus élevé. Je ne sais pas vraiment dire à quel point. Mais je suppose que c'est une bonne chose qu'on soit plus facilement capables de faire des mesures maintenant |
|
@doubleface @LucsT what is the status of this PR? |
This PR bring a new exported function in cozy-clisk for flagship app.
This function is needed by the launcher to evaluate early if the file can be found on the cozy.
This code will be use by this PR on flagship app linagora/cozy-flagship-app#670
-> Separation and adaptation of function getFileIfExist in the lib
-> Evaluation of some parameters sooner in the process (folderpath, sourceAccountIdentifier, ...)
-> Adding some units tests that cover the function getFileIfExist.
-> Fixing deduplication on metadata that was not operational before due to received data structure
Need to be done before merge
-> Adapting timing wraper to this new lib function.
Should be done soon
-> Moving 'shouldreplace' sooner in the process
-> We lost the 'free' metadata update when file is already present, as we now do not update each time the file. Need to be adapt. (ex: add qualification on an existing already download file, ...)
-> At minimum, write basic saveFiles tests.