Description
[REQUIRED] Step 1: Describe your environment
- Xcode version: 11.6
- Firebase SDK version: 6.30.0
- Firebase Component: Storage
- Component version: 3.9.0
- Installation method: CocoaPods
[REQUIRED] Step 2: Describe the problem
Full disclosure: I'm having a hard time reproducing this issue. Additionally, while I am fairly confident that the problem lies in a recent Firebase SDK update, I can't prove it. Therefore, I am raising this issue with the information I have in hopes of getting the opinion(s) of the Firebase iOS team.
Our app allows users to create posts with an image that we upload to Firebase Storage. Until recently we were using SDK version 6.18.0 without issue. In July we updated our app to SDK version 6.28.0 and rolled it out to production after finding no problems during testing. After that, we noticed that some images were unexpectedly not being uploaded: approximately 1 in 10 new posts is missing its image, which will become a big problem if we can't resolve it quickly. After some review, we determined that we made no code changes on our end that should have had any impact.
The only changes to our app at this time were dependency updates, and Firebase was one of three dependencies updated. The other two were Branch and KeychainAccess, both seemingly unrelated. We have since updated from 6.28.0 to 6.30.0 and saw no change in image upload behavior. While updating the SDK, we also added a Performance trace around the image upload task so that we could see how long the upload process takes. See the Relevant Code section for how this trace is implemented. We now have a few hundred samples, and the metrics are interesting.
The images being uploaded have a median size of 574 KB, with a median upload duration of 2.18 seconds as shown below.
The median on WiFi is good, but the 95th percentile is 27.62 seconds, which seems unusually long. On non-WiFi connections, the median is 19 seconds, the 75th percentile is 66 seconds, and the 95th percentile is 272 seconds. Of course we expect users with poor service to experience longer uploads, but these numbers seem excessive.
Finally, we can also look at error rates. Occasionally the completion closure receives an error, which we record with Crashlytics (50 occurrences in the last 90 days). Its localized description is always "An unknown error occurred, please check the server response." We're not really sure what this means, but it is not a new issue: it was occurring before our upgrade from 6.18.0 and still occurs at a similar frequency in 6.30.0.
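`localizedDescription` is only one field of the error. A sketch of a helper that flattens the whole `NSError` (domain, code, every `userInfo` entry, and any error nested under `NSUnderlyingErrorKey`) into one string might make these reports more informative. `errorDiagnostics` is a hypothetical name, and the sketch makes no assumption about which `userInfo` keys Firebase Storage actually populates; it simply prints whatever is present.

```swift
import Foundation

/// Flattens an NSError into a multi-line, log-friendly string: domain and code,
/// every userInfo entry (sorted by key so the output is stable), then any
/// underlying error, indented one level deeper per nesting level.
/// Hypothetical helper; not part of the Firebase SDK.
func errorDiagnostics(_ error: NSError, depth: Int = 0) -> String {
    let indent = String(repeating: "  ", count: depth)
    var lines = ["\(indent)\(error.domain) code=\(error.code)"]
    for key in error.userInfo.keys.sorted() where key != NSUnderlyingErrorKey {
        // String interpolation uses each value's description.
        lines.append("\(indent)  \(key)=\(error.userInfo[key]!)")
    }
    // Recurse into the wrapped error, if one was attached.
    if let underlying = error.userInfo[NSUnderlyingErrorKey] as? NSError {
        lines.append(errorDiagnostics(underlying, depth: depth + 1))
    }
    return lines.joined(separator: "\n")
}
```

In the `putData` completion handler, one could call `errorDiagnostics(error as NSError)` and, if the recorded report does not already surface these fields, attach the result with Crashlytics' `setCustomValue(_:forKey:)` before calling `record(error:)`.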
Looking at upload duration split by success/error, the duration for Success responses appears at first glance to be very low. However, that is only because the time scale is skewed: the 95th percentile duration for Failure responses is 9,687 seconds (yes, that's close to 3 hours), and the 75th percentile is 3,777 seconds, also unusually long.
The 95th percentile for Success responses is 66 seconds, which shows that even successful uploads sometimes take very long. Given this, and given that the occasional error response is much less frequent than the silent upload failures we've recently been experiencing, our current theory is that the upload sometimes takes so long that the user exits the app before it completes, and the upload stops when the app resigns its active state and/or terminates.
Unfortunately, we only implemented this trace after we started noticing the problem, so we don't have earlier data to compare against. We can only guess that the numbers we are seeing now are unusual.
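One way to test this theory with the existing trace is to tag each upload with whether the app entered the background before the completion handler ran. The sketch below is a small, hypothetical `NotificationFlag` helper built only on Foundation's `NotificationCenter`. In the app it would be constructed with `UIApplication.didEnterBackgroundNotification`; that wiring is an assumption and is not exercised here.

```swift
import Foundation

/// Records whether a given notification was posted while an operation was in
/// flight. Hypothetical helper; not part of any SDK.
/// Assumes the notification and the read of `fired` happen on the same queue
/// (e.g. the main queue), so no extra locking is used.
final class NotificationFlag {
    private(set) var fired = false
    private var token: NSObjectProtocol?
    private let center: NotificationCenter

    init(name: Notification.Name, center: NotificationCenter = .default) {
        self.center = center
        // queue: nil delivers the block synchronously on the posting thread.
        token = center.addObserver(forName: name, object: nil, queue: nil) { [weak self] _ in
            self?.fired = true
        }
    }

    /// Stops observing; call this first thing in the upload's completion handler.
    func invalidate() {
        if let token = token {
            center.removeObserver(token)
            self.token = nil
        }
    }

    deinit { invalidate() }
}
```

For example, create `let flag = NotificationFlag(name: UIApplication.didEnterBackgroundNotification)` just before `putData`, then in the completion handler call `flag.invalidate()` followed by `uploadTrace?.setValue(flag.fired ? "true" : "false", forAttribute: "Backgrounded")` (the attribute name is illustrative). If failures and very long durations cluster on `Backgrounded = true`, that would support the theory, though it would not prove the cause.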
I've already checked the Release Notes history for notable changes to the Storage module. Next, I'm going to review every module to see whether downgrading the SDK is a viable way for us to troubleshoot. If the issue were consistently reproducible, we could test this easily in a development environment. Because it is so difficult to reproduce, we would instead need to deploy the downgraded SDK to production. That may be totally fine, but I am wary of causing problems, because we'd be reverting half a year of SDK updates with several features and bug fixes that may be relevant to us.
I'm curious whether anyone on the Firebase team has any ideas about this. In the meantime, we'll keep investigating and will report anything further we find. As a workaround and safeguard, we're also going to try caching the image locally until we receive a successful response.
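A minimal sketch of that safeguard, assuming each post has a stable string identifier (the `id` parameter below is hypothetical): write the JPEG bytes to disk before calling `putData`, delete them only when the completion handler reports success, and retry anything left over on the next launch. `PendingUploadCache` is an illustrative name, not part of the Firebase SDK.

```swift
import Foundation

/// On-disk holding area for images not yet confirmed as uploaded.
/// Hypothetical type; the file layout (<id>.jpg in one directory) is an assumption.
struct PendingUploadCache {
    let directory: URL

    init(directory: URL) throws {
        self.directory = directory
        try FileManager.default.createDirectory(at: directory,
                                                withIntermediateDirectories: true)
    }

    private func fileURL(for id: String) -> URL {
        directory.appendingPathComponent(id).appendingPathExtension("jpg")
    }

    /// Persist the bytes before starting the upload.
    func save(_ data: Data, id: String) throws {
        try data.write(to: fileURL(for: id), options: .atomic)
    }

    /// Call only after the completion handler reports success (no error).
    func markUploaded(id: String) throws {
        let url = fileURL(for: id)
        if FileManager.default.fileExists(atPath: url.path) {
            try FileManager.default.removeItem(at: url)
        }
    }

    /// IDs still waiting for a successful upload, e.g. to retry on next launch.
    func pendingIDs() throws -> [String] {
        try FileManager.default
            .contentsOfDirectory(at: directory, includingPropertiesForKeys: nil)
            .filter { $0.pathExtension == "jpg" }
            .map { $0.deletingPathExtension().lastPathComponent }
            .sorted()
    }

    /// Reload cached bytes for a retry.
    func data(for id: String) throws -> Data {
        try Data(contentsOf: fileURL(for: id))
    }
}
```

The directory is injected so the type can be tested against a temporary folder. On iOS the Caches directory may be purged by the system when storage is low, so a subdirectory of Application Support (from `FileManager.default.urls(for: .applicationSupportDirectory, in: .userDomainMask)`) may be a safer home for data that must survive until it is uploaded.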
Steps to reproduce:
No reliable method of reproduction, see Relevant Code for... well... the relevant code.
Relevant Code:
guard let imageData = someUIImage.jpegData(compressionQuality: 0.9) else { return }
let storageRef = Storage.storage().reference().child("foobar.jpg")
let metadata = StorageMetadata()
metadata.contentType = "image/jpeg"

// Performance trace covering the whole upload, tagged with payload size.
let uploadTrace = Performance.startTrace(name: "uploadImage")
uploadTrace?.setValue(Int64(imageData.count), forMetric: "Size")

storageRef.putData(imageData, metadata: metadata) { (metadata, error) in
    if let error = error {
        uploadTrace?.setValue("Failure", forAttribute: "Result")
        uploadTrace?.stop()
        Crashlytics.crashlytics().record(error: error)
    }
    else {
        uploadTrace?.setValue("Success", forAttribute: "Result")
        uploadTrace?.stop()
    }
}