summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorRobert Carr <racarr@google.com>2021-12-31 16:59:34 -0800
committerRobert Carr <racarr@google.com>2022-01-04 13:12:02 -0800
commitaca25f60537453114bfe7f0f8587e551438ca7a5 (patch)
treeac926cab429f42ae7d775ad874ce463d862dd81a
parent22b6d239c14a65af4b60d15ade8517e768c5f2b3 (diff)
downloadnative-aca25f60537453114bfe7f0f8587e551438ca7a5.tar.gz
DO NOT MERGE: BlastBufferQueue: Fake release if not received by complete
Perfetto telemetry shows a cluster where the transactionComplete callback is arriving but the release buffer callback is not arriving. This leads to blocking and ANRs. We try to work around this by generating a fake release buffer callback for previous buffers when we receive a TransactionCompleteCallback. How can we know this is safe? The criteria to safely release buffers are as follows: Condition 1. SurfaceFlinger does not plan to use the buffer Condition 2. Have the acquire fence (or a later signalling fence)if dropped, have the HWC release fence if presented. Now lets break down cases where the transaction complete callback arrives before the release buffer callback. All release buffer callbacks fall in to two categories: Category 1. In band with the complete callback. In which case the client library in SurfaceComposerClient always sends release callbacks first. Category 2. Out of band, e.g. from setBuffer or merge when dropping buffers In category 2 there are two scenarios: Scenario 1: Release callback was never going to arrive (bug) Scenario 2. Release callback descheduled, e.g. a. Transaction is filled with buffer and held in VRI/WM/SysUI b. A second transaction with buffer is merged in, the release buffer callback is emitted but not scheduled (async binder) c. The merged transaction is applied d. SurfaceFlinger processes a frame and emits the transaction complete callback e. The transaction complete callback is scheduled before the release callback is ever scheduled (since the 1 way binders are from different processes). In both scenarios, we can satisfy Conditions 1 and 2, because the HWC present fence is a later signalling fence than the acquire fence which the out of band callback will pass. While we may block extra on this fence. We will be safe by condition 2. Receiving the transaction complete callback should indicate condition 1 is satisfied. In category 1 there should only be a single scenario, the release buffer callback for frame N-1 is emitted immediately before the transaction callback for frame N, and so if it hasn't come at that time (and isn't out of band/scheduled e.g. category 2) then it will never come. We can further break this down in to two scenarios: 1. stat.previousReleaseFence is set correctly. This is the expected case. In this case, this is the same fence we would receive from the release buffer callback, and so we can satisfy condition 1 and 2 and safely release. 2. stat.previousReleaseFence is not set correctly. There is no known reason for this to happen, but there is also no known reason why we would receive a complete callback without the corresponding release, and so we have to explore the option as a potential risk/behavior change from the change. Hypothetically in category 1 case 2 we could experience some tearing. In category 1 we are currently experiencing ANR, and so the tradeoff is a risk of tearing vs a certainty of ANR. To tear the following would have to happen: 1. Enter the case where we previously ANRed 2. Enter the subcase where the release fence is not set correctly 3. Be scheduled very fast on the client side, in practicality HWC present has already returned and the fence should be firing very soon 4. The previous frame not be GL comp, in which case we were already done with the buffer and don't need the fence anyway. Conditions 2 and 3 should already make the occurence of tearing much lower than the occurence of ANRs. Furthermore 100% of observed ANRs were during GL comp (rotation animation) and so the telemetry indicates the risk of introducing tearing is very low. To review, we have shown that we can break down the side effects from the change in to two buckets (category 1, and category 2). In category 2, the possible negative side effect is to use too late of a fence. However this is still safe, and should be exceedingly rare. In category 1 we have shown that there are two scenarios, and in one of these scenarios we can experience tearing. We have furthermore argued that the risk of tearing should be exceedingly low even in this scenario, while the risk of ANR in this scenario was nearly 100%. This issue is not appearing in git_master branches and so the DO_NOT_MERGE tag. Bug: 197269223 Bug: 212846697 Test: Existing tests pass Change-Id: I757ed132ac076aa811df816e04a68f57b38e47e7
-rw-r--r--libs/gui/BLASTBufferQueue.cpp24
-rw-r--r--libs/gui/SurfaceComposerClient.cpp4
-rw-r--r--libs/gui/include/gui/BLASTBufferQueue.h8
-rw-r--r--libs/gui/include/gui/SurfaceComposerClient.h6
-rw-r--r--libs/gui/include/gui/test/CallbackUtils.h2
5 files changed, 37 insertions, 7 deletions
diff --git a/libs/gui/BLASTBufferQueue.cpp b/libs/gui/BLASTBufferQueue.cpp
index 76636c2b9e..2104c77275 100644
--- a/libs/gui/BLASTBufferQueue.cpp
+++ b/libs/gui/BLASTBufferQueue.cpp
@@ -350,6 +350,16 @@ void BLASTBufferQueue::transactionCallback(nsecs_t /*latchTime*/, const sp<Fence
transactionCompleteCallback = std::move(mTransactionCompleteCallback);
mTransactionCompleteFrameNumber = 0;
}
+ std::vector<ReleaseCallbackId> staleReleases;
+ for (const auto& [key, value]: mSubmitted) {
+ if (currFrameNumber > key.framenumber) {
+ staleReleases.push_back(key);
+ }
+ }
+ for (const auto& staleRelease : staleReleases) {
+ releaseBufferCallbackLocked(staleRelease, stat.previousReleaseFence ? stat.previousReleaseFence : Fence::NO_FENCE,
+ stat.transformHint, stat.currentMaxAcquiredBufferCount);
+ }
} else {
BQA_LOGE("Failed to find matching SurfaceControl in transactionCallback");
}
@@ -358,6 +368,7 @@ void BLASTBufferQueue::transactionCallback(nsecs_t /*latchTime*/, const sp<Fence
"empty.");
}
+
decStrong((void*)transactionCallbackThunk);
}
@@ -399,8 +410,14 @@ void BLASTBufferQueue::flushShadowQueue() {
void BLASTBufferQueue::releaseBufferCallback(const ReleaseCallbackId& id,
const sp<Fence>& releaseFence, uint32_t transformHint,
uint32_t currentMaxAcquiredBufferCount) {
- ATRACE_CALL();
std::unique_lock _lock{mMutex};
+ releaseBufferCallbackLocked(id, releaseFence, transformHint, currentMaxAcquiredBufferCount);
+}
+
+void BLASTBufferQueue::releaseBufferCallbackLocked(const ReleaseCallbackId& id,
+ const sp<Fence>& releaseFence, uint32_t transformHint,
+ uint32_t currentMaxAcquiredBufferCount) {
+ ATRACE_CALL();
BQA_LOGV("releaseBufferCallback %s", id.to_string().c_str());
if (mSurfaceControl != nullptr) {
@@ -421,7 +438,10 @@ void BLASTBufferQueue::releaseBufferCallback(const ReleaseCallbackId& id,
const auto numPendingBuffersToHold =
isEGL ? std::max(0u, mMaxAcquiredBuffers - currentMaxAcquiredBufferCount) : 0;
- mPendingRelease.emplace_back(ReleasedBuffer{id, releaseFence});
+ auto rb = ReleasedBuffer{id, releaseFence};
+ if (std::find(mPendingRelease.begin(), mPendingRelease.end(), rb) == mPendingRelease.end()) {
+ mPendingRelease.emplace_back(rb);
+ }
// Release all buffers that are beyond the ones that we need to hold
while (mPendingRelease.size() > numPendingBuffersToHold) {
diff --git a/libs/gui/SurfaceComposerClient.cpp b/libs/gui/SurfaceComposerClient.cpp
index 922bceb191..725ea6571d 100644
--- a/libs/gui/SurfaceComposerClient.cpp
+++ b/libs/gui/SurfaceComposerClient.cpp
@@ -296,7 +296,7 @@ void TransactionCompletedListener::onTransactionCompleted(ListenerStats listener
transactionStats.latchTime, surfaceStats.acquireTime,
transactionStats.presentFence,
surfaceStats.previousReleaseFence, surfaceStats.transformHint,
- surfaceStats.eventStats);
+ surfaceStats.eventStats, surfaceStats.currentMaxAcquiredBufferCount);
}
callbackFunction(transactionStats.latchTime, transactionStats.presentFence,
@@ -321,7 +321,7 @@ void TransactionCompletedListener::onTransactionCompleted(ListenerStats listener
transactionStats.latchTime, surfaceStats.acquireTime,
transactionStats.presentFence,
surfaceStats.previousReleaseFence, surfaceStats.transformHint,
- surfaceStats.eventStats);
+ surfaceStats.eventStats, surfaceStats.currentMaxAcquiredBufferCount);
if (callbacksMap[callbackId].surfaceControls[surfaceStats.surfaceControl]) {
callbacksMap[callbackId]
.surfaceControls[surfaceStats.surfaceControl]
diff --git a/libs/gui/include/gui/BLASTBufferQueue.h b/libs/gui/include/gui/BLASTBufferQueue.h
index c1d389160c..24d978a7e9 100644
--- a/libs/gui/include/gui/BLASTBufferQueue.h
+++ b/libs/gui/include/gui/BLASTBufferQueue.h
@@ -95,6 +95,8 @@ public:
const std::vector<SurfaceControlStats>& stats);
void releaseBufferCallback(const ReleaseCallbackId& id, const sp<Fence>& releaseFence,
uint32_t transformHint, uint32_t currentMaxAcquiredBufferCount);
+ void releaseBufferCallbackLocked(const ReleaseCallbackId& id, const sp<Fence>& releaseFence,
+ uint32_t transformHint, uint32_t currentMaxAcquiredBufferCount);
void setNextTransaction(SurfaceComposerClient::Transaction *t);
void mergeWithNextTransaction(SurfaceComposerClient::Transaction* t, uint64_t frameNumber);
void setTransactionCompleteCallback(uint64_t frameNumber,
@@ -164,6 +166,12 @@ private:
struct ReleasedBuffer {
ReleaseCallbackId callbackId;
sp<Fence> releaseFence;
+ bool operator==(const ReleasedBuffer& rhs) const {
+ // Only compare Id so if we somehow got two callbacks
+ // with different fences we don't decrement mNumAcquired
+ // too far.
+ return rhs.callbackId == callbackId;
+ }
};
std::deque<ReleasedBuffer> mPendingRelease GUARDED_BY(mMutex);
diff --git a/libs/gui/include/gui/SurfaceComposerClient.h b/libs/gui/include/gui/SurfaceComposerClient.h
index 9af9e64062..0d1d1a37bd 100644
--- a/libs/gui/include/gui/SurfaceComposerClient.h
+++ b/libs/gui/include/gui/SurfaceComposerClient.h
@@ -57,14 +57,15 @@ class Region;
struct SurfaceControlStats {
SurfaceControlStats(const sp<SurfaceControl>& sc, nsecs_t latchTime, nsecs_t acquireTime,
const sp<Fence>& presentFence, const sp<Fence>& prevReleaseFence,
- uint32_t hint, FrameEventHistoryStats eventStats)
+ uint32_t hint, FrameEventHistoryStats eventStats, uint32_t currentMaxAcquiredBufferCount)
: surfaceControl(sc),
latchTime(latchTime),
acquireTime(acquireTime),
presentFence(presentFence),
previousReleaseFence(prevReleaseFence),
transformHint(hint),
- frameEventStats(eventStats) {}
+ frameEventStats(eventStats),
+ currentMaxAcquiredBufferCount(currentMaxAcquiredBufferCount) {}
sp<SurfaceControl> surfaceControl;
nsecs_t latchTime = -1;
@@ -73,6 +74,7 @@ struct SurfaceControlStats {
sp<Fence> previousReleaseFence;
uint32_t transformHint = 0;
FrameEventHistoryStats frameEventStats;
+ uint32_t currentMaxAcquiredBufferCount = 0;
};
using TransactionCompletedCallbackTakesContext =
diff --git a/libs/gui/include/gui/test/CallbackUtils.h b/libs/gui/include/gui/test/CallbackUtils.h
index 64032087eb..d2e04268dc 100644
--- a/libs/gui/include/gui/test/CallbackUtils.h
+++ b/libs/gui/include/gui/test/CallbackUtils.h
@@ -135,7 +135,7 @@ private:
void verifySurfaceControlStats(const SurfaceControlStats& surfaceControlStats,
nsecs_t latchTime) const {
const auto& [surfaceControl, latch, acquireTime, presentFence, previousReleaseFence,
- transformHint, frameEvents] = surfaceControlStats;
+ transformHint, frameEvents, ignore] = surfaceControlStats;
ASSERT_EQ(acquireTime > 0, mBufferResult == ExpectedResult::Buffer::ACQUIRED)
<< "bad acquire time";