author | commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> | 2014-05-01 17:41:32 +0000 |
---|---|---|
committer | commit-bot@chromium.org <commit-bot@chromium.org@2bbb7eff-a529-9590-31e7-b0007b416f81> | 2014-05-01 17:41:32 +0000 |
commit | 086e5ddc4c8c51d84e930fe4cdbc7eeba64d72f8 (patch) | |
tree | ee64dd2b8693997bd71d25b7040af72d901c77f2 | |
parent | 20c370477a7338e18c3d0156c21a4406e18b333f (diff) | |
DM: Push GPU-parent child tasks to the front of the queue.
Like yesterday's change to run CPU-parent child tasks serially in thread, this
reduces peak memory usage by improving the temporal locality of the bitmaps we
create.
E.g., let's say we start with tasks A, B, C and D:
Queue: [ A B C D ]
Running A creates A' and A", which depend on a bitmap created by A.
Queue: [ B C D A' A" * ]
That bitmap now needs to sit around in RAM, pointlessly, while B, C and D run,
and can only be destroyed at *. If we instead push dependent child tasks to the
front of the queue, the queue and bitmap lifetime look like this:
Queue: [ A' A" * B C D ]
The problem is much, much worse in practice because the queue is often several
thousand tasks long. Hundreds of megabytes of bitmaps can pile up pointlessly
for tens of seconds.
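To make the A B C D example concrete, here is a rough, standalone model of the two queue orders. It is only a sketch: the function name unrelatedTasksWhileBitmapLive and the string-based tasks are made up for illustration and are not part of DM or SkThreadPool. It counts how many unrelated tasks run while A's bitmap has to stay alive.

```cpp
#include <deque>
#include <string>

// Hypothetical model of the example above (not DM code).
// Task "A" creates a bitmap and two children, A' and A", that use it.
// Returns how many unrelated tasks run while that bitmap must stay alive.
int unrelatedTasksWhileBitmapLive(bool childrenToFront) {
    std::deque<std::string> queue = {"A", "B", "C", "D"};
    bool bitmapLive = false;
    int pendingChildren = 0;  // children of A that have not run yet
    int unrelated = 0;        // B/C/D tasks run while the bitmap is alive

    while (!queue.empty()) {
        std::string task = queue.front();  // we always pop the head
        queue.pop_front();

        if (task == "A") {
            bitmapLive = true;
            pendingChildren = 2;
            if (childrenToFront) {
                // Push in reverse so A' still runs before A".
                queue.push_front("A\"");
                queue.push_front("A'");
            } else {
                queue.push_back("A'");
                queue.push_back("A\"");
            }
        } else if (task[0] == 'A') {
            // A child of A ran; the bitmap can go once both are done.
            if (--pendingChildren == 0) {
                bitmapLive = false;
            }
        } else if (bitmapLive) {
            ++unrelated;
        }
    }
    return unrelated;
}
```

In this toy model, appending the children keeps the bitmap alive across all three unrelated tasks, while pushing them to the front keeps it alive across none. With a queue of several thousand tasks the first number grows with the queue length.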
To make this work we add addNext() to SkThreadPool and its cousin DMTaskRunner.
I also took the opportunity to swap head and tail in the threadpool
implementation so it matches the comments and intuition better: we always pop
the head, add() puts a task at the tail, addNext() puts it at the head.
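The head/tail convention can be sketched with a small standalone queue. This is an illustration under assumptions, not the Skia implementation: TinyQueue, pushHead, pushTail and popHead are hypothetical names, and std::deque stands in for SkTInternalLList. The one idea it borrows from the patch is routing both add() and addNext() through a shared helper that takes a pointer to member function.

```cpp
#include <deque>
#include <string>

// Hypothetical stand-in for the pool's queue; not SkTInternalLList.
class TinyQueue {
public:
    // add() enqueues at the tail: the task runs after everything queued so far.
    void add(const std::string& task) {
        this->addSomewhere(task, &TinyQueue::pushTail);
    }

    // addNext() enqueues at the head: the task is the very next one to run.
    void addNext(const std::string& task) {
        this->addSomewhere(task, &TinyQueue::pushHead);
    }

    bool empty() const { return fTasks.empty(); }

    // Workers always claim the head. Callers must check empty() first.
    std::string popHead() {
        std::string task = fTasks.front();
        fTasks.pop_front();
        return task;
    }

private:
    void pushTail(const std::string& task) { fTasks.push_back(task); }
    void pushHead(const std::string& task) { fTasks.push_front(task); }

    // Shared path for both entry points; 'where' selects head or tail.
    void addSomewhere(const std::string& task,
                      void (TinyQueue::*where)(const std::string&)) {
        (this->*where)(task);
    }

    std::deque<std::string> fTasks;
};
```

For example, after add("B"), add("C") and addNext("A'"), popHead() returns A', then B, then C. Note that in this sketch two successive addNext() calls run in reverse order of insertion, since each one lands in front of the previous.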
Before
Debug: 49s, 1403352k peak
Release: 16s, 2064008k peak
After
Debug: 49s, 1234788k peak
Release: 15s, 1903424k peak
BUG=skia:2478
R=bsalomon@google.com, borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/263803003
git-svn-id: http://skia.googlecode.com/svn/trunk/include@14506 2bbb7eff-a529-9590-31e7-b0007b416f81
-rw-r--r-- | utils/SkThreadPool.h | 25 |
1 files changed, 22 insertions, 3 deletions
diff --git a/utils/SkThreadPool.h b/utils/SkThreadPool.h
index a75bed8..295b1b4 100644
--- a/utils/SkThreadPool.h
+++ b/utils/SkThreadPool.h
@@ -50,6 +50,11 @@ public:
     void add(SkTRunnable<T>*);
 
     /**
+     * Same as add, but adds the runnable as the very next to run rather than enqueueing it.
+     */
+    void addNext(SkTRunnable<T>*);
+
+    /**
      * Block until all added SkRunnables have completed. Once called, calling add() is undefined.
      */
     void wait();
@@ -66,6 +71,9 @@ public:
         kHalting_State, // There's no work to do and no thread is busy. All threads can shut down.
     };
 
+    void addSomewhere(SkTRunnable<T>* r,
+                      void (SkTInternalLList<LinkedRunnable>::*)(LinkedRunnable*));
+
     SkTInternalLList<LinkedRunnable> fQueue;
     SkCondVar fReady;
     SkTDArray<SkThread*> fThreads;
@@ -111,7 +119,8 @@ struct ThreadLocal<void> {
 } // namespace SkThreadPoolPrivate
 
 template <typename T>
-void SkTThreadPool<T>::add(SkTRunnable<T>* r) {
+void SkTThreadPool<T>::addSomewhere(SkTRunnable<T>* r,
+        void (SkTInternalLList<LinkedRunnable>::* f)(LinkedRunnable*)) {
     if (r == NULL) {
         return;
     }
@@ -126,11 +135,21 @@ void SkTThreadPool<T>::add(SkTRunnable<T>* r) {
     linkedRunnable->fRunnable = r;
     fReady.lock();
     SkASSERT(fState != kHalting_State); // Shouldn't be able to add work when we're halting.
-    fQueue.addToHead(linkedRunnable);
+    (fQueue.*f)(linkedRunnable);
     fReady.signal();
     fReady.unlock();
 }
 
+template <typename T>
+void SkTThreadPool<T>::add(SkTRunnable<T>* r) {
+    this->addSomewhere(r, &SkTInternalLList<LinkedRunnable>::addToTail);
+}
+
+template <typename T>
+void SkTThreadPool<T>::addNext(SkTRunnable<T>* r) {
+    this->addSomewhere(r, &SkTInternalLList<LinkedRunnable>::addToHead);
+}
+
 
 template <typename T>
 void SkTThreadPool<T>::wait() {
@@ -174,7 +193,7 @@ template <typename T>
         // We've got the lock back here, no matter if we ran wait or not.
 
         // The queue is not empty, so we have something to run. Claim it.
-        LinkedRunnable* r = pool->fQueue.tail();
+        LinkedRunnable* r = pool->fQueue.head();
 
         pool->fQueue.remove(r);
 