simpleperf/doc/view_the_profile.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342

# View the profile

[TOC]

## Introduction

After using `simpleperf record` or `app_profiler.py`, we get a profile data file. The file contains
a list of samples. Each sample has a timestamp, a thread id, a callstack, events (like cpu-cycles
or cpu-clock) used in this sample, etc. We have many choices for viewing the profile. We can show
samples in chronological order, or show aggregated flamegraphs. We can show reports in text format,
or in some interactive UIs.

Below shows some recommended UIs to view the profile. Google developers can find more examples in
[go/gmm-profiling](go/gmm-profiling?polyglot=linux-workstation#viewing-the-profile).


## Continuous PProf UI (great flamegraph UI, but only available internally)

[PProf](https://github.com/google/pprof) is a mature profiling technology used extensively on
Google servers, with a powerful flamegraph UI, with strong drilldown, search, pivot, profile diff,
and graph visualisation.

![Example](./pictures/continuous_pprof.png)

We can use `pprof_proto_generator.py` to convert profiles into pprof.profile protobufs for use in
pprof.

```
# Output all threads, broken down by threadpool.
./pprof_proto_generator.py

# Use proguard mapping.
./pprof_proto_generator.py --proguard-mapping-file proguard.map

# Just the main (UI) thread (query by thread name):
./pprof_proto_generator.py --comm com.example.android.displayingbitmaps
```

This will print some debug logs about Failed to read symbols: this is usually OK, unless those
symbols are hotspots.

Upload pprof.profile to http://pprof/ UI:

```
# Upload all threads in profile, grouped by threadpool.
# This is usually a good default, combining threads with similar names.
pprof --flame --tagroot threadpool pprof.profile

# Upload all threads in profile, grouped by individual thread name.
pprof --flame --tagroot thread pprof.profile

# Upload all threads in profile, without grouping by thread.
pprof --flame pprof.profile
This will output a URL, example: https://pprof.corp.google.com/?id=589a60852306144c880e36429e10b166
```

## Firefox Profiler (great chronological UI)

We can view Android profiles using Firefox Profiler: https://profiler.firefox.com/. This does not
require Firefox installation -- Firefox Profiler is just a website, you can open it in any browser.

![Example](./pictures/firefox_profiler.png)

Firefox Profiler has a great chronological view, as it doesn't pre-aggregate similar stack traces
like pprof does.

We can use `gecko_profile_generator.py` to convert raw perf.data files into a Firefox Profile, with
Proguard deobfuscation.

```
# Create Gecko Profile
./gecko_profile_generator.py | gzip > gecko_profile.json.gz

# Create Gecko Profile using Proguard map
./gecko_profile_generator.py --proguard-mapping-file proguard.map | gzip > gecko_profile.json.gz
```

Then drag-and-drop gecko_profile.json.gz into https://profiler.firefox.com/.

Firefox Profiler supports:

1. Aggregated Flamegraphs
2. Chronological Stackcharts

And allows filtering by:

1. Individual threads
2. Multiple threads (Ctrl+Click thread names to select many)
3. Timeline period
4. Stack frame text search

## FlameScope (great jank-finding UI)

[Netflix's FlameScope](https://github.com/Netflix/flamescope) is a rough, proof-of-concept UI that
lets you spot repeating patterns of work by laying out the profile as a subsecond heatmap.

Below, each vertical stripe is one second, and each cell is 10ms. Redder cells have more samples.
See https://www.brendangregg.com/blog/2018-11-08/flamescope-pattern-recognition.html for how to
spot patterns.

This is an example of a 60s DisplayBitmaps app Startup Profile.

![Example](./pictures/flamescope.png)

You can see:

  The thick red vertical line on the left is startup.
  The long white vertical sections on the left shows the app is mostly idle, waiting for commands
  from instrumented tests.
  Then we see periodically red blocks, which shows the app is periodically busy handling commands
  from instrumented tests.

Click the start and end cells of a duration:

![Example](./pictures/flamescope_click.png)

To see a flamegraph for that duration:

![Example](./pictures/flamescope_flamegraph.png)

Install and run Flamescope:

```
git clone https://github.com/Netflix/flamescope ~/flamescope
cd ~/flamescope
pip install -r requirements.txt
npm install
npm run webpack
python3 run.py
```

Then open FlameScope in-browser: http://localhost:5000/.

FlameScope can read gzipped perf script format profiles. Convert simpleperf perf.data to this
format with `report_sample.py`, and place it in Flamescope's examples directory:

```
# Create `Linux perf script` format profile.
report_sample.py | gzip > ~/flamescope/examples/my_simpleperf_profile.gz

# Create `Linux perf script` format profile using Proguard map.
report_sample.py \
  --proguard-mapping-file proguard.map \
  | gzip > ~/flamescope/examples/my_simpleperf_profile.gz
```

Open the profile "as Linux Perf", and click start and end sections to get a flamegraph of that
timespan.

To investigate UI Thread Jank, filter to UI thread samples only:

```
report_sample.py \
  --comm com.example.android.displayingbitmaps \ # UI Thread
  | gzip > ~/flamescope/examples/uithread.gz
```

Once you've identified the timespan of interest, consider also zooming into that section with
Firefox Profiler, which has a more powerful flamegraph viewer.

## Differential FlameGraph

See Brendan Gregg's [Differential Flame Graphs](https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html) blog.

Use Simpleperf's `stackcollapse.py` to convert perf.data to Folded Stacks format for the FlameGraph
toolkit.

Consider diffing both directions: After minus Before, and Before minus After.

If you've recorded before and after your optimisation as perf_before.data and perf_after.data, and
you're only interested in the UI thread:

```
# Generate before and after folded stacks from perf.data files
./stackcollapse.py --kernel --jit -i perf_before.data \
  --proguard-mapping-file proguard_before.map \
  --comm com.example.android.displayingbitmaps \
  > perf_before.folded
./stackcollapse.py --kernel --jit -i perf_after.data \
  --proguard-mapping-file proguard_after.map \
  --comm com.example.android.displayingbitmaps \
  > perf_after.folded

# Generate diff reports
FlameGraph/difffolded.pl -n perf_before.folded perf_after.folded \
  | FlameGraph/flamegraph.pl > diff1.svg
FlameGraph/difffolded.pl -n --negate perf_after.folded perf_before.folded \
  | FlameGraph/flamegraph.pl > diff2.svg
```

## Android Studio Profiler

Android Studio Profiler supports recording and reporting profiles of app processes. It supports
several recording methods, including one using simpleperf as backend. You can use Android Studio
Profiler for both recording and reporting.

In Android Studio:
Open View -> Tool Windows -> Profiler
Click + -> Your Device -> Profileable Processes -> Your App

![Example](./pictures/android_studio_profiler_select_process.png)

Click into "CPU" Chart

Choose Callstack Sample Recording. Even if you're using Java, this provides better observability,
into ART, malloc, and the kernel.

![Example](./pictures/android_studio_profiler_select_recording_method.png)

Click Record, run your test on the device, then Stop when you're done.

Click on a thread track, and "Flame Chart" to see a chronological chart on the left, and an
aggregated flamechart on the right:

![Example](./pictures/android_studio_profiler_flame_chart.png)

If you want more flexibility in recording options, or want to add proguard mapping file, you can
record using simpleperf, and report using Android Studio Profiler.

We can use `simpleperf report-sample` to convert perf.data to trace files for Android Studio
Profiler.

```
# Convert perf.data to perf.trace for Android Studio Profiler.
# If on Mac/Windows, use simpleperf host executable for those platforms instead.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace

# Convert perf.data to perf.trace using proguard mapping file.
bin/linux/x86_64/simpleperf report-sample --show-callchain --protobuf -i perf.data -o perf.trace \
    --proguard-mapping-file proguard.map
```

In Android Studio: Open File -> Open -> Select perf.trace

![Example](./pictures/android_studio_profiler_open_perf_trace.png)


## Simpleperf HTML Report

Simpleperf can generate its own HTML Profile, which is able to show Android-specific information
and separate flamegraphs for all threads, with a much rougher flamegraph UI.

![Example](./pictures/report_html.png)

This UI is fairly rough; we recommend using the Continuous PProf UI or Firefox Profiler instead. But
it's useful for a quick look at your data.

Each of the following commands take as input ./perf.data and output ./report.html.

```
# Make an HTML report.
./report_html.py

# Make an HTML report with Proguard mapping.
./report_html.py --proguard-mapping-file proguard.map
```

This will print some debug logs about Failed to read symbols: this is usually OK, unless those
symbols are hotspots.

See also [report_html.py's README](scripts_reference.md#report_htmlpy) and `report_html.py -h`.


## PProf Interactive Command Line

Unlike Continuous PProf UI, [PProf](https://github.com/google/pprof) command line is publicly
available, and allows drilldown, pivoting and filtering.

The below session demonstrates filtering to stack frames containing processBitmap.

```
$ pprof pprof.profile
(pprof) show=processBitmap
(pprof) top
Active filters:
   show=processBitmap
Showing nodes accounting for 2.45s, 11.44% of 21.46s total
      flat  flat%   sum%        cum   cum%
     2.45s 11.44% 11.44%      2.45s 11.44%  com.example.android.displayingbitmaps.util.ImageFetcher.processBitmap
```

And then showing the tags of those frames, to tell what threads they are running on:

```
(pprof) tags
 pid: Total 2.5s
      2.5s (  100%): 31112

 thread: Total 2.5s
         1.4s (57.21%): AsyncTask #3
         1.1s (42.79%): AsyncTask #4

 threadpool: Total 2.5s
             2.5s (  100%): AsyncTask #%d

 tid: Total 2.5s
      1.4s (57.21%): 31174
      1.1s (42.79%): 31175
```

Contrast with another method:

```
(pprof) show=addBitmapToCache
(pprof) top
Active filters:
   show=addBitmapToCache
Showing nodes accounting for 1.05s, 4.88% of 21.46s total
      flat  flat%   sum%        cum   cum%
     1.05s  4.88%  4.88%      1.05s  4.88%  com.example.android.displayingbitmaps.util.ImageCache.addBitmapToCache
```

For more information, see the [pprof README](https://github.com/google/pprof/blob/master/doc/README.md#interactive-terminal-use).


## Simpleperf Report Command Line

The simpleperf report command reports profiles in text format.

![Example](./pictures/report_command.png)

You can call `simpleperf report` directly or call it via `report.py`.

```
# Report symbols in table format.
$ ./report.py --children

# Report call graph.
$ bin/linux/x86_64/simpleperf report -g -i perf.data
```

See also [report command's README](executable_commands_reference.md#The-report-command) and
`report.py -h`.


## Custom Report Interface

If the above View UIs can't fulfill your need, you can use `simpleperf_report_lib.py` to parse
perf.data, extract sample information, and feed it to any views you like.

See [simpleperf_report_lib.py's README](scripts_reference.md#simpleperf_report_libpy) for more
details.