summaryrefslogtreecommitdiff
path: root/simpleperf/doc/collect_etm_data_for_autofdo.md
diff options
context:
space:
mode:
Diffstat (limited to 'simpleperf/doc/collect_etm_data_for_autofdo.md')
-rw-r--r--simpleperf/doc/collect_etm_data_for_autofdo.md110
1 files changed, 109 insertions, 1 deletions
diff --git a/simpleperf/doc/collect_etm_data_for_autofdo.md b/simpleperf/doc/collect_etm_data_for_autofdo.md
index 145c0adf..2c001016 100644
--- a/simpleperf/doc/collect_etm_data_for_autofdo.md
+++ b/simpleperf/doc/collect_etm_data_for_autofdo.md
@@ -81,9 +81,18 @@ branch with count info for binary2
We need to split perf_inject.data, and make sure one file only contains info for one binary.
-Then we can use [AutoFDO](https://github.com/google/autofdo) to create profile like below:
+Then we can use [AutoFDO](https://github.com/google/autofdo) to create profile. AutoFDO only works
+for binaries having an executable segment as its first loadable segment. But binaries built in
+Android may not follow this rule. Simpleperf inject command knows how to work around this problem.
+But there is a check in AutoFDO forcing binaries to start with an executable segment. We need to
+disable the check in AutoFDO, by commenting out L127-L136 in
+https://github.com/google/autofdo/commit/188db2834ce74762ed17108ca344916994640708#diff-2d132ecbb5e4f13e0da65419f6d1759dd27d6b696786dd7096c0c34d499b1710R127-R136.
+Then we can use `create_llvm_prof` in AutoFDO to create profiles used by clang.
```sh
+# perf_inject_binary1.data is split from perf_inject.data, and only contains branch info for binary1.
+host $ autofdo/create_llvm_prof -profile perf_inject_binary1.data -profiler text -binary path_of_binary1 -out a.prof -format binary
+
# perf_inject_kernel.data is split from perf_inject.data, and only contains branch info for [kernel.kallsyms].
host $ autofdo/create_llvm_prof -profile perf_inject_kernel.data -profiler text -binary vmlinux -out a.prof -format binary
```
@@ -91,6 +100,96 @@ host $ autofdo/create_llvm_prof -profile perf_inject_kernel.data -profiler text
Then we can use a.prof for PGO during compilation, via `-fprofile-sample-use=a.prof`.
[Here](https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers) are more details.
+### A complete example: etm_test_loop.cpp
+
+`etm_test_loop.cpp` is an example to show the complete process.
+The source code is in [etm_test_loop.cpp](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/runtest/etm_test_loop.cpp).
+The build script is in [Android.bp](https://android.googlesource.com/platform/system/extras/+/master/simpleperf/runtest/Android.bp).
+It builds an executable called `etm_test_loop`, which runs on device.
+
+Step 1: Build `etm_test_loop` binary.
+
+```sh
+(host) <AOSP>$ . build/envsetup.sh
+(host) <AOSP>$ lunch aosp_arm64-userdebug
+(host) <AOSP>$ make etm_test_loop
+```
+
+Step 2: Run `etm_test_loop` on device, and collect ETM data for its running.
+
+```sh
+(host) <AOSP>$ adb push out/target/product/generic_arm64/system/bin/etm_test_loop /data/local/tmp
+(host) <AOSP>$ adb root
+(host) <AOSP>$ adb shell
+(device) / # cd /data/local/tmp
+(device) /data/local/tmp # chmod a+x etm_test_loop
+(device) /data/local/tmp # simpleperf record -e cs-etm:u ./etm_test_loop
+simpleperf I cmd_record.cpp:729] Recorded for 0.0370068 seconds. Start post processing.
+simpleperf I cmd_record.cpp:799] Aux data traced: 1689136
+(device) /data/local/tmp # simpleperf inject -i perf.data --output branch-list -o branch_list.data
+simpleperf W dso.cpp:557] failed to read min virtual address of [vdso]: File not found
+(device) /data/local/tmp # exit
+(host) <AOSP>$ adb pull /data/local/tmp/branch_list.data
+```
+
+Step 3: Convert ETM data to AutoFDO data.
+
+```sh
+# Build simpleperf tool on host.
+(host) <AOSP>$ make simpleperf_ndk
+(host) <AOSP>$ simpleperf_ndk64 inject -i branch_list.data -o perf_inject_etm_test_loop.data --symdir out/target/product/generic_arm64/symbols/system/bin
+simpleperf W cmd_inject.cpp:505] failed to build instr ranges for binary [vdso]: File not found
+(host) <AOSP>$ cat perf_inject_etm_test_loop.data
+13
+1000-1010:1
+1014-1050:1
+...
+112c->0:1
+// /data/local/tmp/etm_test_loop
+
+(host) <AOSP>$ create_llvm_prof -profile perf_inject_etm_test_loop.data -profiler text -binary out/target/product/generic_arm64/symbols/system/bin/etm_test_loop -out etm_test_loop.afdo -format binary
+(host) <AOSP>$ ls -lh etm_test_loop.afdo
+rw-r--r-- 1 user group 241 Aug 29 16:04 etm_test_loop.afdo
+```
+
+Step 4: Use AutoFDO data to build optimized binary.
+
+```sh
+(host) <AOSP>$ mkdir toolchain/pgo-profiles/sampling/
+(host) <AOSP>$ cp etm_test_loop.afdo toolchain/pgo-profiles/sampling/
+(host) <AOSP>$ vi toolchain/pgo-profiles/sampling/Android.bp
+# edit Android.bp to add a fdo_profile module
+# soong_namespace {}
+#
+# fdo_profile {
+# name: "etm_test_loop_afdo",
+# profile: ["etm_test_loop.afdo"],
+# }
+```
+
+`soong_namespace` is added to support fdo_profile modules with the same name
+
+In a product config mk file, update `PRODUCT_AFDO_PROFILES` with
+
+```make
+PRODUCT_AFDO_PROFILES += etm_test_loop://toolchain/pgo-profiles/sampling:etm_test_loop_afdo
+```
+
+```sh
+(host) <AOSP>$ vi system/extras/simpleperf/runtest/Android.bp
+# edit Android.bp to enable afdo for etm_test_loop.
+# cc_binary {
+# name: "etm_test_loop",
+# srcs: ["etm_test_loop.cpp"],
+# afdo: true,
+# }
+(host) <AOSP>$ make etm_test_loop
+```
+
+If comparing the disassembly of `out/target/product/generic_arm64/symbols/system/bin/etm_test_loop`
+before and after optimizing with AutoFDO data, we can see different preferences when branching.
+
+
## Collect ETM data with a daemon
Android also has a daemon collecting ETM data periodically. It only runs on userdebug and eng
@@ -138,6 +237,15 @@ for ETR to store ETM data. Without IOMMU, the memory needs to be contiguous. If
fulfill the request, simpleperf will report out of memory error. Fortunately, we can use
"arm,scatter-gather" flag to let ETR run in scatter gather mode, which uses non-contiguous memory.
+
+### A possible problem: trace_id mismatch
+
+Each CPU has an ETM device, which has a unique trace_id assigned from the kernel.
+The formula is: `trace_id = 0x10 + cpu * 2`, as in https://github.com/torvalds/linux/blob/master/include/linux/coresight-pmu.h#L37.
+If the formula is modified by local patches, then simpleperf inject command can't parse ETM data
+properly and is likely to give empty output.
+
+
## Enable ETM in the bootloader
Unless ARMv8.4 Self-hosted Trace extension is implemented, ETM is considered as an external debug