Don't observe performance improvement for built-in tests with propeller #3

Open
@uttampawar

Hi,
I'm not able to observe any performance benefit from the Propeller toolchain for the included test program (main.cc, callee.cc). I followed the steps given in Propeller_RFC.pdf.

High level observations:

  1. Elapsed time doesn't show any improvement.
  2. Cycles, instructions, and branch mispredicts are almost the same.
  3. Overall cache-misses are lower, but L1-icache-load-misses are similar.

$ time ./a.out.orig.labels 1000000000 2 >& /dev/null
real 0m21.094s
user 0m20.489s
sys 0m0.604s

$ time ./a.out.labels 1000000000 2 >& /dev/null
real 0m20.357s
user 0m19.908s
sys 0m0.448s

Elapsed time varies by 1 to 5% across runs.

Perf data

$ perf stat -e cycles,instructions,cache-misses,L1-icache-load-misses,br_misp_retired.all_branches,br_inst_retired.all_branches,icache_64b.iftag_stall ./a.out.orig.labels 1000000000 1> /dev/null

Performance counter stats for './a.out.orig.labels 1000000000':

80,231,347,233 cycles (66.67%)
243,314,361,618 instructions # 3.03 insn per cycle (83.33%)
22,522 cache-misses (83.33%)
2,644,077 L1-icache-load-misses (83.33%)
20,400,061 br_misp_retired.all_branches (83.33%)
53,442,616,374 br_inst_retired.all_branches (83.34%)
68,554,744 icache_64b.iftag_stall (57.14%)

  21.191516400 seconds time elapsed

Optimized binary

$ perf stat -e cycles,instructions,cache-misses,L1-icache-load-misses,br_misp_retired.all_branches,br_inst_retired.all_branches,icache_64b.iftag_stall ./a.out.labels 1000000000 1> /dev/null

Performance counter stats for './a.out.labels 1000000000':

81,446,698,907 cycles (66.66%)
243,218,220,681 instructions # 2.99 insn per cycle (83.33%)
14,907 cache-misses (83.34%)
2,533,002 L1-icache-load-misses (83.34%)
20,571,010 br_misp_retired.all_branches (83.34%)
53,455,580,211 br_inst_retired.all_branches (83.33%)
68,847,492 icache_64b.iftag_stall (57.14%)

  21.512644234 seconds time elapsed
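
To make the comparison easier to read, here is a small sketch that computes the relative change of each counter between the two runs. The numbers are copied from the perf output above; the names `baseline`, `optimized`, and `pct_change` are just illustrative. Note that the percentages in parentheses in the perf output mean the events were multiplexed, so the counts are scaled estimates and small differences may be noise.

```python
# Counter values copied from the two `perf stat` runs above.
baseline = {  # ./a.out.orig.labels
    "cycles": 80_231_347_233,
    "instructions": 243_314_361_618,
    "cache-misses": 22_522,
    "L1-icache-load-misses": 2_644_077,
    "br_misp_retired.all_branches": 20_400_061,
    "br_inst_retired.all_branches": 53_442_616_374,
    "icache_64b.iftag_stall": 68_554_744,
}
optimized = {  # ./a.out.labels
    "cycles": 81_446_698_907,
    "instructions": 243_218_220_681,
    "cache-misses": 14_907,
    "L1-icache-load-misses": 2_533_002,
    "br_misp_retired.all_branches": 20_571_010,
    "br_inst_retired.all_branches": 53_455_580_211,
    "icache_64b.iftag_stall": 68_847_492,
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Positive values mean the optimized binary counted more of that event.
changes = {event: pct_change(baseline[event], optimized[event]) for event in baseline}

for event, delta in changes.items():
    print(f"{event:32s} {delta:+8.2f}%")
```

By this arithmetic, cache-misses is the only counter that moves by a large relative amount (roughly a third lower), but its absolute count is only about 22 thousand against roughly 243 billion instructions. Every other counter changes by less than 5%.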

The referenced paper doesn't mention the benefit for the included test program. What is the expected improvement for the included test?

Please see more details (build steps, runtime steps, etc.) in the following gist:
https://gist.github.com/uttampawar/5407f998bc3f02f58c4b83b0b4dc20fe

Any hints are appreciated.
