GCC8 Scimark2 Benchmarks

Tests performed with "gcc version 8.0.0 20170731 (experimental) [trunk revision 250741] (GCC)".

Scores are in Mflops as printed by scimark2; clock is in GHz; AVE is the mean of the five kernel scores.

cpu  march       clock  small                                                |  large
                            AVE      FFT      SOR       MC      SMM       LU |      AVE      FFT      SOR       MC      SMM       LU
i7   avx2        3.9    2993.41  2403.37  2252.19   848.16  2247.29  7216.05 |  2816.28   836.54  1970.81   851.69  2451.49  7970.88
i7   avx2 lto    3.9    2974.45  2323.57  2254.27  1379.39  2332.57  6582.43 |  2935.20   836.34  1971.09  1379.20  2466.19  8023.16
SKL  avx2        3.6    2811.45  2207.65  2118.52   800.70  2144.63  6785.77 |  2127.29   882.83  1853.39   802.09  2269.68  4828.43
SKL  avx2 lto    3.6    2747.22  2099.76  2109.37  1290.42  2178.55  6057.98 |  2212.26   865.45  1843.02  1299.96  2276.27  4776.59
SKL  avx512      3.5    2505.91  2037.61  2025.37   781.27  2082.87  5602.41 |  2078.20   868.10  1788.18   781.49  2380.19  4573.05
SKL  avx512 lto  3.5    2598.08  2015.90  2026.44  1203.36  2499.67  5245.06 |  2189.53   850.48  1793.74  1205.22  2379.30  4718.94

Memory alignment

Memory was aligned to 64 bytes with "sed -i 's/malloc(/aligned_alloc(64,/g' $f". This heavily affects small LU.
For AVX the same effect can be obtained by linking jemalloc.
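
The sed substitution above turns each malloc(size) into aligned_alloc(64, size). As a minimal sketch of what that call does in isolation, the C fragment below allocates doubles on a 64-byte boundary (the width of one 512-bit vector) and checks the result. The helper names round_up, alloc_doubles_64 and is_aligned_64 are hypothetical and are not part of scimark2. Rounding the size up is an extra precaution that the sed rewrite does not take: C11 as first published asks for the size to be a multiple of the alignment, and allocators differ in how strictly they enforce it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of align; align must be a power of two. */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical helper: allocate room for n doubles on a 64-byte boundary.
   The byte count is rounded up to a multiple of 64 for portability.
   Returns NULL on failure; release the block with free(). */
double *alloc_doubles_64(size_t n)
{
    return aligned_alloc(64, round_up(n * sizeof(double), 64));
}

/* Nonzero when p lies on a 64-byte boundary. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}
```

A caller would use alloc_doubles_64(n) wherever a plain malloc(n * sizeof(double)) was used before, and can assert is_aligned_64(p) on the result while debugging.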

cpu  march       clock  small                                                |  large
                            AVE      FFT      SOR       MC      SMM       LU |      AVE      FFT      SOR       MC      SMM       LU
i7   avx2        3.9    3490.31  2383.82  2241.28   848.80  2160.06  9817.57 |  2814.28   837.05  1973.92   846.87  2453.77  7959.81
i7   avx2 lto    3.9    3438.88  2368.48  2245.34  1382.52  2338.45  8859.60 |  2951.04   837.50  1974.10  1382.42  2473.71  8087.46
SKL  avx2        3.6    3246.77  2163.73  2103.33   796.58  2031.98  9138.25 |  2131.75   848.73  1851.69   796.71  2291.94  4869.70
SKL  avx2 lto    3.6    3167.70  2177.46  2086.69  1295.08  2170.37  8108.90 |  2223.13   888.72  1851.35  1296.81  2280.47  4798.27
SKL  avx512      3.5    3178.83  2100.91  2033.32   783.61  2088.48  8887.83 |  2131.25   873.05  1800.26   784.95  2438.98  4759.04
SKL  avx512 lto  3.5    3204.47  1771.32  2030.43  1208.49  2512.78  8499.33 |  2211.64   850.60  1801.03  1208.79  2399.70  4798.09
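
To put a number on "affects heavily small LU", the aligned scores can be divided by the unaligned ones from the first table. The sketch below does that arithmetic for the small LU column of the builds without lto. The function and array names are illustrative only; the values are copied from the two tables.

```c
#include <assert.h>

/* Ratio of the aligned score to the baseline score; above 1 means faster. */
double speedup(double baseline, double aligned)
{
    assert(baseline > 0.0);
    return aligned / baseline;
}

/* Small LU scores, builds without lto, in the order:
   i7 avx2, SKL avx2, SKL 512 (AVX-512). */
const double lu_small_base[3]    = {7216.05, 6785.77, 5602.41};
const double lu_small_aligned[3] = {9817.57, 9138.25, 8887.83};
```

Calling speedup(lu_small_base[i], lu_small_aligned[i]) gives roughly 1.36 for i7 avx2, 1.35 for SKL avx2 and 1.59 for the AVX-512 build. The large LU scores barely move (for example 7970.88 versus 7959.81 on i7 avx2).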

How to align memory in the native (NIST) scimark2

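# fetch and unpack the NIST C sources
mkdir scimark2
cd scimark2/
wget http://math.nist.gov/scimark2/scimark2_1c.zip
unzip scimark2_1c.zip
# baseline build and run
gcc -Ofast -march=native *.c -lm
./a.out
# replace every "malloc(" with a 64-byte aligned allocation (csh/tcsh loop syntax)
foreach f ( *.c )
  sed -i 's/malloc(/aligned_alloc(64,/g' $f
end
# rebuild and rerun with aligned memory
gcc -Ofast -march=native *.c -lm
./a.out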

perf counters for LU with 64-byte alignment

normalized to cycles

  i7 avx2 SKL avx512 SKL avx2
  small large small large small large
arith_divider_active 0.0030 0.0000 0.0031 0.0000 0.0031 0.0000
branch-instructions 0.2662 0.0513 0.3740 0.0275 0.2702 0.0332
branch-misses 0.0005 0.0012 0.0005 0.0002 0.0005 0.0008
cycle_activity_stalls_mem_any 0.0330 0.3065 0.0396 0.6504 0.0287 0.4467
cycle_activity_stalls_total 0.0367 0.3098 0.0447 0.6521 0.0324 0.4492
cycles 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
fp_arith_inst_retired_128b_packed_double 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_128b_packed_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_256b_packed_double 0.6104 0.5232 0.0000 0.0000 0.6187 0.3390
fp_arith_inst_retired_256b_packed_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_512b_packed_double 0.0000 0.0000 0.2799 0.1797 0.0000 0.0000
fp_arith_inst_retired_512b_packed_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_scalar_double 0.0962 0.0079 0.3320 0.0098 0.0975 0.0050
fp_arith_inst_retired_scalar_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
iTLB-load-misses 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
iTLB-loads 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
icache_16b_ifdata_stall 0.0002 0.0011 0.0001 0.0007 0.0001 0.0017
instructions 2.6086 1.0115 2.9603 0.4112 2.6469 0.6545
mem_load_retired_l1_hit 0.7514 0.2984 0.7781 0.1127 0.7598 0.1865
mem_load_retired_l2_hit 0.1197 0.0514 0.1338 0.0117 0.1197 0.0055
mem_load_retired_l3_hit 0.0010 0.0435 0.0000 0.0790 0.0000 0.0754
mem_load_retired_l3_miss 0.0000 0.0033 0.0000 0.0000 0.0000 0.0000
offcore_requests_outstanding_demand_data_rd_ge_6 0.0003 0.1028 0.0000 0.3924 0.0000 0.4438
resource_stalls_any 0.2338 0.5983 0.1816 0.8331 0.2251 0.7044
rs_events_empty_cycles 0.0036 0.0029 0.0051 0.0013 0.0038 0.0025
task-clock 0.2566 0.2633 0.3081 0.3103 0.2797 0.2802
uops_executed_cycles_ge_1_uop_exec 0.9641 0.6896 0.9557 0.3480 0.9673 0.5491
uops_executed_cycles_ge_2_uops_exec 0.8867 0.4883 0.8671 0.1670 0.8931 0.2913
uops_executed_cycles_ge_3_uops_exec 0.6975 0.2405 0.7148 0.0541 0.7093 0.1087
uops_executed_cycles_ge_4_uops_exec 0.3952 0.0776 0.5075 0.0125 0.4087 0.0219
uops_executed_stall_cycles 0.0365 0.3088 0.0446 0.6522 0.0323 0.4500
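
Each entry in the table above is a counter value divided by the cycle count of the same run, so the "cycles" row is 1 by construction and the "instructions" row reads directly as instructions per cycle (2.6086 for small LU on i7, 0.4112 for large LU on SKL avx512). A minimal sketch of that normalization in C follows. It assumes the raw counts are already available in an array; normalize_counters is a hypothetical name and not part of perf.

```c
#include <assert.h>
#include <stddef.h>

/* Divide each raw counter value by a reference count (for example the
   total number of cycles) and store the ratios in out. */
void normalize_counters(const double *raw, size_t n, double reference,
                        double *out)
{
    assert(reference > 0.0);
    for (size_t i = 0; i < n; ++i)
        out[i] = raw[i] / reference;
}
```

Passing the instruction count as the reference instead of the cycle count yields the second normalization used on this page.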

normalized to instructions

  i7 avx2 SKL avx512 SKL avx2
  small large small large small large
arith_divider_active 0.0012 0.0000 0.0010 0.0001 0.0012 0.0000
branch-instructions 0.1021 0.0507 0.1263 0.0668 0.1021 0.0508
branch-misses 0.0002 0.0012 0.0002 0.0004 0.0002 0.0012
cycle_activity_stalls_mem_any 0.0126 0.3030 0.0134 1.5819 0.0108 0.6825
cycle_activity_stalls_total 0.0141 0.3063 0.0151 1.5860 0.0122 0.6863
cycles 0.3833 0.9887 0.3378 2.4321 0.3778 1.5280
fp_arith_inst_retired_128b_packed_double 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_128b_packed_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_256b_packed_double 0.2340 0.5173 0.0000 0.0000 0.2338 0.5180
fp_arith_inst_retired_256b_packed_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_512b_packed_double 0.0000 0.0000 0.0945 0.4370 0.0000 0.0000
fp_arith_inst_retired_512b_packed_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
fp_arith_inst_retired_scalar_double 0.0369 0.0078 0.1122 0.0239 0.0368 0.0077
fp_arith_inst_retired_scalar_single 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
iTLB-load-misses 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
iTLB-loads 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
icache_16b_ifdata_stall 0.0001 0.0010 0.0000 0.0018 0.0000 0.0026
instructions 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
mem_load_retired_l1_hit 0.2880 0.2950 0.2628 0.2740 0.2871 0.2849
mem_load_retired_l2_hit 0.0459 0.0508 0.0452 0.0285 0.0452 0.0084
mem_load_retired_l3_hit 0.0004 0.0430 0.0000 0.1921 0.0000 0.1153
mem_load_retired_l3_miss 0.0000 0.0033 0.0000 0.0000 0.0000 0.0000
offcore_requests_outstanding_demand_data_rd_ge_6 0.0001 0.1017 0.0000 0.9544 0.0000 0.6780
resource_stalls_any 0.0896 0.5915 0.0614 2.0263 0.0850 1.0763
rs_events_empty_cycles 0.0014 0.0029 0.0017 0.0031 0.0014 0.0038
task-clock 0.0984 0.2603 0.1041 0.7548 0.1057 0.4281
uops_executed_cycles_ge_1_uop_exec 0.3696 0.6818 0.3228 0.8465 0.3654 0.8390
uops_executed_cycles_ge_2_uops_exec 0.3399 0.4827 0.2929 0.4062 0.3374 0.4451
uops_executed_cycles_ge_3_uops_exec 0.2674 0.2378 0.2414 0.1316 0.2680 0.1661
uops_executed_cycles_ge_4_uops_exec 0.1515 0.0767 0.1714 0.0304 0.1544 0.0335
uops_executed_stall_cycles 0.0140 0.3053 0.0151 1.5863 0.0122 0.6876
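
The two normalizations are related through the instructions-per-cycle value: a per-instruction entry equals the per-cycle entry divided by the "instructions" row of the cycles-normalized table. The C sketch below expresses that relation; the function name is illustrative only.

```c
#include <assert.h>

/* Convert a per-cycle rate into a per-instruction rate.
   ipc is the "instructions" entry of the cycles-normalized table
   for the same machine and problem size. */
double per_cycle_to_per_instruction(double per_cycle, double ipc)
{
    assert(ipc > 0.0);
    return per_cycle / ipc;
}
```

For small LU on i7 avx2, branch-instructions is 0.2662 per cycle and the IPC is 2.6086, which gives about 0.1020 per instruction and matches the 0.1021 in the second table up to rounding of the inputs. Likewise 1 / 2.6086 is about 0.3833, the "cycles" entry of the same column.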

-- VincenzoInnocente - 2017-08-02

Topic revision: r5 - 2017-08-08 - VincenzoInnocente
 