GCC8 Scimark2 Benchmarks
Tests were performed with
"gcc version 8.0.0 20170731 (experimental) [trunk revision 250741] (GCC)"
| | march | clock | small | | | | | | large | | | | | |
| | | | AVE | FFT | SOR | MC | SMM | LU | AVE | FFT | SOR | MC | SMM | LU |
| i7 | avx2 | 3.9 | 2993.41 | 2403.37 | 2252.19 | 848.16 | 2247.29 | 7216.05 | 2816.28 | 836.54 | 1970.81 | 851.69 | 2451.49 | 7970.88 |
| i7 | avx2 lto | 3.9 | 2974.45 | 2323.57 | 2254.27 | 1379.39 | 2332.57 | 6582.43 | 2935.20 | 836.34 | 1971.09 | 1379.20 | 2466.19 | 8023.16 |
| SKL | avx2 | 3.6 | 2811.45 | 2207.65 | 2118.52 | 800.70 | 2144.63 | 6785.77 | 2127.29 | 882.83 | 1853.39 | 802.09 | 2269.68 | 4828.43 |
| SKL | avx2 lto | 3.6 | 2747.22 | 2099.76 | 2109.37 | 1290.42 | 2178.55 | 6057.98 | 2212.26 | 865.45 | 1843.02 | 1299.96 | 2276.27 | 4776.59 |
| SKL | 512 | 3.5 | 2505.91 | 2037.61 | 2025.37 | 781.27 | 2082.87 | 5602.41 | 2078.20 | 868.10 | 1788.18 | 781.49 | 2380.19 | 4573.05 |
| SKL | 512 lto | 3.5 | 2598.08 | 2015.90 | 2026.44 | 1203.36 | 2499.67 | 5245.06 | 2189.53 | 850.48 | 1793.74 | 1205.22 | 2379.30 | 4718.94 |
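The AVE column is consistent with the arithmetic mean of the five kernel scores (FFT, SOR, MC, SMM, LU) in the same row. A minimal sketch of that calculation follows; the function name `composite_score` is illustrative and does not come from the scimark2 sources.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n kernel scores (n must be > 0).
 * Hypothetical helper, not part of scimark2. */
double composite_score(const double *scores, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += scores[i];
    return sum / (double)n;
}
```

Called with the i7 avx2 small scores (2403.37, 2252.19, 848.16, 2247.29, 7216.05), it gives a value that rounds to the tabulated AVE of 2993.41.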
Memory alignment
Memory was aligned with "sed -i 's/malloc(/aligned_alloc(64,/g' $f": this strongly affects the small LU case (for example, i7 avx2 small LU goes from 7216.05 to 9817.57).
For AVX, the same effect can be obtained by linking against jemalloc.
| | march | clock | small | | | | | | large | | | | | |
| | | | AVE | FFT | SOR | MC | SMM | LU | AVE | FFT | SOR | MC | SMM | LU |
| i7 | avx2 | 3.9 | 3490.31 | 2383.82 | 2241.28 | 848.80 | 2160.06 | 9817.57 | 2814.28 | 837.05 | 1973.92 | 846.87 | 2453.77 | 7959.81 |
| i7 | avx2 lto | 3.9 | 3438.88 | 2368.48 | 2245.34 | 1382.52 | 2338.45 | 8859.60 | 2951.04 | 837.50 | 1974.10 | 1382.42 | 2473.71 | 8087.46 |
| SKL | avx2 | 3.6 | 3246.77 | 2163.73 | 2103.33 | 796.58 | 2031.98 | 9138.25 | 2131.75 | 848.73 | 1851.69 | 796.71 | 2291.94 | 4869.70 |
| SKL | avx2 lto | 3.6 | 3167.70 | 2177.46 | 2086.69 | 1295.08 | 2170.37 | 8108.90 | 2223.13 | 888.72 | 1851.35 | 1296.81 | 2280.47 | 4798.27 |
| SKL | 512 | 3.5 | 3178.83 | 2100.91 | 2033.32 | 783.61 | 2088.48 | 8887.83 | 2131.25 | 873.05 | 1800.26 | 784.95 | 2438.98 | 4759.04 |
| SKL | 512 lto | 3.5 | 3204.47 | 1771.32 | 2030.43 | 1208.49 | 2512.78 | 8499.33 | 2211.64 | 850.60 | 1801.03 | 1208.79 | 2399.70 | 4798.09 |
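To quantify the effect of alignment, the aligned score can be divided by the corresponding unaligned score. The sketch below does only that; `alignment_speedup` is an illustrative name, and the scores passed to it are the ones reported in the two tables of this page.

```c
#include <assert.h>

/* Ratio of the score measured with aligned memory to the score
 * measured with the default malloc. Values above 1 mean alignment helped.
 * Hypothetical helper for reading the tables. */
double alignment_speedup(double aligned_score, double default_score)
{
    assert(default_score > 0.0);
    return aligned_score / default_score;
}
```

For i7 avx2, small LU gives 9817.57 / 7216.05, roughly 1.36, while large LU gives 7959.81 / 7970.88, which is essentially 1.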
Applying memory alignment to the stock scimark2 sources
mkdir scimark2
cd scimark2/
wget http://math.nist.gov/scimark2/scimark2_1c.zip
unzip scimark2_1c.zip
gcc -Ofast -march=native *.c -lm
./a.out
foreach f ( *.c)
sed -i 's/malloc(/aligned_alloc(64,/g' $f
end
gcc -Ofast -march=native *.c -lm
./a.out
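The sed substitution above turns every `malloc(size)` call into `aligned_alloc(64, size)`, so each buffer starts on a 64-byte boundary (64 bytes is 512 bits, the width of one AVX-512 vector). The sketch below shows the same idea as an explicit helper. It is not part of scimark2: the names `alloc_aligned64` and `is_aligned64` are illustrative. It also rounds the size up to a multiple of 64, because C11 as originally published requires the size passed to `aligned_alloc` to be a multiple of the alignment; the sed version passes the size through unchanged, which is assumed to be accepted by the C library used for these tests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate at least `size` bytes starting on a 64-byte boundary.
 * Returns NULL on failure. Release the memory with free().
 * Hypothetical helper, not taken from the scimark2 sources. */
void *alloc_aligned64(size_t size)
{
    const size_t alignment = 64;

    /* Guard against overflow in the rounding below. */
    if (size > SIZE_MAX - (alignment - 1))
        return NULL;

    /* Round up to a multiple of the alignment; never request 0 bytes. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    if (rounded == 0)
        rounded = alignment;

    return aligned_alloc(alignment, rounded);
}

/* Returns 1 if p lies on a 64-byte boundary, 0 otherwise. */
int is_aligned64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}
```

A buffer obtained this way can be passed to the kernels in place of one returned by `malloc`, and `is_aligned64` can be used to check what a given allocator actually returns.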
perf counters for LU with 64-byte alignment
normalized to cycles
| | i7 avx2 | | SKL avx512 | | SKL avx2 | |
| | small | large | small | large | small | large |
| arith_divider_active | 0.0030 | 0.0000 | 0.0031 | 0.0000 | 0.0031 | 0.0000 |
| branch-instructions | 0.2662 | 0.0513 | 0.3740 | 0.0275 | 0.2702 | 0.0332 |
| branch-misses | 0.0005 | 0.0012 | 0.0005 | 0.0002 | 0.0005 | 0.0008 |
| cycle_activity_stalls_mem_any | 0.0330 | 0.3065 | 0.0396 | 0.6504 | 0.0287 | 0.4467 |
| cycle_activity_stalls_total | 0.0367 | 0.3098 | 0.0447 | 0.6521 | 0.0324 | 0.4492 |
| cycles | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| fp_arith_inst_retired_128b_packed_double | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_128b_packed_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_256b_packed_double | 0.6104 | 0.5232 | 0.0000 | 0.0000 | 0.6187 | 0.3390 |
| fp_arith_inst_retired_256b_packed_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_512b_packed_double | 0.0000 | 0.0000 | 0.2799 | 0.1797 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_512b_packed_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_scalar_double | 0.0962 | 0.0079 | 0.3320 | 0.0098 | 0.0975 | 0.0050 |
| fp_arith_inst_retired_scalar_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| iTLB-load-misses | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| iTLB-loads | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| icache_16b_ifdata_stall | 0.0002 | 0.0011 | 0.0001 | 0.0007 | 0.0001 | 0.0017 |
| instructions | 2.6086 | 1.0115 | 2.9603 | 0.4112 | 2.6469 | 0.6545 |
| mem_load_retired_l1_hit | 0.7514 | 0.2984 | 0.7781 | 0.1127 | 0.7598 | 0.1865 |
| mem_load_retired_l2_hit | 0.1197 | 0.0514 | 0.1338 | 0.0117 | 0.1197 | 0.0055 |
| mem_load_retired_l3_hit | 0.0010 | 0.0435 | 0.0000 | 0.0790 | 0.0000 | 0.0754 |
| mem_load_retired_l3_miss | 0.0000 | 0.0033 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| offcore_requests_outstanding_demand_data_rd_ge_6 | 0.0003 | 0.1028 | 0.0000 | 0.3924 | 0.0000 | 0.4438 |
| resource_stalls_any | 0.2338 | 0.5983 | 0.1816 | 0.8331 | 0.2251 | 0.7044 |
| rs_events_empty_cycles | 0.0036 | 0.0029 | 0.0051 | 0.0013 | 0.0038 | 0.0025 |
| task-clock | 0.2566 | 0.2633 | 0.3081 | 0.3103 | 0.2797 | 0.2802 |
| uops_executed_cycles_ge_1_uop_exec | 0.9641 | 0.6896 | 0.9557 | 0.3480 | 0.9673 | 0.5491 |
| uops_executed_cycles_ge_2_uops_exec | 0.8867 | 0.4883 | 0.8671 | 0.1670 | 0.8931 | 0.2913 |
| uops_executed_cycles_ge_3_uops_exec | 0.6975 | 0.2405 | 0.7148 | 0.0541 | 0.7093 | 0.1087 |
| uops_executed_cycles_ge_4_uops_exec | 0.3952 | 0.0776 | 0.5075 | 0.0125 | 0.4087 | 0.0219 |
| uops_executed_stall_cycles | 0.0365 | 0.3088 | 0.0446 | 0.6522 | 0.0323 | 0.4500 |
normalized to instructions
| | i7 avx2 | | SKL avx512 | | SKL avx2 | |
| | small | large | small | large | small | large |
| arith_divider_active | 0.0012 | 0.0000 | 0.0010 | 0.0001 | 0.0012 | 0.0000 |
| branch-instructions | 0.1021 | 0.0507 | 0.1263 | 0.0668 | 0.1021 | 0.0508 |
| branch-misses | 0.0002 | 0.0012 | 0.0002 | 0.0004 | 0.0002 | 0.0012 |
| cycle_activity_stalls_mem_any | 0.0126 | 0.3030 | 0.0134 | 1.5819 | 0.0108 | 0.6825 |
| cycle_activity_stalls_total | 0.0141 | 0.3063 | 0.0151 | 1.5860 | 0.0122 | 0.6863 |
| cycles | 0.3833 | 0.9887 | 0.3378 | 2.4321 | 0.3778 | 1.5280 |
| fp_arith_inst_retired_128b_packed_double | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_128b_packed_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_256b_packed_double | 0.2340 | 0.5173 | 0.0000 | 0.0000 | 0.2338 | 0.5180 |
| fp_arith_inst_retired_256b_packed_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_512b_packed_double | 0.0000 | 0.0000 | 0.0945 | 0.4370 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_512b_packed_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| fp_arith_inst_retired_scalar_double | 0.0369 | 0.0078 | 0.1122 | 0.0239 | 0.0368 | 0.0077 |
| fp_arith_inst_retired_scalar_single | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| iTLB-load-misses | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| iTLB-loads | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| icache_16b_ifdata_stall | 0.0001 | 0.0010 | 0.0000 | 0.0018 | 0.0000 | 0.0026 |
| instructions | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| mem_load_retired_l1_hit | 0.2880 | 0.2950 | 0.2628 | 0.2740 | 0.2871 | 0.2849 |
| mem_load_retired_l2_hit | 0.0459 | 0.0508 | 0.0452 | 0.0285 | 0.0452 | 0.0084 |
| mem_load_retired_l3_hit | 0.0004 | 0.0430 | 0.0000 | 0.1921 | 0.0000 | 0.1153 |
| mem_load_retired_l3_miss | 0.0000 | 0.0033 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| offcore_requests_outstanding_demand_data_rd_ge_6 | 0.0001 | 0.1017 | 0.0000 | 0.9544 | 0.0000 | 0.6780 |
| resource_stalls_any | 0.0896 | 0.5915 | 0.0614 | 2.0263 | 0.0850 | 1.0763 |
| rs_events_empty_cycles | 0.0014 | 0.0029 | 0.0017 | 0.0031 | 0.0014 | 0.0038 |
| task-clock | 0.0984 | 0.2603 | 0.1041 | 0.7548 | 0.1057 | 0.4281 |
| uops_executed_cycles_ge_1_uop_exec | 0.3696 | 0.6818 | 0.3228 | 0.8465 | 0.3654 | 0.8390 |
| uops_executed_cycles_ge_2_uops_exec | 0.3399 | 0.4827 | 0.2929 | 0.4062 | 0.3374 | 0.4451 |
| uops_executed_cycles_ge_3_uops_exec | 0.2674 | 0.2378 | 0.2414 | 0.1316 | 0.2680 | 0.1661 |
| uops_executed_cycles_ge_4_uops_exec | 0.1515 | 0.0767 | 0.1714 | 0.0304 | 0.1544 | 0.0335 |
| uops_executed_stall_cycles | 0.0140 | 0.3053 | 0.0151 | 1.5863 | 0.0122 | 0.6876 |
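The two normalizations are related through the instructions-per-cycle value (the "instructions" row of the table normalized to cycles): dividing a per-cycle value by it gives the per-instruction value, up to rounding of the tabulated numbers. A minimal sketch of that conversion, with an illustrative function name:

```c
#include <assert.h>

/* Convert a counter normalized to cycles into the same counter
 * normalized to instructions, given instructions per cycle (ipc).
 * Hypothetical helper for cross-checking the two tables. */
double per_cycle_to_per_instruction(double per_cycle, double ipc)
{
    assert(ipc > 0.0);
    return per_cycle / ipc;
}
```

For i7 avx2 small, branch-instructions is 0.2662 per cycle and the instructions-per-cycle value is 2.6086, which gives about 0.1020 per instruction, matching the tabulated 0.1021 to within rounding. Likewise 1.0 / 2.6086 reproduces the cycles-per-instruction entry of 0.3833.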
--
VincenzoInnocente - 2017-08-02