I. Devices

#1. PC i9-9900K 64 GB, x64: Intel Core i9-9900K, 8 cores (16 threads) (x64), 64 GB RAM, Ubuntu 20.04.6 Desktop (Linux 5.4.0 kernel), GCC 9.4.0, OpenMPI 4.0.3, "powersave" scaling governor (cpu freq increases from 800000 kHz to 3600000 kHz on load)
#2. Raspberry Pi 4 4 GB, Arm64: Broadcom BCM2711, quad-core Cortex-A72 (ARMv8), 4 GB RAM, 64-bit Ubuntu 22.04.4 Desktop (Linux 5.15.0 kernel), GCC 11.4.0, OpenMPI 4.1.2, "performance" scaling governor, cpu freq 1500000 kHz, fan/screen/keyboard/mouse attached, 5V 2A power adapter
#3. StarFive VisionFive 8 GB, RISC-V: StarFive JH7100, dual-core SiFive U74 (RV64GC=rv64imafdc_zicsr_zifencei), 8 GB RAM, Ubuntu 23.10 Server (Linux 6.5.0 kernel), GCC 13.2.0, OpenMPI X.X.X, scaling governor not available, default cpu freq, fan attached, 5V 2A power adapter
#4. Raspberry Pi 4 4 GB, Arm32 (hardfp): Broadcom BCM2711, quad-core Cortex-A72 (ARMv8), 4 GB RAM, 64-bit Ubuntu 22.04.4 Desktop (Linux 5.15.0 kernel) with 32-bit rootfs, GCC 11.4.0, OpenMPI 4.1.2, "performance" scaling governor, cpu freq 1500000 kHz, fan attached, 5V 2A power adapter
#5. StarFive VisionFive 2 8 GB, RISC-V: StarFive JH7110, quad-core SiFive U74 (RV64GC=rv64imafdc_zicsr_zifencei), 8 GB RAM, Debian Bookworm from StarFive (Linux 6.1.31 kernel), GCC 12.2.0, OpenMPI 4.1.4, "performance" scaling governor, cpu freq 1500000 kHz, fan attached, 5V 2A power adapter

II. Notes

1. benchmark: vacuum, 3D grid 50x50x50, 300 time steps, total-field/scattered-field (TFSF) region 26x26x26, PML 30x30x30, results are not saved to disk, Ca/Cb coefficients are used
2. 20 measurements are taken for every case; the lowest 20% and the highest 20% are discarded
3. all measurements below are shown with a 99% confidence interval
4. sequential build:
cmake .. -DCMAKE_BUILD_TYPE=Release -DSOLVER_DIM_MODES=DIM3 -DVALUE_TYPE=d -DCOMPLEX_FIELD_VALUES=ON -DPRINT_MESSAGE=ON -DPARALLEL_GRID=OFF
make fdtd3d
5. parallel build for X topology:
cmake .. -DCMAKE_BUILD_TYPE=Release -DSOLVER_DIM_MODES=DIM3 -DVALUE_TYPE=d -DCOMPLEX_FIELD_VALUES=ON -DPRINT_MESSAGE=ON -DPARALLEL_GRID=ON -DPARALLEL_GRID_DIMENSION=3 -DPARALLEL_BUFFER_DIMENSION=x -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc
make fdtd3d
6. parallel build for XY topology:
cmake .. -DCMAKE_BUILD_TYPE=Release -DSOLVER_DIM_MODES=DIM3 -DVALUE_TYPE=d -DCOMPLEX_FIELD_VALUES=ON -DPRINT_MESSAGE=ON -DPARALLEL_GRID=ON -DPARALLEL_GRID_DIMENSION=3 -DPARALLEL_BUFFER_DIMENSION=xy -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc
make fdtd3d
7. parallel build for XYZ topology:
cmake .. -DCMAKE_BUILD_TYPE=Release -DSOLVER_DIM_MODES=DIM3 -DVALUE_TYPE=d -DCOMPLEX_FIELD_VALUES=ON -DPRINT_MESSAGE=ON -DPARALLEL_GRID=ON -DPARALLEL_GRID_DIMENSION=3 -DPARALLEL_BUFFER_DIMENSION=xyz -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc
make fdtd3d
8. run sequential:
./fdtd3d --time-steps 300 --size x:50,y:50,z:50 --3d --angle-teta 0 --angle-phi 0 --angle-psi 0 --dx 0.0005 --wavelength 0.02 --log-level 0 --use-ca-cb --use-ca-cb-pml --pml-size x:10,y:10,z:10 --use-pml --use-tfsf --tfsf-size-left x:12,y:12,z:12 --tfsf-size-right x:12,y:12,z:12
9. run parallel with N processes:
mpiexec -n N ./fdtd3d --time-steps 300 --size x:50,y:50,z:50 --3d --angle-teta 0 --angle-phi 0 --angle-psi 0 --dx 0.0005 --wavelength 0.02 --log-level 0 --use-ca-cb --use-ca-cb-pml --pml-size x:10,y:10,z:10 --use-pml --use-tfsf --tfsf-size-left x:12,y:12,z:12 --tfsf-size-right x:12,y:12,z:12
10. SCORE is calculated as: number_of_timesteps*100/execution_time_in_seconds
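
Notes 2, 3 and 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes do not say which formula was used for the 99% confidence interval, so the sketch assumes a two-sided Student's t interval over the 12 runs that remain after trimming. The function names and the sample timings are hypothetical and are not part of fdtd3d.

```python
import math
import statistics

# Two-sided 99% Student's t critical value for 11 degrees of freedom
# (12 runs kept out of 20). Assumption: the notes do not state which
# interval formula was actually used.
T_99_DF11 = 3.106


def summarize(times, trim_fraction=0.2, t_crit=T_99_DF11):
    """Return (mean, half_width) after dropping the lowest and highest runs.

    The default t_crit is only valid when 12 runs remain after trimming.
    """
    ordered = sorted(times)
    k = round(len(ordered) * trim_fraction)  # runs dropped from each end
    kept = ordered[k:len(ordered) - k]
    mean = statistics.mean(kept)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(kept) / math.sqrt(len(kept))
    return mean, t_crit * sem


def score(seconds, time_steps=300):
    """SCORE from note 10: number of time steps * 100 / execution time."""
    return time_steps * 100 / seconds


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    runs = [16.0 + 0.01 * i for i in range(20)]
    mean, half_width = summarize(runs)
    print(f"{mean:.4f} +- {half_width:.4f} seconds, SCORE {score(mean):.1f}")
```

Sorting before slicing keeps the trimming symmetric, so outliers on either side (for example a run disturbed by a background task) do not shift the reported mean.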

III. Raw data

Release 1.2

- [Sequential] #1 PC i9-9900K x64: 16.0042 +- 0.0214422 seconds, binary size: 869592 byte
- [Best] #1 PC i9-9900K x64 (MPI, XYZ topology, 8 processes): 3.6075 +- 0.00865463 seconds, binary size: 1074488 byte
- [Sequential] #2 Raspberry Pi 4 Arm64: 61.7725 +- 0.564082 seconds, binary size: 735344 byte
- [Best] #2 Raspberry Pi 4 Arm64 (MPI, XY topology, 4 processes): 28.1033 +- 0.132836 seconds, binary size: 913320 byte
- [Sequential] #3 StarFive VisionFive RV64GC: 496.644 +- 2.85636 seconds, binary size: 686624 byte
- [Best] #3 StarFive VisionFive RV64GC (MPI, X topology, 2 processes): 289.362 +- 0.722665 seconds, binary size: 834440 byte
- [Sequential] #4 Raspberry Pi 4 Arm32 (hardfp): 74.4635 +- 0.532851 seconds, binary size: 598244 byte
- [Sequential] #5 StarFive VisionFive 2 RV64GC: 177.729 +- 0.12831 seconds, binary size: 621944 byte
- [Best] #5 StarFive VisionFive 2 RV64GC (MPI, XY topology, 4 processes): 53.2233 +- 0.0573871 seconds, binary size: 774488 byte

Release 1.1

- [Sequential] #1 PC i9-9900K x64: 16 +- 0.0341937 seconds, binary size: 869592 byte
- [Best] #1 PC i9-9900K x64 (MPI, XYZ topology, 8 processes): 3.61083 +- 0.00710943 seconds, binary size: 1074480 byte
- [Sequential] #2 Raspberry Pi 4 Arm64: 61.6625 +- 0.431186 seconds, binary size: 735344 byte
- [Best] #2 Raspberry Pi 4 Arm64 (MPI, XY topology, 4 processes): 28.4108 +- 0.0975764 seconds, binary size: 913312 byte
- [Best][Sequential] #3 StarFive VisionFive RV64GC: not supported
- [Sequential] #4 Raspberry Pi 4 Arm32 (hardfp): 74.853 +- 0.41344 seconds, binary size: 598244 byte
- [Best][Sequential] #5 StarFive VisionFive 2 RV64GC: not supported

Release 1.0

- [Sequential] #1 PC i9-9900K x64: 17.9 +- 0.0305838 seconds, binary size: 924048 byte
- [Best] #1 PC i9-9900K x64 (MPI, XYZ topology, 8 processes): 3.82167 +- 0.00643503 seconds, binary size: 1134176 byte
- [Best][Sequential] #2 Raspberry Pi 4 Arm64: not supported
- [Best][Sequential] #3 StarFive VisionFive RV64GC: not supported
- [Best][Sequential] #4 Raspberry Pi 4 Arm32 (hardfp): not supported
- [Best][Sequential] #5 StarFive VisionFive 2 RV64GC: not supported
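
The raw timings can be turned into SCORE values and sequential-to-parallel speedups with a short script. The sketch below uses only the Release 1.2 numbers listed above and the SCORE formula from note 10. Speedup and parallel efficiency are not defined in these notes; they are assumed here in the usual sense (sequential time divided by parallel time, and speedup divided by process count). All function names are hypothetical.

```python
# Release 1.2 mean timings in seconds, copied from the raw data above:
# device -> (sequential time, best MPI time, MPI process count).
# Device #4 is omitted because no MPI result is listed for it.
RELEASE_1_2 = {
    "#1 PC i9-9900K x64": (16.0042, 3.6075, 8),
    "#2 Raspberry Pi 4 Arm64": (61.7725, 28.1033, 4),
    "#3 StarFive VisionFive RV64GC": (496.644, 289.362, 2),
    "#5 StarFive VisionFive 2 RV64GC": (177.729, 53.2233, 4),
}

TIME_STEPS = 300  # from the benchmark description in note 1


def score(seconds, time_steps=TIME_STEPS):
    """SCORE from note 10: number of time steps * 100 / execution time."""
    return time_steps * 100 / seconds


def speedup(sequential, parallel):
    """Assumed definition: sequential time divided by parallel time."""
    return sequential / parallel


def efficiency(sequential, parallel, processes):
    """Assumed definition: speedup divided by the number of MPI processes."""
    return speedup(sequential, parallel) / processes


def report(data):
    """Return one (device, seq score, best score, speedup, efficiency) row per device."""
    rows = []
    for device, (seq, best, procs) in data.items():
        rows.append((device, score(seq), score(best),
                     speedup(seq, best), efficiency(seq, best, procs)))
    return rows


if __name__ == "__main__":
    for device, s_seq, s_best, sp, eff in report(RELEASE_1_2):
        print(f"{device}: SCORE {s_seq:.1f} -> {s_best:.1f}, "
              f"speedup {sp:.2f}x, efficiency {eff:.0%}")
```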