This repository contains the code and scripts used to analyze the performance of different matrix transposition implementations. The project focuses on optimizing matrix transposition using several techniques: sequential methods, implicit parallelism through compiler optimizations, and explicit parallelism with OpenMP. For further information (Problem Statement and Performance Analysis), see reportID_235252.pdf.
NB: MobaXterm was used to establish an SSH connection from my local Windows system to the university's cluster.
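To make the approaches named above concrete, here is a minimal sketch of a sequential transposition and an OpenMP-parallel one. It is not the repository's code: the function names `transpose_serial` and `transpose_omp`, the row-major `float` layout, and the `collapse(2)` clause are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sequential baseline: out[j][i] = in[i][j] for an
   n x n row-major matrix. Not taken from serial.c. */
void transpose_serial(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            out[j * n + i] = in[i * n + j];
}

/* Hypothetical explicit-parallel variant. Every (i, j) pair writes a
   distinct element of out, so the iterations are independent and can
   be shared among threads. Without -fopenmp the pragma is ignored and
   the loop nest runs sequentially. */
void transpose_omp(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL && in != out);
    #pragma omp parallel for collapse(2)
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            out[j * n + i] = in[i * n + j];
}
```

Both functions produce the same output; only the second asks OpenMP to distribute the loop iterations, and it needs `-fopenmp` at compile time for that to take effect.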
matrix-transposition/
│
├── README.md
└── d1_project/
    ├── mainCode/               # Main code for the project
    │   ├── build.sh            # Script to build the code
    │   ├── run.sh              # Script to run the experiments
    │   ├── implicitScript.sh   # Script to test implicit method with different flags
    │   ├── serial.c            # Contains 'serial' method
    │   ├── serial.h
    │   ├── implicit.c          # Contains 'implicit' method
    │   ├── implicit.h
    │   ├── explicit.c          # Contains 'omp' method
    │   ├── explicit.h
    │   ├── explicitblock.c     # Contains 'omp2' method
    │   ├── explicitblock.h
    │   ├── main.c              # Main function
    │   ├── utils.c             # Utility functions
    │   └── utils.h
    ├── explicitTest/           # For testing explicit implementations
    │   ├── buildExp.sh         # Script to build and link files
    │   ├── runExp.sh           # Script to run the program
    │   ├── explicit.c          # Contains 'omp' method
    │   ├── explicit.h
    │   ├── explicit1.c         # Contains 'omp1' method
    │   ├── explicit1.h
    │   ├── explicit2.c         # Contains 'omp2' method
    │   ├── explicit2.h
    │   ├── explicit3.c         # Contains 'omp3' method
    │   └── explicit3.h
    │
    └── submit.pbs              # PBS script to run the main experiments
- C Compiler: GCC 9.1.0 (or compatible version with OpenMP support)
- OpenMP: For parallel implementations
- Bash: For executing shell scripts
To reproduce the results and obtain the performance measurements for:
- the best four methods (serial, implicit with blocking, omp, and omp2) ==> transpose_times.csv,
- the various combinations of flags I used to compile the implicit.c file ==> implicitVersus.csv,
- the different ways in which I implemented the explicit parallelization method ==> explicitVersus.csv,
follow these steps:
- Navigate to the project directory:
    cd matrix-transposition/d1_project
- Convert the submit.pbs file to Unix format and make sure it is executable:
    dos2unix submit.pbs
    chmod +x submit.pbs
- Navigate to mainCode/ and convert the Bash scripts to Unix format:
    cd mainCode
    dos2unix build.sh run.sh implicitScript.sh
- Navigate to explicitTest/ and convert the Bash scripts to Unix format:
    cd ../explicitTest
    dos2unix buildExp.sh runExp.sh
- Navigate back to the project directory:
    cd ..
- Submit the PBS job:
    qsub submit.pbs
  Note:
  - The submit.pbs script will:
    - Load the required GCC module
    - Compile the code using build.sh located in mainCode/
    - Run the experiments using run.sh and implicitScript.sh located in mainCode/
    - Compile the code using buildExp.sh located in explicitTest/
    - Run the experiment using runExp.sh located in explicitTest/
  - The output files transpose_times.csv, implicitVersus.csv and explicitVersus.csv will be created in the d1_project/ directory
- Retrieve the output files from d1_project/:
  - transpose_times.csv contains measurements for the best four methods (serial, implicit, explicit, explicitblock) across different matrix sizes and thread counts.
  - implicitVersus.csv contains measurements for various combinations of flags across different matrix sizes. Flag combinations used include:
    - no flags
    - -O1
    - -O1 -march=native
    - -O2
    - -O2 -funroll-loops
    - -O2 -fprefetch-loop-arrays
    - -O2 -ftree-vectorize
    - -O2 -march=native
    - -O2 -funroll-loops -march=native
    - -O2 -ftree-vectorize -march=native
    - -O2 -march=native -ftree-vectorize -funroll-loops
    - -O2 -funroll-loops -fprefetch-loop-arrays -ftree-vectorize -march=native
  - explicit_times.csv contains measurements for the explicit methods (omp, omp1, omp2, omp3) across different matrix sizes and thread counts.
  - Standard output and error logs (matrix_transpose.out and matrix_transpose.err).
If you prefer to run the code interactively:
- Open an interactive session:
    qsub -I -q short_cpu -l select=1:ncpus=96:ompthreads=96:mem=512mb
- Navigate to the mainCode/ directory:
    cd matrix-transposition/d1_project/mainCode
- Load the GCC module:
    module load gcc91
- Compile the code:
    chmod +x build.sh
    ./build.sh
- Run the experiments using one of the following methods:
  a) Run one of the four methods (serial, implicit, explicitblock or explicit) with a specified exponent (from 4 to 12):
      export OMP_NUM_THREADS='number of threads'; ./$OUTPUT 'exponent' 'method'
  b) Run all four methods with a specified exponent (from 4 to 12):
      export OMP_NUM_THREADS='number of threads'; ./$OUTPUT 'exponent'
  c) Run the same script that the .pbs file runs:
      chmod +x run.sh
      ./run.sh
     The transpose_times.csv file will be created in the parent directory d1_project/.
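Options a) and b) take an exponent between 4 and 12. One plausible reading, which this README does not confirm, is that the matrix dimension is 2 raised to that exponent. Under that assumption, argument handling could look roughly like the sketch below; `matrix_size_from_exponent` is a hypothetical helper, not a function from main.c or utils.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse the exponent argument and return the
   matrix dimension n = 2^exponent, or 0 if the argument is not an
   integer in the documented range [4, 12]. The power-of-two mapping
   is an assumption, not something main.c is known to do. */
size_t matrix_size_from_exponent(const char *arg)
{
    assert(arg != NULL);
    char *end = NULL;
    errno = 0;
    long e = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return 0;               /* not a clean integer */
    if (e < 4 || e > 12)
        return 0;               /* outside the documented range */
    return (size_t)1 << e;      /* 2^4 = 16 up to 2^12 = 4096 */
}
```

With this mapping, the allowed exponents would cover matrices from 16 x 16 up to 4096 x 4096.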
To analyze and compare different explicit methods (omp, omp1, omp2, omp3), you can use the scripts in the explicitTest/ directory.
- Navigate to the explicitTest/ directory:
    cd matrix-transposition/d1_project/explicitTest
- Load the GCC module:
    module load gcc91
- Compile the explicit methods:
  - Make sure the buildExp.sh script is executable, then run it:
      chmod +x buildExp.sh
      ./buildExp.sh
- Run the experiments:
  - Make sure the runExp.sh script is executable, then run it:
      chmod +x runExp.sh
      ./runExp.sh
- Output:
  - The measurements will be saved in explicit_times.csv located in the d1_project/ directory.
  - Each command is executed 5 times to allow calculation of average times and plotting of graphs in Python.
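The repeated runs mentioned above are averaged afterwards in Python, but the same idea can be sketched inside a C timing harness. The helper names below (`now_seconds`, `mean_runtime`, `count_call`) are hypothetical and do not come from the repository, and the sketch uses C11 `timespec_get` as a wall-clock source purely as an assumption about what is available.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds, read with C11 timespec_get.
   This is calendar time, so it is not guaranteed to be monotonic. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: run fn(ctx) reps times and return the mean
   elapsed wall-clock time per run, in seconds. */
double mean_runtime(void (*fn)(void *), void *ctx, int reps)
{
    assert(fn != NULL && reps > 0);
    double total = 0.0;
    for (int r = 0; r < reps; r++) {
        double start = now_seconds();
        fn(ctx);
        total += now_seconds() - start;
    }
    return total / reps;
}

/* Example workload: counts how many times it has been invoked. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

Calling `mean_runtime(count_call, &calls, 5)` with an `int calls = 0` runs the workload five times; a real measurement would pass a wrapper around the transposition being timed instead.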
To test the implicit method with various compiler optimization flags, use the implicitScript.sh script in the mainCode/ directory.
- Navigate to the mainCode/ directory:
    cd matrix-transposition/d1_project/mainCode
- Load the GCC module:
    module load gcc91
- Make the script executable:
    chmod +x implicitScript.sh
- Run the script:
    ./implicitScript.sh
  This script compiles the implicit method with different combinations of compiler flags and measures the execution time.
- Output:
  - The measurements will be saved in implicit_times.csv located in the d1_project/ directory.
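For context on what the compiler flags act on, the following is a minimal sketch of a cache-blocked transposition with no OpenMP pragmas, in the spirit of the "implicit with blocking" method. It is an illustration only: the name `transpose_blocked`, the tile size `BLOCK`, and the use of `restrict` are assumptions, not the contents of implicit.c.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16  /* hypothetical tile edge; a real value would be tuned */

/* Hypothetical blocked transposition of an n x n row-major matrix.
   Working tile by tile keeps reads and writes within a small region
   of memory, and restrict tells the compiler that in and out do not
   overlap, which gives the optimizer more freedom with the inner loop. */
void transpose_blocked(const float *restrict in, float *restrict out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile so sizes that are not multiples of BLOCK work. */
            size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
            size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    out[j * n + i] = in[i * n + j];
        }
    }
}
```

The source code stays the same across all flag combinations; only the optimizations the compiler is allowed to apply change, which is what the script's measurements compare.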
- transpose_times.csv: Contains measurements for the best four methods (serial, implicit, explicit, explicitblock) across different matrix sizes and thread counts. Generated by run.sh in mainCode/.
- explicit_times.csv: Contains measurements for the explicit methods (omp, omp1, omp2, omp3) across different matrix sizes and thread counts. Generated by runExp.sh in explicitTest/.
- implicit_times.csv: Contains measurements for the implicit method compiled with different compiler flags across various matrix sizes. Generated by implicitScript.sh in mainCode/.
Note: All output files are created in the d1_project/ directory (parent directory of mainCode/ and explicitTest/).
- Measurement Repetition: In the initial configurations of the scripts, each experiment was conducted five times to accommodate variability in execution times. However, this component has been omitted to ensure that the files generated upon executing the .pbs file contain single, non-repeated measurements, thereby facilitating easier interpretation and analysis.
- Plotting Results: The generated .csv files can be used to plot graphs and analyze the performance of different methods using tools like Python's matplotlib or pandas.
- Environment Setup:
  - Ensure that the GCC 9.1.0 module (gcc91) is available on your system.
  - The scripts assume a Unix-like environment with Bash shell.
- Permissions:
  - Before running any scripts, make sure they have executable permissions (chmod +x script.sh).