Course Project for How To Write Fast Numerical Code, ETH Spring 2019
├── data/ # scripts to download data
├── docs/ # project documents produced during the course
│ ├── Report.pdf
│ └── Presentation.pdf
├── include/ # header files
├── src/ # source code
├── tests/ # configuration files for experiments
├── CMakeLists.txt
├── main.cpp # starts the program
└── README.md
mkdir build && cd build
cmake .. && make
ln -s ../data .
ln -s ../tests .
cd ..
cd data/
chmod +x download.sh && sh download.sh
cd ..
cd build/
./lambdamart tests/mslr.14.conf
- The baseline implementation, which applies no optimizations.
    for feature in features:
        for sample in samples:
            update()
        for candidate in candidates:
            cumulate()
            get_best_split()
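The baseline loop can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the `Bin` and `Split` structs, the row-major `binned` layout, the function name `find_best_split`, and the gain formula (sum of squared gradient sums divided by counts, a common regression-tree score) are all hypothetical. It also assumes the split check runs once per candidate, right after that candidate's bin is cumulated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-bin statistics; the project's real layout may differ.
struct Bin {
    double grad_sum = 0.0;  // sum of gradients of the samples in this bin
    double count = 0.0;     // number of samples in this bin
};

struct Split {
    std::size_t feature = 0;
    std::size_t candidate = 0;  // samples with bin <= candidate go left
    double gain = -1.0;         // -1 means "no valid split found"
};

// binned[s * num_features + f] is the bin index of sample s for feature f.
Split find_best_split(const std::vector<std::size_t>& binned,
                      const std::vector<double>& gradients,
                      std::size_t num_features, std::size_t num_bins) {
    const std::size_t num_samples = gradients.size();
    double total_sum = 0.0;
    for (double g : gradients) total_sum += g;
    const double total_count = static_cast<double>(num_samples);

    Split best;
    for (std::size_t f = 0; f < num_features; ++f) {
        std::vector<Bin> hist(num_bins);
        // update(): add every sample to the bin it falls into.
        for (std::size_t s = 0; s < num_samples; ++s) {
            Bin& bin = hist[binned[s * num_features + f]];
            bin.grad_sum += gradients[s];
            bin.count += 1.0;
        }
        Bin left;  // running prefix over bins 0..c
        for (std::size_t c = 0; c + 1 < num_bins; ++c) {
            // cumulate(): extend the prefix sum by one bin.
            left.grad_sum += hist[c].grad_sum;
            left.count += hist[c].count;
            // get_best_split(): score the split "bin <= c" vs "bin > c".
            const double right_count = total_count - left.count;
            if (left.count == 0.0 || right_count == 0.0) continue;
            const double right_sum = total_sum - left.grad_sum;
            const double gain = left.grad_sum * left.grad_sum / left.count +
                                right_sum * right_sum / right_count;
            if (gain > best.gain) {
                best.feature = f;
                best.candidate = c;
                best.gain = gain;
            }
        }
    }
    return best;
}
```

Each feature is processed in isolation here, so the gradient array is streamed once per feature. That repeated traffic is what the blocked variants try to reduce.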
- Combines loop blocking and unrolling for `update` and `cumulate`, which are implemented using AVX.
- Allows variables to be reused across candidates in `update`, which reduces the cost incurred by branching.
    for feat_block in features:
        for sample_block in samples:
            for sample in sample_block:
                for feature in feat_block:
                    update_using_avx()
        for feature in feat_block:
            for cand_block in candidates:
                for bin in cand_block:
                    cumulate_using_avx()
        for candidate in candidates:
            for feature in feat_block:
                get_best_split()
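One way an AVX cumulate step could look is sketched below. It assumes, purely for illustration, that the histograms of a block of four features are interleaved so that each bin is one row of four doubles; the name `cumulate_block` and this layout are hypothetical and may not match the project. The intrinsic path is used only when the compiler defines `__AVX__` (for example with `-mavx` on GCC or Clang); otherwise a scalar fallback with the same result is compiled.

```cpp
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Assumed layout: hist[bin * kBlock + f] for the kBlock features of a block.
constexpr std::size_t kBlock = 4;  // 4 doubles fill one 256-bit register

// Turns per-bin sums into inclusive prefix sums, for 4 features at once.
// hist must hold at least num_bins * kBlock values.
void cumulate_block(std::vector<double>& hist, std::size_t num_bins) {
#if defined(__AVX__)
    __m256d running = _mm256_setzero_pd();
    for (std::size_t b = 0; b < num_bins; ++b) {
        double* row = hist.data() + b * kBlock;
        // One vector add advances the prefix sum of all 4 features.
        running = _mm256_add_pd(running, _mm256_loadu_pd(row));
        _mm256_storeu_pd(row, running);
    }
#else
    double running[kBlock] = {0.0, 0.0, 0.0, 0.0};
    for (std::size_t b = 0; b < num_bins; ++b) {
        for (std::size_t f = 0; f < kBlock; ++f) {
            running[f] += hist[b * kBlock + f];
            hist[b * kBlock + f] = running[f];
        }
    }
#endif
}
```

Vectorizing across features rather than across bins avoids the loop-carried dependency of a prefix sum, which is one reason a feature-blocked layout suits SIMD.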
- Combines loop blocking and unrolling for `update` and `cumulate`, without AVX.
- Allows variables to be reused across candidates in `update`, which reduces the cost incurred by branching.
    for feat_block in features:
        for sample_block in samples:
            for sample in sample_block:
                for feature in feat_block:
                    update()
        for feature in feat_block:
            for cand_block in candidates:
                for bin in cand_block:
                    cumulate()
        for candidate in candidates:
            for feature in feat_block:
                get_best_split()
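The blocked `update` loop nest can be sketched in scalar C++ as follows. The block sizes, the function name `update_blocked`, and the data layout are assumptions made for the example, not values taken from the project. The sketch shows one plausible form of reuse: a sample's gradient is loaded once and applied to every feature in the current block.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical block sizes; real values would be tuned experimentally.
constexpr std::size_t kFeatureBlock = 4;
constexpr std::size_t kSampleBlock = 256;

// binned[s * num_features + f]: bin index of sample s for feature f.
// hist[f * num_bins + b]: gradient sum of bin b for feature f (output).
void update_blocked(const std::vector<std::size_t>& binned,
                    const std::vector<double>& gradients,
                    std::size_t num_features, std::size_t num_bins,
                    std::vector<double>& hist) {
    const std::size_t num_samples = gradients.size();
    hist.assign(num_features * num_bins, 0.0);
    for (std::size_t f0 = 0; f0 < num_features; f0 += kFeatureBlock) {
        const std::size_t f1 = std::min(f0 + kFeatureBlock, num_features);
        for (std::size_t s0 = 0; s0 < num_samples; s0 += kSampleBlock) {
            const std::size_t s1 = std::min(s0 + kSampleBlock, num_samples);
            for (std::size_t s = s0; s < s1; ++s) {
                // Loaded once, reused for every feature in the block.
                const double g = gradients[s];
                const std::size_t* row = binned.data() + s * num_features;
                for (std::size_t f = f0; f < f1; ++f) {
                    hist[f * num_bins + row[f]] += g;
                }
            }
        }
    }
}
```

The result is identical to the unblocked loop; only the traversal order changes, so that the histograms of one feature block stay in cache while a block of samples is processed.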
- Similar to `feature_blocking_cumulator`.
- Replaces `double` with `float`.
- Changes the intrinsics accordingly.
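To illustrate what changing the intrinsics involves: with `float`, the AVX intrinsic suffix changes from `_pd` to `_ps`, and a 256-bit register holds 8 values instead of 4. The sketch below applies this to a prefix sum over bins; the function name `cumulate_block_float` and the interleaved layout (one row of 8 floats per bin) are hypothetical. A scalar fallback is compiled when `__AVX__` is not defined.

```cpp
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Assumed layout: hist[bin * kLanes + f] for the kLanes features of a block.
constexpr std::size_t kLanes = 8;  // 8 floats per 256-bit register

// Inclusive prefix sum over bins, for 8 features at once.
// hist must hold at least num_bins * kLanes values.
void cumulate_block_float(std::vector<float>& hist, std::size_t num_bins) {
#if defined(__AVX__)
    __m256 running = _mm256_setzero_ps();            // was _mm256_setzero_pd
    for (std::size_t b = 0; b < num_bins; ++b) {
        float* row = hist.data() + b * kLanes;
        running = _mm256_add_ps(running, _mm256_loadu_ps(row));  // was _pd
        _mm256_storeu_ps(row, running);
    }
#else
    float running[kLanes] = {};
    for (std::size_t b = 0; b < num_bins; ++b) {
        for (std::size_t f = 0; f < kLanes; ++f) {
            running[f] += hist[b * kLanes + f];
            hist[b * kLanes + f] = running[f];
        }
    }
#endif
}
```

Halving the element size doubles the work done per vector instruction and halves memory traffic, at the cost of reduced precision in the accumulated sums.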
- Similar to `feature_blocking_cumulator_no_avx`.
- Replaces `double` with `float`.
- Algorithmic improvement.
- Changes dataset to dense representation.
- Makes algorithmic changes to deal with new format.
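As a rough illustration of a dense representation, the sketch below converts per-sample lists of (feature index, value) pairs into one contiguous row-major matrix. It assumes the earlier format stored such pairs, which this document does not state; `SparseEntry` and `to_dense` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sparse entry: one (feature index, value) pair of a sample.
struct SparseEntry {
    std::size_t feature;
    double value;
};

// Returns a row-major matrix: dense[s * num_features + f].
// Features that a sample does not list are stored as 0.
std::vector<double> to_dense(const std::vector<std::vector<SparseEntry>>& rows,
                             std::size_t num_features) {
    std::vector<double> dense(rows.size() * num_features, 0.0);
    for (std::size_t s = 0; s < rows.size(); ++s) {
        for (const SparseEntry& e : rows[s]) {
            if (e.feature < num_features) {  // ignore out-of-range indices
                dense[s * num_features + e.feature] = e.value;
            }
        }
    }
    return dense;
}
```

With a dense matrix, the value of feature `f` for sample `s` sits at a fixed offset, so inner loops can index directly instead of searching or branching on which features are present.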