Course Project for How To Write Fast Numerical Code, ETH Spring 2019

Repository layout:

```
├── data/            # scripts to download the data
├── docs/            # project documents produced during the course
│   ├── Report.pdf
│   └── Presentation.pdf
├── include/         # header files
├── src/             # source code
├── tests/           # configuration files for experiments
├── CMakeLists.txt
├── main.cpp         # program entry point
└── README.md
```

To build, create a build directory and link the data and test directories into it:

```
mkdir build && cd build
cmake .. && make
ln -s ../data .
ln -s ../tests .
cd ..
```

To download the data:

```
cd data/
chmod +x download.sh && sh download.sh
cd ..
```

To run, pass a configuration file to the executable, for example:

```
cd build/
./lambdamart tests/mslr.14.conf
```

Implementation variants:

- The baseline implementation, which uses no optimizations.
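To make the three steps of the baseline loop concrete, here is a minimal Python sketch of one split search. It is not the project's C++ code. It assumes that every feature value is already quantized to a bin index, that candidate `c` puts bins `0..c` on the left, and that the gain is the common least-squares criterion S_L^2 / N_L + S_R^2 / N_R over gradient sums S and sample counts N. The function name `find_best_split` is hypothetical.

```python
def find_best_split(bins, grads, n_bins):
    """Return (gain, feature, candidate) for the best split of one node.

    Assumptions: bins[f][i] is the pre-computed bin of sample i for feature f,
    grads[i] is the gradient of sample i, and candidate c puts bins 0..c on the left.
    """
    total_sum, total_cnt = sum(grads), len(grads)
    best = (float("-inf"), None, None)
    for f, feature_bins in enumerate(bins):           # for feature in features
        hist_sum = [0.0] * n_bins
        hist_cnt = [0] * n_bins
        for i, b in enumerate(feature_bins):          # for sample in samples
            hist_sum[b] += grads[i]                   # update(): add the sample to its bin
            hist_cnt[b] += 1
        left_sum, left_cnt = 0.0, 0
        for c in range(n_bins - 1):                   # for candidate in candidates
            left_sum += hist_sum[c]                   # cumulate(): running prefix sums
            left_cnt += hist_cnt[c]
            right_cnt = total_cnt - left_cnt
            if left_cnt == 0 or right_cnt == 0:
                continue                              # skip splits with an empty side
            right_sum = total_sum - left_sum
            gain = left_sum ** 2 / left_cnt + right_sum ** 2 / right_cnt
            if gain > best[0]:                        # get_best_split(): keep the best so far
                best = (gain, f, c)
    return best
```

For example, with `bins = [[0, 0, 1, 1], [0, 1, 0, 1]]` and `grads = [1.0, 1.0, -1.0, -1.0]`, `find_best_split(bins, grads, 2)` returns `(4.0, 0, 0)`: feature 0 separates the positive from the negative gradients, while feature 1 mixes them and scores 0.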
```
for feature in features:
    for sample in samples:
        update()
    for candidate in candidates:
        cumulate()
        get_best_split()
```

- Iterates over all combinations of blocking and unrolling factors for `update` and `cumulate`, which are implemented with AVX.
- Allows variable reuse for `candidate` in `update`, which reduces the cost incurred by branching.
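The blocked loop order can be checked against the plain one with a short Python sketch. It assumes the same pre-binned `bins[f][i]` layout as a plain histogram build, and the names `build_histograms_blocked` and `sweep` are hypothetical. Unrolling and AVX have no Python counterpart, so only the blocking half is shown. A performance sweep would time each variant; this one only checks that every (feature block, sample block) pair reproduces the unblocked histograms.

```python
from itertools import product


def build_histograms_blocked(bins, grads, n_bins, feat_bs, sample_bs):
    """Build one gradient histogram per feature, visiting features and samples in blocks.

    Assumption: bins[f][i] is the pre-computed bin of sample i for feature f.
    """
    n_feat, n_samp = len(bins), len(grads)
    hists = [[0.0] * n_bins for _ in range(n_feat)]
    for f0 in range(0, n_feat, feat_bs):                       # for feat_block in features
        f1 = min(f0 + feat_bs, n_feat)
        for s0 in range(0, n_samp, sample_bs):                 # for sample_block in samples
            for i in range(s0, min(s0 + sample_bs, n_samp)):   # for sample in sample_block
                g = grads[i]                                    # loaded once, reused for the whole feature block
                for f in range(f0, f1):                         # for feature in feat_block
                    hists[f][bins[f][i]] += g                   # update()
    return hists


def sweep(bins, grads, n_bins, feat_sizes, sample_sizes):
    """Map each (feature block, sample block) pair to whether it matches the unblocked result."""
    reference = build_histograms_blocked(bins, grads, n_bins, 1, len(grads))
    return {
        (fb, sb): build_histograms_blocked(bins, grads, n_bins, fb, sb) == reference
        for fb, sb in product(feat_sizes, sample_sizes)
    }
```

For instance, `sweep(bins, grads, 3, [1, 2, 3], [1, 2, 5])` on a three-feature, five-sample input should report `True` for all nine combinations: blocking only reorders the loops and never changes the order in which samples are added into any single bin, so the results match bit for bit.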
```
for feat_block in features:
    for sample_block in samples:
        for sample in sample_block:
            for feature in feat_block:
                update_using_avx()
    for feature in feat_block:
        for cand_block in candidates:
            for bin in cand_block:
                cumulate_using_avx()
    for candidate in candidates:
        for feature in feat_block:
            get_best_split()
```

- Iterates over all combinations of blocking and unrolling factors for `update` and `cumulate` (scalar code, no AVX).
- Allows variable reuse for `candidate` in `update`, which reduces the cost incurred by branching.
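Without AVX, unrolling is done in scalar code. The sketch below shows a prefix-sum `cumulate` over one histogram, unrolled by a fixed factor of 4 with a remainder loop; the function name and the factor are illustrative assumptions. In compiled code the unrolled body runs the loop test and branch once per four bins instead of once per bin, although the running total is still a serial dependency. In Python it only shows the loop shape.

```python
def cumulate_unrolled4(hist):
    """Prefix sums of hist, with the main loop manually unrolled by 4."""
    n = len(hist)
    out = [0.0] * n
    acc = 0.0
    i = 0
    while i + 4 <= n:      # main loop: four bins per iteration, one loop test
        acc += hist[i]
        out[i] = acc
        acc += hist[i + 1]
        out[i + 1] = acc
        acc += hist[i + 2]
        out[i + 2] = acc
        acc += hist[i + 3]
        out[i + 3] = acc
        i += 4
    while i < n:           # remainder loop for the last n % 4 bins
        acc += hist[i]
        out[i] = acc
        i += 1
    return out
```

The result equals an ordinary running sum, e.g. `cumulate_unrolled4([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])` gives `[1.0, 3.0, 6.0, 10.0, 15.0, 21.0, 28.0]`, with the last three bins handled by the remainder loop.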
```
for feat_block in features:
    for sample_block in samples:
        for sample in sample_block:
            for feature in feat_block:
                update()
    for feature in feat_block:
        for cand_block in candidates:
            for bin in cand_block:
                cumulate()
    for candidate in candidates:
        for feature in feat_block:
            get_best_split()
```

- Similar to `feature_blocking_cumulator`.
- Replaces `double` with `float`.
- Changes the intrinsics accordingly.
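Switching from `double` to `float` doubles the number of values per 256-bit AVX register, from 4 to 8, so the `_pd` intrinsics on `__m256d` would typically be swapped for their `_ps` counterparts on `__m256` (for example `_mm256_add_pd` to `_mm256_add_ps`). The cost is precision. The Python sketch below illustrates both points; `to_float32` emulates single-precision rounding with the standard `struct` module, and all names are illustrative rather than taken from the project.

```python
import struct

AVX_BITS = 256  # width of an AVX (YMM) register


def lanes(type_bytes):
    """Number of values of the given size that fit in one AVX register."""
    return AVX_BITS // (8 * type_bytes)


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_float32(values):
    """Accumulate like float code: round each input and every partial sum to single precision.

    Adding two floats in double precision and then rounding gives the same result as a
    true float addition, because double carries enough extra bits.
    """
    acc = 0.0
    for v in values:
        acc = to_float32(acc + to_float32(v))
    return acc


print(lanes(struct.calcsize("<d")), lanes(struct.calcsize("<f")))  # prints: 4 8
```

Summing `0.1` ten times, `sum_float32` and the built-in double-precision `sum` both land within 1e-6 of 1.0 but do not agree exactly, which is the kind of difference the `float` variants trade for twice the SIMD width and half the memory traffic.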

- Similar to `feature_blocking_cumulator_no_avx`.
- Replaces `double` with `float`.

- Algorithmic improvement.
- Converts the dataset to a dense representation.
- Makes algorithmic changes to handle the new format.
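A minimal sketch of a sparse-to-dense conversion, assuming each sample arrives as a list of (feature index, value) pairs, as in SVMlight-style ranking text files, that absent features default to 0.0, and that the dense array is stored row-major. Whether the project uses row- or column-major order is not stated here, and `to_dense`, `column`, and `sparse_lookup` are hypothetical names. The algorithmic point is that reading feature `f` of a sample means scanning its pairs in the sparse form, but becomes a direct index in the dense form.

```python
def sparse_lookup(row, f, fill=0.0):
    """Sparse form: scan a sample's (feature, value) pairs to find feature f."""
    for g, v in row:
        if g == f:
            return v
    return fill


def to_dense(sparse_rows, n_features, fill=0.0):
    """Dense form: one flat row-major list where (i, f) lives at i * n_features + f.

    Assumption: features missing from a row take the value `fill`.
    """
    dense = [fill] * (len(sparse_rows) * n_features)
    for i, row in enumerate(sparse_rows):
        base = i * n_features
        for f, v in row:
            dense[base + f] = v
    return dense


def column(dense, n_features, f):
    """All samples' values for feature f, read with a fixed stride of n_features."""
    return dense[f::n_features]
```

For example, `to_dense([[(0, 1.5), (2, -0.5)], [(1, 3.0)], []], 3)` produces a nine-entry list, and `column(dense, 3, 1)` returns `[0.0, 3.0, 0.0]` without any search.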