+++
title = "Optimisation of vectorised code"
date = "2022-04-29"
author = "Yann Herklotz"
tags = []
categories = []
backlinks = ["2f"]
forwardlinks = ["3a8g", "2e1b"]
zettelid = "2f1"
+++

This is work by Caroline at Irisa/Inria Rennes. It is based on the idea of single program, multiple data (SPMD), where a single program is run on different threads and on different data. The idea is that the threads can then be combined again into vectorised instructions, so that multiple threads are actually executing one common vectorised instruction.

You can use GSA ([\#3a8g], [\#2e1b]) to vectorise instructions by using a blend instruction which selects based on the predicate. This means that if you are creating many threads and are branching on some property of the threads themselves (this could be the colour of the fragment), then you can still generate the same instructions for all the threads, and use blend instructions to select the correct results after the fact. In some way this is also speculation, but you just roll back the result in the case where the wrong branch was taken for a few of the threads.

Then you can add skips by checking the predicate vector using `any` and `all` checks, which can always be expressed as checks against all 0s, because the same check is performed for all the threads.

This is a really interesting use-case for GSA, because you are not using the predicates that GSA generates to analyse the code statically, but are using them dynamically to be able to vectorise as many of the instructions as possible.

[\#3a8g]: /zettel/3a8g
[\#2e1b]: /zettel/2e1b
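
As a rough illustration of the blend-and-skip idea in this note, here is a minimal sketch in plain Python. It is a toy model and not the actual implementation from this work: a "vector" is just a list with one element per thread, and the names `blend`, `any_lane`, `all_lane`, `scalar_kernel` and `vector_kernel` are all hypothetical. The branch in the per-thread kernel is turned into a predicate vector, both sides are computed for every lane, and a blend picks the right result. The `any` and `all` checks are both written as comparisons against the all-zero mask and are used to skip a side that no lane needs.

```python
# Toy model: a "vector" is a list with one element per thread (lane).
# All names are hypothetical and only illustrate the idea.

def blend(mask, if_true, if_false):
    """Per-lane select: take if_true where the predicate holds, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def any_lane(mask):
    # "any" is the negation of "the mask is all 0s".
    return mask != [False] * len(mask)

def all_lane(mask):
    # "all" is "the negated mask is all 0s".
    return [not m for m in mask] == [False] * len(mask)

def scalar_kernel(x):
    """What a single thread runs in the SPMD view."""
    if x < 0:
        return -x * 2
    else:
        return x + 1

def vector_kernel(xs):
    """The same kernel for all lanes at once, with the branch turned into a blend."""
    pred = [x < 0 for x in xs]        # branch predicate, one entry per lane
    if all_lane(pred):                # skip: no lane needs the else side
        return [-x * 2 for x in xs]
    if not any_lane(pred):            # skip: no lane needs the then side
        return [x + 1 for x in xs]
    then_vals = [-x * 2 for x in xs]  # both sides are computed for every lane
    else_vals = [x + 1 for x in xs]
    return blend(pred, then_vals, else_vals)
```

For any input, `vector_kernel(xs)` should agree with running `scalar_kernel` on each element separately. The two early returns correspond to the skips, and the final `blend` is the case where the threads diverge.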