+++
title = "Optimisation of vectorised code"
date = "2022-04-29"
author = "Yann Herklotz"
tags = []
categories = []
backlinks = ["2f"]
forwardlinks = ["3a8g", "2e1b"]
zettelid = "2f1"
+++

This is work by Caroline at Irisa/Inria Rennes. It is based on the idea
of single program multiple data (SPMD), where a single program is run on
different threads and on different data. The idea is then to combine the
threads again into one vectorised instruction, so that multiple threads
actually execute one common vectorised instruction.

You can use GSA ([\#3a8g], [\#2e1b]) to vectorise instructions by using
a blend instruction which selects between values based on the predicate.
This means that if you are creating many threads and are branching on
something specific to each thread (this could be the colour of the
fragment), then you can still generate the same instructions for all the
threads, and use blend instructions to select the correct results after
the fact. In some way this is also speculation: both sides of the branch
are executed, and the result of the wrong side is simply discarded for
the threads that would not have taken it.
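
As a minimal sketch of the blend idea, the snippet below models vector
lanes as plain Python lists. The names `kernel_scalar`, `kernel_vector`
and `blend`, as well as the threshold and the arithmetic, are
hypothetical and only serve as illustration; they are not taken from the
original work.

```python
def kernel_scalar(x):
    """Hypothetical per-thread program: branches on the thread's own data."""
    if x > 128:
        return x - 10
    else:
        return x + 10


def blend(mask, if_true, if_false):
    """Lane-wise select, modelling a vector blend instruction."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def kernel_vector(xs):
    """Same program for all lanes at once: no branch, only a blend."""
    mask = [x > 128 for x in xs]     # predicate, one entry per lane
    then_val = [x - 10 for x in xs]  # computed for every lane, even if unused
    else_val = [x + 10 for x in xs]  # computed for every lane, even if unused
    return blend(mask, then_val, else_val)


lanes = [100, 200, 128, 255]
result = kernel_vector(lanes)
print(result)
```

Each lane ends up with the value that its own scalar thread would have
computed, even though both sides of the branch were evaluated for all of
the lanes.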

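Then you can add skips by checking the predicate vector with `any` and
`all`, so that a side of the branch is skipped entirely when no thread
needs it. These checks can always be turned into checks against all 0s,
by performing the same check for all the threads.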
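
The skips can be sketched in the same list-based model. Again this is
only an illustration under assumptions: `kernel_vector_with_skips`, the
returned path label and the example predicate are hypothetical. Note
that `all(mask)` is equivalent to `not any(not m for m in mask)`, so both
tests amount to checking that some mask is all 0s.

```python
def blend(mask, if_true, if_false):
    """Lane-wise select, modelling a vector blend instruction."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def kernel_vector_with_skips(xs):
    """Return (values, path), where path records which sides were executed."""
    mask = [x > 128 for x in xs]  # predicate, one entry per lane

    if all(mask):
        # Negated mask is all 0s: no lane needs the else side, so skip it.
        return [x - 10 for x in xs], "then-only"

    if not any(mask):
        # Mask is all 0s: no lane needs the then side, so skip it.
        return [x + 10 for x in xs], "else-only"

    # Lanes diverge: execute both sides and blend the results.
    then_val = [x - 10 for x in xs]
    else_val = [x + 10 for x in xs]
    return blend(mask, then_val, else_val), "both"


uniform = kernel_vector_with_skips([200, 255])
divergent = kernel_vector_with_skips([100, 200])
print(uniform, divergent)
```

The skip only pays off when the threads agree on the branch; when they
diverge, the cost is the same as the plain blended version plus the
extra check.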

This is a really interesting use-case for GSA, because you are not
using the predicates that GSA generates to analyse the code, but are
instead using them dynamically to vectorise as many of the instructions
as possible.

  [\#3a8g]: /zettel/3a8g
  [\#2e1b]: /zettel/2e1b