1 files changed, 38 insertions, 0 deletions
diff --git a/content/zettel/2f1.md b/content/zettel/2f1.md
new file mode 100644
index 0000000..29e8cc4
--- /dev/null
+++ b/content/zettel/2f1.md
@@ -0,0 +1,38 @@
++++
+title = "Optimisation of vectorised code"
+date = "2022-04-29"
+author = "Yann Herklotz"
+tags = []
+categories = []
+backlinks = ["2f"]
+forwardlinks = ["3a8g", "2e1b"]
+zettelid = "2f1"
++++
+
+This is work by Caroline at IRISA/Inria Rennes. It is based on the idea
+of single program, multiple data (SPMD), where a single program is run
+by many threads, each on different data. The idea is then to combine
+the threads back into vectorised instructions, so that multiple threads
+execute one common vector instruction, with each thread occupying one
+lane of the vector.
+
+You can use GSA ([\#3a8g], [\#2e1b]) to vectorise instructions by using
+a blend instruction that selects on the predicate. This means that if
+you create many threads and branch on something that differs between
+the threads (this could be the colour of the fragment), you can still
+generate the same instructions for all the threads and use blend
+instructions to select the correct results after the fact. In a way this
+is also speculation, but you just roll back the result for the few
+threads where the wrong branch was taken.
+
+Then you can add skips by testing the predicate vector with `any` and
+`all` checks, both of which can be reduced to a comparison against all
+0s, because the same check is performed for all the threads.
+
+This is a really interesting use-case for GSA, because you are not
+using the predicates that GSA generates to analyse the code statically,
+but are using them dynamically to vectorise as many of the
+instructions as possible.
+
+  [\#3a8g]: /zettel/3a8g
+  [\#2e1b]: /zettel/2e1b
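
The blend-and-skip scheme described in the note can be sketched in plain Python. This is only a rough model under stated assumptions, not the implementation from the work itself: lists stand in for vector registers with one element per thread, the names `blend`, `any_lane`, `all_lane` and `shade` are hypothetical, and the kernel (`c * 2 if c > 10 else c + 1`) is made up for illustration.

```python
# Each list index models one SPMD thread mapped onto one vector lane.
# All names here are hypothetical and chosen for illustration only.

def blend(mask, if_true, if_false):
    """Per-lane select, modelling a vector blend instruction."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def any_lane(mask):
    """`any` holds exactly when the mask is not all 0s."""
    return mask != [False] * len(mask)


def all_lane(mask):
    """`all` holds exactly when the negated mask is all 0s."""
    return [not m for m in mask] == [False] * len(mask)


def shade(colour):
    """Vectorised form of the per-thread code `c * 2 if c > 10 else c + 1`."""
    pred = [c > 10 for c in colour]        # the branch predicate, one bit per lane

    if all_lane(pred):                     # skip: no lane needs the else branch
        return [c * 2 for c in colour]
    if not any_lane(pred):                 # skip: no lane needs the then branch
        return [c + 1 for c in colour]

    # Divergent case: run both branches for every lane, then select.
    then_vals = [c * 2 for c in colour]
    else_vals = [c + 1 for c in colour]
    return blend(pred, then_vals, else_vals)
```

Calling `shade([5, 20, 11, 3])` takes the divergent path, since the predicate is true for only some lanes, while `shade([11, 12])` and `shade([1, 2])` each take one of the skips and never compute the unused branch.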