I've been thinking and reading more about parallelism recently, and this set of slides from Guy Steele distilled a lot of it for me.

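To realise the performance of parallel hardware, we need to optimise our programs for computational bandwidth (throughput) rather than latency. In programming terms, this means deprecating sequential accumulation (cons-lists, linear folds such as foldl/foldr, streams, sequences), where every step depends on the result of the step before it. Instead we should favour divide-and-conquer, where a problem is split into independent parts whose results are combined with an associative operation. This suggests a move to trees as the fundamental abstract building block for data.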
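To make the contrast concrete, here is a minimal sketch in Python. The names `Leaf`, `Conc`, `from_seq`, `tree_reduce`, `foldl` and `tree_depth` are my own, chosen for illustration; they are not taken from the slides. The recursion runs sequentially here. What matters is the dependency structure: a linear fold over n elements is a chain of n dependent steps, while a reduction over a balanced tree has a critical path of only about log2(n) steps, because the two recursive calls are independent. This only gives the same answer as the fold if `op` is associative.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Union


@dataclass(frozen=True)
class Leaf:
    value: Any


@dataclass(frozen=True)
class Conc:
    # Hypothetical binary node: concatenation of two subtrees.
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Conc]


def from_seq(xs: Sequence[Any]) -> Tree:
    """Build a balanced tree by splitting in half. Assumes xs is non-empty."""
    if len(xs) == 1:
        return Leaf(xs[0])
    mid = len(xs) // 2
    return Conc(from_seq(xs[:mid]), from_seq(xs[mid:]))


def foldl(op: Callable[[Any, Any], Any], acc: Any, xs: Sequence[Any]) -> Any:
    """Sequential accumulation: each step needs the previous accumulator."""
    for x in xs:
        acc = op(acc, x)
    return acc


def tree_reduce(op: Callable[[Any, Any], Any], t: Tree) -> Any:
    """Divide and conquer. The two recursive calls do not depend on each
    other, so they could run in parallel. Requires op to be associative."""
    if isinstance(t, Leaf):
        return t.value
    return op(tree_reduce(op, t.left), tree_reduce(op, t.right))


def tree_depth(t: Tree) -> int:
    """Longest chain of dependent op applications (the critical path)."""
    if isinstance(t, Leaf):
        return 0
    return 1 + max(tree_depth(t.left), tree_depth(t.right))


if __name__ == "__main__":
    xs = list(range(1, 1025))
    t = from_seq(xs)
    add = lambda a, b: a + b
    print(foldl(add, 0, xs))    # 524800, after 1024 dependent steps
    print(tree_reduce(add, t))  # 524800 as well
    print(tree_depth(t))        # 10, i.e. log2(1024)
```

With a non-associative operation such as subtraction the two strategies disagree. That is why associativity, rather than a fixed left-to-right order, is the property a divide-and-conquer reduction relies on.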