Make use of hardware linear interpolation in a GPU to sample 2 pixels
with a single texture access inside the blur shaders by sampling between
both pixels based on their relative weight.
This is significantly easier for a single dimension as 2D bilinear
filtering would raise additional constraints on the kernels (not single
zero-entries, no zero-diagonals, ...) which require additional checks
with limited improvements. Therfore, only use interpolation along the
larger dimension should be a sufficient improvement.
Using this will effectively half the number of texture accesses and
additions needed for a kernel. E.g. a 1D-pass of the gaussian blur
with radius 15 will only need 16 samples instead of 31.