After my last post on the Shader Analyser, which outputs AMD ISA (Instruction Set Architecture), I thought it was worth doing a little write-up on what exactly that is and why it may be worthwhile to take the generated code into consideration when doing low-level optimisation.
So to get started we will look at a simple example pixel shader. In all of these examples we will be using shader model 5.0 and building for the Hawaii architecture.
struct PS_INPUT
{
    float4 pos : SV_POSITION;
    float4 tex : TEXCOORD0;
};

float4 psMain(PS_INPUT input) : SV_TARGET
{
    return float4(1,1,1,1);
}
OK, so here we have our very basic pixel shader. It takes an input structure from the vertex shader, which passes in a position and a texture coordinate, but it uses neither and just writes 1.0 into each channel of the output.
So, we can look at this at three levels: the ASM that is generated from DirectX, the AMD IL (Intermediate Language), and the AMD ISA, which is the set of instructions actually executed by the GPU. The ISA is the lowest of the three, but we will look at it before the IL. So let's take a look at each of these:
DirectX ASM
ps_5_0
dcl_globalFlags refactoringAllowed
dcl_output o0.xyzw
mov o0.xyzw, l(1.000000,1.000000,1.000000,1.000000)
ret
Here we can see that it is declaring an output register (o0.xyzw), copying 1.0 into each channel of that register and returning. This is as straightforward as you can get. It makes no use of the position or texcoord data passed through to the shader, as we haven't accessed them at all in the HLSL shader.
AMD ISA
shader psMain
  v_mov_b32 v0, 1.0 // 00000000: 7E0002F2
  v_cvt_pkrtz_f16_f32 v0, v0, v0 // 00000004: 5E000100
  s_nop 0x0000 // 00000008: BF800000
  exp mrt0, v0, v0, v0, v0 done compr vm // 0000000C: F8001C0F 00000000
  s_endpgm // 00000014: BF810000
end
This looks more complex, but don't worry, it is still very simple when you break it down! Here is a link to all the instructions. So let's take a look at this line by line.
Our first instruction is v_mov_b32 which the documentation says:
This means that the instruction is just a move, pretty much like the "mov" in the DirectX ASM. Here it is moving the value 1.0 into the register v0.
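The hex word in the listing's comment (7E0002F2) can be pulled apart by hand to check this reading. The following is only a minimal sketch: the bit layout, the opcode number for v_mov_b32 and the operand number for the constant 1.0 are assumptions based on my reading of AMD's Sea Islands ISA document, and decode_vop1 is a hypothetical helper, not part of any AMD tool.

```python
# Assumed layout of a 32-bit VOP1 instruction word:
#   bits 31..25  fixed pattern 0b0111111 identifying VOP1
#   bits 24..17  VDST  (index of the destination vector register)
#   bits 16..9   OP    (opcode)
#   bits  8..0   SRC0  (source operand)
VOP1_PATTERN = 0b0111111


def decode_vop1(word: int) -> dict:
    """Split a VOP1 instruction word into its fields (hypothetical helper)."""
    if (word >> 25) & 0x7F != VOP1_PATTERN:
        raise ValueError("not a VOP1 instruction word")
    return {
        "vdst": (word >> 17) & 0xFF,
        "op": (word >> 9) & 0xFF,
        "src0": word & 0x1FF,
    }


fields = decode_vop1(0x7E0002F2)
print(fields)  # {'vdst': 0, 'op': 1, 'src0': 242}
```

Under those assumptions, opcode 1 is v_mov_b32, destination 0 is v0 and source operand 242 is the inline constant 1.0, which matches the disassembly.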
Next we have the more intimidatingly named v_cvt_pkrtz_f16_f32, but not to worry, again we will go to the documentation to see what this is:
So in the first instruction we stored a 32-bit value of 1.0 in the register v0. Now we are going to store two 16-bit values in that same register, and both of them are the value we initially stored in v0, converted to 16 bits. This gives us a register holding two 16-bit values of 1.0. This is a little strange, but things tend to get a little strange the further down you go, as you start seeing what the compiler has done to make the code run more optimally on its hardware.
Our next instruction is "s_nop". If you have worked with assembly before this may be familiar: it means no operation. The description from the documentation:
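To make the packing concrete, here is a small emulation of what ends up in v0. It is a sketch under stated assumptions rather than a description of the hardware: I am assuming the first source lands in the low 16 bits and the second in the high 16 bits, and reading "rtz" in the instruction name as round toward zero. Both function names are made up for the example.

```python
import struct


def f32_to_f16_rtz(x: float) -> int:
    """Bit pattern of x as a half float, rounded toward zero.

    Sketch only: finite values too large for half precision raise
    OverflowError (from struct) instead of being clamped.
    """
    # struct's "e" format rounds to nearest-even, so convert first and
    # then correct the result if rounding moved it away from zero.
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    nearest = struct.unpack("<e", struct.pack("<H", bits))[0]
    if abs(nearest) > abs(x):
        bits -= 1  # step back one representable value toward zero
    return bits


def cvt_pkrtz_f16_f32(s0: float, s1: float) -> int:
    """Pack two floats into one 32-bit word as two half floats."""
    # Assumption: s0 goes in the low half, s1 in the high half.
    return (f32_to_f16_rtz(s1) << 16) | f32_to_f16_rtz(s0)


v0 = 1.0
packed = cvt_pkrtz_f16_f32(v0, v0)
print(hex(packed))  # 0x3c003c00
```

Half-precision 1.0 has the bit pattern 0x3C00, so after the conversion the register would hold that pattern twice.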
Now this is even more odd than before you must be thinking. Why on earth would a shader want to waste an instruction doing nothing? Well, this calls for us to dig further into the documentation where we will find this little bit of information:
S_SETREG is an instruction that writes data to an internal hardware register, so one reading is that the compiler has added a required s_nop because the next instruction is going to write to the same register as the instruction above. However, in this case I believe the s_nop is there to pad this shader to be 4 instructions, and that it has been placed before the export instead of after it as an optimisation.
The next instruction is "exp" which our documentation tells us is the export function for this shader program. This is where the shader writes to the render targets. This line is a little more complex than the others so we can look at it one bit at a time.
The first bit to make sense of is the words at the end: "done compr vm". These are individual flags. The flag "done" indicates that this is the last output to a render target from this program, "compr" tells the GPU that the data is 16 bits per component rather than 32 bits, and "vm" says that the mask is a valid mask for the wavefront; it must be set at least once per pixel shader. I will go into wavefronts and what this means in more detail in a later post.
The next part of this line to look at is "mrt0", which tells the program to write into the first render target. This is specified in our HLSL shader, where we set the output of psMain to write to SV_TARGET.
The last part of the line is the repeated register "v0". These are the source operands of the export, telling the program which value to write into each channel. "v0" currently holds two packed 16-bit values of 1.0, and because of the "compr" flag only the first two operands are read: the first supplies the red and green channels and the second supplies the blue and alpha channels.
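The flags, the target and the registers can all be read back out of the two hex words shown next to the exp instruction (F8001C0F 00000000). The sketch below assumes the export bit layout as I understand it from AMD's Sea Islands ISA document; decode_exp is a hypothetical helper written for this post.

```python
# Assumed layout of the first export word:
#   bits 31..26  fixed pattern 0b111110 identifying an export
#   bit  12      VM     (valid mask)
#   bit  11      DONE   (last export)
#   bit  10      COMPR  (data is packed 16-bit)
#   bits  9..4   TGT    (0 is mrt0)
#   bits  3..0   EN     (one enable bit per component)
# Assumed layout of the second word: four 8-bit register indices.
EXP_PATTERN = 0b111110


def decode_exp(word0: int, word1: int) -> dict:
    """Split the two export words into their fields (hypothetical helper)."""
    if (word0 >> 26) & 0x3F != EXP_PATTERN:
        raise ValueError("not an export instruction")
    return {
        "en": word0 & 0xF,
        "tgt": (word0 >> 4) & 0x3F,
        "compr": bool(word0 & (1 << 10)),
        "done": bool(word0 & (1 << 11)),
        "vm": bool(word0 & (1 << 12)),
        "vsrc": [(word1 >> shift) & 0xFF for shift in (0, 8, 16, 24)],
    }


fields = decode_exp(0xF8001C0F, 0x00000000)
print(fields)
```

With that layout the words decode to target 0 (mrt0), all four enable bits set, all three flags set and register index 0 (v0) in every source slot, which agrees with the text of the disassembly.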
Finally, the last line of the AMD ISA is "s_endpgm", which is obviously the instruction to end the program, but to be consistent here is the description from the documentation:
This is telling the GPU to end this program and the wavefront. Pretty straightforward.
So you can see that the ISA is just the lower-level version of the DirectX ASM. It is doing the same things but is a little more explicit about how, and we are beginning to see the quirks of the GPU come through.
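Both scalar instructions in the listing, s_nop (BF800000) and s_endpgm (BF810000), can be checked against their hex words in the same way. This sketch assumes they share one 32-bit format with a 7-bit opcode and a 16-bit immediate, and that opcodes 0 and 1 are s_nop and s_endpgm; those details come from my reading of the Sea Islands ISA document, and decode_sopp is a hypothetical helper.

```python
# Assumed layout of this scalar instruction format:
#   bits 31..23  fixed pattern 0b101111111
#   bits 22..16  OP      (opcode)
#   bits 15..0   SIMM16  (16-bit immediate)
SOPP_PATTERN = 0b101111111
ASSUMED_NAMES = {0: "s_nop", 1: "s_endpgm"}  # only the two opcodes seen here


def decode_sopp(word: int) -> tuple:
    """Return (opcode, immediate) for a word in this format (hypothetical helper)."""
    if (word >> 23) & 0x1FF != SOPP_PATTERN:
        raise ValueError("not in the expected scalar format")
    return (word >> 16) & 0x7F, word & 0xFFFF


for word in (0xBF800000, 0xBF810000):
    op, imm = decode_sopp(word)
    print(f"{word:08X}: {ASSUMED_NAMES.get(op, '?')} 0x{imm:04x}")
```

The two words differ only in the opcode field, and the immediate of 0x0000 is the operand printed after s_nop in the disassembly.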
AMD IL
This will be covered in the next post!