[AMDGPU][NFC] DWARF vector composite location description operations (#71623) (0e42df40) · Commits · llvm-doe / llvm-project

llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst

+27 −4

Original line number	Diff line number	Diff line
		@@ -448,11 +448,34 @@ See ``DW_AT_LLVM_vector_size`` in :ref:`amdgpu-dwarf-base-type-entries`.

		AMDGPU optimized code may spill vector registers to non-global address space
		memory, and this spilling may be done only for SIMT lanes that are active on
		entry to the subprogram.

		To support this, a composite location description that can be created as a
		masked select is required. In addition, an operation that creates a composite
		entry to the subprogram. To support this the CFI rule for the partially spilled
		register needs to use an expression that uses the EXEC register as a bit mask to
		select between the register (for inactive lanes) and the stack spill location
		(for active lanes that are spilled). This needs to evaluate to a location
		description, and not a value, as a debugger needs to change the value if the
		user assigns to the variable.

		Another usage is to create an expression that evaluates to provide a vector of
		logical PCs for active and inactive lanes in a SIMT execution model. Again the
		EXEC register is used to select between active and inactive PC values. In order
		to represent a vector of PC values, a way to create a composite location
		description that is a vector of a single location is used.

		It may be possible to use existing DWARF to incrementally build the composite
		location description, possibly using the DWARF operations for control flow to
		create a loop. However, for the AMDGPU that would require loop iteration of 64.
		A concern is that the resulting DWARF would have a significant size and would be
		reasonably common as it is needed for every vector register that is spilled in a
		function. AMDGPU can have up to 512 vector registers. Another concern is the
		time taken to evaluate such non-trivial expressions repeatedly.

		To avoid these issues, a composite location description that can be created as a
		masked select is proposed. In addition, an operation that creates a composite
		location description that is a vector on another location description is needed.
		These operations generate the composite location description using a single
		DWARF operation that combines all lanes of the vector in one step. The DWARF
		expression is more compact, and can be evaluated by a consumer far more
		efficiently.

		An example that uses these operations is referenced in the
		:ref:`amdgpu-dwarf-further-examples` appendix.