Acceleration Structure Conversion by devshgraphicsprogramming · Pull Request #790 · Devsh-Graphics-Programming/Nabla

devshgraphicsprogramming · 2024-11-21T06:30:58Z

Description

Conversion of ICPU BLAS and TLAS to IGPU including building.

We may need to support a list of IGPUBLAS in IGPUTLAS for sanity/lifetime coupling, but only if update/rebuild is not allowed or something (need to make a separate issue out of it because I have no clue how that's gonna be structured).
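One possible shape for that lifetime coupling, as a minimal sketch only: the TLAS object keeps owning references to the BLASes it was built from, so they cannot be destroyed while the TLAS is alive. `Blas`, `Tlas` and `trackedCount` are hypothetical stand-ins, and `std::shared_ptr` is used here in place of Nabla's `smart_refctd_ptr`; none of this is the PR's actual design, which is left for a separate issue.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU bottom-level acceleration structure.
struct Blas { int id = 0; };

// Hypothetical stand-in for a GPU top-level acceleration structure that
// keeps every BLAS it references alive for as long as it exists itself.
class Tlas
{
    public:
        explicit Tlas(std::vector<std::shared_ptr<const Blas>> tracked) : m_tracked(std::move(tracked))
        {
            for (const auto& blas : m_tracked)
                assert(blas != nullptr); // a TLAS should not track a missing BLAS
        }

        std::size_t trackedCount() const { return m_tracked.size(); }

    private:
        // Owning references: dropping the TLAS releases them.
        std::vector<std::shared_ptr<const Blas>> m_tracked;
};
```

With this shape, any update or rebuild of the TLAS would also have to replace the tracked list, which is the open question raised above.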

Testing

Ray Query Example

TODO list:

Note that pointer/build param encoding stuff shouldn't be in the CPU side but don't touch anything. Also fix a typo, change the SRange to a std::span, and add default SPIR-V optimizer if none provided to asset converter.

devshgraphicsprogramming · 2024-11-21T16:02:17Z

+	// finally the contents
+//TODO:	hasher << lookup.asset->getContentHash();


Note to self: need to make the ICPUBottomLevelAccelerationStructure an IPreHashed.

devshgraphicsprogramming · 2024-11-21T16:03:30Z

+				using mem_prop_f = IDeviceMemoryAllocation::E_MEMORY_PROPERTY_FLAGS;
+				const auto deviceBuildMemoryTypes = device->getPhysicalDevice()->getMemoryTypeBitsFromMemoryTypeFlags(mem_prop_f::EMPF_DEVICE_LOCAL_BIT);
+				const auto hostBuildMemoryTypes = device->getPhysicalDevice()->getMemoryTypeBitsFromMemoryTypeFlags(mem_prop_f::EMPF_DEVICE_LOCAL_BIT|mem_prop_f::EMPF_HOST_WRITABLE_BIT|mem_prop_f::EMPF_HOST_CACHED_BIT);
+
+				constexpr bool IsTLAS = std::is_same_v<AssetType,ICPUTopLevelAccelerationStructure>;
+				accelerationStructureParams[IsTLAS].resize(gpuObjects.size());
+				for (auto& entry : conversionRequests)
+				for (auto i=0ull; i<entry.second.copyCount; i++)
+				{
+					const auto* as = entry.second.canonicalAsset;
+					const auto& patch = dfsCache.nodes[entry.second.patchIndex.value].patch;
+					const bool motionBlur = as->usesMotion();
+					// we will need to temporarily store the build input buffers somewhere
+					size_t inputSize = 0;
+					ILogicalDevice::AccelerationStructureBuildSizes sizes = {};
+					{
+						const auto buildFlags = patch.getBuildFlags(as);
+						if constexpr (IsTLAS)
+						{
+							AssetVisitor<GetDependantVisit<ICPUTopLevelAccelerationStructure>> visitor = {
+								{visitBase},
+								{asset,uniqueCopyGroupID},
+								patch
+							};
+							if (!visitor())
+								continue;
+							const auto instanceCount = as->getInstances().size();
+							sizes = device->getAccelerationStructureBuildSizes(patch.hostBuild,buildFlags,motionBlur,instanceCount);
+							inputSize = (motionBlur ? sizeof(IGPUTopLevelAccelerationStructure::DevicePolymorphicInstance):sizeof(IGPUTopLevelAccelerationStructure::DeviceStaticInstance))*instanceCount;
+						}
+						else
+						{
+							const uint32_t* pMaxPrimitiveCounts = as->getGeometryPrimitiveCounts().data();
+							// the code here is not pretty, but DRY-ing of this is for later
+							if (buildFlags.hasFlags(ICPUBottomLevelAccelerationStructure::BUILD_FLAGS::GEOMETRY_TYPE_IS_AABB_BIT))
+							{
+								const auto geoms = as->getAABBGeometries();
+								if (patch.hostBuild)
+								{
+									const std::span<const IGPUBottomLevelAccelerationStructure::Triangles<const IGPUBuffer>> cpuGeoms = {
+										reinterpret_cast<const IGPUBottomLevelAccelerationStructure::Triangles<const IGPUBuffer>*>(geoms.data()),geoms.size()
+									};
+									sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,cpuGeoms,pMaxPrimitiveCounts);
+								}
+								else
+								{
+									const std::span<const IGPUBottomLevelAccelerationStructure::Triangles<const ICPUBuffer>> cpuGeoms = {
+										reinterpret_cast<const IGPUBottomLevelAccelerationStructure::Triangles<const ICPUBuffer>*>(geoms.data()),geoms.size()
+									};
+									sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,cpuGeoms,pMaxPrimitiveCounts);
+									// TODO: check if the strides need to be aligned to 4 bytes for AABBs
+									for (const auto& geom : geoms)
+									if (const auto aabbCount=*(pMaxPrimitiveCounts++); aabbCount)
+										inputSize = core::roundUp(inputSize,sizeof(float))+aabbCount*geom.stride;
+								}
+							}
+							else
+							{
+								core::map<uint32_t,size_t> allocationsPerStride;
+								const auto geoms = as->getTriangleGeometries();
+								if (patch.hostBuild)
+								{
+									const std::span<const IGPUBottomLevelAccelerationStructure::Triangles<const IGPUBuffer>> cpuGeoms = {
+										reinterpret_cast<const IGPUBottomLevelAccelerationStructure::Triangles<const IGPUBuffer>*>(geoms.data()),geoms.size()
+									};
+									sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,cpuGeoms,pMaxPrimitiveCounts);
+								}
+								else
+								{
+									const std::span<const IGPUBottomLevelAccelerationStructure::Triangles<const ICPUBuffer>> cpuGeoms = {
+										reinterpret_cast<const IGPUBottomLevelAccelerationStructure::Triangles<const ICPUBuffer>*>(geoms.data()),geoms.size()
+									};
+									sizes = device->getAccelerationStructureBuildSizes(buildFlags,motionBlur,cpuGeoms,pMaxPrimitiveCounts);
+									// TODO: check if the strides need to be aligned to 4 bytes for AABBs
+									for (const auto& geom : geoms)
+									if (const auto triCount=*(pMaxPrimitiveCounts++); triCount)
+									{
+										switch (geom.indexType)
+										{
+											case E_INDEX_TYPE::EIT_16BIT:
+												allocationsPerStride[sizeof(uint16_t)] += triCount*3;
+												break;
+											case E_INDEX_TYPE::EIT_32BIT:
+												allocationsPerStride[sizeof(uint32_t)] += triCount*3;
+												break;
+											default:
+												break;
+										}
+										size_t bytesPerVertex = geom.vertexStride;
+										if (geom.vertexData[1])
+											bytesPerVertex += bytesPerVertex;
+										allocationsPerStride[geom.vertexStride] += geom.maxVertex;
+									}
+								}
+								for (const auto& entry : allocationsPerStride)
+									inputSize = core::roundUp<size_t>(inputSize,entry.first)+entry.first*entry.second;
+							}
+						}
+					}
+					if (!sizes)
+						continue;
+					// this is where it gets a bit weird, we need to create a buffer to back the acceleration structure
+					IGPUBuffer::SCreationParams params = {};
+					constexpr size_t MinASBufferAlignment = 256u;
+					params.size = core::roundUp(sizes.accelerationStructureSize,MinASBufferAlignment);
+					params.usage = IGPUBuffer::E_USAGE_FLAGS::EUF_ACCELERATION_STRUCTURE_STORAGE_BIT|IGPUBuffer::E_USAGE_FLAGS::EUF_SHADER_DEVICE_ADDRESS_BIT;
+					// concurrent ownership if any
+					const auto outIx = i+entry.second.firstCopyIx;
+					const auto uniqueCopyGroupID = gpuObjUniqueCopyGroupIDs[outIx];
+					const auto queueFamilies =  inputs.getSharedOwnershipQueueFamilies(uniqueCopyGroupID,as,patch);
+					params.queueFamilyIndexCount = queueFamilies.size();
+					params.queueFamilyIndices = queueFamilies.data();
+					// we need to save the buffer in a side-channel for later
+					auto& out = accelerationStructureParams[IsTLAS][baseOffset+entry.second.firstCopyIx+i];
+					out = {
+						.storage = device->createBuffer(std::move(params)),
+						.scratchSize = sizes.buildScratchSize,
+						.motionBlur = motionBlur,
+						.compactAfterBuild = patch.compactAfterBuild,
+						.inputSize = inputSize
+					};


this needs some love from me
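The per-stride accumulation of `inputSize` in the diff above can be looked at in isolation. This is a minimal sketch, assuming a local `roundUp` helper as a stand-in for `core::roundUp` and a hypothetical `computeInputSize` function; it only mirrors the `allocationsPerStride` loop and is not the converter's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for `core::roundUp`: rounds `value` up to the next multiple of `multiple`.
constexpr std::size_t roundUp(std::size_t value, std::size_t multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

// Key: element stride in bytes. Value: total number of elements with that stride.
using AllocationsPerStride = std::map<std::uint32_t, std::size_t>;

// Packs the buckets back to back, starting each bucket at an offset
// that is a multiple of its own stride (buckets are visited in ascending stride order).
inline std::size_t computeInputSize(const AllocationsPerStride& allocations)
{
    std::size_t inputSize = 0;
    for (const auto& [stride, count] : allocations)
    {
        assert(stride != 0); // a zero stride would make the rounding meaningless
        inputSize = roundUp(inputSize, stride) + std::size_t(stride) * count;
    }
    return inputSize;
}
```

For example, buckets of 3 elements at stride 2, 5 at stride 4 and 2 at stride 12 occupy 6 bytes, then 20 bytes starting at offset 8, then 24 bytes starting at offset 36. The same helper also covers the 256-byte rounding applied to the acceleration structure storage buffer size.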

devshgraphicsprogramming · 2024-11-21T16:03:51Z

+			// This gets deferred till AFTER the Buffer Memory Allocations and Binding for Acceleration Structures
+			if constexpr (!std::is_same_v<AssetType,ICPUBottomLevelAccelerationStructure> && !std::is_same_v<AssetType,ICPUTopLevelAccelerationStructure>)
+				dfsCache.for_each([&](const instance_t<AssetType>& instance, dfs_cache<AssetType>::created_t& created)->void
 				{
+					auto& stagingCache = std::get<SReserveResult::staging_cache_t<AssetType>>(retval.m_stagingCaches);


need to pack up the lambda and defer it
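Packing up a lambda and running it later can be sketched with a small queue of callables. `DeferredQueue`, `defer` and `flush` are hypothetical names for illustration; the sketch assumes the deferred work has no return value and that captured references outlive the flush.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: collects work that must wait for a later phase,
// e.g. until buffer memory has been allocated and bound.
class DeferredQueue
{
    public:
        // Store the callable instead of running it now.
        void defer(std::function<void()> task)
        {
            assert(static_cast<bool>(task)); // reject empty callables early
            m_tasks.push_back(std::move(task));
        }

        // Run everything in submission order, then forget it.
        void flush()
        {
            auto tasks = std::move(m_tasks);
            m_tasks.clear();
            for (auto& task : tasks)
                task();
        }

        bool empty() const { return m_tasks.empty(); }

    private:
        std::vector<std::function<void()>> m_tasks;
};
```

Anything the lambda captures by reference has to stay alive until `flush` is called, which is the main thing to watch when moving an immediately-invoked lambda into a deferred one.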

devshgraphicsprogramming · 2024-11-21T16:04:20Z

+		// Deal with Deferred Creation of Acceleration structures
+		{
+			for (auto asLevel=0; asLevel<2; asLevel++)
+			{
+				// each of these stages must have a barrier in between
+				size_t scratchSizeFullParallelBuild = 0;
+				size_t scratchSizeFullParallelCompact = 0;
+				// we collect the stats AFTER making sure that the BLAS / TLAS can actually be created
+				for (const auto& deferredParams : accelerationStructureParams[asLevel])
+				{
+					// buffer failed to create/allocate
+					if (!deferredParams.storage.get())
+						continue;
+					IGPUAccelerationStructure::SCreationParams baseParams;
+					{
+						auto* buf = deferredParams.storage.get();
+						const auto bufSz = buf->getSize();
+						using create_f = IGPUAccelerationStructure::SCreationParams::FLAGS;
+						baseParams = {
+							.bufferRange = {.offset=0,.size=bufSz,.buffer=smart_refctd_ptr<IGPUBuffer>(buf)},
+							.flags = deferredParams.motionBlur ? create_f::MOTION_BIT:create_f::NONE
+						};
+					}
+					smart_refctd_ptr<IGPUAccelerationStructure> as;
+					// `accelerationStructureParams` is indexed by `IsTLAS`, so level 0 is BLAS and level 1 is TLAS
+					if (asLevel)
+					{
+						as = device->createTopLevelAccelerationStructure({baseParams,deferredParams.maxInstanceCount});
+					}
+					else
+					{
+						as = device->createBottomLevelAccelerationStructure({baseParams,deferredParams.maxInstanceCount});
+					}
+					// note that in order to compact an AS you need to allocate a buffer range whose size is known only after the build
+					const auto buildSize = deferredParams.inputSize+deferredParams.scratchSize;
+					// sizes for building 1-by-1 vs parallel, note that
+					retval.m_minASBuildScratchSize = core::max(buildSize,retval.m_minASBuildScratchSize);
+					scratchSizeFullParallelBuild += buildSize;
+					if (deferredParams.compactAfterBuild)
+						scratchSizeFullParallelCompact += deferredParams.scratchSize;
+					// triangles, AABBs or Instance Transforms will need to be supplied from VRAM
+	// TODO: also mark somehow that we'll need a BUILD INPUT READ ONLY BUFFER WITH XFER usage
+					if (deferredParams.inputSize)
+						retval.m_queueFlags |= IQueue::FAMILY_FLAGS::TRANSFER_BIT;
+				}
+				// 
+				retval.m_maxASBuildScratchSize = core::max(core::max(scratchSizeFullParallelBuild,scratchSizeFullParallelCompact),retval.m_maxASBuildScratchSize);
+			}
+			//
+			// `m_maxASBuildScratchSize` was already accumulated per level inside the loop above
+			if (retval.m_minASBuildScratchSize)
+			{
+				retval.m_queueFlags |= IQueue::FAMILY_FLAGS::COMPUTE_BIT;
+			}
+		}


needs some love from me
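The scratch bookkeeping in the diff above boils down to two bounds: the minimum is the largest single build (enough to build one acceleration structure at a time), and the maximum is the largest per-level total (enough to build a whole level at once, with a barrier between levels). A minimal sketch, assuming hypothetical `DeferredBuild` and `ScratchBounds` structs that keep only the fields used for sizing:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down version of the per-AS deferred parameters.
struct DeferredBuild
{
    std::size_t inputSize = 0;   // bytes of build input that must be staged
    std::size_t scratchSize = 0; // bytes of build scratch reported by the device
    bool compactAfterBuild = false;
};

struct ScratchBounds
{
    std::size_t minSize = 0; // enough to build one structure at a time
    std::size_t maxSize = 0; // enough to build an entire level in parallel
};

// `levels[0]` holds the BLAS builds and `levels[1]` the TLAS builds.
inline ScratchBounds computeScratchBounds(const std::vector<std::vector<DeferredBuild>>& levels)
{
    assert(levels.size() <= 2); // only BLAS and TLAS levels exist
    ScratchBounds bounds;
    for (const auto& level : levels)
    {
        std::size_t fullParallelBuild = 0;
        std::size_t fullParallelCompact = 0;
        for (const auto& build : level)
        {
            const std::size_t buildSize = build.inputSize + build.scratchSize;
            bounds.minSize = std::max(bounds.minSize, buildSize);
            fullParallelBuild += buildSize;
            if (build.compactAfterBuild)
                fullParallelCompact += build.scratchSize;
        }
        // levels never overlap in time, so the maximum is taken across levels, not summed
        bounds.maxSize = std::max({bounds.maxSize, fullParallelBuild, fullParallelCompact});
    }
    return bounds;
}
```

Any scratch allocation between the two bounds would trade build parallelism for memory, which is presumably why both values are reported.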

devshgraphicsprogramming · 2025-04-18T19:31:03Z

+		#ifndef _NBL_DEBUG
+			if (!params.optimizer)
+			{
+				using pass_e = asset::ISPIRVOptimizer::E_OPTIMIZER_PASS;
+				// shall we do others?
+				params.optimizer = core::make_smart_refctd_ptr<asset::ISPIRVOptimizer>({pass_e::EOP_STRIP_DEBUG_INFO});
+			}
+		#endif


@kevyuu we should move this to your ISPIRVDebloater (or trimmer, as I'd like to call it) and make it an option, so that the SPIR-V optimizer doesn't run multiple times for no reason

devshgraphicsprogramming added 2 commits November 20, 2024 16:44

Decide on the patchable parameters for the TLAS and BLAS builds.

1e3f5dd

Note that pointer/build param encoding stuff shouldn't be in the CPU side but don't touch anything. Also fix a typo, change the SRange to a std::span, and add default SPIR-V optimizer if none provided to asset converter.

start going through the implementation

f065d7c

devshgraphicsprogramming assigned keptsecret Nov 21, 2024

devshgraphicsprogramming commented Nov 21, 2024

Comment thread src/nbl/video/utilities/CAssetConverter.cpp

devshgraphicsprogramming added 5 commits November 21, 2024 11:17

const correctness

3689118

shared ownership needs to be settled for the buffers backing acceelra…

215ee50

…tion structures

fix typos

19e3e57

Realize that compacted acceleration structures need an allocator for …

100ae7d

…their storage. Change more stuff to span in `ICPUBottomLevelAccelerationStructure` Use a semantically better typedef/alias in `ILogicalDevice::createBottomLevelAccelerationStructure`

start attempts at AS creation

0763416

devshgraphicsprogramming commented Nov 21, 2024

Comment thread src/nbl/video/utilities/CAssetConverter.cpp

devshgraphicsprogramming commented Nov 21, 2024

devshgraphicsprogramming requested review from AnastaZIuk, Erfan-Ahmadi and keptsecret November 21, 2024 16:05

devshgraphicsprogramming added 4 commits November 21, 2024 21:22

realize that scratches need to be provided separately for Device and …

11a141c

…Host builds

starting writing the building code, realize we need to bucket the Dev…

6675224

…ice and Host build requests separately

Figure out the TLAS/BLAS compaction logic and swap in cache.

8f43fef

Also update comments about what ends up in `m_gpuObjects`

fix a nasty possible threading bug with IDeferredOperation and a bunc…

a4307f4

…h of typos

devshgraphicsprogramming commented Apr 17, 2025

Comment thread include/nbl/video/utilities/CAssetConverter.h

devshgraphicsprogramming merged commit b9be039 into master Apr 18, 2025

devshgraphicsprogramming deleted the AS_conv branch April 18, 2025 16:02

devshgraphicsprogramming commented Apr 18, 2025

Acceleration Structure Conversion #790: devshgraphicsprogramming merged 11 commits into master from AS_conv
