
Github

LukasFratzl/TurboSequence: Skeletal Based GPU Crowds for UE5 🚀

Overview

TurboSequence uses Niagara Mesh Particles, but it renders Static Meshes rather than Skeletal Meshes. The core trick is to bake the animation data into textures and perform skinning in real time on the GPU, in a Compute Shader plus a Vertex Shader.

Why not Niagara SkeletalMesh instances?

Niagara's SkeletalMesh renderer internally issues a separate draw call for each instance. TurboSequence works around this by:

  • Rendering Static Meshes as Niagara Mesh Particles (a single instanced draw call)
  • Computing the animation on the GPU, which removes the CPU overhead

Overall pipeline structure

Stage 1: Offline conversion (Content Pipeline)

// TurboSequence_ControlPanelLibrary_Lf.cpp - ConvertSkeletalMeshToTurboSequence_BlueprintThreadSafe

1.1 Static Mesh creation

// Convert each LOD of the source Skeletal Mesh into a Static Mesh
for (int32 i = 0; i < SkeletalMesh->GetLODNum(); ++i)
{
    UStaticMesh* StaticMesh = NewObject<UStaticMesh>();
    // Copy the Skeletal Mesh vertex data into the Static Mesh
    // Bone weights are stripped; only the bind-pose vertices are stored
}

1.2 Animation library texture creation

// TurboSequence_Helper_Lf.cpp - CreateAnimationLibraryTexture2D

The key step is baking the bone Transforms of every animation frame into a texture:

Texture layout:
- Width: NumBones * 3 (3 pixels per bone = Rotation Quat + Translation + Scale)
- Height: TotalAnimationFrames (every frame of every animation)
- Format: RGBA16F (high precision: 16-bit float per channel, 8 bytes per pixel)

Example:
- 50 bones, 10 animations (100 frames each) = 150x1000 texture
- Pixel[0-2, FrameY] = Bone0's Rotation (RGBA), Translation (RGB), Scale (RGB)
- Pixel[3-5, FrameY] = Bone1's data...
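
The addressing implied by this layout can be sketched in plain C++. This is only an illustration of the arithmetic described above, not code from the plugin: the `AnimLibraryLayout` type and all of its member names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper describing the animation library texture layout:
// 3 pixels per bone along X, one row per baked frame along Y.
struct AnimLibraryLayout {
    std::uint32_t NumBones = 0;
    std::vector<std::uint32_t> FramesPerAnimation; // frame count of each clip

    static constexpr std::uint32_t PixelsPerBone = 3; // rotation, translation, scale

    // Texture width in pixels.
    std::uint32_t Width() const { return NumBones * PixelsPerBone; }

    // Texture height in pixels: the sum of all clip lengths.
    std::uint32_t Height() const {
        std::uint32_t Total = 0;
        for (std::uint32_t Frames : FramesPerAnimation) {
            Total += Frames;
        }
        return Total;
    }

    // First texture row occupied by the given animation.
    std::uint32_t StartRow(std::size_t AnimIndex) const {
        std::uint32_t Row = 0;
        for (std::size_t i = 0; i < AnimIndex; ++i) {
            Row += FramesPerAnimation[i];
        }
        return Row;
    }

    // X coordinate of one of a bone's pixels.
    // Channel: 0 = rotation, 1 = translation, 2 = scale.
    std::uint32_t PixelX(std::uint32_t BoneIndex, std::uint32_t Channel) const {
        return BoneIndex * PixelsPerBone + Channel;
    }
};
```

With 50 bones and 10 clips of 100 frames each, `Width()` is 150 and `Height()` is 1000, matching the example above; the fourth clip would start at row 300.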

Texture pixel encoding:

// Store each bone's Transform in 3 pixels
FLinearColor RotationPixel = FLinearColor(Quat.X, Quat.Y, Quat.Z, Quat.W);
FLinearColor TranslationPixel = FLinearColor(Pos.X, Pos.Y, Pos.Z, 0);
FLinearColor ScalePixel = FLinearColor(Scale.X, Scale.Y, Scale.Z, 0);

1.3 Bone weight texture creation

// Store each vertex's bone indices and weights in a texture
// Width: NumVertices
// Height: 1
// RGBA: (BoneIndex0, Weight0, BoneIndex1, Weight1)

Stage 2: Runtime CPU (Game Thread)

// TurboSequence_Manager_Lf.cpp

2.1 Instance spawn

FTurboSequence_MinimalMeshData_Lf AddSkinnedMeshInstance_GameThread(...)
{
    // 1. Fetch the Static Mesh and texture references from the MeshAsset
    // 2. Prepare to add the instance to the Niagara System
    // 3. Build the metadata that will be passed to the GPU
    FTurboSequence_MinimalMeshData_Lf Data;
    Data.AnimationStartFrame = 0;
    Data.AnimationEndFrame = 100;
    Data.CustomData = FVector4f(...); // passed as a Niagara User Parameter
    return Data;
}

2.2 Animation update (Concurrent)

void UpdateMeshAnimation_Concurrent(...)
{
    // The CPU only updates metadata (which animation, which frame)
    Instance.AnimationStartFrame = NewAnimStartFrame;
    Instance.AnimationTime += DeltaTime;
    // The actual bone computation happens on the GPU
}
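
The snippet above only accumulates time, so a looping clip also needs its time wrapped back into range somewhere. The sketch below shows one way to do that on the CPU. It assumes `AnimationEndFrame` is exclusive (one past the clip's last frame) and that the bake rate is passed in as `FramesPerSecond`; both are assumptions, and `InstanceAnimMeta` and `AdvanceAnimation` are hypothetical names.

```cpp
#include <cmath>

// Hypothetical per-instance metadata; field names mirror the snippet above.
struct InstanceAnimMeta {
    int   AnimationStartFrame = 0;
    int   AnimationEndFrame   = 0;    // assumed exclusive (one past the last frame)
    float AnimationTime       = 0.0f; // seconds since the clip started
};

// Advance the clip time and wrap it so a looping clip stays inside its frame range.
void AdvanceAnimation(InstanceAnimMeta& Instance, float DeltaTime, float FramesPerSecond) {
    const int NumFrames = Instance.AnimationEndFrame - Instance.AnimationStartFrame;
    if (NumFrames <= 0 || FramesPerSecond <= 0.0f) {
        return; // nothing to play
    }

    const float Duration = static_cast<float>(NumFrames) / FramesPerSecond;
    float Time = std::fmod(Instance.AnimationTime + DeltaTime, Duration);
    if (Time < 0.0f) {
        Time += Duration; // keep the result non-negative when DeltaTime is negative
    }
    Instance.AnimationTime = Time;
}
```

Because only a float and two integers change per instance, this kind of update is cheap enough to run for every instance each frame, which fits the metadata-only role the CPU plays here.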

2.3 Solve (Game Thread)

void SolveMeshes_GameThread(float DeltaTime, ...)
{
    // 1. Gather all instance data updated on the CPU
    // 2. Pass the instance data to the Niagara System
    // 3. Schedule the GPU Compute Shader dispatches
    
    for (const auto& UpdateGroup : UpdateGroups)
    {
        NiagaraComponent->SetCustomData(InstanceData); // upload to the GPU (pseudocode)
        DispatchComputeShaders(UpdateGroup);
    }
}
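
The "gather" step can be pictured as flattening per-instance structs into parallel arrays, one array per field, which is the shape the compute shader in the next stage indexes by instance ID. The sketch below is an assumption about that shape, not the plugin's code; `InstanceMeta`, `InstanceUploadArrays` and `GatherInstanceData` are hypothetical names.

```cpp
#include <vector>

// Hypothetical CPU-side record for one crowd instance.
struct InstanceMeta {
    float AnimTime;
    int   StartFrame;
    int   EndFrame;
};

// Hypothetical structure-of-arrays staging data, one element per instance.
struct InstanceUploadArrays {
    std::vector<float> AnimTimes;
    std::vector<int>   StartFrames;
    std::vector<int>   EndFrames;
};

// Flatten the instances into parallel arrays, preserving instance order so that
// element i of every array belongs to instance i.
InstanceUploadArrays GatherInstanceData(const std::vector<InstanceMeta>& Instances) {
    InstanceUploadArrays Out;
    Out.AnimTimes.reserve(Instances.size());
    Out.StartFrames.reserve(Instances.size());
    Out.EndFrames.reserve(Instances.size());

    for (const InstanceMeta& Instance : Instances) {
        Out.AnimTimes.push_back(Instance.AnimTime);
        Out.StartFrames.push_back(Instance.StartFrame);
        Out.EndFrames.push_back(Instance.EndFrame);
    }
    return Out;
}
```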

Stage 3: Runtime GPU - Compute Shader

// Shaders/Private/BoneSettings_CS_Lf.usf

3.1 Bone data preparation Compute Shader

[numthreads(64, 1, 1)]
void BoneSettings_CS_Lf(uint3 ThreadId : SV_DispatchThreadID)
{
    uint InstanceID = ThreadId.x;
    
    // 1. Load the instance metadata (Niagara User Parameters)
    float AnimTime = InstanceAnimTimes[InstanceID];
    int StartFrame = InstanceStartFrames[InstanceID];
    int EndFrame = InstanceEndFrames[InstanceID];
    
    // 2. Compute the current frame
    float FrameFloat = StartFrame + (AnimTime * FPS);
    int Frame0 = (int)floor(FrameFloat);
    // Clamp, assuming EndFrame is one past this animation's last row
    int Frame1 = min(Frame0 + 1, EndFrame - 1);
    float BlendAlpha = frac(FrameFloat);
    
    // 3. Read the bone Transforms from the animation library texture
    for (int BoneIdx = 0; BoneIdx < NumBones; BoneIdx++)
    {
        // Integer texel coordinates: Load reads exact texels, with no UV rounding or filtering
        int X = BoneIdx * 3;
        
        // Load the bone Transform (3 pixels = Rotation, Translation, Scale)
        float4 Rot0 = AnimLibraryTexture.Load(int3(X, Frame0, 0));
        float4 Trans0 = AnimLibraryTexture.Load(int3(X + 1, Frame0, 0));
        float4 Scale0 = AnimLibraryTexture.Load(int3(X + 2, Frame0, 0));
        
        float4 Rot1 = AnimLibraryTexture.Load(int3(X, Frame1, 0));
        float4 Trans1 = AnimLibraryTexture.Load(int3(X + 1, Frame1, 0));
        float4 Scale1 = AnimLibraryTexture.Load(int3(X + 2, Frame1, 0));
        
        // 4. Interpolate between the two frames
        float4 FinalRot = QuatSlerp(Rot0, Rot1, BlendAlpha);
        float3 FinalTrans = lerp(Trans0.xyz, Trans1.xyz, BlendAlpha);
        float3 FinalScale = lerp(Scale0.xyz, Scale1.xyz, BlendAlpha);
        
        // 5. Store the result in the per-instance bone buffer (used by the next stage)
        OutBoneTransforms[InstanceID * NumBones + BoneIdx] = 
            ComposeTransform(FinalRot, FinalTrans, FinalScale);
    }
}
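
The frame selection and rotation blend can be checked on the CPU with a small reference implementation. The sketch below makes two deliberate substitutions, both assumptions rather than the plugin's behavior: it wraps the second frame back to the clip start (suitable for looping clips, with `EndFrame` treated as exclusive), and it uses a normalized lerp along the shortest arc as a cheaper stand-in for slerp. All names are hypothetical.

```cpp
#include <cmath>

struct FramePair {
    int   Frame0; // texture row of the earlier keyframe
    int   Frame1; // texture row of the later keyframe
    float Alpha;  // blend factor in [0, 1)
};

// Pick the two baked rows that bracket AnimTime for a looping clip.
// EndFrame is assumed to be exclusive (one past the clip's last row).
FramePair SelectFrames(int StartFrame, int EndFrame, float AnimTime, float FramesPerSecond) {
    const int NumFrames = EndFrame - StartFrame;
    if (NumFrames <= 0) {
        return {StartFrame, StartFrame, 0.0f};
    }
    float Local = std::fmod(AnimTime * FramesPerSecond, static_cast<float>(NumFrames));
    if (Local < 0.0f) {
        Local += static_cast<float>(NumFrames);
    }
    const float Floor = std::floor(Local);
    const int Index0 = static_cast<int>(Floor) % NumFrames;
    const int Index1 = (Index0 + 1) % NumFrames; // wrap at the loop seam
    return {StartFrame + Index0, StartFrame + Index1, Local - Floor};
}

struct Quat {
    float X, Y, Z, W;
};

// Normalized lerp along the shortest arc (a stand-in for slerp).
Quat NlerpShortest(Quat A, Quat B, float Alpha) {
    const float Dot = A.X * B.X + A.Y * B.Y + A.Z * B.Z + A.W * B.W;
    if (Dot < 0.0f) {
        B = {-B.X, -B.Y, -B.Z, -B.W}; // q and -q are the same rotation
    }
    Quat R{A.X + (B.X - A.X) * Alpha, A.Y + (B.Y - A.Y) * Alpha,
           A.Z + (B.Z - A.Z) * Alpha, A.W + (B.W - A.W) * Alpha};
    const float Len = std::sqrt(R.X * R.X + R.Y * R.Y + R.Z * R.Z + R.W * R.W);
    if (Len > 0.0f) {
        R.X /= Len; R.Y /= Len; R.Z /= Len; R.W /= Len;
    }
    return R;
}
```

The sign flip matters for baked data: two adjacent frames may store the same rotation with opposite signs, and blending them without the flip would send the bone the long way around.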

3.2 Per-mesh Compute Shader

// Shaders/Private/MeshUnit_CS_Lf.usf
[numthreads(64, 1, 1)]
void MeshUnit_CS_Lf(uint3 ThreadId : SV_DispatchThreadID)
{
    uint InstanceID = ThreadId.x;
    
    // Additional per-mesh logic (LOD selection, visibility, etc.)
    // The actual vertex skinning happens in the Vertex Shader
}

Stage 4: Runtime GPU - Vertex Shader (Material)

// Shaders/Private/GPU_VertexSkinning_VS_Lf.ush
// Material Function: called from MF_TurboSequence_PositionOffset_Lf

Performing vertex skinning

float3 TurboSequence_VertexSkinning(
    float3 LocalPosition,
    uint VertexID,
    uint InstanceID)
{
    // 1. Load the influencing-bone data for this vertex from the bone weight texture
    float4 BoneWeightData = BoneWeightTexture.Load(int3(VertexID, 0, 0));
    int BoneIndex0 = int(BoneWeightData.x);
    float Weight0 = BoneWeightData.y;
    int BoneIndex1 = int(BoneWeightData.z);
    float Weight1 = BoneWeightData.w;
    
    // 2. Load the bone Transforms computed by the Compute Shader
    float4x4 BoneMatrix0 = BoneTransforms[InstanceID * NumBones + BoneIndex0];
    float4x4 BoneMatrix1 = BoneTransforms[InstanceID * NumBones + BoneIndex1];
    
    // 3. Apply the bone Transforms to the vertex (skinning)
    float3 SkinnedPos0 = mul(float4(LocalPosition, 1.0), BoneMatrix0).xyz;
    float3 SkinnedPos1 = mul(float4(LocalPosition, 1.0), BoneMatrix1).xyz;
    
    // 4. Blend by weight
    float3 FinalPosition = SkinnedPos0 * Weight0 + SkinnedPos1 * Weight1;
    
    // 5. Output as World Position Offset
    return FinalPosition - LocalPosition; // return the offset
}
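
The same two-bone blend can be reproduced on the CPU to sanity-check the math. This is a minimal sketch with hypothetical types (`Vec3`, `Mat4`) and helper names. It assumes the row-vector convention used above (`v * M`, translation in the fourth row) and that each bone matrix already maps bind-pose positions to animated positions.

```cpp
#include <array>

struct Vec3 {
    float X, Y, Z;
};

// Row-major 4x4 matrix, row-vector convention: translation lives in row 3.
using Mat4 = std::array<std::array<float, 4>, 4>;

// Build a pure translation matrix (identity when all arguments are zero).
Mat4 MakeTranslation(float X, float Y, float Z) {
    Mat4 M{}; // zero-initialized
    for (int i = 0; i < 4; ++i) {
        M[i][i] = 1.0f;
    }
    M[3][0] = X;
    M[3][1] = Y;
    M[3][2] = Z;
    return M;
}

// Transform a point: [P, 1] * M.
Vec3 TransformPoint(const Vec3& P, const Mat4& M) {
    return {P.X * M[0][0] + P.Y * M[1][0] + P.Z * M[2][0] + M[3][0],
            P.X * M[0][1] + P.Y * M[1][1] + P.Z * M[2][1] + M[3][1],
            P.X * M[0][2] + P.Y * M[1][2] + P.Z * M[2][2] + M[3][2]};
}

// Two-bone linear blend skinning, returned as an offset from the bind-pose
// position (the quantity a World Position Offset input expects).
Vec3 SkinningOffset(const Vec3& LocalPosition,
                    const Mat4& Bone0, float Weight0,
                    const Mat4& Bone1, float Weight1) {
    const Vec3 P0 = TransformPoint(LocalPosition, Bone0);
    const Vec3 P1 = TransformPoint(LocalPosition, Bone1);
    return {P0.X * Weight0 + P1.X * Weight1 - LocalPosition.X,
            P0.Y * Weight0 + P1.Y * Weight1 - LocalPosition.Y,
            P0.Z * Weight0 + P1.Z * Weight1 - LocalPosition.Z};
}
```

For instance, a vertex weighted half and half between an unmoved bone and a bone translated by 2 units along X ends up offset by 1 unit along X.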

Material Graph wiring

Material:
  World Position Offset Pin <- MF_TurboSequence_PositionOffset_Lf output
  ├─ VertexID (Vertex Interpolator)
  ├─ InstanceID (provided automatically by Niagara Mesh Particles)
  └─ Bone Transform Buffer (Compute Shader output)

Stage 5: Niagara Instanced Rendering

// The Niagara System performs the final rendering (conceptual pseudocode)
NiagaraRenderer->DrawInstances(
    StaticMesh,                    // the converted Static Mesh
    Material,                      // the custom skinning Material
    InstanceCount,                 // 10k-50k instances
    PerInstanceData                // Compute Shader results
);

Key point: Niagara renders all instances with a single DrawIndexedInstanced call. The GPU runs the Vertex Shader for every vertex of every instance, and bone skinning happens inside it.


GPU memory layout

GPU Memory:
├─ Animation Library Texture (VRAM)
│  └─ 150x1000x8bytes = ~1.2MB (for 50 bones, 1000 frames)
├─ Bone Weight Texture (VRAM)
│  └─ NumVertices x 16bytes (4 floats per vertex)
├─ Bone Transform Buffer (Compute Shader output)
│  └─ NumInstances x NumBones x 64bytes (4x4 matrix)
│     e.g. 10k instances x 50 bones = ~32MB
└─ Static Mesh Vertex/Index Buffers (VRAM)
   └─ standard Static Mesh data
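
The figures in this tree follow directly from the per-element sizes, so they are easy to recompute for other crowd sizes. The helpers below are a hypothetical sketch of that arithmetic; the function names are illustrative and the byte sizes are the ones listed above.

```cpp
#include <cstdint>

// Animation library texture: 3 RGBA16F pixels (8 bytes each) per bone per frame.
constexpr std::uint64_t AnimLibraryBytes(std::uint64_t NumBones, std::uint64_t TotalFrames) {
    return NumBones * 3 * TotalFrames * 8;
}

// Bone weight texture: one pixel of 4 floats (16 bytes) per vertex.
constexpr std::uint64_t BoneWeightBytes(std::uint64_t NumVertices) {
    return NumVertices * 16;
}

// Bone transform buffer: one 4x4 float matrix (64 bytes) per bone per instance.
constexpr std::uint64_t BoneTransformBufferBytes(std::uint64_t NumInstances,
                                                 std::uint64_t NumBones) {
    return NumInstances * NumBones * 64;
}
```

`AnimLibraryBytes(50, 1000)` gives 1,200,000 bytes and `BoneTransformBufferBytes(10000, 50)` gives 32,000,000 bytes. The transform buffer is the only term that grows with instance count, so it dominates for large crowds.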

Performance optimization principles

Why is it fast?

  1. Draw call minimization
    • Conventional: 10k instances = 10k draw calls
    • TurboSequence: 10k instances = 1 draw call (per archetype)
  2. Shifting load off the CPU
    • Conventional: bone matrices computed on the CPU → sent to the GPU
    • TurboSequence: the CPU only updates metadata; the GPU does all the computation
  3. Memory bandwidth optimization
    • Animation data reaches the GPU packed into a texture
    • 3 pixels per bone (24 bytes in RGBA16F) vs. a conventional bone matrix (64 bytes)

Trade-offs

Pros:

  • Tens of thousands of instances rendered in a single draw call
  • Almost no CPU overhead
  • Animation blending is also possible on the GPU

Cons:

  • Requires conversion to Static Mesh (an offline step)
  • Uses texture memory (a texture is generated per animation)
  • Complex animation blending is limited (it becomes a bottleneck when CPU logic is needed)
  • Limited bone count (30-75 recommended, because of texture size)

Code flow summary

1. [Editor] Skeletal Mesh → Static Mesh + Animation Textures
2. [CPU GameThread] Spawn Instances → Update Animation Meta
3. [CPU GameThread] SolveMeshes → Dispatch Compute Shaders
4. [GPU Compute] BoneSettings_CS → Calculate Bone Transforms per Instance
5. [GPU Compute] MeshUnit_CS → LOD/Visibility processing
6. [GPU Vertex] Material Vertex Shader → Skinning per Vertex
7. [GPU Raster] Niagara Instanced Draw Call → Final Render

Files to check for the actual implementation

  • Compute Shader: Shaders/Private/BoneSettings_CS_Lf.usf, MeshUnit_CS_Lf.usf
  • Vertex Shader: Shaders/Private/GPU_VertexSkinning_VS_Lf.ush
  • Material Functions: MF_TurboSequence_PositionOffset_Lf.uasset
  • Texture creation: Source/TurboSequence_Lf/Private/TurboSequence_Helper_Lf.cpp::CreateAnimationLibraryTexture2D
  • Compute Dispatch: Source/TurboSequence_Lf/Private/TurboSequence_Manager_Lf.cpp::SolveMeshes_GameThread

This architecture combines Niagara's Static Mesh instancing with GPU compute skinning to achieve large-scale crowd rendering.