[ICCV 2025] 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting

Many template-free pose estimation methods, which do not need an object mesh, have appeared recently.

BundleSDF recovers the object pose by learning an SDF, but pose tracking is only possible once the SDF is available, so the effective tracking frequency is only about 0.4 Hz.

When fast pose updates matter, training a neural object field such as an SDF adds a large overhead.

The paper instead uses Gaussian Splatting for model-free, live object tracking and reconstruction.

Contribution

Proposes a novel method that uses 2D Gaussian Splatting for model-free 6D object pose estimation and reconstruction.

Uses differentiable Gaussian rendering to jointly optimize a 2DGS-based Gaussian Object Field (shape) and an object-centric pose graph (pose).

Reduces pose error with dynamic keyframe selection and a reconstruction confidence-based filtering mechanism.

Improves computational efficiency with a novel adaptive Gaussian density control mechanism that filters out low-importance Gaussians.

Method

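SAM2 segments the object in the video, LoFTR finds point correspondences between frames, and the resulting keyframes go through Coarse Pose Initialization via Bundle Adjustment.

The initial keyframes found with LoFTR are then refined by joint optimization with the 2D Gaussians through differentiable rendering.

A dynamic keyframe selection technique is used to drop erroneous keyframes and keep the keyframe set sparse (see 3.3).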

3.1. Coarse Pose Initialization

SAM2 segments the object in the video.

LoFTR finds feature point correspondences between two RGB-D frames, and a coarse pose is obtained through nonlinear least-squares optimization.

A new frame is added to the keyframe memory pool if its viewpoint is spatially different enough from the frames already in the pool.
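
The keyframe-pool rule above can be sketched as follows. This is only an illustration: the notes do not say how "spatially different enough" is measured, so the angular criterion, the function names `view_angle_deg` and `maybe_add_keyframe`, and the 10-degree threshold are all assumptions. Camera positions are assumed to be expressed in the object-centric frame.

```python
import math

def view_angle_deg(cam_a, cam_b):
    """Angle in degrees between two camera positions as seen from the object origin."""
    dot = sum(a * b for a, b in zip(cam_a, cam_b))
    norm_a = math.sqrt(sum(a * a for a in cam_a))
    norm_b = math.sqrt(sum(b * b for b in cam_b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

def maybe_add_keyframe(pool, cam_pos, min_angle_deg=10.0):
    """Append cam_pos to pool only if it is far enough from every stored view.

    min_angle_deg is a hypothetical threshold, not a value from the paper.
    """
    if all(view_angle_deg(cam_pos, kf) >= min_angle_deg for kf in pool):
        pool.append(cam_pos)
        return True
    return False

pool = [(0.0, 0.0, 1.0)]
maybe_add_keyframe(pool, (0.05, 0.0, 1.0))  # under 3 degrees away: rejected
maybe_add_keyframe(pool, (1.0, 0.0, 1.0))   # 45 degrees away: accepted
```

An angular test keeps the pool roughly uniform over viewing directions regardless of camera distance. A criterion based on full relative rotation and translation would fit the description equally well.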

3.2. Gaussian Object Field

The Gaussian Object Field is built with 2D Gaussian Splatting (2DGS).

3D Gaussian Splatting
- represents the scene as 3D Gaussian particles
- 3D centroid (mean): $\mu \in \mathbb{R}^3$
- covariance matrix: $\Sigma \in \mathbb{R}^{3 \times 3}$, $\Sigma = R S S^\top R^\top$
  - diagonal scaling matrix: $S = \mathrm{diag}([s_x, s_y, s_z])$
  - rotation matrix: $R \in SO(3)$
- spherical harmonic coefficients: $c \in \mathbb{R}^k$
- opacity: $\alpha \in [0, 1]$
- Gaussian converted to camera coordinates: $\Sigma' = J W \Sigma W^\top J^\top$
  - world-to-camera transformation matrix: $W$
  - local affine transformation: $J$
- Dropping the third row and column of $\Sigma'$ gives the 2D covariance matrix $\Sigma^{2D}$, which defines the 2D Gaussian $G^{2D}$ on the image plane
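
As a concrete check of the two covariance formulas in the list above, here is a small dependency-free sketch. The helper names (`matmul`, `transpose`, `covariance_3d`, `project_covariance`) are made up for illustration. The example uses identity matrices for $W$ and $J$ purely as placeholders; in a real renderer $W$ comes from the camera pose and $J$ from the projection Jacobian.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def covariance_3d(rot, scales):
    """Sigma = R S S^T R^T with S = diag(scales), computed as (RS)(RS)^T."""
    s = [[scales[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
    rs = matmul(rot, s)
    return matmul(rs, transpose(rs))

def project_covariance(sigma, w, j):
    """Sigma' = J W Sigma W^T J^T, then drop the third row and column."""
    jw = matmul(j, w)
    sigma_cam = matmul(matmul(jw, sigma), transpose(jw))
    return [row[:2] for row in sigma_cam[:2]]

# Example: 90-degree rotation about z, so the long axis (scale 2) ends up along y.
rot_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma = covariance_3d(rot_z90, (2.0, 1.0, 0.5))
sigma_2d = project_covariance(sigma, identity, identity)  # placeholder W and J
```

With these inputs `sigma` is diagonal with entries 1, 4 and 0.25, and `sigma_2d` keeps only the upper-left 2x2 block.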

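The color $\hat c(p)$ of a pixel $p = [u, v]^\top$ is computed by $\alpha$-blending the Gaussians front to back along the camera view direction (standard 3DGS compositing):

$$\hat c(p) = \sum_{i} c_i \, \alpha_i \, G_i^{2D}(p) \prod_{j=1}^{i-1} \left(1 - \alpha_j \, G_j^{2D}(p)\right)$$

- $\alpha_i$, $c_i$: opacity and view-dependent appearance of the $i$-th Gaussian
- Replacing $c_i$ with the z-depth coordinate yields a depth image as well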
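
Front-to-back $\alpha$-blending for a single pixel can be sketched in a few lines. The function name `composite_front_to_back` and the input layout are assumptions for illustration: each sample carries a weight $\alpha_i G_i^{2D}(p)$ and a scalar value, which is one color channel for an image or the z-depth for a depth map.

```python
def composite_front_to_back(samples):
    """Blend (weight, value) pairs sorted from nearest to farthest.

    weight is alpha_i * G_i(p) for the pixel; value is a color channel or a depth.
    """
    transmittance = 1.0  # fraction of light not yet absorbed by nearer Gaussians
    result = 0.0
    for weight, value in samples:
        result += transmittance * weight * value
        transmittance *= 1.0 - weight
    return result

# Same weights, different payloads: one color channel and the z-depth.
color = composite_front_to_back([(0.5, 1.0), (0.5, 0.0)])
depth = composite_front_to_back([(0.5, 2.0), (0.5, 4.0)])
```

Here the second Gaussian contributes with only half its weight because the first one already absorbed half of the transmittance.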

In 2DGS the z component of the scaling matrix is 0, i.e. $S = \mathrm{diag}([s_u, s_v, 0])$, so each Gaussian becomes a 2D oriented planar disk in 3D space with principal axes $t_u$ and $t_v$.

The normal of a 2D Gaussian is $t_w = t_u \times t_v$, and its rotation matrix is $R = [t_u, t_v, t_w]$.

The 2D Gaussians are projected onto the image plane, a loss is computed, and the gradient is propagated back so that the parameters of each 2D Gaussian and the keyframe poses are optimized together.

3.3. Dynamic Keyframe Selection for Gaussian Splatting Optimization

If the coarse pose initialization includes erroneous keyframes, the 2D Gaussian Splatting optimization can diverge. In addition, Gaussian Splatting uses image-level (tile-based) rendering, so the computational cost grows linearly with the number of keyframes.

To address both issues, dynamic keyframe selection removes erroneous keyframes and keeps the keyframe set sparse.

Assume a sphere centered at the object origin, and generate a uniformly distributed set of anchor points on it from the vertices and face centers of an icosahedron.

Assign (cluster) the camera position of each keyframe, obtained from the coarse pose, to its nearest anchor point → each anchor point stands for one representative viewpoint.

From the cluster of each anchor point, select the keyframe with the largest object mask. This avoids views where the object is heavily occluded and favors views with good visibility.
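
The anchor construction and per-cluster selection described above could look roughly like this. It is a sketch under assumptions: the function names are made up, camera positions are assumed to be in the object-centric frame, and "nearest anchor" is taken to mean nearest in viewing direction (largest dot product after normalization).

```python
import math
from itertools import combinations

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def icosahedron_anchors():
    """Unit-sphere anchors: 12 icosahedron vertices plus 20 face centers."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    verts = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    # With these coordinates every edge has length 2, so a face is any
    # vertex triple whose pairwise distances are all 2.
    def is_edge(p, q):
        return abs(math.dist(p, q) - 2.0) < 1e-6
    faces = [t for t in combinations(verts, 3)
             if all(is_edge(p, q) for p, q in combinations(t, 2))]
    centers = [tuple(sum(c) / 3.0 for c in zip(*f)) for f in faces]
    return [normalize(p) for p in verts + centers]

def select_keyframes(cam_positions, mask_areas, anchors):
    """Return one keyframe index per occupied anchor: the one with the largest mask."""
    best = {}
    for idx, (pos, area) in enumerate(zip(cam_positions, mask_areas)):
        direction = normalize(pos)
        # Nearest anchor on the unit sphere = largest dot product.
        anchor_id = max(range(len(anchors)),
                        key=lambda k: sum(d * a for d, a in zip(direction, anchors[k])))
        if anchor_id not in best or area > mask_areas[best[anchor_id]]:
            best[anchor_id] = idx
    return sorted(best.values())

anchors = icosahedron_anchors()
phi = (1.0 + math.sqrt(5.0)) / 2.0
cams = [(0.0, 2.0, 2.0 * phi), (0.0, 1.01, phi), (0.0, -1.0, -phi)]
selected = select_keyframes(cams, [100, 300, 50], anchors)
```

The first two cameras fall into the same cluster, so only the one with the larger mask (index 1) survives, giving at most 32 keyframes overall.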

Only the keyframes selected this way are used for the joint optimization of the 2D Gaussians and keyframe poses.

During joint optimization, keyframes with wrong poses are additionally removed based on the 2D Gaussian reconstruction error.

At every iteration the MAD (Median Absolute Deviation) of the reconstruction loss is computed. Views whose loss falls outside median ± 3 × MAD are classified as outlier keyframes and removed.
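
The MAD test above is easy to write down. This sketch follows the two-sided "median ± 3 × MAD" rule as stated in these notes; the function name `mad_inlier_mask` and the example loss values are invented for illustration.

```python
from statistics import median

def mad_inlier_mask(losses, k=3.0):
    """True for views whose loss lies within median +/- k * MAD."""
    med = median(losses)
    mad = median(abs(x - med) for x in losses)
    return [abs(x - med) <= k * mad for x in losses]

# Five well-reconstructed views and one with a clearly wrong pose.
losses = [0.10, 0.11, 0.09, 0.12, 0.10, 0.95]
keep = mad_inlier_mask(losses)
```

Because median and MAD are robust statistics, a single bad keyframe barely shifts the acceptance band, unlike a mean and standard deviation test.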

3.4. Opacity Percentile-based Adaptive Density Control

While optimizing the Gaussian Object Field, pruning and densification are performed periodically to keep the number and compactness of the Gaussians stable.

The Adaptive Density Control of the original 3DGS has the limitation that it needs a lot of engineering tuning of the densification interval, opacity threshold, and so on.
→ in the object-centric setting, the Gaussian scale is constrained
→ an opacity percentile-based pruning strategy is used

Every fixed number of optimization steps, Gaussians whose opacity falls in the bottom 5th percentile are removed. This is repeated until the 95th-percentile opacity of the Gaussians exceeds a predefined threshold.
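
One pruning step might be sketched as below. This reads the rule literally from the notes, so treat it as an interpretation rather than the paper's implementation. The nearest-rank percentile definition, the function names, and the stop threshold of 0.5 are all assumptions. In the real pipeline opacities keep changing between steps because optimization continues.

```python
def percentile(values, q):
    """Nearest-rank percentile of a non-empty list; q is an integer in [0, 100]."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]

def prune_step(opacities, stop_threshold=0.5, low_q=5, high_q=95):
    """Return the opacities kept after one pruning step.

    stop_threshold is a hypothetical value, not taken from the paper.
    """
    if percentile(opacities, high_q) > stop_threshold:
        return list(opacities)  # stopping condition reached, nothing pruned
    cutoff = percentile(opacities, low_q)
    # Ties at the cutoff are removed too, so slightly more than 5% can go.
    return [a for a in opacities if a > cutoff]

opacities = [i / 100 for i in range(1, 41)]  # 0.01 ... 0.40
kept = prune_step(opacities)
```

A percentile cutoff adapts to the current opacity distribution, which is what removes the need to hand-tune an absolute opacity threshold for pruning.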

This keeps enough high-quality Gaussians for rendering while removing those that contribute little.

When the positional gradient of a Gaussian exceeds a threshold, the Gaussian is split or cloned (similar to 3DGS).

Once the Gaussian Object Field optimization converges, the 2D Gaussians are frozen and all keyframe poses get a final refinement using RGB, depth, and normal reconstruction.

3.5. Online Pose Graph Optimization

The final updated keyframe poses from the Gaussian Object Field are used as input.

A global object-centric coordinate system is set up with respect to these poses.

A keyframe memory pool is maintained, storing the key correspondences for each keyframe.
→ (this is the online tracking stage, separate from the Gaussian optimization)

Whenever a new frame arrives, not all keyframes are used:
- only the subset of frames in the memory pool that overlap with the view frustum of the incoming frame is selected

To judge overlap, a point-normal map is generated for each frame in the memory pool. The dot product between that frame's normals and the camera-ray direction of the incoming frame is computed, and visibility is evaluated from it.

Only frames whose visibility ratio with respect to the incoming frame exceeds a predefined threshold are selected.
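
The visibility test above can be illustrated with a small sketch. The sign convention is an assumption: a point counts as visible when its normal faces the incoming camera, i.e. the dot product with the camera-to-point ray is negative. The function names and the 0.5 threshold are hypothetical, and points, normals and the camera center are assumed to share the object-centric frame.

```python
def visibility_ratio(points, normals, cam_center):
    """Fraction of points whose normal faces the incoming camera."""
    visible = 0
    for p, n in zip(points, normals):
        ray = tuple(pi - ci for pi, ci in zip(p, cam_center))  # camera -> point
        # Only the sign matters, so the ray does not need to be normalized.
        if sum(r * ni for r, ni in zip(ray, n)) < 0.0:
            visible += 1
    return visible / len(points)

def select_overlapping(frames, cam_center, min_ratio=0.5):
    """Indices of memory-pool frames whose visibility ratio exceeds min_ratio."""
    return [i for i, (pts, nrm) in enumerate(frames)
            if visibility_ratio(pts, nrm, cam_center) > min_ratio]

# Frame 0 observed the top of the object, frame 1 the bottom.
front = ([(0.0, 0.0, 0.1), (0.1, 0.0, 0.1)], [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)])
back = ([(0.0, 0.0, -0.1), (0.1, 0.0, -0.1)], [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0)])
overlapping = select_overlapping([front, back], cam_center=(0.0, 0.0, 2.0))
```

With the incoming camera above the object, only the frame that saw the top surface is kept for the pose graph.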

A pose graph is built from the selected keyframes and the incoming frame.

Pose graph optimization relies on pairwise geometric consistency and is carried out by minimizing the dense pixel-wise reprojection error.

4. Experiments
