Gaussian Splatting

I’ve been playing around with some Gaussian Splats! Before I go into the what and how, have a look at the gallery so far.

Click on the images to enter an interactive viewer

Pieralongia Rock Towers in the Dolomites, Northern Italy

Drone footage by Antonio Iaccarino

More splats from Antonio’s drone footage:

Rocket Lab’s Electron Rocket (F36) at LC-1 in Mahia

Drone footage by James Rattray

Sisyphus

Sculpture by Francis Upritchard, Christchurch Art Gallery Description (I couldn’t find one from Auckland Art Gallery, where I saw it). Splat recorded at Auckland Art Gallery.

Koedal Baydham Adhaz Parw (Crocodile Shark) Mask

Sculpture by Alick Tipoti, National Gallery of Australia Description. Splat recorded at Auckland Art Gallery.

A Quiver of Names

Sculpture by Zac Langdon-Pole, Auckland Art Gallery Description. Splat recorded at Auckland Art Gallery.

Albert Park Pavilion, ft. Liv

Works by Len Castle, Pat Perrin, Tom Kreisler, and Isobel Thom. For more detailed attribution see this. Splat recorded at Auckland Art Gallery.

Batik Cloth, Ever Present: First Peoples Art of Australia

Unfortunately I couldn’t find detailed attribution online, but you can read more here.

A cherry blossom outside our house

Some plants on our coffee table (ft. my flatmate)

Olivia in our front yard

This one didn’t really work, but it’s still interesting

Will on a hit-and-run spree

Greissen’s car

This was originally to help her sell the car online, but the render probably won’t persuade any buyers.

Context

Gaussian Splatting is a novel way of turning a point cloud (and the images it was built from) into a beautifully rendered 3D scene. I’m definitely not an expert on this, but here is my understanding so far.

Gaussian Splatting is closely related to Neural Radiance Fields (NeRFs); however, it has some core differences/novel aspects:

  • They represent the volume in a scene with a collection of 3D Gaussian distributions (a.k.a. splats), each with colour and alpha values.
  • They use neat rasterisation tricks that allow for much faster rendering.
  • The properties of each Gaussian (position, scale/variance in each direction, rotation, colour, and alpha) are trainable parameters, and are trained against frames of the original videos (see the sketch after the next paragraph).

The faster rasterisation technique means that the outputs can be rendered at real-time frame rates (and the training is super fast too!).
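For the curious, here is a sketch of the representation as I understand it from the paper (my paraphrase, so take the notation with a grain of salt). Each splat is an unnormalised 3D Gaussian:

G(x) = exp( -1/2 (x - μ)^T Σ^{-1} (x - μ) )

where μ is the splat’s position, and the covariance Σ is kept valid during training by factoring it as

Σ = R S S^T R^T

with R a rotation (stored as a quaternion) and S a diagonal scale matrix. So the trainable parameters per splat are position, rotation, scale, opacity, and colour (as spherical harmonics coefficients).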

My process for creating splats

NOTE: I have since containerised the process and built it into a Dagster job. The code is still a WIP, but can be found here.

A friend from work shared some YouTube videos with me about what splats are and how people make them (here and here), and I had to try it out for myself.

I was able to get some simple renders up and running in an afternoon (testament to the training speed and code quality of the researchers!). Here is an outline of my process. Be warned: it’s very scrappy and thrown together; I’ve spent most of my time playing around recording stuff.

Setup

My hardware/software setup:

  • OS: Ubuntu 22.04
  • GPU: RTX4090
  • RAM: 16GB
  • CPU: Intel(R) Core(TM) i5-9600KF CPU @ 3.70GHz (time for an upgrade…)
  • Camera: Samsung S22 rear ultra-wide camera (12MP)

You’ll need to clone Gaussian Splatting, and follow their install/setup requirements.

I had a bit of trouble installing the correct CUDA runtime; sudo apt install cuda-11-6 was how I got it in the end (after following this most of the way).

I also needed conda and ffmpeg.
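For reference, my environment setup boiled down to something like this (the environment name comes from the repo’s environment.yml at the time of writing - double-check against their README):

# clone with submodules - the CUDA rasteriser lives in one
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting

# create and activate the conda environment defined by the repo
conda env create --file environment.yml
conda activate gaussian_splatting

# ffmpeg for frame extraction later on
sudo apt install ffmpeg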

Record a Video

When recording a scene I want to splat, I generally get about two minutes of footage, circling the objects of interest 2-3 times and trying to capture the biggest range of perspectives possible. I try to get high/low/mid-angle shots at close-ish range, and then also a view from reasonably far away (2-3m).

This is a big area for experimentation. It’s very GIGO (garbage in, garbage out), so try a few recordings and see what works.

ffmpeg to Get the Frames

I then use ffmpeg to convert the videos into frames. I aim for roughly 400 images, because that completes processing in a reasonable amount of time and doesn’t run into VRAM issues on my RTX4090 (I’m sure there are a bunch of optimisations that could really squeeze out that VRAM).

The magical ffmpeg incantation is:

FILE_NAME=your_file_name
FPS=your_desired_fps

ffmpeg -i $FILE_NAME -qscale:v 1 -qmin 1 -vf fps=$FPS %04d.jpg

For each video I do a little calculation to figure out what FPS to use so that I land at about 400 images (see the snippet below). This is based on The NeRF Guru’s amazing video - a lot of great stuff in there, but it’s all for Windows, so I ended up skipping most of the setup steps.
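That little calculation is easy to script; here’s a sketch using ffprobe (assuming it’s installed alongside ffmpeg, and that $FILE_PATH points at your video):

TARGET_FRAMES=400

# total video duration in seconds
DURATION=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$FILE_PATH")

# fractional fps is fine - ffmpeg's fps filter accepts decimals
FPS=$(echo "$TARGET_FRAMES / $DURATION" | bc -l)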

The ffmpeg command will create a bunch of jpgs in the current directory. For the next steps to work, you’ll need the data in a folder structured like this:

~/path/to/data/dir/
    input/
        0001.jpg
        0002.jpg
        ...
    original_video.mp4

I use this little script to set that up for me:

DATA_DIR=data/SCENE_NAME
FILE_PATH=~/from/phone/video_name.mp4
FPS=6

FILE_NAME=$(basename "$FILE_PATH")

# create the data directory and copy the video in
mkdir -p "$DATA_DIR"
cp "$FILE_PATH" "$DATA_DIR"

# extract frames next to the video, then move them into input/
(
    cd "$DATA_DIR"
    ffmpeg -i "$FILE_NAME" -qscale:v 1 -qmin 1 -vf fps=$FPS %04d.jpg
    mkdir input
    mv *.jpg input
)

Gaussian Splatting Scripts

The Gaussian Splatting repository provides the two scripts we need:

  • convert.py
  • train.py

With the project conda environment activated, from the repository root, I run these two commands (convert.py runs COLMAP under the hood for structure-from-motion, so make sure it’s installed):

python convert.py -s $DATA_DIR
python train.py -s $DATA_DIR
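After convert.py finishes its structure-from-motion pass, the data directory ends up looking roughly like this (from memory - double-check against the repo’s README):

$DATA_DIR/
    input/           # your extracted frames
    images/          # undistorted images produced by COLMAP
    sparse/0/        # estimated camera poses + sparse point cloud
    ...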

If I run out of VRAM, I adjust these parameters for train.py (see the example invocation after this list):

  • Increase these values:
    • --densify_grad_threshold, default 0.0002
    • --densification_interval, default 100
  • Decrease this value:
    • --densify_until_iter, default 15000
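For example, a lower-memory run might look like this (the specific values here are just my guess at a reasonable adjustment, not something from the repo):

python train.py -s $DATA_DIR \
    --densify_grad_threshold 0.0004 \
    --densification_interval 200 \
    --densify_until_iter 10000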

This has worked to varying degrees of success (YMMV).

View the result!

There will be an output/ directory in the Gaussian Splatting repository root containing the outputs for your splat. The folder names are hashes, so I just sort by modified date to find the latest (see the snippet below).
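Something like this finds it (a sketch, run from the repository root):

# list output folders newest first and grab the top one
LATEST=$(ls -td output/*/ | head -n 1)

# the .ply files live under point_cloud/iteration_*/
ls "${LATEST}point_cloud"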

You want to find the point_cloud/iteration_30000/point_cloud.ply file (iteration_7000 is also often quite good).

Then, to view the splats, I use this awesome project, which uses WebGL to render Gaussian Splatting scenes in your browser!

All you have to do is go to an instance of the renderer (they provide one here) and drag your point_cloud.ply file into the window.