From Real to Virtual: Digital Twins with Ray-Ban Meta and Gaussian Splatting

In the XR laboratory of CESGA’s e-learning department, we explored innovative ways to create digital twins by taking advantage of recently acquired technological equipment. Traditionally, the process of creating digital twins required complex photogrammetry, manual 3D modeling, or specific instrumentation to gather physical data. However, current advances in computer vision and 3D graphics allow us to automate and simplify much of this work.

In this first project, we captured real-world scenes using the smart Ray-Ban Meta glasses. From the recorded video, we generated a three-dimensional point cloud and created a model based on Gaussian Splatting—a novel technique that surpasses classical photogrammetry in many aspects, especially in visual fidelity and real-time performance. Once generated and optimized, we integrated the model into Unity, where it can be enhanced with animations, interactions, or real-time data to create immersive experiences. Finally, we visualized our digital twin in various formats and devices: from conventional web browsers to virtual reality devices, including mobile phones with augmented reality features.

In this article, we present this modern methodology for the rapid creation of digital twins from video, combining portable capture devices and advanced 3D reconstruction algorithms.

Capturing the Scene with the Ray-Ban Meta

The first step in building a digital twin is to gather accurate data of the physical object or space. In this methodology, we used the Ray-Ban Meta smart glasses, developed by Meta in collaboration with Ray-Ban, which incorporate an integrated camera for capturing photos and videos from a first-person perspective. Their main advantage is that they allow recording exactly from the user’s point of view without the need to hold a camera or scanner, facilitating a more natural and continuous capture of the scene.

The Ray-Ban Meta smart glasses, developed by Meta in collaboration with Ray-Ban, incorporate an integrated camera for capturing photos and videos from a first-person perspective.

The latest generation of Ray-Ban Meta glasses features a 12-megapixel ultra-wide-angle camera capable of recording 1080p video for several minutes. This enables the recording of high-quality panoramic images, ideal for reconstructing a realistic digital twin. The wide-angle lens helps capture a large portion of the scene in each shot, ensuring sufficient overlap between consecutive images—a key factor for subsequent 3D reconstruction.

To start the capture, we recorded a video while moving slowly around the object we wanted to capture, maintaining a constant distance of between 1 and 2 meters and avoiding sudden movements or quick turns. The device is kept at eye level for most of the recording, but lower (approximately chest level) and higher (almost at head level) perspectives are also recorded to obtain varied viewpoints. The pace must be slow and steady, ensuring that every part of the object appears in several shots and avoiding sudden lighting changes or excessively dark areas. The video finishes with a final, even slower lap to capture fine details and ensure complete coverage from all angles.

We recorded a video of the area by slowly moving around the object we want to scan.

Extracting Frames and Processing the Images

Once we have the video recorded with the glasses, the next step is to extract frames: from the video file, we select individual frames at regular intervals (for example, 1 frame per second, depending on the speed of movement) to obtain still images. In this way, from a 3-minute video, we can extract several hundred static images. It is important that there is enough overlap and continuity between successive images, that is, that each part of the scene appears in multiple photos from slightly different angles. This requirement is easily met thanks to the Ray-Ban Meta's wide-angle lens and the natural movement of the user's head: adjacent frames of the video usually share many details, which will later help in matching them. Blurry or repetitive frames are also typically discarded, leaving a set of high-quality images that thoroughly cover the scene.

We use a custom script to extract frames from the video, in this case 1 out of every 8.
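Our script itself is not reproduced here, but a minimal sketch of the same idea using ffmpeg's `select` filter could look like this (the file names, output pattern, and step size are illustrative):

```python
import subprocess

def build_ffmpeg_cmd(video: str, out_dir: str, step: int = 8) -> list:
    """Build an ffmpeg command that keeps 1 out of every `step` frames."""
    return [
        "ffmpeg", "-i", video,
        # keep only frames whose index n is a multiple of `step`
        "-vf", f"select=not(mod(n\\,{step}))",
        "-vsync", "vfr",   # re-time the output so skipped frames leave no gaps
        "-q:v", "2",       # high JPEG quality
        f"{out_dir}/frame_%05d.jpg",
    ]

cmd = build_ffmpeg_cmd("capture.mp4", "frames", step=8)
# subprocess.run(cmd, check=True)  # uncomment to run; requires ffmpeg on the PATH
```

The `-vsync vfr` flag is what prevents ffmpeg from duplicating frames to preserve the original frame rate after the filter drops most of them.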

The next step is to employ photogrammetry software to align the images and obtain an initial reconstruction. In this case, we used RealityCapture (from Capturing Reality/Epic Games), a professional tool that is highly efficient at building 3D models from photographs. The alignment process in RealityCapture analyzes all the images and finds common points between them (features or points of interest), grouping those that correspond to the same physical point viewed from different angles. The result is a point cloud representing these detected characteristic points in 3D space, along with an estimate of the position and orientation of each camera at the moment of capture. Essentially, RealityCapture identifies where the camera was located for each photo and the basic structure of the underlying scene. If all goes well, all the images will form part of a single aligned set and we will see a cluster of points in space forming a skeletal version of the real environment, with the cameras arranged around it.

RealityCapture analyzes all the images and finds common points between them, grouping those that correspond to the same physical point seen from different shots.

With this point cloud and the calibrated cameras, we have the backbone of our digital twin. RealityCapture also allows generating a dense traditional mesh (millions of points and polygons) and photorealistic textures from the images, following the classic photogrammetry workflow. However, in this project we opted to diverge from that conventional path and take advantage of the initial calibration to apply a newer technique: Gaussian Splatting. Before doing so, we export from RealityCapture the necessary data. Specifically, we save in one file (for example, CSV) the intrinsic and extrinsic parameters of each camera—that is, the positions, orientations, and calculated optical parameters—and in another file the point cloud. These data will serve as input for the next software to be used, PostShot.
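The exact column layout of the CSV depends on how the RealityCapture export is configured; as an illustration (the column names below are hypothetical, adapt them to your actual export), the camera-pose file could be parsed like this before handing it to the next tool:

```python
import csv
import io

# Hypothetical example of an exported camera-pose CSV; the real column
# names depend on the export settings chosen in RealityCapture.
SAMPLE = """name,x,y,z,heading,pitch,roll
frame_00001.jpg,0.12,1.55,-0.40,10.0,-2.5,0.1
frame_00009.jpg,0.31,1.54,-0.38,14.2,-2.1,0.0
"""

def load_cameras(text: str) -> list:
    """Parse per-image camera positions and orientation angles from the CSV."""
    cams = []
    for row in csv.DictReader(io.StringIO(text)):
        cams.append({
            "name": row["name"],
            "position": tuple(float(row[k]) for k in ("x", "y", "z")),
            "angles": tuple(float(row[k]) for k in ("heading", "pitch", "roll")),
        })
    return cams

cameras = load_cameras(SAMPLE)
print(len(cameras), cameras[0]["position"])
```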

Creating the 3D Model with Gaussian Splatting

Once we have the images and their calibration, we use the PostShot (Jawset) software to create the 3D model using Gaussian Splatting (GS). But what exactly does this technique consist of and why is it attracting so much attention in the world of computer vision and 3D graphics? In Gaussian Splatting, the scene is represented not as a typical mesh of triangles but through a set of 3D Gaussians—essentially small “clouds” of density and color in three-dimensional space, centered at specific points. These millions of Gaussians are differentiable and summable elements: when projected onto a given view, they combine to reconstruct the image from that angle.
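The per-pixel blending at the heart of the technique can be sketched in a few lines. This scalar toy handles a single color channel of a single pixel and ignores the 2D Gaussian falloff a real renderer evaluates; it only illustrates the depth-sorted, front-to-back compositing of splats:

```python
def composite(splats):
    """Front-to-back alpha compositing of projected splats for one pixel.

    Each splat contributes color `c` with opacity `alpha`; splats are
    processed nearest-first, each attenuated by the transparency
    accumulated in front of it.
    """
    splats = sorted(splats, key=lambda s: s["depth"])  # nearest first
    color, transmittance = 0.0, 1.0
    for s in splats:
        color += transmittance * s["alpha"] * s["c"]
        transmittance *= 1.0 - s["alpha"]
        if transmittance < 1e-4:  # early exit once the pixel is nearly opaque
            break
    return color

splats = [
    {"depth": 1.0, "alpha": 0.5, "c": 1.0},  # nearer, half-transparent, bright
    {"depth": 2.0, "alpha": 1.0, "c": 0.0},  # farther, opaque, dark
]
print(composite(splats))  # → 0.5
```

Because every term in this sum is differentiable with respect to the splat parameters, the whole rendering process can be optimized by gradient descent, which is exactly what makes the training described below possible.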

Image of the Gaussians or Splats that form the resulting model.

This approach presents several important advantages compared to traditional photogrammetry and modeling. Firstly, it avoids the need to build complex polygonal geometry: there is no need to determine faces or UV maps for textures, as the continuous point set itself describes the appearance of the environment. Secondly, it better captures the lighting and transparency effects present in the original photographs, since each Gaussian can store color and brightness information that contributes volumetrically to the final image (for example, reflections or translucent areas can be reproduced better than in a static mesh). Thirdly, and perhaps most notably, the Gaussian Splatting method enables real-time performance without sacrificing visual quality, something previous methods could not achieve.

Once the concept is understood, let's look at the practical process of creating the 3D model with GS using PostShot. This software provides an integrated workflow for Gaussian Splatting. First, we import all our data: the extracted images (or video frames) and the calibration files (cameras) and point cloud that we exported from RealityCapture. PostShot is capable of reading that prior alignment and using it as a starting point for training the Gaussian Splatting model, which improves quality and even allows for real metric scale in the scene. Once the data are loaded, we start the training process in PostShot: internally, the software creates an initial representation with Gaussians placed at the positions of our point cloud, and then iteratively adjusts their parameters to minimize the error between the model's synthetic views and the original photographs. This optimization runs entirely on the GPU and is quite fast; in fact, PostShot offers a real-time preview during training. We can orbit the virtual camera around the model while it is learning and watch the initially blurry patches gradually refine to reproduce all the details of the scene (as if the image were progressively coming into focus). In just a few minutes, depending on the number of images and the complexity, we obtain a trained Gaussian Splatting model that matches our photographs.
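The optimization loop can be illustrated with a deliberately tiny analogue: fitting the parameters of a single 1D Gaussian to "photographed" samples by gradient descent. PostShot does the equivalent with millions of 3D Gaussians and analytic GPU gradients; everything below (the signal, the learning rate, the step count, the finite-difference gradients) is an illustrative toy:

```python
import math

def gaussian(x, mu, amp, sigma=0.5):
    return amp * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# "Photographs": samples of a ground-truth Gaussian the optimizer must match.
XS = [i / 10 for i in range(-20, 21)]
TARGET = [gaussian(x, mu=0.7, amp=2.0) for x in XS]

def loss(mu, amp):
    """Photometric error between the current model and the observations."""
    return sum((gaussian(x, mu, amp) - t) ** 2 for x, t in zip(XS, TARGET))

def fit(mu=0.0, amp=1.0, lr=0.005, steps=2000, eps=1e-5):
    """Gradient descent on the error, with finite-difference gradients."""
    for _ in range(steps):
        g_mu = (loss(mu + eps, amp) - loss(mu - eps, amp)) / (2 * eps)
        g_amp = (loss(mu, amp + eps) - loss(mu, amp - eps)) / (2 * eps)
        mu, amp = mu - lr * g_mu, amp - lr * g_amp
    return mu, amp

mu, amp = fit()
print(round(mu, 2), round(amp, 2))
```

The "blurry patches coming into focus" effect seen in PostShot's preview is this same loss being driven down, view by view, while the preview re-renders the partially trained model.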

The training process of the model in PostShot.

An interesting aspect is that PostShot performs all the processing locally on our computer, without the need to upload images to the cloud, thus ensuring data privacy. Once satisfied with the result, the software allows exporting the resulting 3D model.

Alternatives to PostShot

In this project, we used the PostShot software because it is one of the most advanced tools available and for its ability to display the training of the models in real time. However, there are several open-source and cross-platform alternatives that should yield comparable results:

  • OpenSplat – A free and open-source implementation of 3D Gaussian Splatting, written in C++, focused on being portable, lightweight, and fast.
  • GSplat – An open-source library for CUDA-accelerated Gaussian rasterization, with Python bindings.
  • Scaniverse – An application that has recently incorporated support for Gaussian Splatting, allowing users to create high-quality 3D scans directly from their mobile devices. It offers on-device processing, ensuring data privacy.

Cleaning and Optimizing the Model

The Gaussian Splatting model obtained may easily be composed of several million Gaussian points, each contributing slightly to the scene. This density is necessary to reproduce all details and textures with photographic fidelity. However, for use in interactive applications (in real time and on devices with limited capabilities) it is advisable to optimize it and reduce its complexity. One of the first optimization techniques is cleaning artifacts and stray points: sometimes the training process may leave small isolated splats in space (due to noise or image reflections) that barely contribute to visual quality but add computational load. PostShot provides tools to select and eliminate these unnecessary points easily, or even combine them with neighboring points. In this way, the total number of splats decreases and the rendering load is also reduced.
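The idea behind removing stray splats can be sketched as a simple filter: discard splats that are nearly invisible or that have too few neighbors. The thresholds and the brute-force O(n²) neighbor count below are purely illustrative; real editors such as PostShot or SuperSplat use spatial indices and interactive selection instead:

```python
def clean_splats(splats, min_opacity=0.02, radius=0.1, min_neighbors=3):
    """Drop near-invisible splats and isolated 'floaters'.

    A splat is kept if it is reasonably opaque AND has enough neighbors
    within `radius`: lone points far from any surface are usually noise
    or reflection artifacts.
    """
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a["pos"], b["pos"]))

    visible = [s for s in splats if s["opacity"] >= min_opacity]
    r2 = radius ** 2
    return [
        s for s in visible
        if sum(1 for t in visible if t is not s and dist2(s, t) <= r2) >= min_neighbors
    ]

cluster = [{"pos": p, "opacity": 0.9} for p in
           [(0, 0, 0), (0.05, 0, 0), (0, 0.05, 0), (0, 0, 0.05), (0.05, 0.05, 0)]]
floater = [{"pos": (5, 5, 5), "opacity": 0.9}]      # isolated -> removed
ghost = [{"pos": (0.01, 0, 0), "opacity": 0.001}]   # near-invisible -> removed
print(len(clean_splats(cluster + floater + ghost)))  # → 5
```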

SuperSplat Editor

Another interesting tool for editing our Gaussian Splatting models is SuperSplat, an open-source editor for editing and optimizing 3D Gaussian splats that runs directly in the browser, with no additional installation required. The editor can also be installed locally on our computer, avoiding the need to upload models to the internet.

Model optimization by cleaning artifacts and stray points that add computational load.

Model Compression

Once the model has been edited to eliminate unnecessary areas, we have the additional option to compress it so that it can be used on devices with low computational power, such as mobile phones or standalone virtual reality devices.

The Gaussian Splatting technology is still relatively new and is constantly evolving. New formats and compression methods are continuously emerging, aiming to establish themselves as the standard for representing and storing 3D scenes. Among the currently highlighted compression options are:

  • Compression of .ply files: Tools like SuperSplat allow a significant reduction of .ply file sizes. This process eliminates redundant data and stores the remaining elements in quantized data formats, considerably reducing file size with minimal loss of visual quality.
  • .splat format: This format seeks to optimize the representation of splats, storing the information more efficiently and allowing faster loading and rendering in game engines.
  • .spz compression: Recently, Niantic introduced the .spz format, which is intended to become the equivalent of JPEG for Gaussian Splatting. This format offers advanced compression, maintaining visual quality and facilitating the distribution of 3D models across various platforms.

The choice of compression method will depend on the specific needs of the project, considering factors such as file size, required visual quality, and compatibility with the development tools used.
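To give a feel for how quantized storage shrinks a model, here is a toy sketch of the usual trick: store the bounding box once in floats, then each splat position as small fixed-point integers inside it. This is not any of the real file formats above, only the underlying idea they share:

```python
import struct

def quantize_positions(positions, bits=16):
    """Quantize float positions into `bits`-bit integers inside the
    model's bounding box (illustrative sketch, not a real file format)."""
    levels = (1 << bits) - 1
    mins = [min(p[i] for p in positions) for i in range(3)]
    maxs = [max(p[i] for p in positions) for i in range(3)]
    spans = [(mx - mn) or 1.0 for mn, mx in zip(mins, maxs)]  # avoid /0
    quant = [tuple(round((p[i] - mins[i]) / spans[i] * levels) for i in range(3))
             for p in positions]
    return quant, (mins, spans)

def dequantize(quant, box, bits=16):
    """Recover approximate float positions from the quantized values."""
    levels = (1 << bits) - 1
    mins, spans = box
    return [tuple(mins[i] + q[i] / levels * spans[i] for i in range(3))
            for q in quant]

# Per-splat position cost drops from 3 float32 to 3 uint16:
print(struct.calcsize("<3f"), "->", struct.calcsize("<3H"))  # 12 -> 6 bytes
```

The same quantization idea is applied to colors, opacities, and covariances, which is where most of the size reduction in compressed splat formats comes from.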

Integrating with 3D Engines: Unity

Once optimized, we import the Gaussian Splatting model into Unity, one of the most widely used real-time 3D development engines. The import is performed via a specific plugin capable of interpreting the model's format, since, to date, neither Unity nor Unreal offers native support for GS. In our case, we used the UnityGaussianSplatting plugin, created by Aras Pranckevičius, which enables real-time visualization of Gaussian Splatting models in Unity.

Once the model is loaded in Unity or Unreal Engine, our digital twin appears in the scene as just another three-dimensional object (even though internally it is not a conventional mesh). We can then start adding interactive and functional elements around it.

After integrating and enriching the scene with new 3D elements, animations, and lights (taking into account the mentioned limitations), it can be exported to be used in interactive experiences or video games, taking advantage of the real-time rendering capabilities of the engines.

Visualization on the Web and in Extended Reality Devices

On the Web

There are libraries like Three.js that, with specific extensions, allow the Gaussian point cloud to be rendered directly in the browser. This makes it easy to share the digital twin, as any user can interactively explore it simply by accessing a URL, without installing applications or owning specialized hardware. In our project, we tested a web viewer where the user can rotate, zoom, and toggle some elements of the digital twin. Although the web experience is limited by the GPU restrictions of browsers, it showed that even complex GS models can be visualized fluidly on modern PCs. This capability anticipates a future where virtual catalogs, 3D urban maps, or cultural tours are offered online using Gaussian Splatting for unprecedented realism.

Augmented Reality

One of the most prominent possibilities for visualizing these digital twins comes from augmented reality (AR). This technology allows digital elements to be superimposed on the real world, offering an immersive and interactive experience. AR visualization can be carried out in two main ways:

  • Through a URL (via the web): Using advanced web technologies, it is possible to access AR experiences directly from the browser without needing to install additional applications. This facilitates the dissemination and accessibility of digital twins to a broader audience.
  • Through a native application: The development of specific applications for mobile devices or AR glasses allows one to take full advantage of hardware capabilities, offering a more optimized experience with additional functionalities.

Visualization of the model on a mobile phone with augmented reality.

In our project, we used the GaussianSplats3D library by Mark Kellogg (available on GitHub), which renders the model directly on a web page and additionally provides support for augmented reality.

Implementing digital twins in AR offers multiple advantages across various fields, from education to industry, enabling a more natural and enriched interaction with digital content.

Virtual Reality

Visualizing digital twins in virtual reality (VR) offers an immersive experience that allows users to interact directly with the three-dimensional models. In this modality, the user can use controllers or their own hands to rotate, resize, and navigate through the model, exploring every detail from multiple perspectives.

To visualize our model in Virtual Reality, we used the Gracia viewer, which enables the import of Gaussian Splatting models for visualization on devices such as the Meta Quest 3. It is important to note that models based on Gaussian Splatting have high graphical requirements, especially when viewed in virtual reality. In order for them to run smoothly directly on a standalone device like the Meta Quest 3, strong optimization and compression of the model are necessary. Given that this process can be complex and limit visual quality, in our case we opted for a simpler solution: we used the Gracia application on a Meta Quest 3 connected to a PC, which allows us to obtain superior graphical quality while maintaining a comfortable immersive experience.

Visualization of the model in a Virtual Reality setting, in this case using a Meta Quest 3 device.

Summary and Future Projects

In this first project of the XR laboratory, we were able to use some of the new equipment available in the lab: the Ray-Ban Meta glasses for video capture, the Meta Quest 3 and Apple Vision Pro headsets for immersive visualization, and a high-performance PC with an RTX 4090 graphics card, which was key for training and rendering the Gaussian Splatting models.

The Apple Vision Pro headset recently acquired by CESGA.

Based on this foundation, we will continue to periodically publish on this blog the advances and research projects we carry out in the fields of extended reality, robotics, and applied artificial intelligence, with the main objective of sharing knowledge.