Apple, a company synonymous with technological innovation, has again staked a claim in artificial intelligence (AI). The Cupertino, California-based tech giant recently published two research papers that introduce new techniques for 3D avatars and efficient language model inference. These advances could enable more immersive visual experiences and allow complex AI systems to run on widely used consumer devices such as the iPhone and iPad.
In the first paper, Apple’s researchers present HUGS (Human Gaussian Splats), a technique for creating animated 3D avatars from short monocular videos, that is, recordings made with a single camera. Lead author Muhammed Kocabas explains, “Our approach is tailored to monocular videos with a limited frame count (50-100), and it autonomously acquires the ability to unravel the static environment and generate a fully animatable human avatar within a mere 30-minute timeframe.”
HUGS achieves this by representing both the human and background scene using 3D Gaussian splatting, an efficient rendering technique. The human model is initially derived from a statistical body shape model called SMPL. However, HUGS introduces the capability for the Gaussians to deviate, allowing for the capture of intricate details such as clothing and hair.
A neural deformation module animates the Gaussians in a lifelike manner using linear blend skinning, in which each Gaussian follows a weighted mix of the skeleton's bone movements. Coordinating the Gaussians in this way avoids artifacts when the avatar is reposed. Kocabas highlights that HUGS “enables novel-pose synthesis of the human and novel view synthesis of both the human and the scene.”
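To make linear blend skinning concrete, here is a minimal sketch in plain Python. It is not the HUGS implementation: it moves only point positions (standing in for Gaussian centers), and the function names, the two-bone skeleton, and the hand-picked weights are all assumptions made for illustration. In HUGS the skinning weights are learned rather than set by hand.

```python
import math

def rotation_z(angle):
    """3x3 rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_bone(rotation, translation, point):
    """Apply one rigid bone transform (rotation, then translation) to a 3D point."""
    return [
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    ]

def linear_blend_skinning(points, weights, bones):
    """Pose each point as the weighted average of its position under every bone.

    points  : list of [x, y, z]
    weights : per-point list of per-bone weights (each list should sum to 1)
    bones   : list of (rotation, translation) pairs
    """
    posed = []
    for point, point_weights in zip(points, weights):
        blended = [0.0, 0.0, 0.0]
        for w, (rotation, translation) in zip(point_weights, bones):
            moved = apply_bone(rotation, translation, point)
            for axis in range(3):
                blended[axis] += w * moved[axis]
        posed.append(blended)
    return posed

# Hypothetical two-bone skeleton: bone 0 stays put, bone 1 rotates 90 degrees about z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bones = [(identity, [0.0, 0.0, 0.0]), (rotation_z(math.pi / 2), [0.0, 0.0, 0.0])]

# Three copies of the same point, bound to bone 0, bone 1, and an even mix of both.
points = [[1.0, 0.0, 0.0]] * 3
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
posed = linear_blend_skinning(points, weights, bones)
print(posed)
```

The point bound half to each bone lands midway between the two rigidly transformed positions, which is the blending behavior that lets neighboring points move together smoothly near a joint.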
Compared to previous methods of avatar generation, HUGS stands out by being up to 100 times faster in both training and rendering. The researchers showcase photorealistic results after optimizing the system for just 30 minutes on a typical gaming GPU. Furthermore, HUGS outperforms state-of-the-art techniques like Vid2Avatar and NeuMan in terms of 3D reconstruction quality.
Apple’s achievement in introducing these new 3D modeling capabilities is truly impressive. The real-time performance and the ability to create avatars from videos captured in real-world scenarios could open up exciting possibilities for virtual try-ons, telepresence, and synthetic media in the near future. Imagine the potential if you could effortlessly create novel 3D scenes directly from your iPhone camera!
Closing the Memory Divide in AI Inference
In the second paper, Apple’s researchers tackle a key challenge in deploying large language models (LLMs) on devices with limited memory. Modern language models such as GPT-4 are reported to contain hundreds of billions of parameters, which makes inference demanding on consumer-grade hardware, where the model weights can exceed the available DRAM.
The proposed system addresses this challenge by minimizing data transfer from flash storage to the limited DRAM during inference. Keivan Alizadeh, the lead author, explains, “Our approach involves the creation of an inference cost model that aligns with the behavior of flash memory, guiding us to optimize in two critical dimensions: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks.”
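The two dimensions named in the quote can be illustrated with a toy latency model. The sketch below is not the cost model from the paper. It simply assumes that every flash read pays a fixed overhead plus a transfer time proportional to its size; the function name and all numbers (overhead, bandwidth, sizes) are hypothetical.

```python
def flash_read_latency(total_bytes, chunk_bytes, per_read_overhead_s, bandwidth_bytes_per_s):
    """Toy estimate of the time to read total_bytes from flash in chunk_bytes pieces.

    Assumed model: each read costs a fixed overhead, plus transfer time that
    depends only on the number of bytes moved.
    """
    num_reads = -(-total_bytes // chunk_bytes)  # ceiling division
    return num_reads * per_read_overhead_s + total_bytes / bandwidth_bytes_per_s

# Hypothetical device characteristics, chosen only for illustration.
OVERHEAD_S = 100e-6   # fixed cost per read request
BANDWIDTH = 2e9       # sustained bytes per second
MIB = 1024 * 1024

small_chunks = flash_read_latency(64 * MIB, 4 * 1024, OVERHEAD_S, BANDWIDTH)
large_chunks = flash_read_latency(64 * MIB, 64 * 1024, OVERHEAD_S, BANDWIDTH)
fewer_bytes = flash_read_latency(16 * MIB, 64 * 1024, OVERHEAD_S, BANDWIDTH)

print(f"64 MiB in 4 KiB reads:  {small_chunks:.3f} s")
print(f"64 MiB in 64 KiB reads: {large_chunks:.3f} s")
print(f"16 MiB in 64 KiB reads: {fewer_bytes:.3f} s")
```

Under these assumptions, reading the same data in larger chunks cuts the per-read overhead, and reading less data cuts both terms, which matches the two optimization directions the quote describes.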
The paper introduces two main techniques. “Windowing” reuses neurons already loaded for recently processed tokens, so only newly needed weights are fetched from flash. “Row-column bundling” stores related rows and columns together so that they can be read as one larger, contiguous block. On an Apple M1 Max CPU, these methods make inference 4-5x faster than a naive loading approach. The speedup is even larger on a GPU, reaching 20-25x.
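Both ideas can be sketched in a few lines of plain Python. This is a simplified illustration, not Apple's code. It assumes that windowing keeps in DRAM the neurons that were active for the last few tokens, and that bundling stores the weights feeding a neuron next to the weights leaving it. The function names, the matrix orientation, and the example data are hypothetical.

```python
def neurons_to_load(active_per_token, window):
    """For each token, list the neurons that must be fetched from flash.

    Assumption: neurons active for the previous `window` tokens are still
    cached in DRAM, so only the missing ones are loaded.
    """
    loads = []
    for i, active in enumerate(active_per_token):
        start = max(0, i - window)
        cached = set().union(*active_per_token[start:i])
        loads.append(set(active) - cached)
    return loads

def bundle_rows_and_columns(up_proj, down_proj):
    """Build one contiguous record per neuron.

    Assumed layout: up_proj has one row per neuron, down_proj has one column
    per neuron. Record i is row i of up_proj followed by column i of down_proj,
    so a single read fetches everything neuron i needs.
    """
    return [
        list(up_proj[i]) + [row[i] for row in down_proj]
        for i in range(len(up_proj))
    ]

# Hypothetical sets of active neuron ids for four consecutive tokens.
active = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5}]
with_window = neurons_to_load(active, window=2)
without_window = neurons_to_load(active, window=0)
print(sum(len(s) for s in with_window), "loads with a window of 2")
print(sum(len(s) for s in without_window), "loads with no reuse")

# Tiny example matrices: 3 neurons, model dimension 2.
up_proj = [[1, 2], [3, 4], [5, 6]]
down_proj = [[10, 20, 30], [40, 50, 60]]
bundles = bundle_rows_and_columns(up_proj, down_proj)
print(bundles[1])
```

Because consecutive tokens tend to activate overlapping neurons in this example, the window sharply reduces how many neurons are fetched, while bundling turns two separate reads per neuron into one.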
Co-author Mehrdad Farajtabar underscores the significance of this breakthrough, stating, “This advancement holds particular importance for deploying advanced LLMs in resource-constrained environments, thereby broadening their practicality and accessibility.” The optimizations stand to pave the way for seamlessly running complex AI assistants and chatbots on ubiquitous devices such as iPhones, iPads, and other mobile platforms in the near future.
Apple’s Smart Plan
By publishing these findings, Apple not only strengthens its position as a frontrunner in AI research and applications but also contributes to the broader AI community. Sharing this knowledge could catalyze further breakthroughs in the field, and it signals Apple’s confidence in its role as a technological trailblazer and its commitment to expanding what technology can accomplish.
As Apple considers integrating these advances into its product lineup, experts are sounding a note of caution. Bringing such technologies to consumer products calls for a careful and responsible approach that addresses concerns ranging from privacy protection to the prevention of misuse. The social impact of these innovations must be weighed and managed.
The trajectory Apple is charting becomes apparent as it not only enhances its devices but also appears to anticipate the future requirements of AI-infused services. Enabling more sophisticated AI models to operate on devices with limited memory could potentially set the stage for a new era of applications and services, leveraging the capabilities of large language models in ways that were once deemed impractical.
In essence, Apple’s latest strides in AI, as showcased in these papers, have the potential to propel artificial intelligence into uncharted territory. What once seemed a distant vision, photorealistic digital avatars and capable AI assistants running on portable devices, is rapidly moving from speculation toward reality, thanks to the work of Apple’s scientists.