Hurdling the AI Power Wall With Photonic Computing

News, 2024

Executive Summary

- Traditional computing based on copper wires and transistors is reaching physical limits just as energy demands skyrocket

- Innovation and productivity now rely on computers, which are increasingly energy-dependent

- Optical computing offers a way to bypass these physical constraints by using light, but it has historically been hard to scale

- Opticore has solved the data input-output problem for scaling optical computing

- The US is unlikely to maintain its lead in AI infrastructure or manage its energy supply without this paradigm shift

- Opticore’s technology makes this shift possible and solves critical challenges in power, climate, sovereignty, and security

For decades, CMOS has been the dominant technology behind the world’s computing infrastructure, powering everything from smartphones to supercomputers. However, as we push the limits of Moore’s Law and scale up demand for high-performance computing (especially for AI and machine learning), CMOS is facing fundamental challenges. Opticore, a startup spun off from MIT labs, has developed a unique, foundry-ready photonic computing system for large-scale AI computing with 100x lower full-system energy cost. By solving the problem of optoelectronic I/O conversion, Opticore’s technology unlocks the high clock rates that light makes possible and overcomes the AI memory wall by transmitting data without capacitive losses.

Innovation and Progress Hit the Power Wall

The emergence of deep learning is revolutionizing social progress and the way information is processed. With the rise of deep learning, machines may soon be able to perform most human tasks as well as people, if not better. This promise is now far from abstract or science fiction. An MIT study found striking productivity improvements in a chemical research group, including the automation of 57% of "idea generation" tasks that previously only people could do. Whether or not Artificial General Intelligence (AGI) is achievable, we are already entering a golden age of accelerating scientific discovery. Historically, such progress has coincided with rapid improvements in human well-being. The challenge now is that AI, and computation in general, are incredibly energy intensive. As noted elsewhere, the limits of human progress are no longer measured just in floating point operations per second (FLOPS), but in the energy required to achieve them.

Traditional economic growth models like the Solow-Swan model take labor, capital, and technology as inputs to model economic growth. Extensions of this model have introduced energy, and particularly usable energy or exergy, to further decompose what drives economic growth. In this formulation, long-run per capita growth is driven by capital and human labor, but both of these depend on energy inputs. In this model, and for much of human history, the progress of technology was unrelated to energy inputs. For example, Isaac Newton could not have advanced science if he had starved, but even with an extra meal a day it is unlikely he would have produced more groundbreaking work. The Industrial Revolution, powered by the explosion of usable energy (exergy) from fossil fuels, improved living standards dramatically through capital accumulation. However, energy did not play a direct role in the production or discovery of new knowledge until much later; its role was only indirect, through facilitating the development of machines, or capital. This remained the case until the advent of electronic computation. With vacuum tubes (valves) and later transistors, a surge in human knowledge began as tasks once limited to human brains could be offloaded to devices. These devices quickly outperformed humans in certain tasks, starting with code breaking at Bletchley Park, and today, with the advent of deep learning, it seems everything is in play.
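For readers who want the model in symbols: the textbook Solow-Swan production function is Cobb-Douglas, Y = K^α (AL)^(1 - α), with capital K, labor L, and technology A. The minimal sketch below shows that form alongside a hypothetical energy-augmented variant, Y = K^α (AL)^β E^γ with exergy E and α + β + γ = 1. The exponent values are placeholders chosen for illustration, not estimates from the exergy literature referred to above.

```python
import math


def solow_swan_output(capital, labor, technology, alpha=0.3):
    """Textbook Cobb-Douglas form used in the Solow-Swan model: Y = K^alpha * (A*L)^(1 - alpha)."""
    return capital ** alpha * (technology * labor) ** (1 - alpha)


def energy_augmented_output(capital, labor, technology, exergy,
                            alpha=0.3, beta=0.6, gamma=0.1):
    """Hypothetical energy-augmented variant: Y = K^alpha * (A*L)^beta * E^gamma.

    The default exponents are illustrative placeholders, not empirical estimates.
    They must sum to 1 so the function has constant returns to scale.
    """
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha + beta + gamma must equal 1")
    return capital ** alpha * (technology * labor) ** beta * exergy ** gamma


# Doubling capital, labor and exergy (technology held fixed) doubles output,
# while adding exergy alone raises output less than proportionally.
base = energy_augmented_output(100.0, 50.0, 1.0, 20.0)
doubled = energy_augmented_output(200.0, 100.0, 1.0, 40.0)
more_energy_only = energy_augmented_output(100.0, 50.0, 1.0, 40.0)
print(round(doubled / base, 6))  # 2.0
print(more_energy_only > base)   # True
```

In this toy form, energy raises output only through its exponent γ; the article's argument is that computation makes energy a more direct input to the production of knowledge itself.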

Technology remains the key driver of human progress once “catch-up” growth is over, but technology depends on computation, and computation depends on energy. The efficiency with which the world produces energy and turns it into computation and new technology now defines the upper limit of human progress.

How Much Energy? – the new measure of computing power

Many researchers agree that AI-driven demand is already pushing energy demand well above historical trends. Work by SemiAnalysis provides a detailed bottom-up estimate of this growth, while public work by the Institute for Progress (IFP) outlines potential future scenarios. Estimates of data center power demand growth vary, but the supply of energy is growing much more slowly than compute demand and is now the limiting factor. As a result, growth in computing power may no longer be measured by how many chips can be made per year, but by how many computing operations can be performed with the short-run inelastic supply of energy. There are numerous drivers here, but two stand out.
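The shift in metric reduces to simple arithmetic: with a fixed power budget, yearly compute equals average power × seconds per year × operations per joule. The sketch below assumes operation at a stated utilization; the power and efficiency figures in the usage lines are hypothetical inputs, not numbers from SemiAnalysis or IFP.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def annual_operations(power_watts, ops_per_joule, utilization=1.0):
    """Operations deliverable in one year from a fixed average power budget.

    power_watts:   average power available to the compute hardware (W = J/s)
    ops_per_joule: hardware energy efficiency, e.g. FLOP per joule
    utilization:   fraction of the year the hardware actually runs (assumed)
    """
    return power_watts * utilization * SECONDS_PER_YEAR * ops_per_joule


# Hypothetical example: a 1 GW budget at 1e12 FLOP/J versus the same budget
# at twice the efficiency. With power fixed, efficiency is the only lever.
baseline = annual_operations(1e9, 1e12)
improved = annual_operations(1e9, 2e12)
print(improved / baseline)  # 2.0
```

When supply is inelastic in the short run, `power_watts` is effectively a constant, so every gain in deliverable compute has to come from `ops_per_joule`.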

Firstly, Power Usage Effectiveness (PUE) in data centers is plateauing as it approaches its theoretical floor of 1.0, across everything from rack infrastructure to chipset devices. The overhead for cooling and thermal management is approaching its minimum, leaving little room for further efficiency gains, especially with the adoption of direct-to-chip and immersion liquid cooling. Google’s fleet of data centers, generally considered one of the most efficient, is already nearing these limits.
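PUE is defined as total facility energy divided by the energy delivered to IT equipment, so 1.0 is the floor. A small sketch shows why the remaining headroom is thin. It uses a hypothetical PUE of 1.1 as its example, not a reported Google value. At that level, eliminating every bit of non-IT overhead would cut total facility energy by only about 9%.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness = total facility energy / IT equipment energy (>= 1.0)."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT energy")
    return total_facility_kwh / it_equipment_kwh


def max_savings_from_overhead(current_pue):
    """Largest fractional cut in facility energy, for the same IT load, if PUE fell to 1.0."""
    return 1.0 - 1.0 / current_pue


# Hypothetical facility: 1,100 kWh drawn in total, 1,000 kWh reaching IT equipment.
p = pue(1100.0, 1000.0)
print(round(p, 2))                                   # 1.1
print(round(max_savings_from_overhead(p) * 100, 1))  # 9.1 (percent)
```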

Secondly, the remaining headroom for energy improvements in CMOS is a topic of ongoing academic debate, but all indicators suggest the runway is rapidly shrinking. This short runway means that a paradigm shift in algorithms, network interconnect, or hardware architecture is required, and likely in all three. Thermal loads are becoming increasingly severe, as indicated by this slide from Vertiv, a provider of thermal management systems. Rack density is already pushing toward a limit of ~1,000 kW:

NVIDIA’s prescient 2014 vision of scaling the power wall has come to pass, but it now faces fundamental physical limits. Capacitive losses and resistive-capacitive (RC) delays in electronic circuits constrain clock rates and throughput, and data center thermal engineering has done almost all it can.
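One way to see why wires, not just transistors, limit electronic clock rates: for an unbuffered on-chip wire modeled as a uniformly distributed RC line, the Elmore delay is about half the product of total resistance and total capacitance. Because both grow linearly with length, delay grows with length squared. This is a first-order textbook approximation; the per-millimetre values below are hypothetical, and real designs insert repeaters to break up long wires.

```python
def distributed_rc_delay(r_ohm_per_mm, c_farad_per_mm, length_mm):
    """Elmore delay of an unbuffered, uniformly distributed RC wire: 0.5 * R_total * C_total."""
    r_total = r_ohm_per_mm * length_mm    # total wire resistance (ohm)
    c_total = c_farad_per_mm * length_mm  # total wire capacitance (F)
    return 0.5 * r_total * c_total        # delay (s)


# Hypothetical wire parameters, used only to show the scaling with length.
R_PER_MM = 1_000.0  # ohm/mm
C_PER_MM = 0.2e-12  # F/mm
short = distributed_rc_delay(R_PER_MM, C_PER_MM, 1.0)
long_ = distributed_rc_delay(R_PER_MM, C_PER_MM, 2.0)
print(round(long_ / short, 6))  # 4.0: twice the length, four times the delay
```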

Scaling AI compute within the current paradigm is expected to drive a significant increase in electricity demand, with some estimates predicting a 12-20% rise in US power consumption by 2030. For example, NVIDIA’s Blackwell chip output will require 7 GW of new power in 2025 alone, but the US is adding only ~2 GW of capacity, according to CBRE data. This creates a further problem: a large portion of these chips cannot be turned on unless they are exported to Southeast Asia, the Middle East, or wherever there is adequate power infrastructure. This situation seriously challenges US capacity to control access to cutting-edge chips, as recent findings of ineffective export controls have shown.
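The mismatch in these figures can be checked with simple arithmetic. The sketch below makes the optimistic, hypothetical assumption that every gigawatt of newly added US capacity goes to Blackwell systems and nothing else. Even then, only about 29% of the 7 GW could be powered domestically, leaving a 5 GW shortfall.

```python
def power_gap(new_load_gw, new_capacity_gw):
    """Return (shortfall_gw, fraction_of_load_that_can_be_powered).

    Optimistic simplification: all new capacity is assumed to serve this load alone.
    """
    if new_load_gw <= 0:
        raise ValueError("new load must be positive")
    shortfall = max(new_load_gw - new_capacity_gw, 0.0)
    fraction = min(new_capacity_gw / new_load_gw, 1.0)
    return shortfall, fraction


# Figures from the text: ~7 GW of new Blackwell load vs ~2 GW of added capacity.
shortfall_gw, powered = power_gap(7.0, 2.0)
print(shortfall_gw)          # 5.0
print(round(powered * 100))  # 29 (percent)
```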

The Problems With CMOS and How Opticore Fixes Them

The Rupp graph, enhanced with additional trend lines and the «frequency wall» (inspired by The Economist).

Opticore: Solving the memory wall using waveguides as “superconducting wires”

The inherent limitations of CMOS electronics originate from fundamental properties of materials. When switching from “off” to “on”, transistors incur a penalty, like stop-start driving in a car. When on, they do not conduct perfectly and continue to dissipate energy as heat. Opticore solves this by exploiting a different set of fundamental properties: those of light.
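The two penalties described above roughly correspond to the two terms of the standard first-order CMOS power model. The first is dynamic power, α·C·V²·f, spent charging and discharging capacitance on every switching event (the stop-start cost). The second is static power, V·I_leak, which is dissipated continuously while the circuit is powered. The sketch below is that textbook model with hypothetical parameter values. It ignores short-circuit current and does not model any particular chip.

```python
def cmos_power(activity, switched_capacitance_f, supply_v, clock_hz, leakage_a):
    """First-order CMOS power model, returned as (dynamic_w, static_w).

    dynamic = activity * C * V^2 * f: energy lost each time capacitance is charged/discharged
    static  = V * I_leak:             dissipation that persists while the circuit is powered
    """
    dynamic_w = activity * switched_capacitance_f * supply_v ** 2 * clock_hz
    static_w = supply_v * leakage_a
    return dynamic_w, static_w


# Hypothetical parameters: doubling the clock doubles dynamic power, static is unchanged.
d1, s1 = cmos_power(0.1, 1e-9, 0.8, 1e9, 0.5)
d2, s2 = cmos_power(0.1, 1e-9, 0.8, 2e9, 0.5)
print(round(d2 / d1, 6), s1 == s2)  # 2.0 True
```

The linear dependence on clock frequency, and the quadratic dependence on voltage, are why raising CMOS clock rates quickly runs into the thermal limits discussed earlier.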

1. Breaking Performance Bottlenecks with Photonics: By leveraging photonic chips that solve the data input-output (I/O) problem, Opticore enables faster data transfer, greater bandwidth, and real-time reprogrammability, making it ideal for demanding AI training tasks.

2. Unmatched Energy Efficiency: Opticore’s photonic chips transmit data and compute directly using light, which dissipates almost no power in data communication. The result is energy-efficient computing without the cooling overhead typically associated with CMOS systems. In principle, the same argument applies to other optical systems, but they have remained out of reach because of their large chip size and the energy cost of electro-optic conversion in data I/O. Opticore’s unique temporal mapping architecture has recently overcome these bottlenecks.

3. Low Manufacturing Risk: Unlike leading-edge CMOS processes (< 5 nm feature size), which are costly and difficult to scale, Opticore’s photonic chips are built on mature, proven semiconductor nodes (> 28 nm feature size). This approach allows us to scale up production quickly and cost-effectively, without the need for complex and expensive new manufacturing processes. Opticore can deliver high-performance computing solutions faster and more affordably, while leveraging existing semiconductor fabrication infrastructure.

4. Efficient Scaling: By using time multiplexing in its algorithmic architecture, Opticore can achieve speedups of multiple orders of magnitude over historical optical architectures, as much as one million times faster.
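The article does not detail how Opticore's temporal mapping architecture works, so the following is only a generic toy model of time multiplexing, not Opticore's design. It shows why streaming data through time can ease the I/O problem raised in points 2 and 4. Each input value is converted to light once per time slot and shared by every output channel. An integrating detector, standing in for optical multiply-and-accumulate, sums the products, and each output is converted back to electronics once. The model assumes, hypothetically, that weights are already held on-chip, so their loading cost is not counted.

```python
from dataclasses import dataclass


@dataclass
class ConversionLog:
    input_encodings: int = 0  # electrical -> optical conversions (one per input per time slot)
    readouts: int = 0         # optical -> electrical conversions (one per output)
    macs: int = 0             # multiply-accumulates performed between conversions

    def conversions_per_mac(self):
        return (self.input_encodings + self.readouts) / self.macs


def time_multiplexed_matvec(weights, x, log):
    """Toy time-multiplexed matrix-vector product y = W @ x.

    Time slot t converts x[t] once and broadcasts it to every output channel;
    each channel accumulates weights[row][t] * x[t], as an integrating detector would.
    Weights are assumed (hypothetically) to be resident on-chip and are not counted.
    """
    n_in = len(x)
    if any(len(row) != n_in for row in weights):
        raise ValueError("every weight row must have len(x) entries")
    accumulators = [0.0] * len(weights)        # one integrator per output channel
    for t in range(n_in):                      # one time slot per input element
        log.input_encodings += 1               # x[t] converted to light once...
        for row, w_row in enumerate(weights):  # ...and reused by all output channels
            accumulators[row] += w_row[t] * x[t]
            log.macs += 1
    log.readouts += len(weights)               # each integrator read out once, after all slots
    return accumulators


log = ConversionLog()
print(time_multiplexed_matvec([[1, 2, 3], [4, 5, 6]], [1, 0, -1], log))  # [-2.0, -2.0]

# For a square N x N product the conversion count per MAC falls as 2 / N.
big = ConversionLog()
time_multiplexed_matvec([[1.0] * 64 for _ in range(64)], [1.0] * 64, big)
print(big.conversions_per_mac())  # 0.03125
```

In this toy, a square N × N product needs 2N conversions for N² multiply-accumulates, so conversion cost per operation shrinks as the problem grows. A real photonic system also pays for weight loading, laser power, and detector noise, which the sketch omits.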

This enables a paradigm shift in both computing power and energy efficiency. The image below compares the performance of Opticore with existing leading-edge GPUs and TPUs, highlighting the significant advantages of this shift: almost 100x improvement in energy efficiency and 100x improvement in compute density, with a pathway to another 100x on top of that, for an almost 10,000x improvement in energy efficiency and compute density.
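The 10,000x figure follows from compounding: independent improvement factors multiply rather than add. The sketch below shows that arithmetic and what it would imply for the power needed to run a fixed workload if the full compounded gain applied to energy efficiency. The 1 GW baseline is a hypothetical example, and the factors are the ones quoted in the text rather than measurements.

```python
from math import prod


def combined_gain(*factors):
    """Independent improvement factors compound multiplicatively."""
    return prod(factors)


def power_for_fixed_workload(baseline_power_w, efficiency_gain):
    """Power needed for the same throughput after an energy-efficiency gain."""
    return baseline_power_w / efficiency_gain


gain = combined_gain(100, 100)                    # 100x today, another 100x on the roadmap
print(gain)                                       # 10000
print(power_for_fixed_workload(1e9, gain) / 1e3)  # 100.0 (kW, from a hypothetical 1 GW baseline)
```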

Opticore can allow computing to operate faster and with orders of magnitude greater energy efficiency. This translates to lower costs and faster scaling of AI clusters, allowing AI to advance at the speed of light rather than at the speed of utility permitting.

The Broader Environmental and Geopolitical Stakes

The emerging physical constraints of CMOS transistor-based compute are no secret, and AI has become a frontier of global competition. Photonics is gaining increasing attention as a solution: photonic chips are fabricated on relatively mature nodes and offer a way to bypass not only the fundamental energy constraints of traditional computing but also the geopolitical challenges emerging from trade conflicts and export controls. Dominance in CMOS chip design and fabrication does not imply continued leadership in photonics, much as mastery of internal combustion engines does not ensure dominance in electric vehicles (EVs). If nations want to be first at the finish line, they must be at the start line when a paradigm shift occurs.

This is especially urgent for the United States. As CMOS reaches its limits, AI progress is constrained by the speed of algorithmic advancements and infrastructure development. While the US remains a leader in algorithmic development and software, its ability to permit and build new infrastructure (whether power grids, data centers, or semiconductor fabs) is lagging behind. Many developed nations, particularly in the Anglophone world, face political and logistical challenges in quickly scaling up the necessary infrastructure, leaving them at a fundamental disadvantage unless they can bypass these limitations with a technology that significantly reduces the energy intensity of computing. In the US, retail politics around power rates are already creating obstacles, and these will only intensify as AI-driven demand for energy grows. Meanwhile, climate goals, which could be jeopardized by rising energy consumption, add another layer of urgency. A shift to a more energy-efficient and powerful photonic computing paradigm is not just preferable; it is existential.

With Opticore and photonic computing, there are no real trade-offs between security, the climate, and AI. You can have it all. Our technology lets you build chips on mature nodes, enabling fast, scalable, and affordable production, especially compared to leading-edge CMOS processes. At Opticore, we envision a future of abundant, clean, and secure compute: a vision as clean and clear as a ray of light.
