Lightmatter Mars SoC AI Inference Using Light

Hot Chips 32 Lightmatter Mars Cover

The closing talk of Hot Chips 32 was perhaps the most profound. The Lightmatter Mars chip aims to do AI inference computation by modulating the phase of light. Because the computation is optical, the calculations can be done fast, with the majority of power being consumed outside of the optical array that performs them.

Hot Chips 32 Lightmatter Mars

To be clear, this is probably not the technology that will be in your data center in 2020/2021, but it may be the most profound technology challenging an industry norm that was presented, similar to "Carbon Nanotube NRAM Exudes Excellence in Persistent Memory" at Hot Chips 30 or "Cerebras Wafer Scale Engine AI chip is Largest Ever" at Hot Chips 31.

Lightmatter Mars for AI Inferencing

Taking a step back, the question is why one would use silicon photonics instead of traditional transistors for AI inferencing. The basic idea is that light travels faster, scales better, and uses less power. The challenge, of course, is that the industry has decades of experience doing computation with electronic transistors.

Hot Chips 32 Lightmatter Why Photonics

Perhaps the biggest savings comes from transport. Since an optical chip modulates light rather than switching a transistor gate on and off, and since loss over chip-scale distances is nearly zero, the computation portion of the chip can be made very efficient. Indeed, the majority of the power is used to convert digital signals to light and then to read the output and convert it back at the other end. A way to think about it is if you connected two switch ports using QSFP28 lasers, and the cable connecting the switches was somehow performing calculations, then the biggest power cost would be the QSFP28 lasers.

Hot Chips 32 Lightmatter Optical Data Transport

These systems detect phase shifts through interference. Effectively, by observing how light changes as it passes through the structure, one can read out the result of the calculation.
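To make the interference idea concrete, here is a minimal sketch of an ideal, lossless Mach-Zehnder interferometer in Python. It is a textbook model, not Lightmatter's design: the 50:50 coupler convention, the single phase shifter on the top arm, and the function name mzi_output_powers are all assumptions made for illustration.

```python
import cmath
import math

def mzi_output_powers(phi):
    """Output powers (top, bottom) of an ideal MZI fed unit power on the top input.

    Assumed textbook model: two 50:50 couplers with a phase shifter of
    `phi` radians on the top arm between them. Loss is ignored.
    """
    s = 1 / math.sqrt(2)
    # First 50:50 coupler splits the input across both arms.
    top, bottom = s, 1j * s
    # Phase shifter acts on the top arm only.
    top *= cmath.exp(1j * phi)
    # Second 50:50 coupler recombines the arms so they interfere.
    out_top = s * top + 1j * s * bottom
    out_bottom = 1j * s * top + s * bottom
    return abs(out_top) ** 2, abs(out_bottom) ** 2

for phi in (0.0, math.pi / 2, math.pi):
    print(phi, mzi_output_powers(phi))
```

In this model the phase setting steers power between the two outputs (sin² and cos² of half the phase), which is how a phase shift becomes something a detector can measure.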

Hot Chips 32 Lightmatter Mach Zehnder Interferometer MZI

There are a number of ways to operate phase shifters. Thermal phase shifters are usually slow, in the kHz range. P/N junctions are large but commonly used in high-end optics. A nano-opto-electro-mechanical system (NOEMS) uses a small amount of charge to move the waveguide, and this is what Lightmatter uses. It provides low loss, nearly zero static power, and very small capacitance. This works at speeds in the hundreds of MHz, and the Mars chip operates at 1GHz.

Hot Chips 32 Lightmatter Programmable Phase Shifters

Lightmatter uses directional couplers, which means that one gets a 2×2 matrix multiplied by a two-element vector. Effectively, Lightmatter is using more than the simple MZI shown above.
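The 2×2 matrix view can be sketched with the standard textbook decomposition of an MZI into two 50:50 directional couplers and two phase shifters. This is an idealized model and not taken from the talk: the parameter names theta and phi, the coupler convention, and the helper names are assumptions.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# Ideal 50:50 directional coupler (assumed convention).
COUPLER = [[S, 1j * S], [1j * S, S]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec2(m, v):
    """Product of a 2x2 matrix and a two-element vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def phase(angle):
    """Phase shifter acting on the top waveguide only."""
    return [[cmath.exp(1j * angle), 0], [0, 1]]

def mzi_matrix(theta, phi):
    """U = COUPLER * phase(theta) * COUPLER * phase(phi)."""
    m = matmul2(COUPLER, phase(phi))
    m = matmul2(phase(theta), m)
    return matmul2(COUPLER, m)

# The light entering the two ports is the input vector; the light
# leaving is the matrix-vector product.
out = matvec2(mzi_matrix(math.pi, 0.0), [1, 0])
print([abs(x) ** 2 for x in out])
```

Changing theta and phi reprograms the matrix, while the multiplication itself happens as the light propagates through the device.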

Hot Chips 32 Lightmatter Optical Vector MAC

These units are built into larger arrays. This is just a small setup, but they can be built into arrays of thousands of MZIs. The company said that it believes these are being manufactured in a reliable manner.

Hot Chips 32 Lightmatter Arrays Of MZI

DACs on one side of the square encode data into the array. The result is then detected and converted back to digital by ADCs on the other side. With 64 DACs and ADCs, one gets 4096 (64 × 64) MAC operations. The power scales with the square root of the area because most of the power goes to the conversion at the edges. This is different from classic chips.
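The square-root scaling follows from simple counting, sketched below in Python. The sketch assumes a square array with N DACs on the input edge and N ADCs on the output edge and counts only those edge converters; the function name array_scaling is hypothetical.

```python
def array_scaling(n_per_edge):
    """Return (MACs per pass, edge converters, MACs per converter).

    Assumes a square array: n_per_edge DACs feed one edge and
    n_per_edge ADCs read the opposite edge.
    """
    macs = n_per_edge * n_per_edge   # grows with the array area
    converters = 2 * n_per_edge      # grows with the edge length
    return macs, converters, macs / converters

for n in (16, 64, 256):
    print(n, array_scaling(n))
```

For N = 64 this gives 4096 MACs for 128 conversions. Doubling the edge length quadruples the MACs but only doubles the conversions, which is why conversion power grows with the square root of the area.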

Hot Chips 32 Lightmatter Data Conversion At Edges Of Square

The design can multiplex different wavelengths of light through the same array, which increases the amount of computation it can perform per cycle.
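One way to picture wavelength multiplexing is as the same programmed matrix being applied to several input vectors in a single pass, one vector per wavelength. The Python sketch below is only that mental model: it assumes the array behaves identically at every wavelength, and the names matvec, wdm_pass, and the lambda labels are made up for illustration.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def wdm_pass(matrix, inputs_by_wavelength):
    """Apply one programmed matrix to every wavelength's input vector.

    Idealized: each wavelength is assumed to see the same matrix and
    not to interact with the others.
    """
    return {wl: matvec(matrix, vec) for wl, vec in inputs_by_wavelength.items()}

weights = [[1, 2], [3, 4]]
inputs = {"lambda_1": [1, 0], "lambda_2": [0, 1]}
print(wdm_pass(weights, inputs))
```

In this picture, K wavelengths give K matrix-vector products per cycle from the same array.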

Hot Chips 32 Lightmatter Parallel Processing

The Lightmatter Mars Photonics Core operates at 1GHz, which is mostly determined by how fast it can modulate the NOEMS charge.
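As a back-of-the-envelope check, the clock rate can be combined with the array size to estimate a peak MAC rate. The sketch assumes the 64 × 64 figure mentioned earlier applies to the Mars core and that one full matrix-vector product completes per cycle; neither is confirmed here, and peak_macs_per_second is a hypothetical helper.

```python
def peak_macs_per_second(n_per_edge, clock_hz, wavelengths=1):
    """Idealized peak rate: one full N x N pass per cycle per wavelength."""
    return n_per_edge * n_per_edge * clock_hz * wavelengths

# Assumed values: 64 x 64 array, 1 GHz clock, a single wavelength.
print(peak_macs_per_second(64, 1_000_000_000))
```

Under those assumptions the peak works out to roughly 4.1 trillion MACs per second for one wavelength. Real throughput would depend on how well the digital side keeps the array fed.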

Hot Chips 32 Lightmatter Mars Photonics Core

Not everything makes sense to do with light. Photonics is the core, and this is the SoC around it. It is small but has 30MB of SRAM for the cache. That is not enough memory to run huge models, but it is enough for smaller models. The company said during the presentation that it is looking at a larger memory. It said that even four stacks of HBM3 would not provide the bandwidth it needs.

Hot Chips 32 Lightmatter Digital System Mars SoC

The SoC holds weights next to the weight DACs to minimize data movement.

Hot Chips 32 Lightmatter Digital System Mars Digital Architecture

The photonics array acts as the ALU. Some other tasks, such as those in the activation pipeline, do not go to the photonics MAC array and are handled by the digital logic instead.

Hot Chips 32 Lightmatter Digital System Activation Pipeline

Large batch sizes mean less data conversion. As a result, the chip becomes more efficient.
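A rough way to see the batch-size effect is to count conversions per input. The Python sketch below rests on a loudly stated assumption that is not spelled out in the talk coverage: each weight is written through a DAC once per batch and then reused for every input in that batch, while each input still needs its own edge conversions. The function name is hypothetical.

```python
def conversions_per_input(n_per_edge, batch_size):
    """Average converter operations per input vector.

    Assumption: N*N weight conversions are done once per batch and
    shared across it; each input needs N DAC and N ADC conversions.
    """
    weight_loads = n_per_edge * n_per_edge
    io_conversions = 2 * n_per_edge
    return weight_loads / batch_size + io_conversions

for batch in (1, 8, 64):
    print(batch, conversions_per_input(64, batch))
```

Under this assumption, a 64 × 64 array needs 4224 conversions per input at batch size 1 but only 192 at batch size 64, since the cost of loading weights is spread across the batch.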

Hot Chips 32 Lightmatter Digital System Weight Updates

Most of the power is used by the digital side. The 3W TDP is an all-in figure that includes the digital logic as well as the laser power.

Hot Chips 32 Lightmatter Mars Power

Using 3D integration, there is less than 1mm of routing. That saves even more power because data does not have to travel a long distance between chips.

Hot Chips 32 Lightmatter Mars 3D Integration

Lightmatter is building hooks to integrate its chips into popular AI software frameworks.

Hot Chips 32 Lightmatter Software

This is a picture of the Mars development board.

Hot Chips 32 Lightmatter Mars Summary

Final Words

Right now, the Mars chips are back and in the lab. Let us be clear: if they work and have a path to scale, then Lightmatter is going to get purchased for a lot of money.

Even though this was the last presentation of Hot Chips 32, it was perhaps the most profound. While other companies were presenting alternatives to NVIDIA GPUs for AI, Lightmatter is making something fundamentally different.

7 COMMENTS

  1. That’s incredible. I especially like this quote:

    “A way to think about it is if you connected two switch ports using QSFP28 lasers, and the cable connecting the switches was somehow performing calculations, then the biggest power cost would be the QSFP28 lasers.”

    I have to re-read this bit again: “and the cable connecting the switches was somehow performing calculations”. Incredible.

    The slide on parallel processing is equally impressive. “Single instruction multiple data” type parallel processing, but, the “instruction” is basically a series of prisms and mirrors? Such that simultaneous differing wavelengths are mangled by the instructions in the way intended, allowing it to operate on all of these wavelengths at the same time. And somehow perform computations with this. Wow.

  2. Hmm.. The energy and performance numbers presented were worse than what can be done with conventional digital electronics. Why is this interesting at all? Why are people spending time on this?

  3. @William

    People said the same thing about the first SSDs and look where we are now. You can’t just look at the raw numbers of an immature technology compared to a very mature technology and conclude the entire venture is worthless. I think the idea here could be interesting and it might pan out to be a disruptor in the long term. You have to look at it like https://thinkingscifi.files.wordpress.com/2016/12/s-curve.jpg

  4. It’s kind of silly seeing an optical strand coupled on the side of the package when they should be integrated beside it. Might still be cheaper to do and align rather than going full PCB integrated, though years ago when I was with Compaq/HP, TI was working on a coupling that worked that way.
