Advantest Talks Semi

Unconventional Knowledge about PCIe Gen 5 NVMe SSDs & CXL

August 02, 2022 · Season 2, Episode 2 · Keith Schaub, Vice President of Technology and Strategy at Advantest; Edmundo De La Puente, R&D Director for the MPT 3000; and Stan Tsze, MPT 3000 Global Marketing Manager

The rapid evolution of advanced memory and storage technology is helping to transform the modern world at lightning-fast speeds. Blink and you could miss the latest innovations. The new flash SSD memories are blisteringly fast (up to 128 Gbps), powered by leading-edge PCIe Gen. 5 & CXL.

Essential for next-generation data centers, artificial intelligence (AI), and 5G applications, the latest flash SSDs are debuting at the Flash Memory Summit Conference and Expo, August 2-4, 2022. Listen in as Advantest flash memory experts Edmundo De La Puente, R&D Director for the MPT 3000, and Stan Tsze, MPT 3000 Global Marketing Manager, describe what’s changing in the market, why it’s changing, and how Advantest’s MPT 3000 is always a step ahead of the competition.

Thanks for tuning in to "Advantest Talks Semi"!

If you enjoyed this episode, we'd love to hear from you! Please take a moment to leave a rating on Apple Podcasts. Your feedback helps us improve and reach new listeners.

Don't forget to subscribe and share with your friends. We appreciate your support!

Keith Schaub: Semiconductor memory: a critically important and fundamental building block for everything semiconductor, as everything needs memory. Advantest has long been a market leader in this segment. There are all kinds of memory and storage chips: things like SRAM, DRAM, EEPROMs, flash, hard drives, and solid-state drives, or SSDs. Today we'll mainly focus on SSDs. Some analysts forecast the SSD market to reach 86.5 billion dollars by 2030, and it has been growing at a healthy clip of 15% per year, partly due to Covid and the new work-from-anywhere global economy, but also from growing demand for gaming, IoT's smart connected everything, and the explosion of new use cases for artificial intelligence and machine learning.

Hello and welcome to Advantest Talks Semi. I'm your host, Keith Schaub, Vice President of Technology and Strategy at Advantest. To help us understand what's happening and where we are going in the memory market, I'm joined by two of Advantest’s memory experts, Edmundo De La Puente, R&D Director for the MPT 3000 and Stan Tsze, MPT 3000 Global Marketing Manager. Edmundo and Stan, welcome to Advantest Talks Semi! 

Edmundo De La Puente: Thanks Keith, glad to be here. 

Keith Schaub: Great, thanks for coming on the show. So, Edmundo, if we could, I'd like to do a little memory 101. Can you briefly describe the difference between memory and storage? Why are they different? And what is an SSD, and how is it different from a generic memory chip?

Edmundo De La Puente: Yeah, of course. Memories such as DRAM or NAND, one key aspect is that they can be accessed directly through their own interfaces. Each type defines the signaling and timing needed to perform typical operations such as program and read. Controllers in this case must be designed specifically to talk to these memory devices and will only work with those device types. For example, a DDR4 DRAM controller on an FPGA or SOC is designed to interface to those devices only. Storage, or SSDs, hide the memory behind the controller, and they use a protocol interface to move data back and forth. The controller inside these storage devices handles the memory interfaces and timing and presents a consistent image to the host. So that simplifies the interfaces, it makes them more universal, and all of these protocols are spec-based, and the industry is using them pretty widely nowadays.
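
A toy sketch of the distinction Edmundo draws, using hypothetical classes rather than any real DRAM or NVMe API: memory is addressed directly, while storage accepts spec-based commands and hides the media behind its controller.

```python
# Toy contrast between direct memory access and protocol-based storage.
# Illustrative classes only, not a real driver or NVMe interface.

class Dram:
    """Memory: the host addresses cells directly over its own interface."""
    def __init__(self, size: int):
        self.cells = bytearray(size)

    def read(self, addr: int) -> int:
        return self.cells[addr]          # direct, address-based access

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value


class Ssd:
    """Storage: the host submits commands; the controller hides the
    NAND, its timing, and its geometry behind the protocol."""
    def __init__(self):
        self._nand = {}                  # media managed by the controller

    def submit(self, opcode: str, lba: int, data: bytes = b"") -> bytes:
        if opcode == "WRITE":
            self._nand[lba] = data
            return b""
        if opcode == "READ":
            return self._nand.get(lba, bytes(512))
        raise ValueError(f"unknown opcode {opcode!r}")


ssd = Ssd()
ssd.submit("WRITE", lba=0, data=b"hello")
print(ssd.submit("READ", lba=0))         # b'hello'
```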

Keith Schaub: I see. So, could we loosely characterize that as: memory chips are more like what we do at package test and wafer test for SOCs, whereas SSD test is more of a system-level test for memory?

Edmundo De La Puente: Yeah, that's correct. So, from the test point of view, the traditional memory chips are tested just like memory, by memory testers, and the focus is on exercising the memory inside these devices. Whereas the SSDs are devices that interface to a host CPU or some other smart component, and from the test point of view, we're basically looking to exercise the interface. We're not looking to exercise the memory in much detail, because that has already been done at other steps in the test process.

Keith Schaub: Okay Edmundo, thanks for that level setting. Stan, on memory test trends: what are some of the emerging trends and technologies that are driving memory storage test solutions?

Stan Tsze: That's a good question, Keith. Well, if you look at SSDs, if you track back over 10 years ago, they really started with SAS and SATA drives, which have a huge installed base today. Over the past 10 years, we started seeing PCIe SSDs growing into the market, right? And, you know, more recently, in the last 2-3 years, PCIe has actually taken off in a very big way, and around 2019 PCIe actually overtook SAS and SATA in terms of volume shipments, right? And the key benefits of PCIe were basically speed and storage capacity, right? It's able to handle much faster speeds in a very small form factor, with better compatibility as well as latency, right? Those are the things that, you know, made PCIe grow into a huge market today. As we move further along, right now we're starting to see a new memory technology come into the market and play, and that's CXL SSDs. And we're starting to see the benefits of CXL and how it could become a mainstream driver in the coming years.

Keith Schaub: So, you mentioned CXL, this is something new in the market. Can you give a little more detail on what that is? What does CXL mean? 

Stan Tsze: So CXL stands for Compute Express Link, right? It is a new memory solution that enables us to do more machine learning and AI seamlessly, right? And basically, the beauty of CXL is that it enables memory pooling. One of the problems that we run into is that a CPU or a graphics chip each have their own, let's say, DRAM and memory, and they are unable to share this memory between themselves. One of the benefits of CXL is that it enables that memory sharing and memory pooling. And so, it is basically a big change in the infrastructure, and it looks like a very promising memory solution for, like I said, the AI and machine learning era.
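
A minimal sketch of the pooling idea Stan describes. CXL itself is a hardware protocol; the hypothetical `MemoryPool` class below only models the accounting, showing capacity leased to hosts on demand instead of pinned to one CPU or GPU.

```python
# Toy model of memory pooling: capacity lives in a shared pool and is
# leased to whichever host needs it. Not an actual CXL API.

class MemoryPool:
    def __init__(self, total_gb: int):
        self.free_gb = total_gb
        self.leases = {}

    def allocate(self, host: str, gb: int) -> None:
        if gb > self.free_gb:
            raise MemoryError("pool exhausted")
        self.free_gb -= gb
        self.leases[host] = self.leases.get(host, 0) + gb

    def release(self, host: str, gb: int) -> None:
        self.leases[host] -= gb
        self.free_gb += gb


pool = MemoryPool(total_gb=1024)
pool.allocate("cpu-node-0", 256)   # a CPU borrows capacity...
pool.release("cpu-node-0", 256)    # ...and returns it to the pool
pool.allocate("gpu-node-1", 512)   # ...so an accelerator can use it next
print(pool.free_gb)                # 512
```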

Keith Schaub: Great, thanks Stan. So, speaking of speed, let's shift over to the actual protocol. Edmundo, we mentioned PCI Express 5, and it's twice as fast as PCIe 4: my understanding is Generation 4 topped out at around 16 gigatransfers per second per lane, whereas Gen. 5 delivers twice the speed, up to 32 gigatransfers per second per lane, scaling up to 16 lanes. So, as Stan mentioned, the speed increase means being able to access memory much faster for a lot of these new megatrends like machine learning and cloud-centric computing. But can you break that down for us? I mean, at a high level, walk us through what this speed doubling is. How are we getting it from Gen. 4 to Gen. 5? And how do the lanes play into this?

Edmundo De La Puente: Yeah, basically PCIe has been evolving, and with every generation the data rate has been doubling. So, like you said, Gen. 4 currently runs at 16 gigabits per second per lane. And you know, these are serial interfaces; they run very fast, as opposed to the parallel interfaces of traditional memory. So, when we say a lane, a lane is actually a duplex communication method, where you have a serial serdes interface for writing or transmitting data to the device, and then you have a separate pair of lines for the receiving side. So, it's fully duplex: you can send data at the same time as you're receiving data, so it basically goes twice as fast. So, we say 16 gigabits per second, but because it's duplex, that actually doubles the bandwidth; for example, for a x4 device, the total bandwidth is 16 gigabytes per second. So, it's pretty fast. Now of course this can scale on the width; you know, there are x4 devices, x8 devices, x16 devices. So, you can increase the data bandwidth by adding more lanes.

Now, when you go to Gen. 5, primarily what changes is the data rate: it doubles from 16 gigabits per second to 32 gigabits per second. And when we go to Gen. 6, it will double again. However, Gen. 6 will introduce a new method for the physical layer, which is called PAM 4. And PAM 4 adds a third dimension to how you can control data bandwidth, because in PAM 4 we're going to be using different voltage levels on the transmitting end and on the receiving end as well. So, you not only have a clock that determines how quickly you send data back and forth, but for every unit of time you can send twice as many bits, because we're using four different voltage levels to encode additional data. So, when we go from Gen. 5 to Gen. 6, the clock will not double in frequency. What's doubling is the number of bits, because we're using voltage to encode more of them. But the net result is that Gen. 6 will have double the data bandwidth of Gen. 5, so it will be going to 64 gigabits per second per lane.
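
As a rough worked example of the arithmetic Edmundo walks through, assuming raw line rates and ignoring encoding overhead such as 128b/130b framing:

```python
# Back-of-the-envelope PCIe bandwidth arithmetic using raw line rates.
# Encoding overhead (e.g., 128b/130b framing) is ignored for simplicity.

PER_LANE_GBPS = {4: 16, 5: 32, 6: 64}    # Gb/s per lane, per direction

def raw_bandwidth_gbytes(gen: int, lanes: int, duplex: bool = True) -> float:
    gbps = PER_LANE_GBPS[gen] * lanes    # one direction
    if duplex:
        gbps *= 2                        # each lane is full duplex
    return gbps / 8                      # bits -> bytes

# Gen 4 x4, counting both directions: 16 Gb/s * 4 * 2 / 8 = 16 GB/s
print(raw_bandwidth_gbytes(4, 4))        # 16.0
print(raw_bandwidth_gbytes(6, 16))       # 256.0
```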

Keith Schaub: Okay, so actually that's really interesting for me, because I come from the RF world, and I'm assuming that PAM here is Pulse Amplitude Modulation. And like you said, you get more bits because you have amplitude levels of voltage that you can play with. So I'm curious: why hasn't something like that been used in previous generations? I mean, that's been around forever in RF, and I'm just curious as to why it's only now showing up in Gen. 6.

Edmundo De La Puente: Well, PAM 4 has been shown multiple times at different technical consortiums, at least for the last maybe 5-6 years, and it's really the industry's reluctance to shift that's prevented it. They figured they could squeeze a little bit more performance out of the traditional serdes interfaces, and going to PAM 4 has its challenges. You know, today you typically worry about the eye opening, which is a single bit: it's either a one or a zero, so you just have a single eye. But when you go to PAM 4, now you're going to have multiple levels. So your transceivers get more complicated and are more susceptible to noise, because now, you know, if you have noise in the system, it could shift the voltage levels enough that maybe they cannot be detected at the right logic levels. But it's gotten to the point where, if you double the clock once more, the eyes are going to be so small that data recovery is going to be very difficult. So, it's kind of at the point where, okay, we need something new to be able to keep scaling the data communications. And PAM 4 is actually just the beginning, right? There's PAM 6, and you can encode more and more bits in the voltage domain. So right now it's kind of a switch the industry is starting to adopt because of the difficulty of maintaining the single bit and just doubling the clock.
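
To make the encoding concrete, here is a toy PAM4 mapper. The real Gen. 6 physical layer adds gray coding, precoding, and forward error correction on top of this, so treat it only as the core idea: one symbol per unit interval, two bits per symbol.

```python
# Toy PAM4 mapper: each symbol (unit interval) carries 2 bits as one of
# four voltage levels. Not the actual PCIe Gen 6 line coding.

PAM4_LEVELS = {          # gray-coded: adjacent levels differ by one bit
    (0, 0): -3,
    (0, 1): -1,
    (1, 1): +1,
    (1, 0): +3,
}

def modulate(bits):
    """Pack a list of bits into PAM4 symbols, 2 bits per unit interval."""
    assert len(bits) % 2 == 0, "PAM4 consumes bits in pairs"
    return [PAM4_LEVELS[(bits[i], bits[i + 1])]
            for i in range(0, len(bits), 2)]

# NRZ needs 8 unit intervals for 8 bits; PAM4 needs only 4 at the same
# symbol rate, which is why the clock does not have to double.
print(modulate([1, 0, 1, 1, 0, 0, 0, 1]))   # [3, 1, -3, -1]
```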

Keith Schaub: I see. So, really, the industry has kind of reached a roadblock, and they're forced to implement this new dimension to get the data rates higher. Okay then, just a follow-on to that: with PAM 4, right, you've got four voltage levels. What sort of voltage range are we looking at? And wouldn't, like you said, wouldn't that be a challenge? Because the voltage from level zero to level one is now pretty small compared to what it was in previous generations. Is that a problem?

Edmundo De La Puente: Yeah, that's the challenge right now. In the past, the eye height had more margin, right? As long as you were within the eye mask, it was easier to achieve. But now the eye masks are going to be smaller: the voltage is going to need to be more accurate so that you can detect the different levels. But I believe it's the same amplitude; it's just being subdivided into more levels to be able to send two bits as opposed to one bit.

Keith Schaub: Thanks Edmundo. Then the next question: what sort of AI applications are driving this need for speed, and why is it so important? How does the speed doubling help? And are there other innovations, like lane scaling, that could impact the memory chip manufacturing or the memory test solution itself?

Edmundo De La Puente: The memory bandwidth requirements have continued to grow over time, primarily because of the amount of data being consumed by devices and applications. One example would be autonomous driving vehicles, which use lots of data to manage their systems. There have also been some trends on power. If the only way to get higher data bandwidth is to double the width, let's say you go from four lanes to eight lanes and you get twice the bandwidth, that actually consumes more power than if you go faster and stay on the same number of lanes. Or somebody could say, hey, I don't need additional bandwidth, but I want to save power. They could go from, say, x8 down to x4 at a higher data rate, and that actually will save some power. So, it's really data-hungry applications that are pulling a lot of data, plus some power management, because as these devices go from generation to generation, power keeps going up, and that's another concern.

Keith Schaub: Do any of the Generations 5 or 6 support something like lane scaling? Say you laid it out for eight lanes, but in most cases you could get away with one or two lanes at the appropriate speed and save that power, and then you could scale up dynamically if necessary.

Edmundo De La Puente: Yeah. So basically, there's this feature called lane masking, where you can scale down. Let's say you have, like you said, an eight-lane interface, but you just want to use four: you could do that, no problem. They could also use some of the lower power modes; there's another whole dimension, on CPUs at least and other devices, where depending on what you're doing in the application, you could slow it down to save battery or power. So there are some power management tools, and yes, you can reduce the number of lanes. You give up some of the bandwidth, but you know, if you had a system that used, let's say, Gen. 4 x8 and you want to cut back some power, you could go to Gen. 5 x4. Well, now that we're going to have Gen. 6, you could even go to Gen. 6 x2. You get the same bandwidth, but you don't have to interconnect eight lanes, and routing these lanes on the PCB is not trivial: because the speed is so high, there needs to be a lot of care making sure the signal integrity is designed properly. So, connecting two lanes is a lot easier than connecting eight lanes, even though they're going faster. The problem can be managed better than having too many lanes.
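
A quick sketch of the equivalence Edmundo describes, again using raw per-lane rates with encoding overhead ignored:

```python
# Same raw bandwidth at three different generation/width combinations.
PER_LANE_GBPS = {4: 16, 5: 32, 6: 64}    # Gb/s per lane, per direction

def link_gbps(gen: int, lanes: int) -> int:
    return PER_LANE_GBPS[gen] * lanes

# All three links move 128 Gb/s per direction, but the narrower ones
# leave far fewer high-speed traces to route on the PCB.
for gen, lanes in [(4, 8), (5, 4), (6, 2)]:
    print(f"Gen {gen} x{lanes}: {link_gbps(gen, lanes)} Gb/s")
```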

Keith Schaub: Great, thanks Edmundo. So, Stan, before we move on, any comments that you want to add?  

Stan Tsze: Maybe one thing to note here, just so that we understand the scale of what Edmundo is talking about, right? So today we're all talking about zettabytes, right? We have come a long way from the megabyte, and now, as we talk about Gen. 5 and beyond, Gen. 6 and beyond, and CXL, the world is going to come to a term called the yottabyte. One yottabyte is one million trillion megabytes. And to think of that kind of scale in a data center running AI applications and machine learning, that's where the world is heading, and that's going to, you know, pose a lot of challenges.
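
Checking that conversion: a yottabyte is 10^24 bytes and a megabyte is 10^6 bytes, so one yottabyte is 10^18 megabytes, i.e., a million trillion:

```python
# Unit sanity check for Stan's figure.
YOTTABYTE = 10**24   # bytes
MEGABYTE = 10**6     # bytes

mb_per_yb = YOTTABYTE // MEGABYTE
print(mb_per_yb)                        # 10**18 megabytes
print(mb_per_yb == 10**6 * 10**12)      # True: one million trillion
```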

Keith Schaub: Actually, Stan, that's a perfect segue into the next question, where we talk more about the recent explosion of AI and machine learning, and how these new applications require high-performance computing but also a tremendous amount of memory. And you just mentioned that yottabyte-sized memory is coming. So, what sort of impact is this having on cloud-centric examples? What I mean by that is, cloud-based applications have higher latencies than edge-based applications, but a lot of enterprise workloads are still moving into the cloud. How do we deal with cloud network latencies being slower than edge-based, Edmundo?

Edmundo De La Puente: Well, yes. Basically, there are now these memory-compute architectures. In traditional memory structures, there's a bus between the CPU and the memory, and whenever the CPU needs data, it has to get across this bus. But when you need to get lots of data very, very quickly, you want something that has very low latency, and buses, especially protocol buses, add overhead, so it's difficult to get data fast. So now the question is, okay, how do we integrate memory much closer to the CPU, so that some of these computations can run at a much faster pace than getting data from a server, or even within the same system? Right? If you had DRAM on the motherboard next to a CPU, there's going to be some amount of latency before the CPU can get to the data. So, the push now is to move memory, and large amounts of it, closer to the CPU. But yeah, definitely the cloud is great for data storage where you don't require very, very low latency but need large amounts of data transferred back and forth. For some of these new applications, though, that's not going to work.

Keith Schaub: All of this is driving fascinating new memory architectures, and you alluded to this, I guess, with in-memory-compute-type architectures. I'm pretty sure this is creating a lot of new key test challenges, and companies like Advantest with the MPT 3000 have to address those challenges. So, how are we doing that, and how does the MPT 3000 address those types of challenges?

Edmundo De La Puente: The challenge is, you know, basically twofold. The first one is, for us on the MPT, you know, we've done a pretty good job of meeting the time-to-market requirements. We are typically challenged with designing a system to test future devices that are not in the market yet, of course, and customers need a solution to get their tests qualified and the device qualified, all of those things. And this is nothing uncommon for ATE; we're always designing systems with technology that sometimes is not quite ready, so it becomes an interesting challenge from that point of view. But there's also cost of test, of course. Every time the frequency doubles, we have to use more expensive components; even when you talk about PCB materials, it doesn't get any cheaper. But there's always the requirement that we maintain a certain cost of test; we cannot just double the cost of the system. So that drives us to very creative solutions that minimize the impact of just going faster. And then of course power is always there as a challenge. The devices now consume a lot more power, and we want to make sure that our throughput doesn't suffer; we want to maintain our parallelism so customers can test many devices at the same time. But as these generations of PCIe technology get faster and faster, it's a challenge. And we've been able to solve a lot of these issues and provide a solution that does not double the power or anything like that.

Keith Schaub: I see, Stan? 

Stan Tsze: Yeah, sure. On top of that, you know, we're always seeing a lot of change in the SSD form factor, right? There's M.2, you know, for client-based, and then there are U.2, U.3, E1.S, E2.S, you name it, right? There are many, many form factors. So that is another challenge we also have to deal with. And finally, the workloads, right? We're constantly thinking about how we have an architecture that can deal with all these workloads, multiple different protocols, and testing at high volume, right? And that's always a challenge.

Keith Schaub: Great, thanks Stan. What drives having so many form factors? 

Stan Tsze: First of all, different segmentation, right? So, if you look at client and mobile, they are always getting smaller and smaller, right? And now that cars are coming up, things like BGA form factors are going to start coming up, right? So different applications, you know, in mobility and elsewhere, are one driver. So basically, the segments drive different form factors, right? And as Edmundo was saying, the power keeps getting higher and higher because you are handling more and more data, so to handle it better, a lot of SSD vendors are trying to differentiate by introducing new form factors, right? Look at Intel and the ruler: this is their way of trying to drive that as a standard, right? And you know, if you drive a standard in the form factor and a lot of people adopt it, it actually helps them; it's about market share as well, right? So basically, different segmentation and different applications will continue to drive these form factors on and on. I don't think it will stop; it will keep going, and every generation you see someone trying to come up with a different form factor as a way to edge out the competition, right? You know, I think Intel is trying to take over the data center world by coming out with this long ruler. They haven't been very successful yet, but I think that's what they're trying to drive.

Keith Schaub: That brings us to the end of the show. I'd like to thank each of you, Stan, Edmundo, for coming on today and helping us understand all the exciting innovations happening in the flash memory market and best of luck with the Flash Memory Summit. 

Edmundo De La Puente: Thanks Keith, that was fun.  

Keith Schaub: Welcome to Advantest Talks Semi After Dark. Continue listening for the post-show discussion. So, at our Advantest VOICE conference we had two keynotes: one from David Eagleman, a neuroscientist at Stanford University, who wowed the audience with some very cool sensor technology interfacing with the human brain; and then we had Manish Bhatia, the Executive Vice President of Global Operations at Micron Technology, who also wowed us with the latest memory innovations and what's happening in the memory market. And we've also recently seen companies like Neuralink introduce BMI, brain-machine interfacing, which directly plugs into something like a pig's brain. So, sort of a futuristic thing: do you think we'll ever see a memory-type drive that plugs into human brains, and if so, are you excited about the possibility of developing a test solution for that?

Stan Tsze: Sure, that's a very interesting question. Well, you know, the world is always full of new developments and possibilities, and come to think of it, I actually wouldn't be surprised if that happened one day, right? You look at how solid state, with the recent advancements over the last 10 years, has become more efficient, more affordable, faster, cheaper, smaller in scale. It has gone from, you know, databases and data centers, to computers, to mobile devices, and more recently you're seeing the applications starting to go into cars and automobiles. So why would it even stop there? You know? So there is a possibility that one day you and I could be swapping an SSD drive onto our bodies as technology progresses. And one thing is for sure: SSDs are going to continue to advance. They will become smaller and more reliable, and, you know, if AI and machine learning are fully realized, they could connect the dots into a human, right? Not only just memory; you could also bring AI into the human body as well. We can see some applications of that happening today.

Keith Schaub:  Wow, very interesting. Thanks, Stan. Edmundo, how about you? What do you think? 

Edmundo De La Puente: Well, thinking about the actual device, right, I would assume it would be something very, very small that's implanted somewhere, kind of similar to some of the challenges we have today, for example testing memory chips that go into packages using TSVs, where the handling is just very, very challenging: the pitch is so small for probing these devices. So, what came to my mind was that testing the memory itself would probably be just the same. No difference, memory is memory. But whatever interface this device has, that's where the challenge would be. And if it's a controller-type device with some standard interface, somehow we would have to figure out, okay, how do we handle the device during test, how do we probe it? And once we figure that out, I think the rest of it would be sort of the same old traditional memory test. But it's definitely a very, very interesting area, because I would think the big challenge is just the size. I'm assuming it has to be something super small that can be implanted. And then, you know, how do you handle that device for test? That's the big thing, I would say.

Keith Schaub:  Great, thanks Edmundo and thanks Stan. That's a wrap, that does it for Advantest Talks Semi After Dark. Join us next time on Advantest Talks Semi. 

Semiconductor Memory
Introduction
Memory vs. Storage
Memory Test Trends
Speed Doubling, Gen 4 to Gen 5, and Lanes
PAM in Gen 6
Challenges with Voltage
AI Applications Driving Speed
Cloud Network Latencies vs. Edge-Based
Addressing New Key Test Challenges
Drivers of Form Factors
Outro
After Dark: The Future of Memory Drives