Advantest Talks Semi

More than Moore or More Moore? Here is what’s next!

September 26, 2022 Keith Schaub Season 2 Episode 3

What's Next in the Future of High-Performance Computing? In this episode, experts will discuss the plans for the coming era of computing and will reveal how the semiconductor industry is continuously evolving to address the looming high performance compute challenges -- moving us beyond Moore's Law. Listen in as John Shalf, Department Head for Computer Science at Lawrence Berkeley National Laboratory, and former deputy director of Hardware Technology for the Department of Energy Exascale Computing Project, helps us uncover how the semiconductor industry will revolutionize HPC over the next decade. 

KEITH SCHAUB: In 1965, Gordon Moore proclaimed Moore's law, stating that the number of transistors in dense integrated circuits doubles about every two years. This law, which is not really a law but an observation, has largely remained true for nearly 60 years but recently has run up against more formidable laws: the laws of physics. Transistors on CPUs are now just a few atoms in size, and the power and heat challenges at these atomic scales have begun to limit performance gains and to make further shrinking increasingly complex and expensive. Although semiconductor experts have been warning of the inevitable end of Moore's law for the past ten years, our industry has continued to extend the law through innovation. But just how long can we realistically continue? And what are some of the promising technologies on the horizon that could change how we compute over the next decade?

Welcome to Advantest Talks Semi. I'm Keith Schaub, your host, and that's the focus of today's episode: to explain this amazing journey we have been on for the better part of a half-century and where our industry could be headed. I'm joined by John Shalf, the current Department Head for Computer Science at Lawrence Berkeley National Laboratory and formerly Deputy Director of Hardware Technology for the DOE Exascale Computing Project. John, welcome to Advantest Talks Semi.

 JOHN SHALF: Thank you very much.

KEITH SCHAUB: So, John, I know you're giving the keynote this year, on Wednesday, I believe, September 28th, at the International Test Conference. Congratulations on that. It's titled "The Future of High-Performance Computing Beyond Moore's Law." But before we dive into that, can you tell us a little about yourself, your current projects and initiatives at Lawrence Berkeley National Laboratory, and what the DOE Exascale Computing Project is?

JOHN SHALF: Sure. I can start with the DOE Exascale Computing Project. It's a $3.5 billion project to reinvent HPC for the future in light of the end of Dennard scaling. When Dennard scaling ended, the parallelism of machines doubled each generation, and we also had new technologies such as GPUs to accelerate computing. Those required a deep rethink of our software, hardware, and applications so that we can continue to serve science at the DOE through supercomputing. When I was in that project, I was mostly involved in working with industry to ensure that advanced technologies were delivered to the DOE through the PathForward program so that they would be able to build the first exascale machine, which debuted at Oak Ridge National Lab in June of this year. And so, they finally hit the exaflop. But we had to make 390 million dollars of investment in industry in order to ensure all the pieces were there. In my role right now as department head for computer science, I also lead a computer architecture group. We have a group that does advanced languages and systems software like UPC++. We have a performance analysis group that also does advanced algorithms like genomics. And in my group, we are focused on computer architecture and beyond-Moore microelectronics. So, together with our friends who are materials scientists, we're studying new transistor technologies and new models of computation such as multi-valued logic and even superconducting logic, and trying to understand how that changes the way that we would approach computer architecture and what kind of performance gains could be realized.

KEITH SCHAUB: Thanks, John. The next decade promises to be one of the most exciting yet in the further evolution of computing. There are a number of developments that will change how we compute in 10 years. The foreseeable end of Moore's law will lead to the exploration of new architectures and the introduction of new technologies in high-performance computing, or HPC. Let's start with the foreseeable end of Moore's law. In my opening, I remarked that we've been hearing about the end of Moore's law for the better part of a decade, but somehow the industry keeps extending it. Can you walk us through what you've seen in the last five or ten years and what prevents the industry from just continuing to shrink?

JOHN SHALF: Sure, I get myself in hot water with this sometimes. I do believe in more Moore, but it's going to be through different means than in the past. The digital economy has been driven by shrinking the transistor: you get a performance gain from shrinking the transistor, which enables you to sell more products, which enables reinvestment in continuing to shrink transistors. We're seeing that virtuous cycle start to fall apart because shrinking the transistor alone isn't giving us the kinds of performance gains that we used to get. You know, the first shoe to drop was the end of Dennard scaling, when by shrinking the transistor we could no longer reduce the supply voltage, and that caused clock rate improvements to stall. But we were able to continue to gain more performance by doubling the number of processing cores on each chip each generation. However, that's starting to fall apart too. For the past decade, the thing that's not often discussed is that the actual size of the devices, the transistors and the metal wire thickness, is no longer tracking the nanometers that the fabs talk about when they say it's, you know, 10 nanometers or 7 nanometers or 5. There's no longer a direct correspondence between those nanometers and the actual thickness of those wires. Each process generation does offer improvements, but they can be things like leakage or other things that are not directly related to the kind of performance gains we used to see for the past 50 years. And we're seeing this now in HPC if you look at the Top 500 list after the exascale machine was added. Erich Strohmaier, who is one of the managers of that list, has observed that over the past 30 years of parallel HPC in the Top 500 list, the performance of the number one system, and really the sum of all the HPC systems on that list, improved 1000X every 11 years. For the past decade, the new projection is that we will only see 10X improvement every 11 years. So, we've gone from 1000X to 10X, and that's the manifestation that the current approach to performance improvement through shrinking transistors is no longer a driving factor. And so that leads us to think of new approaches to continue that kind of Moore's law progression. But it's not simply from fab processes alone anymore.
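To put the two figures quoted above on a common footing, the short sketch below converts "1000X every 11 years" and "10X every 11 years" into a compound yearly growth factor and a doubling time. It assumes smooth exponential growth over the whole period, which is a simplification, and the function names are purely illustrative.

```python
import math


def annual_growth(total_factor: float, years: float) -> float:
    """Compound yearly growth factor that yields total_factor after `years`."""
    return total_factor ** (1.0 / years)


def doubling_time(total_factor: float, years: float) -> float:
    """Years needed to double performance at the same compound rate."""
    return years * math.log(2) / math.log(total_factor)


if __name__ == "__main__":
    # Figures quoted in the conversation: 1000X vs. 10X per 11 years.
    for label, factor in [("historic", 1000.0), ("projected", 10.0)]:
        rate = annual_growth(factor, 11.0)
        print(f"{label}: x{rate:.2f} per year, "
              f"doubling every {doubling_time(factor, 11.0):.1f} years")
```

Under that assumption, the historic trend works out to roughly 87% improvement per year (doubling in a little over a year), while the projected trend is roughly 23% per year (doubling in a bit more than three years).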

KEITH SCHAUB: So really, we're losing two orders of magnitude of improvement over the next decade.

JOHN SHALF: If we continue with business as usual. And so that's why the research now is to continue Moore's-law-type scaling, but through different means. There are many options available, fortunately, but we don't know which one is the winner or what combination will win.

 KEITH SCHAUB: Spend a second to tell us about Dennard scaling.

JOHN SHALF: Yeah. So, you know, Moore's observation is really a techno-economic theory: that the economics would lead them to double the number of devices that could be crammed onto a chip, and it was Gordon Moore who said "crammed" in the title of his paper. Complementary to that, Dennard, who was a researcher at IBM, observed that when you shrink the transistor, the capacitance of that transistor goes down in proportion to the area reduction, and then the amount of voltage that you need to completely turn that transistor on and completely turn it off would also be reduced. And in so doing, energy efficiency would improve to the point that you could actually double the clock frequency each generation, in addition to doubling the number of devices you could cram on the chip, and all of this was enabled by that voltage scaling. But once you get down to around 0.7 V, between 0.7 and 0.5, it is no longer possible, at least with the silicon transistors that we have today, to completely turn the transistor on and completely turn the transistor off. And so, we're no longer able to reduce those voltages, and as a result, the heat density would increase each generation as you cram more devices on. So that ended the exponential increase of clock frequencies that we'd seen for so long. That ended Dennard scaling. You know, the industry has responded by doubling the parallelism of these chips each generation, and so they used their transistor budget for that, and that created a real headache for software technologies, which needed to exploit explicit parallelism in order to continue to extract more performance from those parallel chips.
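The voltage argument above can be made concrete with a small sketch. It uses the textbook constant-field idealization, in which dynamic power per transistor goes as C·V²·f and one shrink by a linear factor k gives C → C/k, f → f·k, and k² more transistors per unit area. Those exponents are an assumption taken from the textbook model rather than from the conversation, and the function name is illustrative.

```python
import math


def power_density_ratio(k: float, voltage_scales: bool) -> float:
    """Power density after one shrink, relative to before.

    Assumes dynamic power per transistor ~ C * V**2 * f, with
    C -> C/k, f -> f*k, transistor density -> density * k**2,
    and V -> V/k only while voltage scaling still works.
    """
    cap = 1.0 / k
    volt = 1.0 / k if voltage_scales else 1.0
    freq = k
    density = k ** 2
    return cap * volt ** 2 * freq * density


k = math.sqrt(2.0)  # linear shrink that doubles transistors per unit area
dennard = power_density_ratio(k, voltage_scales=True)    # heat density holds steady
stalled = power_density_ratio(k, voltage_scales=False)   # heat density doubles
print(f"with voltage scaling: {dennard:.2f}x, without: {stalled:.2f}x")
```

In this model, once the supply voltage stops scaling, keeping the frequency increase would double the heat density every generation, which is why clock rates flattened and the extra transistors went into more cores instead.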

KEITH SCHAUB: I remember back in the early 2000s when we kind of hit this three- to four-gigahertz wall in processor speed, and we've kind of stayed there. Now everything is multi-processor and multi-threaded, and that's still an active strategy. Like you said, that creates new problems, with software having to manage all of that. And then there's something called dark silicon. What is that?

JOHN SHALF: Yeah, so dark silicon is where you engineer in more functionality, like accelerators. I'll take the example of a cellphone chip, which will have dozens of discrete accelerators for different functionalities like image denoising or echo cancellation, things like that. You don't operate all of those accelerators simultaneously. In fact, if you did, you'd probably melt the chip. And so dark silicon is a strategy where you use that chip area to include more functionality than you can actually turn on at any one time. But that functionality can still accelerate targeted parts of the application. And so that's another strategy: ubiquitous accelerators, but not having all of them on all the time.

KEITH SCHAUB: What's the software to manage those accelerators? Is that done by the EDA vendors, or how is that done?

 JOHN SHALF: The ability to turn them on or off efficiently is part of the enormously complicated software stacks that come with these kinds of chips, and cell phone chips definitely have very well-engineered and very complicated software stacks to manage these accelerators.

KEITH SCHAUB: We've talked about some of the fundamental physics that we're fighting against. What about the idea of more than Moore, where we've now got gate-all-around, and we're going up using 3D packaging or 3D transistors? We've got heterogeneous approaches now that disaggregate these large SoC designs, which is really interesting, because when I was doing this in the mid to late nineties, it was all about integrating as much as you can into an SoC, and now we're reversing that and trying to break it apart into these chiplets. Can you talk a little bit about what this more than Moore is and how we are getting potential improvements from the TSV densities or the reduction in energy per bit?

JOHN SHALF: One of the challenges, as I mentioned, is that just because of the fundamental way the transistors operate, they have this 60 millivolt per decade slope to be able to turn them on or off. And once you get below 0.7 volts, it becomes very difficult to turn the transistor completely on or completely off. Once you get into that sub-threshold region, either you leak a lot and consume power, or you have to operate at a lower clock frequency, and the transistors are less efficient that way. But the other part of it is that with that supply voltage being stuck at 0.5 to 0.7 volts, the transmission of information across the copper circuitry on the chip becomes dominant in terms of the power consumed. The longer the distance that you have to move information electronically across those wires, the more power it consumes. Tighter integration by going to 3D does have the advantage that it reduces the data movement distances, and so we can save a lot of power on that. But simultaneously, the challenge is heat density. The transistors are still consuming power, logic is very heat-intensive, and when you stack logic on top of logic, you eventually hit the thermal limit where you can no longer remove heat as fast as it is being generated. So, you can see the third dimension helps a lot. We can stack cool things like memory technology on top of logic, but there's very limited depth you can stack before heat density becomes a challenge. That's why we're looking at these alternative transistor technologies like negative capacitance field effect transistors or MESO (Magneto-Electric Spin-Orbit) devices that operate at 100 mV instead of 0.5 volts, so that we could reduce the energy intensity. Now that opens up the third dimension for stacking. For gate-all-around, I think it's really about the transistor being able to have a better current drive by having a more effective field over the active area of the silicon.
But again, copper wire is about as good a conductor as you're going to get at room temperature. And gate-all-around doesn't help us with the fact that the resistance and capacitance of the copper wire are not going to improve each generation. These exotic transistor technologies are really interesting, really promising. However, since they are still ten years out, we have to look to alternative methods, and that's where the chiplets and architectural specialization come in. Microprocessors are actually very flexible but also very inefficient beasts. If you can create a custom processor that targets different algorithms or different parts of your application, you can get enormous benefits in terms of performance and also use fewer transistors to achieve those performance improvements. But the challenge there, like you're saying with the SoCs and the dark silicon, is that eventually you only get to use 10% of the silicon for each functionality, and that starts to become area-inefficient. You might also end up with an application workload where 90% of your chip will never be used if you try to throw everything and the kitchen sink into your SoC. So, chiplets are a way that you can create a specialized accelerator but put it onto a chiplet. Then you can arrange them like puzzle pieces or Legos to create SoCs or packages that target different subsets of applications or workloads, and then you recover your area efficiency because you only need to co-package accelerators that need to be used together for that targeted workload.
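The 60 millivolt per decade figure mentioned earlier in this answer can be turned into a rough upper bound on how strongly a transistor can switch at a given supply voltage. The sketch below is an idealization, not a device model: it assumes the whole supply swing is applied to the gate and that the slope stays constant, so every 60 mV changes the current by one order of magnitude. The function names are illustrative.

```python
def decades_of_switching(supply_volts: float,
                         swing_mv_per_decade: float = 60.0) -> float:
    """Orders of magnitude of current change available across the supply swing."""
    return supply_volts * 1000.0 / swing_mv_per_decade


def on_off_ratio(supply_volts: float,
                 swing_mv_per_decade: float = 60.0) -> float:
    """Idealized upper bound on the on-current to off-current ratio."""
    return 10.0 ** decades_of_switching(supply_volts, swing_mv_per_decade)


for volts in (0.7, 0.5, 0.1):
    print(f"{volts:.1f} V: about {decades_of_switching(volts):.1f} decades, "
          f"on/off ratio up to {on_off_ratio(volts):.3g}")
```

Under these assumptions, a 100 mV supply leaves fewer than two decades between on and off for a conventional transistor, which suggests why devices meant to run at that voltage need a steeper slope or a different switching mechanism.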

KEITH SCHAUB: What's equally interesting about that, and Advantest is doing some work in this area, is that when you break all of these things into chiplets and think of them as Legos that you put together, it becomes extremely complex to actually put the right chiplets with the other right chiplets. Just because you put two passing chiplets together doesn't mean the result is going to function. We are looking at machine learning algorithms to help marry the correct chiplets with the other correct chiplets so that you get a high-yielding product at the end.

JOHN SHALF: The other advantage of chiplets is known good die. You can test and get your known good die, but with smaller dies that have a higher chance of working. There's a great paper from AMD from last year's ISCA, which is the International Symposium on Computer Architecture, that explained their logic for going to a chiplet approach. They get known good die testing. So, they get a lower risk that the SoC has failed because each of the chiplets has a smaller area. And then, integrating them together as chiplets consumes maybe 9 or 10% more area, but it's a fraction of the cost in comparison to a reticle-limited die. They can also build systems that have a package that isn't reticle-limited, so they can increase their functionality by increasing the effective package silicon area.
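The known-good-die argument can be sketched with a simple Poisson yield model, in which the chance that a die has no defects falls exponentially with its area. Everything numeric below is hypothetical (the defect density, the 800 mm² monolithic die, the split into four chiplets); only the roughly 10% area overhead echoes the figure quoted above. The sketch also ignores packaging cost, assembly yield, and test cost.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_product(die_area_mm2: float, dies_per_product: int,
                             defects_per_mm2: float) -> float:
    """Wafer area spent per working product when only known good dies are packaged."""
    good_fraction = poisson_yield(die_area_mm2, defects_per_mm2)
    return dies_per_product * die_area_mm2 / good_fraction


DEFECT_DENSITY = 0.001   # hypothetical: defects per mm^2
MONOLITHIC_AREA = 800.0  # hypothetical: near reticle-limited die, mm^2
AREA_OVERHEAD = 1.10     # about 10% extra area for chiplet interfaces
NUM_CHIPLETS = 4         # hypothetical split

monolithic = silicon_per_good_product(MONOLITHIC_AREA, 1, DEFECT_DENSITY)
chiplet_area = MONOLITHIC_AREA * AREA_OVERHEAD / NUM_CHIPLETS
chiplets = silicon_per_good_product(chiplet_area, NUM_CHIPLETS, DEFECT_DENSITY)

print(f"monolithic: {monolithic:.0f} mm^2 of wafer per good product")
print(f"chiplets:   {chiplets:.0f} mm^2 of wafer per good product")
```

With these made-up inputs, the chiplet design spends noticeably less wafer area per working product despite using more total silicon, because a defect scraps one small chiplet rather than the whole large die.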

KEITH SCHAUB: You mentioned energy consumption, and one of the primary bottlenecks we have is that just moving the bits back and forth is a large part of the problem. I've seen a lot of papers and research on in-memory compute, where, now that we're using 3D, you can put the memory right on top of the CPU. I've even seen cases where the memory and the CPU are built together to have in-memory compute. Walk us through what that means and what the tradeoffs are.

JOHN SHALF: In-memory compute, again, is about data movement. When it comes to copper wire, its resistance is fixed. The longer the distance that you move that bit across the copper wire, the more energy it consumes. The innovation of in-memory compute is the notion that, well, then move it a shorter distance by moving the memory closer to the compute, or just abolish the distinction between the memory and the compute. That's great. But there have been so many different concepts over time. This dates back to before von Neumann. He's known for the von Neumann machine, but in fact all of his machine concepts prior to that effectively involved in-memory compute. He went to the von Neumann machine because they realized that it's very difficult to get the correct ratio of memory capacity to compute capacity for the different parts of an algorithm. And so, by pooling the memory together and feeding it through that interface to the processor, they had more flexibility in establishing that ratio between the memory and the compute, and that's where the von Neumann machine came from. But then you have things like Peter Kogge at IBM and his processor-in-memory work in the 1970s. You have data flow machines in the seventies and eighties that kind of abolished that interface. There are so many different concepts that if somebody wants to say in-memory compute, they have to be a lot more specific about which of those many, many concepts over the past 100 years they're referring to. The notion is that if I move the memory closer to the processing element, it will reduce data movement distance, increase bandwidth, and reduce power. But it also creates a host of complications, and there isn't any magic way to make them go away. One is not being able to right-size the amount of memory for a variety of algorithms: each algorithm will need a different memory footprint. And to some extent, cache is a processor-in-memory thing. It's just virtualizing your access to the off-board memory. It's nice conceptually, but the devil's in the details, and it creates as many complications as it solves.

KEITH SCHAUB: To pivot over to test for a moment, since that's where we actually have a lot of challenges too: when we do this at production scale, we call it PoP (package-on-package) memory. The processor goes down first, and then the memory has to go on top. It's a mechanical nightmare to get that to work. And the same memory with a different processor may not give you the same performance that you got with the previous processor. Like you said, the devil's in the details. It does solve a lot of problems, but it creates a lot of new problems and challenges that we're working through in the test industry.

JOHN SHALF: Yeah, I'm not saying it's a bad idea. I'm saying that it's an idea that's been very well studied and continues to be challenging.

 KEITH SCHAUB: We've gone through physics problems and more than Moore and in-memory compute. Let's shift gears here and go back to exploring a variety of new architectures and new technologies. You gave us a sneak peek into some of those when you mentioned them earlier. What are some of the new architectures that you see as promising and that you see that the industry is working on?

JOHN SHALF: So, I'm really excited to see, you know, data flow concepts coming back into vogue. There's a lot of interesting work in that area. It creates a very interesting challenge for algorithm developers to rethink their problem. With an instruction processor, when you write your Fortran or your C code, you're implicitly doing loops and stepping from one line to the next line to the next line. With data flow, you instead lay all of that logic out as a data flow graph and flow the data through that graph. It does make data movement distances very explicit. And so it's kind of exciting that it is a paradigm for expressing an algorithm where you can explicitly reason about data movement distance and reducing it, but at the same time it creates a lot of challenges for the way that you express programs, because it's no longer a straight-up instruction processor. A different direction, like I was talking about with the chiplets as an enabling technology, is the concept of extreme heterogeneity, where you have a lot of different kinds of accelerators, each of which is responsible for a different subpart of your program. And the question is how much specialization you should include. When you lose the generality that a general-purpose processor has, you gain enormous efficiency, but then you narrow the space that processor is useful for. What's the Goldilocks balance between specialization and generalization? It's clear that there are limits to staying fully general purpose, but how does one manage that in the software environment, and how do we manage the IP costs? The biggest-ticket item in the cost of a new processor or accelerator design is the design, the verification, and then the software technology that goes with it. The fab costs are not dominant; designing, verifying, and software are always the lead costs for these things.
But extreme heterogeneity is definitely something that's already happening. It's already taking place in your cell phone, and it's already happening in mega-scale data centers, as you see Google and Amazon start to create their own specialized SoC designs to serve the particular needs of their workloads.
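The contrast drawn above between stepping through instructions line by line and flowing data through a graph can be illustrated with a toy interpreter. This is only a software sketch of the idea, not how any particular data flow machine works; the graph format and the `run_dataflow` name are invented for the example. Each node fires as soon as all of its inputs are available, regardless of the order in which the nodes were written down.

```python
import operator
from collections import deque


def run_dataflow(graph, inputs):
    """Evaluate a data flow graph.

    graph maps a node name to (function, [names of its inputs]);
    inputs maps source names to values. A node fires once every
    input it depends on has a value.
    """
    values = dict(inputs)
    pending = deque(graph)
    stalled = 0  # consecutive nodes that could not fire
    while pending:
        node = pending.popleft()
        fn, deps = graph[node]
        if all(dep in values for dep in deps):
            values[node] = fn(*(values[dep] for dep in deps))
            stalled = 0
        else:
            pending.append(node)
            stalled += 1
            if stalled > len(pending):
                raise ValueError("graph has a cycle or a missing input")
    return values


# (a + b) * (a - b), deliberately listed with the final node first.
GRAPH = {
    "prod": (operator.mul, ["sum", "diff"]),
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
}

result = run_dataflow(GRAPH, {"a": 5, "b": 3})
print(result["prod"])  # 16
```

Here "sum" and "diff" have no dependency on each other, so hardware that lays the graph out spatially could compute them at the same time, and the edges of the graph show exactly which values have to travel between which nodes.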

KEITH SCHAUB: The first chip costs $1 billion, and the rest are free.

JOHN SHALF: You know, HPC had this huge transition back in the eighties and nineties from custom Cray-type vector processors to what they called the "attack of the killer micros." We discovered that we didn't have enough volume to fund the research reinvestment needed to stay ahead of the broader microelectronics industry. And so there was an economic play, which was to adopt the 80/20 rule, where 80% of the market supports the development of these microprocessors. That enabled us to stay at the crest of the wave in terms of technology development, while we could innovate on the 20% that really mattered and differentiated HPC systems from the commodity, commercial off-the-shelf processors. We've adopted this for the past 30 years to good success, but now, with Moore's law tapering off, it is becoming an economic challenge for us to continue on that path, because we need to move to having custom architectures again, which is kind of what it looks like we're moving back to. Then how do we afford to do that? And the answer is actually something that you just said that triggered me. When you look at the broader industry, in particular the hyper-scale data centers, once they create a chiplet that has a particular specialized functionality, they've paid down that billion-dollar cost of the first chiplet. With IP reuse, being able to reuse that chiplet for different applications, we can actually have our cake and eat it too: you can afford to customize by arranging chiplets, and chiplets that you've already made in the past can be reused for other applications. So that's an interesting industry direction that perhaps we in HPC should learn more about and follow.

KEITH SCHAUB: On the architectures you mentioned before: where are we in terms of maturity, in the go-to-market or go-to-production state? Is it still very much R&D, or where are we?

JOHN SHALF: The big driver in the market right now is, like I said, the hyper-scale data centers and, in particular, machine learning, AI, and deep learning workloads. If you follow the market, the biggest margin and the area for growth is in machine learning. These data flow architectures have actually been developed and targeted at machine learning. Now we're learning how to generalize them so that we could also use them for other workloads like our scientific workloads, and that is kind of in its nascence. But there are a lot of companies doing the data flow stuff. You've got SambaNova, and to some extent Cerebras is a kind of data flow system. Its wafer-scale engine is many, many little microprocessor cores, but each of them acts like a node in the data flow graph, and it's got an event-driven system for moving stuff through that data flow graph that extends all the way across the wafer. They're in production today. And the question is how we can use them for HPC and leverage that kind of architecture for HPC.

KEITH SCHAUB: Alright, so John, great. Let's then move on to the new technologies. We just talked about some architectures. When you and I talked last week, we talked briefly about silicon photonics, and about how converting these electrical signals into photons may solve a lot of problems. What is silicon photonics, and why is the industry investing in this new technology?

JOHN SHALF: With photonics, once you get from the electrical into the optical domain, resistance is no longer the dominant loss. Going two centimeters versus going to the other end of the data center is nearly the same cost. That is why people want to move into the optical domain. The second question is, why silicon photonics? Well, you know, the observation is that you can actually use standard CMOS fabrication techniques to pattern waveguides onto silicon. Dense wavelength-division multiplexing is how you get these super high data rates over a fiber. The old way we used to do it is the discrete assembly of gratings and different components to make up a wavelength-division multiplexing optical channel. And that's extremely expensive. It requires precision assembly. It's not practical to scale up. But with silicon photonics, I can engineer a bunch of ring modulators. Each one modulates a different frequency of light, and then all those different frequencies of light go into an optical fiber. Now I can have, say, 16 channels each running at 16 gigabits per second, and I can get 256 gigabits per second of bandwidth by having the light encoded in different colors and going down that fiber. Likewise for the detectors on the other side. So, the cost of one ring versus the cost of hundreds of rings ends up being in the noise because I'm able to use that lithography process. What this is about is, you know, we get enormous bandwidths by co-packaging electronic things. That's the whole chiplets thing, and in fact, GPUs have high-bandwidth memories. They get that memory bandwidth by having the memories inside the package so that they're right next to the GPU. That enables you to have super high wiring density, so it's that same kind of parallelism that we use to get performance.
And it also has a very short wire length, so that we can operate them at a very high frequency without burning a hole in the side of the thing. We can do the same thing with silicon photonics, where you could co-package the optics right next to the GPU or a switch chip and use that wide and slow approach, where you can have hundreds of frequencies of light, each encoded at a lower SerDes rate: instead of 100 gigabits, you can encode at 16 gigabits each. But with that many different frequencies of light, this wide and slow approach, I can get terabits per second into the fiber and then escape it off the chip, and then I can go anywhere in the system with that enormously high bandwidth by using the co-packaged optics. So, this is being pursued first by the network vendors but is also being looked at very carefully by the processor vendors to improve their ability to get bandwidth off of the chip, because we've kind of hit a physical limit in terms of how much bandwidth you can escape from a chip just using electrical connections.
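The "wide and slow" arithmetic is simple enough to check directly: aggregate bandwidth is the number of wavelengths multiplied by the data rate on each one. The sketch below works through the 16-channel example and estimates how many 16 gigabit per second wavelengths would be needed to reach one terabit per second. It assumes ideal channels with no encoding or protocol overhead, and the function names are illustrative.

```python
import math


def aggregate_gbps(channels: int, gbps_per_channel: float) -> float:
    """Total fiber bandwidth when each wavelength carries its own data stream."""
    return channels * gbps_per_channel


def channels_needed(target_gbps: float, gbps_per_channel: float) -> int:
    """Smallest number of wavelengths that reaches the target bandwidth."""
    return math.ceil(target_gbps / gbps_per_channel)


link = aggregate_gbps(16, 16.0)
print(f"16 x 16 Gb/s = {link:.0f} Gb/s = {link / 8:.0f} GB/s")
print(f"{channels_needed(1000.0, 16.0)} wavelengths for 1 Tb/s at 16 Gb/s each")
```

Sixteen wavelengths at 16 gigabits per second give 256 gigabits per second, or 32 gigabytes per second, and reaching a terabit per second at that per-channel rate takes about 63 wavelengths, which is where being able to pattern many rings lithographically matters.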

 KEITH SCHAUB: I think what I'm hearing is we've gone from this nice to have to more of a must-have. And then, when that happens, the industry gets compelled to solve a lot of these challenges. But you mentioned some other things about the margins and how there's room enough there to actually make these investments. Can you say a few words on that?

JOHN SHALF: Well, certainly when we were first contemplating the Exascale Computing Project, we had discussions with industry about high-bandwidth memory, the co-packaged and stacked memory technologies, and we considered it. When we had those discussions in 2009, it was considered an extremely exotic technology, and the industry knew that it would be enormously expensive to realize in a high-volume production process, saying it probably wouldn't happen. But then, over time, it hit the GPU vendors first. They realized that they could not advance the performance of their product unless they adopted this technology. So, although it was ridiculously expensive and difficult to do, there wasn't any other choice. The other technologies simply weren't going to deliver. And that basically enabled that packaging technology, that co-packaged memory, to get off the ground, because the machine learning market was generating products that have extremely high margins, and if it's a must-have, industry will find a way. And so what I'm seeing now is the same with co-packaged optics. They've pushed SerDes about as far as it can go before the power consumed by the SerDes starts to match the power consumed by the compute logic on board, in which case you're kind of hitting a wall. And so, industry players like Broadcom and Cisco and the GPU vendors can see the end of their road maps for this and are looking at co-packaged optics as the way to increase their escape bandwidth to the point that they can continue to advance their product performance.

 KEITH SCHAUB: Okay, so John, that brings us to the end of the show. First of all, I'd like to thank you for coming on today, and it was extremely insightful for me and for the audience on understanding high-performance compute challenges in the industry and what's facing us, and the exciting innovations happening to move us beyond Moore's law.

After Dark: 

 KEITH SCHAUB: Welcome to Advantest Talks Semi After Dark. Continue listening into the post-show discussion. 

 
KEITH SCHAUB: So, John, we had just finished talking about Moore's law and running up against Moore's wall, and heterogeneous packaging and chiplets and all these new architectures and technologies. Talk about that a little bit more. We often hear things like quantum computing will save us, or graphene and carbon nanotubes. There is a lot of investment in nanomagnetic logic, which I hear is actually quite promising. So, what are your thoughts on these technologies? Are they replacement technologies? Are they complementary technologies? How do they fit into this story?

JOHN SHALF: I will address the new models of computation first. Some folks say that, oh, because we hit the end of the road for digital electronics, we'll just shift to quantum, and that will solve our future supercomputing problems by going to a quantum or AI, neuromorphic, you know, machine. And the answer to that is, I think quantum computers are great, but they aren't a replacement technology. Quantum expands computing into an area of combinatorially difficult or hard problems that cannot be solved adequately using conventional digital computing technology. But it's not a replacement technology for what we do with digital, and for that matter AI also is expanding us into data-rich problems. It's amazing what AI can do with data, but it also is not a replacement technology. A way to think about that is that they're complementary technologies, so they're on their own paths, but one does not completely replace the other. So, you could think of it like this: I want to balance my checkbook, and I've got Excel, and so I put it in there, and it adds up, and I balance my checkbook. A quantum computer will give me the superposition of all possible balanced checkbooks. You know, that's what quantum is good at: combinatorially difficult problems. And then the AI machine will say it looks balanced, and if you ask it again, it'll say it looks balanced within an 80% confidence interval. They have their place, but they aren't competitors to one another. They're complementary to one another. When we talk about the other things, the carbon nanotubes or the spintronics stuff, all of those are potential candidates for replacing CMOS, though a lot of people point out that CMOS is very difficult to beat.
There may be areas, though, where those technologies outperform CMOS and other places where CMOS is better, so it's more likely we're going to see co-integration of those technologies with CMOS in the interim, to take over the tasks where they really are superior to CMOS. We're seeing that certainly with MRAM technology, which is magnetic RAM. It is definitely a better storage technology than SRAM, and we're seeing MRAM and conventional CMOS being co-integrated, so that's its entry into the market. With the carbon nanotubes, the biggest challenge actually wasn't the performance of the individual transistors but scalable manufacturing. But there's been a lot of investment in overcoming that. Now there are companies like Aligned Carbon that are producing sheets of passivized carbon nanotubes that you can etch with standard litho processes. There are still challenges with the contact resistance for that technology, but they've turned a corner by getting over that scalable manufacturing hurdle. And so, research continues on all these things. Like I said, it's ten years from lab to fab, but there's a lot of promising R&D happening in the lab right now, and I have no idea who's going to win.

KEITH SCHAUB: Well, the good news is there are lots of promising paths. So, John, I can't let you out of here without first talking about your thoughts on the singularity, if I might. You know Ray Kurzweil. He's a famous futurist, he's written several books, and with all this AI heading into the new Exascale compute era, we're just 23 years away from his prediction of when AI is indistinguishable from humans. Where do you fall on that observation or that prediction?

JOHN SHALF: I'm not convinced. I think we know so little about how the brain operates that I don't see it that way. I mean, look at the AI that we have today. First off, it does compute. It does do great things, so we use it. But the fundamental theory of how biological brains operate that it's based upon was proven to be incorrect decades ago, but we still use it because it does great things. But take my dog: my dog is not very smart, and it only took me a few days to potty train him so he wouldn't pee in the house. The way that we train AI today, I'd have to have tens or hundreds of thousands of training examples in order to train my dog, and that's just not how biological systems work. Very few training sessions are required to get the dog not to pee somewhere in the house, and if I take my dog to another house, he knows not to pee there, but in AI, the way we do it today, you'd have to retrain it. I am not convinced that AI, as we know it today, is anywhere near getting to the singularity. If we understood better how biological brains work, we might get there, but I don't see it being very near.

KEITH SCHAUB: Alright, John, and with that, we'll have to leave it there. That does it for Advantest Talks Semi After Dark. Join us next time on Advantest Talks Semi. 

Moore’s Law
Introduction
The DOE Exascale computing project
End of Moore’s Law
Dennard scaling
Dark Silicon
More than Moore & Potential Improvements from TSV densities
In-Memory compute
New architectures
Maturity of the New Architectures
Silicon Photonics Overview
Adoption of Silicon Photonics
Outro
After Dark: Quantum Computers
Carbon Nanotubes
Singularity