– So now, another very noncontroversial topic.
Moore’s law is really dead.
Next panel is focusing on what will be the
aftermath of the end of Moore’s law.
We’re very honored that the moderator for this discussion
is a computer scientist in the form of the president
of Stanford University, John Hennessy.
John and your panel, please come up.
– Whatever order you like is fine.
So what do we mean when we say Moore’s law is dead?
Do we mean that transistors will never get
faster, will never get more dense?
It really helps to go back and look at Moore’s original papers.
In 1965, he wrote the first paper that actually
projected that semiconductor density would
increase every year.
In 1975, he modified it to talk about an increase
every two years.
And in fact, that rate, that exponential growth
rate was maintained for roughly the next 25
years after his revised paper.
And then Moore’s law began perhaps what we
might call dying or slowing down anyway.
We went from a doubling every two years to
roughly a doubling every three years sometime
between 2000 and 2005, and more recently we’ve
been close to a doubling every four years.
So we’re slowing down.
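The slowdown Hennessy describes can be put in annual terms with a quick back-of-envelope calculation (a sketch, not part of the talk):

```python
# Annual improvement implied by each doubling period mentioned above.
for years_per_doubling in (2, 3, 4):
    annual_gain = 2 ** (1 / years_per_doubling) - 1
    print(f"doubling every {years_per_doubling} years ≈ {annual_gain:.0%} per year")
# → roughly 41%, 26%, and 19% per year respectively
```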
We’re reaching the end of silicon technology
as we’ve known it but there’s another key
factor here that rarely gets talked about
except among people who are friends with electrons
as our good friend Chuck Thacker was, and
that’s what’s called Dennard scaling.
Dennard scaling is a property that says that
as the devices get smaller, their energy consumption
also drops at the same rate.
What that meant was for many years, you could
get the same square millimeter of silicon
to consume the same energy, which made it
possible to keep increasing clock rates.
Dennard scaling actually ended before Moore’s
law did, and it’s been nonoperational
for nearly 10 or 15 years, and that’s created
another problem, created this so-called era
of dark silicon.
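Dennard scaling can be sketched in a few lines; the exponents below are the classic scaling assumptions, written out as a toy model rather than anything from the talk:

```python
def dennard_power_density(k):
    """Relative power density after shrinking feature size by a factor k."""
    capacitance = 1 / k        # device capacitance shrinks with dimensions
    voltage = 1 / k            # supply voltage scales down with dimensions
    frequency = k              # smaller devices switch faster
    power = capacitance * voltage ** 2 * frequency   # ∝ 1/k²
    area = 1 / k ** 2                                # ∝ 1/k²
    return power / area        # constant: same square millimeter, same watts

print(dennard_power_density(2.0))  # → 1.0: power density unchanged
```

When voltage stops scaling with `k`, the `power / area` ratio rises instead of staying flat, which is the dark-silicon problem.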
We all turned quickly to multi-core.
We thought that was gonna solve all the problems
’cause we couldn’t build faster uniprocessors,
and lo and behold, along comes the end of
Dennard scaling, meaning that more and more
processors cannot run faster because they would exceed the power budget.
Of course we’ve all built and relied on this
tremendous hardware improvement.
It’s made it possible to do things like deep learning.
It’s made it possible to build software that
uses layer after layer of abstraction and software reuse,
and still not worry a great deal about the cost.
The hardware just kept getting faster and faster.
Now we’re in a different era.
Perhaps we’re entering an era where dark silicon
will mean the dark age for computer science.
Perhaps it will mean we will have to rethink
the way we program or rethink the way we build systems.
To address this problem, we have a great panel here.
To my left, Doug Burger, former professor
at UT, Austin, now a distinguished engineer
at Microsoft working on accelerating computing
in the cloud.
Norm Jouppi, one of the original MIPS team
members some 30 years ago, and then spent
time at DECWRL, HP, and is now at Google working
on processing in the TPU arena.
Butler Lampson, Turing laureate who’s probably
done more work on machines that changed our
lives than just about anybody I know, including
inventing what we think of as
the modern personal computer.
Butler is now a technical fellow at IBM, at
Microsoft, I’m sorry.
There’s another merger coming up here, and
also an adjunct faculty professor at MIT.
And finally, Margaret Martonosi, a chair professor
at Princeton who was recently a Jefferson
Science Fellow in the US Department of State,
and her work is focused on power-efficient
systems, a critical issue for this.
So what I’ve been asked is each panelist has
to make a brief opening statement just to
get the ball rolling.
If anybody has any robust objections to any
panelist’s statements, they can jump on them
Then I have a few questions, and the audience
is the critical factor in any great panel
so we’re expecting you to ask definitely challenging
questions here, and the students will be passing
out index cards.
Let’s start with you, Doug.
First of all, I’m more of a stickler for the
definition of Moore’s law as we are discussing it.
Moore’s law was about a rate.
It got adjusted once about 40 years ago, and
if I thought that it would get adjusted again,
and keep going at that rate for a while, I’d
say that Moore’s law is alive and well but
the precise definition is about a rate, and
I think we’re kind of in the end game of a
predictable rate, and there are only so many
generations left before we hit really hard
atomic limits, and that number is not very
many, and the costs will grow quickly.
What does it mean for a 50-year exponential to end?
We’ve been in this exponential for our entire careers.
An analogy I like to draw is global warming.
We’re just in a new regime that’s gonna be
transformational, and we know it’s there.
It’s like a slow-moving force but we can’t
attribute any one event to it.
So a major acquisition by a semiconductor
company, a shift in architecture, consolidation,
is that because of the failure of Dennard
scaling, or because Moore’s law is dead or dying,
depending on where we are in the rate?
I think we don’t know but those oscillations
are going to increase in the amplitude, and
I think 20 years from now, the industry will
be unrecognizable compared to what it is today
because of those oscillations, those computing
stacks, architectures, languages will look
very different, at least if you care about high performance.
So what should we do?
I have six approaches.
I’m not gonna go through them in detail ’cause
I don’t have time.
– [John] We have 45 minutes prepared.
– That’s right, that’s right.
Cut me down 90%.
So I’ll list three directions forward.
I call them the obvious, the ugly, and the
evolution, and then three new directions,
the smart, the crazy, and the wild.
So the obvious is to improve performance within
our current paradigms so mining the fat out
of the software stacks, layers of interpretation
as John said earlier today, get our general-purpose
processors running faster.
They’re less than 1% efficient.
A floating-point operation on a modern processor
is 30 picojoules, and the instruction to do
that is 10 nanojoules, a factor of 300 difference,
so I think there’s a lot of opportunity left
there but we’re not really focused on it as
a community, but that’s within our current paradigms.
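Burger’s numbers imply the factor he cites; a quick back-of-envelope check using the values as quoted:

```python
flop_energy = 30e-12          # ~30 pJ for the floating-point operation itself
instruction_energy = 10e-9    # ~10 nJ to fetch, decode, and schedule it

overhead = instruction_energy / flop_energy
print(f"instruction overhead ≈ {overhead:.0f}x the useful work")  # ≈ 333x
```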
Okay so the ugly is I think an inexorable
trend towards domain-specific architectures
in stacks so we’re gonna create frankensystems
with just many, many stacks of different silicon,
different languages, okay, and it’s gonna
be ugly but the industry is big enough now,
and these are important enough problems that
we can accommodate that.
So that’s gonna happen but it’s gonna be ugly.
And then there is an evolution.
I think we’re moving to new architectures,
and some of my current work has been towards
what I like to call spatial computing.
It’s not a new idea.
I didn’t invent the term, but CPUs really
pin data down: you have a small working set of data, and
you stream instructions through it, and
if you’re changing that working set out, you’re thrashing.
Spatial computing is the transpose of that.
You fix the instructions down, and you stream
data through, and that’s one reason we made
such a big investment in FPGAs at Microsoft
’cause you could put down these functions
or instructions, and then stream data through
at line rate, and that happens in
lots of places in the cloud so that’s I think
a new paradigm.
It’s hard and it’s ugly and the languages
aren’t there yet but I think there’ll be a
lot more of that, and FPGAs are useful ’cause
you can actually change the function while
things are evolving rapidly.
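A toy illustration of the transpose Burger describes, with hypothetical lambda stages standing in for functions pinned down on an FPGA: the pipeline is fixed and the data streams through it.

```python
def spatial_pipeline(stages, data_stream):
    """Fix a sequence of operations in place and stream data through them."""
    for item in data_stream:
        for stage in stages:        # the 'instructions' never move
            item = stage(item)
        yield item                  # data flows through, item by item

# Two hypothetical stages laid out once; three data items flow through.
results = list(spatial_pipeline([lambda x: 2 * x, lambda x: x + 1], [1, 2, 3]))
print(results)  # [3, 5, 7]
```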
Okay so then going to the three new paradigms,
I gave a talk a few years ago at Microsoft on
what’s the long-term opportunity.
So one is neural or AI, deep neural networks.
I hate working in hot areas ’cause there’s
too many smart people.
This one feels real so I’m violating my rule
because I think it’s really, really important.
There’s something deep and fundamental here,
and we don’t really understand it, and so
the CPU performance gains are going like this,
deep network requirements are going like this
for performance, and that gap is why we’re
seeing this Cambrian explosion of new architectures,
some of which are famously enormous, and I
think one really interesting thing in that
space is that we’re doing vector computing,
we’re doing matrix vector multiply.
You can reduce precision.
You can make those more efficient.
You can benefit from the silicon scaling.
That’s gonna play out over the next three
or four years, and what comes after that I
think is a really important question.
There is a huge gap, many, many orders of
magnitude, between that and the brain so that’s
gonna be a fascinating, fascinating time.
And then there is the crazy, so that was the smart.
The crazy is quantum.
I think the challenge there is the algorithms.
I think we’ll build them.
Can we solve really important problems with them?
Is that a general-purpose thing or a very narrow one?
I think that’s TBD, and it really relies on
the theorists and the algorithm people to
figure that out.
And then for the wild, I think this is the
last point, it’s programmable biology.
That’s something that really needs computer
architecture and computer science thought
behind it: protein pathways, gene expression
leading to protein pathways.
There is an architecture there that we don’t understand.
Understanding is very sparse.
Getting a handle on that will be really important.
I think that’s the longest term one but that’s
much longer term.
So I guess to conclude, I think I’m at time,
Moore’s law has given us a free ride in performance
with existing paradigms for five decades and
more, and that free ride is just about over
so we’re entering a wild, messy, disruptive
time, and it sounds like a lot of fun.
– Margaret, why don’t you take it next?
– Sure, thanks.
So I wanted to use my time to tell most of
you who are not circuits and architecture
people why you need to care, and you need
to care because it’s gonna be a wild time,
and it’s not just gonna be the hardware people
who are having a wild time here.
I think that’s very important.
We’ve actually already seen the start of this.
Over the past 10 to 15 years, we saw this
upsurge in the use of on-chip parallelism,
which dramatically changed software already,
and will change it much more as we adopt more
parallelism within applications.
And then the second wave has been the adoption
of heterogeneous specialized accelerators
on chips, which again sort of pushes things
in a new direction, and dramatically changes
software so there’s something else that’s
roughly 50 years old besides Moore’s law and
me, and that is the instruction set architecture
so for 50 years, the deal was that Moore’s
law was delivering transistors relatively
regularly, and hardware people were working
hard for you to create faster and faster processors
that sat underneath a durable abstraction
layer, a hardware-software contract such that
the software above it could get relatively
free performance improvements with relatively
few changes, relatively little need for porting effort.
That has changed.
So as one contrast, about 25 or 30 years ago,
processor chips hit a previous spike in power
and power density, but we were able to shift technologies.
That power spike is what drove the CMOS adoption
to a large degree, and software people didn’t
really see that shift, as far as I can tell,
at all, whereas today’s shift is quite different
because we don’t have good alternatives
queued up and ready to splice in.
So inside our computer systems today, we’re
increasingly likely to have many ISAs present,
and this is true whether you’re in mobile
or cloud or anything in-between.
Your phone typically has maybe six ISAs on
the processor chip, six different processor
languages that are being spoken in there,
and many accelerators.
Half the area is accelerators that have no
durable instruction set architecture at all
so what that means is we know how to build
the hardware, we haven’t come up with good
new ways to program it.
In particular, when we move from one implementation
to another, an awful lot of the software has to be reworked.
I call it the post-ISA or post-CPU era not
because we’re done with ISAs or CPUs but because
they don’t have the durable abstraction, overarching
abstraction powers that they used to have.
The amount of reworking varies but it’s often
pretty broad rewriting of software for new
mixes of CPU, GPU, and accelerators, sometimes
shielded by libraries or APIs but someone’s
dealing with it, and it’s not fun, and that
brings me to the second big issue, which is
about correctness and verification.
We all know how hard it is to get a computer
system right, both the hardware and the software.
We’re increasingly worried about security
and reliability as well, and the current technology
trends are gonna make this much worse because
we’re building systems that are more heterogeneous.
Major software changes are happening more
often as we rewrite things to port between
things that don’t have good abstraction layers.
And last thing is that because things are
changing quickly, and because we’re experimenting
with new technologies, we don’t have well-specified
enduring interfaces against which to verify,
against which to check correctness, check
security, and so that is gonna create a situation
where we’re building systems that are harder
to keep correct and secure, and yet wanting
more and more that we do so.
The final aspect of that is, aside from FPGAs,
a lot of the other specialized hardware that
we’re using is baking functionality into hardware
in a way that makes it very hard to push out
a patch when you find a bug so that’ll be
yet another issue for the security and correctness
and verification space.
And the third thing I’ll stress, and maybe
this is where Doug and I differ is I think
that application-driven approaches are kinda
cool, and not necessarily gross or Frankenstein
or whatever you call them.
– [Doug] Ugly.
– I think it’s an interesting opportunity
where instead of the 1975 layering where you
have the architects, the compiler people,
the applications people with these horizontal
layers that we’ve often drawn, I think we’re
gonna flip it, have these domain-specific
slices through those layers where there are
gonna be DNN people who are very good at reaching
down and understanding how the hardware implies
something about what they should be doing
up high and vice versa.
So we need to be exploring design processes
that handle that flip well, domain-specific
languages, and so forth, and we need to be
training students who have good skills up
and down that stack as well because if we’re
gonna shift everything about our field, and
then not think about how the curriculum and
the pedagogies should shift, then I think
we’re gonna be in real trouble so a main takeaway
is this technology trend is not super new
although the end game of it is new and somewhat
mysterious still, it’s definitely not something
the hardware folks are gonna solve under the covers.
I think that’s already been the case.
It reminds me of the frog.
There is the story about if you put a frog
in boiling water, it will jump out but if
you put a frog in lukewarm water, and gradually
turn up the heat, supposedly it stays still.
I don’t know.
I haven’t experimented.
But I feel like all the software people of
the world, you’re that frog.
You’re sitting in water, and it’s warm enough.
– All right.
– Okay, we all feel so much better after that.
– [John] Besides, I like my frog sauteed rather than boiled.
– Hardware people are okay.
They’re turning up the heat.
– That’s a good one.
So when I saw the title of the panel, Moore’s
law is really dead, it reminded me of the
Monty Python pet shop skit with the
parrot, he’s only resting, pining, and lots
of other interpretations.
I think we still got a few more years but
the way I think about Moore’s law is in terms
of order notation, and so if you think about
making something more precise in one dimension,
that seems to turn out to be a linear increase
in cost, but for many years we were getting
a free lunch ’cause with optical lithography
we were getting that N factor in two dimensions,
so we were getting N-squared, and we had
Dennard scaling, that was another factor of
N in terms of device performance and power
efficiency, and so we were getting this N-cubed
for the price of N. That’s really a good deal.
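Jouppi’s “N-cubed for the price of N” can be written down directly (a sketch of his accounting, not code from the talk):

```python
def free_lunch(n):
    density_gain = n * n   # optical lithography: n-finer features in 2-D
    dennard_gain = n       # Dennard scaling: faster, more efficient devices
    return density_gain * dennard_gain  # total benefit ∝ n³, cost ∝ n

print(free_lunch(2))  # shrink features by 2x → 8x combined benefit
```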
But now some of those factors are starting
to go away, and cost will be a limiting factor
I think before device physics will be but
it’s neck and neck, it’s kinda close.
So right now, what’s happening in optical
lithography is if you want to make something
finer by a factor of N, you have to have each
mask be finer by a factor of N, and you have
to have N times as many masks so your N-squared
just went away because you’re paying for twice
as many masks for 2X scaling, and each mask
has to be 2X finer.
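The multi-patterning arithmetic Jouppi walks through, as a toy cost model (the linear-in-n assumptions are his argument, simplified):

```python
def relative_mask_cost(n):
    masks_needed = n     # ~n masks to pattern features n times finer
    cost_per_mask = n    # each mask is ~n times finer, so ~n times costlier
    return masks_needed * cost_per_mask  # ∝ n²: the old free N² is gone

print(relative_mask_cost(2))  # 2x scaling → ~4x mask cost
```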
If you think about that, there are many different
applications, and so Moore’s law is not gonna
be a binary event.
You can’t say Moore’s law ended June 22nd
of this year or something like that; instead
it’s gonna depend on the application, so some
applications benefit much more from transistors than others.
A classical thing from computer architecture
is if you scale cache size by a factor of
N, it reduces the miss rate by a factor of
the square root of N.
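That rule of thumb, sometimes called the square-root rule, sketched as a function (the numbers in the example are illustrative, not from the talk):

```python
import math

def scaled_miss_rate(base_miss_rate, size_factor):
    """Rule of thumb: miss rate falls with the square root of cache growth."""
    return base_miss_rate / math.sqrt(size_factor)

# Quadrupling a cache with a 4% miss rate roughly halves the misses.
print(scaled_miss_rate(0.04, 4))  # → 0.02
```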
If you’re dealing with something where you’re
not getting a lot of transistors, and having
to pay a high price for them, that’s probably
not a good time to just build bigger and bigger caches.
Domain-specific architectures, I consider
them a work of art myself but they’re really
good, at least the ones that we’ve done so
far are really good at using transistors,
and so if you have N-squared increase in transistors,
you can get N-squared in value out of them,
and of course just like any other system,
you can waste the transistors doing needless
operations, but you do get that increase in value.
So I think we’re gonna be seeing a lot more
domain-specific architectures going forward.
If you think about other fields, and how they’ve
matured, if you look at aviation, in 1970,
the 747-100 could carry 500 passengers at
almost the speed of sound, and nowadays, we
have things like the Dreamliner, which are
much more power efficient.
They have bigger windows, all these other
things but it’s not dramatically different
but the efficiency is much better, and so
I’m hopeful that we’ll be able to refine the
software as well as the hardware.
I think we’ve gotten a little sloppy in the
past, just racing and keeping up with Moore’s law,
and I remember the very careful designs we
did when we didn’t have many transistors,
and they are much more efficient than what
we have now so I think both on the hardware
and the software side, there is room for mining
that efficiency for a long time.
– I’m not really a hardware person although
I have done some hardware design.
I’m here as a substitute for Chuck Thacker
who, as most of you probably know, died a few
weeks ago, but fortunately I’ve been working
with a group at MIT to try to figure out what
the consequences are gonna be for computing
as Moore’s law tapers off, and our slogan
is there’s plenty of room at the top.
This is a play on Richard Feynman’s famous
lecture, There’s Plenty of Room at the Bottom,
which he gave in 1959 to the American Physical
Society where he predicted most of the things
that have happened in electronics and nanotechnology
up to now.
So what does it mean?
There is software, there is algorithms, and
there’s hardware that make up the computing
stack, and there’s room in all three of these
above the level of the devices.
It’s not gonna be as good as Moore’s law because
in the days of Moore’s law, you got more and
faster transistors down at the bottom, and
everybody up the stack could benefit without
having to do anything ’cause the changes were
not visible functionally.
You just got better performance at the same
price, and that’s not gonna be true anymore.
As a result, progress is gonna be much more
sporadic, much more opportunistic, and much
more bounded than it was in the case of Moore’s
law, and also as several other people have
said, the changes are gonna be much more visible
throughout the software stack, which is gonna
have very serious consequences for the way
things get developed.
So on the software side, we know there’s a
lot of software bloat ’cause we’ve been getting
bloat for many decades.
It occurred to us that there is an interesting
way to look at it: theorists like to understand
problems in terms of reductions.
If you have one NP-complete problem, you can
show that another problem is NP-complete by
showing how if you could solve the second
problem, then you could solve the first one
too, which you already know you can’t do.
So people do reductions too; another
name for it is software reuse.
Instead of writing a program to solve some
problem from scratch, you write a program
that solves the problem using some already
existing piece of software, which is usually
cheaper and a more reliable thing to do on
the development side but it’s definitely gonna
consume more computing cycles and more memory,
and when you stack these things up 10 or 20
levels deep, which we definitely do nowadays,
there’s a huge amount of bloat and a lot of
scope for getting rid of it at a price of
course in development cost.
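Lampson’s stacking argument compounds quickly; a sketch with a hypothetical per-layer overhead factor:

```python
def stack_overhead(per_layer_factor, depth):
    """Total slowdown if each reuse layer adds a constant-factor overhead."""
    return per_layer_factor ** depth

# A hypothetical 1.5x overhead per layer, stacked 10 deep, is already ~57x.
print(stack_overhead(1.5, 10))
```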
Algorithms, history tells us that at least
in many domains, the improvements in performance
produced by better algorithms have been quite
comparable to the improvements in performance
produced by Moore’s law but of course the
algorithms are quite problem-specific, and
it’s also the case that improvement tapers
off after a while, often because there is
some upper bound on how good the performance
can get that you can actually prove.
Hardware, the iron laws of physics as they’re
currently manifested in hardware tell you
that if you want to get the maximum amount
of performance out of the transistors that
you can put on a chip, you have to have the
parallelism and the locality of the computation
exposed in such a way that the hardware can
take advantage of it, and historically we’ve
not been very good at doing that.
The other avenue for exploiting advances in
hardware is specialization, also known as
domain-specific architectures, and it’s pretty
clear that there are very important domains
in which you can get a factor of 100 that
way at least.
Final thing I want to say is a consequence
of the fact that changes are not gonna be
invisible anymore is that you need a strategy
for propagating the consequences of the changes
through the software stack.
At the highest level, the only strategy I
know for doing that is what I like to call
big components, that is, you make the big
thing that consists of a bunch of hardware
and a bunch of software, and you give it a
very stable interface, and then inside there,
you can make changes up and down the
stack inside of the big component much more
rapidly than you could if the consequences
of changes at a low level had to propagate
through an entire uncontrolled software ecosystem.
So a good example of that I think is the way
Google has been doing the TensorFlow thing
where they’ve defined a high-level interface,
and they don’t give the users of the chip
access to the chip itself, they only give
access to this high-level interface, and inside
of that, they can evolve things much more rapidly.
So to sum it all up, we think there really
is plenty of room at the top both in software,
in algorithms, and in hardware, and it’s gonna
take a big-component architecture to exploit
these opportunities but I don’t think there’s
any doubt that there are several orders of
magnitude, more performance that can be gained
by pursuing these ideas.
– Thank you, Butler.
So let me start by saying if anybody wants
to go out on a limb, and declare whether or
not there is a silver bullet solution here
or any of these ideas, whether it’s die-stacking
or CrossPoint Technology or quantum, are they
about to solve the problem for us, and we
just have to hold our breath until we get
to that magic point?
– I did want to say that I don’t know much
about device physics in spite of having
been a student of physics in my youth.
I don’t actually believe that Moore’s law is dead.
There’s lots of physical phenomena, it seems
to me, that have the potential to make it
possible to have higher and higher performance
computing devices for a long time to come
but it’s not gonna be silicon and CMOS, and
up to now, silicon and CMOS has definitely
been the place to put your money so people
have not worked really hard on these other
things, it seems to me, but that, I’m just
talking off the top of my head.
I don’t actually know anything except that
there are all these phenomena.
There’s atoms and spin and all kinds of cool
stuff out there.
– What do you think, Norm?
You’ve thought about some of these things,
and Margaret I know has done some work on this.
– Yeah, Doug and I were talking earlier
about how much fun we’re both having at
work, and we didn’t think we’d be having that much
fun a decade ago because, sorry, there are
so many different novel things to look at,
and they can all play a part in the overall solution.
It’s not a silver bullet.
There’s a lot of things that we should be
investigating and benefiting from.
– You were so sad 10 years ago.
I can see that you’re having fun.
– What about quantum, Margaret?
I know you’ve thought some about this issue.
– I’ll say something about quantum but I wanted
to react a little bit to what Butler said.
Moore’s law, as we all know, is not a law
of physics, it’s a business or economic law.
– [John] It’s empirical observation, yeah.
– It’s about will there be reasons to keep
pouring money into a process to maintain a
doubling, and so while there may be physics
still to tap, the question is at what cost
will it come, and can we find physics that
reaps enough revenue benefits to warrant the
investments so that’s that.
You prompted me for quantum, and it’s been
close to 10 years since I wrote my first paper
on quantum, but I kept it a little
bit secret because architects traditionally
work on things that are very real, and quantum,
10 years ago, was kind of a weird thing to
even work a little bit on, and
so the good news is that I think quantum computing
today, from a physics perspective, is amazingly
close to being real.
There are folks at Google and other companies
who say that a 50 to 100-qubit machine in
terms of the physics will be available in
the next year or two.
There are even people who say that they will
write programs for that 50 or 100-qubit machine
that will show speedup over classical, which
is the so-called quantum supremacy or quantum
advantage point so that the physics side of
the story for very narrow problems is starting
to become credible in a way that wasn’t true
for many years.
The harder part is that the applications that
one could run on a 50 or 100-qubit machine are quite narrow.
There is a huge gap between the number of
qubits you need to build viable, interesting
applications using the quantum algorithms
that we currently have, and the number of
qubits that the physicists will be able to
build into reliable operational systems any
time soon so that’s one issue.
There is this gap between qubit counts that
we might want and qubit counts that we’ll
get any time soon.
– Will it still be good enough to wreck public-key crypto?
– The likely outcome that many people feel
is that we’ll come up with quantum-resistant
crypto, and make that somewhat moot.
It’ll be a while before it can wreck public-key crypto.
That needs more qubits from what most people believe.
The second thing is it won’t be general, it’ll
be in a coprocessor role but many of the things
we’re talking about are coprocessor roles,
and I guess the third thing I guess back to
this law of physics thing is Moore’s law was
amazingly providential in a way that we came
up with many, many intermediate ways of making
money off of transistors that caused that
doubling cycle to sustain itself over 50 years,
and the people who are thinking hard about
how to create a Moore’s law for quantum computing
are having trouble thinking about what might
be those intermediate points, when you get
past the kind of Sputnik or moonshot kind
of initial bragging rights on 50 to 100 qubits,
and you try to move from there to a 10,000
or a million qubits, there is an awful lot
of money and engineering that will need to
go into those phases, and the path isn’t clear
for who would pay, and for what applications,
and there’s another panel tomorrow morning,
so they’ll have more answers, yeah.
– Norm, you touched on the cost issue with
respect to lithography but there are lots
of other cost issues, and part of what’s causing
this slowdown is the cost of fabs is just
going up by leaps and bounds, and the number
of fabs in the world is shrinking dramatically
as a result.
Do you think that this becomes the actual
barrier that simply you can’t afford to build
many advanced fabrication capabilities?
– Doug was mentioning consolidation earlier
but yeah, I think as long as we have three
or four stable players, I think there will
be good competition, and like with the new
iPhone every year, they have to have a better
chip in it, and so there’s a lot of market
pressure to come up with the next best thing.
– There’s also a lot of price pressure on
the cost of that iPhone,
so they’ve got to also be able to build it.
So Doug alluded to the franken architectures
as a way to describe this meshing together.
It’s clearly the case I think as several of
you pointed out that a heterogeneous computing
model is certainly more complicated from the
viewpoint of verification and from the viewpoint of programming.
Is this going to fall as a giant burden on
programmers or are we going to come up with
some magical software technology to sweep it away?
What do you think, Butler?
– If it’s not gonna fall on the programmers,
where is it gonna fall?
Maybe there is some magical technology that
can do the mapping somehow or extract.
– The Fortran of heterogeneous computing?
– Yeah, something or some high-level thing
that can extract the structure.
To some extent, TensorFlow is a step in this
direction for a limited range of applications.
– I think all of these things are for limited
ranges of applications.
The whole nature of heterogeneous, of domain-specific
architectures, they’re for limited domains.
That’s what it’s all about.
So then where is the burden of, “I’ve got
this giant piece of software with multiple
levels in the stack, lots of software reuse,
but somewhere in there are some pieces that
can be sped up”?
– It’s the big component story.
You have to find inside it, something that
you can wrap in a stable interface, and if
you can’t do that, then all you can do is
proofs of concept.
You can’t do anything real.
That’s an absolute requirement.
– So does this mean that the range of potential
applications will be rather limited?
Doug, you started by mentioning deep neural networks.
That’s an obvious one.
GPUs are another.
Are there lots of these domains out there?
– [Butler] Deep neural network is not an application
– No, it’s a domain.
– I think one important thing with DNNs, and
Norm, you should comment here too, is that
they are surprisingly general like they have
moved into lots of different domains, not
the same algorithm but the same class.
Back when people were doing speech and vision
and this and that, they were all different
classes of algorithms that had been tweaked
over the years, and the deep networks just
kinda swept through and replaced them whole
hog, so there’s something very, at the risk of
making a pun, deep here, and we don’t yet
have the von Neumann architecture for deep networks.
If we can find that or come up with that,
that’s a huge direction forward because this
is a very general thing.
I think that’s why there is so much energy there.
Yes, hype, but also a momentum kind of feel right now.
– Yeah, I think it is one of the biggest nuggets
in the domain-specific architecture area ’cause
it can do all those different things but I
think your question, John, was actually
similar to the original Moore’s law paper
where Moore said that he didn’t think there
was a market for a calculator chip or this
or that chip but he could build a microprocessor,
and microprocessor could be programmed to
do all those different application.
So some application areas or domains will
benefit from this, and others might not so
I think we’re gonna get more inequality in
terms of application speedup.
And then I think another thing, going back to what Margaret was saying with all the different components, is that it’s also easy to get Amdahl’s law bottlenecks between the different components, so it’s a system architect kind of problem.
– [Butler] Let’s hear it for architecture.
– So it is a system problem, and then the question becomes: where is the bottleneck in this?
Is it the programmer’s problem?
And we’re going from an era where basically people just wrote code.
Look at how much energy went into taking the
x86 architecture, and making faster and faster
and faster versions of it so we didn’t have
to touch the software at all.
– One thing I’d like to comment on, hearkening back to the previous panel, is there was a lot of talk about data preservation, and well-specified interfaces, and ways of describing things, and sorry to toot the architects’ horn more, but I think we did–
– [Butler] Better than anyone else.
– Better than anyone else.
x86 is what?
47 years old now.
– And it’s unbelievably complicated but it’s
actually kind of well-specified.
– And it still executes.
– So the failure for code to run year after year is because of other parts of the system we didn’t specify to that same degree: the operating system, the I/O interfaces, and so on. So this notion of specifying interfaces well, and then building around them, is a very powerful one.
– Unfortunately, programmers hate it.
But they’re gonna have to learn better.
There’s gonna be no alternative.
– If you want to get faster, you’re gonna
have to do it.
All right, let’s take a few of these questions
’cause some of them are quite provocative.
This is a great question for you, Doug.
Are FPGAs a fundamentally important computing
device or just a crutch for companies and
engineers without the courage or skill to
build custom chips?
Okay, A-plus question here.
– I want to come back with something witty
but my mind is just blank.
I draw a graph sometimes when I give talks.
On one axis is the rate of change of the algorithm,
and the other axis is the proportion of workloads
in your cloud device family that the accelerator
can benefit, and if you look in the cloud, you have tens of millions of customers. It’s just an incredibly general-purpose thing, and even the big properties don’t run on more than one or two percent of your servers, and then for
the big online services like Bing and Google
Search, they’re changing weekly or monthly
so that’s just really tough, and I think that’s
even too fast for the FPGAs but you can at
least get a handle on some of that, and upgrade
your algorithms so I do think there is something
general there but it’s still far too hard
to program, and we’re making progress, and
the effort is coming down but the barrier
is still too high to make it really general.
– So they really do get flexibility. Norm, what do you think? ’Cause you’ve gone the custom route, right? Not the FPGA.
– So I think the custom can tackle those big nuggets, but the tail is basically wagging the dog in many of these data centers, so I think the FPGAs can be really useful there, as almost like a microprocessor for those applications, and something that can be programmed.
– Maybe just to follow up on that. There are three segments to that curve. There is the stuff that’s running at really large scale and stable. You harden that. There is the stuff that’s changing too fast or is too small-scale to justify the NRE to put it on an FPGA, and better tools bring that down, and then there is the stuff in the middle where the economics work. So those three buckets are sort of changing in size, and so we’ll see what happens.
– All right.
This is a perceptive question.
Margaret spoke of turning the traditional stack of horizontal layers on end, but my experience, says this questioner, over the past several decades is that fewer software engineers have the requisite deep vertical background that this would require. Haven’t we been teaching and moving in the wrong direction, given this change that’s upon us?
– Aside from our esteemed moderator, I’m actually
the only academic here.
By the way, don’t believe the hard copy brochures that say I’m at Google.
I’m not at Google.
I’m at Princeton.
I like Google but we’re just friends.
So I think it is.
We need to tell the story of these verticals
in a way that lets students see the impact
of the full set of systems design challenges.
They see the application layer just fine.
That’s all around them, and that’s what they’re
drawn to, and that’s great.
It’s wonderful to see.
But they need to know that cloud computing
doesn’t run on actual clouds.
They need to know there is hardware under
there that someone has built, and I think
sometimes in some departments, they’ve lost
track of what’s supporting this massive revolution.
– It’s extremely difficult to keep it on track.
MIT tried very hard for a long time, and then
they’re gradually giving it up because you
just can’t get the students to pay attention.
– But isn’t this partly ’cause our field has
exploded by leaps and bounds?
You can’t imagine having a student that doesn’t
have some exposure to machine learning now
as part of their undergraduate curriculum,
and everything they’ve got to learn has just
blown up by leaps and bounds.
How do we get them enough knowledge about
the lower levels of the system including the
software levels so that they have a better
understanding, and still get it done in four
years, and have them graduate rather than
drop out ’cause we burn them out?
– I think we have some nice examples of textbooks
and classes that merge some hardware and software
into a single systems-oriented class.
I think that’s one avenue.
Another avenue is that as they’re learning about DNNs or other more application-focused topics that they’re flocking to, making sure
that there is enough of the underlying support
systems built into those classes as well.
– One question here is, what is the impact of Moore’s law on Frasier’s law, which I wasn’t sure what Frasier’s law is, but luckily it’s defined: it says that the cost of computation drops by a factor of 10 every five years. So are we gonna see the computation improvements in terms of cost of hardware? We are used to hardware dropping at least in cost or getting faster for the same cost. Are we gonna see that slow down and end?
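As an aside, the arithmetic behind a “factor of 10 every five years” rate is easy to check; a quick sketch in Python:

```python
# A factor-of-10 cost drop every five years implies an annual factor of
# 10 ** (1/5) ~= 1.585, i.e. costs fall roughly 37% per year.
annual_factor = 10 ** (1 / 5)
print(round(annual_factor, 3))          # 1.585
print(round(1 - 1 / annual_factor, 3))  # 0.369 -- annual cost drop

# Cost of a fixed computation, normalized to 1.0 in year 0:
costs = [1.0 / annual_factor ** year for year in range(6)]
print(round(costs[5], 3))  # 0.1 -- one tenth after five years
```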
– I think it depends on the application area.
Some application areas, I think it’ll continue
for the next decade but other applications,
it’s gonna be very slow and very small progress.
– I guess I don’t need my cellphone to run
faster because I can’t speak any faster.
– Video is really good at sucking up the transistors.
– [John] Video sucks up.
– Also I think, this isn’t the right way of looking at it. What people actually care about is that the cost of running the application they care about is dropping, and that means the improvements in software, improvements in algorithms, as well as improvements in hardware can all contribute to that, and that was kind of the point of the story about there’s plenty of room at the top.
– So talk about this efficiency issue, Butler.
I think if you look at a large software system
running on a modern piece of hardware, whether
it’s in the cloud or on a big server, the
inefficiency is spread all over the place.
There is certainly inefficiency at the top
with multiple levels of software, especially
if they’re writing a scripting language or
something else but there is lots of inefficiency
in the underlying hardware, and in the exact
interface between that hardware and software.
Do we have to go on an expedition to mine
that inefficiency out piece by piece in order
to really get the kind of performance we need?
– I don’t think we really know. Except in certain fairly specialized domains, the motivation has not been there to really dig into this. The most that people typically have been willing to do is to try to make better compilers, but I think we have a lot of experience by now that tells us that’s by no means sufficient.
– No, it’s a hard problem.
We haven’t made the quantum leap that we thought
we might get in compiler technology.
– And we definitely haven’t dug into it seriously
in my view.
– And it’s also easy to lose a lot of performance
in large-scale distributed systems so yeah.
– Although that’s certainly a domain where the limitations of Moore’s law are by no means so compelling. There has been so much fat in the hardware and low levels of software that run distributed systems. We’re gradually learning how to take that fat out of the communication part of distributed systems, but I think there is still a lot of opportunity.
– Here we have a question.
Current commodity architectures are very problematic from a time-predictability point of view. The worst-case execution time can be considerably worse than the expected best case. Is there any hope that the changes to come will enable better time predictability in terms of computation?
– [Butler] It depends on the application.
– This is where specialization really makes sense. You can build a very deterministic specialized pipeline for an application and get great performance predictability, but you lose generality. You spend all this time building it. All those caches that we like to complain about give us generality, and then they just make things unpredictable. If you could merge those two and have great predictability and great general-purpose performance, you’d be in great shape, but that’s a Holy Grail that doesn’t exist.
– Predictability often involves designing
for a tail.
– Architects like to make the common case fast.
– Caches work great when they work great, and when they don’t, it’s a disaster, right? It’s the classic kind of problem.
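The predictability complaint about caches can be put in numbers with a toy model. The latencies below are invented for illustration, not measurements from any real machine:

```python
# Toy model of why caches help average latency but hurt predictability:
# the expected access time sits near the hit time, but the worst case is
# every access missing. Latencies are in cycles, purely illustrative.
HIT, MISS = 4, 200

def expected_latency(hit_rate):
    """Average access time under a given cache hit rate."""
    return hit_rate * HIT + (1 - hit_rate) * MISS

print(round(expected_latency(0.98), 2))  # 7.92 cycles on average
print(MISS)                              # 200-cycle worst case per access
```

The 25x gap between the average and the worst case is exactly what makes worst-case execution-time analysis on cached commodity hardware so pessimistic.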
– But in some domain-specific architectures like GPUs and even in the TPU, we don’t have caches.
– Right, right.
New CMOS-compatible devices are coming, for example TFETs and 3D stacks, so the hypothesis from this question is that the evolution will continue. There’ll be no big crisis. No big changes will be visible. Is this simply too optimistic a viewpoint?
– Or is it the silver bullet maybe?
– Too optimistic, Norm?
– I think some of those things will work out
but it’s gonna be a long painful process.
If you think about when different gate dielectrics
were introduced, they thought it was gonna
be at like 90 nanometers, and it wasn’t until like 45 or something because there were reliability problems. We’re seeing the same things in nonvolatile memories.
People thought they had it down, and then
the error rates were higher, and endurance
wasn’t as good as they thought, and so these
new technologies are often more difficult
than they appear.
– We haven’t talked at all about DRAMs and
memories but DRAMs are really near the end
of their lifetime as we know them, right?
They’re really near the end.
The next DRAM generation just got shoved out another year before it’s ready, and there is no path after that next revolution in DRAMs, so we’re not gonna have memory capacity growth.
How are we gonna deal with that?
It seems like we’ve used DRAMs to hide a lot of sins in terms of the amount of memory we use. What happens when you don’t get any more improvements?
– It unbalances the architecture, and so it’s yet another thing where these things are improving at different rates, and that will contort systems, so it’s a full employment offer for architects. You need to balance. You need to re-architect the system.
– Blast it.
I’ve lost track a little bit of what’s been happening the last year or two, but there’s no doubt that if you look at technologies like flash, they were originally deployed to store pictures in cameras, and the interfaces that were provided to the basic technology were incredibly poorly suited for computing. Because the camera market was much bigger than the computer market for flash initially, it’s been a fairly long, slow process to fix that.
My belief is it’s still the case that we by no means have the best possible interfaces to the flash, so it may well be that it’ll be okay to just treat DRAM as a terabyte-sized cache, and integrate it much better than we currently do with the next couple of levels up, which don’t have the same gigantic gap that we used to have between DRAM and disk.
– Yeah, and speaking of silver bullets, I
think vertical-NAND is the closest thing that
we have to a silver bullet because flash was
supposed to stop scaling at 20 nanometers,
and now we’ve got 64-level vertical-NAND flash
that’s coming on the market.
– So you’re betting on vertical-NAND flash
rather than some kind of CrossPoint Technology?
– I think you need to put your money on lots
of different things.
– [Butler] And then you’ll take what you can get.
– [Doug] Put your money on every number.
– Yeah, exactly.
– [Butler] Then the house wins, right?
– [Norm] House always wins.
– But your winnings are less when you win.
– Is there a place for portable high-level languages in the world of domain-specific architectures, and if so, what might those languages look like?
– I think there is some really interesting work pushing on domain-specific languages.
The Frankencamera work at Stanford is one
nice example of that, and automatic compilation
down to different and diverse hardware platforms.
I think the best hope is actually to go for
the domain specificity at the language level,
and then have it manage the heterogeneity
that’s under the covers down there.
So I think that’s a huge opportunity going
forward to have the applications be specified
in something that’s high level, and somewhat
agnostic to lots of hardware or software,
or what kind of hardware or software, and
yet specific enough to the application that
the compilers can actually get some traction
on mapping it to hardware.
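The shape of this idea can be illustrated with a toy sketch. Everything here is invented for illustration (the `Stencil` type, the `run` function, the backend parameter); real domain-specific languages and their compilers are far more sophisticated:

```python
from dataclasses import dataclass

@dataclass
class Stencil:
    """A 1-D three-point stencil: out[i] = a*x[i-1] + b*x[i] + c*x[i+1].

    The program is expressed as data, not as a loop, which is what lets a
    domain-specific compiler pick a target-appropriate implementation.
    """
    a: float
    b: float
    c: float

def run(stencil, xs, backend="reference"):
    # A real DSL compiler would lower this to a GPU, FPGA, or vector
    # backend; in this sketch every backend is the same reference loop.
    out = []
    for i in range(1, len(xs) - 1):
        out.append(stencil.a * xs[i-1] + stencil.b * xs[i] + stencil.c * xs[i+1])
    return out

blur = Stencil(0.25, 0.5, 0.25)
print(run(blur, [0.0, 4.0, 8.0, 4.0, 0.0]))  # [4.0, 6.0, 4.0]
```

The point is the division of labor: the application is “specific enough” (a declared stencil) for a compiler to get traction, yet says nothing about which hardware runs it.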
– The canonical example of success in this domain has been query languages for databases over the last 30 years.
We should all aspire to do as well as that.
– It’s possible that some of the interfaces
that we think about won’t be languages in
a way we thought about them in the past but
more environments, interface environments
over which everything is compiled.
– Butler, what about moving up the software stack? It seems to me there has been this tension between functionality and performance efficiency, however you want to frame it, and for the past 30 years, functionality has triumphed. Functionality has even trumped correctness. More important to get it out there even if it doesn’t work quite right.
– The slogan is “worse is better.”
– Yeah, exactly.
Do you think this will change?
Or will there really be a lot of pressure
on the companies to get new functionality
out more than to get it to work efficiently
or to get it to work correctly?
– It depends on the application.
I like to say there are two kinds of software, which I call precise and approximate.
Precise software has a spec whether or not
it’s written down carefully, and if you don’t
satisfy the spec, the customer is unhappy.
Approximate software has no spec.
There is no spec for Facebook or Google search. It just doesn’t make sense to think in those terms. Computer scientists tend to think that of course precise software is better, for obvious reasons. My view is both kinds are just fine, but it’s
very important to know which kind you’re writing
because if you are writing approximate software,
and you think it’s precise, you’re gonna do
a huge amount of engineering that your customers
are not gonna appreciate, and the other way
around, your customers are gonna be pissed
about the fact that the software doesn’t work
but the reason, the whole reason the web was such a success is that it doesn’t have to work. And it doesn’t work.
My personal experience is when you click on
a link, and there is at least a one or two
or maybe 5% chance that the wrong thing happens,
and it’s also true that if you click on it again, there’s maybe a 30 or 40 or maybe an 80 or 90% chance that it’ll work the second time, but on the whole, it definitely doesn’t work, which is not a criticism.
You think it’s a criticism but it isn’t.
– But wait a minute.
I get to my bank account.
– That’s a particular application of the web that’s been done carefully. I’m talking about the web as ordinary people experience it. They don’t distinguish the internet part, the HTTP part, the server code part, the yada yada. You’re distinguishing those things. The bank ain’t gonna be much aware of it either.
– Where does AI software fit in this?
– It depends on the application.
– Okay, good.
If it’s a self-driving car, it’d better be precise about knowing where it is and the other parts of the road.
– Your bank account looks like this.
– We’re talking about the software that’s out there. Excel is precise software.
People are very upset if the numbers are wrong.
– It’s hard to predict where your software, where your systems, are gonna get used over time. So for example, underneath the covers of the
IV machines in your hospital, there’s typically
some Windows XP running.
Let that sink in, right?
Did they intend for that to be on one side
of the precise, non-precise line when they
wrote it, and when they shipped it?
– Windows is definitely precise software.
It doesn’t mean that it always does the right
thing but it does have specs, and people get
upset when the specs aren’t satisfied.
Windows is definitely precise.
– [John] It’s supposed to be precise software.
– No no, this is a way of thinking.
Does the software have a spec, and does the
customer care about the spec?
It has nothing to do with how good a job you
did at building the software.
– I hold no brief for Windows XP by the way.
I had nothing to do with it.
– I know.
– But it’s an easy thing to smite that, and
I don’t think that’s particularly sensible.
– It’s all supported.
To have an IV machine in today’s hospital
– Whose fault is that?
– But that’s my point is that people are glomming
these things out of other things–
– Sure, of course.
Yeah, of course.
Software lives a lot longer than anybody ever
thought it would live, right?
– My favorite story is once upon a time, there
was a 370 that was running in 360 mode.
The 360 was running in 7090 emulation mode.
The 7090 was running in 704 mode.
The 704 was running an emulator for the IBM 650. And the IBM 650 program was emulating a CPC, a card-programmed calculator. And down in the bowels of this thing, simulated cards were flowing through the simulated calculator. That was much faster than the 150 cards per minute that ever flowed through a real one.
– Yeah, and it was precise too.
– So this is both horrible and amazing.
– [Doug] Right, it is amazing.
It is amazing.
It is amazing.
– All right.
So we’ve said here that domain-specific architectures
may be a big part of the way forward.
Can any of you identify a small number, say three applications, three domains, that would
account for a significant amount, say 30%,
of the world’s computing load?
– The GPUs probably already do that.
– They’re not 30% of the world’s–
– If you’re measuring floating-point operations
executed, they probably are.
– Floating-point operations they could do, or floating-point operations they actually do?
– Even actually do.
There’s a hundred million Sony and Microsoft
gaming consoles out there.
Many of them being used pretty heavily.
They’re doing a lot of floating-point.
It isn’t a very sensibly posed question.
As is demonstrated by the fact that–
– Does the person who asked this question
want to raise their hand, please?
– As is demonstrated.
But I just got to reinterpret it plausibly.
– All right, that went fast.
I thought it was a pretty good question actually.
– Even when you don’t like my interpretation.
– No, your interpretation is okay.
– From a user perspective, aside from web browsing, I don’t think so, but if you look in the cloud as an example, if you just take software-defined networking, the processing you’re doing, if you’re running software on a CPU to follow those protocols and rewrite those flows, it’s an enormous amount of computation. I think if you take your cellphone apart, I’ll bet you there are more cycles in your cellphone devoted to running the cellular network and running WiFi than there are even in the application processor.
– There are special image processing pipelines.
There are a lot of different ISAs in there.
– Right, right.
– That’s already hidden.
– For the camera, same thing. Taking a video with your cellphone, you’re doing enormous numbers of operations.
– The depressing point about it is that the low-hanging fruit is already gone.
– It’s already done. That’s maybe the interesting thing, yeah.
– Software-defined networking is an interesting
example of DSAs in reverse.
Things that used to be done in hardware–
– Yeah, are being done in software. And now there are big policies moving back into hardware.
– Yeah, that’s the wheel of reincarnation.
– The wheel in the data center keeps turning.
– All right.
I need another highly provocative one.
If a big part of the future of information technology is the cloud, and if the future of computer architecture is domain-specific architecture, will server chips of the future be more likely to be designed by cloud companies or chip vendors? Given that we’re turning the entire stack on its side, are we gonna re-verticalize the industry? After we basically had a vertical computer industry, we turned it into a horizontal one. Are we gonna re-verticalize?
– I think the answer is yes. A little bit of both, right? These domain-specific architectures connect to servers, and so they use traditional server chips. It’s very expensive to offload the whole thing. There is a lot of random code, and little things that have to be taken care of, so I think there is a place for both.
I think it’s a flowering, this Cambrian explosion
where you get more and more diversity so some
will be done by traditional, the silicon vendors,
and others by cloud providers.
– But will Google design its own RISC-V chips,
and stop buying x86 chips?
– You can’t say.
– One thing you can say is that we consolidated on a few processor vendors in some ways for ISA reasons, and so if x86 doesn’t dominate, if ISAs don’t dominate, if it isn’t a reason to buy a particular company’s chip, then you could imagine spreading out over more vendors.
– [Butler] That’s the whole idea behind RISC-V, right?
– Then I guess the counterargument is there is a whole lot of specialized expertise, let’s say in the chip design portion of the job, and concentrating that in a single company, which supplies multiple vendors, is more cost-efficient in the end.
– Maybe that company has got a foundry. Then it’s just a foundry, and it doesn’t add any value in terms of higher-level silicon.
– For example, the architecture teams at Intel and other microprocessor vendors are very large for that reason you just mentioned, because of the diverse applications. They studied many, many different ones, and they created a lot of complexity at the system level, and so getting that experience is important for building these chips.
– It’s incredibly hard and very expensive, and yet the server market is consolidating on cloud vendors, which are growing very large, and so how those two forces play out remains to be seen.
– Since we have about a minute left, I want everybody’s final robust and aggressive thought about the future so that the audience goes out full of energy.
– You put me on the spot. Frankenstein’s monster was misunderstood. That’s a nod to Margaret. I don’t see it as a bad thing. I think we’re gonna see the neural stuff really take off. I know the hype is at risk of being at the top of the hype curve, but I think it’s real.
– [John] Norm.
– And I think domain-specific architectures are a work of art, whether they’re ASICs or FPGA-based, and the party is not over yet. The parrot is not dead.
– [John] Butler?
– You already heard my final thought.
There’s plenty of room at the top.
– [John] Yeah, plenty of room at the top.
– I think we didn’t talk about storage at all, and in a world where we’re generating data at vast rates that are still exponential, we need to come up with better storage technologies. We talked a lot about compute, a little bit about memory. The storage thing is fascinating.
– [Butler] DNA will save us.
– Thank you all.
Are we out of time?
We’re out of time, yes.
Thank you all for your attention.
– I took away 15 minutes of our scheduled