It is a week ago now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun; the speakers were very insightful, and the panel discussion and audience questions added even more information.
What was quite striking this year was the greater difference of opinion between the speakers. I guess that in 2007, most of the discussion was on the level of “ouch, here comes multicore and what are we going to do about it”. This year we got a bit deeper: with one more year of experience and massive research work, the collective world of multicore has made some progress and gained insights. And that is when the differences start to show up; the fact that we have differences of opinion tells us that we are starting to dig into the details and turning up different answers due to different viewpoints and user experiences.
So where were the differences this time?
- Heterogeneous vs homogeneous cores (on a single chip). Kunle Olukotun clearly supported the heterogeneous style (which is what you get with Sun’s Niagara, which he designed the basis for). Erik Hagersten was more interested in the difference between thin and fat cores sharing the same basic ISA, and Anant Agarwal was strongly in favor of completely homogeneous systems (which is what they build at Tilera). In my biased view, I think the pure energy-efficiency argument for heterogeneous is always going to prevail. See some of my previous blog posts on this topic for background.
- Domain-specific vs general-purpose programming languages. The same sides here, with Kunle advocating domain-specific languages, and Anant and David Padua more in the general-purpose camp. I like the domain-specific approach better; it rhymes more with what I see people actually doing today to increase overall programming productivity.
- Memory bottleneck or not? The most interesting discussion came when memory bandwidth and cache sizes were brought up. One quite common school of thought over the past few years teaches that caches per core will shrink, and that the bandwidth to get data into and out of a chip is going to be a severe restriction on what can be done. Not everyone on the panel agreed with this; there was the idea (mostly from Kunle) that the massive bandwidths and low latencies achievable within a chip (compared to between chips in a classic multiprocessor built from discrete processors) could make this less of a problem. Personally, I think this is going to be some kind of problem, but maybe not as big a one, since passing data around faster might reduce the need to store it temporarily. Despite the need for more bandwidth, nobody really agreed with Erik’s thought that maybe it makes sense to build chips that do not max out on the number of cores they contain, but rather balance core count against achievable IO bandwidth. That idea has some merit.
- Core counts. Moore’s law tells us there are going to be thousands of cores on a chip fairly soon… but if we do not manage to make good use of them, maybe the growth in core counts will slow down instead. Putting four or six or eight cores into a general-purpose system makes sense today, but more than that might turn out to be a waste for the vast majority of users, who do not have problems to solve and programs to run that can make use of more than that. In the same sense, maybe it is better to have slightly fewer but more powerful cores than the maximum number of minimalistic cores, considering the state of software available today. So it sounds like a fairly divergent future here.
- Shared memory or local memories? Most of the panel seemed to be in the camp proposing that shared memory is too convenient not to have, even when it really is bad for you. There were several bad jokes comparing shared memory to alcohol, and the moderator of the panel suggested that a good way to avoid the hangover of shared memory is to stay drunk… whatever that means in practice. (There is a small sketch of the contrast between the two models right after this list.)
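To make the shared-memory point a bit more concrete, here is a minimal C sketch of my own (not anything shown at the event) of why shared memory is so seductive: in the shared-memory style the result of a computation is simply there for any thread to read, while with private local memories the producing core has to ship the data over some explicit channel. A plain pipe stands in for an on-chip link here, just for illustration.

```c
/* A minimal sketch (my own, not from the talks) contrasting the two models.
 * Shared memory: any thread can read a result directly.
 * Local memories: the producing core must explicitly send the data. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* --- Shared-memory style: the result is just "there" for everyone --- */
static long shared_result;                      /* visible to all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer_shared(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_result = 42;                         /* convenient: one assignment */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* --- Local-memory style: data must be moved explicitly --- */
static int channel[2];                          /* a pipe stands in for an on-chip link */

static void *producer_local(void *arg)
{
    (void)arg;
    long result = 42;                           /* lives in this "core's" private memory */
    write(channel[1], &result, sizeof result);  /* explicit send */
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, producer_shared, NULL);
    pthread_join(t, NULL);
    printf("shared memory: %ld\n", shared_result);

    pipe(channel);
    pthread_create(&t, NULL, producer_local, NULL);
    long received;
    read(channel[0], &received, sizeof received);  /* explicit receive */
    pthread_join(t, NULL);
    printf("message passing: %ld\n", received);
    return 0;
}
```

The shared-memory version is shorter and feels effortless, which is exactly the “too convenient not to have” argument; the price is the lock (and all the hangover-inducing ways to get it wrong), while the message-passing version makes the data movement visible and explicit.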
Some things were generally agreed upon, though.
- Programming is an issue, shared-memory or local-memory or whatever. The ideas for a solution varied, however, as discussed above.
- Cores will be plentiful, and operating systems focused on sharing time on a single, very valuable core are an idea of the past. The keyword for the future is spatial sharing and reducing the overhead of management (I have some previous blog posts on this topic, especially on the subject of IMA and real-time control when cores are free; see the sketch after this list for one concrete flavor of spatial sharing).
- Virtualization and isolating partitions of a multicore chip from each other are necessary mechanisms. Running multiple different operating systems on a single chip will be quite normal, probably under the control of some global hypervisor.
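As an illustration of what spatial sharing can mean in practice (again my own sketch, not something presented at the event), here is a small Linux-specific C program that pins each worker thread to its own core using glibc’s pthread_attr_setaffinity_np, so that a core is dedicated to a job rather than time-sliced between many. The worker count of four is purely illustrative; a real system would query how many cores are actually available.

```c
/* A minimal Linux/glibc sketch of spatial sharing: instead of time-slicing
 * one precious core, each worker is pinned to its own core and keeps it.
 * Assumes at least NWORKERS cores; purely illustrative. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NWORKERS 4   /* illustrative; query the real core count in practice */

static void *worker(void *arg)
{
    long core = (long)arg;
    /* This thread now owns its core: no time-sharing overhead,
     * no migrations, and much more predictable timing. */
    printf("worker dedicated to core %ld\n", core);
    return NULL;
}

int main(void)
{
    pthread_t threads[NWORKERS];

    for (long core = 0; core < NWORKERS; core++) {
        pthread_attr_t attr;
        cpu_set_t cpus;

        pthread_attr_init(&attr);
        CPU_ZERO(&cpus);
        CPU_SET(core, &cpus);                              /* one core per worker */
        pthread_attr_setaffinity_np(&attr, sizeof cpus, &cpus);

        pthread_create(&threads[core], &attr, worker, (void *)core);
        pthread_attr_destroy(&attr);
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```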
Any comments on this from my small audience? I think the topics under discussion are quite fascinating and the kind of issues on which the success of major chip design projects will be decided. A good architecture with a good programming model has a great chance of success (as long as it looks like a continuation of something existing :)).