Epiphany-V: A 1024-core 64-bit RISC processor

I am happy to report that we have successfully taped out a 1024-core Epiphany-V RISC processor chip at 16nm.  The chip has 4.5 Billion transistors, which is 36% more transistors than Apple’s latest 4 core A10 processor at roughly the same die size. Compared to leading HPC processors, the chip demonstrates an 80x advantage in processor density and a 3.6x advantage in memory density.

Epiphany-V Summary:

  • 1024 64-bit RISC processors
  • 64-bit memory architecture
  • 64-bit and 32-bit IEEE floating point support
  • 64 MB of distributed on-chip SRAM
  • 1024 programmable I/O signals
  • Three 136-bit wide 2D mesh NOCs
  • 2052 separate power domains
  • Support for up to One Billion shared memory processors
  • Support for up to One Petabyte of shared memory
  • Binary compatibility with Epiphany III/IV chips
  • Custom ISA extensions for deep learning, communication, and cryptography
  • TSMC 16FF process
  • 4.56 Billion transistors, 117mm^2 silicon area
  • DARPA funded
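
For the curious, the "one billion processors" and "one petabyte" scalability limits hang together neatly under a flat 64-bit address map. Here is a sketch of one plausible partitioning (my illustration; the 1 MB-per-core window is an assumption, not the official Epiphany-V memory map):

```python
# One plausible way the scalability limits fit a flat 64-bit address map.
# The 1 MB-per-core window is an illustrative assumption, not the
# documented Epiphany-V memory map.
ADDRESSABLE_CORES = 2**30   # "up to one billion shared memory processors"
PER_CORE_WINDOW = 2**20     # assume a 1 MB address window per core

total_shared = ADDRESSABLE_CORES * PER_CORE_WINDOW
assert total_shared == 2**50  # 2^50 bytes, i.e. one pebibyte of shared memory
print(f"{total_shared / 2**50:.0f} PiB of addressable shared memory")
```

Both limits then fall out of partitioning a 50-bit slice of the 64-bit address space into a core-ID field and a per-core offset.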

Chips will come back from TSMC in 4-5 months. We will not disclose final power and frequency numbers until silicon returns, but based on simulations we can confirm that they should be in line with the 64-core Epiphany-IV chip, adjusted for process shrink, core count, and feature changes. For more information, see the report below:

Epiphany-V Technical Report

Cheers,

Andreas


This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

94 Comments

  • Nick says:

    Ripper!

  • Deano says:

    Impressive achievement from such a small team!

    Love to hear more about the deep learning ISA extensions; potentially a killer application for such a chip.

    Good luck with the phase from tape-out to working chips in your hand.

  • Petar says:

    just one word: WOW

  • Stéphane Zuckerman says:

    Congratulations! I hope I’ll be able to get my hands on this very very VERY soon. 🙂

  • Badhri says:

    Funtastic achievement

  • Jose M Monsalve says:

    Oh man! So exciting!!!!

  • valache says:

    I would like to have one in the first lot…

  • Richard G Wiater says:

    How much will the Epiphany-V: A 1024-core 64-bit RISC processor cost?

  • Kevin Leary says:

    Congrats. Impressive.

  • Mx says:

    Will it support arduino IDE?

  • dspx90 says:

    Is there any pricing out yet? I would like to have one for graphics development, so I hope it's not top-notch pricing.

  • Arumaraj says:

    So happy to hear this +

  • Witek says:

    Finally!

    Congratulations. Awesome job.

    The doubling of local SRAM from 32 to 64KB per core is a really good thing. But I also hope bigger programs/kernels can be loaded into a few neighbours used as a "mini cluster" of semi-local memory. It would be nice to have a separate network for that (other than the one used for off-chip communication).

  • Witek says:

    Assuming 1GHz for the I/O clock, 192 bytes/cycle translates to 192GB/s total external bandwidth, or 179MB/s per core. In big systems I would expect half of the I/O used for memory and half for interconnecting other CPUs, giving 90MB/s per core on average, which isn't very high for memory-intensive applications (like machine learning, for example).

    I know 1000 I/O pins is a lot, but it might still not be enough.
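
    The arithmetic above checks out; here it is as a quick back-of-envelope script (the 1 GHz I/O clock is the commenter's assumption, not a published figure):

```python
# Back-of-envelope check of the per-core I/O bandwidth estimate above.
# Assumptions: 1 GHz I/O clock (not a published figure) and 192 bytes
# transferred per cycle across the chip's I/O signals.
io_bytes_per_cycle = 192
io_clock_hz = 1e9
cores = 1024

total_bw = io_bytes_per_cycle * io_clock_hz   # bytes/s aggregate
per_core_mib = total_bw / cores / 2**20       # MiB/s per core
print(f"{total_bw / 1e9:.0f} GB/s total, {per_core_mib:.0f} MiB/s per core")
# -> 192 GB/s total, 179 MiB/s per core
```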

  • saeed says:

    it is good!!!!!

  • saeed says:

    it is good!!

  • Robert Fontaine says:

    Congratulations! Will you continue to provide an FPGA alongside your lovely little chips?

    You may have been a bit too forward-thinking with the Parallella, but it seems that the HPC community and even big data are catching on to the fact that some of their algorithms can be fundamentally more efficient on a slow FPGA than on a fast CPU/GPU.

    Well done!

  • Nancy says:

    Congratulations! Way to go!

  • Gregory Fowler says:

    Congratulations!!! This is an outstanding achievement!! Well done!!!

  • roberto says:

    After so much time I didn't give a dime for the chances of ever seeing this 1024-core chip, and I had marked Parallella as "good effort, but failed".
    I am *SO* happy to see I was wrong.

    Chapeau!

  • Jonah Probell says:

    Impressive. Congratulations.

  • Maximiliam Luppe says:

    I’m very happy to be part of this achievement as a Kickstarter backer. Congratulations!

  • Brian Miller says:

    I am backer #2262. When I received my cluster, I was expecting to be able to buy the 64-core variant later on. And so I have waited for over two years. Now that you’ve announced the 1024-core Epiphany V, I look on with enthusiasm. However, since I can’t buy the 64-core variant, I’m sure I won’t be able to purchase this.

    How do I feel? I am on the outside of the candy store looking in. Back in 2007, Intel showed off an 80-core CPU. And I thought to myself, it would be so great to work with that. After all, I had worked with the Celerity, the first minicomputer to use a 32-bit microprocessor (two processor boards, each with two FPUs and an integer coprocessor). And so here comes the Parallella, and I thought to myself, “Yeah! This will be fun!” And … No interconnects. No 64-core, except for the initial Kickstarter. Supercomputer.io came and went.

    Yes, I have Pi’s, I have the Nvidia Jetson TK1, and I have other small strange things. I’ve supported Seti@Home and BOINC. Even though over 10,000 boards have shipped (how many now?), it’s still as if there’s a chicken-and-egg problem.

  • daniel says:

    Awesome, Congratulations!

    Any plans to deliver a Parallella around this new chip?

  • Thomas says:

    Promises are nice. I hope this will be a stable product this time… I burned my hands designing with the 64-core version when it went EOL before it got into mass production…

  • Santosh says:

    Fantastic……can’t wait to get hold of this

  • Jack says:

    Checking the blog for months waiting for the big news! =)

    Awesome, but don't forget to add a decent memory controller on your next board.
    All competing boards either have a massive amount of fixed memory or SODIMM slots alongside their CPUs.
    It would be awesome to have this amount of computational resource and a place to park a ton of data.

  • edward says:

    This is a huge step forward for the Parallella project, as the smaller chip didn't really offer much of a difference relative to Intel's offerings, which run up to around 20 cores; 64 limited-power cores vs. 20 full-power cores was not a compelling proposition. What this chip finally offers is a resounding price/performance advantage over Intel's rather expensive "big iron". For example, in the press release for Intel's E7-8890 v4 chip (which costs over $12k per chip), they talk about 8-socket boards with 192 cores that "start at $200k". If you imagine 10 of the Parallella 1024-core chips, suddenly you have something very competitive: a 10,000-core system at under $20k would be a boon to research everywhere.

    Really, at this stage of the evolution of massively parallel machines, the key thing is to get these systems into the hands of graduate schools around the world, so that new programming languages can be worked on. I know Tucker Taft's parallel language project, ParaSail, hasn't gotten much traction because hardly anyone has 1,000 cores to program, so I am hoping the best for this new chip. The Nvidia massive-core systems are not easy to program, having been retrofitted from 3D graphics chips, and they present many weird quirks to the programmer. The Parallella architecture is much cleaner and tremendously simpler.

    It remains to be seen how many algorithms can live within 64MB, but I suspect that the doubling of the RAM in this 5th-generation chip will do the trick for most people. Once you start manipulating images, memory gets eaten up fast. This whole process is going to take years to play out, but it is a great step forward.
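
    The price/performance argument above works out roughly as follows (list prices as quoted in the comment; the Epiphany system cost is the commenter's guess, not an announced price):

```python
# Rough cost-per-core comparison from the figures quoted above.
# These are list-price estimates, not measured price/performance.
intel_system_usd = 200_000    # 8-socket E7-8890 v4 system, "starting at $200k"
intel_cores = 8 * 24          # 24 cores per E7-8890 v4 socket
epiphany_system_usd = 20_000  # hypothetical 10-chip system (commenter's estimate)
epiphany_cores = 10 * 1024

print(f"Intel:    ${intel_system_usd / intel_cores:,.0f} per core")
print(f"Epiphany: ${epiphany_system_usd / epiphany_cores:,.2f} per core")
```

    Even if the per-core throughput differs by an order of magnitude, a three-orders-of-magnitude gap in cost per core is what makes the comparison interesting.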

  • bob says:

    I am a bit worried that the development money came from DARPA; they are well known not to be a nonprofit organization, and everything they do is because they have some "grey" plans for it. On the other hand, the community was not able to provide all the money needed, so the "necessary evil" had to be accepted by Parallella. All in all, this 1024-core chip seems to be "the next big thing" in the "number crunching for everyone" field, and we may come to count the time BEFORE it and AFTER it, as we started to do in 2012 with the Raspberry Pi in the embedded field. Low size, low energy, and 16nm are good; the small amount of RAM is a question mark for many kinds of calculation, but it is a good beginning. The scaling of efficiency is also unknown; it depends on many factors (type of algorithm, data to be moved inside the mesh network, etc.), and only tests in the field will answer that question. What we need to know now is the price of the chip: if it comes in at $1000 it is not a democratization of massive calculation; if it comes in at $100 it is. Of course Parallella must get back, as a monetary reward, all the effort they put into the project. I hope Parallella strikes the right balance between monetary reward and attention to the community when they set the final price, and that part of the revenue from 1024-core chip sales gives them enough to independently develop a Parallella-VI with 16384 cores and 1MB of RAM dedicated to every single core 🙂

    We can do some speculation: if the Parallella-IV 64-core at 28nm runs at 500MHz, the 1024-core at 16nm should reach 800-900MHz.
    For sure it will need at least a heatsink, and in the worst scenario also a fan.
    Aside from the single-chip board you will make, are any luxury versions planned, i.e. a mainboard with 4 sockets taking Parallella-V SOMs, for a system scalable from 1K to 4K (or more) cores?

    Hope we will get some preview before spring 2017.

  • Andy says:

    If it has "extensions" for deep learning (I'm not a hardware/chip design person myself), how will programmers make use of them? I have searched for ways to use the Epiphany with TensorFlow or Theano, but gotten nowhere. At least for me, developing machine learning models with C++ on bare metal is not an option.

  • Mike Ross says:

    I share bob's concerns over DARPA funding, and also daniel's question about a new board based on this chip. If this announcement is for real, then a board based on this chip would be simply astonishing. The potential for advanced algorithm development would be phenomenal, especially if it could be made low-cost and affordable. However, I really doubt that DARPA or the NSA would like to see this technology in the public domain. I suspect that any production version made available to the general public would probably be scaled down in some way. Anyway, I don't get too excited about these announcements anymore. I still remember when they said 64-core boards would be available, but it seems only a few got those. The rest of us had to make do with 16. Sorry to be a killjoy, folks, but all this just sounds too good to be true.

  • bob says:

    Mike Ross, you are right. Too often I've heard "we will do" instead of "we already did". Too often people promise "miracles" that fail in reality (I'm not referring to Parallella, just talking in general). We must be realistic: let's wait for the chip to arrive, wait for the board to be ready, wait for the benchmarks to be done. Only then can we say "it is a success".

  • Neel Gupta says:

    Finally !
    So, when can we buy it ?

  • Neel Gupta says:

    wait… “DARPA funded” ?
    What would be the repercussions of that ?
    Will we actually be able to buy it ?
    Will it have backdoors, like all Microsoft products ?

  • kamikaze says:

    no real board in stores – no trust

  • […] McKee). This work continued in 2016 as we needed a way to validate our design decisions for the 1024-core Epiphany-V.  Debugging with the simulator is an order of magnitude easier than with hardware, so you should […]

  • jeff says:

    wow, great job

  • James Preisig says:

    Andreas,

    I work at another end of the spectrum from what this chip seems to be designed for (TFLOPS of processing power). My systems are embedded, and I need them to come in at about 2 watts for the core processing functions. Within that, I need about 150 to 160 GFLOPS (single-precision floating point) of real processing capability. This may be at the low end of what your new chip can do. If so, I would hope that cores can be disabled to save power when they are not needed. You are calling this an SOC. Will there be any control processor, or even a soft-core processor on an FPGA, that can "run" the system, or will this chip need a companion chip that handles its interface to the outside world?

    Looking forward to seeing the new chip and its performance and power consumption. What is the size of the new chip?

  • James says:

    I really like the improvements. I was expecting Adapteva to go the full 4096 cores, shrunk to 14nm, but that wouldn't have left much room for debugging, optimisation, and other improvements, due to the substantial increase in cost.

    I’d love to see how well this scales up in performance compared to the previous generation and would be happy to test it 🙂

  • Roger says:

    congrats man. you guys have come a long way and this sounds really impressive.

  • Ali Azarian says:

    Congratulations! That's great news!

  • Carlos Perez says:

    DARPA funded the internet. They fund a lot of stuff.

    As I understand it, this is 64KB per processor. So that's 64KB for both code and data?

    It is an interesting architecture, and we will have to warp our minds to figure out how to use it in embedded applications.

    It is not going to work well for training deep learning because of the memory bandwidth bottleneck. However, it should work okay for inference, assuming that we can exploit the estimated 1 teraflops capability.

  • Eugen Leitl says:

    The 64 kByte of embedded SRAM in each node doesn't have much of a bottleneck. Accessing remote node memory is penalized by latency commensurate with distance. So this assumes access locality, which is quite often a given for many problems.
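
    The locality point can be made concrete with a toy hop-count model (an assumed cost model for illustration, not measured Epiphany-V latencies):

```python
# Toy model: on a 2D mesh NOC, remote-access latency grows with the
# Manhattan distance (hop count) between cores. Assumed model for
# illustration only, not measured Epiphany-V numbers.
def hops(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Manhattan distance between two (row, col) core coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

# 1024 cores arranged as a 32x32 grid:
print(hops((0, 0), (0, 1)))    # nearest neighbour: 1 hop
print(hops((0, 0), (31, 31)))  # opposite corner: 62 hops
```

    Keeping communicating kernels on adjacent cores keeps hop counts, and hence latency, low; that is what "access locality" buys you on this topology.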

  • Amit says:

    This might work very well for training deep learning too; we just need to push the envelope enough 🙂 Waiting for the actual product that we can buy and experiment with.

  • Victor says:

    Congrats! Anything you can release on the new deep learning capabilities will be widely appreciated.

  • Asterion says:

    I/O signals not pins

  • Asterion says:

    Yes, vapourware until it turns up on the market.

  • dast says:

    What would be the price!? Could we buy it now!?

  • dast says:

    where could we buy it now !?

  • Kie says:

    I want one too!!!

  • Traroth says:

    Any news yet?

  • SeyedRamin says:

    Where could we buy it now !? How much does it cost?

  • Tom says:

    Are we there yet? I feel like a 5 year old waiting for this!

  • David says:

    Any chance of getting a status update? Especially on availability and estimated cost?

    It has been 6+ months since the announcement of the tape-out. At the very least, when should we check back for an announcement? Perhaps there is a mailing list we could join, to make sure we get the announcement when it comes out?

  • name says:

    So? Almost 7 months have gone by since the announcement!
    No update? No info? No nothing?
    Shall we reclassify this post from "good news" to "vaporware"?
    If there is a delay it can be acceptable, OK, but at least let us know!

  • R.L. Flores says:

    We are finalizing the resurrection of our S&L banking, financial and investment A.I. engine and application. Our system was developed and implemented using the Texas Instruments’ “Explorer” Lisp engine and accelerated/processed by AMD’s 2900 slice processor family. We are building a new engine that will run our inference engine, pattern recognition engine, etc. We’d like to begin design-in of an Epiphany V cluster of 128/256 and need an availability date. Please advise.

  • Patrick Law says:

    Cannot wait for this to come out so I can do a review for my readers! This is going to redefine the supercomputer market again.

  • Baracat says:

    Any fresh news? =) Regards,

  • neil says:

    Getting close to a year now, and I'm getting scared.
    I was around for the initial Kickstarter, but was a senseless teen at the time with no brains to save up for the original board.
    I've since (rather recently) come up with a great idea for personal use of a 64-core or higher chip.
    Now here's where my fright enters the room. Usually I can find things I once wanted to purchase within a few hours, but after days of looking I've only been able to find the 16-core chips for sale, and at an inflated price of ~$150.
    I did find your page explaining the price jump, and I understand. But after reading this, I was excited to see a 1000+ core chip, yet a year later there is no news. My hopes of getting hold of anything above 16 cores are drying up.
    This was the first piece of hardware whose release I was ever excited about, back in 2012. It's starting to look like I'll once again be moving on.

  • Ladislav Jech says:

    But what the performance really is, is another story here… I wasn't able to find any realistic comparison.

  • Ladislav Jech says:

    This is a lonely project now; I hope you replace the leader and get 1024-core clusters into production.

  • xman says:

    Hello

    I want to purchase a few new devices. Where can I get them?

  • J.G. says:

    It's dead, Jim!

  • Send Help says:

    Damnit! I want this! No updates?!?!?

  • If you do not make C-newbie-level examples, with a wider set of boards (kids spend thousands of their parents' dollars on GPUs, so they would happily spend $500 on an innovative board too), you will stay at the educational/Kickstarter stage. Do not go the RPi way, as they really make their money on "shields", not on the RPis themselves. Your processors should run reliably on a motherboard in a minimal configuration (this is one reason, among many, why AMD's CPUs are not evolving), and be prototype-friendly. The lack of next-generation boards is discouraging. Also, processor comparisons should be more readable, since no processor can be the greatest at everything. People are interested in:
    - number of cores,
    - language support (users prefer Java; vendors prefer C and C-based C++). I would choose C-based C++11 (templates, some other useful stuff),
    - size of memory (less important) and its throughput (very important),
    - A COOL-LOOKING COOLING SYSTEM ON THE BOARD: at least six heat pipes, and the biggest fan one can imagine. RGB LEDs are standard (this is not a joke: hardware must be "sexy"),
    - a SATA III interface, or some simple battery-backed DDR4-SSD hybrid (via supercapacitor, for power-off handling) for proper OS deployment. Lubuntu runs smoothly on a 4GB HDD. Also, transferring data and instructions in an in-RAM DMA manner should give some basic enhancement,
    - nobody cares whether it is a single Epiphany-V CPU or 32 Epiphany-V CPUs on a single die; big numbers make an impression even when they are pointless (the Intel AMD64 microarchitecture has a real peak memory throughput of a few GB/s, but they advertise 54GB/s and above… ;D). Do not go that way: publish real numbers,
    - Intel sells already-fabricated CPUs as non-cut silicon dies; some cheap Celeron could be good enough as a host, deploying work to an N-dimensional mesh of your coprocessors. There are plenty of already-proven models for this among GPGPU coprocessors,
    - PCIe coprocessors (and their libraries) are easier to maintain than separate boards.
    Post Scriptum: I am only an amateur, so do not rely on my opinion, as I may unintentionally mislead.
    Post Post Scriptum: With all respect to the founder of Adapteva, any single-man business is not optimal.

  • NGGRPhTN says:

    So, when are we going to get them for our cell phones?

  • Steven Douglas Gould says:

    I would like one also. When do these go on sale?

  • Sam says:

    Where to buy this?

  • Lex says:

    If the administrators of this site are reading this, it would be wise to renew or acquire a new certificate for the website, so people aren't turned away from your site.

  • It is an open-source RISC-based ISA, along with open-source implementations of example processor cores. Then you could have had a processor that was completely open and did not include any proprietary code. The chip is about the same size as the Apple A10, so in terms of silicon area it is in the consumer domain, but the price will only come down to consumer levels if shipments get into the millions of units. Big companies take a leap of faith and build a product hoping that the market will get there. Small companies get one shot at that. With university volumes and shuttles, we are talking 1 costs. So the $300 GPU PCIe-type boards become $10K-$30K with NRE and small-scale production folded in.

  • Marie Maurer says:

    Any update on this great chip?

  • Nureddin says:

    If this is true, I would like to buy 1000 units.

  • Michael DeByl says:

    Virtually useless without DMA shared memory access.
