Wikipedia:Reference desk/Archives/Computing/2019 May 10

= May 10 =

What do you call the "super Raspberry Pi" class of computers?
...well, that's how I think of them anyway - basically a small, boxy thing without peripherals but serious processing power. I'm looking for something to run month-long single processor simulation models on, and if that pairs an i7 processor and RAM with one USB slot and nothing else, it'd be fine. I just don't know what term to even search for here. Suggestions? -- Elmidae (talk · contribs) 16:47, 10 May 2019 (UTC)
 * The Raspberry Pi is a single-board computer; however, that says nothing about the onboard computing power. The terms "MiniPC" and Nettop seem to yield a large number of hits as well. Searching for your CPU of choice in combination with either term will likely point you to where you want to go. WegianWarrior (talk) 17:44, 10 May 2019 (UTC)
 * The Intel Next Unit of Computing is one famous line of extreme Small form factor computers, with versions that go up to i7, although these aren't really comparable to the Raspberry Pi line in either design or size. (Also they tend to have fancier GPUs. The super fancy ones have super fancy GPUs.) And given the performance you're demanding, they are very different in terms of price. There are lots of other similar things, both similar form factors by others, e.g. Nano-ITX [//www.aaeon.com/ru/p/nano-itx-motherboards-nitx-skl1] or Pico-ITX [//www.aaeon.com/en/p/pico-itx-boards-pico-kbu4], and custom designs, e.g. Gigabyte BRIX, Zotac ZBOX, Asrock Deskmini, Asrock iBox-R1000. There are even smaller form factors, e.g. the Intel Compute Stick, although those never had anything close to an i7. The Intel Galileo was probably the most similar in design to the Pis, which of course also means similar in terms of performance, actually a little lower than even the equivalent generation Raspberry Pi 1, I think. As WegianWarrior indicated, your question is fairly unclear. What do you actually need in terms of size, performance, power consumption, price, cooling etc? A super small device with no real cooling is naturally not going to provide much performance. And of course, the more performance you demand at a lower TDP, the more you need to pay. Also, especially if you are only using something single threaded, an i7 may not actually provide much advantage. They can be clocked slightly higher and tend to have more cache, but it depends on the precise model. And very importantly, given you are looking at a small device but, I assume, a very high usage rate of at least one core, don't expect such devices, especially the very small ones, to deliver maximum performance constantly. It's quite likely they will have to spend a fair amount of time throttling to keep thermals under control, or at a minimum won't be able to keep turbo up for long despite you only using one core. 
This is why careful consideration of your demands is paramount. For example, a device intended to have limited or no noise for a living room, office or similar situation (which, it's clear from the design etc, is the intention of many of the devices mentioned earlier) may not be necessary for something that can be dumped in a garage, attic, cupboard or whatever. Likewise for a device intended for a server room, where managing vibration, heat generation and to some extent noise across all those devices is probably important. (If your program is not single threaded, you should be considering how many cores you can meaningfully use. And I'm not sure what you mean by single processor. I mean it's quite likely something single processor makes the most sense considering price and other factors, but OTOH, the difference between an 8-core single processor and 2x 4-core processors may not be so extreme except for certain highly specific usage patterns, so again considerations of price, performance, heat, power usage etc come into play.) Nil Einne (talk) 04:49, 11 May 2019 (UTC)
 * BTW, to emphasise why the above matters, if you just want a cheap high performance device but don't actually need it to be super tiny, completely noiseless or extremely power efficient, a good bet tends to be buying ex-lease computers. These are often fairly cheap for the computing power, are smallish and have low noise. They often don't have extra GPUs, especially at the low end, although I don't see that this or other stuff matters except as it affects price. Random e.g. [//www.ebay.com/itm/Dell-Optiplex-7010-Desktop-w-Core-i7-3770-3-40-GHz-Activated-Windows-10/254225938091] Especially given the current state of Intel's year by year advancements, they could potentially be as high performing as, or at least not much worse than, even some of the fairly fancy NUC devices once the effects of thermal throttling etc come into play. They obviously aren't as small or as power efficient as some of the NUC-like devices but are far cheaper. Note in particular, don't assume 'small device=simple=cheap'. In reality you generally pay more for getting too small. Nil Einne (talk) 05:11, 11 May 2019 (UTC)


 * Blade server?  Pis don't have "serious processing power". Andy Dingley (talk) 10:15, 11 May 2019 (UTC)
 * When I've needed more power than a Pi in robotics applications I've found that the Odroid range isn't too bad, although it does get the power through multiple cores. I haven't had any experience with anything similar to a Pi using an i7, but the Minnowboard has the Intel Atom, and I've been attracted to the Nvidia Jetson range for real time graphics processing applications, so there might be something there. A search for "single board computer" and "i7" did turn up some hits - I'd be worried about cooling, but there may be options in that as well. - Bilby (talk) 10:57, 11 May 2019 (UTC)
 * I think one important question is if you have checkpoints (at least daily) in your code, so you can cope with failure. If not, you need very high quality components and probably a UPS for backup power. Or a high-end gaming laptop that can go into sleep if it runs out of juice. There have been cases of hyperintelligent pandimensional beings very pissed off when their computer failed 5 minutes before delivering the final result... --Stephan Schulz (talk) 11:22, 11 May 2019 (UTC)
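On the checkpoint point above: a minimal sketch of periodic checkpointing in plain Java, assuming the whole model state can be made `Serializable`. The names `CheckpointDemo`, `ModelState` and `model.ckpt` are hypothetical stand-ins, not MASON's API, and the "simulation step" is placeholder arithmetic. The state is written to a temporary file and then moved over the previous checkpoint, so a crash while writing should leave the last good checkpoint untouched.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical long-running model with periodic checkpoints (not MASON's API). */
class CheckpointDemo {

    /** Everything needed to resume the run after a failure. */
    static class ModelState implements Serializable {
        private static final long serialVersionUID = 1L;
        long step = 0;
        double value = 0.0;

        /** One simulation step; placeholder work only. */
        void advance() {
            value += Math.sin(step);
            step++;
        }
    }

    /** Write to a temp file first, then move it over the old checkpoint. */
    static void save(ModelState state, Path file) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(state);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Resume from the checkpoint if one exists, otherwise start fresh. */
    static ModelState loadOrNew(Path file) throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new ModelState();
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (ModelState) in.readObject();
        }
    }

    /** Run until totalSteps, saving every checkpointEvery steps and once at the end. */
    static ModelState run(Path file, long totalSteps, long checkpointEvery)
            throws IOException, ClassNotFoundException {
        ModelState state = loadOrNew(file);
        while (state.step < totalSteps) {
            state.advance();
            if (state.step % checkpointEvery == 0) {
                save(state, file);
            }
        }
        save(state, file);
        return state;
    }

    public static void main(String[] args) throws Exception {
        ModelState state = run(Paths.get("model.ckpt"), 1_000_000, 100_000);
        System.out.println("finished at step " + state.step);
    }
}
```

For a run lasting weeks, the checkpoint interval would more likely be time-based (say, hourly or daily) than step-based; the step counter is used here only to keep the sketch short.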

Thanks guys, that is lots of material to ponder! I admit the question was vague - that's because I'm not even sure what characteristics will be desirable. The use case here is agent-based models that run in an environment which is wedded to a single memory space (built in MASON (Java), FYI). As distributing the model is thus not an option, it all depends on local processor speed; and since we are looking at weeks to months per run here, it seems sensible to put the money into a fast processor and save on graphics card (there's no graphical output), peripherals etc. Hence my eyeing what I perceive as stripped-down workhorse systems. If that ultimately comes out as more expensive than just hijacking an ex-lease gaming rig, then it's probably not the way to go.

- Related question: The MASON documentation states: "MASON is not a distributed toolkit. [...] MASON was designed to be efficient when running in a single process, albeit with multiple threads. It requires a single unified memory space, and has no facilities for distributing models over multiple processes or multiple computers." - I'm interpreting that as being able to use multiple cores within one system for threading (as AFAIK thread distribution to cores is handled by the OS anyway, and cores share one memory space); but no functionality of distributing to something that makes use of physically separate RAM. Does that sound right? -- Elmidae (talk · contribs) 17:22, 11 May 2019 (UTC)
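To illustrate the "single process, multiple threads, one memory space" reading in general Java terms (this is not MASON code; the class and method names are made up for the example): several worker threads write into one array on the shared heap, and the OS decides which cores they run on. A rough sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo: threads in one JVM process share a single heap. */
class SharedMemoryThreads {

    /** Each thread fills its own slice of one shared array with i * i. */
    static long[] fillShared(int size, int threads)
            throws InterruptedException, ExecutionException {
        long[] shared = new long[size];              // one object, visible to all threads
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (size + threads - 1) / threads;
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(size, t * chunk);
                final int to = Math.min(size, from + chunk);
                pending.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        shared[i] = (long) i * i;    // disjoint slices, so no locking needed
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get();                             // wait; also makes the writes visible here
            }
        } finally {
            pool.shutdown();
        }
        return shared;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        long[] result = fillShared(1_000_000, cores);
        System.out.println("threads used: " + cores + ", last element: " + result[result.length - 1]);
    }
}
```

Nothing here is copied between processes or machines: every thread reads and writes the same `long[]`, which is the kind of setup a "single unified memory space" requirement allows.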


 * A multiple socket system is still multiple cores within one system albeit generally with each socket/physical CPU having its own local RAM in a Non-uniform memory access design for most modern x86-64 systems. (Older systems without IMC i.e. where the memory controller is not on the CPU are probably different e.g. Xeon equivalents of the Core 2 using LGA 771.) But there's no guarantee a system with a single socket has Uniform memory access anyway. Notably a number of the AMD Threadripper CPUs do not [//www.anandtech.com/show/11697/the-amd-ryzen-threadripper-1950x-and-1920x-review/3] [//www.tomshardware.com/reviews/amd-ryzen-threadripper-2-2990wx-2950x,5725-2.html]. I believe it's also accepted that Intel is likely to have this in the future when they too start to make MCM chips see e.g. the discussion to this article [//www.servethehome.com/intel-cascade-lake-ap-is-this-4p-cascade-lake-xeon-in-2p/] or this [//www.tomshardware.com/news/intel-cascade-lake-xeon-ap-e-2100,38017.html]. As noted there and discussed in more detail here, Intel also have their sub-NUMA design [//www.anandtech.com/show/12409/intel-launches-xeon-d-2100-series-socs-edge/4] [//software.intel.com/en-us/articles/intel-xeon-processor-scalable-family-technical-overview] and before that their cluster on die design [//kb.vmware.com/s/article/2142499]. As I understand it and somewhat implied by the later name, this does affect a single socket/CPU (although I think it's only something that is present in systems designed for multiple sockets). This is a complicated area of software and CPU design [//superuser.com/questions/916516/is-the-amount-of-numa-nodes-always-equal-to-sockets] and I barely understand it, but I again suspect it makes sense to go back to the basics. Does your workload actually need the sort of memory access speeds where NUMA will make a difference? If not, then it seems to me it doesn't matter. 
If it does, then besides ignoring a NUMA design, it probably makes sense to consider your memory access performance requirements in purchase decisions. For example how much difference will a dual channel system make over a single channel one? (Of course the performance difference between non local and local memory I think may often be an order of magnitude or more, so it may be that NUMA matters but multiple channels won't. But if it does, making sure your system is dual channel or whatever is often just a matter of making sure it uses 2 RAM sticks rather than one, so normally is very low cost.) Nil Einne (talk) 03:08, 12 May 2019 (UTC)
 * Forsooth, this more and more sounds like material for the actually hardware-savvy people on the project (not me). I'll punt it over to them and see who flinches first :] -- Elmidae (talk · contribs) 03:18, 12 May 2019 (UTC)
 * P.S. I read slightly more about AMD's Threadripper design and I think for those processors that support NUMA, although there is obviously a difference between local and 'remote' memory access (as implied by NUMA) the actual difference is fairly small [//www.guru3d.com/articles-pages/amd-ryzen-threadripper-2950x-review,4.html] as the processor to processor interconnect on the MCM is fairly advanced (and short) so 'order of magnitude' is not likely to be accurate. It may be true for multi socket systems relying on Intel QuickPath Interconnect or HyperTransport though. Not sure how things are for Epyc multisocket systems using Infinity Fabric [//www.nextplatform.com/2017/07/12/heart-amds-epyc-comeback-infinity-fabric/] or Xeons with Intel Ultra Path Interconnect [//www.tomshardware.com/reviews/intel-xeon-platinum-8176-scalable-cpu,5120-4.html]. That said, I perhaps didn't emphasise enough that I find it unlikely to be something that will need to concern you. My main point was to consider whether there was a performance reason to restrict yourself to single socket systems. In reality, even when buying refurbished ex-lease systems, multi-socket machines tend to be very expensive, so they probably aren't what you want to look at. (Discounting stuff too old to be worth it like Core 2 era systems.) And besides, even if they are in your price range, before worrying about NUMA, it's probably worth making sure your workload can reasonably take advantage of the 16 or probably more threads you'd expect from such a system. BTW I earlier mentioned performance of the memory subsystem, but of course performance has various measures like bandwidth and different aspects of latency. And these aren't always proportional; in fact, increased bandwidth can often come at the expense of latency. P.P.S. Reading the MASON thing you quoted, my read is it's only really talking about trying to run multiple copies of the program perhaps on physically separated computers, clusters etc. 
It may be you will have performance disadvantages from running on a system with multiple NUMA nodes since it probably doesn't know how to manage this, but maybe not if the workload isn't highly dependent on memory access performance. Remember NUMA/UMA is about uniform memory access. AFAIK all such systems will still have a unified address space and so programs are free to simply ignore that it's NUMA if they want, just with possible performance disadvantages. [//software.intel.com/en-us/articles/optimizing-applications-for-numa] (The OS should hopefully recognise it's NUMA and try to schedule things accordingly as best as it can but I would imagine this can be limited when the program itself is using all the threads.)  Nil Einne (talk) 18:13, 12 May 2019 (UTC)
 * OK, that's what I was hoping they had in mind. Will have to find out a little more about it though, I guess. -- Elmidae (talk · contribs) 19:18, 12 May 2019 (UTC)
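On the question raised above of whether the workload is actually sensitive to memory access performance: one crude way to get a feel for it is to time the same work with cache-friendly and cache-hostile access patterns. The sketch below is only an illustration under assumptions (the class name `MemoryAccessProbe`, the array size and the seed are arbitrary, and it is not a proper benchmark: no JIT warm-up, a single run). It sums one large array twice, once in sequential order and once in a shuffled order; a large gap between the two timings suggests that memory latency dominates this kind of access.

```java
import java.util.Random;

/** Hypothetical probe: sequential vs. shuffled traversal of one large array. */
class MemoryAccessProbe {

    /** Visit order 0, 1, 2, ..., n-1. */
    static int[] identityOrder(int n) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        return order;
    }

    /** The same indices in a pseudo-random order (Fisher-Yates shuffle). */
    static int[] shuffledOrder(int n, long seed) {
        int[] order = identityOrder(n);
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        return order;
    }

    /** Sum data[] following the given visit order. */
    static long sumInOrder(long[] data, int[] order) {
        long sum = 0;
        for (int idx : order) {
            sum += data[idx];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1 << 22;                       // about 32 MB of longs; size is an arbitrary choice
        long[] data = new long[n];
        for (int i = 0; i < n; i++) {
            data[i] = i;
        }
        int[] sequential = identityOrder(n);
        int[] shuffled = shuffledOrder(n, 42L);

        long t0 = System.nanoTime();
        long s1 = sumInOrder(data, sequential);
        long t1 = System.nanoTime();
        long s2 = sumInOrder(data, shuffled);
        long t2 = System.nanoTime();

        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms, sum " + s1);
        System.out.println("shuffled:   " + (t2 - t1) / 1_000_000 + " ms, sum " + s2);
    }
}
```

Profiling the real model would say far more than this toy does; the point is only that access pattern, not just clock speed, can decide how much the memory subsystem (and hence NUMA or channel count) matters.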


 * Elmidae, just rent a server for a month. Try here for some good offers.  67.164.113.165 (talk) 02:42, 14 May 2019 (UTC)