Talk:RAID/Archive 6

Minimum number of drives
The column "minimum number of drives" is just plain incorrect. First of all, it appears to refer to a nondegenerate minimum number of drives which is completely different from the actual minimum number of drives (for example, a 3-disk RAID-1 can be considered a degenerate 3-disk RAID-6; this is important for implementation since it may be useful for reshaping), but second of all, requiring 4 drives for RAID-5 is flat-out incorrect. Hpa (talk) 02:39, 3 January 2012 (UTC)
 * Yes the minimum number of drives for RAID 5 was indeed wrong, and is now fixed. Robert Brockway (talk) 01:43, 4 January 2012 (UTC)

Triple-mirror? +N mirror?
Is there a standard for triple-mirroring? This is basically the same thing as a standard two-drive mirror configuration, except the data is written to three drives at once, and can be read from any of the three. Any two drives can fail and no data is lost.

This would be an acceptable alternative to RAID-6 when you have huge 300+ gig hard drives, but only say 100 gig of server data. No parity engine is needed, just write duplication across independent parallel controller channels for maximum performance.

+N Mirror basically means data could be mirrored across however many drives you want, up to the controller's interface limit. Eight redundant drives? Sure, why not.

DMahalko (talk) 03:27, 24 February 2012 (UTC)
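As a rough illustration of the +N mirror idea described above, here is a minimal Python sketch. The `MirrorSet` class and its method names are hypothetical and do not correspond to any real controller or driver API. Drives are modelled as in-memory dictionaries, and the sketch only shows the write-duplication and read-from-any-survivor behaviour.

```python
class MirrorSet:
    """Toy N-way mirror: every write goes to all live drives, a read uses any live drive."""

    def __init__(self, copies):
        if copies < 2:
            raise ValueError("a mirror needs at least two drives")
        # Each drive is modelled as a dict mapping block number -> bytes.
        self.drives = [dict() for _ in range(copies)]
        self.failed = set()

    def write(self, block, data):
        # Duplicate the write onto every drive that is still alive.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[block] = data

    def fail(self, index):
        # Mark one drive as dead; its contents are no longer used.
        self.failed.add(index)

    def read(self, block):
        # Serve the read from the first surviving drive.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                return drive[block]
        raise IOError("all mirrors have failed; data is lost")


triple = MirrorSet(copies=3)
triple.write(0, b"server data")
triple.fail(0)
triple.fail(2)
print(triple.read(0))  # still readable from the one surviving drive
```

With three copies, any two drives can be marked failed and the read still succeeds, which matches the triple-mirror behaviour described in the comment.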

Array failure rate
The Array Failure Rate, as defined in the table, lists the probability of data loss in the given period under the assumption that no drives are replaced during this period. This metric, while in itself valid, is not what one would typically expect as the "failure rate". Leaving a dead drive without replacement is against any maintenance practice and/or recommendations. I suppose we should at least add another footnote clarifying this? Alexey V. Gubin (talk) 19:44, 12 March 2012 (UTC)

RAID 2
According to Patterson and Hennessy 2012, RAID 2 is Error Detecting and Correcting Code, and RAID 3 is Bit-Interleaved Parity. This is different from what is in the article. Espertus (talk) 00:40, 29 March 2012 (UTC)

Neutrality in RAID 5 vs RAID 10 for databases?
If the section concludes, "In short, the choice between RAID 5 and RAID 10 involves a complicated mixture of factors. There is no one-size-fits-all solution, as the choice of one over the other must be dictated by everything from the I/O characteristics of the database, to business risk, to worst case degraded-state throughput, to the number and type of drives present in the array itself. Over the course of the life of a database, one may even see situations where RAID 5 is initially favored, but RAID 10 slowly becomes the better choice, and vice versa.", I don't see how the neutrality of the section can be called into question. — Preceding unsigned comment added by 68.104.214.16 (talk) 15:39, 30 April 2012 (UTC)

"RAID Array" terminology
Isn't saying "RAID Array" just like saying "ATM Machine"? Why stop at array? Why not call it "RAID Array of Independent Redundant Discs"?

173.13.189.193 (talk) 22:15, 17 September 2012 (UTC)
 * We report the way it is covered in reliable secondary sources. Taking a casual survey of sources provided in the article,    are just some of the articles which use the term "RAID array". So I am sorry to burst your bubble, but this is how Wikipedia works. It is not perfect, but when the rules are consistently applied, it is predictable. Elizium23 (talk) 23:18, 17 September 2012 (UTC)


 * Yes and no. A RAID subsystem is made up of multiple components, primarily one or more controllers and 2 or more drives. When referring to multiple components, a collective name is required, which is generally in this case also array. One could also make the same complaint of "A redundant RAID configuration" or "The RAID disks", but they're as valid as "The RAID Array". One could also make the argument that saying "The RAID failed" is ambiguous since it doesn't differentiate between the controller and the storage, or could be confusing to someone not aware of the context, though personally I suspect that would be fairly thin ice to jump on. There is also an argument to be made for the fact that a bare RAID isn't as nice to say as RAID Array. Personally I'd guess that it's come from an interbreeding with "Storage Array", maybe via "RAID protected storage", want of that collective noun, and once used it simply stuck, particularly since the "contraction" of RAID would be array in much the same way as ATM "contracts" to machine. Having said all that, "Good English is what makes one understood", which means we're as likely to be stuck with "RAID Array" as we are with "ATM Machine", regardless of how much correcting is attempted. (On an unrelated note, this is probably the first time I've logged in to this account in several years, so welcome back :P) Nazzy (talk) 02:20, 16 November 2012 (UTC)

Any citations to support claim for RAID 5 in IBM S/38?
In the history section the claim is made that: "In October 1986, the IBM S/38 announced "checksum" - an operating system software level implementation of what became RAID-5" which is one year before Patterson defined the term RAID. I have done a pretty thorough search for literature evidence for this and can find none. Does anyone have a reference? Otherwise I will remove it in the future. 121.45.215.68 (talk) —Preceding undated comment added 04:35, 2 December 2012 (UTC)

Section "RAID 10 versus RAID 5 in relational databases" problematic
The section "RAID 10 versus RAID 5 in relational databases" has several problems- the main one is that RAID5 is no longer recommended or used for any large data set, presumably including most databases. Could this section be generalised to compare parity versus non-parity (mirrored) RAID schemes? Also, there is a lack of references. 121.45.215.68 (talk) —Preceding undated comment added 22:44, 3 December 2012 (UTC)

Duplicate sentence
The sentence "a single drive failure results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt" appears twice in the description of RAID 5. Somebody should fix it. 173.34.179.126 (talk) 14:19, 22 December 2012 (UTC)
 * Done. 121.45.193.118 (talk) 22:09, 22 December 2012 (UTC)

Various improvements - feedback please.
I've just made a few edits to improve clarity.

I'm looking for constructive criticism regarding the following ones:

(and feel free to make them for me).

CONFUSING:

"Background scrubbing can be used to detect and recover from UREs (which are latent and invisibly compensated for dynamically by the RAID array) as a background process, by reconstruction from the redundant RAID data and then re-writing and re-mapping to a new sector; and so reduce the risk of double-failures to the RAID system[42][43] (see Data scrubbing above)."

Idea:

Add a subheading to this section and the one above what's CONFUSING

and

Mitigations
REWORD THUS:

Double....


 * Background data scrubbing is a RAID feature that [mitigates the problem|attempts to address the issue] by having a background process read drive blocks when a drive is idle and whenever a URE occurs, using the redundant RAID data to reconstruct, re-write and re-map the logical block to a new spare block, and so...

or perhaps


 * Background data scrubbing is a RAID feature that [mitigates the problem|attempts to address the issue] by having a background process read drive sectors when a drive is idle and whenever a URE occurs, using the redundant RAID data to reconstruct, re-write and re-map the logical sector to a new spare sector, and so...  Where the read triggers a Recoverable Read Error, the drive itself reconstructs, re-writes and re-maps the logical sector to a new spare sector...

Section on Atomicity: Wow, this is confusingly written. Perhaps it should kick off with the simple statement, like "Databases are designed to maintain data consistency despite the non-atomicity of the write process of the disk drives on which they normally store data, even when there is unexpected power loss." If I'm not mistaken, either disk drives can't perform atomic write processes, period, or ones that can are rare enough to merit little more than a parenthetical.

Write case reliability:

P(aragraph)3 doesn't mesh with P1 and P2. In fact all 3 are about write-back, but only P3 calls it that. Merge 'em.

Hardware Labeling:

Some systems have a feature intended to reduce operator error and facilitate drive failure management by having an LED by a failed drive indicate that it is a drive that has failed.

History:

REMOVE:With S/38 checksum, when a disk failed, the system stopped and was powered off. Under maintenance, the bad disk was replaced and the new disk was fully recovered using RAID parity bits. While checksum had 10%-30% overhead and was not concurrent recovery, non-concurrent recovery was still a far better solution than a reload of the entire system. With 30% overhead and the then high expense of extra disk, few customers implemented checksum.[citation needed]

doesn't make sense.

It's so confusingly written I don't know what it's trying to say or how to fix it.

--Elvey (talk) 11:42, 30 December 2012 (UTC)
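To make the scrubbing description in the proposed rewording more concrete, here is a minimal Python sketch of the reconstruction step, assuming a single XOR parity block per stripe (as in RAID 5). The function names are hypothetical, an unreadable block is modelled as `None`, and the re-mapping to a spare sector is not modelled at all.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(stripe):
    """Check one stripe; rebuild a single unreadable block from the rest.

    `stripe` is a list of blocks (data blocks plus one XOR parity block).
    An unreadable block (a URE) is modelled as None. Returns the index
    that was repaired, or None if nothing needed repair.
    """
    bad = [i for i, block in enumerate(stripe) if block is None]
    if not bad:
        return None
    if len(bad) > 1:
        raise ValueError("more than one unreadable block: single parity cannot recover")
    survivors = [block for block in stripe if block is not None]
    # XOR of all surviving blocks (data and parity) equals the missing block.
    stripe[bad[0]] = xor_blocks(survivors)
    return bad[0]


data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
stripe = data + [xor_blocks(data)]   # append the parity block
stripe[1] = None                     # simulate a latent URE found while scrubbing
repaired = scrub_stripe(stripe)
print(repaired, stripe[1] == b"\x10\x20")
```

The point of running this in the background is that the bad block is found and rewritten while the redundancy is still intact, rather than being discovered during a rebuild after another drive has already failed.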

Edit revert
I have reverted two edits by ( http://en.wikipedia.org/w/index.php?title=RAID&diff=523846804&oldid=523363466 and http://en.wikipedia.org/w/index.php?title=RAID&diff=523846913&oldid=523846804 ) due to the fact they removed the entirety of the Standard Raids section, removed part of the cite in the nested/hybrid section, and added no content. I'm going to assume this is either vandalism or accidental, though the IP does have other changes elsewhere that look less suspicious. Anything needed other than undoing the edit? Nazzy (talk) 15:58, 19 November 2012 (UTC)


 * Nope! If an edit seems accidental or otherwise incoherent, and is not explained in the edit summary (the editor you mention didn't leave edit summaries), simply undoing it is the best solution. On a similar note, always providing edit summaries helps others understand your edits. – voidxor (talk | contrib) 08:07, 19 January 2013 (UTC)

Claim that RAID 3 is theoretical and not used in practice is incorrect.
The article claims RAID 3 is a 'theoretical' RAID level and 'not used in practice.' This is incorrect, EMC Clariion and VNX support a RAID 3 option. — Preceding unsigned comment added by 98.229.192.30 (talk) 21:28, 14 December 2012 (UTC)


 * Yes true, there do appear to be several older implementations which used RAID 3- the article text has now been changed to say that RAID 3 is not widely used in practice, with a link to the BSD Unix implementation. I believe that is an accurate assessment of the current usage- is the RAID 3 option still supported in the product lines examples above? It would be good to find a reference giving estimated usage of these RAID levels in contemporary products.

121.45.193.118 (talk) 12:09, 20 December 2012 (UTC)

You can see the RAID 3 reference, as well as 0, 1, 1/0, 5, and 6 in the current (1/7/2013) datasheet for EMC VNX on their website. http://www.emc.com/collateral/hardware/data-sheets/h8520-vnx-family-ds.pdf — Preceding unsigned comment added by 168.159.213.60 (talk) 20:02, 7 January 2013 (UTC)

RAID 10 compared to RAID 5 section moved here for improvement
The section ==Mirroring versus parity RAID levels in relational databases== is very problematic- e.g. the claim that "Given the rare nature of drive failures in general, and the exceedingly low probability of multiple concurrent drive failures occurring within the same RAID, the choice of RAID 5 over RAID 10 often comes down to the preference of the storage administrator" is incorrect-- UREs are extremely likely to happen with large disks and thus RAID 5 is not recommended for large databases by the major storage manufacturers (as noted in the main article). The references given are to a non-authoritative web site and do not support the claims in any case. A well-referenced section on this topic could be useful, but the current section is misleading to users of RAID. The current text is copied below for improvement before possibly returning to the main article. 121.45.213.224 (talk) 06:54, 28 January 2013 (UTC)

RAID 10 space efficiency calculations seems off
It seems that RAID 10 and RAID 1 should have the same space efficiency formula, namely 50% since they both only use mirroring to tolerate single drive failures. — Preceding unsigned comment added by 99.245.3.15 (talk) 19:40, 1 February 2013 (UTC)


 * Typically, two-way mirroring is used. In that case, you are correct about 50%. However, mirroring can be set up with triple or quadruple redundancy for RAID users who are especially paranoid. Space efficiency would then be 33% or 25%, respectively. Since the size of a RAID array can scale like that to use more than the minimum number of drives, the comparison table uses the variable n:
 * For a RAID 1 with the minimum two drives: 1/n = 1/2 = 50%
 * For a RAID 10 with the minimum four drives: 2/n = 2/4 = 50%


 * – voidxor (talk | contrib) 05:15, 2 February 2013 (UTC)
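The percentages quoted in this thread all follow from how many copies of each block are kept. A small Python sketch, with a hypothetical function name, under the assumption that all drives are the same size: for a RAID 1 in which all n drives mirror each other the number of copies is n, while for a RAID 10 built from two-way mirrors the number of copies stays at 2 however many drives are added.

```python
def mirror_space_efficiency(copies):
    """Usable fraction of raw capacity when every block is stored `copies` times."""
    if copies < 2:
        raise ValueError("mirroring needs at least two copies")
    return 1 / copies


# Two-way, triple and quadruple mirroring.
for copies in (2, 3, 4):
    print(copies, f"{mirror_space_efficiency(copies):.0%}")
```

This reproduces the 50%, 33% and 25% figures mentioned above.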

I'd Just Like To Interject For A Moment
What you’re referring to as Linux, is in fact, GNU/Linux, or as I’ve recently taken to calling it, GNU plus Linux. Linux is not an operating system unto itself, but rather another free component of a fully functioning GNU system made useful by the GNU corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX.

Many computer users run a modified version of the GNU system every day, without realizing it. Through a peculiar turn of events, the version of GNU which is widely used today is often called “Linux”, and many of its users are not aware that it is basically the GNU system, developed by the GNU Project. There really is a Linux, and these people are using it, but it is just a part of the system they use.

Linux is the kernel: the program in the system that allocates the machine’s resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Linux is normally used in combination with the GNU operating system: the whole system is basically GNU with Linux added, or GNU/Linux. All the so-called “Linux” distributions are really distributions of GNU/Linux. — Preceding unsigned comment added by 175.100.80.20 (talk) 10:33, 4 February 2013 (UTC)


 * This is a controversial issue which, naturally, has its own article on Wikipedia. Please see GNU/Linux naming controversy. – voidxor (talk | contrib) 08:09, 5 February 2013 (UTC)


 * Ah, the war rages on. Always new meat adding to the mess on both sides. The simple fact is that GNU is no more critical to the operating system than the Linux kernel is. The difference is that the kernel is far harder to replace than the coreutils set.
 * You can start by replacing your shell with busybox (or, if you aren't quite as aggressive on the subject, zsh is a lovely shell)
 * gcc is nice, but clang is superior in some respects
 * glibc has valid alternatives, uClibc being one, but is quite hard to get rid of for most users. Android manages quite nicely though, as does much of the portable software targeting BSD.
 * I use KDE and its Qt base in preference to Gnome and GTK+ (incidentally, the G there is for gimp, the G in gimp is for gnu).
 * I have a solid dislike of the autotools chain and make a point of avoiding them when possible. SCons is a nice alternative, CMake avoids some of the pain.
 * I use Vim in preference of Nano and hate Emacs for much the same reason most other people hate Vim... arcane key bindings.


 * I could continue, but the point is that GNU provides a core set of software, some of which is easily and regularly replaced. Most users interact very little or not at all with that core software. That software by itself is completely useless. The naming is done by the way the distribution is made:
 * Select your kernel type (there are many, such as hypervisor or micro kernel)
 * Select the kernel flavor (the Linux kernel is popular, but there are others, Darwin for instance ... there are also patches on the kernel like SuSE, and you can write your own if you're bored)
 * Select a bootloader that will step between bios and kernel (grub is popular, but this selection is architecture based. Android, Alpha, MIPS and ARM, for instance, use other bootloaders. PXE is also an option for x86 platforms and doesn't use grub so far as I remember)
 * Select a compiler (gcc, clang and icc are fairly popular)
 * Select a C library (good luck ... you're pretty much stuck with glibc for now unless you are on BSD or Mac)
 * Select a basic userland (this is the coreutils set, wget, less, etc. None of these are actually irreplaceable.)
 * Select a package format and manager, then start filling out your software tree.
 * See how GNU software is actually a small part? You design your OS around the kernel and interface, not around the coreutil set. You choose a kernel and interface based on the purpose of the system. You choose the rest of the software by looking what you need to support your choice of kernel and interface.


 * So why am I posting this here instead of on the page voidxor linked to? Because this flame war has been running for nearly 2 decades and doesn't show signs of stopping soon. With RMS on one side and Torvalds on the other it's got another decade left in it yet. Posting there just adds my contribution to the mess, whereas posting a reasonably neutral response here will hopefully show people not involved with the war that it really doesn't matter what name you use. Linus has it best... if you want something called GNU/Linux, you're free to make a distribution called that. Everyone else is free to call their distributions whatever they like so long as they don't claim ownership of code they don't own.


 * Now for something that is relevant... The references to Linux within this page are one of two technologies:
 * Linux MD raid / Linux software raid. This refers specifically to the software raid implemented in the Linux kernel and supported by the userland software mdadm. The userland code was authored by Neil Brown of suse and is copyrighted to him; I assume the kernel portion is also authored by him, though that is irrelevant. This software is not authored by GNU, they have no claim over it.
 * Linux LVM. The history here is a little more murky, but so far as I can tell the code is hosted and maintained by RedHat, both the userland tools and kernel code I think. Not a GNU project.


 * Using "GNU/Linux" in these two contexts is completely wrong, they are very much concerned with the Linux kernel's implementation and, to a lesser degree, with the supporting admin tools. As such, the original request is spurious and can be safely ignored.


 * Nazzy (talk) 17:17, 5 March 2013 (UTC)

RAID 1 nX Read Performance?
Hello! I'm concerned the table might be slightly wrong. Specifically, under column "Read performance" for level RAID1, it is stated "nX", which supposes the performance will scale as read is split among the drives (same process as RAID0). I believe this is not accurate.

The article itself states, under the RAID1 section: "the read request is serviced by either of the two drives containing the requested data".

So which one is it? Does RAID1 allow split reading on numerous drives, or can only a single drive in the array be read? From what I know, and from what I found searching, this seems to be entirely dependent on the RAID controller being used. Findings in this blog is that RAID1 read speeds are either slightly lower or equal to the read speeds of a single drive (that was either both a Windows Software RAID1 and an Intel ICH5R integrated Motherboard (aka Fake RAID) controller). I wasn't able to find actual proof that some controllers do in fact support split reading, but some people do mention that certain OS are able to do it. Quantos88 (talk) 16:07, 29 March 2013 (UTC)


 * RAID1 doesn't say you can't so far as I know, it's implementation specific. Windows RAID and Fakeraid aren't exactly the best examples though, maybe see if something real like an LSI / 3ware card manages it? You also have to consider the controllers involved and whether the silicon implementing the RAID (controller or primary CPU) has a good enough link to the drives.  If you have a single drive tying up the transfer layer the second drive won't get to talk as much.  Less of a problem now we use SATA so much, I think. Nazzy (talk) 03:22, 2 April 2013 (UTC)

Data key for "Comparison"
The figures used in the "Comparison" table (currently section 2.9 as I write this) are very useful, but only to people who already understand the technology. For anybody else -- say, somebody trying to use this page as if it were out of an encyclopedia -- the lack of explanatory keys makes the right-hand column much less helpful than it could have been.

Specifically, we need a brief explanation of what "A1", "B2", etc, mean in this context. Yes, somebody familiar with RAID knows that these represent storage blocks (or other sizes depending on the RAID level), but there's no obvious "redundancy" in the images. Even for, say, RAID 5, it's not obvious that any of that data has in fact been stored more than once.

The reason I didn't try to add such a sentence is that I wouldn't know how to write it. I had pointed a colleague at this page when they asked for a breakdown of RAID levels, and they came back more confused than they had been at the start. 99.102.75.166 (talk) 14:43, 15 April 2013 (UTC)
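One way to explain the labels is to generate them. In the sketch below the letter names a stripe (A, B, C, ...), the digit numbers the data blocks within that stripe, and the suffix `p` marks the stripe's parity block, which is where the redundancy lives in RAID 5. The function name is hypothetical and the parity rotation used here is only one common arrangement, assumed for illustration; it is not taken from the article's figures.

```python
import string


def raid5_layout(drives, stripes):
    """Return a table rows[stripe][drive] of block labels for a toy RAID 5.

    Assumed rotation: the parity block starts on the last drive and moves
    one drive to the left with each stripe. Supports up to 26 stripes
    because stripes are named with single letters.
    """
    rows = []
    for s in range(stripes):
        letter = string.ascii_uppercase[s]
        parity_drive = (drives - 1 - s) % drives
        row, k = [], 1
        for d in range(drives):
            if d == parity_drive:
                row.append(letter + "p")      # parity for this stripe
            else:
                row.append(f"{letter}{k}")    # k-th data block of this stripe
                k += 1
        rows.append(row)
    return rows


# Each printed row is one stripe; each column is one drive.
for row in raid5_layout(drives=4, stripes=4):
    print("  ".join(row))
```

Reading a row across shows one stripe: three data blocks plus one parity block computed from them, so no data block is stored twice, yet any single block in the row can be rebuilt from the other three.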

Citations needed for array failure rate formulae in main table
In the main table, formulae are given for array failure rates but no citations are given to confirm them, and indeed they have been changed several times such that I have no confidence that they are correct. Explicit references for each of them are needed. I added a "citation needed" previously but this has been removed. 74.73.169.211 (talk) 16:43, 21 April 2013 (UTC)

Fault tolerance of raid 10
Under the table in 'Comparison' it is stated that "Raid 10 can only lose 1 drive per span up to the max of 2/n drives". Should this be n/2 drives? Michaelcochez (talk) 09:02, 15 July 2013 (UTC)
 * Yes of course. Although that also assumes only one mirror. Two mirrors (three disks per span) would tolerate 2n/3 etc. That begs the question of defining "n" of course, and generalizing to more than one mirror. I think? W Nowicki (talk) 23:56, 15 July 2013 (UTC)
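The n/2 and 2n/3 figures in this thread can be checked with a short Python sketch. The function and parameter names are hypothetical. It assumes n is the total number of drives and that every span has the same number of drives, and it distinguishes the guaranteed tolerance (worst case, all failures land in one span) from the best case (failures spread so that each span keeps one survivor).

```python
def raid10_fault_tolerance(n, drives_per_span=2):
    """Return (guaranteed, best_case) drive losses a RAID 10 can survive.

    n               -- total number of drives
    drives_per_span -- drives in each mirrored span (2 = one mirror copy)
    """
    if n % drives_per_span:
        raise ValueError("n must be a multiple of drives_per_span")
    spans = n // drives_per_span
    guaranteed = drives_per_span - 1           # worst case: all losses hit one span
    best_case = spans * (drives_per_span - 1)  # one survivor left in every span
    return guaranteed, best_case


print(raid10_fault_tolerance(8))      # two-way mirrors: best case is n/2
print(raid10_fault_tolerance(9, 3))   # three-way mirrors: best case is 2n/3
```

The "up to the max" wording in the table corresponds to the best case only; the guaranteed figure does not grow with n.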

reliability terms mistake
in the section "reliability terms" it says: "System failure is defined as loss of data and its rate will depend on the type of RAID. For RAID 0 this is equal to the logical failure rate, as there is no redundancy." however wouldn't the failure rate be higher, assuming each drive in the array has a chance of failure and the data is spread across more than one drive? ~patrick
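The intuition in this question can be checked numerically. The sketch below is only an illustration under the assumption that drives fail independently with the same probability p over the period of interest; the function name and the 5% figure are hypothetical. A striped array with no redundancy loses data as soon as any one drive fails.

```python
def raid0_failure_probability(p, n):
    """Probability that at least one of n independent drives fails."""
    return 1 - (1 - p) ** n


# The probability of losing the array grows with the number of drives.
for n in (1, 2, 4, 8):
    print(n, round(raid0_failure_probability(0.05, n), 4))
```

Under these assumptions the array's failure probability is higher than that of a single drive whenever n is greater than 1.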

Mirroring versus parity RAID levels in relational databases
A common opinion (and one which serves to illustrate the dynamics of proper RAID deployment) is that RAID 10 (a non-parity, mirrored RAID) is inherently better for relational databases than RAID 5, because RAID 5 requires the recalculation and redistribution of parity data on a per-write basis.

There are, however, other considerations which must be taken into account other than simply those regarding performance. RAID 5 and other non-mirror-based arrays offer a lower degree of resiliency than RAID 10 by virtue of RAID 10's mirroring strategy. In a RAID 10, I/O can continue even in spite of multiple drive failures. By comparison, in a RAID 5 array, any failure involving more than one drive renders the array itself unusable by virtue of parity recalculation being impossible to perform. Thus, RAID 10 is frequently favored because it provides the lowest level of risk.

Modern SAN design largely masks any performance hit while a RAID is in a degraded state, by virtue of being able to perform rebuild operations both in-band or out-of-band with respect to existing I/O traffic. Given the rare nature of drive failures in general, and the exceedingly low probability of multiple concurrent drive failures occurring within the same RAID, the choice of RAID 5 over RAID 10 often comes down to the preference of the storage administrator, particularly when weighed against other factors such as cost, throughput requirements, and physical spindle availability.

Calculation of Array Failure Rates?
The formulas for the Array Failure Rates in the Comparison Table were changed recently (April 2013). Does anyone have a reference for the calculations? Mobileseth (talk) 03:23, 24 August 2013 (UTC) mobileseth


 * I have seen edit warring over that math for awhile now. I would try asking the editor who last changed them (you mention April 2013?). Otherwise, if you have a reliable source for performance, space utilization, or failure rate formulas, please don't hesitate to replace the standing formulas with ones that you can cite! – voidxor (talk | contrib) 06:24, 25 August 2013 (UTC)


 * Thanks for responding! I believe that A) the current formula for the "Array Failure Rate" calculation as given is incorrect (it probably should be a summation based on the binomial theorem) and B) the "Array Failure Rate" isn't a particularly interesting way to think about the problem. Reference 3, A Case for Redundant Arrays of Inexpensive Disks (RAID), is a good source. In Section 6 the authors analyze the reliability of an array based on both its Mean Time to Failure (MTTF) and Mean Time to Recover (MTTR). They calculate the probability that an array can be recovered in time before another failure occurs. Perhaps the "Array Failure Rates" column should be replaced entirely with MTTF? Thoughts? Mobileseth (talk) 16:57, 25 August 2013 (UTC)mobileseth


 * I oppose using MTTF because it is in units of time and thus make and model dependent. Probability formulas are definitely the way to go here. Question is, are the current formulas correct and can we find a reference for them? – voidxor (talk | contrib) 20:50, 25 August 2013 (UTC)


 * I would like to understand your reasons for opposing MTTF a bit better. Could you elaborate? It was my understanding that the "drive failure rate" already contains units of time. The text states that "each of three drives has a failure rate of 5% over the next three years" which indicates that the failure rate has units of %/year. It seems to me that this value would also be dependent on the make and model for any specific drive. The formulas given in A Case for Redundant Arrays of Inexpensive Disks (RAID) using MTTF are still algebraic in nature, and use MTTF as an independent variable. Mobileseth (talk) 15:53, 26 August 2013 (UTC)mobileseth
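For reference, the "summation based on the binomial theorem" mentioned in this thread can be sketched in a few lines of Python. This is not a sourced formula from the article. It assumes independent drives with the same failure probability p over the period, no replacement of failed drives, and a RAID level that survives any combination of up to `tolerated` failures (so it fits RAID 0, an n-way RAID 1, RAID 5 and RAID 6, but not RAID 10, where survival depends on which drives fail). The function name is hypothetical.

```python
from math import comb


def array_failure_probability(n, p, tolerated):
    """P(more than `tolerated` of n drives fail), drives independent, no replacement."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(tolerated + 1, n + 1))


p = 0.05  # per-drive failure probability over the period, as in the example quoted above
print("RAID 0, 3 drives:", array_failure_probability(3, p, tolerated=0))
print("RAID 5, 3 drives:", array_failure_probability(3, p, tolerated=1))
print("RAID 1, 3 drives:", array_failure_probability(3, p, tolerated=2))
```

Because p is a probability over a stated period, the result is a dimensionless probability for that same period, which may help reconcile the two positions above.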

Drive error recovery algorithms
Biassed in putting SAS/enterprise drives in italics. Reduces the error handling to nothing more than setting TLER. ("How to not let a drive with bad blocks drop out so you need to replace it. Instead keep running with it! Yay!") — Preceding unsigned comment added by 2001:A60:11FA:6701:993E:921B:42E4:8DFE (talk) 19:41, 2 December 2013 (UTC)


 * I changed the italics to scare quotes. If you have other constructive criticism to offer, please be civil and a seasoned editor should be able to help. – voidxor (talk | contrib) 01:03, 4 December 2013 (UTC)

Article seems biased towards software raid and lowend uses
It doesn't even mention stuff like 520byte vs 512byte formatting or that normally drives in a hw raid get updated firmware that was tested for reliability. I guess next week someone will update it showing that only ZFS knows how to "check" a disk. — Preceding unsigned comment added by 2001:A60:11FA:6701:993E:921B:42E4:8DFE (talk) 19:08, 2 December 2013 (UTC)
 * Well, if you think most of the consumer/cheap RAID, including the Intel stuff, does much in hardware, you're deluding yourself pretty badly. It's mostly software/drivers, a bit of stuff in firmware, and minimal hardware. The article does back such claims with this source. If you know otherwise, cite your sources... Someone not using his real name (talk) 10:38, 14 December 2013 (UTC)

Semi-protected edit request on 3 April 2014
Please change 2/n to 1/2 in (1) the "Space efficiency" column of the Raid 10; (2) "Raid10 can only lose one drive per span up to the max of 2/n drives"

Oyang2010 (talk) 19:11, 3 April 2014 (UTC)


 * ❌: Thanks for pointing that out! Correct value is "n/2" as $n$ is the number of drives,  the article that way. — Dsimic (talk | contribs) 18:51, 6 April 2014 (UTC)

Justify Write Performance Claims in Table for RAID levels 4,5 and 6
The table claims that the write performance for RAID levels 4, 5, and 6 is (n-1)x, (n-1)x, and (n-2)x, respectively. Presumably n is the number of disks in the set and x is the write performance in IOPS of one such disk.

These claims are contrary to industry norms, which are x, nx/4, nx/6, respectively in the above list.

The article should provide a reference for the claims or they should be removed. — Preceding unsigned comment added by 168.159.213.208 (talk) 17:31, 16 April 2014 (UTC)


 * ✅: I have tagged these Citation needed. Hopefully an expert can revamp the expressions. Thank you for noticing it. – voidxor (talk | contrib) 05:07, 17 April 2014 (UTC)
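The nx/4 and nx/6 figures quoted in this thread correspond to the usual "write penalty" rule of thumb for small random writes. The Python sketch below only restates that rule. The function name and the example values (8 drives, 150 IOPS each) are hypothetical, and the RAID 4 branch simply returns the figure given in the comment rather than deriving it.

```python
# Back-end I/Os per small random host write (the usual "write penalty" rule of thumb).
WRITE_PENALTY = {
    "RAID 5": 4,  # read data, read parity, write data, write parity
    "RAID 6": 6,  # as above, with two parity blocks
}


def random_write_iops(level, n, x):
    """Estimated small-random-write IOPS for n drives of x IOPS each."""
    if level == "RAID 4":
        # Figure quoted in the comment above: the dedicated parity drive
        # is involved in every write, so writes do not scale with n.
        return x
    return n * x / WRITE_PENALTY[level]


for level in ("RAID 4", "RAID 5", "RAID 6"):
    print(level, random_write_iops(level, n=8, x=150))
```

Full-stripe sequential writes avoid the read-modify-write cycle, which may be where the (n-1)x and (n-2)x expressions come from; a citation would need to say which workload is meant.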

"Data is" vs. "data are"
I undid Dsimic's last edit, where several instances of "data is" were again changed to "data are". Numerous editors keep editing this article to change these back to use the singular verb form, in what can only be described as edit warring. Unlike traditional warring, though, I've observed Dsimic versus the world instead of just one editor.

While I agree with Dsimic that the plural Latin nouns usually end in "a" and single Latin nouns in "um", Wiktionary offers a very insightful third possibility. Basically, "data" can also be used as an uncountable noun—which is the norm in computing contexts like RAID. Surely that's because individual bits number in the trillions in today's machines and thus counting becomes a vain exercise. Similarly, water is made up of individual molecules but "a glass of waters" is poor grammar. I believe this to be why the vernacular supports the singular form, and why we should stick with it here. – voidxor (talk | contrib) 05:40, 14 June 2014 (UTC)


 * Hello there! I'm glad this is discussed, as "data are" sounds awkward no matter how correct (or not) it is.  I'm perfectly fine with using "data" as an uncountable noun, and "data is" sounds much better to me.  At least now I have a place to point other editors to once "data vs. datum" is raised again. :) — Dsimic (talk | contribs) 05:53, 14 June 2014 (UTC)


 * Not sure it will be a good place to point people to if you want "data is", since Voidxor's water example is going to convince people that supporters of "data is" can't think straight - "water" isn't the plural of "molecule" and besides no-one is going to talk about "a glass of molecules", but "data" is the plural of "datum"; in some contexts "data" is non-countable, in other contexts it isn't - probably in most contexts it is, but to insist that it is non-countable in every context is precisely as silly a piece of pedantry as to insist that it is never non-countable. Michealt (talk) 16:16, 22 August 2014 (UTC)


 * Hello! Well, quite frankly, I'm a bit tired regarding "data is" vs. "data are", but—in the domain of Wikipedia articles, of course—an article should be fine as long as "data" is used consistently either as a countable or as an uncountable noun. — Dsimic (talk | contribs) 10:29, 25 August 2014 (UTC)


 * I agree with Dsimic; both are correct. The most important thing is that we're consistent. With that said, we should probably go with the vernacular—which appears to be the uncountable "data is" according to the edit history. – voidxor (talk | contrib) 05:25, 28 August 2014 (UTC)

Reliable single disks
A recent edit by describes super-expensive SLED devices as “super-reliable” and the smaller, less expensive disks as “comparability unreliable”. That's not actually true, and I think it skews the description of the basic idea behind RAID. The leading motivation to develop RAID was the higher performance which could be realized; associated benefits include larger and more flexible storage capacity, and reductions in equipment cost, size, and power consumption. Single disk reliability wasn't a big factor. Unician &nabla; 07:52, 1 September 2014 (UTC)
 * 1) No one in 1987 thought the reliability of disk storage was a big problem.  Although rotating disks, as moving parts, are more likely to wear out under normal use than purely electronic parts, that failure rate was still not very high.
 * 2) The original RAID paper by Patterson et al. cites about the same mean-time-to-failure rating for all of the disk drives mentioned, tens of thousands of hours of operation; in fact, the biggest-and-most-expensive mainframe disk (IBM 3380) and the smallest-and-least-expensive PC disk (Conner CP 3100) have exactly the same MTTF rating (30,000 hours).
 * 3) The emphasis on redundancy in RAID is because the overall reliability of an array drops by a factor of the number of drives involved.  One small drive may have a MTTF of three or four years, but an array of 100 small drives might be expected to have a drive failure within two weeks.  That's why redundancy was necessary, not because the individual drives were unreliable.
 * Well spotted! I suspect that they *suspected* that the inexpensive drives were less reliable (despite having the same stated MTTF figures), but the key issue with an array of inexpensive disks is not that the drives themselves are less reliable, but that the array *itself* is less and less reliable the more drives it contains. Snori (talk) 01:53, 2 September 2014 (UTC)
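
For illustration only, the arithmetic behind point 3 can be sketched in a few lines of Python. The sketch assumes independent drives with a constant failure rate, so that the expected time to the first failure in an array is roughly the single-drive MTTF divided by the number of drives. The function name is hypothetical; the 30,000-hour figure is the MTTF rating quoted in point 2.

```python
def hours_to_first_failure(drive_mttf_hours: float, drive_count: int) -> float:
    """Approximate expected time until any one drive in the array fails.

    Assumption: drives fail independently at a constant rate, so the
    array-level failure rate is the sum of the per-drive rates.
    """
    if drive_count < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mttf_hours / drive_count


DRIVE_MTTF_HOURS = 30_000  # rating quoted from the RAID paper in point 2

single_drive_years = DRIVE_MTTF_HOURS / (24 * 365)
array_days = hours_to_first_failure(DRIVE_MTTF_HOURS, 100) / 24

print(f"single drive: about {single_drive_years:.1f} years")
print(f"array of 100 drives: about {array_days:.1f} days to the first failure")
```

Under these assumptions a single drive lasts between three and four years, while a 100-drive array sees its first failure in under two weeks, which matches the figures given in point 3.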

Early RAID 2
The 3330, actually the 3830 storage control, used a Fire code appended to the end of each block for error correction. It was not a Hamming code as used in RAID-2. More significantly, the Hamming code works in real time while the appended Fire Code works after a block has been buffered; so the reference to the 3330 should be deleted, regardless.

I am very certain that the 353 did use a Hamming code so in that sense it was similar to RAID-2. It was actually a Redundant Array of "Independent" Heads, 40 heads transferred in parallel, 32 data, 7 Hamming code and 1 parity. That's why the term "similar approach" is used in the article. I'd have to dig a bit to find a reference if this is disputed, but I am sure I can find one. Tom94022 (talk) 01:55, 15 January 2015 (UTC)

e.g., see page 157 of IBM 7030 Data Processing System Reference Manual; it discloses 32 data bits plus 7 ECC bits in the "high speed disk storage unit" which I am pretty sure is the IBM 353. Tom94022 (talk) 18:27, 15 January 2015 (UTC)


 * If you cannot find a clear and credible source that states what the article states, it doesn’t belong. Personal interpretations of technical material aren’t suitable. Strebe (talk) 07:33, 16 January 2015 (UTC)


 * There is at least one reliable source that the 353 was the disk file on the 7030. Tom94022 (talk) 08:27, 16 January 2015 (UTC)


 * That has nothing to do with it. The question is what any of this has to do with the article. Unless the source specifically notes the similarity of the 353 to RAID, then what you have is unverifiable WP:SYNTH and WP:OR. The debate going on above clearly demonstrates this. Strebe (talk) 11:00, 16 January 2015 (UTC)


 * What is left to debate? It is well established that both the TM and the 353 had 32 bit data words with parallel real time 7 bit ECC. It is also well established the 3330 used a serial (Fire code) ECC. 17:47, 16 January 2015 (UTC)


 * That entire list claiming “each of the five levels of RAID named in the paper were well established” has to go. The Wikipedia article is creating an account of history, and that is not allowed. Strebe (talk) 07:01, 17 January 2015 (UTC)


 * Hello, and sorry for my delayed response.
 * Regarding the "well established in the art" sentence, it's pretty much fine as it explains that the technology behind different RAID levels already existed in various products before "RAID" itself was coined as a term and the RAID levels were defined. Speaking of Wikipedia's history collection role, that sentence actually supports it by providing a more detailed timeline description.
 * Thank you very much, Tom94022, for clarifying the whole thing!  two references into the  section that confirm that IBM 353, as part of IBM 7030, used ECC codes. &mdash; Dsimic (talk | contribs) 16:18, 17 January 2015 (UTC)


 * This article expresses research done by editors (WP:OR) in the form of collecting information about certain early computers. Then it draws connections (WP:SYNTH) between the topic (RAID) and the computers that those editors believe express RAID characteristics. Those connections have to be drawn by credible sources, not by Wikipedia editors. Please read those policies so that you understand what I mean and why the practice is not appropriate. Strebe (talk) 22:39, 17 January 2015 (UTC)


 * Finding reliable sources is not original research and synthesizing from them is permitted. Tom94022 (talk) 23:53, 17 January 2015 (UTC)


 * Yeah, otherwise we'd be just copying and pasting from sources, which wouldn't make much sense. In the end, even rephrasing the sources could be seen as some kind of a synthesis. &mdash; Dsimic (talk | contribs) 02:15, 18 January 2015 (UTC)


 * I don’t know how to be more plain, particularly if you are unwilling to even try to understand the policies. Have you read them? These edits draw conclusions beyond those found in the sources. I do not dispute the sources, but the sources say nothing about RAID. Tying RAID to the sources is WP:SYNTH. It is not allowed. What •is• allowed is for you to research sources that describe the historical background of RAID, and to state the conclusions of those sources. Whether rephrasing could be seen as some kind of synthesis is irrelevant; I’m not interested in debating Wikipedia policy. Policy is as clear as feasible, and it contains answers to such questions. Instead, I’m interested in the article having reliable sources. The synthesis of editors is not a reliable source. Please find sources that state these connections between the listed computing devices and RAID. Strebe (talk) 03:12, 18 January 2015 (UTC)


 * Yes, I've read the guidelines a long time ago, and I still know them. Long story short, I'll try to find more references to additionally tie everything together.  Please, just give me some time. &mdash; Dsimic (talk | contribs) 03:38, 18 January 2015 (UTC)


 * Ok, found and a really nice reference in which Randy H. Katz, one of the authors of the RAID publication described in  section, clearly confirms it:
 * We were not the first to think of the idea of replacing what Patterson described as a slow large expensive disk (SLED) with an array of inexpensive disks. For example, the concept of disk mirroring, pioneered by Tandem, was well known, and some storage products had already been constructed around arrays of small disks.
 * This clearly takes WP:SYNTH out of the equation. Hope you agree. &mdash; Dsimic (talk | contribs) 05:10, 18 January 2015 (UTC)

Dsimic, nice find with the Katz quote. I believe I heard him say the same thing at some point but didn't bother to look for it because Strebe mis-states what the article says and then mis-applied WP:SYNTH. The RAID paper defines RAID-1 as disk mirroring. The literature of the far earlier DEC product discloses disk mirroring. It is no violation to then say disk mirroring existed before the term RAID-1 was defined. Ditto for each of the other RAID levels and for the introductory sentence. But that is moot now, thanks to your great work. Tom94022 (talk) 07:46, 18 January 2015 (UTC)


 * Thank you very much. The whole thing clearly wasn't a case of WP:SYNTH or WP:OR even before this additional reference, but hey, now we know for sure that the sky is usually blue. :) &mdash; Dsimic (talk | contribs) 08:02, 18 January 2015 (UTC)


 * That’s considerably better. Technically it’s still not sufficient, but I doubt you’ll get too much blowback, though it would not surprise me if some “cite” templates sprouted somewhere along the line on individual examples. I’m afraid your opinions about WP:OR and WP:SYNTH do not reflect policy. In order for that section to be up to snuff, a credible reference must state that all the elements (not some) of RAID were already in place by the time the concept was formalized, and the specific examples in the list would have to appear in some reference as examples of RAID, not as examples of disks that Wikipedia editors have themselves deemed meet RAID characteristics. Without those qualifications being met, anyone can come along and question whether the edits comprise WP:NOTABLE material by pointing out that, if Wikipedia editors had to make those associations (WP:SYNTH), it means the material is not WP:NOTABLE. I’m not going to spend more time on this topic, but again, I urge you to hone your understanding of the policies and their purpose since any similar edits you make are likely to be challenged eventually. Strebe (talk) 02:27, 19 January 2015 (UTC)


 * Please don't get me wrong, but I see that as unproductive nitpicking. While the  section is very well covered by references, there are numerous other not-so-short articles with very few or even zero inline references, and nobody questions them.  If we'd go strictly by the guidelines, such articles would be giant WP:SYNTHs or WP:ORs and should be immediately deleted. &mdash; Dsimic (talk | contribs) 03:53, 19 January 2015 (UTC)

Merger proposal
I propose that Standard RAID levels be merged into RAID. I think that the content in the Standard RAID levels article can easily be explained in the context of RAID, if it is not already. I cannot justify the duplication of effort when it comes to listing RAID levels. One example is the of a table of levels to the Standard RAID levels article, when a much more established table of differences already exists in the RAID article. Listing basic RAID levels in two places is redundant, so I removed it. I felt bad when doing so because an editor worked hard on the new table, having not seen the existing one. It's easy to see how one would land on Standard RAID levels and assume they are on Wikipedia's one-and-only RAID article. – voidxor (talk &#124; contrib) 06:49, 23 January 2014 (UTC)


 * Oppose: From one side, that totally makes sense, as that way some content would be deduplicated.  Though, if we do that, then merging Nested RAID levels and Non-standard RAID levels is also on its way, which would simply make the RAID article too large.  Also, there's more room for improvements to the Standard RAID levels article, in the form of more content to be added, which additionally supports the counterargument of the RAID article becoming too large. &mdash; Dsimic (talk) 14:53, 23 January 2014 (UTC)
 * I disagree that this merge would be a gateway to merge Nested RAID levels and Non-standard RAID levels; those topics aren't nearly as duplicated on RAID as the standard levels. Also, while your to RAID help to clarify that it is not the authoritative article on the levels, perhaps you should have waited until this discussion was closed before you changed the status quo! – voidxor (talk &#124; contrib) 06:48, 24 January 2014 (UTC)
 * I apologize, doing that edit was too much of "going ahead" from my side. Sorry! :(  At least, I hope you agree that it made the layout of RAID article much cleaner.
 * How about, maybe, a different approach – moving the content of  section into the Standard RAID levels article?  That would be another approach to deduplication of the content, while keeping (and improving) current relation between these four articles.  Also, we could add a hatnote to the "Standard RAID levels" article (and to "Nested RAID levels" and "Non-standard RAID levels" as well), pointing out the existence of an important "umbrella" article to anyone landing there.
 * Another reason behind opposing your proposal is that the merge would make "Nested RAID levels" and "Non-standard RAID levels" articles (and their summary sections) visibly much less valuable, thus more likely to be skipped by the readers. On the other hand, if we move the "Comparison" section as I just proposed, the "umbrella" article would provide an equal treatment to all categories of RAID configurations, achieving the content deduplication at the same time.
 * Thoughts? Once again, I apologize for running too fast. &mdash; Dsimic (talk) 19:44, 24 January 2014 (UTC)
 * I agree that your edits made it cleaner (which is normally a good thing), but I hope it does not affect the balance of this debate. Otherwise, apology accepted. Relocating the Comparison section into the Standard RAID levels article is another option for voters, but I am the nominator and thus recuse myself from commenting on it. Again, I would not worry about Nested RAID levels and Non-standard RAID levels right now as the majority of readers are here for levels 0–6. – voidxor (talk &#124; contrib) 07:01, 25 January 2014 (UTC)
 * Thank you. Let's see what the other editors are going to say. &mdash; Dsimic (talk) 17:18, 25 January 2014 (UTC)
 * Oppose: This is used by people in the workplace who want a quick reference. It works well and I refer to it 2 times a week.  — Preceding unsigned comment added by Lkshoe (talk • contribs) 17:01, 14 February 2014 (UTC)
 * This discussion is debating whether the information from both articles should be merged into one; nobody is suggesting deleting the reference information. – voidxor (talk | contrib) 00:24, 20 April 2014 (UTC)
 * Oppose: The article's overly-long as it is; arguably more of it should be split to sub-articles. Chris Cunningham (user:thumperward) (talk) 13:06, 8 April 2014 (UTC)

RAID 10 is a nested (hybrid) level
Hello, 67.6.185.65! As I've noted in my, RAID 10 is a nested (hybrid) level, not one of the standard RAID levels. Thus, it is wrong to add RAID 10 into the section; please see  section for further information. &mdash; Dsimic (talk | contribs) 20:01, 14 April 2015 (UTC)

Comparisons of RAID 5 and other RAID levels are incorrect.
In the summary of RAID 4, it is contrasted with RAID 2 and 3. A quote "As a result, more I/O operations can be executed in parallel, improving the performance of small transfers.[2]" This is only true for reads. It is untrue for writes. RAID 4 serializes writes and can perform only 1 at a time. This is because all writes, by definition, must update their respective parity data and in RAID 4 this parity resides on a single spindle. RAID 4 does one write at a time, whether small write or full stripe, exactly like RAID 2 and 3. Worse it does so without their bandwidth advantages.

Further the reference to NetApp's RAID-DP is misleading. It's not RAID-4. It's closer to RAID 6 since each stripe has two parity blocks and one is diagonally formed.

Next, the key advantage of RAID 5 is that its parity is distributed over all the members of the raid string. Unlike RAID 4, a write is not serialized by a single spindle. On a RAID 5 array the number of simultaneous writes can be as high as 1/3 of the number of members in the string, e.g. a 15 drive string could have 5 simultaneous writes occurring. A RAID 4 string can never have more than 1 regardless of its size. The article leads you to believe RAID 4 has some advantage, when it's all the other way.

Lastly, all of the negative points you make about RAID 5 are true for RAIDs 1, 3, and 4 as well. A read failure during rebuild or degraded operation (one drive failed) can happen for any of them and would be equally unrecoverable. You don't point any of that out.

The comments on RAID 5 have a 'sky is falling' tone and lack the above perspectives. — Preceding unsigned comment added by 73.16.141.73 (talk) 00:29, 7 December 2015 (UTC)
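
As a rough illustration of the parity-placement point made above, the following Python sketch lists which member holds the parity block for each stripe. It is only a model: the function name is hypothetical, and the RAID 5 rotation shown is one possible layout chosen for the example, not the layout of any particular product.

```python
def parity_disk(level: int, stripe: int, n_disks: int) -> int:
    """Return the index of the member holding parity for a given stripe.

    Assumptions: RAID 4 keeps parity on the last member; RAID 5 rotates
    parity by one member per stripe (an illustrative layout only).
    """
    if level == 4:
        return n_disks - 1
    if level == 5:
        return (n_disks - 1 - stripe) % n_disks
    raise ValueError("only RAID 4 and RAID 5 are modelled here")


N_DISKS = 15  # the 15-drive string used as the example above

# Collect the distinct parity members touched by writes to 15 stripes.
raid4_parity = {parity_disk(4, s, N_DISKS) for s in range(N_DISKS)}
raid5_parity = {parity_disk(5, s, N_DISKS) for s in range(N_DISKS)}

print("RAID 4 parity members:", sorted(raid4_parity))
print("RAID 5 parity members:", sorted(raid5_parity))
```

In this model every RAID 4 write lands on the same parity member, so writes queue behind one spindle, while RAID 5 writes to different stripes update parity on different members. The sketch does not attempt to reproduce the 1/3 figure quoted above.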

Could add something about assumptions: who is expected to benefit from this? No mention of Windows 7 Pro. User trying to do mirroring (RAID 1) having come across it in the create and format hard disk partitions section of control panel. Well the "system crash" section in this Wiki may be a warning to the newcomer, but Dsimic has labelled my point about power interruption as pretty much a nonsense. Is Dsimic suggesting the assumption that a UPS is what everyone would already have before thinking about RAID? Where would they have come across that knowledge? On a non-mirrored system a power failure is overcome easily at the next boot. It does not require a message sent to Microsoft &c, the way a "crash" may. Soundhill (talk) 12:00, 19 January 2016 (UTC) Soundhill

RAID 1 needs more
The article says: > The array will continue to operate so long as at least one member drive is operational. There is no description of the failure scenario. For people that are new to RAID, we could use a more complete description of what happens when a drive fails in RAID 1. For instance, is there a light or alarm, etc? Just another line or two would complete the description. Consider that a RAID 1 enclosure for two drives is the least expensive and most marketed RAID enclosure, so the Wikipedia description of RAID 1 is going to bring a lot of views. Thanks 75.110.98.103 (talk) 23:38, 4 February 2016 (UTC)


 * Well, any additional description would be highly system-dependent and confusing at least. For example, many people use software-based RAID, as a functionality provided by the operating system, so there's no fancy warning light in case of a drive failure. &mdash; Dsimic (talk | contribs) 10:19, 5 February 2016 (UTC)

Reference websites and other stuff
Hello, ! Regarding, there are three things:
 * Nobody owns anything, as you've described it for some reason; what is called peer review and articles actually benefit from it.  Moreover, it would be great if many more articles had active peer reviewers.
 * Article-level consistency should be more important than using all of the possible variants, and almost all references in this article have domains or hostnames for the values of their website parameters (where applicable, of course).
 * Wording can always be better, and small mistakes or typos should be discussed or corrected in a friendly manner instead of being called "cruft" that's "proudly" put into an article. If there's anything causing a disagreement, we're here to discuss it.
With all that in mind, I've and improved the wording a bit further. Hope you agree. &mdash; Dsimic (talk | contribs) 22:30, 20 April 2015 (UTC)

Any chance of mentioning the 'old' name for RAID; which was "Redundant Array of Inexpensive Disks"? 203.214.22.116 (talk) 01:13, 14 March 2017 (UTC) Oops! I just found it... Sorry. 01:15, 14 March 2017 (UTC) — Preceding unsigned comment added by 203.214.22.116 (talk)

RAID 6 minimum number of discs
I just dumped the citation needed for "Needs 4 discs". There's a good explanation on the Standard_RAID_levels page, but the summary of a 3 disc system would be:
 * Disc 1: Data
 * Disc 2: XOR of all the data blocks. With one disc that's just a copy of the data.
 * Disc 3: Clever correction based on the data. That need be no more than a copy.
(then rotates around the discs for the other data blocks) That's just a 3-way mirror. — Preceding unsigned comment added by Number774 (talk • contribs) 14:38, 5 July 2017 (UTC)
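
The degenerate case described above can be checked with a small Python sketch. It assumes the common Reed-Solomon style construction in which the first syndrome P is the XOR of the data blocks and the second syndrome Q weights block i by g^i in GF(2^8). The generator 2 and the reduction polynomial 0x11D are assumptions made for this example, and the function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D (assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def raid6_syndromes(data_blocks: list[bytes]) -> tuple[bytes, bytes]:
    """Return (P, Q) for equally sized data blocks, one block per data disc."""
    length = len(data_blocks[0])
    p = bytearray(length)
    q = bytearray(length)
    coeff = 1  # g**0 for the first data disc
    for block in data_blocks:
        for i, byte in enumerate(block):
            p[i] ^= byte                 # plain XOR parity
            q[i] ^= gf_mul(coeff, byte)  # weighted parity
        coeff = gf_mul(coeff, 2)         # next power of the generator
    return bytes(p), bytes(q)


data = b"RAID"
p, q = raid6_syndromes([data])
print(p == data, q == data)
```

With a single data disc the weight is g^0 = 1, so both P and Q come out identical to the data, which is the 3-way mirror described above. With two or more data discs Q differs from P.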

RAID 1 offers no parity
The article states RAID 1 offers no parity, but a mirror is an even parity. I think it adds no value to explain this to the reader, but maybe it can be corrected? — Preceding unsigned comment added by 145.53.72.151 (talk) 18:54, 19 January 2016 (UTC)


 * While technically correct, RAID 1 is never used as parity. To use as parity, the data from both disks would have to be constantly compared -- a large overhead on reads.  But even if a miscompare is found, data recovery isn't possible -- there's no indication of which bit is the wrong one.  In theory, this might still be done to detect data errors that were never reported by the disks, but for such errors to be common assumes that the disks are unreliable, which RAID 1 would only make worse by having more disks.  So, being overhead to implement and providing little help for a type of error that is assumed never to occur, RAID 1 as parity isn't done.  --A D Monroe III (talk) 16:20, 5 July 2017 (UTC)

Actually the above comment is wrong on several counts. First a RAID 1 array could have more than 2 members. HP OPENVMS provides this, up to 8 in fact in what they call a shadowset. With more than 2 disks you can catch inconsistent copies. Further it implies that parity systems check parity on reads, they don't. This would require reading the entire stripe, which is too much overhead. With RAID-5 this would be useless as you wouldn't know which version was correct. With RAID-6 some implementations scrub parity in the background, since there are three sources for each block, the block itself, first parity, and second parity. This allows for a voting mechanism. Uncommon RAID systems such as RAID-3 might actually check parity on each read but it is an academic point, there are very few if any commercial versions of them.

As well the article claims RAID-1 read performance is "slower than the fastest drive". In virtually all implementations a read is queued to the disk member with the shortest operations queue. In practice RAID-1 sets give superior read response times and total read rates as compared to single drives. — Preceding unsigned comment added by Sciencia61 (talk • contribs) 12:35, 19 June 2018 (UTC)
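
To make the voting point above concrete, here is a minimal Python sketch of comparing the copies of one block across mirror members. It is an illustration under simple assumptions (whole-block comparison, strict majority wins), not a description of how OpenVMS shadow sets or any RAID implementation actually resolves mismatches; the function name is hypothetical.

```python
from collections import Counter
from typing import Optional


def majority_copy(copies: list[bytes]) -> Optional[bytes]:
    """Return the block held by a strict majority of members, else None."""
    winner, count = Counter(copies).most_common(1)[0]
    return winner if count > len(copies) / 2 else None


good, bad = b"\x10\x20", b"\x10\x21"

# Two members that disagree: there is no way to tell which copy is right.
print(majority_copy([good, bad]))

# Three members: the odd one out can be outvoted.
print(majority_copy([good, bad, good]))
```

With two members a mismatch can only be detected, whereas a third member allows the inconsistent copy to be identified.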

Originally it was inexpensive.
You can see the original paper here: http://www.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf   RAID was originally a cost reduction method, to allow for the use of cheaper disks than were sold by the then large vendors (IBM and DEC). Parity was added to make up for the perceived unreliability of the cheaper disks. Later when RAID was commercialized, and made expensive by the likes of EMC and NETAPP, marketing efforts to justify the now high prices required the change from inexpensive to independent. You can read one of the original authors (Katz) commenting on the very issue in the last paragraphs of this paper: https://web.eecs.umich.edu/~michjc/eecs584/Papers/katz-2010.pdf — Preceding unsigned comment added by Sciencia61 (talk • contribs) 12:53, 19 June 2018 (UTC)

Added further quote from SNIA about Write Cache reliability
I added a further reference from the SNIA document explaining that while write caches had a power fail vulnerability, good implementations design for this and add methods to ensure no data is lost. In practice this is almost always a battery backup system, usually with redundant batteries. — Preceding unsigned comment added by Sciencia61 (talk • contribs) 14:07, 19 June 2018 (UTC)

RAIS/RAIN Relevance?
Footnote 82 is 404 (broken link) — Preceding unsigned comment added by 2607:FEA8:560:3F6:C534:53BD:159A:817C (talk) 14:47, 18 January 2018 (UTC)

This write-up is completely missing the transition from disk controller level RAID to computer system level RAID in which computer systems are virtualized as disks and entire computer system failure data was masked via fail-over.

SeaChange International invented and patented a system level RAID in 1995. Patent number #5,862,312.

By 1999 Hewlett Packard and Larry Ellison's nCuBE start-up successfully challenged the SeaChange patent's "network switch" claim in Delaware superior court, which led to wide use of the patent in computer cluster systems which is often called RAIN for Redundant Array of Independent Nodes. For instance, this is the original design idea of the first Isilon Systems product launched in 2001 according to Wikipedia.

The Wiki articles on RAIN, RAIS (systems) etc. also do not document the history of this virtualization from disks to anything replicatable that needs redundancy. Today we even have RAIN overload: Redundant Array of Independent NAND which you can find inside every SSD! — Preceding unsigned comment added by SeaChanger (talk • contribs) 20:38, 19 January 2019 (UTC)
 * The above was moved into its own section from its original posting in this talk. FWIW it's not clear to me that the Redundant_Array_of_Inexpensive_Servers is at all relevant to this article.  Also it is pretty clear that the earliest RAIDs (Apollo, Tandem mirrors) were system level and not controller level.  Tom94022 (talk) 00:46, 20 January 2019 (UTC)


 * Not relevant here. There are some superficial similarities but the technical concepts are quite different. We could add See also entries though due to the name similarity. --Zac67 (talk) 08:22, 20 January 2019 (UTC)

Grammar?
Shouldn't the following: "Consequently, using RAID for consumer-marketed drives can be risky, and so-called "enterprise class" drives limit this error recovery time to reduce risk...."

be changed to: "Consequently, using consumer-marketed drives for RAID can be risky, and so-called "enterprise class" drives limit this error recovery time to reduce risk...."

?

--176.20.208.76 (talk) 13:40, 22 February 2019 (UTC)


 * Thanks for the suggestion, I've changed the text. --Zac67 (talk) 08:11, 23 February 2019 (UTC)