Independent Reference Model

The Independent Reference Model (I.R.M.) is a conceptual model used in the analysis of storage systems: disk drives, caches, etc.

Under this model the references to stored objects are independent random variables.

Description
The motivation for this model (and others like it) is to compensate for the scarcity of "traces" of such storage devices.

A "trace" is simply a logging of a set of data regarding the performance of the storage device, focusing on the I/O requests: how many read/write operations, the size of each request, the exact address (LUN-wise), and a time-stamp.

Accurate, valid and detailed traces of actual storage systems are very difficult to obtain for the purpose of academic analysis (for reasons that are beyond the scope of this article), which is why such models are a necessity.

Usually, the available data is much coarser and lower in quality than a full trace: for example, it might record, for every unit of time T, the number of I/Os that took place at each LUN (or track), along with the overall hit/miss ratios.

For example: for a disk drive with 4 tracks, A, B, C and D, after 15 minutes of work the I/O request counts were 7600, 20, 50 and 6000 for A, B, C and D respectively.

It is easy to see why this data is insufficient to determine the actual workload. Consider a second, even simpler example: two tracks, A and B, each of which received 1,000 I/Os during 15 minutes.

To answer the simple question, "How hard did the disk work during those 15 minutes?", consider the two following scenarios:
 * (I) The disk first received and responded to all 1,000 I/O requests at track A, and only then to all 1,000 I/Os at track B.
 * (II) The disk received and responded to requests from alternating tracks: first A, then B, then A again, and so on, up to 1,000 times each.

The amount of work done by the disk differs greatly between these scenarios: in the first, the disk did a minimal amount of work, travelling between the tracks only once, while in the second it did a maximal amount, travelling back and forth on nearly every request.
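The difference between the two orderings can be sketched with a short Python snippet; the helper function and its name are illustrative, not part of the article or any library:

```python
# Sketch: counting how often the disk head must move between tracks
# under the two request orderings from the two-track example.

def count_track_switches(requests):
    """Count consecutive request pairs that fall on different tracks."""
    return sum(1 for prev, cur in zip(requests, requests[1:]) if prev != cur)

# Scenario I: all 1,000 requests to track A, then all 1,000 to track B.
batched = ["A"] * 1000 + ["B"] * 1000

# Scenario II: requests strictly alternate between the two tracks.
alternating = ["A", "B"] * 1000

print(count_track_switches(batched))      # 1 switch
print(count_track_switches(alternating))  # 1999 switches
```

Note that a strictly alternating sequence of 2,000 requests yields 1,999 transitions, which is the "about 2,000" figure used later in the article.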

The I.R.M. was first introduced by E. Coffman and P. Denning, and it is still in active use today. It is the simplest such model.

In this "memoryless" model, every I/O reference represents an i.i.d multinomial random variable, whose outcome is the location of the next reference track. Thus, arrivals to any given track must occur at a specific average rate, which is directly proportional to the probability of requesting the track. More specifically, the arrival rate of requests to a given track equals its access probability, of being referenced, multiplied by the overall rate of requests.

That is, we denote by N the total number of I/O requests (both read and write) and assign to each track a probability equal to the number of I/Os addressed to it divided by N. In the case of our first example, N = 7600 + 20 + 50 + 6000 = 13,670, and we assign the following probabilities to each track:
 * A → 7600/N, B → 20/N, C → 50/N and D → 6000/N.
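This probability assignment can be sketched in a few lines of Python, using the counts from the four-track example (variable names are illustrative):

```python
# Sketch: assigning I.R.M. access probabilities from per-track I/O counts.
io_counts = {"A": 7600, "B": 20, "C": 50, "D": 6000}

# N is the total number of I/O requests across all tracks.
N = sum(io_counts.values())

# Each track's probability is its share of the total request count.
probabilities = {track: count / N for track, count in io_counts.items()}

print(N)                   # 13670
print(probabilities["A"])  # 7600/13670, roughly 0.556
```

By construction the probabilities sum to 1, as required for a valid reference distribution.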

The benefit of this model, beyond being simple and easy to work with, is its conservative property: when analyzing the worst-case scenario, the model's result cannot be off by very much, as illustrated in the following example:


Going back to the second example:
 * In the best-case scenario, the disk travelled between tracks only once (from A to B); in the worst case, it travelled back and forth close to 2,000 times.
 * Under the I.R.M. (via a technical computation that we omit here), the expected number of travels between tracks is 1,000.
 * That is, the model's result differs from the worst-case scenario by only a factor of two, while it overestimates the best-case scenario by a factor of 1,000!
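The omitted computation can be checked by simulation. The sketch below is a Monte Carlo estimate for a hypothetical two-track I.R.M. with equal access probabilities; the function names, seed and trial count are illustrative assumptions, not from the article:

```python
import random

def simulate_switches(n_requests, p_a, rng):
    """Draw an i.i.d. reference string over tracks A and B
    (track A chosen with probability p_a) and count track switches."""
    requests = ["A" if rng.random() < p_a else "B" for _ in range(n_requests)]
    return sum(1 for prev, cur in zip(requests, requests[1:]) if prev != cur)

rng = random.Random(42)     # fixed seed for reproducibility
trials = 2000

# Average the switch count over many independent reference strings.
mean = sum(simulate_switches(2000, 0.5, rng) for _ in range(trials)) / trials
print(mean)  # close to the analytic value (2000 - 1) * 0.5 = 999.5
```

Analytically, each of the 1,999 consecutive pairs switches tracks with probability 2 · p · (1 − p) = 1/2, giving an expectation of about 1,000 travels, as the article states.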

Indeed, it can be proven that the I.R.M. is always "off" by at most a factor of two.