Talk:Direct memory access

PCI Express Section
The PCIe section doesn't make much sense. It contains a single sentence, "PCI Express uses DMA. The DMA engine appears as another function on the upstream post with a TYPE 0 configuration header." First of all, obviously "port" is meant, not "post". But PCI Express doesn't use DMA and devices don't need to implement DMA to comply with PCI Express; PCI Express is a protocol which can be used for DMA. DMA engines aren't PCIe functions. Depending on architecture there may be a one-to-one, many-to-one, or one-to-many correspondence between DMA engines and PCIe channels of a device. For example, in an SR-IOV device, there could be many virtual Functions sharing one (or a few) engine(s).

I'm not sure whether the section should be replaced with something more informative, or deleted altogether. 198.70.193.2 (talk) 17:44, 8 April 2010 (UTC)


 * I have removed it for now. -- intgr [talk] 18:57, 9 April 2010 (UTC)

IO Accelerator in Xeon
"in CPU utilization with receiving workloads, and no improvement when transmitting data.[4]" The source cited seems to indicate the improvements are more complex than simple CPU utilization measurements indicate. In particular, this seems relevant: "This data shows that I/OAT really benefits from larger application buffer sizes. There is a CPU spike at 2K, although also increased throughput." Which seems to indicate that I/OAT is enabling greater throughput and CPU utilization with buffers <2K. —Preceding unsigned comment added by 68.50.112.195 (talk) 18:14, 30 March 2009 (UTC)

"Principle"
This section was either lifted directly from http://www.avsmedia.com/OnlineHelp/DVDCopy/Appendix/dma.aspx, or vice versa. —Preceding unsigned comment added by 74.229.8.169 (talk) 11:31, 11 January 2008 (UTC)


 * This section existed on Wikipedia verbatim back in 2006 (and probably much earlier as well). According to web.archive.org, that page appeared in May 2007, and their copyright also states 2007. -- intgr [talk] 22:56, 11 January 2008 (UTC)

"[...] and skillfully created applications can outperform cache."
I've removed this sentence because it makes no sense to me.
 * "DMA transfers are essential to high performance embedded algorithms and skillfully created applications can outperform cache."

How can an application outperform cache? Did you mean an application (implementation) of DMA? If so, perhaps the term "implementation" should be used, because "application" certainly reminds of the concept of a software application. Even then, can DMA outperform cache? Aren't we comparing apples to oranges, or at least aren't we unless the context is made clearer?

LjL 21:59, 28 Apr 2005 (UTC)

Is UDMA related to DMA?
Question: Is UDMA related to DMA? — Preceding unsigned comment added by 88.105.167.50 (talk) 00:07, 23 May 2006 (UTC)

Yes. UDMA is an advanced DMA for hard disks and CD/DVD drives. — Preceding unsigned comment added by 82.155.155.251 (talk) 02:33, 1 July 2006 (UTC)
 * I would rather phrase it like this: UDMA is the name of the capability of non-ancient ATA chipsets to use DMA to transfer data directly to/from system memory. Saying that "UDMA is an advanced DMA" makes it sound as if UDMA is some new, special DMA technique, which it is not. ATA chipsets use normal PCI bus-mastering to do DMA, just like any other PCI component. It's just that ATA chipsets lacking UDMA don't do DMA at all, and have to be bit-banged by the CPU. --Dolda2000 23:01, 31 August 2007 (UTC)

"[...]slower than copying normal blocks of memory since access to I/O devices over a peripheral bus is generally slower than normal system RAM."
Can someone explain this to me? How is it that the process of accessing I/O devices can be slower than normal system ram? Did the author mean slower than accessing RAM? Exactly what is doing the copying, and from where to where? And which process is slower? Dudboi 12:21, 5 November 2006 (UTC)


 * It's slower since the CPU would be occupied chunking out bits and bytes whenever it's directly communicating with a device. When using DMA, the CPU will just send out a DMA command and the device will be able to act on its own. Also, this way the CPU can request data transfers between two devices without fetching the data into its own cache. -- intgr 16:09, 5 November 2006 (UTC)


 * Oh, ok I think I get it now, thanks for clearing that up. I think the sentence needs to be reworded though. It's not very clear, especially since it compares the speed of a process (accessing the devices - verb) with that of a hardware component (RAM - noun) —Dudboi 23:48, 5 November 2006 (UTC)


 * Just for future reference, I thought I'd make it a bit clearer (hopefully ;). Say that an IDE controller has just fetched a sector from disk. The sector then resides in the IDE controller's buffer, and needs to be copied to system memory. Without using DMA, the CPU would have to ask the IDE controller for each individual word (32 bits on PCI), and then copy it to system memory (during the transactions from the IDE controller to the CPU, the CPU would be the bus master). Using DMA, however, the IDE controller would request PCI bus ownership and transfer the sector on its own to system memory, and signal the processor when it's done. It is worth noting that with PCI, the DMA-less process need not necessarily be slower in terms of bandwidth, since both the transfer of one word from the IDE controller to the CPU having the CPU as bus master and the transfer from the IDE controller to RAM having the IDE controller as bus master would not necessarily have to take more than one PCI clock cycle (15 or 30 ns depending on bus speed). However, using DMA, the CPU is free to do whatever it wishes in the meantime, rather than having to idle while waiting for PCI transactions to complete, increasing parallelism greatly. It is worth noting, of course, that ATA PIO is a lot slower than ATA UDMA, and while I know too little about ATA to speak with authority, I suspect it to be because the ATA PIO protocol simply does not allow sequential reads to fetch successive bytes from the buffer, but uses some other PCI protocol. It could also be because of some ATA protocol peculiarity between the controller and the disk itself of which I know nothing. --Dolda2000 23:15, 31 August 2007 (UTC)
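A minimal C sketch of the contrast described above. Everything here is invented for illustration (`struct ide_ctrl`, `pio_read_sector`, `dma_read_sector`, the flag standing in for a completion interrupt); the point is only the shape of the two transfer styles, not any real controller's programming interface:

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_WORDS 128            /* one 512-byte sector as 32-bit words */

/* Simulated IDE controller with an on-board sector buffer (hypothetical). */
struct ide_ctrl {
    uint32_t buffer[SECTOR_WORDS];  /* sector already fetched from disk */
    uint32_t data_reg;              /* PIO data register */
    int      read_idx;
};

/* PIO: the CPU is bus master and pulls each word through the
 * controller's data register, one transaction at a time. */
static void pio_read_sector(struct ide_ctrl *c, uint32_t *dst)
{
    c->read_idx = 0;
    for (int i = 0; i < SECTOR_WORDS; i++) {
        c->data_reg = c->buffer[c->read_idx++];  /* device -> register */
        dst[i] = c->data_reg;                    /* register -> memory */
    }
}

/* DMA: the controller becomes bus master and copies the sector into
 * system memory on its own; the CPU only sets up the transfer and is
 * later signalled (a flag stands in for the interrupt here). */
static int dma_done;

static void dma_read_sector(struct ide_ctrl *c, uint32_t *dst)
{
    dma_done = 0;
    memcpy(dst, c->buffer, sizeof c->buffer);    /* device-driven burst */
    dma_done = 1;                                /* "interrupt": done */
}
```

Both paths land the same bytes in memory; the difference is who drives the bus and whether the CPU is free in the meantime.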

Dubious "counterexamples"
Note that I'm not very knowledgeable about low-level hardware interaction, but I found two out of the three bullets in the "counterexamples" section dubious: Not entirely true – at least AGP/PCIe graphics cards these days come with an IOMMU (GART). Not sure about other kinds of devices.
 * PC architectures after the ISA lost it ability to use DMA for memory defragmentation or initialization.

Is it really "so expensive on a PC" or does it just have more overhead considering today's processing power? I don't know about 2D hardware, but mapping textures to surfaces is very trivial in 3D hardware, these days.
 * DMA commanding in a PC is so expensive and dumb that it is not used for bit blit.

This bullet says "later in history" – but later than what? -- intgr 16:09, 5 November 2006 (UTC)
 * The ATA hard disk interface moved from programmed input/output (PIO) to direct memory access (DMA) only later in history.


 * The 'counterexamples' section has since been removed (diff) -- intgr 19:09, 9 November 2006 (UTC)

Strcpy example
I don't think the strcpy example was a particularly good one about DMA engines, and I don't think mention of DMA engines deserves a place in the lead section, either. strcpy is a particularly problematic function because:
 * The length of the string is not known in advance - it's terminated by a null byte. I would be willing to bet that DMA engines do not generally have any logic to search for a terminator, as they are designed for bulk transfers.
 * Strings are typically very short and likely to be in the CPU cache anyway whenever a copy is issued.
 * The overhead of simply copying bytes is much less compared to (1) making a syscall; (2) sending an I/O request, halting the requesting process, context switching to another process; (3) handling an interrupt and re-scheduling the process.

I can see, however, that DMA engines can be beneficial when copying large buffers, or when building for example, network packets within the kernel. And indeed, such copies are not currently offloaded since today's computers lack such a device. Thus I've created a new section, 'DMA engines' for this. It's still a stub, though; Intel's I/OAT certainly deserves a mention. -- intgr 17:12, 12 November 2006 (UTC)
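The trade-off described above, small copies are cheaper done by the CPU (and are likely cached), large ones may be worth offloading, can be sketched like this. The threshold value and the names `smart_copy` and `dma_engine_copy` are invented for illustration; a real engine such as I/OAT takes descriptors and completes asynchronously rather than copying inline:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offload threshold: below this, DMA setup overhead
 * (descriptor, doorbell write, completion interrupt) outweighs a
 * plain CPU copy. Note a strcpy-style copy could not be offloaded
 * at all: its length is unknown until the terminator is found. */
#define DMA_COPY_THRESHOLD 4096

static int used_dma;    /* records which path the last copy took */

/* Stand-in for handing a copy descriptor to a memory-to-memory DMA
 * engine; here it just copies synchronously. */
static void dma_engine_copy(void *dst, const void *src, size_t n)
{
    used_dma = 1;
    memcpy(dst, src, n);
}

static void smart_copy(void *dst, const void *src, size_t n)
{
    if (n >= DMA_COPY_THRESHOLD)
        dma_engine_copy(dst, src, n);   /* bulk: offload */
    else {
        used_dma = 0;
        memcpy(dst, src, n);            /* small: CPU does it directly */
    }
}
```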


 * I see you added the blurb about I/OAT. I thought I would mention I changed that because I/OAT (code name Crystal Beach) is not implemented in the processors but rather in the chipsets. Since there is already I/O DMA via PCI bus-mastering, I/OAT (as you probably know) is designed for memory-to-memory DMA and as such is best implemented in the memory controller (which is usually in the MCH/north bridge on Intel chipsets that implement I/OAT). It is very nice to have the memory controller expose a device to which commands can be sent to copy memory blocks around. 64.122.14.55 (talk) 03:27, 25 April 2008 (UTC)

a bad idea?
This seems like a bad idea to me. How would the CPU know when the memory is being written - what if the device is in the middle of updating a couple of KB of data in memory and the CPU reads off the whole range and gets half the new values and half the old values? Why not just have a dedicated component on the CPU for data throughput that shares the CPU clock and makes the appropriate information available to memory protection systems? --⁪froth T C  17:20, 28 November 2006 (UTC)


 * The device will send an interrupt when it's done with a DMA request, and only after that will the CPU attempt to read that data or do anything with it. I can't see what exactly you have in mind with the I/O component. If its only job was to mediate between devices and the memory while guaranteeing memory protection, DMA through an IOMMU would essentially achieve the same, except that each bus can have its own IOMMU operating at the native clock rate of the bus, and the data would not even have to congest the CPU or its bus at all (except when explicitly read by the CPU). I think DMA is a brilliant idea. :) -- intgr 18:24, 28 November 2006 (UTC)

32-bit address bus
"A modern x86 CPU may use more than 4 GiB of memory, utilizing PAE, a 64-bit addressing mode. In such case, a device using DMA with 32-bit address bus is unable to address the memory above 4 GiB line."

I don't understand what the phrase "32-bit address bus" means in the context above. Is this referring to the device's own addressing ability, or some bus external to the device, or something else? -- AzzAz (talk) 20:27, 28 February 2008 (UTC)
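One way to read "32-bit address bus" is as the device's own DMA addressing limit: a device that drives only 32 address lines cannot reach physical memory above 4 GiB, regardless of how much the CPU can address via PAE. A sketch of the reachability check an OS might make (the macro and function names are hypothetical; Linux, for instance, handles this with per-device DMA masks and bounce buffers):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Highest physical address reachable with n address bits. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Can a device with addr_bits address lines DMA to/from the whole
 * buffer [phys_addr, phys_addr + len)? If not, the OS must either
 * allocate the buffer lower in memory, copy through a bounce buffer
 * below the limit, or remap it via an IOMMU. */
static bool dma_addr_ok(uint64_t phys_addr, size_t len, unsigned addr_bits)
{
    uint64_t mask = DMA_BIT_MASK(addr_bits);
    return phys_addr <= mask && phys_addr + len - 1 <= mask;
}
```

So with PAE the CPU may happily place a buffer at, say, physical 0x1_0000_0000, but a 32-bit-capable DMA device simply cannot generate that address on the bus.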

DMA Channels
So DMA channels are ISA-specific, right? In other words they would not be applicable to PCI, since any PCI device can bus-master? Also, what is the "Direct Memory Access Controller" shown with Channel 4 in msinfo32.exe on Windows? -- AzzAz (talk) 20:27, 28 February 2008 (UTC)


 * No, DMA is a generic concept. PCI bus-mastering is a type of DMA. It is true that the PC architecture (but by no means all computers) has ISA DMA controllers. It is also true that with the advent of PCI bus-mastering there is another type of DMA that is standard in the PC architecture now. 64.122.14.55 (talk) 03:21, 25 April 2008 (UTC)

History Section?
Can someone please create a history section for this? I am curious as to when DMA came into existence. sweecoo (talk) 21:41, 12 November 2008 (UTC)


 * I concur. I've traced the introduction of DMA to the PDP-1 in "Computer Engineering: A DEC View of Hardware Systems Design" by Gordon Bell et al. The book references "high speed channel data transmission", which is documented in the "PDP-1 input-output systems manual" and the "PDP-1 Handbook". Both can be found on the PDP-1 Specifications page at the Computer History Museum. The book by G. Bell also notes that there were special I/O processors, such as IBM 7090 channels, that were as expensive as the PDP-1. Roolebo (talk) 01:08, 15 February 2019 (UTC)
 * The first IBM computer with data channels was the IBM 709; that came out in 1958, so it predates the PDP-1. If "DMA" is defined as "transfer between an I/O device and memory without any CPU instructions doing the transfer", it dates back at least to the 709.
 * (I specifically refer to "CPU instructions" because 1) the transfer may involve stalling the CPU briefly if it needs to access memory while the transfer is in progress - which was the case for the PDP-1's high speed data channels - and 2) data channels on, for example, the lower-end models of the IBM System/360 line implemented the data channels specified as part of the System/360 architecture using CPU data paths and microcode, so, while no CPU instructions are involved in transferring the data, the CPU isn't busy fetching and executing instructions while its microcode is busy doing a data transfer. In the latter case, the memory access is not "direct" in the sense of "not using CPU data paths and cycles", but it is "direct" in that it doesn't require the CPU to execute instructions to do the transfer - microinstructions, yes, but CPU instructions, no, and the cycle stealing probably runs faster than would an interrupt with a service routine that transfers data. The 709 data channels didn't use CPU data paths in that fashion, and there was no microcode to use, so all you'd have would be a memory bus contention stall.)
 * For those special I/O processors, see channel I/O, although 1) as noted in the parenthetical note, depending on how you define "direct", some channels don't "directly" access memory without CPU involvement and 2) in minicomputer-style I/O (as used by minis, superminis, and most microprocessors), instead of handing an I/O program to a programmable data channel, the CPU just sets up some device registers with I/O instructions or memory-reference instructions that refer to memory-mapped device registers, the last of which starts the operation, which eventually does a DMA transfer. Guy Harris (talk) 06:35, 17 August 2022 (UTC)

What memory types can DMA work on?
The article could explain what memory types can be read or written with DMA. For example does it work to/from flash memory?--85.78.29.213 (talk) 08:38, 13 January 2011 (UTC)

Rewrite needed
DMA controllers don't do checksum calculations; if it was that smart, it would be an IO processor, not a simple DMA controller. Need to talk about bus mastering and cycle stealing; in the PCish world, you can either use the DMA controller on board or become bus master and write to memory directly. ( I can kind of see how this would have worked in the old days, but have no idea how modern designs do this.) --Wtshymanski (talk) 02:38, 23 August 2011 (UTC)

Diagram needed
We have a diagram to explain cache coherency but not one that explains how DMA works. This could be two panels; first panel shows CPU doing reads/writes to IO device, and data passing through a CPU register to/from memory. Second panel shows CPU doing something else and DMA controller doing the transfers. Rainy day project for me if I can't find one on Commons. --Wtshymanski (talk) 13:58, 30 March 2012 (UTC)
 * I agree. Other related pages also lack illustrative diagrams.
 * I think more coherent hardware-architecture articles should be written.
 * This and other related articles are too long and hard to understand.
 * Restructuring everything in a more general way, with appropriate diagrams, would explain things better: from general concepts down to the different variants of the problems that emerge when integrating computer components.
 * DMA is one solution for integrating computer components in a modular way; this can be shown with diagrams comparing different circuit layouts and how the components exchange data in each layout.
 * Intel CPUs are everywhere but are not the only architecture in the world. It is important to depict the general concept of DMA and related topics first.
 * Then show how actual CISC and RISC systems applied those concepts, and how they evolved by developing industry standards to ease the modular integration of parts.
 * The diagrams should depict the layout of components connected by buses, paying attention to how synchronization problems are solved, and show how different architectures evolved to support more CPUs and how peripherals were connected in the past and today.
 * Synchronization issues can be explained with diagrams and simple programs in a very general abstract language.

Diagram is needed for dma Lingeshwaram (talk) 15:19, 9 November 2022 (UTC)

Scatter/gather and Vectored I/O the same?
I added a wikilink from scatter/gather to vectored I/O. Are these two terms indeed referring to the same thing? — Preceding unsigned comment added by Jimw338 (talk • contribs) 16:03, 8 August 2012 (UTC)
 * Vectored I/O is about scatter/gather at the OS API layer, such as the readv and writev calls in UN*Xes and the ReadFileScatter and WriteFileGather calls in Windows. Those calls might be implemented using scatter/gather at the hardware I/O layer, but there may be other places where scatter/gather at the hardware I/O layer is used. Guy Harris (talk) 23:13, 17 August 2022 (UTC)
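For concreteness, the OS-API side of this can be sketched with the real POSIX writev call, which sends several discontiguous buffers in one operation, the same (address, length)-list idea a hardware gather DMA engine uses. Only the helper name below is made up:

```c
#include <sys/uio.h>   /* writev, struct iovec (POSIX) */
#include <unistd.h>
#include <string.h>

/* Gather-write: two separate buffers go out in a single writev()
 * call. Each iovec entry is an (address, length) pair, analogous to
 * one element of a hardware scatter/gather descriptor list. */
static ssize_t gather_write(int fd)
{
    struct iovec iov[2];
    iov[0].iov_base = "Hello, ";
    iov[0].iov_len  = 7;
    iov[1].iov_base = "world";
    iov[1].iov_len  = 5;
    return writev(fd, iov, 2);   /* returns total bytes written */
}
```

Writing to a pipe or file, the receiver just sees the 12 bytes "Hello, world"; the vectoring is invisible on the other end, which is exactly the point.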

PCI-part: Modern design: South+North-Bridge?
The article talks about modern architecture and refers there to the north and south bridge. In modern architectures, only one "hub" is left. — Preceding unsigned comment added by 94.175.83.222 (talk) 22:20, 22 October 2014 (UTC)

Undue emphasis
We don't need a lot of undue emphasis on "DMA Attacks" in the main text, do we? We might as well write that the keyboard is a security vulnerability because it allows the user to delete files, alter system parameters, or even shut down the computer. There's no protecting a computer from random hardware plugged into it. --Wtshymanski (talk) 05:52, 4 November 2018 (UTC)

India Education Program course assignment
This article was the subject of an educational assignment at Department of Electronics and Telecommunication, College of Engineering, Pune, India supported by Wikipedia Ambassadors through the India Education Program during the 2011 Q3 term. Further details are available on the course page.

The above message was substituted by PrimeBOT (talk) on 19:58, 1 February 2023 (UTC)