User:Tlwiechmann/Sandbox/MUMPS

MUMPS (Massachusetts General Hospital Utility Multi-Programming System), or alternatively M, is a programming language created in the late 1960s, originally for use in the healthcare industry. It was designed to make writing database-driven applications easy while making efficient use of computing resources. It was adopted as the language of choice for many healthcare and financial information systems and databases (especially ones developed in the 1970s and early 1980s), and it continues to be used by many of the same clients today. It is currently used by the world's largest electronic health record systems as well as by multiple banking networks and online trading/investment services.

Because it predates C and most other popular languages in current usage, it has very different syntax and terminology. It offers a number of features unavailable in other languages, including some rarely used programming and database concepts. Variations of MUMPS are available which have extended the language to include object orientation and even use of SQL statements.

History
MUMPS was developed by Neil Pappalardo and colleagues in Dr Octo Barnett's animal lab at Massachusetts General Hospital (MGH) in Boston during 1966 and 1967. The original MUMPS system was, like Unix a few years later, built on a spare DEC PDP-7.

Octo Barnett and Neil Pappalardo were also involved in MGH's planning for a Hospital Information System. They obtained a backward-compatible PDP-9 and began using MUMPS for the admissions cycle and for laboratory test reporting. MUMPS was then an interpreted language, and it incorporated a hierarchical database file system to standardize interaction with the data. Some aspects of MUMPS can be traced from Rand Corporation's JOSS through BBN's TELCOMP and STRINGCOMP. The MUMPS team deliberately chose to include portability between machines as a design goal. Multitasking was also built into the language itself, even though few operating systems or machines of the era supported it.

The portability soon proved useful: MUMPS was shortly adapted to a DEC PDP-15, where it lived for some time. Because MUMPS was developed with the support of a government research grant, it was released to the public domain (a requirement that no longer applies to grants). It was soon ported to a number of other systems, including the popular DEC PDP-8, the Data General Nova and the DEC PDP-11. Word about MUMPS spread mostly through the medical community. By the early 1970s the language was in widespread use, and sites often modified it locally for their own needs.

By the early 1970s, there were many and varied implementations of MUMPS on a range of hardware platforms. The most widespread were DEC's MUMPS-11 on the PDP-11 and Meditech's MIIS. In 1972, many MUMPS users attended a conference to standardize the now fractured language, and they created the MUMPS Users Group and the MUMPS Development Committee (MDC) for that purpose. These efforts proved successful. A standard was complete by 1974, and it was approved on September 15, 1977, as ANSI standard X11.1-1977. At about the same time, DEC launched DSM-11 (Digital Standard MUMPS) for the PDP-11, which quickly dominated the market and became the reference implementation of the time.

During the early 1980s several vendors brought MUMPS-based platforms that met the ANSI standard to market. The most significant were Digital Equipment Corporation with DSM (Digital Standard MUMPS), and InterSystems with their ISM (InterSystems M) on VMS and UNIX, and M/11+ on the PDP-11 platform. Other companies who developed important MUMPS implementations were:
 * Greystone Technology Corporation with a compiled version called GT.M,
 * DataTree Inc. with an Intel PC-based product called DTM,
 * Micronetics Design Corporation with a product line called MSM for UNIX and Intel PC platforms (later ported to IBM's VM operating system), and
 * M-Global with MGM, a Mac OS-based product.

M-Global MUMPS was the first commercial MUMPS for the PC and the only Mac implementation. DSM-11 was superseded by VAX/DSM for the VAX/VMS platform, which was later ported to the Alpha in two variants: DSM for OpenVMS and DSM for Ultrix.

This period also saw considerable MDC activity. The second revision of the ANSI standard for MUMPS (X11.1-1984) was approved on November 15, 1984. On November 11, 1990, the third revision of the ANSI standard (X11.1-1990) was approved. In 1992 the same standard was also adopted as ISO standard 11756-1992. Use of M as an alternative name for the language was approved around the same time. On December 8, 1995, ANSI approved the fourth revision of the standard (X11.1-1995), and ISO adopted it in 1999 as ISO 11756-1999. The MDC finalized a further revision of the standard in 1998, but this revision has not been presented to ANSI for approval. On 6 January 2005, ISO reaffirmed its MUMPS-related standards: ISO/IEC 11756:1999 (language standard), ISO/IEC 15851:1999 (Open MUMPS Interconnect) and ISO/IEC 15852:1999 (MUMPS Windowing Application Programmers Interface).

By 2000, the middleware vendor InterSystems had become the dominant player in the MUMPS market through the purchase of several other vendors. It first acquired DataTree Inc. in the early 1990s. On December 30, 1995, it acquired the DSM product line from DEC. InterSystems consolidated these products into a single product line, branded OpenM, on several hardware platforms. In 1997, InterSystems essentially completed this consolidation by launching a unified successor named Caché. Caché was based on the ISM product but drew on influences from the other implementations. InterSystems also acquired the assets of Micronetics Design Corporation on June 21, 1998. InterSystems remains today (2007) the dominant MUMPS vendor, selling Caché to MUMPS developers who write applications for a variety of operating systems.

Greystone Technology Corporation's GT.M implementation was sold to Sanchez Computer Associates Inc. (now part of Fidelity National Financial Inc.) in the mid 1990s. On November 7, 2000 Sanchez made GT.M for Linux available under the GPL license and on October 28, 2005 GT.M for OpenVMS and Tru64 UNIX were also made available under the GPL license. GT.M continues to be available on other UNIX platforms under a traditional license.

The newest implementation of MUMPS, released in April 2002, is an MSM derivative called M21 from the Real Software Company of Rugby, UK.

There are also several open source implementations of MUMPS, including some research projects. The most notable of these is Professor Kevin O'Kane's (and students') project, now at the University of Northern Iowa.

Neil Pappalardo, one of the original creators of the MUMPS language, founded a company called Meditech early on. Meditech extended and built on the MUMPS language, naming the new language MIIS (and later MAGIC). Unlike InterSystems, Meditech no longer sells middleware, so MIIS and MAGIC are used only internally at Meditech.

Current users of MUMPS applications
The US Veterans Administration was one of the earliest major adopters of the MUMPS language. Its development work (and subsequent contributions to the free MUMPS application codebase) influenced many medical users worldwide. In 1995, the Veterans Administration's patient Admission/Tracking/Discharge system, the Decentralized Hospital Computer Program (DHCP), received the Computerworld Smithsonian Award for best use of Information Technology in Medicine. A decade later, in July 2006, the Department of Veterans Affairs (VA) / Veterans Health Administration (VHA) received the Innovations in American Government Award for its extension of DHCP into VistA. The award was presented by the Ash Institute of the John F. Kennedy School of Government at Harvard University. Nearly the entire VA hospital system in the United States, the Indian Health Service, and major parts of the Department of Defense CHCS hospital system all run systems that use MUMPS databases for clinical data tracking.

Large companies currently using MUMPS include AmeriPath (now part of Quest Diagnostics), Care Centric, Team Health, Epic Systems Corporation, EMIS, Partners HealthCare, Meditech, and GE Healthcare (formerly IDX Systems and Centricity). Many reference laboratories, such as Quest Diagnostics and Dynacare, use MUMPS software written by Antrim Corporation or based on its code. Antrim and its parent, Sunquest, were acquired by Misys in 2001.

Coventry Healthcare and Massachusetts Hospital have also been reported to use MUMPS.

MUMPS is also widely used in financial applications. MUMPS gained an early following in the financial sector, and MUMPS applications are in use at many banks and credit unions. It is used by Ameritrade, the largest online trading service in the US with over 12 billion transactions per day, as well as by the Bank of England and Barclays Bank, among others.

As of 2005 most use of M is either in the form of GT.M or InterSystems Caché. The latter is being aggressively marketed by InterSystems and has had success in penetrating new markets, such as telecommunications, in addition to existing markets.

Overview
MUMPS is a language intended for and designed to build database applications. Secondary language features were included to help programmers make applications using minimal computing resources. The original implementations were interpreted, though modern implementations may be fully or partially compiled.

The most outstanding, and unusual, design feature of MUMPS is that database interaction is transparently built into the language. The MUMPS language assumes the presence of a MUMPS hierarchical database, which is implicitly "opened" for every application. All variable names prefixed with the caret character ("^") use permanent (instead of RAM) storage, will maintain their values after the application ends, and will be visible to (and modifiable by) other running applications. Variables using permanent storage are called Globals in MUMPS, though this is not meant in the usual sense of unscoped variables.

Additionally, all variables (both RAM and disk-based) are hierarchical. They can all have child nodes (called subscripts in MUMPS terminology). Thus, the variable 'Car' can have subscripts "Door", "Steering Wheel" and "Engine", each of which can contain a value and have subscripts of its own. For example, you could write

 SET ^Car("Door","Color")="BLUE"

to modify a nested child node of ^Car. In MUMPS terminology, "Color" is the 2nd subscript of the variable ^Car (both the names of the child nodes and the child nodes themselves are called subscripts). Hierarchical variables are similar to objects with properties in object-oriented languages. Additionally, all subscripts of variables are automatically kept in sorted order. Numeric subscripts (including floating-point numbers) are stored from lowest to highest. All non-numeric subscripts are stored in alphabetical order after the numbers. In MUMPS terminology, this is canonical order. By using only non-negative integer subscripts, the MUMPS programmer can emulate the array data type of other languages. Although MUMPS does not natively offer a full set of DBMS features, several database management systems have been built on top of it that provide application developers with flat-file, relational and network database features.
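To make canonical order concrete, here is a minimal Python sketch of a hierarchical variable whose subscripts are listed in canonical order. It uses Python rather than MUMPS only so that it can run anywhere. The sketch rests on three assumptions:

* Subscripts are held as strings.
* A subscript counts as numeric only when written in canonical numeric form, so "01" and "1.0" sort as strings.
* Non-numeric subscripts compare by character code. This is a common default collation, but it varies by implementation.

The names Node, set_node, collation_key and sorted_subscripts are hypothetical. Real implementations keep globals in B-tree files; this in-memory dictionary only illustrates the ordering.

```python
import re
from decimal import Decimal

# Assumed canonical-number rule: "0", or an optional minus sign followed by an
# integer without leading zeros and/or a fraction without trailing zeros,
# e.g. "10", "-3.5", ".25". Strings such as "01", "1.0" or "+1" are not canonical.
CANONICAL_NUMBER = re.compile(r"0|-?(?:[1-9]\d*(?:\.\d*[1-9])?|\.\d*[1-9])")


def collation_key(subscript: str):
    """Sort key: canonical numbers first (numerically), then other strings."""
    if CANONICAL_NUMBER.fullmatch(subscript):
        return (0, Decimal(subscript), "")
    return (1, Decimal(0), subscript)


class Node:
    """One node of a hypothetical hierarchical variable such as ^Car."""

    def __init__(self):
        self.value = None  # None means "this node holds no data"
        self.children = {}  # subscript (str) -> Node


def set_node(root: Node, subscripts, value):
    """Rough analogue of SET ^Car("Door","Color")=value: create nodes on demand."""
    node = root
    for subscript in subscripts:
        node = node.children.setdefault(subscript, Node())
    node.value = value


def sorted_subscripts(node: Node):
    """Child subscripts of a node, in canonical order."""
    return sorted(node.children, key=collation_key)


car = Node()
set_node(car, ["Door", "Color"], "BLUE")
for sub in ["Engine", "10", "2", "-1.5", "Steering Wheel", ".5", "01"]:
    set_node(car, [sub], "")
print(sorted_subscripts(car))
# ['-1.5', '.5', '2', '10', '01', 'Door', 'Engine', 'Steering Wheel']
```

Note that "01" sorts with the strings, after every number, because it is not in canonical numeric form.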

As a secondary language feature, you can abbreviate nearly all commands and native functions to a single character to save space. This was a common feature of languages designed in this period (e.g., early BASICs). Additionally, there are built-in functions that treat a delimited string (e.g., comma-separated values) as an array. Early MUMPS programmers would often store a structure of related information as a delimited string and parse it after reading it in. This saved disk access time and offered considerable speed advantages on some hardware.

MUMPS has no data types. Numbers can be treated as strings of digits, and strings can be treated as numbers by numeric operators (coerced, in MUMPS terminology). Coercion can have some odd side effects, however. For example, when a string is coerced, the parser turns as much of the string (starting from the left) into a number as it can, then discards the rest. This permissive behavior is a form of weak typing. Thus the statement SET Y="19 Years"+"2 Decades" is evaluated as SET Y=21 in MUMPS.
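The left-to-right coercion rule can be sketched in Python as below. This is a simplified model. It assumes that a numeric prefix is an optional sign followed by digits with an optional fractional part. Real MUMPS also accepts repeated leading signs and an exponent (E), which this sketch ignores. The function name to_number is hypothetical.

```python
import re
from decimal import Decimal

# Simplified numeric prefix: optional sign, then digits with an optional
# fraction, or a bare fraction such as ".5". Exponents are not handled.
NUMERIC_PREFIX = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def to_number(value: str) -> Decimal:
    """Coerce a string the way a MUMPS numeric operator would (simplified)."""
    match = NUMERIC_PREFIX.match(value)  # anchored at the left end only
    if match is None:
        return Decimal(0)  # no leading number at all: the value is 0
    return Decimal(match.group())


# "19 Years"+"2 Decades": each operand keeps only its leading number.
print(to_number("19 Years") + to_number("2 Decades"))  # 21
print(to_number("Decades"))  # 0
```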

Other features of the language are intended to help MUMPS applications interact with each other in a multi-user environment. Database locks, process identifiers, and atomicity of database update transactions are all required of standard MUMPS implementations.

In contrast to languages in the C or Wirth traditions, some space characters between MUMPS statements are significant. A single space separates a command from its argument, and a space, or newline, separates each argument from the next MUMPS token. Commands that take no arguments (e.g., ELSE) require two following spaces. One space separates the command from its (nonexistent) argument, and the next separates that "argument" from the next command. Newlines are also significant: an IF, ELSE or FOR command processes (or skips) everything else until the end of the line. To make those statements control multiple lines, you must use the DO command to create a code block.
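The space rules above can be illustrated with a small Python sketch that splits the body of one MUMPS line into command/argument pairs. It is a simplification. It assumes the line has no label, leading whitespace, comment or dot-level prefix. The only lexical subtlety it handles is that spaces inside double-quoted strings do not separate tokens. A doubled quote ("") inside a string toggles the quote state twice, so it needs no special case. The function name split_commands is hypothetical.

```python
def split_commands(line: str):
    """Split a simplified MUMPS line body into (command, argument) pairs.

    Every space outside a string literal ends a token; tokens then alternate
    command, argument, command, argument, ... so two consecutive spaces
    produce an empty argument, as after an argumentless command.
    """
    tokens, current, in_string = [], [], False
    for ch in line:
        if ch == '"':
            in_string = not in_string
            current.append(ch)
        elif ch == " " and not in_string:
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))

    pairs = []
    for i in range(0, len(tokens), 2):
        command = tokens[i]
        argument = tokens[i + 1] if i + 1 < len(tokens) else ""
        pairs.append((command, argument))
    return pairs


# QUIT with a postcondition but no argument is followed by two spaces.
print(split_commands('SET X=1 QUIT:X=1  WRITE "A B"'))
# [('SET', 'X=1'), ('QUIT:X=1', ''), ('WRITE', '"A B"')]
```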

"Hello, World!" in MUMPS
A simple Hello world program in MUMPS might be:

hello ; print a greeting
 write "Hello, World!",!
 quit

and would be run from the MUMPS command line with the command 'do ^hello'. Since MUMPS allows commands to be strung together on the same line, and since commands can be abbreviated to a single letter, this routine could be even more compact:

hello w "Hello, World!",! q

The '!' after the text generates a newline. The 'quit' is not strictly necessary at the end of a routine like this, but it is good programming practice in case other functions are added below 'quit' later.

Summary of key language features
The following summary seeks to give programmers familiar with other languages a feeling for what MUMPS is like. This is not a formal language specification, and many features and qualifiers have been omitted for brevity. ANSI X11.1-1995 gives a complete, formal description of the language; an annotated version of this standard is available online.

Data types: There is one universal datatype, automatically interpreted/converted to string, integer, or floating-point number as context requires.

Booleans: In IF statements and other conditional statements, any value that evaluates to a nonzero number is treated as True. The expression a<b yields 1 if a is less than b, and 0 otherwise.

Declarations: None. All variables are dynamically created on first reference.

Lines: are important syntactic entities, unlike their status in languages patterned on C or Pascal. Multiple statements per line are allowed and are common. The scope of IF and FOR is "the remainder of the current line."

Case sensitivity: Commands and intrinsic functions are case-insensitive. In contrast, variable names and labels are case-sensitive. There is no special meaning for upper vs. lower case and few widely followed conventions. The percent sign (%) is legal as the first character of variables and labels.

Postconditionals: SET:N<10 A="FOO" sets A to "FOO" if N is less than 10; DO:N>100 PRINTERR performs PRINTERR if N is greater than 100. This construct provides a conditional whose scope is less than a full line.

Abbreviation: You can abbreviate nearly all commands and native functions to one or two characters.

Reserved words: None. Since MUMPS interprets source code by context, there is no need for reserved words. You may use the names of language commands as variables. There has been no obfuscated MUMPS contest as in C, despite the potential of examples such as the following, perfectly legal, MUMPS code:

Arrays: are created dynamically, stored as B-trees, use almost no space for missing nodes, can use any number of subscripts, and subscripts can be strings or numeric (including floating point). Arrays are always automatically stored in sorted order, so there is never any occasion to sort, pack, reorder, or otherwise reorganize the database. $ORDER, $ZPREVIOUS, and $QUERY functions provide efficient traversal of the fundamental array structure, on disk or in memory.

Local arrays: variables whose names do not begin with a caret are stored in memory per process, are private to the creating process, and expire when the creating process terminates. The available storage depends on partition size, but is typically small (32K). Thanks to efficient memory allocation, this was little practical impediment even in former, memory-starved times, and it is even less of an issue today.

Global arrays: variables whose names begin with a caret, such as ^abc or ^abc(1,2,"xyz"). These are stored on disk, are available to all processes, and persist after the creating process terminates. Very large globals (e.g., hundreds of megabytes) are practical and efficient in most implementations. This is MUMPS' main "database" mechanism. It is used instead of calling on the operating system to create, write, and read files.

Indirection: in many contexts, @VBL can be used, and it effectively substitutes the contents of VBL into another MUMPS statement. SET XYZ="ABC" SET @XYZ=123 sets the variable ABC to 123. SET SUBROU="REPORT" DO @SUBROU performs the subroutine named REPORT. This is effectively the operational equivalent of "pointers" in other languages.
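As an analogy only, name indirection can be modeled in Python with a dictionary that acts as the process's symbol table: the value of one variable is used as the name of another. The symbols and routines tables and the set_indirect and do_indirect functions are hypothetical. They mirror SET XYZ="ABC" SET @XYZ=123 and SET SUBROU="REPORT" DO @SUBROU. Real MUMPS indirection can substitute whole arguments and expressions, not just names.

```python
# Hypothetical symbol table standing in for a process's local variables.
symbols = {}

# Hypothetical table of callable routines, keyed by label name.
routines = {"REPORT": lambda: "report printed"}


def set_indirect(name_holder: str, value):
    """SET @name_holder=value: assign to the variable NAMED by name_holder's value."""
    symbols[symbols[name_holder]] = value


def do_indirect(name_holder: str):
    """DO @name_holder: call the routine named by name_holder's value."""
    return routines[symbols[name_holder]]()


symbols["XYZ"] = "ABC"  # SET XYZ="ABC"
set_indirect("XYZ", 123)  # SET @XYZ=123
print(symbols["ABC"])  # 123

symbols["SUBROU"] = "REPORT"  # SET SUBROU="REPORT"
print(do_indirect("SUBROU"))  # report printed
```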

Piece function: This breaks variables into pieces guided by a user-specified separator string. Those who know awk will find this familiar. $PIECE(STRINGVAR,"^",3) means the "third caret-separated piece of STRINGVAR." For example, $PIECE("world.std.com",".",2) yields "std".

The piece function can also appear as an assignment target. If X holds an address of the form "name@world.std.com", then SET $P(X,"@",1)="office" causes X to become "office@world.std.com" (note that $P is equivalent to $PIECE and could be written as such).
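Below is a minimal Python sketch of $PIECE, both as a function and as an assignment target. The names piece and set_piece are hypothetical. The sketch assumes a non-empty delimiter (which may be more than one character) and 1-based piece numbers. As in MUMPS, setting a piece beyond the end pads the string with empty pieces.

```python
def piece(string: str, delimiter: str, n: int) -> str:
    """$PIECE(string,delimiter,n): the n-th (1-based) piece, or "" if absent."""
    parts = string.split(delimiter)  # assumes a non-empty delimiter
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def set_piece(string: str, delimiter: str, n: int, value: str) -> str:
    """SET $PIECE(string,delimiter,n)=value, returning the new string.

    Missing pieces are created as empty strings, so the result gains
    extra delimiters when n exceeds the current number of pieces.
    """
    if n < 1:
        raise ValueError("piece numbers start at 1 in this sketch")
    parts = string.split(delimiter)
    while len(parts) < n:
        parts.append("")
    parts[n - 1] = value
    return delimiter.join(parts)


print(piece("world.std.com", ".", 2))  # std
print(set_piece("name@world.std.com", "@", 1, "office"))  # office@world.std.com
print(set_piece("a", "^", 3, "c"))  # a^^c
```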

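Order function: $ORDER returns the next subscript of an array in canonical order. After

 SET stuff(6)="xyz",stuff(10)=26,stuff(15)=""

$ORDER(stuff("")) yields 6, $ORDER(stuff(6)) yields 10, $ORDER(stuff(8)) yields 10, $ORDER(stuff(10)) yields 15, and $ORDER(stuff(15)) yields "".

 SET i="" FOR  SET i=$ORDER(stuff(i)) QUIT:i=""  WRITE !,i,?10,stuff(i)

Here, the argument-less FOR repeats until stopped by a terminating QUIT. This line prints a table of i and stuff(i), where i is successively 6, 10, and 15.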
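The behavior of $ORDER can be modeled in Python as a search for the first subscript that collates after the given one. In this hypothetical sketch:

* The array is a dict keyed by subscript strings, with subscripts 6, 10 and 15 as in the description of stuff.
* An empty string means "start before the first subscript".
* Canonical numbers sort numerically before all other strings, which compare by character code. This is an assumed default; implementation collations may differ.

Real implementations walk a B-tree; this sketch re-sorts on every call purely for clarity.

```python
import re
from decimal import Decimal

# Assumed canonical-number rule: "0", or e.g. "10", "-3.5", ".25" (no leading or
# trailing zeros); anything else collates as a string after all numbers.
CANONICAL_NUMBER = re.compile(r"0|-?(?:[1-9]\d*(?:\.\d*[1-9])?|\.\d*[1-9])")


def collation_key(subscript: str):
    if CANONICAL_NUMBER.fullmatch(subscript):
        return (0, Decimal(subscript), "")
    return (1, Decimal(0), subscript)


def order(array: dict, subscript: str) -> str:
    """$ORDER(array(subscript)): next subscript in canonical order, or ""."""
    ordered = sorted(array, key=collation_key)
    if subscript == "":
        return ordered[0] if ordered else ""
    start = collation_key(subscript)
    return next((s for s in ordered if collation_key(s) > start), "")


stuff = {"6": "xyz", "10": "26", "15": ""}
print([order(stuff, s) for s in ["", "6", "8", "10", "15"]])
# ['6', '10', '10', '15', '']

# Analogue of: SET i="" FOR  SET i=$ORDER(stuff(i)) QUIT:i=""  WRITE !,i,?10,stuff(i)
i = ""
while True:
    i = order(stuff, i)
    if i == "":
        break
    print(f"{i:<10}{stuff[i]}")  # pad to column 10, like ?10
```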

Multi-User/Multi-Tasking/Multi-Processor: MUMPS supports multiple simultaneous users and processes even when the underlying operating system does not (e.g., MS-DOS). Additionally, by specifying a machine name in an extended global reference (as in SET ^|"DENVER"|A(1000)="Foo"), you can access data on remote machines.

For a thorough listing of the rest of the MUMPS commands, operators, functions and special variables, see these online resources:
 * MUMPS by Example, or the (out of print) book of the same name by Ed de Moel. Much of the language syntax is detailed there, with examples of usage.
 * The Annotated MUMPS Language Standard, showing the evolution of the language and differences between versions of the ANSI standard.

"MUMPS" vs. "M"
While of little interest to those outside the MUMPS/M community, this topic has been contentious there.

All of the following positions can be, and have been, supported by knowledgeable people at various times:
 * The language's name became M in 1993 when the M Technology Association adopted it.
 * The name became M on December 8, 1995, with the approval of ANSI X11.1-1995.
 * Both M and MUMPS are officially accepted names.
 * M is only an "alternate name" or "nickname" for the language, and MUMPS is still the official name.

Some of the contention arose in response to strong M advocacy on the part of one commercial interest, InterSystems, whose chief executive disliked the name MUMPS and felt that it represented a serious marketing obstacle. Thus, favoring M to some extent became identified as alignment with InterSystems. The dispute also reflected rivalry between organizations (the M Technology Association, the MUMPS Development Committee, the ANSI and ISO Standards Committees) as to who determines the "official" name of the language. Some writers have attempted to defuse the issue by referring to the language as M[UMPS], square brackets being the customary notation for optional syntax elements.

The most recent standard (ISO/IEC 11756:1999, reaffirmed on 6 January 2005) still mentions both M and MUMPS as officially accepted names.

Recently, Microsoft announced development plans for a new XML-based programming language called M. This marks the second recent Microsoft product whose name resembles an established computing technology of the Department of Veterans Affairs (the other being VistA).

The MUMPS epoch
In MUMPS, the current date and time are contained in a special system variable, $H (short for "HOROLOG"). The format is a pair of integers separated by a comma, e.g. "54321,12345". The first number is the number of days since December 31, 1840, so day number 1 is January 1, 1841. The second is the number of seconds since midnight.

The reason for this not very obvious choice of epoch is a bit of MUMPS trivia. James M. Poitras has written that he chose this epoch for the date and time routines in a package developed by his group at MGH in 1969: I remembered reading of the oldest (one of the oldest?) U.S. citizen, a Civil War veteran, who was 121 years old at the time. Since I wanted to be able to represent dates in a Julian-type form so that age could be easily calculated and to be able to represent any birth date in the numeric range selected, I decided that a starting date in the early 1840s would be 'safe.' Since my algorithm worked most logically when every fourth year was a leap year, the first year was taken as 1841. The zero point was then December 31, 1840.... I wasn't party to the MDC negotiations, but I did explain the logic of my choice to members of the Committee.

(More colorful versions have circulated in the folklore, suggesting, for example, that December 31, 1840, was the exact date of the first entry in the MGH records, but these seem to be urban legends.)

A piece of MUMPS trivia: $HOROLOG hit 60000 on April 10, 2005; will be 70000 on August 26, 2032; 80000 on January 12, 2060; 90000 on May 30, 2087; and 100000 on October 16, 2114.
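Converting $H values is simple date arithmetic, sketched here in Python with the standard datetime module. The function names are hypothetical. The sketch assumes a well-formed "days,seconds" string and does no time-zone handling. It reproduces the milestone dates listed above.

```python
from datetime import date, datetime, timedelta

# Day 0 of the $H epoch; day 1 is January 1, 1841.
HOROLOG_DAY_ZERO = date(1840, 12, 31)


def horolog_to_datetime(h: str) -> datetime:
    """Convert a "days,seconds" $H string to a naive datetime."""
    days, seconds = (int(part) for part in h.split(","))
    day = HOROLOG_DAY_ZERO + timedelta(days=days)
    return datetime(day.year, day.month, day.day) + timedelta(seconds=seconds)


def datetime_to_horolog(moment: datetime) -> str:
    """Convert a naive datetime back to "days,seconds" (whole seconds only)."""
    days = (moment.date() - HOROLOG_DAY_ZERO).days
    seconds = moment.hour * 3600 + moment.minute * 60 + moment.second
    return f"{days},{seconds}"


print(horolog_to_datetime("60000,0").date())  # 2005-04-10
print(datetime_to_horolog(datetime(1970, 1, 1)))  # 47117,0
```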

Sample programs
An example of "traditional" M coding style, from the Fileman system written for the US Government Veterans' Administration. This fragment uses statement abbreviations and cryptic routine and variable names, extensively. In an era in which disks were small, memory smaller, and I/O often limited to 300 baud, this made some sense. Modern MUMPS code typically uses helpful variable names and would avoid statement abbreviations if possible. There is still a line length limit in most MUMPS implementations, however.

The following code is a complete implementation of ROT13, a trivially breakable cipher used for various purposes on the Net, none high security. It illustrates the compact nature of MUMPS code and is rather less cryptic than the sample above.

A second implementation is below, which illustrates the possibilities of concision in MUMPS. In many respects, such as pattern matching, it is comparable to later languages like Perl.

Finally, here is one of the shortest programs ever written in a high-level language, demonstrating the extreme concision of which MUMPS is capable. The same algorithm can also be written using expanded variable and command names. This program sets the variable x to the value "x x" and then launches an infinite recursive execution of x (via the XECUTE command, abbreviated X), resulting in a stack overflow. At 13 characters, including spaces and an end-of-line mark, the first variant demonstrates that MUMPS can be as compact and obscure as languages such as Perl.