User:Thepigdog/Relational programming

Relational programming is based on the mathematical relation. A relation is a table of which has a column, or columns for inputs and a column for outputs. A function is a type of relation.

Logic programming is based on a restricted implementation of relational programming. The output value of all relations is always boolean. Resulting values are interpreted using negation as failure. Relation programming does not require this. Some logic programming systems extend this model with the use of constraints.

History
The Turing machine started computers with imperative programming. Imperative programming is an order sequence of statements, where each one calculates outputs from inputs.

Functional programming extends the model by making the statements independent and un-ordered. But still each statement defines only how to calculate the output from the input. Many different paths may be used in calculating the input from the output.

Many real life situations in computer programming require the computer to model changes in the world. Input and output streams needed to be included in functional languages. Initial implementations used imperative features to implement I/O. Later functional languages like haskell used monads to model change within functional programming paradigm.

Logic programming languages based on unification and resolution first provided the possibility of a statement defining different values depending on what values are given. But logic programming restricts the output to Boolean true values, and forces all results to be produced as side effects of the application of statements. Constraints fill in the gaps to some extent.

Languages based on narrowing provide a general implementation of relational programming.

Logic programming
By far the majoratory of relational programming languages are base on logic programming. The success of logic programming is due to efficient implementations based on unification and resolution.

These techniques implement a restricted subset of relational programming. The addition of constraints led to Constraint logic programming. These languages are closer to relational programming.

Narrowing
The other method used in implementing relational programming is narrowing. Narrowing extends functional programming to provide relational programming.

The relational model
The relational model puts the inputs and outputs on equal symmetric footing.


 * $$O = F(I_1, ... I_n) \equiv F_r(I_1, ... I_n, O) $$

The relationship $$F_r$$ is a constraint on the inputs and the outputs where the output O assumes an equal role with any of the inputs.

If F is a function then the output O is a single value. But as a relation or constraint, providing n values, leaving one free does not always restrict the remaining variable to one value.

A general method for representing multiple values for a variable as a value set is described in narrowing of algebraic value sets. The value set is treated as a single value by recording context information about the conditions under which each value in the value set is held.

Recording multiple values in a single container allows multiple values to be recorded as a single "value". Recorded relationships between values and possible worlds restrict each value to its proper context.

The model is then extended to allow each input and output to be a value set. A function is then modeled as a constraint between value sets.

Relational evaluation
An assertion is an expression that is deemed true. So an expression like,
 * $$x = 5$$

is a relationship between the values x, 5, and true. Because x is unknown, the relationship defines the value of x as 5. For another example,
 * $$y + 3 = 5$$

This may be seen as two relationships; a relationship between y, 3 and another value which we may call x, and a relationship between x, 5, and true. The second relationship tells us that x is 5, so the first relationship between y, 3 and 5, which gives y is 2.

y is 2 is used to represent that y has already been found to have the value 2, but $$y = 2$$ is a relationship, that implies that y is 2.

A relationship has a function, and one or more inverse functions. This set of functions may be used to calculate the unknown value from the known values. What values may be calculated from given inputs depends on the relation.

If the value of a particular value is required, or requested, then a relationship including the variable may be found. That relationship may have more unknowns. These unknowns may then be requested, chaining back eventually to known values.

There are limitations to this strategy. If a variable for a function is given twice, perhaps a recursive function is defined. But in general where a variable appears multiple times, the strategy fails, and the equation may need to be solved by other means.

Known and unknown values
Relational programming calculates known from unknown values. This means that this status must be recorded. But in a functional or mathematically based language, it is not possible to access this status, because it will change when a deduction calculates the value. The representation of unknown values is an implicit part of the language implementation that cannot be directly accessed.

Deductions
The values that can be calculated are different for each relationship. What values may be calculated also depends on what values are known, and may even depend on the particular value.

The following sections summarize the deductions for various types. Lower case letters are unknown values. Upper case letters are known values.

Deductions other than direct calculation are shown. A direct calculation is calculating the output from the inputs.

Comparison
The comparison relations are equal, not equal, less than, less than or equal, greater than, greater than or equal. Of these only equality allows a deduction other than direct calculation.

Equality
For comparing trees or lists deductions may be extended by the equality deduction on each component. This is unification.

And
Note that in the last case, the "or" operator gives a value set, not a set. A value set differs from a set in that it behaves as a single value, but manages the context in which each value applies (see value set).
 * In the possible world M, x is true and so y must be false.
 * In the possible world N, x is false, y is either true or false.

Or
Note that in the last case, the "or" operator gives a value set, not a set. A value set differs from a set in that it behaves as a single value, but manages the context in which each value applies (see value set).
 * In the possible world K, x is true and y is either true or false.
 * In the possible world L, x is false, so y must be true.

Square
Note that inverse of the square gives a value set, not a set. A value set differs from a set in that it behaves as a single value, but manages the context in which each value applies (see value set).

String reference
A string reference (StringRef) is a sub string of another string, whose start and end positions within the string are represented by variables. In concatenating two unknown strings to be equal to a string constant it is not known where the start of the first string ends, and the start of second string starts.

However it is known that the second string will start immediately after the end of the first string. A string reference allows this information to be captured.

Concatenation
Note C can be a string or a string reference. A string reference describes a string as a start and end position in another string. If C is a string reference in the example above, then the string reference positions are calculated for positions in the original string that C is a reference to.

Function application
The lambda abstraction may be used as the inverse of function application.

There are restrictions on this deduction. See Curry's paradox and deductive lambda calculus.

Value sets
The deductions for the "or" operator and the "square" function may give value sets. A value set is a of values that is managed so that context in which value is found is recorded. If the context was not managed then the combination of values from different contexts would give rise to incorrect calculations.

For example if,
 * $$x^2 = 4$$

then
 * $$ x \in \{2, -2\}$$

So a naive person might deduce,
 * $$ 2 x = x + x = \{2, -2\} + \{2, -2\} = \{4, 0, -4\} $$

this, of course, is incorrect because in reality,
 * $$ 2 x \in \{4, -4\}$$

The managing of value sets so as to avoid incorrect combinations of values is explained in narrowing of algebraic value sets.

Order of evaluation
Evaluation does not need to follow a fixed evaluation path. Deductions may be made where ever they are possible to calculate unknown values from known values.

All calculations that do not reply on value sets as the known value should be done first. Deductions on value sets should be ordered so that deductions that produce the smallest value set are done first. The product of the sizes of the value sets in the relation, other than the unknown value, gives an estimate of the size of the value set created for the unknown value. This is the optimal strategy for limiting the Combinatorial explosion.

As evaluating any relation may contribute to reduce the size of other value sets by narrowing, the best strategy is to evaluate expressions with the smallest value set size first, as this may lead to narrowing that reduces the size of other vale sets.

The evaluation should proceed on a list of relations for which values have been requested, ordered by the predicted size of the value set to be created. This predicted size will change over time leading to the need for periodic re-ordering. For value sets with one value, this strategy is lazy. For values sets with multiple values the strategy minimizes the combinatorial explosion.

Unlike constraint logic programming there is no backtracking required for this algorithm. All possible values are contained in value sets, and are considered at the same time. The lack of backtracking may be of benefit, or a disadvantage, depending on the problem. The use of value sets forces the consideration of all values, which allows the full benefits of narrowing in reducing the size of value sets. However it forces the solving of the whole problem at once, which if a problem may be naturally divided up, may be a disadvantage.

The relational model of state
Imperative programs are characterized by having implicit state. On the surface imperative programs are very different from functional programs. However monads provide a way of introducing imperative commands into a functional language, by hiding the state in the monad object.

But monads do not give a completely natural way of representing imperative programming. They do not provide a general model for translating an imperative program into a functional language.

Transformations on imperative programs
Mathematical expressions may be transformed using mathematical laws to give equivalent expressions. The same ability is needed for imperative programs. A mathematical model for an imperative program allows a change in the execution order of the program, without changing the overall effect of the program.

State holder
Using the relational evaluation model, a simple model of state transfer may be given. A statement in an imperative program has,
 * Input state
 * Input parameters
 * return value
 * Output state

It is natural in a functional to consider the input state as another parameter. But using the relational evaluation model this is not necessary. the relation model allows the output to be used in determining the input. For example in,
 * $$ x \cdot 3 = 9 $$

x is determined to be 3, when it is only input to the function.

Let the "state holder" be a triple consisting of, Because a state holder takes its input from its output, it does not behave in the same way as other mathematical objects. But with its own set of laws it may be included in a mathematical framework.
 * Input state
 * return value
 * Output state

The advantage of using a state holder is that state may be transferred through any parameter, providing a full model of imperative programming. The state transfer is completely hidden, and there is a direct translation between imperative programs and state holder programs. State only needs to be transferred where needed.

In comparison monads provide only a linear transfer through primary parameters, but not through every parameter.

A disadvantage is that translating an imperative program to a state holder form does not make it obey the fundamental axioms of equality. Different equality axioms are followed by state holders. Separate transformations must be applied to put the code in a functional form.

The use-mention distinction
In mathematics, quantification is over values, not formulas. To define the assignment statement it is necessary to refer to both the formula and the identity of the formula. The distinction between the formula representing a value and the identity of the formula is the use–mention distinction. The notation,
 * $$\forall X \#U $$

is introduced to mean quantification over formula X where X refers to the value, as a use, and U refers to the identity of the formula as represented or mentioned.

Expressions
For the purposes of describing state, lower case letters will represent symbols and upper case letters will represent variables.

Assignment
An assignment statement, represented by :=, is defined by,
 * $$ \forall X\#U, \forall I, \forall R, \forall O, (X := (I, R, O)) = (I, R, O[U:=R]) $$

where,
 * X is a left expression.
 * I and O are states.
 * R is an expression.
 * $$ O[U:=R] $$ is the state O with the value identified by the symbol X replaced by the value R.

The notation $$\forall X\#U$$ gives access to the identity of the variable X, in U. This identity is then used to look up the state.

State lookup
State lookup is represented by,
 * $$ \forall X\#U, \forall I, [X] =(I, I[U], I) $$

This definition uses the identity of the variable X (as U) to lookup the state, and return the value for that identity.

In a typical imperative language, state lookup is implicit. But that would cause some problems in describing state in mathematics.

State transfer
The following definitions give a version of any function that manages state,
 * $$ \forall I, \forall F, \forall X, \forall O, F\ (I, X, O) = (I, F\ X, O) $$
 * $$ \forall I, \forall F, \forall X, \forall O, (I, F, O)\ X = (I, F\ X, O) $$
 * $$ \forall I, \forall F, \forall X, \forall O, (I, F, M)\ (M, X, O) = (I, F\ X, O) $$

where the variables I, F, X, O are not state holders.

To avoid contradiction the standard definitions of functions would not apply to state holders. That is,
 * $$ ((I, X, M) = (M, X, O)) \ne (I = M \land M = O) $$

instead it only is defined by,
 * $$ ((I, X, M) = (M, X, O)) = (I, X = X, O) $$

This may be achieved given a class hierarchy. Standard equality may be defined for standard objects, but not for state holders.

State holder objects do not obey the substitution rule,
 * $$A = B \to f(A) = f(B)$$

This is a fundamental principle of mathematics. This rule must be suspended for state holders.

Example
Take for example,

The function f may be represented as,
 * $$ f = (x := 5 + [x]) + (x := 5 + [x]) $$
 * $$ [x] =(I_1, I_1[x], I_1) $$
 * $$ [x] =(I_2, I_2[x], I_2) $$
 * $$ f = (x := 5 + (I_1, I_1[x], I_1)) + (x := 5 + (I_2, I_2[x], I_2)) $$
 * $$ F (I, X, O) = (I, F X, O) $$
 * $$ f = (x := (I_1, 5 + I_1[x], I_1)) + (x := (I_2, 5 + I_2[x], I_2)) $$
 * $$ (X := (I, R, O)) = (I, R, O[X:=R]) $$
 * $$ f = (I_1, 5 + I_1[x], I_1[x:=5+I_1[x]]) + (I_2, 5 + I_2[x], I_2[x:=5+I_2[x]]) $$
 * $$ (I, F, M)\ (M, X, O) = (I, F\ X, O) $$
 * $$ f = (I_1, 5 + I_1[x] + 5 + I_2[x], I_2[x:=5+I_2[x]]) $$

where,
 * $$I_2 = I_1[x:=5+I_1[x]]$$

Suppose in the initial state $$I_1[x] = 2$$. Then,
 * $$ f = (I_1, 5 + 2 + 5 + I_1[x:=5+2][x], I_2[x:=5+I_2[x]]) $$
 * $$ f = (I_1, 5 + 2 + 5 + 5+2, I_2[x:=5+7]) $$
 * $$ f = (I_1, 19, I_2[x:=12]) $$

where, $$ I_2 = I_1[x:=5+2]$$

So the C++ program should return, x = 12, r = 19

Statement composition
Two statements are composed together, by feeding the output state from one statement into the input state of the next.
 * $$\forall I, \forall R_1, \forall M, \forall R_2, \forall O, (I, R_1, M); (M, R_2, O) \ = (I, R_2, O) $$

If statement
An if statement may be regarded as a function, with 3 parameters. Currying may be applied. The condition function may be state-full or stateless. Stateless is defined by,
 * $$ \forall A, \forall B, (\operatorname{if} \operatorname{true} \operatorname{then} A \operatorname{else} B) = A $$
 * $$ \forall A, \forall B, (\operatorname{if} \operatorname{false} \operatorname{then} A \operatorname{else} B) = B $$

where A and B may be state holders, or other types. State-full is defined by
 * $$\forall I, \forall M, \forall A, \forall B, \forall C, \operatorname{if} (I, C, M) \operatorname{then} A \operatorname{else} B = (I, C, M); \operatorname{if} C \operatorname{then} A \operatorname{else} B $$

For example,

While statement
The while statement is defined by,
 * $$\forall C, \forall B, \operatorname{while} C \operatorname{do} B = \operatorname{if} C \operatorname{then} (B; \operatorname{while} C \operatorname{do} B) $$

If C is not a state holder this statement either does nothing or expands recursively forever. If C is a state holder, the value represented by C does not need to be the same each time, even though the expression is the same.

Functions and return values
Many imperative languages use a fairly unnatural structure for defining functions. The natural structure is that the body of the function is an expression. The return value of the function is the value of the expression. The function call is then equal to the body of the function, with formal parameters replaced by actual parameters. The return value may be stateless, or it may be a state holder.

However imperative languages use statement composition which composes a state holder, which has no sensible return value. The return statement is then needed to capture a value from a state. An extra structure called the return frame is needed to get the value from a return statement.

Return frame
The return frame accepts the return value. The value may be stateless, or a state holder.
 * $$ \forall X, \operatorname{returnframe}(\operatorname{return} X) = X $$

Declaration frame
The declaration frame is required to insure the calling of destructors. It is defined by,
 * $$ \forall I, \forall R, \forall O, \forall X, \forall Q, \operatorname{declareframe}(X, \operatorname{return} (I, R, O)) = (I, \_, O); destroy(X); (Q, R, Q) $$

Return statement
The return statement is defined by,
 * $$\forall I, \forall D, \forall M, \forall R, \forall O, (I, D, M); \operatorname{return} (M, R, O) = \operatorname{return} (I, R, O)$$
 * $$\forall X, \forall U, (\operatorname{return} X; U) = \operatorname{return} X$$