User:Minhngo6/Syntax (programming languages)

Syntax definition[edit]
The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus–Naur form (a metalanguage for grammatical structure) to inductively specify syntactic categories (nonterminal) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category. Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.

Syntax can be divided into context-free syntax and context-sensitive syntax. Context-free syntax are rules directed by the metalanguage of the programming language. These would not be constrained by the context surrounding or referring that part of the syntax, whereas context-sensitive syntax would.

Edited: 12 March 2024

Levels of syntax[edit]
Computer language syntax is generally distinguished into three levels:


 * Words – the lexical level, determining how characters form tokens;
 * Phrases – the grammar level, narrowly speaking, determining how tokens form phrases;
 * Context – determining what objects or variables names refer to, if types are valid, etc.

Distinguishing in this way yields modularity, allowing each level to be described and processed separately and often independently.

First, a lexer turns the linear sequence of characters into a linear sequence of tokens; this is known as "lexical analysis" or "lexing".

Second, the parser turns the linear sequence of tokens into a hierarchical syntax tree; this is known as "parsing" narrowly speaking. This ensures that the line of tokens conform to the formal grammars of the programming language. The parsing stage itself can be divided into two parts: the parse tree, or "concrete syntax tree", which is determined by the grammar, but is generally far too detailed for practical use, and the abstract syntax tree (AST), which simplifies this into a usable form. The AST and contextual analysis steps can be considered a form of semantic analysis, as they are adding meaning and interpretation to the syntax, or alternatively as informal, manual implementations of syntactical rules that would be difficult or awkward to describe or implement formally.

Thirdly, the contextual analysis resolves names and checks types. This modularity is sometimes possible, but in many real-world languages an earlier step depends on a later step – for example, the lexer hack in C is because tokenization depends on context. Even in these cases, syntactical analysis is often seen as approximating this ideal model.

Edited: 7 March 2024

Syntax versus semantics[edit]
The syntax of a language describes the form of a valid program, but does not provide any information about the meaning of the program or the results of executing that program. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Valid syntax must be established before semantics can make meaning out of it. Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules; and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.

Edited: 14 February 2024