Assembly theory

Assembly theory is a framework for quantifying selection and evolution. When applied to molecular complexity, its authors claim it is the first technique that is experimentally verifiable, unlike other molecular complexity algorithms, which lack experimental measures.

Background
The hypothesis was proposed by chemist Leroy Cronin in 2017 and developed by the team he leads at the University of Glasgow, then extended in collaboration with a team at Arizona State University led by astrobiologist Sara Imari Walker, in a paper released in 2021.

Assembly theory conceptualizes objects not as point particles, but as entities defined by their possible formation histories. This allows objects to show evidence of selection, within well-defined boundaries of individuals or selected units. Combinatorial objects are important in chemistry, biology and technology, in which most objects of interest (if not all) are hierarchical modular structures. For any object, an 'assembly space' can be defined as all recursively assembled pathways that produce this object. The 'assembly index' is the number of steps on a shortest path producing the object. For such a shortest path, the assembly space captures the minimal memory, in terms of the minimal number of operations necessary to construct an object based on objects that could have existed in its past. The assembly is defined as "the total amount of selection necessary to produce an ensemble of observed objects"; for an ensemble containing $$N_T$$ objects in total, $$N$$ of which are unique, the assembly $$A$$ is defined to be

$$A=\sum_{i=1}^{N} e^{a_i}\left(\frac{n_i-1}{N_T}\right)$$,

where $$n_i$$ denotes the 'copy number', the number of occurrences of objects of type $$i \in \{1,2,\dots,N\}$$, each type having assembly index $$a_i$$.
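As a minimal sketch of this formula, the following Python function computes $$A$$ for an ensemble given as (assembly index, copy number) pairs, one pair per unique object type. The function name `assembly` and the input layout are illustrative assumptions rather than part of the published method; $$N_T$$ is taken to be the sum of all copy numbers.

```python
import math


def assembly(ensemble):
    """Compute the assembly A of an ensemble.

    `ensemble` is a list of (a_i, n_i) pairs: the assembly index and the
    copy number of each unique object type (hypothetical input layout).
    """
    n_total = sum(n for _, n in ensemble)  # N_T: total number of objects
    if n_total <= 0:
        raise ValueError("the ensemble must contain at least one object")
    # A = sum_i exp(a_i) * (n_i - 1) / N_T
    return sum(math.exp(a) * (n - 1) / n_total for a, n in ensemble)


# Example: three copies of an object with index 2, one copy of a basic unit.
print(assembly([(2, 3), (0, 1)]))
```

An object type that occurs only once contributes nothing to the sum, since $$n_i - 1 = 0$$ for it.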

For example, the word 'abracadabra' contains 5 unique letters (a, b, c, d and r) and is 11 symbols long. It can be assembled from its constituents as a + b --> ab + r --> abr + a --> abra + c --> abrac + a --> abraca + d --> abracad + abra --> abracadabra, because 'abra' was already constructed at an earlier stage. Because this requires at least 7 steps, the assembly index is 7. The word 'abracadrbaa', of the same length, has no repeated substrings and so has an assembly index of 10.
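The pathway in this example can be checked mechanically. The sketch below, with the hypothetical helper `check_pathway`, verifies that every join uses only basic symbols or previously built objects and that the last join yields the target. It only confirms that a pathway is valid, so the returned step count is an upper bound on the assembly index, not a proof of minimality.

```python
def check_pathway(target, steps):
    """Validate a list of (left, right) joins and return the number of steps."""
    pool = set(target)  # basic building blocks: the unique symbols
    product = ""
    for left, right in steps:
        if left not in pool or right not in pool:
            raise ValueError(f"{left!r} or {right!r} is not yet in the pool")
        product = left + right
        pool.add(product)  # built objects become available for reuse
    if product != target:
        raise ValueError("the pathway does not end at the target")
    return len(steps)


# The pathway from the text; the last step reuses 'abra'.
ABRACADABRA_STEPS = [
    ("a", "b"), ("ab", "r"), ("abr", "a"), ("abra", "c"),
    ("abrac", "a"), ("abraca", "d"), ("abracad", "abra"),
]

print(check_pathway("abracadabra", ABRACADABRA_STEPS))  # prints 7
```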

Take two binary strings $$C=[01010101]$$ and $$D=[00010111]$$ as another example. Both have the same length $$N=8$$ bits, both have the same Hamming weight $$N_1=N/2=4$$. However, the assembly index of the first string is $$a(C)=3$$ ("01" is assembled, joined with itself into "0101", and joined again with "0101" taken from the assembly pool), while the assembly index of the second string is $$a(D)=6$$, since in this case only "01" can be taken from the assembly pool.
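For strings this short, the assembly index can be found by exhaustive search. The following is a sketch under stated assumptions: objects are strings, the basic building blocks are the distinct symbols of the target, and one step concatenates two objects from the pool. The function `assembly_index` is an illustrative brute-force search using iterative deepening, not the authors' algorithm, and its running time grows quickly with string length.

```python
def assembly_index(target: str) -> int:
    """Exact string assembly index by iterative-deepening search (short strings only)."""
    if len(target) <= 1:
        return 0  # a single basic symbol needs no joining step

    # Only substrings of the target can appear on a shortest pathway.
    substrings = {
        target[i:j]
        for i in range(len(target))
        for j in range(i + 1, len(target) + 1)
    }
    basics = frozenset(target)

    def search(pool: frozenset, steps_left: int, seen: set) -> bool:
        for left in pool:
            for right in pool:
                joined = left + right
                if joined == target:
                    return True
                if steps_left == 1:
                    continue  # the last allowed step must produce the target
                if joined in pool or joined not in substrings:
                    continue
                next_pool = pool | {joined}
                if next_pool in seen:
                    continue  # this pool was already explored at this depth limit
                seen.add(next_pool)
                if search(next_pool, steps_left - 1, seen):
                    return True
        return False

    # Adding one symbol at a time always works, so K - 1 steps always suffice.
    for limit in range(1, len(target)):
        if search(basics, limit, set()):
            return limit
    return len(target) - 1


print(assembly_index("01010101"), assembly_index("00010111"))  # prints 3 6
```

The search restricts intermediate objects to substrings of the target, because an object that never ends up inside the target only adds steps to a pathway.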

In general, for K subunits of an object O the assembly index is bounded by $$\log_2(K) \le a_O \le K-1$$.
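The bounds can be evaluated directly. In the sketch below, the helper name `assembly_index_bounds` is hypothetical, and the lower bound is rounded up to an integer, which is an added assumption justified only by the assembly index being a whole number of steps.

```python
def assembly_index_bounds(k: int) -> tuple[int, int]:
    """Return (lower, upper) integer bounds on the assembly index for k subunits."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    lower = (k - 1).bit_length()  # equals ceil(log2(k)) for k >= 1
    upper = k - 1                 # add one subunit per step, with no reuse
    return lower, upper


print(assembly_index_bounds(8))   # prints (3, 7)
print(assembly_index_bounds(11))  # prints (4, 10)
```

The lower bound is reached when every step doubles the object, as for the string 01010101 with 8 subunits and index 3; the upper bound is reached when nothing can be reused, as for 'abracadrbaa' with 11 subunits and index 10.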

Once a pathway to assemble an object is discovered, the object can be reproduced. The rate of discovery of new objects can be defined by the expansion rate $$k_{\text{d}}$$, introducing a discovery timescale $$\tau_{\text{d}} \approx 1/k_{\text{d}}$$. To include copy number $$n_i$$ in the dynamics of assembly theory, a production timescale $$\tau_{\text{p}} \approx 1/k_{\text{p}}$$ is defined, where $$k_{\text{p}}$$ is the production rate of a specific object $$i$$. Defining these two distinct timescales, $$\tau_{\text{d}}$$ for the initial discovery of an object and $$\tau_{\text{p}}$$ for making copies of existing objects, makes it possible to determine the regimes in which selection is possible.

While other approaches can provide a measure of complexity, the researchers claim that assembly theory's molecular assembly number is the first to be measurable experimentally. Molecules with a high assembly index are very unlikely to form abiotically, and the probability of abiotic formation goes down as the value of the assembly index increases. The assembly index of a molecule can be obtained directly via spectroscopic methods. This method could be implemented in a fragmentation tandem mass spectrometry instrument to search for biosignatures.

The theory was extended to map chemical space with molecular assembly trees, demonstrating the application of this approach in drug discovery, in particular in research of new opiate-like molecules by connecting the "assembly pool elements through the same pattern in which they were disconnected from their parent compound(s)".

It is difficult to identify chemical signatures that are unique to life. For example, the Viking lander biological experiments detected molecules that could be explained by either living or natural non-living processes. It appears that only living samples can produce assembly index measurements above ~15. However, in 2021 Cronin first explained how polyoxometalates could in theory have large assembly indices (>15) due to autocatalysis.

Critical views
Chemist Steven A. Benner has publicly criticized various aspects of assembly theory. Benner argues that it is transparently false that non-living systems, with no intervention from life, cannot contain complex molecules, but that people would be misled into thinking these papers must be right because they were published in Nature journals after peer review.

A paper published in the Journal of Molecular Evolution refers to Hector Zenil's blog post "that identifies no less than eight fallacies of assembly theory". The paper also refers to the video essay by the same author, stating "that summarizes these fallacies, and highlights conceptual/methodological limitations, and the pervasive failure by the proponents of assembly theory to acknowledge relevant previous work in the field of complexity science". The paper concludes that "the hype around Assembly Theory reflects rather unfavorably both on the authors and the scientific publication system in general". The author concludes that what "assembly theory really does is to detect and quantify bias caused by higher-level constraints in some well-defined rule-based worlds"; one "can use assembly theory to check whether something unexpected is going on in a very broad range of computational model worlds or universes".

The group led by Hector Zenil, a former senior researcher and faculty member at Oxford and Cambridge and currently an associate professor in biomedical engineering at King's College London, is cited as having reproduced the results of assembly theory with traditional statistical algorithms.

Another paper, authored by a group of chemists and planetary scientists, including an author affiliated with NASA, and published in the Journal of the Royal Society Interface, demonstrated that abiotic chemical processes have the potential to form crystal structures of great complexity, with values exceeding the proposed abiotic/biotic divide of MA index = 15. They conclude that "while the proposal of a biosignature based on a molecular assembly index of 15 is an intriguing and testable concept, the contention that only life can generate molecular structures with MA index ≥ 15 is in error".

The paper also cites the papers and posts of Hector Zenil as questioning whether a single scalar value like the assembly index can be employed to adequately discriminate between living and nonliving systems, and pointing out the noticeable similarities of the Assembly Theory approach to uncited prior efforts to distinguish biotic from abiotic molecular compounds.

In particular, the paper mentions that Zenil and colleagues "may also have anticipated key conclusions of Assembly Theory by exploring connections among causal memory, selection, and evolution".