Chemosensory protein

Chemosensory proteins (CSPs) are small soluble proteins that mediate olfactory recognition at the periphery of sensory receptors in insects, in a manner similar to odorant-binding proteins. A typical CSP is a single chain of about 110-120 amino acids (10-12 kDa) folded into six or seven α-helices. It includes four cysteines that form two adjacent disulfide bridges closing two small loops, and it adopts a globular "prism-like" functional structure [5]. Three CSP structures have been solved, in moths (Mamestra brassicae and Bombyx mori) and locusts (Schistocerca gregaria) [5-8].
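The sequence-level features described above (a chain of roughly 110-120 residues carrying four cysteines) can be illustrated with a minimal Python sketch. The function name, the default thresholds and the toy sequence are assumptions made for illustration only. This is a coarse screen, not a classifier: it checks neither cysteine spacing nor helical content, and real CSPs may fall outside the approximate length range.

```python
def csp_like_features(seq, min_len=110, max_len=120, n_cys=4):
    """Summarize coarse CSP-like features of a protein sequence.

    The thresholds are the approximate figures quoted in the text,
    not a validated definition of the family.
    """
    seq = seq.strip().upper()
    # 1-based positions of every cysteine (one-letter code "C")
    cys_positions = [i + 1 for i, aa in enumerate(seq) if aa == "C"]
    return {
        "length": len(seq),
        "length_in_range": min_len <= len(seq) <= max_len,
        "cysteine_positions": cys_positions,
        "four_cysteines": len(cys_positions) == n_cys,
    }


# Hypothetical toy sequence (NOT a real CSP): alanine filler with four cysteines.
toy = "A" * 28 + "C" + "A" * 7 + "C" + "A" * 18 + "C" + "A" * 2 + "C" + "A" * 56

if __name__ == "__main__":
    print(csp_like_features(toy))
```

A sequence that passes both checks is only a candidate; assigning it to the CSP family would still require sequence alignment or structural evidence.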

Gene structure and evolution
The CSP structure is highly flexible. CSPs are characterized by RNA editing and/or post-translational modifications, as discovered in the silkworm moth, B. mori [9-14]. The addition of glycine near cysteine at specific locations, amino acid inversions and motif insertions in the protein sequence strongly argue for the existence of recoding at the level of protein synthesis in the CSP family [9-14]. In addition, CSPs are capable of "breathing", i.e. undergoing specific conformational changes upon ligand binding, which may represent another key feature of the ancestral primitive multifunctional soluble binding protein [15].

The number of CSP genes is usually very low in insects, as found in Drosophila flies, Anopheles mosquitoes, Pediculus lice, honeybees and jewel wasps (4-8 genes) [4, 24, 40-41]. A significantly higher number of CSP genes exists in butterfly, moth and beetle genomes (19-20 genes) [32, 42-43]. Culex mosquito species have between 27 and 83 CSP genes [44]. Hundreds of protein variants may be produced from CSP genes through post-translational modifications and/or RNA-peptide editing, as in the case of Dscam and cochlear sensory genes [9-14].

CSP genes evolved via duplication, intron loss and gain, and retrotransposition events [4, 14, 32, 40-41, 45]. A single unified hypothesis of RNA editing and retrotransposition-driven evolution of CSPs has been proposed in moths: new CSP protein motifs are first produced via DNA- and RNA-dependent RNA polymerization, and the edited CSP-RNA variants are then retrotransposed [11].

Expression
In insects, CSPs are found throughout development, from eggs and larvae to nymphal and adult stages [4, 16-19]. In locusts, they are mainly expressed in the antennae, tarsi and legs, and are associated with phase change [3-4, 20-22]. CSPs are not restricted to insects. They are also expressed in crustaceans such as shrimp and in many other arthropod species [23]. Nor are they specific to arthropods: they are also found in bacteria, demonstrating their existence not only in eukaryotes but also in prokaryotes [23-24]. Prokaryotic CSPs are closely similar or identical to insect CSPs [24]. They have been reported from bacterial species such as the coccobacillus Acinetobacter baumannii, Macrococcus/Staphylococcus caseolyticus, the filamentous actinomycete Kitasatospora griseola (a genus of Actinobacteria in the family Streptomycetaceae), and Escherichia coli (E. coli). These species are variously known as common bacteria of the digestive tract, major producers of prokaryotic secondary metabolites, opportunistic multidrug-resistant pathogens, strongly cytochrome c oxidase-positive bacteria, and symbionts of multiple insect species [24].

Their existence has been mentioned in plants, but this still needs to be demonstrated experimentally [25-26]. CSPs can be extracted from wasp venom [27]. In moths, nearly all CSPs are expressed in the female pheromone gland [9-14]. However, CSPs are expressed not only in the female moth pheromone gland, but also in the antennal branches, mandibles and saliva, cephalic capsule, eyes, proboscis, thorax and abdomen, head, epidermis, fat body, gut, wings and legs, i.e. in a wide range of reproductive and non-reproductive, sensory and non-sensory fluids and tissues of the insect body [28-31]. Following insecticide exposure, nearly all CSPs are upregulated in most tissues of the insect body, particularly in the gut, epidermis and fat body [32].

Functions and binding properties
Such a broad pattern of gene expression, across such a wide range of sensory and non-sensory fluids and tissues, is consistent with a very general basic function for this gene family, namely one related to lipid transport and metabolism.

A role of CSPs in general immunity, insecticide resistance and xenobiotic degradation was proposed by Xuan et al. (2015), who showed a marked upregulation of CSP genes in many tissues following exposure to the insecticide abamectin [32]. An increased load of CSPs (pherokines) in fly hemolymph is observed after microbial or viral infection [33]. A specific role of CSPs in lipid transport in relation to insecticide resistance was proposed by Liu et al. (2016) in whiteflies [34]. Liu et al. showed insecticide-mediated upregulation of the protein and its interaction with a C18 lipid (linoleic acid), suggesting a metabolic role of CSPs in insect defense rather than in olfaction or chemical communication [34].

The first member of this soluble protein family was reported by Nomura et al. (1992) as an upregulated factor (p10) in the regenerating legs of the American cockroach Periplaneta americana [35]. The same protein was identified in the antennae and legs of sexually mature adult P. americana, with some apparent differences between males and females. This suggests a “chemodevol” function for the protein, contributing both to tissue development and to the recognition of sex-specific signals such as sex pheromones [2]. In immunocytochemistry experiments, one (polyclonal) antibody against CSP labeled the antennal sensillum, but the labeling was not restricted to sensory structures and extended to the cuticle and supporting cells [3, 36]. A function of CSPs in lipid transport is consistent with a crucial role not only in insect general immunity, moth pheromone synthesis or locust behavioral phase change, but also in head development, as described in honeybees [37]. CSPs have been proposed to mediate recognition of chemical signatures composed of cuticular lipids, for instance in ants [38]. However, it is not clear whether some CSPs are involved in chemical communication and others in development or other physiological roles. The functional CSP structure is bound to fatty acid molecules [5]. Other functional CSP structures have been shown to interact directly with exogenous compounds such as toxic chemicals (cinnamaldehyde) from plant oils [34]. CSPs are thus expressed not only in arthropods but also in bacteria, and appear to be endowed with heterogeneous functions. CSPs can also trigger innate immune pathways in plants [39].

Nomenclature
The first member of this gene family was called p10, in reference to the molecular weight (in kDa) of a protein from insect regenerating legs. The same protein (called Pam) was found in the adult antennae and legs of both sexes of the American cockroach P. americana [2, 35]. Similar clones identified in Drosophila and Locusta in a search for olfactory genes were referred to as Olfactory-Sensory type D protein (OS-D, or Pheromone Binding Protein A10) [20, 46-47]. Related clones identified in the antennae of the sphingid Manduca sexta were named sensory appendage proteins (SAPs) to distinguish them from a family of longer six-cysteine soluble proteins, i.e. odorant-binding proteins or OBPs [48]. Individual SAPs/CSPs have been designated in various ways: p10/Periplaneta americana (Nomura et al., 1992) [35]; A10/Drosophila melanogaster (Pikielny et al., 1994) [46]; OS-D/D. melanogaster (McKenna et al., 1994) [47]; Pam/P. americana (Picimbon & Leal, 1999) [2]; CSP/Schistocerca gregaria (Angeli et al., 1999) [3]; SAP/Manduca sexta (Robertson et al., 1999) [48]; Pherokine/D. melanogaster (Sabatier et al., 2003) [33]; and B-CSP/Acinetobacter baumannii, Macrococcus caseolyticus, Kitasatospora griseola, Escherichia coli (Liu et al., 2019) [24].

The protein family was renamed chemosensory protein (CSP) by Angeli et al. after one (polyclonal) antibody against p10 labeled some sensory structures in the adult antennae of the desert locust Schistocerca gregaria [3]. The term “B-CSP” was used to refer to similar clones from bacterial (B) species [24]. However, the functional importance of CSPs in olfaction and chemosensing remains to be demonstrated, and this gene family has since been shown to act outside the chemosensory system [32]. The name pherokines was given to the proteins found in abundance in fly hemolymph in response to microbial or viral infection [33]. It was even proposed to rename these proteins cuticular sensory proteins, which keeps the abbreviation while emphasizing their expression not only in sensory organs but also in the immune barriers between the insect and its environment [49-50].

An email forum was organized to find the most suitable new name, considering the growing evidence that CSPs do not play a central and unique role in chemosensing, if they play any at all [32]. The term “CSP” has come to designate membership of a group of soluble proteins with a particular four-cysteine pattern and a high level of structural similarity [4, 14, 23-36, 32-37, 50]. The term is rather unsuitable as a designation for the whole gene family because it literally means “Chemosensory Proteins” [3]. It should not be used to unite under a common name all genes and proteins that are evolutionarily related, from bacteria to honeybees. The knowledge needed to name CSPs properly now comes from thorough analysis of marine crustacean, arthropod, bacterial and insect genome and Expressed Sequence Tag (EST) databases, together with molecular data demonstrating that CSPs are not exclusively tuned to olfactory and taste chemosensory organs [4, 14, 23-36, 32-37, 50].

The situation is similar to that of lipocalins (from Greek lipos=fat and Greek kalyx=cup), where the name designates a superfamily of widely distributed and heterogeneous proteins that transport small hydrophobic molecules, including steroids and lipids. However, in contrast to lipocalins, the “CSP” family refers to homogeneous, evolutionarily well-conserved proteins with a characteristic sequence (4 cysteines), tissue profile (ubiquitously expressed), and highly diverse binding properties (binding not only to long fatty acids (FAs) and straight lipid chains, but also to cyclic compounds such as cinnamaldehyde) [34]. It is therefore rather difficult to name groups and sub-groups within the CSP family. Nevertheless, numerous CSPs are mainly produced in the gut and the fat body, which are considered the insect body’s principal storage organs for energy in the form of FAs and lipids. These are mobilized through lipolysis to provide fuel for other organs to develop, regenerate or grow, and/or to respond to an infectious agent [4, 14, 50]. In moths, specific lipid chains are mobilized for pheromone synthesis [9-14].