Draft:Semantic Brand Score

The Semantic Brand Score (SBS) is a measure of brand importance that can be calculated on textual data, including big data, in different contexts. The measure is rooted in graph theory and partly connected to Keller's conceptualization of brand equity.

The SBS is a composite indicator with three dimensions: prevalence, diversity and connectivity.

The metric can be computed by examining different text sources, such as newspaper articles, online forums, scientific papers, or social media posts.

Pre-processing
To compute the Semantic Brand Score, the analyzed texts must first be converted into word networks, i.e., graphs in which each node represents a word. Links between words are established based on their co-occurrence within a specified range, such as within a sentence. Natural language pre-processing is applied beforehand to clean the texts, for example by removing stopwords and by stripping word affixes through stemming. Here is a sample network derived from pre-processing the sentence "The dawn is the appearance of light - usually golden, pink or purple - before sunrise".
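The conversion from text to word network can be illustrated with a minimal Python sketch. It assumes sentence-level co-occurrence, a small illustrative stopword list, and no stemming; the function names `tokenize` and `cooccurrence_network` are hypothetical and not part of any SBS software.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list (an assumption; real pipelines use larger lists).
STOPWORDS = {"the", "is", "of", "or", "before", "a", "an", "and"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cooccurrence_network(sentences):
    """Return {frozenset({u, v}): weight}, counting sentence-level co-occurrences."""
    edges = Counter()
    for sentence in sentences:
        words = sorted(set(tokenize(sentence)))
        for u, v in combinations(words, 2):
            edges[frozenset((u, v))] += 1
    return edges

sentence = ("The dawn is the appearance of light - usually golden, "
            "pink or purple - before sunrise")
network = cooccurrence_network([sentence])
nodes = sorted({word for edge in network for word in edge})
print(nodes)         # the words that survive stopword removal
print(len(network))  # number of arcs in the word network
```

With a single sentence, every pair of remaining words is linked once. On a larger corpus, the weights grow with repeated co-occurrences, and a narrower co-occurrence range would produce a sparser network.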



Prevalence
This dimension measures the frequency of brand name usage, indicating how often a brand is explicitly referenced in a corpus. The prevalence factor is associated with brand awareness, suggesting that a brand mentioned frequently in a text is more familiar to its authors. Likewise, frequent mentions of a brand name enhance its recognition and recall among readers.
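As a rough illustration, prevalence can be approximated by counting the occurrences of the brand token in a corpus. The brand name "Acme", the two toy documents, and the helper `prevalence` are hypothetical; a real analysis would count after the same pre-processing used to build the network.

```python
import re
from collections import Counter

def prevalence(documents, brand):
    """Count how many times `brand` appears as a token across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts[brand.lower()]

# Toy corpus with a made-up brand name.
docs = [
    "Acme launches a new phone.",
    "I like the Acme phone, but Acme support is slow.",
]
print(prevalence(docs, "Acme"))
```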

Diversity
This dimension assesses the variety of words linked with a brand, focusing on textual associations, i.e., the words used alongside a particular brand. It is measured with degree centrality, which counts the connections a brand node has in the semantic network. Alternatively, distinctiveness centrality has been proposed, which gives greater weight to unique brand associations and reduces redundancy. The rationale is that distinctive textual associations enrich discussions about a brand, thereby enhancing its memorability.

Diversity can be calculated for the brand node in a semantic network, i.e., a weighted undirected graph G, made of n nodes and m arcs. If two nodes, i and j, are not connected, then $$w_{ij}=0$$, otherwise the weight of the arc connecting them is $$w_{ij} \ge 1$$. In the following formula, based on distinctiveness centrality, $$g_j$$ is the degree of node j and $$I_{(f)}$$ is the indicator function, which equals 1 if the condition f is true and 0 otherwise. Here the condition is $$w_{ij}>0$$, i.e., there is an arc connecting nodes i and j.

$$DI (i) = \sum_{j=1,j\neq i}^{n}\log_{10}\frac{n-1}{g_{j}}I_{(w_{ij}>0)}$$.
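The formula can be sketched in Python as follows. The graph is stored as a nested dictionary of arc weights, and the toy network around the made-up brand "acme" is purely illustrative; the function name `diversity` is an assumption.

```python
import math

def diversity(adjacency, brand):
    """Sum log10((n - 1) / g_j) over the neighbors j of `brand`.

    adjacency: {node: {neighbor: weight}} for an undirected graph,
    with every arc listed in both directions.
    """
    n = len(adjacency)
    total = 0.0
    for j, w_ij in adjacency[brand].items():
        if j != brand and w_ij > 0:  # indicator I(w_ij > 0)
            g_j = sum(1 for w in adjacency[j].values() if w > 0)  # degree of j
            total += math.log10((n - 1) / g_j)
    return total

# Toy network: "acme" is linked to three words, two of which are also linked.
toy = {
    "acme":    {"phone": 2, "battery": 1, "price": 1},
    "phone":   {"acme": 2, "battery": 1},
    "battery": {"acme": 1, "phone": 1},
    "price":   {"acme": 1},
}
print(round(diversity(toy, "acme"), 4))
```

In this example the link to "price" contributes the most, because "price" is connected to no other word. A neighbor connected to every other node contributes zero, since the logarithm of (n - 1) / (n - 1) vanishes.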

Connectivity
This third dimension evaluates a brand's connectivity within broader discourse, indicating its capacity to serve as a bridge between various words/concepts (nodes) in the network. It captures a brand's brokerage power, its ability to connect different words, groups of words, or topics together. The calculation hinges on the weighted betweenness centrality metric.
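The sketch below computes weighted betweenness centrality with Brandes' algorithm on a nested dictionary of arc weights. It assumes that a stronger co-occurrence means a shorter distance, so each arc length is taken as 1 / weight; this conversion, the function name, and the toy network around the made-up brand "acme" are illustrative assumptions.

```python
import heapq
import math
from itertools import count

def weighted_betweenness(adjacency):
    """Betweenness of every node in an undirected graph {node: {neighbor: weight}}."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        dist = {source: 0.0}
        sigma = {v: 0.0 for v in adjacency}   # number of shortest paths from source
        sigma[source] = 1.0
        preds = {v: [] for v in adjacency}    # predecessors on shortest paths
        done, order = set(), []
        tie = count()                         # tie-breaker for the heap
        heap = [(0.0, next(tie), source)]
        while heap:                           # Dijkstra with path counting
            d, _, v = heapq.heappop(heap)
            if v in done:
                continue
            done.add(v)
            order.append(v)
            for w, weight in adjacency[v].items():
                if weight <= 0 or w in done:
                    continue
                nd = d + 1.0 / weight         # assumed length: inverse of weight
                if w in dist and math.isclose(nd, dist[w]):
                    sigma[w] += sigma[v]      # another equally short path
                    preds[w].append(v)
                elif w not in dist or nd < dist[w]:
                    dist[w] = nd
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, next(tie), w))
        delta = {v: 0.0 for v in adjacency}   # dependency accumulation
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in centrality.items()}

toy = {
    "acme":    {"phone": 2, "battery": 1, "price": 1},
    "phone":   {"acme": 2, "battery": 1},
    "battery": {"acme": 1, "phone": 1},
    "price":   {"acme": 1},
}
print(weighted_betweenness(toy))
```

In the toy network, "acme" lies on the shortest paths between "price" and the other two words, so it is the only node with non-zero betweenness.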

Semantic Brand Score
The Semantic Brand Score indicator is given by the sum of the standardized values of prevalence, diversity, and connectivity. SBS standardization is typically performed by subtracting the mean from the raw scores of each dimension and then dividing by the standard deviation. This process takes into account the scores of all relevant words in the corpus.
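A minimal sketch of this aggregation is given below. The raw scores are made-up values for three words of a hypothetical corpus, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Z-scores of a {word: raw_score} mapping, computed over all words."""
    mu = mean(scores.values())
    sd = pstdev(scores.values())  # assumption: population standard deviation
    return {w: (s - mu) / sd if sd > 0 else 0.0 for w, s in scores.items()}

def semantic_brand_score(prevalence, diversity, connectivity):
    """Sum of the standardized prevalence, diversity and connectivity per word."""
    z_p = standardize(prevalence)
    z_d = standardize(diversity)
    z_c = standardize(connectivity)
    return {w: z_p[w] + z_d[w] + z_c[w] for w in prevalence}

# Made-up raw scores for three words (illustrative only).
prevalence = {"acme": 5, "phone": 3, "price": 1}
diversity = {"acme": 3, "phone": 2, "price": 1}
connectivity = {"acme": 2, "phone": 0, "price": 0}

sbs = semantic_brand_score(prevalence, diversity, connectivity)
print(sbs)
```

Because each dimension is centered on its mean, the scores of all words sum to zero, and a positive SBS indicates a word that is more important than average in the corpus.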

SBS measures brand importance, a construct that cannot be understood by examining a single dimension alone. Indeed, a brand name might be frequently mentioned in posts repeating the same content, indicating high prevalence but low diversity. Conversely, a brand cited across diverse contexts would show both high prevalence and diversity. Connectivity, which increases when a brand bridges various topics, could still remain low if the brand is discussed only within a niche of the overall discourse.