User:BjornKoemans/NLP for Requirements Engineering

Natural Language Processing (NLP) can be used in Requirements Engineering (RE) to extract information from unstructured natural language (NL). NLP techniques can support almost all RE phases, from requirements elicitation to requirements management. A range of tools has been developed that incorporate NLP tasks or techniques to support RE tasks. For example, NLP can help identify customer needs during requirements elicitation by processing interviews and feedback written in NL, and it can help formalise ambiguous NL requirements to prevent conflicts.

Natural Language Processing (NLP)
Natural Language Processing is an interdisciplinary subfield that combines linguistics, computer science, and artificial intelligence to study the interactions between human language and computers, with a specific focus on developing algorithms and techniques that let computers process and analyse large quantities of natural language data. NLP can be used for a wide range of applications, including language translation, question answering, sentiment analysis, topic modelling, text generation and text summarisation. NLP relies on computational models and can be combined with other technologies, such as speech recognition, to process natural language.

Usage of NLP in RE
Requirements Engineering (RE) is a critical step in the software development process, as it helps to ensure that the product meets the customer's demand. Requirements formally specify the characteristics that the system must possess to fulfil the needs of stakeholders. RE is a natural language-intensive field and takes various natural language artefacts into account during the RE lifecycle, such as requirements documents, user stories, reviews, product descriptions and privacy policies. Because requirements are written in NL, they are easy for the stakeholders involved to read, write and understand, even when those stakeholders have little to no experience with RE. However, NL also leaves room for ambiguity and informal representations, which can result in communication errors and deviations in implementations. Additionally, large amounts of NL are hard to process manually. To support RE tasks that involve NL, NLP tasks and tools can be deployed.

NLP can automate the analysis of natural language artefacts and extract useful information from them, which simplifies the requirements engineering process. NLP can therefore support RE tasks in general and make them more efficient and less error-prone.

NLP has multiple use cases in RE, such as specifying requirements: NLP techniques are used to extract requirements from the natural language texts at hand and formulate them in a specified format. NLP can also be used to validate requirements for consistency and completeness, meaning that requirements that are too generic or too specific, requirements that do not conform to a format, and requirements that are inconsistent or conflicting can be identified and possibly also improved automatically. NLP also has use cases within the later RE phases, such as managing requirements, which covers tracing and monitoring the requirements. With NLP, requirements can be monitored and traced automatically, based on natural language artefacts, as the project progresses and requirements change over time.
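The validation use case can be illustrated with a minimal sketch. It assumes a simple user-story template and a tiny list of vague terms; both the template and the term list are illustrative assumptions, not taken from any particular NLP4RE tool, and a real checker would rely on proper linguistic analysis rather than regular expressions.

```python
import re

# Assumed template: "As a <role>, I want <goal> [so that <benefit>]".
USER_STORY = re.compile(
    r"^As an? (?P<role>.+?), I want (?:to )?(?P<goal>.+?)"
    r"(?:,? so that (?P<benefit>.+?))?\.?$",
    re.IGNORECASE,
)

# Illustrative vague terms; a real checker would use a curated lexicon.
VAGUE_TERMS = {"fast", "easy", "user-friendly", "appropriate", "etc", "some"}


def check_requirement(text):
    """Return a list of human-readable issues found in one requirement."""
    issues = []
    # Format conformance: does the text follow the assumed template?
    if USER_STORY.match(text.strip()) is None:
        issues.append("does not conform to the user-story template")
    # Vagueness: flag words that usually make a requirement too generic.
    words = set(re.findall(r"[a-z][a-z-]*", text.lower()))
    for term in sorted(VAGUE_TERMS & words):
        issues.append(f"vague term: {term}")
    return issues


if __name__ == "__main__":
    for requirement in (
        "As a customer, I want to reset my password so that I can regain access.",
        "The system should be fast and easy to use.",
    ):
        print(requirement, "->", check_requirement(requirement))
```

The first example requirement passes both checks, while the second is flagged for not following the template and for containing the vague terms "easy" and "fast".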

RE activities
The main objective of NLP for RE is to support requirements engineers in performing various RE activities that involve processing and analysing natural language requirements documents. The RE activities that are targeted and supported by NLP are detection, extraction, modelling, tracing and relating, classification, and search and retrieval. These RE activities are carried out separately in different phases of the RE lifecycle. Detection occurs in the requirements analysis phase, extraction is done during requirements elicitation, tracing and relating corresponds to requirements management, modelling is performed within the requirements design phase, and classification is covered throughout the entire RE lifecycle. During all activities, various NLP tasks and tools can be used to process natural language that is generated by communication with customers, for example through customer interviews, prototyping, workshops and other requirements elicitation techniques. Using tools such as Stanford CoreNLP, natural language from interviews with stakeholders, for example, can be processed into a range of data, such as tokens, linguistic annotations, named entities, linguistic dependencies and relations.
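To give a rough idea of the most basic of these outputs, the sketch below splits an interview fragment into sentences and tokens using only the Python standard library. It is a deliberately naive approximation and does not use or reproduce the API of any NLP tool; the function names `split_sentences`, `tokenize` and `annotate` are hypothetical, and richer annotations such as named entities or dependencies would require a real NLP toolkit.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    # Words may contain internal hyphens or apostrophes (e.g. "shouldn't").
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


def annotate(text):
    """Return one record per sentence with its list of tokens."""
    return [
        {"sentence": sentence, "tokens": tokenize(sentence)}
        for sentence in split_sentences(text)
    ]


if __name__ == "__main__":
    interview = "The system must log in users. It shouldn't take long!"
    for record in annotate(interview):
        print(record["tokens"])
```

Rule-based splitting like this breaks on abbreviations and similar cases, which is one reason dedicated NLP tools are used in practice.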

NLP tasks for RE activities
NLP-based tasks are small preprocessing steps within the overall NLP pipeline. Various NLP-based tasks can be used during RE activities. Zhao et al. extracted a total of 139 tasks from various studies; the ten most frequently used NLP tasks are shown in the table below. Every task listed in the table is applicable to all six RE activities. The full list of all 139 NLP tasks can be found in the accompanying image.

NLP tools for RE activities
NLP tools are software systems or software libraries that combine several NLP-based tasks in order to serve a certain NLP goal. Various NLP tools can be used for several RE activities; the ten most frequently used NLP tools according to the study of Zhao et al. are shown in the table below.