User:Vrajadh/sandbox

Data clumps are a code smell in computer programming in which groups of related data items are repeatedly passed around together. They are often primitive values that would be better represented as an object. They typically arise from poor program structure or from not following object-oriented principles. They can also be a result of excessive use of "copy and paste programming".

Description
Code refactoring is the process of improving the internal structure of code without changing its external behavior. An important part of this process is identifying code smells. According to Martin Fowler, "a code smell is a surface indication that usually corresponds to a deeper problem in the system". A data clump is a code smell in which two or three data items are repeatedly passed around together in a program (e.g., start and end variables, or the length, breadth and height of an object). Such recurrence leads to duplicated code. It also signals a missing abstraction, which makes the code harder to understand.
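
As an illustration of the smell itself, the following minimal sketch shows the length, breadth and height values travelling together through several method signatures. The class BoxCalculator and its methods are hypothetical names chosen for this example only.

```java
// Hypothetical helper class: the names are assumptions made for illustration.
class BoxCalculator {

    // The same three primitives appear in every signature: a data clump.
    static double volume(double length, double breadth, double height) {
        return length * breadth * height;
    }

    static double surfaceArea(double length, double breadth, double height) {
        return 2 * (length * breadth + breadth * height + length * height);
    }

    // Adding a fourth parameter does not hide the clump of the first three.
    static boolean fitsInside(double length, double breadth, double height, double maxSide) {
        return length <= maxSide && breadth <= maxSide && height <= maxSide;
    }
}
```

Each caller has to supply the three values in the right order, and any change to how a box is described must be repeated in every signature.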

Issues and Identification
The major downside of leaving data clumps in code is reduced reusability and maintainability. Because the items in a clump are related (e.g., the length and breadth of a rectangle), a change made to one of them in one place may not be reflected in the other places where the group is used. If the value of one of the dimensions needs to change, it has to be changed everywhere it is used. If the dimensions are packed into an object, a single call to a mutator method makes the change visible to every part of the system that holds that object.
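
The point about mutator methods can be sketched as follows. The classes Dimensions, FloorPlan and PaintEstimate are hypothetical, and the paint coverage figure is an arbitrary assumption used only to give the second consumer something to compute.

```java
// Hypothetical object grouping the two related dimensions.
class Dimensions {
    private double length;
    private double breadth;

    Dimensions(double length, double breadth) {
        this.length = length;
        this.breadth = breadth;
    }

    double getLength() { return length; }
    double getBreadth() { return breadth; }

    // Mutator: every holder of this object sees the new value.
    void setLength(double length) { this.length = length; }

    double area() { return length * breadth; }
}

// Two hypothetical consumers that share one Dimensions instance.
class FloorPlan {
    private final Dimensions room;
    FloorPlan(Dimensions room) { this.room = room; }
    double floorArea() { return room.area(); }
}

class PaintEstimate {
    private final Dimensions room;
    PaintEstimate(Dimensions room) { this.room = room; }
    // Assumes one litre covers 10 square units, purely for illustration.
    double litresNeeded() { return room.area() / 10.0; }
}
```

Calling setLength once on the shared Dimensions object changes the results of both floorArea and litresNeeded, with no second copy of the value to keep in sync. Sharing a mutable object is a design choice with its own trade-offs; an immutable object that is replaced as a whole would be another option.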

In some cases, 10 to 15 parameters go together, and passing them from one method to another produces code that is hard to read, understand and maintain. A general rule of thumb for identifying data clumps: when a few parameters are repeatedly passed around as a group, try deleting one of them and check whether the remaining parameters still make sense. If they do not, the group is a data clump.

Extract Class
In this refactoring technique, a large class is broken down into several smaller classes, which helps maintain the single responsibility principle. Classes adhering to this principle tend to be easier to change safely. Data clumps are avoided because the related data items can be passed around as a single object of the extracted class.

Java Example
Consider a class Person in Java that holds the instance variables officeAreaCode and officeNumber. These two values would travel around the code together as a data clump. To remove the clump, they are moved into a separate class, and an object of that class is created in the Person class to link the two together. The data can then be passed around the system as a single object.
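
A minimal sketch of this refactoring is given here. Only Person, officeAreaCode and officeNumber come from the description above. The names PersonBefore, TelephoneNumber, format and getTelephoneNumber are assumptions introduced for illustration.

```java
// Before: the two office-phone fields live directly in the person class.
// (Named PersonBefore only so both versions fit in one listing.)
class PersonBefore {
    private final String name;
    private final String officeAreaCode;
    private final String officeNumber;

    PersonBefore(String name, String officeAreaCode, String officeNumber) {
        this.name = name;
        this.officeAreaCode = officeAreaCode;
        this.officeNumber = officeNumber;
    }

    String getTelephoneNumber() {
        return "(" + officeAreaCode + ") " + officeNumber;
    }
}

// After: the clump is extracted into its own (hypothetical) class.
class TelephoneNumber {
    private final String areaCode;
    private final String number;

    TelephoneNumber(String areaCode, String number) {
        this.areaCode = areaCode;
        this.number = number;
    }

    String format() {
        return "(" + areaCode + ") " + number;
    }
}

// Person now holds one object instead of two loose fields.
class Person {
    private final String name;
    private final TelephoneNumber officeTelephone;

    Person(String name, TelephoneNumber officeTelephone) {
        this.name = name;
        this.officeTelephone = officeTelephone;
    }

    TelephoneNumber getOfficeTelephone() { return officeTelephone; }

    String getTelephoneNumber() { return officeTelephone.format(); }
}
```

Code that previously needed both strings can now accept a single TelephoneNumber, and formatting logic lives in one place.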

Parameter Object
Whenever a recurring group of data items appears as parameters, the items can be packed into an object. This helps avoid code duplication. As an example, consider an application that collects information about a person and creates an object representing that person.

Java Example
Before refactoring, a long set of parameters is passed to the createPerson method.
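
The following sketch shows what such a method might look like. Only the method name createPerson comes from the text. The specific parameters, the PersonFactory class and the describe method are assumptions made for illustration.

```java
// Hypothetical person record built from five loose strings.
class Person {
    private final String firstName;
    private final String lastName;
    private final String street;
    private final String city;
    private final String postalCode;

    Person(String firstName, String lastName,
           String street, String city, String postalCode) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.street = street;
        this.city = city;
        this.postalCode = postalCode;
    }

    String describe() {
        return firstName + " " + lastName + ", " + street + ", " + city + " " + postalCode;
    }
}

class PersonFactory {
    // Long parameter list: the name parts and the address parts are two clumps.
    static Person createPerson(String firstName, String lastName,
                               String street, String city, String postalCode) {
        return new Person(firstName, lastName, street, city, postalCode);
    }
}
```

Because all five parameters are strings, the compiler cannot detect arguments supplied in the wrong order.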

After refactoring, instead of a long parameter list, two objects are created and passed: one of the Name class and one of the Address class.
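
A possible shape for the refactored code is sketched here. The class names Name and Address and the method createPerson come from the text. Their fields, the PersonFactory class and the helper methods full and oneLine are assumptions made for illustration.

```java
// Parameter object for the name parts (fields are assumed).
class Name {
    private final String first;
    private final String last;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    String full() { return first + " " + last; }
}

// Parameter object for the address parts (fields are assumed).
class Address {
    private final String street;
    private final String city;
    private final String postalCode;

    Address(String street, String city, String postalCode) {
        this.street = street;
        this.city = city;
        this.postalCode = postalCode;
    }

    String oneLine() { return street + ", " + city + " " + postalCode; }
}

class Person {
    private final Name name;
    private final Address address;

    Person(Name name, Address address) {
        this.name = name;
        this.address = address;
    }

    Name getName() { return name; }
    Address getAddress() { return address; }
}

class PersonFactory {
    // Two typed objects replace the long list of loose strings.
    static Person createPerson(Name name, Address address) {
        return new Person(name, address);
    }
}
```

Since Name and Address are distinct types, swapping the two arguments becomes a compile-time error, and other methods that need an address can reuse the same Address class.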

Preserve Whole Object
There are occasions when several values are extracted from an object only to be passed on as parameters to a function. In such cases the object itself can be passed as the parameter.

Java Example
Suppose the values highest and lowest are obtained from a temp object and passed to a Range method. Instead, the entire object can be passed as a parameter to the Range method.
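
The idea can be sketched as follows. The classes TempRange and HeatingPlan and the method withinRange are hypothetical stand-ins for the temp object and the Range method mentioned above, not names taken from the original code.

```java
// Hypothetical object holding the lowest and highest temperature.
class TempRange {
    private final int lowest;
    private final int highest;

    TempRange(int lowest, int highest) {
        this.lowest = lowest;
        this.highest = highest;
    }

    int getLowest() { return lowest; }
    int getHighest() { return highest; }
}

// Hypothetical class whose method plays the role of the "Range method".
class HeatingPlan {
    private final TempRange allowed;

    HeatingPlan(TempRange allowed) {
        this.allowed = allowed;
    }

    // Before: the caller unpacks the object and passes two loose values.
    boolean withinRange(int lowest, int highest) {
        return lowest >= allowed.getLowest() && highest <= allowed.getHighest();
    }

    // After: the caller passes the whole object; unpacking happens here once.
    boolean withinRange(TempRange temp) {
        return withinRange(temp.getLowest(), temp.getHighest());
    }
}
```

With the second overload, a call such as plan.withinRange(temp) replaces the two getter calls at every call site. If the check later needs another value from the object, only the method body changes.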

Detection
Reek is a code smell detector that analyzes Ruby files and reports the code smells it finds. For data clumps, it looks for a group of two or three data elements that are expected together as parameters by more than two methods of a class. When it identifies such a group, Reek issues a warning for the affected class.

Advantages

 * Improved code understanding and organization.
 * Operations on the same set of data are gathered in a single place instead of being scattered across the code.
 * Reduced code size, since duplicated parameter lists and duplicated handling are removed.
 * Large code bases become easier to maintain.

Disadvantages
There are few disadvantages to removing data clumps from code when it is done carefully. However, detecting them in legacy code can be a daunting task, and refactoring them can in some cases introduce new bugs into the system. The developer should therefore test thoroughly (unit, integration, functional and end-to-end tests) before releasing the changes.