User:AmbitiousKru/sandbox

= Machine learning in heart disease prediction =

The heart is a vital organ of the human body. If it does not work properly, other organs such as the kidneys and brain are affected. Heart disease, or cardiovascular disease (CVD), is the leading cause of death worldwide and has become a major public health concern. In India, studies show that the number of deaths due to heart disease has increased significantly over the past few decades, making it the leading cause of death in the country. One study shows that the prevalence of heart disease and stroke in India increased by over 50% from 1990 to 2016, and that deaths due to cardiovascular diseases rose from 13 lakh (1.3 million) in 1990 to 28 lakh (2.8 million) in 2016.

Heart failure is also an outcome of heart disease, and breathlessness can occur when the heart becomes too weak to circulate blood. Some heart conditions occur with no symptoms at all, especially in older adults and individuals with diabetes. The term 'congenital heart disease' covers a range of conditions, but the general symptoms include sweating, high levels of fatigue, fast heartbeat and breathing, breathlessness, and chest pain. However, these symptoms might not develop until a person is older than 13 years. In such cases, diagnosis becomes an intricate task requiring great experience and skill. If the risk of a heart attack or the possibility of heart disease is identified early, patients can take precautions and measures to manage it.

Heart disease prediction is regarded as one of the most complicated tasks in the medical sciences. It requires a reliable, precise and feasible system that can diagnose such diseases in time for proper treatment. The health care field holds an immense amount of data, and many hospitals use hospital information systems to manage their patient data. However, this data is rarely used to support medical decision making, even though several techniques exist for processing it.

Medical data mining uses classification algorithms to identify the possibility of a heart attack before it occurs. A classification algorithm is trained on a heart disease dataset to build a classification model, which is then tested on its ability to predict whether a person is likely to be affected by heart disease. The resulting model can then be used to predict heart disease in new patients.

Machine learning methods such as logistic regression, support vector machines, decision trees, naïve Bayes, random forests and neural networks can be used to predict the disease. Applying several algorithms to the same dataset allows their results to be compared.
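The train, test and compare workflow described above can be sketched as follows. This is a minimal illustration, not a clinical model. The records are randomly generated toy values, not patient data. To keep the sketch self-contained, two very simple classifiers (nearest centroid and 1-nearest neighbour) stand in for the algorithms listed above. All function names are hypothetical.

```python
import random

def make_toy_data(n=200, seed=0):
    """Generate made-up (age, systolic blood pressure) records; label 1 = disease."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        age = rng.gauss(60 if label else 45, 8)
        pressure = rng.gauss(150 if label else 120, 12)
        rows.append(((age, pressure), label))
    return rows

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_nearest_centroid(train):
    """Predict the class whose mean feature vector is closest."""
    sums, counts = {}, {}
    for features, label in train:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {
        label: [v / counts[label] for v in total] for label, total in sums.items()
    }
    return lambda x: min(centroids, key=lambda c: squared_distance(x, centroids[c]))

def fit_one_nearest_neighbour(train):
    """Predict the label of the single closest training record."""
    return lambda x: min(train, key=lambda row: squared_distance(x, row[0]))[1]

def accuracy(predict, test):
    """Fraction of test records whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)

train, test = train_test_split(make_toy_data())
results = {
    "nearest centroid": accuracy(fit_nearest_centroid(train), test),
    "1-nearest neighbour": accuracy(fit_one_nearest_neighbour(train), test),
}
for name, score in results.items():
    print(f"{name}: accuracy {score:.2f}")
```

Both models are evaluated on the same held-out test set, so their accuracies can be compared directly.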

Many health care providers recommend treating anyone with CVD with high-dose statin therapy. This includes those with coronary heart disease and those who have had a stroke. For those who do not have CVD, treatment is determined by the person's individual risk of developing heart disease. That risk can be estimated using calculators that factor in age, sex, medical history and other characteristics. If a person's risk is high (such as a 7.5 or 10 percent risk of developing CVD over 10 years), a doctor may start treatment preventively, generally taking into account the person's preferences about taking medication. For people whose risk is unclear, a coronary artery calcium score, a screening test that looks for calcium (an indication of atherosclerosis) in the arteries, can help determine the need for statins.

Decision Tree Classification Algorithm:
The decision tree is a supervised machine learning algorithm. It handles both categorical and numerical data and, based on a series of conditions, gives a categorical answer such as yes/no, true/false or 1/0. The decision tree classification algorithm is also used for handling medical datasets. Its results differ from those of other models such as KNN and SVM: the output consists of horizontal and vertical splits, each based on a condition on one of the variables. The decision tree model predicted heart disease patients with an accuracy of 91%, and the naïve Bayes classifier with an accuracy of 87%. The model analyzes the dataset in a tree-shaped format, so that every attribute of the dataset is analyzed, and the resulting tree-shaped diagram determines the course of action. The decision tree analyzes the data using three kinds of node: the root node, interior nodes and leaf nodes.
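The roles of the root, interior and leaf nodes can be seen in a small sketch. The tree below is hand-written for illustration only. Its attribute names and splits are assumptions and are not derived from any dataset or medical guideline.

```python
# Hypothetical hand-written tree: a dict is a root or interior node,
# and a plain string is a leaf node holding the predicted class.
tree = {
    "attribute": "high_blood_pressure",  # root node
    "branches": {
        "yes": {
            "attribute": "smoker",  # interior node
            "branches": {"yes": "disease", "no": "no disease"},  # leaf nodes
        },
        "no": "no disease",  # leaf node
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node

print(classify(tree, {"high_blood_pressure": "yes", "smoker": "yes"}))
```

Each prediction follows exactly one path from the root to a leaf, which is why the model yields a single categorical answer for every record.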

Heart disease risk factors include:


 * High Cholesterol
 * High blood pressure
 * Diabetes
 * Smoking
 * Consuming too much alcohol
 * Being overweight or obese
 * Family history of coronary illness

Symptoms of a heart attack:


 * Shortness of breath
 * Pain and discomfort in the chest
 * Pain that may spread to the left or right arm, or to the neck, jaw, back or stomach
 * Fatigue
 * Cold sweat and unsteadiness
 * Rapid or irregular heartbeat
 * Heartburn or abnormal pain

Attribute and Description of the dataset

Step 1: Calculate the information gain of each attribute in the dataset.

Step 2: Sort the attributes of the heart disease dataset by information gain, in descending order.

Step 3: Assign the attribute with the highest information gain to the root of the tree.

Step 4: Within each branch, calculate the information gain of the remaining attributes using the same formula.

Step 5: Split each node on the attribute with the highest information gain.

Step 6: Repeat the process until every branch of the tree ends in a leaf node.
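The steps above can be sketched in a few lines of Python. The source does not state which formula it uses, so this sketch assumes the common entropy-based definition of information gain, as in ID3. The six records, the attribute names and the function names are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[target] for row in rows]) - remainder

def build_tree(rows, attributes, target):
    """Recursively split on the attribute with the highest information gain."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf node: majority class
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: build_tree([r for r in rows if r[best] == value], remaining, target)
            for value in sorted({row[best] for row in rows})
        }
    }

# Made-up toy records, not real patient data.
rows = [
    {"blood_pressure": "high", "smoker": "yes", "disease": "yes"},
    {"blood_pressure": "high", "smoker": "no", "disease": "yes"},
    {"blood_pressure": "high", "smoker": "no", "disease": "yes"},
    {"blood_pressure": "normal", "smoker": "yes", "disease": "yes"},
    {"blood_pressure": "normal", "smoker": "no", "disease": "no"},
    {"blood_pressure": "normal", "smoker": "no", "disease": "no"},
]
attributes = ["blood_pressure", "smoker"]

# Steps 1 and 2: compute the gains and rank the attributes in descending order.
ranked = sorted(
    attributes, key=lambda a: information_gain(rows, a, "disease"), reverse=True
)
# Steps 3 to 6: build the tree from the root down to the leaves.
tree = build_tree(rows, attributes, "disease")
print(ranked)
print(tree)
```

In this toy data, blood pressure has the higher information gain, so it becomes the root. The "normal" branch is still mixed and is split again on smoking status.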

Performance Metrics:

Precision: The fraction of retrieved instances that are relevant.

Recall: The fraction of all relevant instances that have been retrieved.

F-Measure: Two times the precision times the recall, divided by the sum of the precision and the recall.

ROC curve: A graphical representation of the trade-off between clinical sensitivity and specificity for every possible cut-off of a test or combination of tests.
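The metrics above can be computed directly from true labels and predictions. The following is a minimal sketch. The labels, predictions and scores are made-up values, and the function names are hypothetical.

```python
def precision_recall_f_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per cut-off.

    The true positive rate is the sensitivity, and the false positive rate
    is 1 minus the specificity. Assumes both classes occur in y_true.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= cutoff and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= cutoff and t == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Made-up labels (1 = disease) and predictions for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f_measure = precision_recall_f_measure(y_true, y_pred)
print(precision, recall, f_measure)

# Made-up classifier scores; a higher score means disease is more likely.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
```

Plotting the pairs returned by `roc_points` gives the ROC curve. Each point corresponds to one cut-off applied to the classifier scores.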