Run-time estimation of system and sub-system level power consumption

Power consumption in electronic systems has been a real challenge for hardware and software designers, as well as for users, especially in portable devices such as cell phones and laptop computers. Power consumption is also an issue for industries that use computer systems heavily, such as Internet service providers running servers or companies whose many employees use computers and other computational devices. Researchers have developed many approaches, applied during hardware design, software design, or at run time, to estimate power consumption efficiently. This survey focuses on the methods by which power consumption can be estimated or measured in real time.

Measuring real-time power dissipation is critical for the thermal analysis of new hardware designs such as processors (CPUs), just as it is important for OS programmers writing process schedulers. Researchers have found that knowing the real-time power consumption at the subsystem level (CPU, hard drives, memory, and other devices) can help power optimization in applications such as storage encryption, virtualization, and application sandboxing, as well as in application tradeoffs.

Several technologies have been developed to measure power consumption in real time. They fall into two main categories: direct measurement using subsystem power sensors and meters, and indirect estimation based on available information such as temperature or performance counters. Each category contains several methods; for example, different models have been developed that use performance counters for power estimation. Each method has its own benefits and disadvantages. The goal of this paper is to survey the different methods in each category.

Run-time Estimation of System and Sub-system Level Power Consumption
Power consumption can differ between systems of the same type because of manufacturing variations in the hardware and differences in the temperature conditions under which each device operates. Real-time power management can be used to optimize the system or its subsystems to minimize energy consumption, which may, for example, extend the battery lifetime of mobile devices or save energy for Internet companies operating many computer servers. The following sections describe technologies developed to enable real-time power estimation.

Indirect Power measurement
Indirect power measurement methods, such as using a CPU performance monitoring unit (PMU) or performance counters to estimate run-time CPU and memory power consumption, are widely used because of their low cost.

Performance counters
Hardware performance counters (HPCs) are a set of special-purpose registers built into modern microprocessors that count hardware-related activities for hardware and software events. Each processor model provides a limited number of hardware counters, and the set of events they can count differs between models. These performance counters are usually accurate and provide detailed information about processor performance at clock-cycle granularity. Researchers have created several models that use HPC events to estimate system power consumption in real time.

First-order, linear power estimation model using performance counters
The first-order linear model was developed by G. Contreras and M. Martonosi at Princeton University, using an Intel PXA255 processor, to estimate CPU and memory power consumption. It is distinct from earlier work that used HPCs to estimate power because the PXA255 has tighter power requirements and offers fewer performance events than mid- and high-end processors. The method is also not tied to a specific processor technology or HPC layout; it can be used with any processor that has HPCs.

This linear power model uses five performance events: Instructions Executed, Data Dependencies, Instruction Cache Misses, Data TLB Misses, and Instruction TLB Misses. Assuming a linear correlation between performance counter values and power consumption, the following model is derived (equation 1):

$$Power_{cpu}=\alpha_{1}\left(IFetch_{miss}\right)+\alpha_{2}\left(DataDep\right)+\alpha_{3}\left(DataTLB_{miss}\right)+\alpha_{4}\left(InsTLB_{miss}\right)+\alpha_{5}\left(InstExec\right)+K_{cpu}$$ (1)

where $$\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4},\alpha_{5}$$ are power weights and $$K_{cpu}$$ is a constant representing processor power consumption during idle time.

The power consumption of memory (external RAM) can also be estimated by tracking performance events, if the processor provides suitable ones. The PXA255, for example, has no performance events that directly account for external RAM, but Instruction Cache Misses, Data Cache Misses, and the number of Data Dependencies can be used to estimate memory power consumption. Again, a linear model is derived from this information (equation 2):

$$Power_{memory}=\beta_{1}\left(IFetch_{miss}\right)+\beta_{2}\left(DataDep\right)+K_{memory}$$ (2)

where $$\beta_{1},\beta_{2}$$ are power weights and $$K_{memory}$$ is the memory power consumption constant during idle time.
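Equations 1 and 2 both reduce to a weighted sum of counter values plus an idle constant. The sketch below illustrates that shape only, assuming the counter values are per-interval deltas already read from the PMU; the class name `LinearPowerModel`, the event keys, and every weight value are hypothetical placeholders, not the coefficients reported by Contreras and Martonosi.

```python
from dataclasses import dataclass


@dataclass
class LinearPowerModel:
    """First-order model: power = sum(weight_i * counter_i) + idle constant."""
    weights: dict[str, float]   # power weights (alpha_i for CPU, beta_i for memory)
    idle: float                 # K_cpu or K_memory

    def estimate(self, counters: dict[str, float]) -> float:
        # A missing event raises KeyError instead of silently counting as zero.
        return self.idle + sum(w * counters[name] for name, w in self.weights.items())


# Hypothetical weights for one voltage/frequency point (W per event per sampling interval).
cpu_model = LinearPowerModel(
    weights={"IFetch_miss": 2e-6, "DataDep": 1e-6, "DataTLB_miss": 3e-6,
             "InsTLB_miss": 3e-6, "InstExec": 5e-8},
    idle=0.1,
)
mem_model = LinearPowerModel(weights={"IFetch_miss": 4e-6, "DataDep": 1e-6}, idle=0.05)

# Counter deltas for one sampling interval (hypothetical values).
sample = {"IFetch_miss": 1000, "DataDep": 5000, "DataTLB_miss": 100,
          "InsTLB_miss": 50, "InstExec": 2_000_000}
p_cpu = cpu_model.estimate(sample)   # equation 1
p_mem = mem_model.estimate(sample)   # equation 2
```

Because the weights depend on the voltage/frequency point, a real deployment would hold one such model per operating point.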

The main challenge with this method is computing the power weights at different voltage/frequency points using ordinary least squares estimation. The constants in equations 1 and 2 depend on voltage and frequency and must be computed during benchmark testing. Once a table of power weights has been built, it can be implemented in software or hardware to estimate power in real time. Another challenge is accessing the HPCs; in this case they are read at the beginning of the main OS timer interrupt, which requires a software modification. A program can then apply equations 1 and 2, with the power weights taken from the table, to estimate power consumption at run time. Equation 1 needs samples of five HPC events, but the PXA255 can sample only two events at a time, so the code must be executed multiple times and the resulting data aligned.
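The weight table described above can be sketched with an ordinary least squares fit per operating point. This is a minimal illustration, assuming the counter samples from the separate runs have already been aligned into one matrix per voltage/frequency point; the function names, the (V, MHz) keys, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_power_weights(counters: np.ndarray, measured_power: np.ndarray) -> tuple[np.ndarray, float]:
    """Ordinary least squares fit of P = counters @ weights + K.

    counters:       (n_samples, n_events) counter values per sampling interval
    measured_power: (n_samples,) power measured over the same intervals
    """
    X = np.column_stack([counters, np.ones(len(counters))])  # last column -> constant K
    coef, *_ = np.linalg.lstsq(X, measured_power, rcond=None)
    return coef[:-1], float(coef[-1])


def build_weight_table(runs: dict) -> dict:
    """Map each (voltage, frequency) point to its fitted (weights, K)."""
    return {point: fit_power_weights(c, p) for point, (c, p) in runs.items()}


# Synthetic benchmark data at two hypothetical operating points (V, MHz).
rng = np.random.default_rng(0)
true_params = {(1.3, 400): (np.array([2.0, 0.5]), 0.3),
               (1.0, 200): (np.array([1.0, 0.25]), 0.1)}
runs = {}
for point, (w, k) in true_params.items():
    c = rng.uniform(0, 1, size=(50, 2))
    runs[point] = (c, c @ w + k)

table = build_weight_table(runs)
weights, k_idle = table[(1.3, 400)]   # look up the weights for the current V/f point
```

With noise-free synthetic data the fit recovers the generating weights exactly; with real measurements the residual indicates how well the linear assumption holds at that operating point.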

In summary, the main benefits of this approach are that it is easy to implement, low cost, and does not require special hardware modification. Software designers can benefit from this model by having a quick power estimate for their applications without any extra hardware requirement.

The main disadvantage of this method is that real-world processors are not perfectly linear, and the model does not account for their non-linear behavior. Another issue is that the estimation software itself runs on the processor and consumes power. The approach also does not provide detailed information about the power consumed by each architectural functional unit, so designers cannot see the differences between modules when executing different parts of the software. The method cannot be used by an OS scheduler or by software developers running multithreaded programs, because it gathers data by running benchmarks several times. Finally, the work applies to single-core processors but not to multi-core processors.

Piece-wise linear power estimation model using performance counters
The piecewise model was developed to estimate power consumption accurately using performance counters. It was developed by K. Singh and M. Bhadauria at Cornell University and S. A. McKee at Chalmers University of Technology, estimates power independently of program behavior, and was evaluated on the SPEC 2006, SPEC-OMP, and NAS benchmark suites. The method was developed to analyze the effects of shared resources and temperature on power consumption in chip multiprocessors.

This method uses four performance counters of the AMD Phenom processor: $$\varepsilon_{1}$$: L2_CACHE_MISS: ALL, $$\varepsilon_{2}$$: RETIRED_UOPS, $$\varepsilon_{3}$$: RETIRED_MMX_AND_FP_INSTRUCTIONS: ALL, and $$\varepsilon_{4}$$: DISPATCH_STALLS. These counters are architecture-specific to the AMD Phenom and may differ on other processors. AMD allows data to be collected from all four HPCs simultaneously. A microbenchmark (a small program) is used to collect data from these HPCs, and the data collected on each processor core are used in the following equation:

$$P_{core}=\begin{cases}F_{1}\left(g_{1}(r_{1}),\ldots,g_{n}(r_{n})\right), & \text{if condition}\\ F_{2}\left(g_{1}(r_{1}),\ldots,g_{n}(r_{n})\right), & \text{else}\end{cases}$$ (3)

where $$r_{i}=\varepsilon_{i}/\text{cycle count}$$ and $$F_{n}=P_{0}+P_{1}\,g_{1}(r_{1})+\ldots+P_{n}\,g_{n}(r_{n})$$ (4)

The transformations $$g_{i}$$ in equation 4 can be linear, inverse, logarithmic, exponential, or square root, depending on which makes the power prediction most accurate. A piecewise linear function was chosen to fit equation 4 to the collected data because it captures more detail about the power of each processor core. Finally, analyzing the collected HPC data with the piecewise linear method yields a detailed breakdown of power consumption (for example, L2 cache misses make the highest contribution to power consumption, compared with L3).
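Equations 3 and 4 can be evaluated per core as sketched below. The source does not state the switching condition, the chosen transformations, or the coefficients, so this sketch assumes the piece is selected by a threshold on $$r_{1}$$ (the L2 miss rate) and uses hypothetical transforms and coefficient values; the function name `core_power` is also illustrative.

```python
import math


def core_power(counts, cycles, threshold, transforms, coeffs_low, coeffs_high):
    """Evaluate the piecewise per-core model of equations 3 and 4.

    counts:      raw event counts [e1, ..., en] for one core over one interval
    cycles:      cycle count over the same interval
    threshold:   hypothetical switch point on r_1 that selects F_1 or F_2
    transforms:  functions g_i applied to each rate r_i
    coeffs_low, coeffs_high: [P0, P1, ..., Pn] for each piece
    """
    rates = [e / cycles for e in counts]                  # r_i = e_i / cycle count
    feats = [g(r) for g, r in zip(transforms, rates)]     # g_i(r_i)
    coeffs = coeffs_low if rates[0] < threshold else coeffs_high
    return coeffs[0] + sum(p * f for p, f in zip(coeffs[1:], feats))


def identity(r):
    return r


g = [identity, math.sqrt, identity, identity]   # hypothetical choice of g_i
low = [10.0, 50.0, 4.0, 20.0, 8.0]               # hypothetical P_j coefficients (W)
high = [12.0, 30.0, 4.0, 20.0, 8.0]
p_quiet = core_power([100, 2500, 500, 1000], 10_000, 0.02, g, low, high)
p_busy = core_power([300, 2500, 500, 1000], 10_000, 0.02, g, low, high)
```

Normalizing by the cycle count makes the inputs rates rather than raw counts, so the same coefficients apply regardless of the sampling interval length.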

The method was used to schedule each AMD Phenom core within a defined power envelope. A core is suspended when it would exceed the available power envelope, and it becomes available again once enough power is available.
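The suspend/resume policy can be sketched as a small function run once per estimation interval. The source does not say which core is suspended first; this sketch assumes the core with the highest estimated power is suspended first and that suspended cores are resumed cheapest-first while headroom remains. The function name and the use of a suspended core's last estimate are hypothetical.

```python
def schedule_cores(estimates: dict[int, float], suspended: set[int], envelope: float) -> set[int]:
    """Return the new set of suspended cores for one scheduling interval.

    estimates: per-core power estimates (W); for suspended cores this is the
               estimate recorded while they last ran (assumption).
    envelope:  chip-level power budget (W).
    """
    suspended = set(suspended)
    active = [c for c in estimates if c not in suspended]
    total = sum(estimates[c] for c in active)
    # Suspend the most power-hungry active cores until the chip fits the envelope.
    for core in sorted(active, key=estimates.get, reverse=True):
        if total <= envelope:
            break
        suspended.add(core)
        total -= estimates[core]
    # Resume suspended cores, cheapest first, while there is headroom.
    for core in sorted(suspended, key=estimates.get):
        if total + estimates[core] <= envelope:
            suspended.discard(core)
            total += estimates[core]
    return suspended


first = schedule_cores({0: 10.0, 1: 8.0, 2: 7.0, 3: 5.0}, set(), envelope=25.0)
# Next interval: core 1 becomes less active, freeing enough power for core 0.
second = schedule_cores({0: 10.0, 1: 3.0, 2: 7.0, 3: 5.0}, first, envelope=25.0)
```

Per-core estimates from equations 3 and 4 would feed `estimates` at each interval.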

The method has some restrictions. First, it does not account for temperature effects. Temperature and total power consumption are directly related (leakage power rises as temperature increases), but the model cannot capture this because the AMD Phenom has no per-core temperature sensors. Second, the microbenchmark is not complete enough to give a better power estimate (for instance, it does not cover the DISPATCH_STALLS HPC), and a more complete microbenchmark would cause timing issues. Future work includes incorporating thermal data into the model and into thread scheduling strategies, and reducing the frequency of each core (DVFS) instead of suspending it. Finally, the method covers only processors, while other subsystems, such as memory and disks, also need to be considered in total power.

This method differs from many other performance-counter methods in three ways: it considers all the cores of a multi-core processor; the performance counters it uses do not individually correlate strongly with power consumption; and it estimates power for each core, so the estimates can be used for real-time scheduling that keeps each core within the power envelope.

Adaptive power estimation model using performance counters
Most models like those above cannot measure power consumption at the component or subsystem level. DiPART (Disaggregated Power Analysis in Real Time), developed by Professor M. Srivastava, Y. Sun, and L. Wanner at the University of California, Los Angeles, provides this capability: it estimates subsystem power from hardware performance counters using only one power sensor for the whole system. Such models correlate performance counter data with power consumption, and static models like the examples above (first-order and piecewise linear) show different estimation errors because of variations across identical hardware. DiPART addresses this problem as a self-adaptive model that can be calibrated once and applied across different platforms.

The linear estimation model in DiPART requires a power sensor capable of measuring dissipated power and current at run time. Various embedded sensors, such as the Atom-LEAP system or Qualcomm's Snapdragon Mobile Development Platforms, can serve this purpose. A single power sensor is enough to calibrate the subsystem-level estimation model.

The total system power is the sum of the power consumed by each subsystem, as shown in equation 5.

$${{P}_{system}}={{P}_{CPU}}+{{P}_{RAM}}+{{P}_{Disk}}$$ (5)

Each subsystem model is driven by performance counters. The CPU model uses ten performance counters, including task counts, context switch counts, CPU migration counts, page fault counts, cycle counts, instruction counts, branch counts, cache reference counts, and cache miss counts. A linear model computes the total CPU power, and its coefficients are computed by linear regression on the performance counter data and the monitored power consumption data.

$$P_{CPU}=\left[\alpha,\beta,\ldots,\gamma\right]\cdot\text{Vector of CPU performance counters}+\lambda_{constantCPU}$$ (6)

The same performance counters can also be used in the RAM power model; the memory coefficient vector and constant are likewise computed during the training phase from performance counter data and monitored power consumption data.

$$P_{RAM}=\left[\Delta,\Gamma,\ldots,\Theta\right]\cdot\text{Vector of performance counters}+\lambda_{constantRAM}$$ (7)

The disk power model is based on input and output counters, which are correlated with I/O event counts.

As with the CPU and RAM models, the disk coefficients and constant are estimated during the training phase.

$$P_{Disk}=\left[\varphi,\chi\right]\cdot\text{Vector of disk performance counters}+\lambda_{constantDisk}$$ (8)

During training, the difference between the total power measured by the sensor and the sum of the initial CPU, RAM, and disk model predictions is computed. Then 10% of this difference is used to adjust the individual CPU, RAM, and disk models. This iteration continues until the estimation error for total system power falls below a threshold or a specified number of iterations is reached; over these iterations, each subsystem model is adjusted according to the delta.
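The training loop described above can be sketched as follows. The source does not specify how the 10% correction is divided among the CPU, RAM, and disk models, so this sketch assumes each subsystem absorbs the correction in proportion to its share of the predicted total and is then refit by least squares; the stopping threshold, function names, and synthetic data are hypothetical.

```python
import numpy as np


def predict(models, feats):
    """Per-subsystem predictions (equations 6-8): feats[s] @ models[s]."""
    return {s: feats[s] @ w for s, w in models.items()}


def calibrate(models, feats, measured, step=0.1, tol=0.01, max_iters=100):
    """Adapt subsystem models so that their sum tracks one system-level sensor.

    models:   {"cpu": w, "ram": w, "disk": w}; each w ends with its constant term
    feats:    {"cpu": X, ...}; each X has a trailing column of ones
    measured: total power from the single sensor, one value per sample
    """
    models = {s: w.copy() for s, w in models.items()}
    for iteration in range(max_iters + 1):
        preds = predict(models, feats)
        total = sum(preds.values())                    # equation 5
        delta = measured - total
        rel_err = float(np.mean(np.abs(delta)) / np.mean(measured))
        if rel_err < tol or iteration == max_iters:
            return models, rel_err
        for s, p in preds.items():
            # Assumption: each subsystem absorbs `step` of the delta in proportion
            # to its share of the predicted total, then is refit by least squares.
            target = p + step * delta * (p / total)
            models[s], *_ = np.linalg.lstsq(feats[s], target, rcond=None)


rng = np.random.default_rng(1)
n = 200
feats = {s: np.column_stack([rng.uniform(0, 1, (n, k)), np.ones(n)])
         for s, k in {"cpu": 3, "ram": 3, "disk": 2}.items()}
reference = {"cpu": np.array([4.0, 2.0, 1.0, 3.0]),   # hypothetical trained models
             "ram": np.array([1.0, 0.5, 0.5, 1.0]),
             "disk": np.array([0.8, 0.6, 0.5])}
# Hypothetical target machine that draws 20% more power than the reference predicts.
measured = 1.2 * sum(predict(reference, feats).values())
adapted, err = calibrate(reference, feats, measured)
```

In this synthetic case each iteration removes 10% of the remaining error, which mirrors the reported behavior that estimation error decreases as the number of iterations grows.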

The CPU, RAM, and disk models must be modified for system-level variation whenever the total delta is not below 10%. The iterations continue until the sum of the subsystem model predictions is close to the monitored total power. Once the subsystem models have been trained, the system-level model does not need to be trained again for the same system.

Compared with static models, this method adapts to variations between systems, even those built from exactly the same hardware. Experimental results show that estimation errors are high before DiPART is trained and decrease as the number of iterations increases.

One major issue with this model is its dependence on a power sensor to measure total power. Another is the number of performance counters DiPART uses; these counters might not be available on all processors. The method was also applied only to the CPU, RAM, and disk, while other subsystems also contribute to total power consumption. The main obstacle to adding more subsystems is the adaptive mechanism: as the number of subsystems increases, accuracy and training speed decrease. Finally, the CPU, disk, and RAM exhibit some non-linear behavior that this method does not consider.

Dynamic Thermal Management
As integrated circuit (IC) feature sizes shrink to the nanometer scale and more transistors are packed into a small area, total chip power and temperature increase. High chip temperature, if not controlled, can damage or even burn the chip. It also affects performance and reliability: high temperature increases leakage power consumption, raises interconnect resistance, and slows transistors. Dynamic Thermal Management (DTM) is therefore required for high-performance embedded systems and high-end microprocessors. Thermal sensors alone are not adequate for the job because of their limited accuracy and long sensing delay. The idea of DTM is to detect and reduce the temperature of hot spots on a chip using techniques such as activity migration, local toggling, and dynamic voltage and frequency scaling.

H. Li, P. Liu, Z. Qi, L. Jin, W. Wu, S. X. D. Tan, and J. Yang at the University of California, Riverside developed a method based on observing the average power consumption of low-level modules running a typical workload, which correlates directly with temperature variations. The method was intended to replace older technologies, such as on-chip online tracking sensors based on CMOS technology, which are less accurate and require hardware implementation.

The method observes the average power over a certain time interval, which determines the temperature variation, and can be implemented as a fast run-time thermal simulation algorithm at the architectural level. It also introduces a new way to compute transient temperature changes based on the frequency-domain moment matching concept, which holds that the transient behavior of a dynamic system can be accurately described by a few dominant poles of the system. The moment matching algorithm computes the temperature variation response for given initial temperature conditions and average power inputs over a given time. The method follows circuit-level thermal RC modeling at the architectural level, as described in reference. During run time, unit temperatures vary because of the irregular power trace generated by each unit in its architectural block. This power input consists of a DC component and a small AC oscillation, and it was shown that most of the energy in the power trace is concentrated in the DC component. The average power can therefore be modeled as a constant DC input to the thermal circuit, and thermal moment matching (TMM) is implemented with an initial condition and a DC input. The TMM model is as follows:

$$G\,x(t)+C\,\dot{x}(t)=B\,u(t)$$ (9)

G and C are the thermal conductance and capacitance matrices, x is the vector of node temperatures, u is the vector of independent power sources, and B is the input selector matrix. The equation is solved in the frequency domain, which requires an initial condition: the initial temperature at each node. The main idea is that the TMM algorithm provides more reliable online temperature estimation for DTM applications.
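For a constant (DC) power input, equation 9 has a closed-form transient solution, which is the response that TMM approximates. The sketch below computes it exactly from all the poles of a small network, with an optional `n_modes` that keeps only the slowest poles. That truncation is plain modal truncation, offered only as an analogy to describing the response "by a few dominant poles"; it is not the authors' moment-matching algorithm. The network values are hypothetical, temperatures are relative to ambient, and G, C, and the eigenvector matrix are assumed invertible.

```python
import numpy as np


def rc_temperature(G, C, B, u, x0, t, n_modes=None):
    """Node temperatures (above ambient) at time t for G x + C dx/dt = B u, constant u.

    n_modes: if given, keep only that many slowest (dominant) poles.
    """
    x_ss = np.linalg.solve(G, B @ u)          # DC steady state: G x_ss = B u
    A = -np.linalg.solve(C, G)                # dx/dt = A (x - x_ss)
    lam, V = np.linalg.eig(A)                 # poles of the thermal network
    coeffs = np.linalg.solve(V, x0 - x_ss)    # initial deviation in modal coordinates
    if n_modes is not None:
        slow = np.argsort(np.abs(lam))[:n_modes]
        keep = np.zeros(lam.shape, dtype=bool)
        keep[slow] = True
        coeffs = np.where(keep, coeffs, 0)
    return x_ss + (V @ (np.exp(lam * t) * coeffs)).real


# Hypothetical two-block network: each block has 1 W/K to ambient and 1 W/K to its neighbour.
G = np.array([[2.0, -1.0], [-1.0, 2.0]])
C = np.eye(2)                                 # J/K per block (hypothetical)
B = np.eye(2)
u = np.array([1.0, 0.0])                      # average (DC) power per block, W
x0 = np.zeros(2)                              # start at ambient
x_1s = rc_temperature(G, C, B, u, x0, 1.0)
```

Unlike step-by-step integration, this evaluates the temperature at any time t directly from the initial condition, which is the property that makes a DC-input formulation fast.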

In summary, the TMM algorithm estimates thermal variation much faster than previous work in this area because it uses frequency-domain moment matching. Other tools, such as HotSpot, use numerical integration, which must step through all previous time points to obtain the temperature at a given point, making simulation slower.

This work could be improved by computing the average power in real time from performance counters. Combined with the counter-based models above, the method could then estimate temperature variation on the fly as programs execute.

PowerBooter and PowerTutor
This power model technique was developed in a collaboration between L. Zhang, B. Tiwana, Z. Qian, Z. Wang, R. P. Dick, and Z. Mao of the University of Michigan and L. Yang of Google Inc. to estimate smartphone power consumption accurately online. PowerBooter is an automated power modeling technique that uses built-in battery voltage sensors and the behavior of the battery during discharge to monitor the power consumption of the total system; it requires no special external measurement equipment. PowerTutor is a power measurement tool that uses data generated by PowerBooter for online power estimation. Battery lifetime is a persistent limitation of smartphones that hardware and software designers must work around. Software designers often lack the knowledge of power consumption needed to build power-optimized applications, so end users tend to blame short battery life. A tool is therefore needed that measures power consumption on smartphones so that software designers can monitor their applications in real time. Researchers have developed power models for specific portable embedded systems, but reusing those models across the wide variety of modern smartphones takes a huge effort. PowerBooter addresses this problem by estimating real-time power consumption for individual smartphone subsystems such as the CPU, LCD, GPS, audio, Wi-Fi, and cellular communication components. The online PowerTutor utility then uses the generated data to determine subsystem-level power consumption. Both the model and PowerTutor can be used across different platforms and smartphone technologies.

This model differs from the others discussed because it relies only on knowledge of the battery discharge voltage curve and access to a battery voltage sensor, which is available in all modern smartphones. The basic idea is to run training programs that control the power and activity states of the phone's components while tracking the battery's state of discharge. Each component is held in a specific state for a significant period of time, and the resulting change in battery state of discharge is captured with the built-in voltage sensor. The first challenge is to convert battery voltage readings into power consumption. This is done using the change in state of discharge (SOD, the portion of the battery's energy that has been consumed) over a test interval, as captured by the voltage sensor, which leads to the following equation:

$$P*\left( {{t}_{1}}-{{t}_{2}} \right)=E*(SOD\left( {{V}_{1}} \right)-SOD\left( {{V}_{2}} \right))$$ (10)

where E is the rated battery energy capacity, $$SOD(V_{i})$$ is the battery state of discharge at voltage $$V_{i}$$, and P is the average power consumption over the interval between t1 and t2. The state of discharge can be estimated with a lookup table that captures the relationship between the present voltage and SOD. Determining the energy E is also an issue because it changes as the battery ages; new batteries have their total energy printed on them, but that value does not remain accurate over time. To reduce the error, the energy can be estimated at the highest and lowest discharge rates. The battery's internal resistance also has a significant effect, because the discharge current causes a voltage drop across it; to reduce this effect, all phone components can be switched to their lowest-power modes to minimize the discharge current when a voltage reading is taken. Finally, the method uses a piecewise linear function to model the non-linear relationship between SOD and battery voltage.
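Equation 10 can be applied directly once a voltage-to-SOD table exists. The sketch below uses piecewise-linear interpolation for the lookup, as the text describes, and solves equation 10 for P. The table values, the battery energy, and the function names are hypothetical, and the readings are assumed to be taken with components in their lowest-power modes so that internal-resistance drop is small.

```python
import numpy as np

# Hypothetical SOD-vs-voltage table for one battery, built once offline.
# np.interp needs ascending x values, so voltages are stored low to high.
TABLE_VOLTS = np.array([3.4, 3.6, 3.8, 4.0, 4.2])
TABLE_SOD = np.array([1.0, 0.8, 0.5, 0.2, 0.0])   # fraction of energy already used


def sod(voltage: float) -> float:
    """Piecewise-linear interpolation of state of discharge at a voltage reading."""
    return float(np.interp(voltage, TABLE_VOLTS, TABLE_SOD))


def average_power(v1: float, t1: float, v2: float, t2: float, energy_j: float) -> float:
    """Equation 10 solved for P: P = E * (SOD(V2) - SOD(V1)) / (t2 - t1)."""
    return energy_j * (sod(v2) - sod(v1)) / (t2 - t1)


E = 18648.0   # J, i.e. 3.7 V x 1.4 Ah x 3600 s/h (hypothetical battery)
# Component held in one state for 10 minutes; voltage falls from 4.0 V to 3.9 V.
p = average_power(4.0, 0.0, 3.9, 600.0, E)
```

Since SOD grows as the voltage falls, `sod(v2) - sod(v1)` is positive for a discharging battery and the result is a positive average power.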

The battery model above can be fully automated in three steps, which are described in reference. The method is beneficial because it works on all smartphones: for a new smartphone the model needs to be constructed only once, and once the process is automated no extra equipment is needed to measure power consumption. After the model has been generated, automatically or manually, the PowerTutor utility can use it to estimate power consumption in real time. Software engineers can use the utility to optimize their designs, and users can use it to choose applications based on their power consumption.

The main issues are estimating the battery energy, which affects the accuracy of the power model, and accounting for the internal resistance when reading the voltage. The latter can be resolved on newer smartphones that provide current measurements instead of voltage readings, although the model above would then need to be modified to use current measurements.

AppScope and DevScope are similar tools for estimating smartphone power consumption.

Run-time modeling and estimation of operating system power consumption
The operating system (OS) is the main software running on most computing systems and accounts for a major share of power dissipation. T. Li and L. K. John at the University of Texas at Austin therefore developed a model that estimates the power consumed by the OS, which helps with power management and the power evaluation of software applications.

It has been shown that software execution on hardware components accounts for a significant portion of power consumption, and that the choice of algorithm and other high-level decisions made during software design can significantly affect system power. Many software applications rely on the operating system, so overlooking the power consumed by the OS can cause large errors in energy estimation. Estimating OS power consumption can help software designers make their code more energy efficient; for example, a software engineer can observe the power consumption of different compilation techniques for handling TLB misses and paging. To be useful for thermal or power management tools, an OS power model should be highly reliable and fast, should support run-time estimation without adding overhead, and should be simple and easily adaptable across platforms.

The proposed run-time power estimation requires a first-order linear operation on a single power metric, which reduces estimation overhead. Instructions per cycle (IPC) can be used as the metric that characterizes the performance of modern processors. The paper shows how the various components of the CPU and memory systems contribute to total OS routine power; the data path and pipeline structure, along with the clocks, consume the most power. A linear model derived from IPC can track OS routine power. The simple energy equation $$E=P*T$$ can then be used to estimate the energy consumed by a given piece of software, where P is the average power and T is the execution time of that program.

The challenging part is computing the average power P of each individual operating system routine. Either the correlation between IPC and average OS routine power or hardware performance counters can be used, and profiling (data gathered from benchmark testing) can also be used to predict energy consumption. The linear power model given in reference is $$P_{OS}=K_{1}\cdot IPC_{OS}+K_{0}$$, a simple model based on the strong correlation observed between IPC and OS routine power. Profiling is required to generate the data needed to build the model, but once the model has been built for a system, it does not need to be built again for that system.
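The profiling step and the energy estimate can be sketched together: fit $$K_{1}$$ and $$K_{0}$$ from profiled (IPC, power) pairs, then apply $$E=P*T$$ to a routine. The profiled samples and function names below are hypothetical; a real model would be fitted from measurements on the target system.

```python
import numpy as np


def fit_os_power_model(ipc: np.ndarray, power: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of P_OS = K1 * IPC_OS + K0 from profiled samples."""
    k1, k0 = np.polyfit(ipc, power, deg=1)   # returns [slope, intercept]
    return float(k1), float(k0)


def routine_energy(k1: float, k0: float, ipc: float, seconds: float) -> float:
    """E = P * T, with P predicted from the routine's observed IPC."""
    return (k1 * ipc + k0) * seconds


# Hypothetical profiling data: IPC and average power (W) of several OS routines.
ipc_samples = np.array([0.2, 0.5, 0.8, 1.1])
power_samples = np.array([5.6, 6.5, 7.4, 8.3])
K1, K0 = fit_os_power_model(ipc_samples, power_samples)
# A routine observed at IPC 0.6 that runs for 2 ms.
energy = routine_energy(K1, K0, 0.6, 0.002)
```

At run time only one counter-derived value (IPC) per routine is needed, which is why the per-estimate overhead stays low.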

Virtual Machine Power Metering and Provisioning
Joulemeter, proposed by Aman Kansal, Feng Zhao, and Jie Liu of Microsoft, Nupur Kothari of the University of Southern California, Los Angeles, and Arka Bhattacharya of the Indian Institute of Technology, measures virtual machine (VM) power, which cannot be measured directly in hardware. The method is used for power management in virtualized data centers. Most servers today have built-in power metering, and older servers use power distribution units (PDUs); the method uses these individual power meters to achieve significant reductions in power provisioning costs.

The method uses software power models to track VM energy usage on each significant hardware resource, based on hardware power states observable by the hypervisor. The largest power-consuming subsystems in servers are the processor, memory, and disk. Servers also have idle energy consumption, which can sometimes be large but is static and can be measured. Detailed power models for each of the CPU, memory, and disk subsystems are presented in reference; these models are the core technique of Joulemeter. Figure 4 in reference shows the Joulemeter block diagram, in which the System Resource & Power Tracing module reads the full-server CPU, disk, and power usage, and the VM resource tracking module tracks the workload using hypervisor counters. The base model training and model refinement modules implement the learning methods described in reference. Finally, the energy calculation module takes the output of the base model training and model refinement modules and computes VM energy usage using the energy equations described in reference.
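The general idea of attributing server power to VMs can be sketched as: fit a full-server model from the server's own power meter, then apply the same coefficients to each VM's hypervisor-observed resource usage, with idle power reported separately. This is a simplified sketch, not Joulemeter's implementation; it assumes a single linear model over two hypothetical utilization inputs (CPU and disk), omits the memory model and the refinement step, and uses synthetic data and hypothetical function names.

```python
import numpy as np


def fit_server_model(util: np.ndarray, server_power: np.ndarray):
    """Fit P = P_idle + util @ coeffs from full-server meter readings.

    util: (n_samples, n_resources) server-wide utilization, e.g. columns [cpu, disk]
    """
    X = np.column_stack([np.ones(len(util)), util])
    coef, *_ = np.linalg.lstsq(X, server_power, rcond=None)
    return float(coef[0]), coef[1:]


def vm_dynamic_power(coeffs: np.ndarray, vm_util: dict) -> dict:
    """Apply the server model to each VM's hypervisor-observed utilization."""
    return {vm: float(np.dot(u, coeffs)) for vm, u in vm_util.items()}


rng = np.random.default_rng(2)
util = rng.uniform(0, 1, size=(100, 2))               # hypothetical [cpu, disk] samples
server_power = 100.0 + util @ np.array([80.0, 20.0])  # synthetic meter readings (W)
idle, coeffs = fit_server_model(util, server_power)
per_vm = vm_dynamic_power(coeffs, {"vm1": [0.3, 0.1], "vm2": [0.2, 0.4]})
```

Keeping idle power out of the per-VM figures reflects the text's point that idle consumption is static and measurable on its own; how to share it among VMs is a separate policy decision.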

The benefits of this method are safe isolation of co-located workloads, which allows multiple workloads to be consolidated on fewer servers, improving resource utilization and reducing idle power costs. Joulemeter can also be used to solve the power capping problem for VMs, which can save a significant amount of power provisioning costs in data centers.

Direct Power measurement
One can use different types of sensors to gather voltage, current, frequency or temperature and then use those data to estimate power consumption.
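With a direct sensor, instantaneous power is the product of the sampled voltage and current, and energy is its integral over time. The sketch below assumes synchronized voltage and current samples on the same supply rail and uses trapezoidal integration; the function name and sample values are hypothetical.

```python
def power_and_energy(times, volts, amps):
    """Instantaneous power P = V * I and energy (J) by trapezoidal integration.

    times: sample timestamps in seconds, ascending
    volts, amps: synchronized voltage and current samples on one rail
    """
    power = [v * i for v, i in zip(volts, amps)]
    energy = sum((t1 - t0) * (p0 + p1) / 2
                 for t0, t1, p0, p1 in zip(times, times[1:], power, power[1:]))
    return power, energy


# Hypothetical 5 V rail sampled once per second during a short burst of activity.
power, energy = power_and_energy([0.0, 1.0, 2.0], [5.0, 5.0, 5.0], [0.2, 0.4, 0.2])
```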

Low Power Energy Aware Processing embedded sensor system
LEAP (Low Power Energy Aware Processing) was developed by D. McIntire, K. Ho, B. Yip, A. Singh, W. Wu, and W. J. Kaiser at the University of California, Los Angeles to ensure that embedded networked sensor systems are energy-optimized for their applications. As described in reference, the LEAP system offers detailed energy dissipation monitoring and sophisticated power control scheduling for all subsystems, including the sensor systems. LEAP is a multiprocessor architecture based on hardware and software system partitioning, and it provides independent energy monitoring and power control for each individual subsystem. The goal of LEAP is to control microprocessors so as to achieve the lowest operating energy per task. Many modern embedded networked sensors must perform tasks such as image processing, statistical high-performance computing, and communication; running all of these efficiently requires real-time energy monitoring and scheduling, which LEAP provides.

The LEAP embedded networked sensor (ENS) system was designed to offer high-accuracy, low-overhead energy measurement. LEAP enables energy-aware applications through scheduling and energy profiling of high energy efficiency components, including multiple wireless network interfaces, storage elements, and sensing capabilities. The biggest advantage of the LEAP system is its Energy Management and Preprocessing (EMAP) capability. Experimental results show that the optimal choice of sensor systems, processor, wireless interface, and memory technology is not application dependent but can be a hardware allocation issue. EMAP can partition devices into many power domains, monitor, enable, or disable power to each domain, and respond to trigger events or conditions that restore or remove power in each domain. EMAP periodically collects data and transfers it to the host processor, which then provides a power management schedule back to EMAP.
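The EMAP control pattern described above (per-domain power switching, trigger-driven restore or removal, periodic upload, and a host-supplied schedule) can be modeled as a small toy class. Everything here is hypothetical, including the class and method names; it illustrates the control flow only and is not LEAP's hardware or software interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PowerDomain:
    name: str
    enabled: bool = False
    samples: list = field(default_factory=list)   # energy readings (J) since last report


@dataclass
class DomainController:
    """Toy model of EMAP-style control: per-domain switching plus trigger rules."""
    domains: dict = field(default_factory=dict)
    triggers: list = field(default_factory=list)  # (condition, domain name, enable?)

    def add_domain(self, name: str) -> None:
        self.domains[name] = PowerDomain(name)

    def set_power(self, name: str, on: bool) -> None:
        self.domains[name].enabled = on

    def on_event(self, condition: Callable[[dict], bool], name: str, enable: bool) -> None:
        self.triggers.append((condition, name, enable))

    def record(self, name: str, joules: float) -> None:
        self.domains[name].samples.append(joules)

    def handle(self, event: dict) -> None:
        # Restore or remove power in each domain whose trigger condition matches.
        for condition, name, enable in self.triggers:
            if condition(event):
                self.set_power(name, enable)

    def report(self) -> dict:
        # Periodic upload to the host: per-domain energy since the last report.
        out = {n: sum(d.samples) for n, d in self.domains.items()}
        for d in self.domains.values():
            d.samples.clear()
        return out

    def apply_schedule(self, schedule: dict) -> None:
        # Power management schedule returned by the host processor.
        for name, on in schedule.items():
            self.set_power(name, on)


ctrl = DomainController()
ctrl.add_domain("radio")
ctrl.add_domain("camera")
ctrl.on_event(lambda e: e.get("motion", False), "camera", True)  # hypothetical trigger
ctrl.handle({"motion": True})
ctrl.record("radio", 0.5)
ctrl.record("radio", 0.25)
first_report = ctrl.report()
ctrl.apply_schedule({"radio": True, "camera": False})
```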

Figure 1 in the reference shows the LEAP and EMAP architectures. LEAP and EMAP are complex platforms that require both hardware and software; all of the detailed design approaches are described in the reference.

In conclusion, LEAP differs from previous methods such as PowerScope because it provides both real-time power consumption information and a standard application execution environment on the same platform. As a result, LEAP eliminates the need to synchronize the device under test with an external power measurement unit. LEAP also provides power information for individual subsystems, such as the CPU, GPU, and RAM, through direct measurement, which enables accurate assessment of how software and hardware affect the power behavior of individual components.
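Because measurements and application execution share one platform and one clock, attributing energy to a piece of software reduces to integrating each subsystem's sampled power over the time window in which that software ran. The sketch below illustrates this with trapezoidal integration; the function names and the sample data are hypothetical, and it uses only samples inside the window (no interpolation at the window edges), which is a simplifying assumption.

```python
from typing import Dict, List


def energy_j(times_s: List[float], power_w: List[float]) -> float:
    """Trapezoidal integration of sampled power (W) over time (s), giving joules."""
    if len(times_s) != len(power_w):
        raise ValueError("times and power samples must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def subsystem_energy_in_window(times_s: List[float],
                               power_by_subsystem: Dict[str, List[float]],
                               start_s: float, end_s: float) -> Dict[str, float]:
    """Energy per subsystem between start_s and end_s (inclusive), from in-window samples only."""
    idx = [i for i, t in enumerate(times_s) if start_s <= t <= end_s]
    window_t = [times_s[i] for i in idx]
    return {name: energy_j(window_t, [p[i] for i in idx])
            for name, p in power_by_subsystem.items()}
```

For example, with samples every 0.1 s where the CPU draws 1, 2, 2, 2, 1 W and RAM a constant 0.5 W, a task running from 0.1 s to 0.3 s is charged about 0.4 J of CPU energy and 0.1 J of RAM energy.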

Power model validation through thermal measurements
One of the challenges for HW and SW designers is validating their simulation data against empirical data, which requires a utility or tool that measures power consumption for comparison with the simulation results. One method for capturing real-time data to validate power or thermal models is an infrared measurement setup developed by F.J. Mesa-Martinez, J. Nayfach-Battilana, and J. Renau at the University of California, Santa Cruz. Their approach captures thermal maps using infrared cameras with high spatial resolution and a high frame rate. A genetic algorithm then finds a power equation for each floorplan block of the processor that reproduces the captured thermal map, giving detailed information about the power breakdown (leakage and dynamic). They also developed an image processing filter to increase thermal image accuracy. The biggest challenge of this approach is obtaining a detailed power map from the thermal measurements, because there is no direct mapping between the measured temperatures and power. The genetic algorithm, described in the reference, iterates over multiple thermal traces and compares them with the results from a thermal simulator to find the best power correlation.
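The inverse problem described above (searching for block powers whose simulated temperatures match the measured map) can be illustrated with a small genetic algorithm. This is a sketch under strong assumptions: a steady-state linear thermal model `T = t_amb + R·P` with a hypothetical block-to-block thermal resistance matrix `R` stands in for the full thermal simulator, and it fits only total power per block, not the leakage/dynamic split. Tournament selection, blend crossover, Gaussian mutation, and elitism are generic GA choices, not necessarily those of the original work.

```python
import math
import random
from typing import List, Tuple


def predict_temps(R: List[List[float]], power_w: List[float], t_amb: float) -> List[float]:
    """Assumed steady-state linear thermal model: T_i = t_amb + sum_j R[i][j] * P_j."""
    return [t_amb + sum(r * p for r, p in zip(row, power_w)) for row in R]


def rms_error(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fit_block_powers(R: List[List[float]], measured: List[float], t_amb: float,
                     p_max: float = 10.0, pop_size: int = 60, generations: int = 200,
                     sigma: float = 0.2, mutation_rate: float = 0.3,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Genetic search for per-block powers whose predicted temperatures match the measurement."""
    rng = random.Random(seed)
    n_blocks = len(R[0])

    def cost(p: List[float]) -> float:
        return rms_error(predict_temps(R, p, t_amb), measured)

    def clamp(x: float) -> float:
        return min(max(x, 0.0), p_max)

    pop = [[rng.uniform(0.0, p_max) for _ in range(n_blocks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: max(1, pop_size // 10)]  # best candidates survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            # Tournament selection: best of three random candidates, for each parent.
            a = min(rng.sample(pop, 3), key=cost)
            b = min(rng.sample(pop, 3), key=cost)
            w = rng.random()  # blend crossover weight
            child = []
            for x, y in zip(a, b):
                gene = w * x + (1.0 - w) * y
                if rng.random() < mutation_rate:
                    gene += rng.gauss(0.0, sigma)  # Gaussian mutation
                child.append(clamp(gene))
            children.append(child)
        pop = elite + children
    best = min(pop, key=cost)
    return best, cost(best)
```

In a real setup the cost function would run the thermal simulator for each candidate and compare full thermal maps over multiple traces, which is far more expensive; the linear model only keeps the example small and self-contained.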

The first step is to measure the temperature with the IR camera through an oil coolant that flows over the top of the chip surface; the detailed setup is described in the reference. Oil is chosen because it is easy to model and allows accurate measurement. The infrared cameras must be calibrated to compensate for the different thermal emissions of materials, lens configurations, and other factors described in the reference. A second filter is also applied to compensate for the optical distortion induced by the lens setup. Finally, a very accurate thermal model is required to account for the effects of the liquid cooling setup; the model equations are described in the reference.
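Calibrating for different material emissions means that the same raw camera reading can correspond to different temperatures depending on the surface material under each pixel. The sketch below shows one simple way to express this: a per-material two-point linear calibration against reference temperatures, applied pixel by pixel through a material map. The linear fit over a narrow temperature range, the material names, and the reference values are all illustrative assumptions; real IR calibration is generally nonlinear and is detailed in the reference.

```python
from typing import Dict, List, Tuple


def two_point_calibration(raw_lo: float, temp_lo: float,
                          raw_hi: float, temp_hi: float) -> Tuple[float, float]:
    """Fit temp = gain * raw + offset through two reference points (assumed linear range)."""
    if raw_hi == raw_lo:
        raise ValueError("reference readings must differ")
    gain = (temp_hi - temp_lo) / (raw_hi - raw_lo)
    offset = temp_lo - gain * raw_lo
    return gain, offset


def calibrate_frame(raw_frame: List[List[float]], material_map: List[List[str]],
                    calib: Dict[str, Tuple[float, float]]) -> List[List[float]]:
    """Convert raw IR readings to temperatures using each pixel's material calibration."""
    return [[calib[m][0] * r + calib[m][1] for r, m in zip(raw_row, mat_row)]
            for raw_row, mat_row in zip(raw_frame, material_map)]
```

With hypothetical references where silicon reads 1000 at 40 °C and 2000 at 80 °C, and a metal region reads 500 at 40 °C and 900 at 80 °C, raw readings of 1500 and 700 both map to about 60 °C, even though the uncorrected values differ by more than a factor of two.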

Designers can use this method to validate their simulations or optimize their designs, especially because it provides a breakdown of leakage and dynamic power consumption. The method is also helpful in designing chip packaging, heat sinks, and cooling systems, and it shows designers which floorplan blocks propagate heat faster or slower.

Conclusion
Estimating power consumption is critical for hardware and software developers and for other computing system users, such as Internet companies, who want to save energy or make their HW/SW more energy efficient. It is also critical because it allows the available resources to be used accordingly. Simulators are useful mainly during design, and their estimates still need to be verified; in general, simulators can have large errors because of manufacturing variation in hardware components. Power meters measure the power consumption of the whole system but do not give the detailed breakdown of dissipated power that designers need to optimize their applications or hardware. This paper analyzed different methods that researchers have developed in recent years to resolve some of these issues.