GEH statistic

The GEH statistic is a formula used in traffic engineering, traffic forecasting, and traffic modelling to compare two sets of traffic volumes. The GEH formula gets its name from Geoffrey E. Havers, who invented it in the 1970s while working as a transport planner in London, England. Although its mathematical form is similar to a chi-squared test, it is not a true statistical test. Rather, it is an empirical formula that has proven useful for a variety of traffic analysis purposes.


The formula for the GEH statistic is:

$$GEH=\sqrt{\frac{2(M-C)^2}{M+C}}$$

where M is the hourly traffic volume from the traffic model (or the new count) and C is the real-world hourly traffic count (or the old count).
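
The formula can be sketched as a small Python function. The function name `geh`, the input checks, and the treatment of two zero volumes as a perfect match are assumptions made for illustration; they are not part of the original definition.

```python
import math

def geh(m: float, c: float) -> float:
    """GEH statistic for a modelled hourly volume m and a counted hourly volume c."""
    if m < 0 or c < 0:
        raise ValueError("volumes must be non-negative")
    if m + c == 0:
        # Assumption: two zero volumes are treated as a perfect match.
        return 0.0
    return math.sqrt(2.0 * (m - c) ** 2 / (m + c))

# Hypothetical example: model 1100 veh/h against a count of 1000 veh/h.
print(round(geh(1100, 1000), 2))  # 3.09
```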

Using the GEH statistic avoids some pitfalls that occur when simple percentages are used to compare two sets of volumes, because traffic volumes in real-world transportation systems vary over a wide range. For example, the mainline of a freeway/motorway might carry 5000 vehicles per hour, while one of the on-ramps leading to the freeway might carry only 50 vehicles per hour; in that situation it would not be possible to select a single percentage of variation that is acceptable for both volumes. The GEH statistic reduces this problem: because it is non-linear, a single acceptance threshold based on GEH can be used over a fairly wide range of traffic volumes. The use of GEH as an acceptance criterion for travel demand forecasting models is recognised in the UK Highways Agency's Design Manual for Roads and Bridges (DMRB), the Wisconsin microsimulation modeling guidelines, the Transport for London Traffic Modelling Guidelines, and other references.
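
The freeway and on-ramp example can be made concrete with a short sketch. The modelled volumes below (each 10% above the count) are hypothetical values chosen for illustration.

```python
import math

def geh(m, c):
    """GEH statistic for a modelled volume m and a counted volume c (both veh/h)."""
    return math.sqrt(2.0 * (m - c) ** 2 / (m + c))

# Hypothetical (modelled, counted) hourly volumes: each modelled value is 10% high.
pairs = {"mainline": (5500, 5000), "on-ramp": (55, 50)}

for name, (m, c) in pairs.items():
    pct = 100.0 * (m - c) / c
    print(f"{name}: {pct:.0f}% off, GEH = {geh(m, c):.2f}")
```

The same 10% deviation gives a GEH of about 6.90 on the mainline but only about 0.69 on the on-ramp, so a single GEH threshold treats the 500-vehicle error as significant and the 5-vehicle error as negligible.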

For traffic modelling work in the "baseline" scenario, a GEH of less than 5.0 is considered a good match between the modelled and observed hourly volumes (flows of longer or shorter durations should be converted to hourly equivalents to use these thresholds). According to DMRB, 85% of the volumes in a traffic model should have a GEH less than 5.0. GEHs in the range of 5.0 to 10.0 may warrant investigation. If the GEH is greater than 10.0, there is a high probability that there is a problem with either the travel demand model or the data (this could be something as simple as a data entry error, or as complicated as a serious model calibration problem).
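
A minimal sketch of how these thresholds might be applied to a set of links follows. The helper names `classify` and `share_below_5`, the band labels, and the link volumes are all hypothetical; only the thresholds 5.0 and 10.0 and the 85% criterion come from the text above.

```python
import math

def geh(m, c):
    return math.sqrt(2.0 * (m - c) ** 2 / (m + c))

def classify(value):
    """Map a GEH value to the bands described above."""
    if value < 5.0:
        return "good match"
    if value <= 10.0:
        return "may warrant investigation"
    return "likely problem"

def share_below_5(pairs):
    """Fraction of (modelled, counted) pairs with GEH < 5.0."""
    values = [geh(m, c) for m, c in pairs]
    return sum(v < 5.0 for v in values) / len(values)

# Hypothetical hourly volumes. A 15-minute count of 120 vehicles would first
# be converted to its hourly equivalent of 480 veh/h before being used here.
links = [(1050, 1000), (480, 520), (2300, 2000), (60, 50), (900, 1400)]

for m, c in links:
    print(f"M={m}, C={c}: GEH = {geh(m, c):.2f} ({classify(geh(m, c))})")

share = share_below_5(links)
print(f"{share:.0%} of links have GEH < 5.0; 85% criterion met: {share >= 0.85}")
```

In this made-up data set three of the five links fall below 5.0, so the 85% criterion is not met.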

Applications
The GEH formula is useful in situations such as the following:


 * Comparing a set of traffic volumes from manual traffic counts with a set of volumes collected at the same locations by automated counters (e.g. a pneumatic tube traffic counter is used to check the total entering volumes at an intersection to verify the work done by technicians doing a manual count of the turn volumes).
 * Comparing the traffic volumes obtained from this year's traffic counts with a group of counts done at the same locations in a previous year.
 * Comparing the traffic volumes obtained from a travel demand forecasting model (for the "base year" scenario) with the real-world traffic volumes.
 * Adjusting traffic volume data collected at different times to create a mathematically consistent data set that can be used as input for travel demand forecasting models or traffic simulation models (as discussed in NCHRP 765).

Common criticisms of the GEH statistic
The GEH statistic depends on the magnitude of the values. Thus, the GEH values of two counts of different duration (e.g., daily vs. hourly values) cannot be directly compared. For the same reason, the GEH statistic is not suitable for evaluating other indicators, e.g., trip distance.
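
A short derivation makes the dependence on magnitude explicit. Assuming, for illustration only, that both volumes are scaled by the same factor $k$ (for example $k=24$ for a day consisting of 24 identical hours), the statistic grows with $\sqrt{k}$:

$$GEH(kM,kC)=\sqrt{\frac{2(kM-kC)^2}{kM+kC}}=\sqrt{k}\cdot\sqrt{\frac{2(M-C)^2}{M+C}}=\sqrt{k}\cdot GEH(M,C)$$

With the hypothetical values $M=110$ and $C=100$, the hourly GEH is about 0.98, while the same relative deviation at $k=24$ ($M=2640$, $C=2400$) gives about 4.78.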

For a given count C, a deviation of the same size is evaluated differently depending on whether the modelled value lies above or below the count, so the measure is not symmetrical about C.
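
The asymmetry can be illustrated numerically. The count of 1000 veh/h and the deviation of 200 veh/h are hypothetical values.

```python
import math

def geh(m, c):
    return math.sqrt(2.0 * (m - c) ** 2 / (m + c))

c = 1000                 # hypothetical counted hourly volume
over = geh(c + 200, c)   # model 200 veh/h above the count
under = geh(c - 200, c)  # model 200 veh/h below the count
print(f"over: {over:.2f}, under: {under:.2f}")  # over: 6.03, under: 6.67
```

The underestimate scores worse because the denominator M + C is smaller when M lies below C.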

Moreover, the GEH statistic is not dimensionless; it has the unit $\sqrt{\frac{vehicles}{hour}}$.

The GEH statistic does not fall within a range of values between 0 (no match) and 1 (perfect match). Its values are therefore not intuitive and can only be interpreted with sufficient experience.

Furthermore, the statistic is criticized for lacking a well-founded statistical derivation.

Development of the SQV statistic
An alternative measure to the GEH statistic is the Scalable Quality Value (SQV), which addresses the above-mentioned problems: it is applicable to various indicators, it is symmetric, it has no units, and it has a range of values between 0 and 1. Moreover, Friedrich et al. derive the relationship between the GEH statistic and the normal distribution, and thus the relationship between the SQV statistic and the normal distribution. The SQV statistic is calculated using an empirical formula with a scaling factor $f$:

$$SQV=\frac{1}{1+\sqrt{\frac{(M-C)^2}{f\cdot{C}}}}$$
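
The SQV formula can be sketched in the same way. The function name `sqv` and the input checks are assumptions, and the scaling factor of 1000 used in the example is an arbitrary illustrative value rather than a recommendation taken from the text.

```python
import math

def sqv(m: float, c: float, f: float) -> float:
    """Scalable Quality Value for a modelled value m, an observed value c
    and a scaling factor f."""
    if c <= 0 or f <= 0:
        raise ValueError("c and f must be positive")
    return 1.0 / (1.0 + math.sqrt((m - c) ** 2 / (f * c)))

print(sqv(1000, 1000, 1000))            # identical values give 1.0
print(round(sqv(1100, 1000, 1000), 3))  # 0.909
```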

Fields of application
By introducing a scaling factor $f$, the SQV statistic can be used to evaluate other mobility indicators. The scaling factor $f$ is based on the typical magnitude of the mobility indicator (taking into account the corresponding unit). According to Friedrich et al., the SQV statistic value is suitable for assessing:


 * Traffic volumes (if necessary, differentiation can be made not only by time of day, but also by mode).
 * Person-related mobility indicators:
   * number of trips per person (not differentiated or differentiated by mode and/or trip purpose, suggestion: $f=1$),
   * mean travel times per trip in minutes (not differentiated or differentiated by mode and/or trip purpose, suggestion: $f=30$),
   * mean travel distances per trip in kilometers (not differentiated or differentiated by mode and/or trip purpose, suggestion: $f=10$).
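
The suggested scaling factors for the person-related indicators can be tried out in a short sketch. The modelled and observed values below are hypothetical; only the factors $f=1$, $f=30$ and $f=10$ come from the list above.

```python
import math

def sqv(m, c, f):
    return 1.0 / (1.0 + math.sqrt((m - c) ** 2 / (f * c)))

# (modelled, observed, f): the values are hypothetical, the scaling factors
# are the ones suggested for each indicator in the list above.
indicators = {
    "trips per person": (3.2, 3.4, 1),
    "mean travel time [min]": (22.0, 24.0, 30),
    "mean travel distance [km]": (11.0, 10.0, 10),
}

for name, (m, c, f) in indicators.items():
    print(f"{name}: SQV = {sqv(m, c, f):.3f}")
```

Because $f$ reflects the typical magnitude of each indicator, deviations in quantities with very different units and scales map onto the same 0 to 1 range.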

However, the SQV statistic should not be used for the following indicators:


 * Percentage of modal split or modal shares: here there is a fixed upper limit of 100% that cannot be exceeded. Instead, the number of trips per person per mode can be used for validation with the SQV statistic.
 * Travel times for paths between 2 points in the network: This indicator does not depend on the path taken by a single person, but represents a sequence of distances along a route.

Quality categories
Friedrich et al. recommend quality categories for interpreting SQV values. Depending on the indicator under comparison, different quality categories may be required.

Consideration of standard deviation and sample size
Surveys of mobility indicators or traffic volumes are often conducted under non-ideal conditions, e.g. with large standard deviations or small sample sizes. For these cases, Friedrich et al. describe a procedure that takes both the standard deviation and the sample size into account in the calculation of the SQV statistic.