Humpage Technology Ltd
An Introduction to Regression Analysis
The science of experimental physics can be described as
"...putting something IN,
and then observing what comes OUT".
If we continue with our back-to-basics approach, we could also say that, in general, calibration procedures fall into one of only two categories:
Imagine that you perform a calibration on a device and take 20 measurements.
The results of our calibration are shown in tabular form below.
Tabular Output
The data, as it stands, can be of value to other people. For instance, it can be used as a look-up table to obtain the output value for a given input, or to determine what input is needed to produce a required output. However, presenting the information in this way raises a big question:
Graphical Output
A quick, simple and, it has to be said, effective way of achieving this is to plot the calibration data on an x-y graph. Just by using your eye, a ruler, or a set of French curves, you will be able to generate a line that indicates the relationship between the data points. The line, which is known as a "trend line", will, if drawn with care, be a surprisingly accurate representation of the data points. Seeing a graph will also illustrate to you a very important characteristic of eye-brain coordination, namely that when it comes to detecting trends, you will do much better using a graph than a table of values. Any deviation from a smooth x-y relationship is immediately detected by looking at the graph. The same cannot be said when you look at the table of values. Shown below is the data from our calibration, plotted as an x-y graph by Calibration Toolbox ADO.
Someone with a copy of this graph can quickly and easily read off data points and interpolate between them. But there are still questions to be answered.
Which leads us on to the subject of:
Regression Analysis
Regression Analysis is a statistical technique that produces a line or curve of known function (equation) through a series of data points. In the usual "least squares" method, the line or curve is chosen so that the sum of the squares of the vertical distances from each point to it is minimised. In other words, the line drawn through is the closest mathematical relationship to the data, or as it is known, a line of "best fit". The most common type of regression is where a straight line is drawn through the data points. This is known as "Linear" or "First Order" regression. The line generated will be of the type:
y = Cx + D
A person given the values of C and D for the line of best fit can calculate for themselves the value of y for ANY given value of x and vice-versa.
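For readers who would like to see how C and D can be found, here is a minimal sketch of a first order least-squares fit. It assumes the line has the form y = Cx + D, with C the first order coefficient and D the constant. The function name `fit_line` and the readings are made up for illustration; this is not a description of how Calibration Toolbox ADO works internally.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = C*x + D through the points (xs, ys)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, so no line can be fitted")
    c = sxy / sxx              # slope (first order coefficient)
    d = mean_y - c * mean_x    # intercept (constant)
    return c, d


# Made-up readings, for illustration only
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [1.1, 2.9, 5.2, 7.1, 8.8]
C, D = fit_line(inputs, outputs)
print(f"y = {C:.3f}x + {D:.3f}")
```

The closed-form sums are used here because they need nothing beyond the standard library; a numerical package would normally do the same job in one call.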
Example 1:
Inserting the values of C and D into our expression gives:
Example 2:
Again, inserting the values of C and D into our expression gives:
The examples chosen are actually conversions between temperatures expressed in Celsius
and Fahrenheit.
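The "vice-versa" calculation can be sketched in a few lines. The values used here are an assumption: the standard Celsius-to-Fahrenheit conversion, C = 1.8 and D = 32, rather than the values from the examples above. The function names are hypothetical.

```python
def y_from_x(x, c, d):
    """Apply the line of best fit: y = C*x + D."""
    return c * x + d


def x_from_y(y, c, d):
    """Invert the line of best fit: x = (y - D) / C."""
    if c == 0:
        raise ValueError("C is zero, so y does not depend on x")
    return (y - d) / c


# Assumed values: standard Celsius-to-Fahrenheit conversion
C, D = 1.8, 32.0

boiling_f = y_from_x(100.0, C, D)   # Celsius in, Fahrenheit out
boiling_c = x_from_y(212.0, C, D)   # Fahrenheit in, Celsius out
print(boiling_f, boiling_c)
```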
Conversions of this type are obviously very useful, but the picture is incomplete unless the person given the values of C and D is also made aware of exactly how closely the data fits the line. By this we are not talking about how good a job your eye and ruler (or software) does of putting the best line through the data points; any problem there is a mistake in interpretation of the data. What we are actually concerned about is the dispersion of the data points around our line of best fit, and hence, how well the data can be represented by a trend line of known function. The statistical method of expressing the magnitude of this dispersion is called "CORRELATION".
Correlation
The number that we use to quantify it is known by a few fancy names, such as the Pearson Product Moment, but we shall refer to it as the "CORRELATION COEFFICIENT". Its calculation is beyond the scope of this page. Suffice it to say that Caliso seamlessly performs these calculations for you, and has all the tools you need for further Regression Analysis.
Its purpose, though, is this:
Type | Description | Example | Range |
Positive Correlation. | Increasing x causes an increase in y. | Daytime temperature and the number of ice-creams sold. | >0 to +1 |
Negative Correlation. | Increasing x causes a decrease in y. | Your average speed and the time it takes you to get from A to B. | <0 to -1 |
Zero Correlation. | No definable relationship. | The number of people under 6 feet tall in London, and the price of fish in Hong Kong. | 0 |
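For the curious, the three ranges in the table can be illustrated with a minimal sketch of the standard Pearson formula. The function name and the data are made up for illustration, and this is not a description of Caliso's internal calculation.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient of paired values, between -1 and +1."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("undefined when either set of values is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.0, 4.0, 6.0, 8.0, 10.0]     # y increases with x
falling = [10.0, 8.0, 6.0, 4.0, 2.0]    # y decreases with x
unrelated = [3.0, 1.0, 4.0, 1.0, 3.0]   # no overall trend

r_rising = correlation_coefficient(x, rising)
r_falling = correlation_coefficient(x, falling)
r_unrelated = correlation_coefficient(x, unrelated)
print(r_rising, r_falling, r_unrelated)
```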
Have a look at the three Caliso graphs below. They will help you to appreciate the concept of a line of "best fit", and how the correlation coefficient indicates the level of dispersion of the data points around the line.
Now have another look at the three graphs and note the following:
- The bigger the dispersion of the data points, sometimes referred to as "Scatter", the more difficult it becomes to draw the line of best fit by hand.
- The higher the level of scatter of the data points, the lower the correlation coefficient becomes. The level below which the correlation becomes unacceptable varies from one calibration procedure to another. It will also depend on the type of data being analysed, and its intended usage. On the first graph, the data is a perfect match, and hence the Correlation Coefficient is 1.0.
- This point is mainly for Caliso users. The three graphs were produced automatically using Caliso's Regression Analysis tools. As you can see, the data points on the first graph are shown in green, whereas the data points on the other two are shown in red. This is because Caliso has the facility to apply a user-definable tolerance band to the regression results: a data point that lies inside the band is shown in green, and one that lies outside it is shown in red.
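The idea of a tolerance band can be sketched as follows. This is only an illustration under stated assumptions: the band is taken to be a fixed distance either side of a straight line y = Cx + D, and the function name, data and tolerance value are hypothetical. How Caliso actually defines its band is not described on this page.

```python
def classify_points(xs, ys, c, d, tolerance):
    """Label each point 'green' if it lies within +/- tolerance of y = c*x + d, else 'red'."""
    labels = []
    for x, y in zip(xs, ys):
        residual = y - (c * x + d)   # vertical distance from the fitted line
        labels.append("green" if abs(residual) <= tolerance else "red")
    return labels


# Made-up data and an assumed band of +/- 0.2 around the line y = 2x
labels = classify_points(
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.1, 3.9, 6.5],
    c=2.0, d=0.0, tolerance=0.2,
)
print(labels)
```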
Higher Order Regression Analysis
Not all data is best represented by a straight line. For example, the area of a circle is a function of the square of its diameter and not the diameter itself.
This is not a problem because Linear Regression is not the only type of Regression Analysis. Higher orders of regression allow us to find more complex relationships between x and y. Listed below are the types of analysis offered by Caliso:
Orders of Regression Analysis
Order | Equation |
First Order | y = Cx + D |
Second Order | y = Bx² + Cx + D |
Third Order | y = Ax³ + Bx² + Cx + D |
A, B, C, and D are known as the "Regression Coefficients", where:
- A is the third order coefficient.
- B is the second order coefficient.
- C is the first order coefficient.
- D is the zeroth order coefficient, or "constant".
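Given these definitions, a third order expression has the form y = Ax³ + Bx² + Cx + D, and the lower orders are the same expression with the leading coefficients set to zero. A minimal sketch of evaluating it is shown here; the function name is hypothetical, and the circle example simply reuses the area-from-diameter relationship mentioned earlier.

```python
import math


def evaluate(x, a=0.0, b=0.0, c=0.0, d=0.0):
    """Return y = A*x^3 + B*x^2 + C*x + D, evaluated with Horner's rule."""
    return ((a * x + b) * x + c) * x + d


# First order: Celsius to Fahrenheit (A and B left at zero)
fahrenheit = evaluate(100.0, c=1.8, d=32.0)

# Second order: area of a circle from its diameter, area = (pi / 4) * diameter^2
area = evaluate(2.0, b=math.pi / 4)

print(fahrenheit, area)
```

Horner's rule is used because it needs only three multiplications and avoids computing the powers of x separately.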
Orders higher than 3 are also possible, but are not used by Calibration Toolbox ADO. This is because third order regression is more than capable of providing accuracies that are within the requirements of experimental measurement.
Furthermore, each increase in regression order has a corresponding increase in the minimum number of data points that need to be supplied by the calibrator. Calibration Toolbox ADO is designed to save you work, and not make extra work for you.
Selecting the Best Order Regression Analysis to Use
In some cases, external circumstances will require you to perform linear regression, for example:
- SCADA or data acquisition systems that convert raw data into engineering units.
- Linear amplifiers for strain gauges or accelerometers which require calibration of their offset and gain.
Otherwise, use the highest order of regression available for the number of data points you have or are required to produce. All regression software tools worthy of merit will produce a line of best fit, even if that line is generated using a lower order of regression than the one chosen by the user. The software will do this by forcing any unwanted coefficients to zero.
Going back to the start of our discussion, Calibration Toolbox ADO was asked to perform a third order regression on our 20 calibration points. The results were:
- Coefficient A = 0
- Coefficient B = 1
- Coefficient C = 0
- Coefficient D = 0
- Correlation Coeff = 1.0000
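A result of this kind can be reproduced with a short sketch. The 20 calibration points themselves are not reproduced here, so the data below is an assumption: y = x² for x = 1 to 20, which is consistent with the coefficients reported above. The function `fit_polynomial` is hypothetical and uses exact fractions to keep rounding error out of the picture; it is not how Calibration Toolbox ADO is implemented.

```python
from fractions import Fraction


def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit; returns coefficients, highest power first."""
    if len(xs) != len(ys) or len(xs) < order + 1:
        raise ValueError("need at least order + 1 data points")
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    size = order + 1
    # Normal equations (powers ascending), stored as an augmented matrix
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(size)]
        + [sum((x ** i) * y for x, y in zip(xs, ys))]
        for i in range(size)
    ]
    # Gauss-Jordan elimination in exact rational arithmetic
    for col in range(size):
        pivot = next((r for r in range(col, size) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("x values do not determine a unique fit")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    ascending = [rows[i][size] for i in range(size)]
    return ascending[::-1]


# Assumed calibration data: y = x^2 at 20 points (not the page's actual table)
xs = list(range(1, 21))
ys = [x * x for x in xs]
A, B, C, D = fit_polynomial(xs, ys, 3)
print(A, B, C, D)
```

Although a third order fit was requested, the unwanted coefficients A, C and D come out as zero because the data is already described exactly by the second order term.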
You might also like to know that, using Calibration Toolbox ADO, the whole process, which involved entry of the new calibration data and the calculations, took less than 2 minutes, including printing a new calibration certificate and graph.