# Developing a Model

Introduction

ThyssenKrupp AG employs 17,000 people in 80 countries who are passionate experts in developing solutions for sustainable progress. The company manages global growth through innovation and technical progress while using finite resources in a sustainable way. This drive to evolve helps ThyssenKrupp meet the global challenges of the future with innovative solutions.
The company's main activities are the development and marketing of people-moving equipment, so it has been invited to submit a proposal for the development and installation of a series of cycle trains for a new airport terminal. Each train will be designed to do the same job in terms of capacity/load, distance and time frame. The company must take into consideration the factors that affect the efficiency of a train, including the weight of the passengers and their luggage. Data from similar trains at different airports have been put together, recording the amount of time it takes a train to travel three thousand feet.
The technical department and I have been asked to help develop a model that calculates how long it will take one of the company's standard trains to travel three thousand feet in an installation.

Part 1

Time 20 22 19 28 30 29 20
Passenger 66 80 60 102 115 100 70
Time 19 21 24 23 28 25 20
Passenger 65 70 85 80 100 96 71
Time 21 26 19 28 30 22 25
Passenger 75 88 60 99 110 88 90
Table 1

The regression equation is
Time = 4.58 + 0.228 Passengers

Table 1 above shows the data collected from various airports with similar trains, together with the estimated regression equation, which represents the least squares line drawn through the scatter plot in Figure 1. Figure 1 plots time against passengers on a scatter diagram. The slope of the estimated regression equation is positive (0.228), implying that as the number of passengers increases, time increases. We can expect time to increase by 0.228 for each additional passenger.
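As a cross-check on the reported equation, the least squares line can be recomputed directly from the Table 1 data. The following is a minimal sketch using only the Python standard library; the function name `least_squares` is our own and is not part of the original analysis.

```python
# Table 1 data: passengers carried and time taken to travel 3,000 feet
passengers = [66, 80, 60, 102, 115, 100, 70,
              65, 70, 85, 80, 100, 96, 71,
              75, 88, 60, 99, 110, 88, 90]
times = [20, 22, 19, 28, 30, 29, 20,
         19, 21, 24, 23, 28, 25, 20,
         21, 26, 19, 28, 30, 22, 25]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(passengers, times)
print(f"Time = {b0:.2f} + {b1:.3f} Passengers")
```

Run on the 21 observations, this should agree with the coefficients in the Minitab output (constant 4.582, slope 0.22756).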
We next determine how well the estimated equation fits the data. This is done by computing the total sum of squares (SST), the sum of squares due to regression (SSR) and the sum of squares due to error (SSE). The ratio SSR/SST is the coefficient of determination, which provides a measure of the goodness of fit for the estimated regression equation. Table 2 gives SST = 299.81, SSR = 280.68 and SSE = 19.13, so the coefficient of determination, R2, is 280.68/299.81 = 93.6%. We can conclude that 93.6% of the variability in time is explained by the linear relationship between travel time and the number of passengers. This is a good fit for the regression equation.
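The three sums of squares and the coefficient of determination can be reproduced from the raw data. This is an illustrative sketch only; the helper name `sums_of_squares` is an assumption of ours, and the figures it produces should match Table 2 up to rounding.

```python
passengers = [66, 80, 60, 102, 115, 100, 70,
              65, 70, 85, 80, 100, 96, 71,
              75, 88, 60, 99, 110, 88, 90]
times = [20, 22, 19, 28, 30, 29, 20,
         19, 21, 24, 23, 28, 25, 20,
         21, 26, 19, 28, 30, 22, 25]


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)          # explained by the line
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(passengers, times)
r_squared = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R-Sq = {r_squared:.1%}")
```

Note that SST = SSR + SSE, so R2 can equally be written as 1 - SSE/SST.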
Even though our R2 is large, we must not use the regression equation until further analysis of the appropriateness of the assumed model has been conducted. We will test for the significance of the relationship. First we need the standard error of the estimate, s = 1.00350, which means that the observed times typically deviate from the regression line by about 1.00, measured in the units of time (a measure of dispersion). We then calculate the estimated standard deviation of the slope, sb1, which equals the standard error of the estimate divided by the square root of the sum of the squared deviations of each xi from x bar. This gives the figure (0.01363) needed for the t test. The purpose of this test is to see whether we can conclude that B1 ≠ 0, so the hypotheses are Ho: B1 = 0 and Ha: B1 ≠ 0. If Ho is rejected, we conclude that B1 ≠ 0 and that a statistically significant relationship exists between the two variables. If we cannot reject Ho, we have insufficient evidence to conclude that a significant relationship exists. The test statistic is t = 0.22756/0.01363 = 16.69, and we reject Ho if the p-value is less than α = 0.1 (0.05 in each tail of a two-tailed test). Since the p-value (0.000) is less than α = 0.1, we reject Ho and conclude that B1 is not equal to zero. This evidence is sufficient to conclude that a significant relationship exists between time and passengers.
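The quantities used in the t test can be traced step by step from the data. The sketch below is our own illustration, not the Minitab procedure itself. It compares the test statistic with a critical value instead of computing a p-value, and the critical value 1.729 (upper 0.05 point of the t distribution with 19 degrees of freedom) is taken from a standard t table.

```python
import math

passengers = [66, 80, 60, 102, 115, 100, 70,
              65, 70, 85, 80, 100, 96, 71,
              75, 88, 60, 99, 110, 88, 90]
times = [20, 22, 19, 28, 30, 29, 20,
         19, 21, 24, 23, 28, 25, 20,
         21, 26, 19, 28, 30, 22, 25]

n = len(passengers)
x_bar = sum(passengers) / n
y_bar = sum(times) / n
sxx = sum((x - x_bar) ** 2 for x in passengers)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(passengers, times))

# Least squares estimates
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate: s = sqrt(SSE / (n - 2))
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(passengers, times))
s = math.sqrt(sse / (n - 2))

# Estimated standard deviation of the slope: sb1 = s / sqrt(sum((xi - x_bar)^2))
sb1 = s / math.sqrt(sxx)

# Test statistic for Ho: B1 = 0
t_stat = b1 / sb1

# Assumption: tabled value t(0.05, 19 df) = 1.729, i.e. alpha = 0.1 two-tailed
T_CRITICAL = 1.729
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"s = {s:.5f}, sb1 = {sb1:.5f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")
```

The critical value approach and the p-value approach lead to the same decision here, because a t statistic this far above 1.729 corresponds to a p-value well below 0.1.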
The F test is conducted to test for an overall significant relationship. In Table 2, the calculated F statistic is 278.72, and the critical value is Fα = 4.38 (α = 0.05, with 1 and 19 degrees of freedom). The hypotheses are Ho: B1 = 0 and Ha: B1 ≠ 0. We reject Ho if the F value is greater than or equal to Fα. Since F = 278.72 is greater than Fα = 4.38, we reject Ho and conclude that a significant relationship exists between time and passengers.
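The F statistic follows from the ANOVA figures in Table 2 as the ratio of the two mean squares. The short sketch below uses the rounded sums of squares from the table, so its result differs slightly from the reported 278.72; the critical value 4.38 is the one quoted in the text.

```python
# Rounded ANOVA figures from Table 2
ssr = 280.68          # sum of squares due to regression
sse = 19.13           # sum of squares due to error
df_regression = 1     # one independent variable
df_error = 19         # n - 2 with n = 21 observations

msr = ssr / df_regression   # mean square regression
mse = sse / df_error        # mean square error
f_stat = msr / mse

F_CRITICAL = 4.38           # F(0.05; 1, 19) as quoted in the text
reject_h0 = f_stat >= F_CRITICAL

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}, reject Ho: {reject_h0}")
```

With only one independent variable, the F test and the t test are equivalent: the F statistic is the square of the t statistic (16.69 squared is approximately 278.6).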

[Figure: critical value 4.38 and test statistic 278.72]

Part 2
During the presentation we realized that we had left out an important element that contributes to weight: the passengers' luggage. Each passenger carries two pieces weighing fifty pounds each. We now go back and factor that element into the model. When the luggage data are entered, the luggage variable turns out to be highly correlated with the passenger variable, because the luggage weight is simply the number of passengers multiplied by 100 pounds. The software therefore removes luggage from the model automatically, and we are left with the original equation.
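The reason the luggage variable is dropped can be seen by computing its correlation with the passenger variable. The sketch below assumes, as stated above, that every passenger carries exactly two fifty-pound pieces; the names `POUNDS_PER_PASSENGER` and `pearson_r` are our own.

```python
import math

passengers = [66, 80, 60, 102, 115, 100, 70,
              65, 70, 85, 80, 100, 96, 71,
              75, 88, 60, 99, 110, 88, 90]

# Assumption from the text: two pieces of luggage at fifty pounds each
POUNDS_PER_PASSENGER = 2 * 50
luggage = [POUNDS_PER_PASSENGER * p for p in passengers]


def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(passengers, luggage)
print(f"Correlation between passengers and luggage: {r:.4f}")
```

Because luggage is an exact multiple of passengers, the correlation is 1 (up to floating point rounding), and the two variables carry the same information. A regression cannot estimate separate coefficients for both, which is why only one of them stays in the equation.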
Continuing with the previous model, we can see in Table 3 that one of the standardized residuals (observation 20, at -2.67) is greater than two in absolute value, meaning that this observation is an outlier. A residual represents the vertical distance between the actual y value and the regression line.
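The standardized residuals in Table 3 can be recomputed to confirm which observation is flagged. This is a minimal sketch under the usual simple regression formulas: the leverage of observation i is 1/n + (xi - x bar)^2 / sum((xi - x bar)^2), and each residual is divided by s times the square root of one minus its leverage. The cut-off of 2 follows the rule used in the text.

```python
import math

passengers = [66, 80, 60, 102, 115, 100, 70,
              65, 70, 85, 80, 100, 96, 71,
              75, 88, 60, 99, 110, 88, 90]
times = [20, 22, 19, 28, 30, 29, 20,
         19, 21, 24, 23, 28, 25, 20,
         21, 26, 19, 28, 30, 22, 25]


def standardized_residuals(x, y):
    """Return the standardized residual of each observation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    result = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        result.append(e / (s * math.sqrt(1 - leverage)))
    return result


st_resid = standardized_residuals(passengers, times)

# Observation numbers (1-based) whose standardized residual exceeds 2 in size
flagged = [i + 1 for i, r in enumerate(st_resid) if abs(r) > 2]
print("Flagged observations:", flagged)
```

Only observation 20 (88 passengers, time 22) should be flagged, which matches the R marker in the Minitab output.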

Recommendation
Figure 1 Scatter diagram of Time v Passenger
Regression Analysis: Time versus Passengers

The regression equation is
Time = 4.58 + 0.228 Passengers

Predictor Coef SE Coef T P
Constant 4.582 1.170 3.92 0.001
Passengers 0.22756 0.01363 16.69 0.000

S = 1.00350 R-Sq = 93.6% R-Sq(adj) = 93.3%
PRESS = 22.5944 R-Sq(pred) = 92.46%

Analysis of Variance
Source DF SS MS F P
Regression 1 280.68 280.68 278.72 0.000
Residual Error 19 19.13 1.01
Total 20 299.81

Table 2 ANOVA Table for Time and passengers
Regression Analysis: Time versus Passengers, Luggage
* Luggage is highly correlated with other X variables
* Luggage has been removed from the equation.

The regression equation is
Time = 4.58 + 0.228 Passengers

Predictor Coef SE Coef T P
Constant 4.582 1.170 3.92 0.001
Passengers 0.22756 0.01363 16.69 0.000
S = 1.00350 R-Sq = 93.6% R-Sq(adj) = 93.3%
PRESS = 22.5944 R-Sq(pred) = 92.46%

Analysis of Variance
Source DF SS MS F P
Regression 1 280.68 280.68 278.72 0.000
Residual Error 19 19.13 1.01
Total 20 299.81

Obs Passengers Time Fit SE Fit Residual St Resid
1 66 20.000 19.601 0.332 0.399 0.42
2 80 22.000 22.787 0.227 -0.787 -0.80
3 60 19.000 18.235 0.397 0.765 0.83
4 102 28.000 27.793 0.326 0.207 0.22
5 115 30.000 30.751 0.472 -0.751 -0.85
6 100 29.000 27.338 0.306 1.662 1.74
7 70 20.000 20.511 0.293 -0.511 -0.53
8 65 19.000 19.373 0.342 -0.373 -0.40
9 70 21.000 20.511 0.293 0.489 0.51
10 85 24.000 23.924 0.219 0.076 0.08
11 80 23.000 22.787 0.227 0.213 0.22
12 100 28.000 27.338 0.306 0.662 0.69
13 96 25.000 26.428 0.271 -1.428 -1.48
14 71 20.000 20.739 0.284 -0.739 -0.77
15 75 21.000 21.649 0.253 -0.649 -0.67
16 88 26.000 24.607 0.225 1.393 1.42
17 60 19.000 18.235 0.397 0.765 0.83
18 99 28.000 27.110 0.297 0.890 0.93
19 110 30.000 29.613 0.413 0.387 0.42
20 88 22.000 24.607 0.225 -2.607 -2.67R
21 90 25.000 25.062 0.232 -0.062 -0.06

R denotes an observation with a large standardized residual.
Table 3
Regression Analysis: Time versus Luggage, Passengers, Time of day

* Passengers is highly correlated with other X variables
* Passengers has been removed from the equation.

The regression equation is
Time = 6.29 + 0.00217 Luggage - 0.502 Time of day

Predictor Coef SE Coef T P
Constant 6.292 2.547 2.47 0.024
Luggage 0.0021719 0.0001942 11.19 0.000
Time of day -0.5017 0.6617 -0.76 0.458

S = 1.01492 R-Sq = 93.8% R-Sq(adj) = 93.1%

PRESS = 26.2906 R-Sq(pred) = 91.23%

Analysis of Variance

Source DF SS MS F P
Regression 2 281.27 140.63 136.53 0.000
Residual Error 18 18.54 1.03
Total 20 299.81

Source DF Seq SS
Luggage 1 280.68
Time of day 1 0.59

Obs Luggage Time Fit SE Fit Residual St Resid
1 6600 20.000 19.623 0.337 0.377 0.39
2 8000 22.000 22.664 0.281 -0.664 -0.68
3 6000 19.000 18.320 0.417 0.680 0.73
4 10200 28.000 27.944 0.385 0.056 0.06
5 11500 30.000 30.767 0.478 -0.767 -0.86
6 10000 29.000 27.509 0.384 1.491 1.59
7 7000 20.000 20.492 0.297 -0.492 -0.51
8 6500 19.000 19.406 0.349 -0.406 -0.43
9 7000 21.000 20.492 0.297 0.508 0.52
10 8500 24.000 23.750 0.320 0.250 0.26
11 8000 23.000 22.664 0.281 0.336 0.34
12 10000 28.000 27.509 0.384 0.491 0.52
13 9600 25.000 26.641 0.393 -1.641 -1.75
14 7100 20.000 20.709 0.290 -0.709 -0.73
15 7500 21.000 21.578 0.272 -0.578 -0.59
16 8800 26.000 24.401 0.354 1.599 1.68
17 6000 19.000 18.320 0.417 0.680 0.73
18 9900 28.000 27.292 0.384 0.708 0.75
19 11000 30.000 29.180 0.709 0.820 1.13 X
20 8800 22.000 24.401 0.354 -2.401 -2.52R
21 9000 25.000 25.337 0.432 -0.337 -0.37

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
Table 4

Residual Plots for Time

Figure 2

