Thursday, December 12, 2019

Statistics & Data Analysis for Financial Engineering Case Study

Question: Describe Statistics and Data Analysis for Financial Engineering.

Answer:

Describe the sources of variability present in each data set of Experiment 1. Then, using appropriate data displays, describe each data set in Experiment 1, highlighting any similarities or differences that may exist between the two speeds.

Pressure and distance are held fixed at their low values in these 40 tests.

Sample number   Low Speed   Sample number   High Speed
1               69.2        21              150.3
2               74          22              151.1
3               74.4        23              168.7
4               76.2        24              135.9
5               78.9        25              136.8
6               81.9        26              145.5
7               82.1        27              147
8               82.2        28              141.1
9               87.3        29              149.2
10              87.7        30              152.9
11              87.8        31              146.1
12              88.9        32              161.4
13              89.4        33              154.7
14              90.2        34              150.2
15              90.3        35              161.8
16              90.9        36              143.1
17              95.5        37              142.2
18              95.9        38              160.5
19              102.7       39              178.9
20              104.9       40              152.3

Table 1

Note: The low-speed values are arranged in ascending order.

To analyze the similarities and differences, we calculate the mean (average), median, variance and standard deviation of the low-speed and high-speed data above:

                     Low Speed   High Speed
Mean                 86.52       151.485
Median               87.75       150.25
Standard deviation   9.34877     10.67013
Variance             87.39957    113.85186

Table 2

We observe that the mean and median differ markedly between the low-speed and high-speed data, but the standard deviations are quite comparable (9.35 versus 10.67), indicating that the spread of the values is similar regardless of the speed.

The corresponding frequency histograms for each speed are plotted below:

Fig. 1

Fig. 2

As evident from the above histograms, most of the values fall between 83.49 and 90.62 in the low-speed data set, and between 144.6 and 153.1 in the high-speed data set.

Table 3

From Table 3 above, we draw the following conclusion: we can state with 95% confidence that the true population mean lies in the interval from 8.34 to 10.35 for the low-speed data set and from 9.79 to 11.54 for the high-speed data set.
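The summary statistics in Table 2 can be reproduced from the Table 1 values with a short script. This is a minimal sketch using only the Python standard library; the names low_speed, high_speed and summarize are illustrative, and it assumes the sample (n - 1) forms of the variance and standard deviation, which is what matches the figures in Table 2.

```python
import statistics

# Values copied from Table 1 (Experiment 1).
low_speed = [69.2, 74.0, 74.4, 76.2, 78.9, 81.9, 82.1, 82.2, 87.3, 87.7,
             87.8, 88.9, 89.4, 90.2, 90.3, 90.9, 95.5, 95.9, 102.7, 104.9]
high_speed = [150.3, 151.1, 168.7, 135.9, 136.8, 145.5, 147.0, 141.1, 149.2, 152.9,
              146.1, 161.4, 154.7, 150.2, 161.8, 143.1, 142.2, 160.5, 178.9, 152.3]


def summarize(values):
    """Return mean, median, sample standard deviation and sample variance."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),        # n - 1 denominator (assumed)
        "variance": statistics.variance(values),  # n - 1 denominator (assumed)
    }


low_summary = summarize(low_speed)
high_summary = summarize(high_speed)

for label, summary in (("Low Speed", low_summary), ("High Speed", high_summary)):
    print(label, {name: round(value, 5) for name, value in summary.items()})
```

Using the population (n) denominator instead would give smaller variances than those listed in Table 2, so the sample form appears to be the one used there.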
The confidence intervals drawn from Table 3 require that the data on the standard deviation in the thickness measurements come from a normal distribution.

Non-parametric Test: Tables 4 and 5 below present the method of the non-parametric test, including the calculation of the corresponding probabilities.

Table 4

Table 5

Beforehand, an assumption is made that each recorded value is either greater or less than the corresponding median with probability p = 0.5, so that the number of successes in this experiment follows a binomial distribution, B(20, 0.5). The confidence interval for the median of the standard deviation in the thickness measurements is [82.1, 90.2] at the 90% confidence level in the low-speed data set and [146.1, 152.9] at the same confidence level in the high-speed data set.

The main advantage of the non-parametric test is that it is convenient and describes the properties of the samples without any distributional assumption. The disadvantage is that a large number of samples is required to estimate the variables accurately: the fewer the samples, the larger the error.

Parametric Test: Table 6 and Fig. 3 below are for the measurements at low speed; Table 7 and Fig. 4 are for high speed. Fig. 3 and Fig. 4 show how closely each candidate distribution fits the data. As evident from the scatter plots, all three distributions provide a good fit. Moreover, the normal and log-normal distributions provide the best descriptions of this field data.

Table 6

Fig. 3

Table 7

Fig. 4

The advantage of the parametric test is that we can find the distribution that best fits the data, and it is much more accurate than the non-parametric test. The disadvantage is that the procedure is more complicated than the non-parametric test and takes considerably more time to perform. It also needs plenty of data to support the test convincingly.

To obtain the parameters and the model in MS Excel, the LINEST function is used.
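Returning to the non-parametric median intervals quoted above, the order-statistic construction under B(20, 0.5) can be sketched as follows. This is an illustration, not the contents of Tables 4 and 5: the choice of the 7th and 14th ordered values is an assumption, inferred only because those ranks reproduce the quoted intervals, and the function names are hypothetical.

```python
from math import comb

# Values copied from Table 1 (Experiment 1).
low_speed = [69.2, 74.0, 74.4, 76.2, 78.9, 81.9, 82.1, 82.2, 87.3, 87.7,
             87.8, 88.9, 89.4, 90.2, 90.3, 90.9, 95.5, 95.9, 102.7, 104.9]
high_speed = [150.3, 151.1, 168.7, 135.9, 136.8, 145.5, 147.0, 141.1, 149.2, 152.9,
              146.1, 161.4, 154.7, 150.2, 161.8, 143.1, 142.2, 160.5, 178.9, 152.3]


def median_interval(values, rank):
    """Return (X(rank), X(n - rank + 1)) from the sorted sample, 1-based ranks."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[rank - 1], ordered[n - rank]


def coverage(n, rank):
    """Exact probability that the interval covers the median under B(n, 0.5)."""
    # P(B <= rank - 1); by symmetry the upper tail has the same probability.
    lower_tail = sum(comb(n, k) for k in range(rank)) / 2 ** n
    return 1 - 2 * lower_tail


RANK = 7  # assumed rank; chosen because it matches the intervals in the text

print(median_interval(low_speed, RANK))    # (82.1, 90.2)
print(median_interval(high_speed, RANK))   # (146.1, 152.9)
print(round(coverage(20, RANK), 4))        # 0.8847
```

Under these assumptions the exact coverage of the 7th-to-14th interval is about 88.5%, slightly below the nominal 90%; widening to the 6th and 15th ordered values would raise it to roughly 95.9%.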
The LINEST results are noted in Table 8 below, and each corresponding parameter is in Table 9:

From Table 9, the general model can be represented as:

Y = 327.6148148 + 177.0111111*X1 + 109.4222222*X2 + 131.4722222*X3 + 32.02222222*X1^2 + (-22.37777778)*X2^2 + (-29.06111111)*X3^2 + 66.03333333*X1*X2 + 75.45833333*X1*X3 + 43.58333333*X2*X3 + ei

The diagram below evaluates how well the model fits the data.

Fig. 5

In Fig. 5, the X axis shows the predicted value of y and the Y axis shows the actual value of y (from the experiment). We observe that most of the predicted values are close to the actual values. Because of the randomness of sampling, it is reasonable to see some predicted values deviate from the actual values. In conclusion, the model fits the data appropriately.

Now, writing the coefficients of model 1 as β0, β1, β2 and so on, the bigger the absolute value of βi is, the more the corresponding parameter influences the value of y. So we can regard βi as the degree of the effect of the corresponding parameter. If βi is positive, y increases as x increases (positive correlation). Conversely, a negative sign means that y decreases as x increases (negative correlation).

From Table 9, we know that β1, β2 and β3 are greater than the other coefficients except β0, so we take X1, X2 and X3 to be the significant terms for the result.

Using the values from the previous question, we can get the simplified version of the model as:

Y = 327.6148148 + 177.0111111*X1 + 109.4222222*X2 + 131.4722222*X3 + ei

However, testing in practice shows that this simplified model does not fit the data as well as:

Y = 327.6148148 + 177.0111111*X1 + 109.4222222*X2

Fig. 6

In Fig. 6, the X axis plots the predicted value of y and the Y axis plots the actual value of y (from the experiment).

Fig. 7: general model

The model plot in Fig. 7 is shown with 95% confidence interval bands (black spots) and 95% prediction bands (red spots).
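To make the fitted models above concrete, the sketch below evaluates the general model and the reduced model in X1 and X2 at chosen factor settings, using the coefficients quoted from Table 9. It assumes the factors are coded on the levels -1, 0 and 1; the function names and the grid search are illustrative and are not part of the original LINEST workflow.

```python
from itertools import product

# Coefficients copied from the general model quoted from Table 9.
B0 = 327.6148148
LINEAR = (177.0111111, 109.4222222, 131.4722222)      # X1, X2, X3
SQUARE = (32.02222222, -22.37777778, -29.06111111)    # X1^2, X2^2, X3^2
INTERACTION = {(0, 1): 66.03333333,                   # X1*X2
               (0, 2): 75.45833333,                   # X1*X3
               (1, 2): 43.58333333}                   # X2*X3


def predict_general(x1, x2, x3):
    """Evaluate the full quadratic model (without the error term ei)."""
    x = (x1, x2, x3)
    y = B0
    y += sum(b * xi for b, xi in zip(LINEAR, x))
    y += sum(b * xi ** 2 for b, xi in zip(SQUARE, x))
    y += sum(b * x[i] * x[j] for (i, j), b in INTERACTION.items())
    return y


def predict_two_factor(x1, x2):
    """Evaluate the reduced model that keeps only the X1 and X2 terms."""
    return B0 + LINEAR[0] * x1 + LINEAR[1] * x2


LEVELS = (-1, 0, 1)  # assumed coded factor levels

# Search the coded grid for the setting with the smallest predicted response.
best_setting = min(product(LEVELS, repeat=2),
                   key=lambda setting: predict_two_factor(*setting))
best_value = predict_two_factor(*best_setting)

print(best_setting, round(best_value, 5))  # (-1, -1) 41.18148
```

Because both linear coefficients in the reduced model are positive, its smallest prediction on this grid necessarily occurs with X1 and X2 at their low level.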
We observe that the model fits the data quite accurately when the predicted value is small, but as the predicted value increases, the data no longer fall within the 95% confidence bands.

The standard deviation measures how spread out the values are, and the required characteristic is that the thickness of the coating material on the wafer be uniform, so a smaller standard deviation is better. On calculating the minimum value from Table 10, we get 41.18148. And, if we are considering X3, the distance, the minimum value should be 24 when the value of distance is -1.

References:

Ruppert D. 2013. Statistics and Data Analysis for Financial Engineering (Springer Texts in Statistics). 2011 Edition. Springer.
Dukkipati R. V. 2010. Probability and Statistics for Scientists and Engineers. Edition. New Age International Pvt Ltd Publishers.
Durbin J. 2012. Time Series Analysis by State Space Methods (Oxford Statistical Science). 2 Edition. Oxford Univ Pr (Txt).
Ramalingam K. 2010. Power System Transients: A Statistical Approach. 2nd Edition. Prentice-Hall of India Pvt. Ltd.
Nagla J. R. 2014. Statistics for Textile Engineers (Woodhead Publishing India in Textiles). Edition. WPI Publishing.
