You just said it yourself: "Mean and distribution is irrelevant when those people who knows what they want out of the testing is saying the results are not satisfactory." So why the insistence on "evidence of fact"? What does it matter when people who know what they want out of the testing, and I'd certainly say Admiral Ma is such a person, say that the results are satisfactory?
Mate,
Not so fast. This is what you said: "What I draw from the overall statements by Rear Admiral Ma is confirmation that an EMALS catapult is under advanced stages of development, likely with at least one functioning prototype, and that he as a leader on the project is very satisfied with its development and performance either projected or demonstrated."
The whole premise of your proposition is that Mr. Ma is the development head of the EMALS program. The scope of his development work is unknown. What is known publicly is that he has done what he needed to do and is ready to hand the program off to the next phase. On reasonable grounds of deduction I cannot object to the conclusion that he has finished his piece, meaning the development effort at his end has advanced sufficiently to hand off. But there are many things we don't know, including:
(i) His development scope, and what constitutes his end of the program versus the start of the next phase;
(ii) His definition of having completed development, and what "satisfied" means;
(iii) The meaning of "next phase" and what the further work entails.
Factually, all we know is that he is ready to pass the baby to the next person in the chain. Everything outside of that is simply conjecture.
...Rear Admiral Ma Weiming, power and electrical engineering specialist of the Navy of the Chinese People's Liberation Army (PLAN)...
Ma Weiming has won the First Prize of the National Scientific and Technological Progress Award and the First Prize of the Military Scientific and Technological Progress Award multiple times. He has been called a "national-treasure-class" technical rear admiral.
Rear Admiral Ma Weiming, inventor of China's electromagnetic catapult and specialist in electrical engineering
Pointing to the single star on his uniform, he said he is just a technical rear admiral, responsible only for developing usable technologies, and that only high-ranking military officials can decide which technical plan is adopted.
Since you want to labour this point, I will oblige by demonstrating that you are committing the fallacy of equivocation.
Outside of lab conditions, there is a testing path that Jeff has highlighted.
First, you have dead-load testing. How much testing is actually required? I have no idea, but presumably it is a function of scope, issues encountered, and the reliability of the test data generated, among other things. The US to date has conducted more than 3,000 dead-load launches. Needless to say, that is better than 1. How much dead-load testing has Mr. Ma conducted? So far we don't even know whether he has tested anything outside of lab conditions. Facts matter, because information can then be put in perspective. When the achieved mean of 240 cycles between failure is one-fifth of the target, we know a number of things from it. We know it comes from a sample of 1,967 dead-load launches. We know that five times the achieved mean gives the target of 1,200 cycles. Translated: on an average day, a carrier can launch 120 sorties over a 12-hour period, so a target of 1,200 cycles means one failure every 10 days, versus the one failure every 2 days achieved in testing. The US Navy doesn't accept such a rate of failure - it is a matter of expectation. Others might find it acceptable. So when the test results are stated to be unsatisfactory, we know what that actually means in the context of things. When Mr. Ma says he is satisfied, please explain what that satisfaction is based on.
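For anyone who wants to sanity-check that arithmetic, here is a minimal Python sketch using only the figures already cited in this thread (240 mean cycles achieved, a target of five times that, and 120 sorties per 12-hour flying day); the one assumption I am making is that one launch equals one cycle:

```python
# Back-of-the-envelope check of the figures cited above.
achieved_mcbf = 240                # mean cycles between failure, achieved in testing
target_mcbf = 5 * achieved_mcbf    # 1200 cycles, the stated target
sorties_per_day = 120              # 120 sorties over a 12-hour flying day

# Assuming one launch = one cycle, days between failures = MCBF / daily sortie rate.
print(target_mcbf / sorties_per_day)    # 10.0 -> one failure every 10 days at target
print(achieved_mcbf / sorties_per_day)  # 2.0  -> one failure every 2 days as tested
```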
We also know that US testing has progressed to actual aircraft launches - 452 of them, under various loadings. The only reported issue is excessive release dynamics, which also means there are no other known issues. In contrast, where exactly can you place the testing to date on the Chinese side? Facts do matter.
Regarding the 201 failures out of 1967 thing...
I know you're trying to use the example to demonstrate that success depends on the metric one chooses, but the point latenlazy and dingyibvs are making is that, hypothetically, if the 201 failures were all clustered at the beginning of testing (say, the first 201 of the 1,967 tests), and the subsequent 1,766 tests were all successful, then that would suggest some modification to the system had occurred which dramatically improved its reliability; a mean failure rate that includes the original 201 consecutive failures would thus be flawed. Of course, as I said, this is all hypothetical, and in the original US tests with the 201/1,967 failures, the distribution of failures is such that it is probably reasonable to use a mean cycle failure rate.
Imo this entire discussion about the distribution of failures is only tangential to the matter at hand, because we have no idea what the testing stats for the Chinese EMALS look like.
As a matter of principle, I agree with you and think you're obviously correct in saying that the metric of success (such as the mean failure rate during testing) is important. But latenlazy and dingyibvs are also correct in saying that, hypothetically, if the vast majority of failures were clustered at the beginning of testing with no (or very, very few) subsequent failures, then that would suggest initial testing was either done incorrectly in a way that exacerbated failures, and/or that the system was modified to correct the initial faults and the updated system no longer suffers from the defect that caused the initial high fail rate.
That is to say, in the hypothetical case that the first 201 of 1,967 launches were failures and launches 202 onwards were all successful due to a modification of the system, it would be reasonable to classify test number 202 as test number 1 of a new, updated system, separate from the first 201 tests.
If the system in this hypothetical scenario was not changed after the 201 failed tests, and all 1,766 subsequent tests were nonetheless successful, then it would obviously be a statistical miracle, and I'd probably go and buy a lottery ticket or ten if I were part of the project.
tl;dr: I think latenlazy and dingyibvs are emphasizing the distribution of failures as an indirect way of saying that correcting the initial defects of a hypothetical system may substantially reduce an initially high failure rate. In that sense, all this stuff about distribution is moot, as one should logically keep two testing data sets: one for the system before the modification, and one for after, as the sketch below illustrates.
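To put numbers on that (these are purely the hypothetical figures from the scenario above, not real test data), a minimal sketch of why the split matters:

```python
# Hypothetical illustration only: all 201 failures at the start,
# a fix applied, then 1766 clean launches.
total_launches = 1967
early_failures = 201

# Treating everything as one data set:
overall_rate = early_failures / total_launches          # ~10.2% failure rate

# Treating launch 202 as "test 1" of the updated system:
post_fix_launches = total_launches - early_failures     # 1766
post_fix_failures = 0
post_fix_rate = post_fix_failures / post_fix_launches   # 0% failure rate

print(f"single data set: {overall_rate:.1%} failures")
print(f"post-modification set: {post_fix_rate:.1%} failures")
```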
Primarily, as I emphasized, the dissection of statistical possibilities and hypotheticals was not necessary, because it has no bearing on the nature and direction of the conversation. It was simply a misdirection, in my view. A proactive program would probably produce a set of statistics showing progressive improvement in test results. We know as a fact that an additional 1,000-plus dead-load launches were conducted subsequently, except that the results have not been released. If those results show a positive improvement, the hypothetical consideration would effectively be cancelled out. However, if the results are still below expectation, then further work needs to be done. Regardless, I fail to see how this issue has any significant bearing on the status of testing of the program, and in particular on the Chinese side of things.
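To sketch what that "progressive improvement" check would look like (the per-batch counts below are invented purely for illustration; no such breakdown has been published), a falling failure rate across successive blocks of launches would support the system-was-improved reading, while a flat rate would mean the overall mean stands as-is:

```python
# Invented per-batch counts, purely to show the shape of the check.
batches = [
    (500, 90),   # (launches, failures) - earliest block
    (500, 55),
    (500, 40),
    (467, 16),   # latest block of the 1967 launches
]

for i, (launches, failures) in enumerate(batches, start=1):
    print(f"batch {i}: {failures / launches:.1%} failure rate")
```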
I think this entire discussion surrounding the distribution of the 201/1,967 failures has no bearing on the status of testing of the Chinese EMALS program, which is why I said it was only tangential to the matter at hand.
I think dingyibvs and latenlazy were making the caveat that a simple mean failure rate isn't always accurate, depending on whether the distribution of failures was the result of something like improving the system and/or incorrect initial testing conditions.
Reading over some of your replies on the last page, I'm not sure whether you missed the rationale behind what they were suggesting.
I'm not sure why the last few pages were spent contesting this relatively simple principle, which in reality isn't even really related to the Chinese EMALS situation.
I think we all implicitly agree that an even distribution of failures throughout a series of tests on an unmodified system is a simple but reliable indication of its reliability - or at least, I said as much in my last post.