AI Data Mining & Cleansing

The initial goal of Data Mining in the 8-12 week “AIT Trial Period” is for the AIT/Client team to extract an accurate estimate of the total waste in the company caused by setup time, scrap, rework, machine downtime, poor supplier performance, lost customer goodwill due to late delivery, and similar sources. These wastes typically rob a company of 10-20% of revenue. AI can eliminate 75% of these wastes, resulting in up to 15% more capacity with no increase in manpower or capital investment, as well as on-time delivery above 95%. The AIT Trial Period is used to estimate the actual waste-reduction potential of the client’s process so that management can consider further AI deployment.
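A quick check of the arithmetic behind these figures, assuming (as the paragraph implies) that each unit of waste eliminated becomes one unit of usable capacity:

```latex
0.75 \times 10\% = 7.5\%, \qquad 0.75 \times 20\% = 15\%
```

Under that assumption the 15% capacity gain corresponds to the upper end of the 10-20% waste range; a company at the lower end would recover about 7.5%.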

Raw accounting data from the ERP system consists of normal process variation plus outliers. Outliers distort the average and wreak havoc on the all-important standard deviation, because the standard deviation squares each value's deviation from the average. One common source of outliers is human clock-in/clock-out error. In one case, an operator clocked into a job that normally took about 10 hours but did not clock out until 100 hours later (see Figure 1a).
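The usual sample definitions make this asymmetry explicit. The n - 1 form of the standard deviation is an assumption here; some statistical programs default to dividing by n instead:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

An entry x_k enters the average linearly, with weight 1/n, but enters s^2 through its squared deviation (x_k - x̄)^2/(n - 1). An entry ten times farther from the average than a typical one therefore contributes a hundred times as much to the variance.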

If we enter all of the data in the graph into any statistical program, we obtain an average of 13 and a standard deviation of 21. If we remove the outlier identified by statistical analysis, we obtain a valid representation of the actual process variation: an average of 9.7 and a standard deviation of 9.8. A single outlier in a population of 30 data points raises the average by about a third (from 9.7 to 13), but it raises the standard deviation by 114% (from 9.8 to 21) because of the squared terms in its calculation!
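A minimal Python sketch reproduces the effect using the standard-library statistics module. The durations are hypothetical and are not the data behind Figure 1a: 29 plausible entries around 10 hours plus one 100-hour entry from a missed clock-out.

```python
import statistics

def summarize(hours):
    """Return (average, sample standard deviation) of a list of job durations."""
    return statistics.mean(hours), statistics.stdev(hours)

# Hypothetical clock-in/clock-out durations in hours (NOT the Figure 1a data).
clean_hours = [8, 9, 10, 11, 12] * 5 + [9, 10, 10, 11]  # 29 normal entries
raw_hours = clean_hours + [100]                          # plus one missed clock-out

raw_mean, raw_sd = summarize(raw_hours)
clean_mean, clean_sd = summarize(clean_hours)

print(f"with outlier:    mean={raw_mean:.1f}  sd={raw_sd:.1f}")
print(f"without outlier: mean={clean_mean:.1f}  sd={clean_sd:.1f}")
# Relative inflation caused by the single bad entry.
print(f"mean +{raw_mean / clean_mean - 1:.0%}, sd +{raw_sd / clean_sd - 1:.0%}")
```

With these made-up numbers the average rises by 30% (10.0 to 13.0) while the standard deviation rises roughly twelvefold (about 1.4 to about 16.5); the tighter the normal data, the more extreme the inflation. Applying the same arithmetic to the rounded figures quoted above (average 13, n = 30, standard deviation 21), the 100-hour entry alone contributes (100 - 13)^2 / 29 = 7569 / 29 = 261 to a variance of about 21^2 = 441, or roughly 60% of it.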

Outliers therefore do not represent the process and must be cleansed from the data to provide accurate guidance for process improvement activities. Outliers may result from a lack of operator training, a lack of mistake-proofing of operator input, or poor sensor performance, so sensor accuracy must also be checked regularly. AI Accelerator Data Mining & Cleansing uses several statistical tests to identify outliers daily. The appropriate level of management must act on the resulting outlier report so that corrective action can be taken and the corrected average and standard deviation can be supplied to those involved in process improvement.
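The text does not name the tests AI Accelerator uses. As a hedged sketch only, the Python below assumes two common robust tests: Tukey's fences (flag values more than 1.5 interquartile ranges beyond the quartiles) and the median-based modified z-score (flag |M| > 3.5, the cut-off suggested by Iglewicz and Hoaglin). The function names and the sample data are hypothetical. Both tests measure spread with quartiles or medians rather than the standard deviation, so a 100-hour entry cannot inflate the very yardstick used to judge it, which is the weakness of a simple "average ± 3 standard deviations" rule.

```python
import statistics

def tukey_fence_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < low or v > high}

def modified_z_outliers(values, threshold=3.5):
    """Return indices whose modified z-score 0.6745*(x - median)/MAD exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return set()  # most entries equal the median, so the score is undefined
    return {i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold}

def daily_outlier_report(values):
    """Flag entries caught by either test; summarize the data with and without them."""
    flagged = tukey_fence_outliers(values) | modified_z_outliers(values)
    kept = [v for i, v in enumerate(values) if i not in flagged]
    return {
        "flagged": sorted(flagged),
        "raw": (statistics.mean(values), statistics.stdev(values)),
        "cleansed": (statistics.mean(kept), statistics.stdev(kept)),
    }

# Hypothetical day of job durations in hours; the last entry is a missed clock-out.
hours = [8, 9, 10, 11, 12] * 5 + [9, 10, 10, 11, 100]
report = daily_outlier_report(hours)
for i in report["flagged"]:
    print(f"entry {i}: {hours[i]} h flagged for management review")
print("raw mean/sd:      %.1f / %.1f" % report["raw"])
print("cleansed mean/sd: %.1f / %.1f" % report["cleansed"])
```

Flagged entries are reported rather than silently dropped, in keeping with the requirement above that management review the outlier report before the corrected statistics are used.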


“Lean Sigma in the Age of Artificial Intelligence”   Pages 5-7

Figure 1a - Example of Clock In / Clock Out Error