Global Data Science Forum

Forecasting with Lead Regression

By Moloy De posted Thu January 14, 2021 09:35 PM

Our client was a pioneering manufacturer of roof shingles in Minnesota, US. The client was working towards Industry 4.0 benchmarks, and developing analytics for their plants was an important part of that effort. They had installed hundreds of sensors along their assembly line, streaming nanosecond-level data to their Spark data lake.

The viscosity of the input fluid is an important factor in maintaining production quality. The data showed unwanted peaks (outliers) in viscosity, which the client wanted to eliminate. They proposed the following steps:
    1.    Monitoring Viscosity continuously in a dashboard at the plant
    2.    Finding the significant contributors in the fluctuations of the Viscosity values
    3.    Performing a root cause analysis (RCA) of the unwanted peaks

We implemented a Spark repository to hold the sensor records and displayed them in a plant dashboard. Multiple regression in R was used to identify the significant contributors to viscosity fluctuations. A decision tree was used to perform the RCA for web tears (broken roof shingles), and lead regression was used to forecast viscosity. The modelling was successful and was implemented using SparkR.
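The post does not spell out how lead regression works, so the following is only a minimal sketch under one common reading: the target (viscosity) is shifted forward by a lead of k time steps and regressed on the predictor readings available now, so the fitted line forecasts viscosity k steps ahead. Python is used purely for illustration (the project itself used R and SparkR). The function names, the single predictor `temperature`, and all the numbers are hypothetical and do not come from the client's data.

```python
from statistics import mean


def fit_lead_regression(x, y, lead):
    """Fit y[t + lead] = intercept + slope * x[t] by ordinary least squares."""
    if len(x) != len(y) or lead < 1 or lead >= len(x):
        raise ValueError("x and y must have equal length and 1 <= lead < len(x)")
    # Pair each predictor reading with the target observed `lead` steps later.
    x_now = x[:-lead]
    y_future = y[lead:]
    x_bar, y_bar = mean(x_now), mean(y_future)
    sxx = sum((xi - x_bar) ** 2 for xi in x_now)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_now, y_future))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def forecast(intercept, slope, x_latest):
    """Predict the target `lead` steps after the latest predictor reading."""
    return intercept + slope * x_latest


# Made-up data: viscosity responds to temperature with a 2-step delay.
temperature = [20.0, 22.0, 21.0, 25.0, 24.0, 27.0, 26.0, 30.0]
viscosity = [0.0, 0.0] + [100.0 - 2.0 * t for t in temperature[:-2]]

intercept, slope = fit_lead_regression(temperature, viscosity, lead=2)
next_viscosity = forecast(intercept, slope, temperature[-1])
print(f"viscosity[t+2] = {intercept:.2f} + {slope:.2f} * temperature[t]")
print(f"forecast two steps ahead: {next_viscosity:.2f}")
```

Because the synthetic series is built with an exact two-step delay, the fit recovers that relationship. With real sensor data the lead would have to be chosen from the process itself (for example, the transport delay between a sensor and the point where viscosity is measured), and several predictors would normally enter the regression together.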

QUESTION I: How can one detect significant factors from a linear regression?
QUESTION II: How can one detect significant factors from a decision tree analysis?