051 – “Integrated Data Environment for Automated Labeling (IDEAL)”

Author: Nathan Rigoni
Company: US Army AMRDEC
Phone: (256) 876-4620
Email: nathaniel.c.rigoni.civ@mail.mil

Maintenance Innovation Challenge, DoD Maintenance Symposium

Army rotorcraft routinely capture large volumes of sensor data, ranging from engine performance and gearbox vibration to aircraft orientation and the outside environment. This data acquisition strategy has been in place for more than ten years, producing a very large archive of historical fleet data. Natural language data, such as maintenance logbook entries and depot repair records, are also generated and stored in disparate databases.

The Army derives great value from the large amount of data it collects and cleanses, but often at great cost. Cleansing programs provide significant benefits, but they depend on subject matter experts (SMEs) who must manually evaluate the data recorded for each event. These projects are notably resource-limited: they cleanse only 10% of logbook entries and document failures for only a small set of selected components.

The sensor and natural language data are only useful when processed together to create the ground truth labels essential for SME-guided, data-driven machine learning and analytics. The authors are investigating a potential way to create these labels without the cost of manual cleansing.

Data programming is a cutting-edge paradigm for automatically generating large labeled datasets. The collaborative team introduces the Integrated Data Environment for Automated Labeling (IDEAL), a technology based on the principles of data programming. IDEAL learns the heuristics SMEs use to interpret and label datasets. It then uses its model of the SME labeling process to label massive data collections, a task that might otherwise take years and/or millions of dollars.

IDEAL is an environment that uses labeling functions, supplied by multiple SMEs, that capture how humans interpret the data. Consider a trivial example: how an SME might determine that a rotor blade damper requires service. IDEAL would request labeling functions from the SMEs, who might provide the following: rotor system balance is out of limits, blue blade is lagging all other blades, installed damper is near the end of service life, and visual inspection indicates damper fluid is low.
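
The four SME heuristics above could be written as labeling functions along the following lines. This is only an illustrative sketch, not IDEAL's actual interface: the `DamperRecord` fields, the function names, and the 0.9 service-life threshold are all hypothetical. Each function votes that the damper requires service or abstains when its heuristic does not apply.

```python
from dataclasses import dataclass
from typing import Callable, List

SERVICE = 1    # vote: damper requires service
ABSTAIN = -1   # the heuristic does not apply; no vote


@dataclass
class DamperRecord:
    # All fields are hypothetical and exist only for this illustration.
    balance_in_limits: bool      # rotor system balance is within limits
    lagging_blade: str           # blade lagging all others, "" if none
    damper_blade: str            # blade this damper is installed on
    life_used_fraction: float    # fraction of service life consumed (0..1)
    fluid_low: bool              # visual inspection found low damper fluid


def lf_balance_out_of_limits(r: DamperRecord) -> int:
    return SERVICE if not r.balance_in_limits else ABSTAIN


def lf_blade_lagging(r: DamperRecord) -> int:
    # Votes only when the lagging blade is the one carrying this damper.
    if r.lagging_blade and r.lagging_blade == r.damper_blade:
        return SERVICE
    return ABSTAIN


def lf_near_end_of_life(r: DamperRecord, threshold: float = 0.9) -> int:
    # The 0.9 threshold is an assumed value, not taken from the source.
    return SERVICE if r.life_used_fraction >= threshold else ABSTAIN


def lf_fluid_low(r: DamperRecord) -> int:
    return SERVICE if r.fluid_low else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[DamperRecord], int]] = [
    lf_balance_out_of_limits,
    lf_blade_lagging,
    lf_near_end_of_life,
    lf_fluid_low,
]


def apply_lfs(record: DamperRecord) -> List[int]:
    """Return one vote per labeling function for a single record."""
    return [lf(record) for lf in LABELING_FUNCTIONS]
```

Applying `apply_lfs` to every record yields a matrix of votes, one row per record and one column per labeling function, which is the input a label-combining model would work from.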

On their own, labeling functions have different accuracies, but IDEAL learns the correlations across the SME-provided functions. The more labeling functions IDEAL is given, the better it becomes at learning the heuristics the SMEs use to interpret data. It also works across datasets, balancing labeling functions that govern natural language data against those that govern Health and Usage Monitoring Systems (HUMS) data.
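
To make the idea of functions with different accuracies concrete, here is a deliberately simplified sketch. It is not IDEAL's model, and it does not learn correlations between functions. As a stand-in, it estimates each function's accuracy from how often it agrees with a plain majority vote, then combines votes with log-odds weights. The function names and the 1/0/-1 vote encoding are assumptions made for this example.

```python
import math
from typing import List

ABSTAIN = -1  # votes are 1 (positive), 0 (negative), or ABSTAIN


def majority_vote(votes: List[int]) -> int:
    """Unweighted majority of non-abstaining votes; ABSTAIN on a tie."""
    pos = sum(1 for v in votes if v == 1)
    neg = sum(1 for v in votes if v == 0)
    if pos == neg:
        return ABSTAIN
    return 1 if pos > neg else 0


def estimate_accuracies(vote_matrix: List[List[int]]) -> List[float]:
    """Estimate each function's accuracy as its agreement with the majority."""
    n_lfs = len(vote_matrix[0])
    agree = [0] * n_lfs
    total = [0] * n_lfs
    for votes in vote_matrix:
        consensus = majority_vote(votes)
        if consensus == ABSTAIN:
            continue  # no consensus to compare against
        for j, v in enumerate(votes):
            if v == ABSTAIN:
                continue
            total[j] += 1
            agree[j] += int(v == consensus)
    # Add-one smoothing keeps every estimate strictly between 0 and 1.
    return [(agree[j] + 1) / (total[j] + 2) for j in range(n_lfs)]


def weighted_label(votes: List[int], accuracies: List[float]) -> float:
    """Combine votes with log-odds weights; return the probability of label 1."""
    score = 0.0
    for v, acc in zip(votes, accuracies):
        if v == ABSTAIN:
            continue
        weight = math.log(acc / (1.0 - acc))
        score += weight if v == 1 else -weight
    return 1.0 / (1.0 + math.exp(-score))
```

A function that usually disagrees with the consensus receives a low estimated accuracy and therefore a negative weight, so its vote counts as evidence for the opposite label. A real data programming system would replace the majority-vote bootstrap with a learned generative model.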

As the algorithm learns, IDEAL provides feedback to the SME, and the two iterate to determine whether the learned model scales up. In this way, the SME can monitor the algorithm and decide whether additional labeling functions are required before it is used to generate large labeled training datasets. The goal of IDEAL is to enable a holistic condition-based maintenance (CBM) analysis by generalizing its capabilities to all data describing a rotorcraft platform.
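
The source does not describe what feedback IDEAL presents to the SME. One plausible form, sketched here purely as an assumption, is a per-function summary of coverage (how often a function votes) and conflict (how often its vote disagrees with another function's), plus the fraction of records no function labels. A large unlabeled fraction would suggest that more labeling functions are needed. The function names and the 1/0/-1 vote encoding are hypothetical.

```python
from typing import Dict, List

ABSTAIN = -1  # votes are 1, 0, or ABSTAIN


def lf_diagnostics(vote_matrix: List[List[int]]) -> List[Dict[str, float]]:
    """Return coverage and conflict rates for each labeling function."""
    n_rows = len(vote_matrix)
    n_lfs = len(vote_matrix[0])
    report = []
    for j in range(n_lfs):
        voted = 0
        conflicts = 0
        for votes in vote_matrix:
            if votes[j] == ABSTAIN:
                continue
            voted += 1
            # A conflict is any other non-abstaining vote that differs.
            if any(v != ABSTAIN and v != votes[j]
                   for k, v in enumerate(votes) if k != j):
                conflicts += 1
        report.append({
            "coverage": voted / n_rows,
            "conflict": conflicts / n_rows,
        })
    return report


def unlabeled_fraction(vote_matrix: List[List[int]]) -> float:
    """Fraction of records on which every labeling function abstains."""
    silent = sum(1 for votes in vote_matrix
                 if all(v == ABSTAIN for v in votes))
    return silent / len(vote_matrix)
```

An SME reviewing this report could add functions to cover the unlabeled records or revisit functions with a high conflict rate before the labels are generated at scale.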

The team is developing IDEAL for Army aviation problems. Success will result in a force multiplier for SMEs and successful data-driven analytics for CBM.