
ITEA Annual Symposium 2024

Paper Presentation

Artificial Intelligence for Test & Evaluation

July 2024

Francis E. McIntire

frank@golzup.com

(571) 982-0754

Approved for public release.

© 2024 Francis E. McIntire

Dunwoody Press

Denver, Colorado

 

Abstract

Introduction: As a Test Director or Test Manager, you understand the pressure of the Developmental Test (DT) ‘Scrunch’, which operates like a vise to ‘squeeze’ the Operational Test community between system development and system Operational Test & Evaluation (OT&E) for release to the warfighter.

Bottom Line Up Front: This paper and presentation show how the utilities available from Artificial Intelligence (AI), Machine Learning (ML), and Data Science can be applied to anticipate:

1.     Test Planning, including the key factors of operational testing for systems that are promised ‘on time’ but may be delayed in release from developmental testing;

2.     Test Execution, including the best practices and lessons learned from similar systems under test that help optimize the test and evaluation (for employment) when the test article is released from development for operational testing; and

3.     Test Reporting, providing leaders and decision-makers the optimal format and presentation, including the one-page introduction (bottom line up front) and the five-page summary that delivers the key findings and recommendations for the system under test.

This paper takes a deep dive into the use of current AI, ML, and Data Science capabilities to accelerate Operational Test Planning, Test Execution, and Test Reporting, optimizing the time and resources available for Test Execution during the Developmental Testing ‘scrunch’. This includes identifying the key factors impacting the successful test and evaluation (for employment) of similar systems under test and the anticipated outcomes based on similar test articles. The output from AI, ML, and Data Science tools provides inputs that accelerate Test Planning and Test Reporting and improve the timeliness and accuracy of Test Execution through the identification of key factors impacting successful test and evaluation and the negative impacts of task and resource dependencies (limitations).
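As a hedged illustration of the key-factor identification described above, the sketch below ranks candidate test factors by their correlation with pass/fail outcomes from records of similar systems under test. All factor names and data values are invented for illustration; a real effort would draw on an actual historical test database.

```python
from statistics import mean, pstdev

# Hypothetical historical records from similar systems under test:
# each record maps candidate factors (scaled 0-1) to a pass/fail outcome.
records = [
    {"crew_training": 0.9, "dt_schedule_slip": 0.1, "sw_maturity": 0.8, "passed": 1},
    {"crew_training": 0.7, "dt_schedule_slip": 0.6, "sw_maturity": 0.5, "passed": 0},
    {"crew_training": 0.8, "dt_schedule_slip": 0.2, "sw_maturity": 0.9, "passed": 1},
    {"crew_training": 0.4, "dt_schedule_slip": 0.8, "sw_maturity": 0.3, "passed": 0},
    {"crew_training": 0.6, "dt_schedule_slip": 0.7, "sw_maturity": 0.4, "passed": 0},
    {"crew_training": 0.9, "dt_schedule_slip": 0.3, "sw_maturity": 0.7, "passed": 1},
]

def factor_correlations(records, outcome="passed"):
    """Rank candidate factors by |Pearson correlation| with the outcome."""
    factors = [k for k in records[0] if k != outcome]
    y = [r[outcome] for r in records]
    scores = {}
    for f in factors:
        x = [r[f] for r in records]
        mx, my = mean(x), mean(y)
        cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
        denom = pstdev(x) * pstdev(y)
        scores[f] = cov / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

for factor, r in factor_correlations(records):
    print(f"{factor}: r = {r:+.2f}")
```

On this synthetic data the DT schedule slip emerges as the strongest (negative) factor, which is the kind of input a test planner could use to focus mitigation effort.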

KEYWORDS: Test and Evaluation, Artificial Intelligence, Machine Learning.

 

Section 1

Introduction

This paper introduces an approach for incorporating the tenets of Artificial Intelligence (AI) into traditional Operational Test & Evaluation (T&E). The published work AI FOBB Academy, Vol 9 – Artificial Intelligence for Test & Evaluation Slim Jim is referenced. That work addresses the implementation of Artificial Intelligence across a number of separate domains for Operational Test & Evaluation, including the Department of Defense, the Department of Energy and the National Nuclear Security Administration, and the branches of the Armed Forces. As systems move through development and developmental testing (DT) toward traditional operational test and evaluation (OT&E), they experience the problematic DT ‘scrunch’: a DT schedule overrun that threatens the efficacy of traditional T&E. A case study is included to demonstrate the methodology for implementing AI for T&E.

 

Section 2

Literature Search & Research

The Artificial Intelligence (AI) FOBB Academy includes the industry-standard levels of AI fluency to address the Artificial Intelligence suite of capabilities (Machine Learning, Automation, Robotics, Data Science, Natural Language Processing, Neural Networks, etc.), as well as DevSecOps and Cyber capabilities (offensive and defensive).

AI FOBB Academy, Vol 9 – Artificial Intelligence for Test & Evaluation Slim Jim (McIntire et al, 2024)

Golzup, Volume 3 – Cyber Security Toolkit (McIntire, 2014)

GovWin - Government Contracting Market Analysis (govwin.com)

Defense.gov and GovConWire.com

Level 0 – None: No knowledge of AI, Machine Learning, Data Science, Natural Language Processing, Automation, Robotics, or Neural Networks.

Level 1 – Conversational: Understands AI approaches and risks at a conversational level. No understanding or knowledge of how to solve a problem using AI.

Level 2 – Fundamental: Able to listen to a description of a customer problem and recognize the features of the problem that make it a good candidate for the application of AI, ML, or data science (tagged data sets).

Level 3 – Functional: Able to use “no code” or “low code” applications to accomplish goals or solve a problem with minimal programming skills.

Level 4 – Proficient: Able to write AI programs in AI languages (e.g., Python), integrate AI to solve problems (research and development, R&D; test and evaluation, T&E), and develop effective solutions.

Level 5 – Expert: Expert experience in at least one branch of AI (Machine Learning, Natural Language Processing, Automation, Robotics, Neural Networks, Data Science, or tagged data sets).

Figure 1 – Artificial Intelligence Fluency
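The fluency scale in Figure 1 can be put to practical use when staffing a test team. A minimal sketch, assuming a simple numeric roster; all names and levels below are hypothetical:

```python
# The six-point fluency scale from Figure 1 as a lookup table.
AI_FLUENCY = {
    0: "None", 1: "Conversational", 2: "Fundamental",
    3: "Functional", 4: "Proficient", 5: "Expert",
}

def below_minimum(team, minimum=2):
    """Return team members below the minimum fluency level for a task."""
    return [name for name, level in team.items() if level < minimum]

# Hypothetical roster: screen for members who need training before a
# Fundamental-level (Level 2) task such as recognizing good AI use cases.
team = {"analyst_a": 1, "analyst_b": 3, "test_director": 4}
print("need training:", below_minimum(team, minimum=2))
```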

 

Section 3

Artificial Intelligence and T&E Use Cases

The AI FOBB Academy overview is designed to help practitioners recognize and understand the best AI use cases:

1.     Present and attend the FOBB AI Leader, Implementer, and Integrator sessions (for teams and bids), with the goal of optimizing BBAI SmartFit (Deltek and GovWin) for market breakdowns (e.g., NAICS); check the Deltek GovWin Smart Summary for corporate contracts to optimize stature

2.     Listen to customer needs, introduce AI and AI Center of Excellence at the right time for each audience

3.     Become familiar with AI and ML technology, including Natural Language Processing, Automation, Robotics, Neural Networks, Data Science, and tagged data sets

4.     Recognize the characteristics of the problem (question) and the data available to support optimal solutions

5.     Use AI as a discriminator for building high-performing teams and winning bids, including the levels of AI learning from LinkedIn.

LinkedIn Level 1 (Conversational AI): Introduction to Artificial Intelligence; Get Ready for Generative AI; AI, ML, and Data Science – What’s the Difference? (Phil Winder and Feynman Liang, GOTO 2019); AI Academy AI-100: 1 Demystifying AI; Artificial Intelligence and Business Strategy; Safeguarding AI; Executive Guide to AutoML; and Predictive Analytics Essential Training for Executives (for the CRISP-DM phases).

LinkedIn Level 2 (Fundamental AI): AI in Business Essential Training; No-Code AI: Harness the Power of AI without Programming; Learning XAI: Explainable Artificial Intelligence; AutoML: Build Production-Ready Models Quickly (Python prerequisites: Python Essential Training, Python for Data Science Essential Training, Python Essential Libraries Projects); Power BI: Integrating AI and Machine Learning; and Data Rights Foundations.

LinkedIn Level 3 (Functional AI): Machine Learning Foundations: Probability; Artificial Intelligence Foundations: Neural Networks; Artificial Intelligence Foundations: Machine Learning; Machine Learning with Python: Foundations; Machine Learning with Python: k-Means Clustering; Learning XAI: Explainable Artificial Intelligence; Machine Learning with Python: Decision Trees; Introducing AI to Your Organization; Becoming an AI-First Product Leader; and Introduction to Amazon Braket: Quantum Computing on AWS.

LinkedIn Level 4 (Proficient AI): MLOps Essentials: Model Deployment and Monitoring (Joe Zirilli); Machine Learning and AI Foundations: Classification Modeling; TensorFlow: Neural Networks and Working with Tables; TensorFlow 2.0: Working with Images; TensorFlow: Working with NLP (neural networks and natural language processing); Deep Learning: Model Optimization and Tuning; Advanced Python: Classes and Functions; Applied Machine Learning: Ensemble Learning; Reinforcement Learning Foundations; Build Production-Ready Models Quickly!; MLOps Essentials: Model Development and Integration; Essentials of MLOps with Azure: 1 Introduction; and Transitioning into Machine Learning Engineering.

Examples of AutoML-focused vendors: Amazon SageMaker, Dataiku, and Driverless AI; see also the Gartner report “Multipersona Data Science and Machine Learning Platforms.”

LinkedIn Level 5 (Expert AI): Applied Machine Learning: Ensemble Learning; Applied AI: Getting Started with Hugging Face Transformers; Applied AI: Building NLP Apps with Hugging Face Transformers; A Code-Driven Introduction to Reinforcement Learning (Phil Winder, GOTO 2020); and The Road to General AI (Danny Lange, GOTO 2020).
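The AutoML platforms named above automate model selection and hyperparameter search. As a toy stand-in for that idea, the sketch below exhaustively scores a tiny configuration grid for a one-feature threshold classifier; the data and grid are synthetic and purely illustrative.

```python
from itertools import product

# Synthetic labeled data: (feature value, class label) pairs.
data = [(0.2, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 1)]

def accuracy(threshold, invert, data):
    """Score a one-feature threshold classifier on labeled (x, y) pairs."""
    correct = 0
    for x, label in data:
        pred = int(x >= threshold)
        if invert:
            pred = 1 - pred
        correct += pred == label
    return correct / len(data)

# The "AutoML" step: enumerate every configuration in the grid and
# keep the one with the best score -- real platforms do this (smarter)
# over model families and hyperparameters at scale.
grid = {"threshold": [0.25, 0.5, 0.75], "invert": [False, True]}
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
best = max(candidates, key=lambda p: accuracy(p["threshold"], p["invert"], data))
print("selected configuration:", best)
```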

[Image: diagram of a satellite communication system]

Figure 2 – Use Case Scenarios

1.     Army Space, Air Force Space, Space Force, and Navy Space strategy, roles & missions

2.     Ground control, satellite control, command and control, and signals from space

3.     Missile defense, space warfare, force projection, hypersonic & laser technology

4.     Laser offensive and defensive technology (ground, air, and space-based COIL)

5.     Satellite launch control, satellite command & control, constellation management

6.     On orbit asset management, offensive and defensive capabilities, new technologies

7.     OT&E with AI for ground-, air-, and space-based C2 / C5ISR warfighter systems

 

Section 4

Key Commonalities

Requirements

The T&E process is founded in requirements, including Critical Technical Parameters, Critical Operational Issues, and Measures of Performance and Effectiveness. The maturity of this process provides an excellent benchmark for the evolution of the VV&A process. In the same way that the T&E process assesses operational system performance, the VV&A process assesses M&S credibility. The DOD Generic VV&A Process, described in the RPG, begins by identifying the problem to be solved and the requirements for solving that problem. The next step is to determine the problem-solving approach. M&S is one tool for problem solving, but other tools may also be used to arrive at a solution. Given that at least part of the solution will be arrived at through M&S, general requirements for model capabilities are identified. Depending on these general requirements, the problem solver may be able to use an existing model either "as is" or modified, or a new model may need to be developed. Once that decision is made, requirements for the specific model(s) chosen are established and the model is prepared for use.

While the T&E process is rooted in requirements definition, the VV&A process has not yet learned the lesson or importance of requirements. Many programs attempt to avoid requirements definition or make unfounded assumptions. One assumption is that M&S is the correct tool to use where another tool might be easier or less costly for the given problem. Another is choosing a specific model without rationale or basis in requirements. A poor choice of model may produce invalid results where the model chosen was not built to answer particular types of questions. Sometimes these problems are due to unfamiliarity with the VV&A process, although there have also been instances where suboptimal decisions were intentionally made. Such decisions often reflected a desire to maximize resources in other areas or to placate a decision maker who had already decided what tool would be used. In a few cases, requirements have been "tailored" out of the VV&A process. "Tailoring" is a VV&A term that describes the focusing of a well-planned VV&A effort on those tasks that will provide optimal return on investment. It is the process of selecting which V&V tasks and techniques will provide the most expedient and credible results by which the model can be assessed. Requirements definition, however, is not open to negotiation or tailoring. Common sense dictates that in order to credibly assess a simulation, one must know what the simulation is supposed to do!

 

Management

The T&E process is well established and is understood by a large community of developers, testers, and managers. By comparison, the VV&A process is relatively new. The T&E process utilizes mature methods that provide excellent examples which VV&A would do well to emulate. For example, the TEMP requires that responsibilities for each segment of the testing community be delineated. Another example is the approval process for the TEMP and other testing documents, which requires negotiation and compromise among participating organizations prior to the start of a T&E effort. By comparison, VV&A efforts reflect a wide variety of dissimilar and nonstandard approaches, many of which are incomplete or which unnecessarily delay the start of the VV&A process. Identification of roles and responsibilities -- who will do what for whom, when, where, why, and for how much -- is essential prior to the start of any VV&A effort. Unfortunately, many programs do not learn this lesson until after they have expended large sums of time and money during the development of the simulation, losing the optimal window of opportunity. Often resources are wasted on educating the contractors who are supposed to understand and implement VV&A, thereby adding to the reputation of VV&A as costly.

Documentation

The T&E process is characterized by clearly defined documentation. Although common reporting formats for VV&A were developed for DOD, many programs avoid committing implementation details to writing. Initial attempts at writing a VV&A plan often include large tutorials that have been written at considerable expense. No new information is offered in such treatises despite claims of "tailoring" to meet "unique" program needs; however, few such claims have proven true. Where the T&E process details the specific information requirements and criteria for assessing the system under test, most VV&A plans to date have failed to provide the executable detail needed to perform V&V. Specific V&V tasks and techniques must be identified and linked to specific portions of the problem to ensure that those tasks are indeed necessary. Unfortunately, the combined lack of stated requirements and the absence of executable V&V detail result in VV&A plans that merely provide a high-level strategy but never provide clear direction and action.

The evolving Simulation Based Acquisition concept will employ synthetic environments and digital representations of evolving systems. This will require disciplined implementation of requirements traceability, sound management processes, and thorough documentation. Because these products will exist only as digital models and databases, the software will have to be built upon proven software development processes. The VV&A community will have to actively engage these practices to ensure that VV&A is not "assumed away" under the context of good software development processes, or replaced altogether as an "unnecessary expense" to program offices.

 

JWARS Case Study

The JWARS program office initiated a V&V effort in October 1997 to support the production version of JWARS. Additionally, the Joint Requirements Oversight Council (JROC) directed that a test and evaluation plan be required as part of the Operational Requirements Document (ORD). T&E planning began in December 1997, using the V&V Plan as its initial point of departure.

JWARS presents a new form of modeling and simulation use in DOD. Where M&S has historically been used to support development of weapons systems and other tangible assets, the JWARS simulation is itself the system under test. As such, the system requires some form of T&E, while its formulation as a simulation requires that JWARS also undergo V&V, under DOD Instruction 5000.61. The relationship of these two processes has been the subject of research being conducted to support other DOD programs facing similar dilemmas. That research has been applied to the evolution of an integrated JWARS T&E / V&V strategy.

The JWARS V&V effort is distinct from the in-house quality assurance being conducted by the developer. It is independent in the sense that the V&V contractor reports to the oversight body for JWARS development, the Joint Analytic Modeling Improvement Program (JAMIP). The conduct of additional V&V beyond developer QA is evidence of the commitment on the part of the program office to provide a product that is useful and usable by the warfare analysis community. During the first quarter of FY98, the V&V contractor developed a V&V plan. The RPG was used as the primary resource for the development of the plan. The V&V contractor also worked closely with an oversight group of recognized DOD VV&A experts who provided input and direction for the plan's development. The V&V plan is currently under Service and CINC review. It has received favorable comments from many of the key reviewers, although significant concern for supporting resources and the V&V relationship to T&E has been stressed.

JWARS is an ACAT III program, therefore formal test and evaluation per DOD acquisition directives is not required. However, the JWARS Program Office has elected to use the Test and Evaluation Master Plan (TEMP) format to guide the development of a T&E plan. T&E planning and execution is occurring in parallel with the V&V effort and is leveraging from the V&V Plan to ensure coordination between these two processes.

JWARS T&E differs from traditional program T&E in two significant ways. First, to support its T&E initiative, the JWARS program has recognized the need to involve both the Services' T&E and analytical agencies. The T&E agencies are the traditional sources for test and evaluation support to the Services and would naturally be sought for their expertise during this process. However, the Operational Test Directors (OTDs) at these agencies are primarily warfighters who test hardware -- platforms, weapons systems, and equipment -- which is distinctly different from "analytical software" such as JWARS. Whereas the OTDs represent the military users of hardware systems, the military analysts are the targeted user community for JWARS. The operational test of a weapons system often involves hands-on use of the system in the field by military operators. Similarly, operational test of a simulation requires use by personnel trained and experienced in that arena. For JWARS, those personnel are the analysts who are trained in the use of theater-level simulation. The analysis agencies will serve as test sites and the OT&E agencies will provide oversight and report on the testing conducted.

A second difference is JWARS' intended use of alpha and beta testing. Although conducted at field sites and involving potential users, the primary purpose of these tests is to provide feedback to the developer. JWARS intends to use the alpha and beta testing phases both to support the developer's quality assurance program and to provide the military user community with early opportunities to become familiar with the simulation.

The JWARS V&V Plan identifies V&V techniques from the RPG; however, due to limited resources, problem domain validation will be restricted to face validation by subject matter experts. This technique, while necessary, is not sufficient for credible validation of a simulation of the magnitude and criticality of JWARS. Therefore, the T&E effort has been focused on extending the validation envelope through additional test techniques that meet both V&V and T&E objectives. In their capacity as T&E/V&V oversight support to the program office, MITRE developed a crosswalk between the V&V Plan (as recommended in the RPG) and the TEMP (as described in DODR 5000.2-R). This provided important information regarding the information overlaps that exist between the two documents and identified where existing information in the V&V Plan could be leveraged as immediate input to the T&E plan. The value of this approach is the reduction of duplicated effort, thereby saving time and money, while ensuring that these processes mesh and complement each other.
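The crosswalk MITRE developed can be sketched as a simple set comparison: topic areas covered by both the V&V Plan and the TEMP can be drafted once and reused. The topic names below are hypothetical placeholders, not the actual section contents of either document.

```python
# Hypothetical topic lists for illustration only -- the actual V&V Plan
# (per the RPG) and TEMP (per DODR 5000.2-R) contents are not reproduced here.
vv_plan_topics = {"requirements", "roles_and_responsibilities",
                  "validation_techniques", "accreditation_criteria"}
temp_topics = {"requirements", "roles_and_responsibilities",
               "critical_operational_issues", "test_resources"}

# The crosswalk: shared topics can be drafted once and leveraged in both
# documents, reducing duplication of effort between the V&V and T&E plans.
shared = sorted(vv_plan_topics & temp_topics)
temp_only = sorted(temp_topics - vv_plan_topics)

print("reusable from V&V Plan:", shared)
print("new work for the TEMP:", temp_only)
```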

A Working Group Integrated Process Team (WGIPT) was established to develop the strategy, identify test activities and testers, ensure the correct conduct and documentation of test events, review test results, and provide recommendations to the JAMIP. The WGIPT consists of representatives from the Services' analysis organizations, the Services' T&E agencies, OSD, J-8, and the JWARS program office. Advisors to the WGIPT included the JWARS developer, MITRE, and representatives from the Joint Data System (JDS). An IT&E contractor will prepare test plans for the WGIPT's approval, provide periodic status briefings to the WGIPT, coordinate required memoranda of agreement, and document T&E results. An initial concept for JWARS T&E was developed and presented to the WGIPT in March 1998. This concept focused on testing of the Planning & Execution and Force Assessment applications of JWARS prior to IOC. Systems Effectiveness & Tradeoff Analysis and Concept & Doctrine Development and Assessment were identified for later testing. A set of proposed performance measures was also provided, with traceability, utility, and V&V highlighted as the three key performance parameters. An additional briefing described the Fielding Plan developed by J-8, which provides a detailed description of the logistic implementation for JWARS fielding at identified test sites. The Fielding Plan included designation of the level of testing at which each test site would participate. J-8 also provided a macro-level T&E process diagram that enumerated the various tasks and responsibilities for T&E. This was used to develop a strawman partitioning of lower-level tasks and apportionment of those tasks among the various players. These products are currently under Service and CINC review.

Both the T&E and V&V processes require identification and agreement of roles and responsibilities before beginning either process. Key to the testing of JWARS are the roles of the traditional T&E agencies and the eventual users of the simulation, the analytical organizations. Similarly, there is a need for a balanced perspective among the Services, and in relation to OSD and the Joint Staff. While the WGIPT reflects a reasonable balance among these various players, there remain significant decisions to be made regarding the tests that each test site will perform and the method for supporting the test events.

Outstanding technical issues include the relationship of the T&E effort with the V&V plan, which is only slightly more mature, but progressing. Despite this early positive direction, the V&V effort did not produce an executable plan of specific tasks and techniques. The T&E effort must, therefore, focus on specifying measures of performance, critical technical parameters, and critical operational issues; and developing comprehensive, executable test plans that incorporate these necessary elements.
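One way to make such a test plan executable is to enforce traceability from Critical Operational Issues (COIs) through Measures of Performance (MOPs) to planned test events, so that no measure is left unexercised. A minimal sketch, with all identifiers hypothetical:

```python
# Hedged sketch of a traceability check: each COI maps to MOPs, and each
# MOP should map to at least one planned test event. All IDs are invented.
coi_to_mops = {
    "COI-1": ["MOP-1.1", "MOP-1.2"],
    "COI-2": ["MOP-2.1"],
}
mop_to_events = {
    "MOP-1.1": ["EVT-A"],
    "MOP-1.2": ["EVT-A", "EVT-B"],
    "MOP-2.1": [],          # gap: no event exercises this measure
}

def traceability_gaps(coi_to_mops, mop_to_events):
    """Return MOPs that no planned test event exercises."""
    gaps = []
    for mops in coi_to_mops.values():
        for mop in mops:
            if not mop_to_events.get(mop):
                gaps.append(mop)
    return gaps

print("untested measures:", traceability_gaps(coi_to_mops, mop_to_events))
```

Running such a check against the draft test plan surfaces exactly the kind of missing executable detail the paragraph above warns about.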

The JWARS T&E schedule provides timeframes for task identification and assignment, determination of assessment criteria for those tasks, negotiation for test site support, and identification of test events and resources. This schedule is both reasonable and necessary. The use of the T&E community's proven methodology for planning a T&E effort is highly appropriate and is the most effective use of time and resources. The JWARS program office is committed to building a useful, usable simulation product for DoD, and an optimal level of success can be achieved with the participation of JWARS' potential customers, the Services and CINCs.

 

Conclusions

C2 and C5ISR systems benefit from AI utilities that support OT&E.

Implementation of AI for OT&E supports the dual objectives of:

1.      Effective operational test planning, test execution, and test reporting for the system under test, and

2.      Anticipating the accelerated OT&E schedule in light of typical developmental testing schedule delays (aka the “DT scrunch”).

 

List of References

1. McIntire, F. E. 2024. “AI FOBB Academy, Volume 9 – Artificial Intelligence for Test & Evaluation Slim Jim.”

2. McIntire, F. E. 2024. “AI FOBB Academy, Volume 1 – Artificial Intelligence for Test & Evaluation.”

3. Allen, H. P.; P. B. Burleson; and P. Glasow. 1997. "The Relationship of VV&A to T&E." Proceedings of the 1997 Summer Simulation Conference, Society for Computer Simulation.

4. Department of Defense Instruction 5000.61, DOD Modeling and Simulation (M&S) Verification, Validation and Accreditation (VV&A), April 29, 1996.

5. Department of Defense Regulation 5000.2-R, Mandatory Procedures for Major Defense Acquisition Programs (MDAPS) and Major Automated Information Systems (MAIS) Acquisition Programs, March 15, 1996.

6. Glasow, P. (ed.). 1996. "Department of Defense Verification, Validation and Accreditation Recommended Practices Guide." Defense Modeling and Simulation Office.
