In recent years, advances in AI have been increasingly applied to software engineering, producing AI-assisted techniques for a range of software-engineering problems, such as automated program synthesis, code translation, code summarization, code completion, test input and test oracle generation, fault localization, and program repair (e.g., [1-6]). To help drive new innovations in this area, IBM Research launched Project CodeNet in May 2021. Project CodeNet provides a rich dataset, consisting of 14 million code samples and 500 million lines of code in 55+ programming languages, along with tools aimed at advancing research in AI for code.
As part of the Project CodeNet initiative, IBM Software and IBM Research have launched Project Minerva for Modernization, which aims to enable AI-driven modernization of legacy applications. In particular, one of the goals of Project Minerva is to assist with evolving monolithic application architectures toward a modern, microservices-based architecture. The Minerva tools employ program-analysis techniques, combining them with AI and ML algorithms, to provide recommendations for decomposing a monolithic application into partitions, which can serve as starting points for microservices.
The initial tool set released in Minerva consists of the following:
- Java binary analyzer, a tool that performs static analysis of Java application code in order to collect data for AI refactoring consideration.
- Java binary instrumenter, a tool for adding instrumentation probes, at runtime, to the classes and methods of a Java application for collecting dynamic data for AI refactoring consideration.
- CARGO, a graph-partitioning tool that performs community detection, via novel context-sensitive label propagation, to recommend decomposition of an application's classes into partitions.
We envision Minerva for Modernization growing into a community that brings together industry practitioners, academic researchers, and open-source developers to help build an open ecosystem of tools that support the application-modernization journey.
To mark the launch of Minerva and invite community participation, we announce two coding challenges designed around the initial Minerva tools.
Coding Challenge 1: Application Instrumentation
Problem Statement
The goal of this challenge is to develop solutions that improve upon the time and space overhead of the Minerva Java instrumenter. The Minerva instrumenter adds probes at the entry and each exit point of a method to record information about method calls and returns that occur at runtime. The added probes impose a runtime overhead on the application and generate trace files. A submission in this challenge should improve upon one or both of these dimensions by reducing runtime overhead and/or reducing the size of the generated trace files, while recording the same type of information as contained in the Minerva-generated traces (i.e., there is no loss in the information content even though the traces are more compact). A solution may be implemented as a modification of the Minerva instrumenter or as a separate Java-based tool, built leveraging other open-source libraries.
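To make the "more compact, but no loss of information" requirement concrete, here is a minimal sketch of one possible compaction idea: dictionary-encoding method names so that each name is stored once and every probe hit becomes a single integer. The record layout (an enter/exit tag plus a fully qualified method name) and the example method names are assumptions made for illustration; they are not the actual Minerva trace format, and the sketch is written in Python for brevity rather than as a Java instrumenter.

```python
from typing import Dict, List, Tuple

# Hypothetical verbose trace record: (event kind, fully qualified method name).
# This layout is an assumption, not the Minerva trace format.
Event = Tuple[str, str]  # ("enter" | "exit", method name)


def encode(trace: List[Event]) -> Tuple[List[str], List[int]]:
    """Dictionary-encode a trace: each method name is stored once, and each
    event becomes one integer (2 * method_id for enter, plus 1 for exit)."""
    ids: Dict[str, int] = {}
    names: List[str] = []
    codes: List[int] = []
    for kind, method in trace:
        if method not in ids:
            ids[method] = len(names)
            names.append(method)
        codes.append(2 * ids[method] + (1 if kind == "exit" else 0))
    return names, codes


def decode(names: List[str], codes: List[int]) -> List[Event]:
    """Invert encode(), recovering the original event sequence exactly."""
    return [("exit" if c % 2 else "enter", names[c // 2]) for c in codes]


# Made-up example trace with nested calls.
trace = [
    ("enter", "com.example.OrderService.place"),
    ("enter", "com.example.Inventory.reserve"),
    ("exit", "com.example.Inventory.reserve"),
    ("exit", "com.example.OrderService.place"),
]
names, codes = encode(trace)
```

Because `decode(names, codes)` reproduces the input trace exactly, the compact form carries the same information as the verbose one. A real submission would need to apply an idea like this to whatever fields the Minerva traces actually record.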
Evaluation Procedure and Dataset
The submitted solutions will be evaluated over an open-source Java application and set of test cases (to be provided), against baselines consisting of the runtime overhead of the Minerva-instrumented application and the cumulative size of the generated trace files. The baseline dataset, consisting of execution traces and data on runtime overhead, will be provided together with the benchmark application.
Submission Materials
The submission material should consist of a link to a GitHub repository containing the instrumenter source code, the execution traces generated on the benchmark application, a summary of the improvements implemented in the instrumenter, and data showing how the instrumenter outperforms the Minerva instrumenter. More details and precise submission instructions will be provided in early 2023.
Coding Challenge 2: Application Partitioning
Problem Statement
The goal of this challenge is to develop partitioning techniques that improve upon the partitioning performed by CARGO. The CARGO partitioning technique employs a novel label-propagation algorithm over a context-sensitive program dependence graph that captures static call relations, data dependencies, heap dependencies, and code-database transaction dependencies [7]. It computes partitions as disjoint groupings of the set of Java classes in the monolithic application. The static analysis for constructing the program dependence graph is implemented in the Konveyor Data Gravity Insights tool, which is leveraged by CARGO.
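For readers new to label propagation, the following sketch shows the plain, textbook form of the idea on a tiny weighted class-dependency graph. It is not CARGO's context-sensitive algorithm, and it does not use the Data Gravity Insights graph; the class names, edge weights, and the `label_propagation` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (class A, class B, dependency weight)


def label_propagation(edges: List[Edge], max_rounds: int = 20) -> List[Set[str]]:
    """Plain asynchronous label propagation on an undirected weighted graph.
    Each class starts with its own label and repeatedly adopts the label with
    the largest total edge weight among its neighbors (ties: smallest label)."""
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = adj[a].get(b, 0.0) + w
        adj[b][a] = adj[b].get(a, 0.0) + w

    labels = {node: node for node in adj}
    for _ in range(max_rounds):  # cap rounds in case labels oscillate
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps the run deterministic
            score: Dict[str, float] = defaultdict(float)
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical dependency graph: two tightly coupled groups joined by one weak edge.
edges = [
    ("OrderService", "OrderRepo", 5), ("OrderService", "OrderValidator", 3),
    ("OrderRepo", "OrderValidator", 2), ("UserService", "UserRepo", 4),
    ("UserService", "AuthHelper", 3), ("UserRepo", "AuthHelper", 2),
    ("OrderService", "UserService", 1),
]
partitions = label_propagation(edges)
```

On this toy graph the classes settle into an order-related group and a user-related group, which are disjoint as the challenge requires. Results of plain label propagation depend on visiting order and tie-breaking, which is one reason richer variants exist.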
A submission in this challenge should consist of the implementation of a partitioning algorithm that improves CARGO's partitioning, where improvement is measured using the metrics described in Reference [7]. These metrics measure the partitioning quality from different perspectives, such as database transactional purity, business context purity, cohesion, and coupling. Improvement over CARGO's partition recommendations may be achieved by means of (1) enriching the program dependence graph with additional types of information not currently used by CARGO and/or (2) implementing a different partitioning algorithm than the one used by CARGO. Thus, a solution may be developed by taking either or both of these approaches.
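As a rough intuition for what a coupling-style measure captures, the sketch below computes the fraction of total dependency weight that crosses partition boundaries. This is a simplified stand-in written for illustration; it is not one of the metric definitions from [7], and the graph, the partition assignments, and the `cross_partition_ratio` helper are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (class A, class B, dependency weight)


def cross_partition_ratio(edges: List[Edge], partition_of: Dict[str, str]) -> float:
    """Fraction of total edge weight that connects classes in different
    partitions. Lower values mean less coupling between partitions.
    Illustrative only; not a metric definition from [7]."""
    total = sum(w for _, _, w in edges)
    crossing = sum(w for a, b, w in edges if partition_of[a] != partition_of[b])
    return crossing / total if total else 0.0


# Hypothetical dependency graph with made-up weights.
edges = [
    ("OrderService", "OrderRepo", 5), ("OrderService", "OrderValidator", 3),
    ("OrderRepo", "OrderValidator", 2), ("UserService", "UserRepo", 4),
    ("UserService", "AuthHelper", 3), ("UserRepo", "AuthHelper", 2),
    ("OrderService", "UserService", 1),
]

# Candidate 1 keeps order-related and user-related classes apart.
clean = {
    "OrderService": "orders", "OrderRepo": "orders", "OrderValidator": "orders",
    "UserService": "users", "UserRepo": "users", "AuthHelper": "users",
}
# Candidate 2 misplaces OrderValidator in the users partition.
mixed = dict(clean, OrderValidator="users")

clean_ratio = cross_partition_ratio(edges, clean)
mixed_ratio = cross_partition_ratio(edges, mixed)
```

Here the first candidate cuts only the single weak edge between the two services, while the second also cuts both edges into `OrderValidator`, so its ratio is higher. The actual challenge scores use the multi-perspective metrics of [7], which consider more than edge weights.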
Evaluation Procedure and Dataset
The submitted solutions will be evaluated over the set of benchmark applications and the partitioning-quality metrics described in [7]. Because the evaluation involves multiple metrics, submissions will be scored and ranked on each metric. The baseline dataset, consisting of benchmark applications, inputs to CARGO, partition recommendations generated by CARGO, and the partitioning-quality scores on the metrics, will be provided.
Submission Materials
The submission material should consist of a link to a GitHub repository containing the source code of the partitioning tool, along with clear instructions for generating the tool's partitioning recommendations for the benchmark applications. Also provide documentation of the improvements implemented in the partitioning tool, and evidence of how it outperforms CARGO on partition recommendations. More details and precise submission instructions will be provided in early 2023.
Important Dates
The coding challenges will run in the first half of 2023. Watch this space for more information coming in early 2023.
Submission Site
A submission site will be created for the Minerva coding challenges. Watch this space for more information coming in early 2023.
References
[1] S. Wang et al. Machine/Deep Learning for Software Engineering: A Systematic Literature Review. IEEE Transactions on Software Engineering, 2022, doi: 10.1109/TSE.2022.3173346.
[2] S. Shafiq, A. Mashkoor, C. Mayr-Dorn, and A. Egyed. A Literature Review of Using Machine Learning in Software Development Life Cycle Stages. IEEE Access, vol. 9, pp. 140896-140920, 2021, doi: 10.1109/ACCESS.2021.3119746.
[3] T. Sharma, M. Kechagia, S. Georgiou, R. Tiwari, and F. Sarro. A Survey on Machine Learning Techniques for Source Code Analysis. CoRR abs/2110.09610 (2021), https://arxiv.org/abs/2110.09610
[4] T. H. M. Le, H. Chen, and M. A. Babar. Deep Learning for Source Code Modeling and Generation: Models, Applications, and Challenges. ACM Comput. Surv. 53, 3, Article 62 (May 2021), https://doi.org/10.1145/3383458
[5] E. Dehaerne, B. Dey, S. Halder, S. De Gendt, and W. Meert. Code Generation Using Machine Learning: A Systematic Review. IEEE Access, vol. 10, pp. 82434-82455, 2022, doi: 10.1109/ACCESS.2022.3196347.
[6] V. H. S. Durelli et al. Machine Learning Applied to Software Testing: A Systematic Mapping Study. IEEE Transactions on Reliability, vol. 68, no. 3, pp. 1189-1212, Sept. 2019, doi: 10.1109/TR.2019.2892517.
[7] V. Nitin, S. Asthana, B. Ray, and R. Krishna. CARGO: AI-Guided Dependency Analysis for Migrating Monolithic Applications to Microservices Architecture. Proceedings of ASE, 2022, https://arxiv.org/pdf/2207.11784.pdf