FABIO PALOMBA
Assistant Professor
.01

ABOUT

PERSONAL DETAILS
Via Giovanni Paolo II, 132, 84084 Fisciano

BIO

ABOUT ME

Fabio Palomba is an Assistant Professor at the Software Engineering (SeSa) Lab (within the Department of Computer Science) of the University of Salerno. He received the European PhD degree in Management & Information Technology in 2017. His PhD Thesis was the recipient of the 2017 IEEE Computer Society Best PhD Thesis Award.

His research interests include software maintenance and evolution, empirical software engineering, source code quality, and mining software repositories. He was the recipient of two ACM/SIGSOFT and one IEEE/TCSE Distinguished Paper Awards at the IEEE/ACM International Conference on Automated Software Engineering (ASE'13), the International Conference on Software Engineering (ICSE'15), and the IEEE International Conference on Software Maintenance and Evolution (ICSME'17), respectively, and Best Paper Awards at the ACM Computer Supported Cooperative Work (CSCW'18) and the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER'18). In 2019 he was the recipient of an SNSF Ambizione grant, one of the most prestigious individual research grants in Europe, and in 2023 he was awarded the prestigious IEEE Computer Society Technical Council of Software Engineering Rising Star Award, an early-career research recognition, for his outstanding contributions to the field of code refactoring and code smells.

He serves and has served as a program committee member of various international conferences (e.g., IEEE/ACM International Conference on Software Engineering, IEEE International Conference on Program Comprehension, IEEE International Conference on Software Maintenance and Evolution), and as referee for various international journals (e.g., IEEE Transactions on Software Engineering, Springer's Empirical Software Engineering Journal, Elsevier's Journal of Systems and Software) in the field of software engineering.

He is a member of the Steering Committee of ICPC (elected in 2021). He has been program co-chair of SANER 2024 and ICPC 2021, industrial track co-chair of SANER 2022, NIER/ERA track co-chair of ASE 2022, SCAM 2022, and MobileSoft 2022, and FOSS Award co-chair of MSR 2022, as well as program co-chair of MaLTeSQuE 2018 and 2019. In addition, he has been a member of the organizing committee of ICPC 2015 and SANER 2018. Since 2022, he has been an Editorial Board Member of Elsevier's Information and Software Technology Journal (IST). Since 2021, he has been an Editorial Board Member of Springer's Empirical Software Engineering Journal (EMSE), where he had been a Review Board Member since 2016, and of the e-Informatica Software Engineering Journal (EISEJ). Since 2020, he has been a Review Board Member of the IEEE Transactions on Software Engineering. From 2019 to 2022, he was an Editorial Board Member of ACM Transactions on Software Engineering and Methodology (TOSEM). Since 2019, he has been an Editorial Board Member of Elsevier's Journal of Systems and Software (JSS) and Elsevier's Science of Computer Programming (SCICO). For his reviewing activities, he was the recipient of 13 Distinguished/Outstanding Reviewer Awards.

FACTS

SOME NUMBERS ABOUT ME

920
CUPS OF COFFEE PER YEAR
100+
REVIEWS PER YEAR
2,965
HOURS OF MEETINGS
40+
JOURNAL PAPERS
60+
CONFERENCE PAPERS
50+
THESES (CO-)ADVISED

HOBBIES

... AND OTHER FACTS ABOUT ME

First and foremost, I am a social drinker. Wine, beer, spirits, everything is fine. But, if I can choose, please give me some Campari Spritz - preferably without sparkling water, just Campari and sparkling wine.

I like all TV series, especially those about political conspiracies (e.g., House of Cards) and supernatural powers (e.g., Fringe); so yes, I like dramas.

A long time ago [... in a galaxy far far away], I used to be a (kind of) professional soccer player. Unfortunately, I then started my Ph.D. and had to decide which career to pursue. Now, whenever possible, I still like to play with friends and colleagues.

Since I cannot play soccer so often, I found a less attractive but still useful e-alternative called PlayStation - FIFA is basically the only game I play. I am not as good and creative as I am in reality, but still...


.02

CAREER

  • ACADEMIC AND PROFESSIONAL POSITIONS
  • now
    2020

    ASSISTANT PROFESSOR

    Salerno

    UNIVERSITY OF SALERNO

    Software Engineering (SeSa) Lab.
  • 2019
    2018

    SENIOR RESEARCH ASSOCIATE

    Zurich

    UNIVERSITY OF ZURICH - Switzerland

    Zurich Empirical Software Engineering Team.
  • 2017

    POST-DOC RESEARCHER

    Delft

    DELFT UNIVERSITY OF TECHNOLOGY - Netherlands

    EINDHOVEN UNIVERSITY OF TECHNOLOGY - Netherlands

  • EDUCATION
  • 2017
    2014

    DEGREE OF EUROPEAN DOCTOR OF PHILOSOPHY (PH.D.) IN MANAGEMENT AND INFORMATION TECHNOLOGY

    Salerno

    UNIVERSITY OF SALERNO

    Funded by University of Salerno and University of Molise. Advisor: Prof. Andrea De Lucia
  • 2013
    2011

    MASTER’S DEGREE (M.SC.) IN COMPUTER SCIENCE

    Salerno

    UNIVERSITY OF SALERNO

    110/110 magna cum laude and special commendation by the commission. Advisor: Prof. Andrea De Lucia
  • 2011
    2008

    BACHELOR’S DEGREE (B.SC.) IN COMPUTER SCIENCE

    Isernia

    UNIVERSITY OF MOLISE

    110/110 cum laude.
    Advisor: Prof. Rocco Oliveto
  • QUALIFICATIONS AND LICENCES
  • 2020

    ITALIAN SCIENTIFIC QUALIFICATION AS FULL PROFESSOR

    SECTOR 01/B1 – INFORMATICA

    Evaluation available at ASN Site.
  • 2019

    ITALIAN SCIENTIFIC QUALIFICATION AS FULL PROFESSOR

    SECTOR 09/H1 – SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI

    Evaluation available at ASN Site.
  • 2019

    ITALIAN SCIENTIFIC QUALIFICATION AS ASSOCIATE PROFESSOR

    SECTOR 01/B1 – INFORMATICA

    Evaluation available at ASN Site.
  • 2019

    ITALIAN SCIENTIFIC QUALIFICATION AS ASSOCIATE PROFESSOR

    SECTOR 09/H1 – SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI

    Evaluation available at ASN Site.
  • 2014

    LICENCE OF COMPUTER ENGINEER

    Campobasso

    UNIVERSITY OF MOLISE

.03

PUBLICATIONS

PUBLICATIONS LIST
[J68] IST 2024

The Quantum Frontier of Software Engineering: A Systematic Mapping Study.*

Elsevier's Information and Software Technology (IST)

M. De Stefano, F. Pecorelli, D. Di Nucci, F. Palomba, A. De Lucia. Journal Software Quality Empirical Software Engineering

Abstract. Quantum computing is becoming a reality, and quantum software engineering (QSE) is emerging as a new discipline to enable developers to design and develop quantum programs. This paper presents a systematic mapping study of the current state of QSE research, aiming to identify the most investigated topics, the types and number of studies, the main reported results, and the most studied quantum computing tools/frameworks. Additionally, the study aims to explore the research community's interest in QSE, how it has evolved, and any prior contributions to the discipline before its formal introduction through the Talavera Manifesto. We searched for relevant articles in several databases and applied inclusion and exclusion criteria to select the most relevant studies. After evaluating the quality of the selected resources, we extracted relevant data from the primary studies and analyzed them. We found that QSE research has primarily focused on software testing, with little attention given to other topics, such as software engineering management. The most commonly studied technology for techniques and tools is Qiskit, although, in most studies, either multiple technologies or no specific technology were employed. The researchers most interested in QSE are interconnected through direct collaborations, and several strong collaboration clusters have been identified. Most articles in QSE have been published in non-thematic venues, with a preference for conferences. Conclusions. The study's implications are providing a centralized source of information for researchers and practitioners in the field, facilitating knowledge transfer, and contributing to the advancement and growth of QSE.

[J67] JSS 2024

Technical Debt in AI-Enabled Systems: On the Prevalence, Severity, Impact, and Management Strategies for Code and Architecture.*

Elsevier's Journal of Systems and Software (JSS)

G. Recupito, F. Pecorelli, G. Catolino, V. Lenarduzzi, D. Taibi, D. Di Nucci, F. Palomba. Journal Software Quality Empirical Software Engineering

Abstract. Artificial Intelligence (AI) is pervasive in several application domains and promises to be even more diffused in the next decades. Developing high-quality AI-enabled systems — software systems embedding one or multiple AI components, algorithms, and models — could introduce critical challenges for mitigating specific risks related to the systems' quality. Such development alone is insufficient to fully address socio-technical consequences and the need for rapid adaptation to evolutionary changes. Recent work proposed the concept of AI technical debt, a potential liability concerned with developing AI-enabled systems whose impact can affect the overall systems’ quality. While the problem of AI technical debt is rapidly gaining the attention of the software engineering research community, scientific knowledge that contributes to understanding and managing the matter is still limited. In this paper, we leverage the expertise of practitioners to offer useful insights to the research community, aiming to enhance researchers' awareness about the detection and mitigation of AI technical debt. Our ultimate goal is to empower practitioners by providing them with tools and methods. Additionally, our study sheds light on novel aspects that practitioners might not be fully acquainted with, contributing to a deeper understanding of the subject. Method: We develop a survey study featuring 53 AI practitioners, in which we collect information on the practical prevalence, severity, and impact of AI technical debt issues affecting the code and the architecture other than the strategies applied by practitioners to identify and mitigate them. 
The key findings of the study reveal the multiple impacts that AI technical debt issues may have on the quality of AI-enabled systems (e.g., the high negative impact that Undeclared Consumers has on security, whereas Jumbled Model Architecture can make the code hard to maintain) and the little support practitioners have to deal with them, which is limited to manual effort for identification and refactoring. We conclude the article by distilling lessons learned and actionable insights for researchers.

[J66] IST 2024

SENEM: A Software Engineering-Enabled Educational Metaverse.*

Elsevier's Information and Software Technology (IST)

V. Pentangelo, D. Di Dario, S. Lambiase, F. Ferrucci, C. Gravino, F. Palomba. Journal Software Quality Empirical Software Engineering

Abstract. The term metaverse refers to a persistent, virtual, three-dimensional environment where individuals may communicate, engage, and collaborate. One of the most multifaceted and challenging use cases of the metaverse is education, where educators and learners may require multiple technical, social, psychological, and interaction instruments to accomplish their learning objectives. While the characteristics of the metaverse might nicely fit the problem's needs, our research points out a noticeable lack of knowledge into (1) the specific requirements that an educational metaverse should actually fulfill to let educators and learners successfully interact toward their objectives and (2) how to design an appropriate educational metaverse for both educators and learners. In this paper, we aim to bridge this knowledge gap by proposing SENEM, a novel software engineering-enabled educational metaverse. We first elicit a set of functional requirements that an educational metaverse should fulfill. In this respect, we conduct a literature survey to extract the currently available knowledge on the matter discussed by the research community, and afterward, we assess and complement such knowledge through semi-structured interviews with educators and learners. Upon completing the requirements elicitation stage, we then build our prototype implementation of SENEM, a metaverse that makes available to educators and learners the features identified in the previous stage. Finally, we evaluate the tool in terms of learnability, efficiency, and satisfaction through a Rapid Iterative Testing and Evaluation research approach, leading us to the iterative refinement of our prototype. Through our survey strategy, we extracted nine requirements that guided the tool development that the study participants positively evaluated. Our study reveals that the target audience appreciates the elicited design strategy.
Our work has the potential to form a solid contribution that other researchers can use as a basis for further improvements.

[J65] IEEE Access 2024

FedCSD: A Federated Learning Based Approach for Code-Smell Detection.*

IEEE Access

S. Alawadi, K. Alkharabsheh, F. Alkhabbas, V. Kebande, F. Awaysheh, F. Palomba, M. Awad. Journal Software Quality Empirical Software Engineering

Abstract. Software quality is critical, as low quality or code smells increase technical debt and maintenance costs. There is a timely need for a collaborative model that detects and manages code smells by learning from diverse and distributed data sources while respecting privacy and providing a scalable solution for continuously integrating new patterns and practices in code quality management. However, the current literature is still missing such capabilities. This paper addresses the previous challenges by proposing a Federated Learning Code Smell Detection (FedCSD) approach, specifically targeting "God Class", to enable organizations to train distributed ML models while safeguarding data privacy collaboratively. We conduct experiments using manually validated datasets to detect and analyze code smell scenarios to validate our approach. Experiment 1, a centralized training experiment, revealed varying accuracies across datasets, with dataset two achieving the lowest accuracy (92.30%) and datasets one and three achieving the highest (98.90% and 99.5%, respectively). Experiment 2, focusing on cross-evaluation, showed a significant drop in accuracy (lowest: 63.80%) when fewer smells were present in the training dataset, reflecting technical debt. Experiment 3 involved splitting the dataset across 10 companies, resulting in a global model accuracy of 98.34%, comparable to the centralized model's highest accuracy. The application of federated ML techniques demonstrates promising performance improvements in code-smell detection, benefiting both software developers and researchers.

[J64] TOSEM 2024

Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be?*

ACM Transactions on Software Engineering and Methodology (TOSEM)

E. Iannone, G. Sellitto, E. Iaccarino, F. Ferrucci, A. De Lucia, F. Palomba. Journal Empirical Software Engineering

Abstract. With the rate of discovered and disclosed vulnerabilities escalating, researchers have been experimenting with machine learning to predict whether a vulnerability will be exploited. Existing solutions leverage information unavailable when a CVE is created, making them unsuitable just after the disclosure. This paper experiments with early exploitability prediction models driven exclusively by the initial CVE record, i.e., the original description and the linked online discussions. Leveraging NVD and Exploit Database, we evaluate 72 prediction models trained using six traditional machine learning classifiers, four feature representation schemas, and three data balancing algorithms. We also experiment with five pre-trained large language models (LLMs). The models leverage seven different corpora made by combining three data sources, i.e., CVE description, Security Focus, and BugTraq. The models are evaluated in a realistic, time-aware fashion by removing the training and test instances that cannot be labeled "neutral" with sufficient confidence. The validation reveals that CVE descriptions and Security Focus discussions are the best data to train on. Pre-trained LLMs do not show the expected performance, requiring further pre-training in the security domain. We distill new research directions, identify possible room for improvement, and envision automated systems assisting security experts in assessing the exploitability.

[J63] EMSE 2024

An Empirical Study Into the Effects of Transpilation on Quantum Circuit Smells.*

Springer's Journal of Empirical Software Engineering (EMSE)

M. De Stefano, D. Di Nucci, F. Palomba, A. De Lucia. Journal Software Quality Empirical Software Engineering

Abstract. Quantum computing is a promising field that can solve complex problems beyond traditional computers' capabilities. Developing high-quality quantum software applications, called quantum software engineering, has recently gained attention. However, quantum software development faces challenges related to code quality. A recent study found that many open-source quantum programs are affected by quantum-specific code smells, with long circuit being the most common. While the study provided relevant insights into the prevalence of code smells in quantum circuits, it did not explore the potential effect of transpilation, a necessary step for executing quantum computer programs, on the emergence of code smells. Indeed, transpilation might alter those characteristics employed to detect the presence of a smell on a circuit. To address this limitation, we present a new study investigating the impact of transpilation on quantum-specific code smells and how different target gate sets affect the results. We conducted experiments on 17 open-source quantum programs alongside a set of 100 synthetic circuits. We found that transpilation can significantly alter the metrics that are used to detect code smells, even into previously smell-free circuits, with the long circuit smell being the most susceptible to transpilation. Furthermore, the choice of the gate set significantly influences the presence and severity of code smells in transpiled circuits, highlighting the need for careful gate set selection to mitigate their impact. These findings have implications for circuit optimization and high-quality quantum software development. Further research is needed to understand the consequences of code smells and their potential impact on quantum computations, considering the characteristics and constraints of different gate sets and hardware platforms.

[J62] EMSE 2024

Toward Granular Search-Based Automatic Unit Test Case Generation.*

Springer's Journal of Empirical Software Engineering (EMSE)

F. Pecorelli, G. Grano, F. Palomba, H. Gall, A. De Lucia. Journal Software Testing Empirical Software Engineering

Abstract. Unit testing verifies the presence of faults in individual software components. Previous research has been targeting the automatic generation of unit tests through the adoption of random or search-based algorithms. Despite their effectiveness, these approaches aim at creating tests by solely optimizing metrics like code coverage, without ensuring that the resulting tests have granularities that would allow them to verify both the behavior of individual production methods and the interaction between methods of the class under test. To address this limitation, we propose a two-step systematic approach to the generation of unit tests: we first force search-based algorithms to create tests that cover individual methods of the production code, hence implementing the so-called intra-method tests; then, we relax the constraints to enable the creation of intra-class tests that target the interactions among production code methods. The assessment of our approach is conducted through a mixed-method research design that combines statistical analyses with a user study. The key results report that our approach is able to keep the same level of code and mutation coverage while providing test suites that are more structured, more understandable and aligned to the design principles of unit testing.

[J61] IST 2023

Test Code Flakiness in Mobile Apps: The Developer's Perspective.*

Elsevier's Information and Software Technology (IST)

V. Pontillo, F. Palomba, F. Ferrucci. Journal Software Testing Empirical Software Engineering

Abstract. Test flakiness arises when test cases have a non-deterministic, intermittent behavior that leads them to either pass or fail when run against the same code. While researchers have been contributing to the detection, classification, and removal of flaky tests with several empirical studies and automated techniques, little is known about how the problem of test flakiness arises in mobile applications. We point out a lack of knowledge on: (1) The prominence and harmfulness of the problem; (2) The most frequent root causes inducing flakiness; and (3) The strategies applied by practitioners to deal with it in practice. An improved understanding of these matters may lead the software engineering research community to assess the need for tailoring existing instruments to the mobile context or for brand-new approaches that focus on the peculiarities identified. We address this gap of knowledge by means of an empirical study into the mobile developer's perception of test flakiness. We first perform a systematic grey literature review to elicit how developers discuss and deal with the problem of test flakiness in the wild. Then, we complement the systematic review through a survey study that involves 130 mobile developers and that aims at analyzing their experience on the matter. The results of the grey literature review indicate that developers are often concerned with flakiness connected to user interface elements. In addition, our survey study reveals that flaky tests are perceived as critical by mobile developers, who pointed out major production code- and source code design-related root causes of flakiness, other than the long-term effects of recurrent flaky tests. Furthermore, our study lets the diagnosing and fixing processes currently adopted by developers and their limitations emerge. We conclude by distilling lessons learned, implications, and future research directions.

[J60] EMSE 2023

Machine Learning-Based Test Smell Detection.*

Springer's Journal of Empirical Software Engineering (EMSE)

V. Pontillo, D. Amoroso D'Aragona, F. Pecorelli, D. Di Nucci, F. Ferrucci, F. Palomba. Journal Software Testing Empirical Software Engineering

Abstract. Test smells are symptoms of sub-optimal design choices adopted when developing test cases. Previous studies have proved their harmfulness for test code maintainability and effectiveness. Therefore, researchers have been proposing automated, heuristic-based techniques to detect them. However, the performance of these detectors is still limited and dependent on tunable thresholds. We design and experiment with a novel test smell detection approach based on machine learning to detect four test smells. First, we develop the largest dataset of manually-validated test smells to enable experimentation. Afterward, we train six machine learners and assess their capabilities in within- and cross-project scenarios. Finally, we compare the ML-based approach with state-of-the-art heuristic-based techniques. The key findings of the study report a negative result. The performance of the machine learning-based detector is significantly better than that of heuristic-based techniques, but none of the learners was able to exceed an average F-Measure of 51%. We further elaborate and discuss the reasons behind this negative result through a qualitative investigation into the current issues and challenges that prevent the appropriate detection of test smells, which allowed us to catalog the next steps that the research community may pursue to improve test smell detection techniques.

[J59] EMSE 2023

On the Adoption and Effects of Source Code Reuse on Defect Proneness and Maintenance Effort.*

Springer's Journal of Empirical Software Engineering (EMSE)

Software reusability mechanisms, like inheritance and delegation in Object-Oriented programming, are widely recognized as key instruments of software design that reduce the risk of source code being affected by defects, as well as the effort required to maintain and evolve it. Previous work has traditionally employed source code reuse metrics for prediction purposes, e.g., in the context of defect prediction. However, our research identifies two noticeable limitations of the current literature. First, little is still known about the extent to which developers actually employ code reuse mechanisms over time. Second, it is still unclear how these mechanisms may contribute to explaining defect-proneness and maintenance effort during software evolution.  Download PDF

Journal Empirical Software Engineering G. Giordano, G. Festa, G. Catolino, F. Palomba, F. Ferrucci, C. Gravino.

On the Adoption and Effects of Source Code Reuse on Defect Proneness and Maintenance Effort.*

G. Giordano, G. Festa, G. Catolino, F. Palomba, F. Ferrucci, C. Gravino. Journal Empirical Software Engineering

Abstract. Software reusability mechanisms, like inheritance and delegation in Object-Oriented programming, are widely recognized as key instruments of software design that reduce the risk of source code being affected by defects, as well as the effort required to maintain and evolve it. Previous work has traditionally employed source code reuse metrics for prediction purposes, e.g., in the context of defect prediction. However, our research identifies two noticeable limitations of the current literature. First, little is still known about the extent to which developers actually employ code reuse mechanisms over time. Second, it is still unclear how these mechanisms may contribute to explaining defect-proneness and maintenance effort during software evolution. We aim at bridging this gap of knowledge, as an improved understanding of these aspects might provide insights into the actual support provided by these mechanisms, e.g., by suggesting whether and how to use them for prediction purposes. We propose an exploratory study, conducted on 12 Java projects---over 44,900 commits---of the Defects4J dataset, aiming at (1) assessing how developers use inheritance and delegation during software evolution; and (2) statistically analyzing the impact of inheritance and delegation on fault proneness and maintenance effort. Our results reveal various usage patterns that describe the way inheritance and delegation vary over time. In addition, we find that inheritance and delegation are statistically significant factors that influence both source code defect-proneness and maintenance effort.

[J58] JSS 2023

An Empirical Investigation into the Influence of Software Communities' Cultural and Geographical Dispersion on Productivity.*

Elsevier's Journal of Systems and Software (JSS)

Estimating and understanding software development productivity represent crucial tasks for researchers and practitioners. Although different works focused on evaluating the impact of human factors on productivity, few have explored the influence of cultural/geographical diversity in software development communities. More particularly, all previous work addresses cultural aspects as abstract concepts without providing a quantitative representation. Improved knowledge of these matters might help project managers to assemble more productive teams and tool vendors to design software analytics toolkits that may better estimate productivity. This paper has the goal of enlarging the existing body of knowledge on the factors affecting productivity by focusing on cultural and geographical dispersion of a development community---namely, how diverse a community is in terms of cultural attitudes and geographical collocation of the members who belong to it.  Download PDF

Journal Empirical Software Engineering S. Lambiase, G. Catolino, F. Pecorelli, D. Tamburri, F. Palomba, W.J. van den Heuvel, F. Ferrucci.

An Empirical Investigation into the Influence of Software Communities' Cultural and Geographical Dispersion on Productivity.*

S. Lambiase, G. Catolino, F. Pecorelli, D. Tamburri, F. Palomba, W.J. van den Heuvel, F. Ferrucci. Journal Empirical Software Engineering

Abstract. Estimating and understanding software development productivity represent crucial tasks for researchers and practitioners. Although different works focused on evaluating the impact of human factors on productivity, few have explored the influence of cultural/geographical diversity in software development communities. More particularly, all previous work addresses cultural aspects as abstract concepts without providing a quantitative representation. Improved knowledge of these matters might help project managers to assemble more productive teams and tool vendors to design software analytics toolkits that may better estimate productivity. This paper has the goal of enlarging the existing body of knowledge on the factors affecting productivity by focusing on cultural and geographical dispersion of a development community---namely, how diverse a community is in terms of cultural attitudes and geographical collocation of the members who belong to it. To reach this goal, we performed a mixed-method empirical study. First, we built a statistical model relating dispersion metrics with the productivity of 25 open-source communities on GitHub. Then, we performed a confirmatory survey with 140 practitioners. The key results of our study indicate that cultural and geographical dispersion considerably impact productivity, thus encouraging managers and practitioners to consider such aspects during all the phases of the software development lifecycle. We conclude our paper by elaborating on the main insights from our analyses and distilling implications that may drive further research.

[J57] EMSE 2023

Fairness-Aware Machine Learning Engineering: How Far Are We?*

Springer's Journal of Empirical Software Engineering (EMSE)

Machine learning is part of the daily life of people and companies worldwide. Unfortunately, bias in machine learning algorithms risks unfairly influencing the decision-making process and reiterating possible discrimination. While the interest of the software engineering community in software fairness is rapidly increasing, there is still a lack of understanding of various aspects connected to fair machine learning engineering, i.e., the software engineering process involved in developing fairness-critical machine learning systems. Questions connected to the practitioners’ awareness and maturity about fairness, the skills required to deal with the matter, and the best development phase(s) where fairness should be faced more are just some examples of the knowledge gaps currently open.  Download PDF

Journal Empirical Software Engineering C. Ferrara, G. Sellitto, F. Ferrucci, F. Palomba, A. De Lucia.

Fairness-Aware Machine Learning Engineering: How Far Are We?*

C. Ferrara, G. Sellitto, F. Ferrucci, F. Palomba, A. De Lucia. Journal Empirical Software Engineering

Abstract. Machine learning is part of the daily life of people and companies worldwide. Unfortunately, bias in machine learning algorithms risks unfairly influencing the decision-making process and reiterating possible discrimination. While the interest of the software engineering community in software fairness is rapidly increasing, there is still a lack of understanding of various aspects connected to fair machine learning engineering, i.e., the software engineering process involved in developing fairness-critical machine learning systems. Questions connected to the practitioners’ awareness and maturity about fairness, the skills required to deal with the matter, and the best development phase(s) where fairness should be faced more are just some examples of the knowledge gaps currently open. In this paper, we provide insights into how fairness is perceived and managed in practice, to shed light on the instruments and approaches that practitioners might employ to properly handle fairness. We conducted a survey with 117 professionals who shared their knowledge and experience highlighting the relevance of fairness in practice, and the skills and tools required to handle it. The key results of our study show that fairness is still considered a second-class quality aspect in the development of artificial intelligence systems. The building of specific methods and development environments, as well as automated validation tools, might help developers to treat fairness throughout the software lifecycle and reverse this trend.

[J56] CSUR 2023

A Systematic Literature Review on Code Smells Datasets and Validation Mechanisms.*

ACM Computing Surveys (CSUR)

The accuracy reported for code smell detection tools varies depending on the dataset used to evaluate the tools. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells highly depends on relevant properties such as the size, severity level, project types, number of each type of smell, number of smells, and the ratio of smelly to non-smelly samples in the dataset. Most existing datasets support God Class, Long Method, and Feature Envy while six smells in Fowler and Beck's catalog are not supported by any datasets. We conclude that existing datasets suffer from imbalanced samples, lack of supporting severity level, and restriction to Java language.  Download PDF

Journal Empirical Software Engineering Systematic Literature Review M. Zakeri-Nasrabadi, S. Parsa, E. Esmaili, F. Palomba.

A Systematic Literature Review on Code Smells Datasets and Validation Mechanisms.*

M. Zakeri-Nasrabadi, S. Parsa, E. Esmaili, F. Palomba. Journal Empirical Software Engineering Systematic Literature Review

Abstract. The accuracy reported for code smell detection tools varies depending on the dataset used to evaluate the tools. Our survey of 45 existing datasets reveals that the adequacy of a dataset for detecting smells highly depends on relevant properties such as the size, severity level, project types, number of each type of smell, number of smells, and the ratio of smelly to non-smelly samples in the dataset. Most existing datasets support God Class, Long Method, and Feature Envy while six smells in Fowler and Beck's catalog are not supported by any datasets. We conclude that existing datasets suffer from imbalanced samples, lack of supporting severity level, and restriction to Java language.

[J55] SoftwareX 2023

QuantuMoonLight: A Low-Code Platform to Experiment with Quantum Machine Learning.*

Elsevier's SoftwareX

Nowadays, machine learning is being used to address multiple problems in various research fields, with software engineering researchers being among the most active users of machine learning mechanisms. Recent advances revolve around the use of quantum machine learning, which promises to revolutionize program computation and boost software systems' problem-solving capabilities. However, using quantum computing technologies is not trivial and requires interdisciplinary skills and expertise.  Download PDF

Journal Empirical Software Engineering F. Amato, [other authors] , F. Palomba.

QuantuMoonLight: A Low-Code Platform to Experiment with Quantum Machine Learning.*

F. Amato, M. Cicalese, L. Contrasto, G. Cubicciotti, G. D'Ambola, A. La Marca, G. Pagano, F. Tomeo, G. Robertazzi, G. Vassallo, G. Acampora, A. Vitiello, G. Catolino, G. Giordano, S. Lambiase, V. Pontillo, G. Sellitto, F. Ferrucci, F. Palomba. Journal Empirical Software Engineering

Abstract. Nowadays, machine learning is being used to address multiple problems in various research fields, with software engineering researchers being among the most active users of machine learning mechanisms. Recent advances revolve around the use of quantum machine learning, which promises to revolutionize program computation and boost software systems' problem-solving capabilities. However, using quantum computing technologies is not trivial and requires interdisciplinary skills and expertise. For this reason, we propose QuantuMoonLight, a community-based low-code platform that allows researchers and practitioners to configure and experiment with quantum machine learning pipelines, compare them with classic machine learning algorithms, and share lessons learned and experience reports. We showcase the architecture and main features of QuantuMoonLight and discuss its envisioned impact on research and practice.

[J54] JSS 2023

The Anatomy of a Vulnerability Database: A Systematic Mapping Study.*

Elsevier's Journal of Systems and Software (JSS)

Software vulnerabilities play a major role, as there are multiple risks associated, including loss and manipulation of private data. The software engineering research community has been contributing to the body of knowledge by proposing several empirical studies on vulnerabilities and automated techniques to detect and remove them from source code. The reliability and generalizability of the findings heavily depend on the quality of the information mineable from publicly available datasets of vulnerabilities as well as on the availability and suitability of those databases.  Download PDF

Journal Empirical Software Engineering Systematic Literature Review X. Li, S. Moreschini, Z. Zhang, F. Palomba, D. Taibi.

The Anatomy of a Vulnerability Database: A Systematic Mapping Study.*

X. Li, S. Moreschini, Z. Zhang, F. Palomba, D. Taibi. Journal Empirical Software Engineering Systematic Literature Review

Abstract. Software vulnerabilities play a major role, as there are multiple risks associated, including loss and manipulation of private data. The software engineering research community has been contributing to the body of knowledge by proposing several empirical studies on vulnerabilities and automated techniques to detect and remove them from source code. The reliability and generalizability of the findings heavily depend on the quality of the information mineable from publicly available datasets of vulnerabilities as well as on the availability and suitability of those databases. In this paper, we seek to understand the anatomy of the currently available vulnerability databases through a systematic mapping study where we analyze (1) what are the popular vulnerability databases adopted; (2) what are the goals for adoption; (3) what are the other sources of information adopted; (4) what are the methods and techniques; (5) which tools are proposed. An improved understanding of these aspects might not only allow researchers to take informed decisions on the databases to consider when doing research but also practitioners to establish reliable sources of information to inform their security policies and standards.

[J53] EMSE 2023

Rubbing Salt in The Wound? A Large-Scale Investigation into The Effects of Refactoring on Security.*

Springer's Journal of Empirical Software Engineering (EMSE)

Software refactoring is a behavior-preserving activity to improve the source code quality without changing its external behavior. Unfortunately, it is often a manual and error-prone task that may induce regressions in the source code. Researchers have provided initial compelling evidence of the relation between refactoring and defects, yet little is known about how much it may impact software security. This paper bridges this knowledge gap by presenting a large-scale empirical investigation into the effects of refactoring on the security profile of applications.  Download PDF

Journal Software Quality Empirical Software Engineering E. Iannone, Z. Codabux, V. Lenarduzzi, A. De Lucia, F. Palomba.

Rubbing Salt in The Wound? A Large-Scale Investigation into The Effects of Refactoring on Security.*

E. Iannone, Z. Codabux, V. Lenarduzzi, A. De Lucia, F. Palomba. Journal Software Quality Empirical Software Engineering

Abstract. Software refactoring is a behavior-preserving activity to improve the source code quality without changing its external behavior. Unfortunately, it is often a manual and error-prone task that may induce regressions in the source code. Researchers have provided initial compelling evidence of the relation between refactoring and defects, yet little is known about how much it may impact software security. This paper bridges this knowledge gap by presenting a large-scale empirical investigation into the effects of refactoring on the security profile of applications. We conduct a three-level mining software repository study to establish the impact of 14 refactoring types on (i) security-related metrics, (ii) security technical debt, and (iii) the introduction of known vulnerabilities. The study covers 39 projects and a total amount of 7,708 refactoring commits. The key results show that refactoring has a limited connection to security. However, Inline Method and Extract Interface statistically contribute to improving some security aspects connected to encapsulating security-critical code components. Extract Superclass and Pull Up Attribute refactoring are commonly found in commits violating specific security best practices for writing secure code. Finally, Extract Superclass and Extract and Move Method refactoring tend to occur more often in commits contributing to the introduction of vulnerabilities. We conclude by distilling lessons learned and recommendations for researchers and practitioners.

[J52] JSEP 2022

"Through the looking-glass..." An Empirical Study on Blob Infrastructure Blueprints in TOSCA.*

Wiley's Journal of Software: Evolution and Process (JSEP)

Infrastructure-as-Code (IaC) helps keep up with the demand for fast, reliable, high-quality services by provisioning and managing infrastructures through configuration files. Those files ensure efficient and repeatable routines for system provisioning, but they might be affected by code smells that negatively affect quality and code maintenance. Research has broadly studied code smells for traditional source code development; however, none explored them in the "Topology and Orchestration Specification for Cloud Applications" (TOSCA), the technology-agnostic OASIS standard for IaC. In this paper, we investigate a prominent traditional implementation code smell potentially applicable to TOSCA: Large Class, or "Blob Blueprint" in IaC terms.  Download PDF

Journal Software Quality Empirical Software Engineering S. Dalla Palma, C. van Asseldonk, G. Catolino, D. Di Nucci, F. Palomba, D. Tamburri.

"Through the looking-glass..." An Empirical Study on Blob Infrastructure Blueprints in TOSCA.*

S. Dalla Palma, C. van Asseldonk, G. Catolino, D. Di Nucci, F. Palomba, D. Tamburri. Journal Software Quality Empirical Software Engineering

Abstract. Infrastructure-as-Code (IaC) helps keep up with the demand for fast, reliable, high-quality services by provisioning and managing infrastructures through configuration files. Those files ensure efficient and repeatable routines for system provisioning, but they might be affected by code smells that negatively affect quality and code maintenance. Research has broadly studied code smells for traditional source code development; however, none explored them in the "Topology and Orchestration Specification for Cloud Applications" (TOSCA), the technology-agnostic OASIS standard for IaC. In this paper, we investigate a prominent traditional implementation code smell potentially applicable to TOSCA: Large Class, or "Blob Blueprint" in IaC terms. We compare metrics-based and unsupervised learning-based detectors on a large dataset of manually validated observations related to Blob Blueprints. We provide insights on code metrics that corroborate previous findings and empirically show that metrics-based detectors perform highly in detecting Blob Blueprints. We deem our results put forward a new research path toward dealing with this problem, e.g., in the scope of fully automated service pipelines.

[J51] JSS 2022

A Critical Comparison on Six Static Analysis Tools: Detection, Agreement, and Precision.*

Elsevier's Journal of Systems and Software (JSS)

Developers use Static Analysis Tools (SATs) to control for potential quality issues in source code, including defects and technical debt. Tool vendors have devised quite a number of tools, which makes it harder for practitioners to select the most suitable one for their needs. To better support developers, researchers have been conducting several studies on SATs to favor the understanding of their actual capabilities. Despite the work done so far, there is still a lack of knowledge regarding (1) what is their agreement, and (2) what is the precision of their recommendations. We aim at bridging this gap by proposing a large-scale comparison of six popular SATs for Java projects: Better Code Hub, CheckStyle, Coverity Scan, FindBugs, PMD, and SonarQube.  Download PDF

Journal Software Quality Empirical Software Engineering V. Lenarduzzi, F. Pecorelli, N. Saarimaki, S. Lujan, F. Palomba.

A Critical Comparison on Six Static Analysis Tools: Detection, Agreement, and Precision.*

V. Lenarduzzi, F. Pecorelli, N. Saarimaki, S. Lujan, F. Palomba. Journal Software Quality Empirical Software Engineering

Abstract. Developers use Static Analysis Tools (SATs) to control for potential quality issues in source code, including defects and technical debt. Tool vendors have devised quite a number of tools, which makes it harder for practitioners to select the most suitable one for their needs. To better support developers, researchers have been conducting several studies on SATs to favor the understanding of their actual capabilities. Despite the work done so far, there is still a lack of knowledge regarding (1) what is their agreement, and (2) what is the precision of their recommendations. We aim at bridging this gap by proposing a large-scale comparison of six popular SATs for Java projects: Better Code Hub, CheckStyle, Coverity Scan, FindBugs, PMD, and SonarQube. We analyze 47 Java projects applying 6 SATs. To assess their agreement, we compared them by manually analyzing---at line- and class-level---whether they identify the same issues. Finally, we evaluate the precision of the tools against a manually-defined ground truth. The key results show little to no agreement among the tools and a low degree of precision. Our study provides the first overview on the agreement among different tools as well as an extensive analysis of their precision that can be used by researchers, practitioners, and tool vendors to map the current capabilities of the tools and envision possible improvements.

[J50] EMSE 2022

Static Test Flakiness Prediction: How Far Can We Go?*

Springer's Journal of Empirical Software Engineering (EMSE)

Test flakiness is a phenomenon occurring when a test case is non-deterministic and exhibits both a passing and failing behavior when run against the same code. The problem has been closely investigated by researchers and practitioners, who all have shown its relevance in practice. The software engineering research community has been working toward defining approaches for detecting and addressing test flakiness. Despite being quite accurate, most of these approaches rely on expensive dynamic steps, e.g., the computation of code coverage information. Consequently, they might suffer from scalability issues that possibly preclude their practical use. This limitation has been recently targeted through machine learning solutions that could predict the flakiness of tests using various features, like source code vocabulary or a mixture of static and dynamic metrics computed on individual snapshots of the system.  Download PDF

Journal Software Testing Empirical Software Engineering V. Pontillo, F. Palomba, F. Ferrucci.

Static Test Flakiness Prediction: How Far Can We Go?*

V. Pontillo, F. Palomba, F. Ferrucci. Journal Software Testing Empirical Software Engineering

Abstract. Test flakiness is a phenomenon occurring when a test case is non-deterministic and exhibits both a passing and failing behavior when run against the same code. The problem has been closely investigated by researchers and practitioners, who all have shown its relevance in practice. The software engineering research community has been working toward defining approaches for detecting and addressing test flakiness. Despite being quite accurate, most of these approaches rely on expensive dynamic steps, e.g., the computation of code coverage information. Consequently, they might suffer from scalability issues that possibly preclude their practical use. This limitation has been recently targeted through machine learning solutions that could predict the flakiness of tests using various features, like source code vocabulary or a mixture of static and dynamic metrics computed on individual snapshots of the system. In this paper, we aim to perform a step forward and predict test flakiness only using static metrics. We propose a large-scale experiment on 70 Java projects coming from the iDFlakies and FlakeFlagger datasets. First, we statistically assess the differences between flaky and non-flaky tests in terms of 25 test and production code metrics and smells, analyzing both their individual and combined effects. Based on the results achieved, we experiment with a machine learning approach that predicts test flakiness solely based on static features, comparing it with two state-of-the-art approaches. The key results of the study show that the static approach has performance comparable to those of the baselines. In addition, we found that the characteristics of the production code might impact the performance of the flaky test prediction models.

[J49] JSS 2022

On the Use of Artificial Intelligence to Deal with Privacy in IoT Systems: A Systematic Literature Review.*

Elsevier's Journal of Systems and Software (JSS)

The Internet of Things (IoT) refers to a network of Internet-enabled devices that can make different operations, like sensing, communicating, and reacting to changes arising in the surrounding environment. Nowadays, the number of IoT devices is already higher than the world population. These devices operate by exchanging data between them, sometimes through an intermediate cloud infrastructure, and may be used to enable a wide variety of novel services that can potentially improve the quality of life of billions of people. Nonetheless, all that glitters is not gold: the increasing adoption of IoT comes with several privacy concerns due to the lack or loss of control over the sensitive data exchanged by these devices.  Download PDF

Journal Empirical Software Engineering Systematic Literature Review G. Giordano, F. Palomba, F. Ferrucci.

On the Use of Artificial Intelligence to Deal with Privacy in IoT Systems: A Systematic Literature Review.*

G. Giordano, F. Palomba, F. Ferrucci. Journal Empirical Software Engineering Systematic Literature Review

Abstract. The Internet of Things (IoT) refers to a network of Internet-enabled devices that can make different operations, like sensing, communicating, and reacting to changes arising in the surrounding environment. Nowadays, the number of IoT devices is already higher than the world population. These devices operate by exchanging data between them, sometimes through an intermediate cloud infrastructure, and may be used to enable a wide variety of novel services that can potentially improve the quality of life of billions of people. Nonetheless, all that glitters is not gold: the increasing adoption of IoT comes with several privacy concerns due to the lack or loss of control over the sensitive data exchanged by these devices. This represents a key challenge for software engineering researchers attempting to address those privacy concerns by proposing (semi-)automated solutions to identify sources of privacy leaks. In this respect, a notable trend is represented by the adoption of smart solutions, that is, the definition of techniques based on artificial intelligence (AI) algorithms. This paper proposes a systematic literature review of the research in smart detection of privacy concerns in IoT devices. Following well-established guidelines, we identify 152 primary studies that we analyze under three main perspectives: (1) What are the privacy concerns addressed with AI-enabled techniques; (2) What are the algorithms employed and how they have been configured/validated; and (3) Which are the domains targeted by these techniques. The key results of the study identified six main tasks targeted through the use of artificial intelligence, like Malware Detection or Network Analysis. Support Vector Machine is the technique most frequently used in literature; however, in many cases researchers do not explicitly indicate the domain where to use artificial intelligence algorithms. We conclude the paper by distilling several lessons learned and implications for software engineering researchers.

[J48] EMSE 2022

FindICI: Using Machine-Learning to Detect Linguistic Inconsistencies between Code and Natural Language Descriptions in Infrastructure-as-Code.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Empirical Software Engineering N. Borovits, I. Kumara, D. Di Nucci, P. Krishnan, S. Dalla Palma, F. Palomba, D. Tamburri, W.J. van den Heuvel.

Abstract. Linguistic anti-patterns are recurring poor practices concerning inconsistencies in the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in Infrastructure-as-Code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their short text names. To this end, we propose FindICI, a novel automated approach that employs word embedding and classification algorithms. We build and use the abstract syntax tree of IaC code units to create code embeddings used by machine learning techniques to detect inconsistent IaC code units. We evaluated our approach with two experiments on Ansible tasks systematically extracted from open source repositories for various word embedding models and classification algorithms. Classical machine learning models and novel deep learning models with different word embedding methods showed comparable and satisfactory results in detecting inconsistent Ansible tasks related to the top-10 used Ansible modules.

[J47] EMSE 2022

The Making of Accessible Android Applications: An Empirical Study on the State of the Practice.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Empirical Software Engineering Computer-Human Interaction M. Di Gregorio, D. Di Nucci, F. Palomba, G. Vitiello.

Abstract. Nowadays, mobile applications represent the principal means to enable human interaction. Being so pervasive, these applications should be made usable for all users: accessibility collects the guidelines that developers should follow to include features allowing users with disabilities (e.g., visual impairments) to better interact with an application. While research in this field is gaining interest, there is still a notable lack of knowledge on how developers practically deal with the problem: (i) whether they are aware and take accessibility guidelines into account when developing apps, (ii) which guidelines are harder for them to implement, and (iii) which tools they use to be supported in this task. To bridge the gap of knowledge on the state of the practice concerning the accessibility of mobile applications, we adopt a mixed-method research approach with a twofold goal. We aim to (i) verify how accessibility guidelines are implemented in mobile applications through a coding strategy and (ii) survey mobile developers on the issues and challenges of dealing with accessibility in practice. The key results of the study show that most accessibility guidelines are ignored when developing mobile apps. This behavior is mainly due to the lack of developers’ awareness of accessibility concerns and the lack of tools to support them during the development.

[J46] CCIS 2022

Unsupervised Labor Intelligence Systems: A Detection Approach and Its Evaluation.*

Springer's Communications in Computer and Information Science (CCIS)

Journal Computer-Human Interaction A. Andreou, G. Cascavilla, G. Catolino, F. Palomba, D. Tamburri, W.J. Van Den Heuvel.

Abstract. In recent years, job advertisements through the web or social media represent an easy way to spread this information. However, social media are often a dangerous showcase of possible labor exploitation advertisements. This paper aims to determine the potential indicators of labor exploitation for unskilled jobs offered in the Netherlands. Specifically, we exploited topic modeling to extract and handle information from textual data about job advertisements for analyzing deceptive and characterizing features. Finally, we use these features to investigate whether automated machine learning methods can predict the risk of labor exploitation by looking at salary discrepancies. The results suggest that some features need to be carefully monitored, e.g., hours, link. Overall, our results are encouraging (F1-score of 61%), meaning that data science methods and AI approaches can be used to detect labor exploitation, starting from job advertisements, based on the discrepancy of delta salary, possibly representing a revolutionary step.

[J45] JSS 2022

Software Engineering for Quantum Programming: How Far Are We?*

Elsevier's Journal of Systems and Software (JSS)

Journal Software Quality Empirical Software Engineering M. De Stefano, F. Pecorelli, D. Di Nucci, F. Palomba, A. De Lucia.

Abstract. Quantum computing is no longer only a scientific interest but is rapidly becoming an industrially available technology that can potentially overcome the limits of classical computation. Over the last years, all major companies have provided frameworks and programming languages that allow developers to create their quantum applications. This shift has led to the definition of a new discipline called quantum software engineering, which is demanded to define novel methods for engineering large-scale quantum applications. While the research community is successfully embracing this call, we notice a lack of systematic investigations into the state of the practice of quantum programming. Understanding the challenges that quantum developers face is vital to precisely define the aims of quantum software engineering. Hence, in this paper, we first mine all the GitHub repositories that make use of the most used quantum programming frameworks currently on the market and then conduct coding analysis sessions to produce a taxonomy of the purposes for which quantum technologies are used. Second, we conduct a survey study that involves the contributors of the considered repositories and that aims at eliciting the developers’ opinions on the current adoption and challenges of quantum programming. On the one hand, the results achieved highlight that the current adoption of quantum programming is still limited. On the other hand, there are many challenges that the software engineering community should carefully consider: these do not strictly pertain to technical concerns but also socio-technical matters.

[J44] EMSE 2022

Handling Uncertainty in SBSE: A Possibilistic Evolutionary Approach for Code Smells Detection.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Software Quality Empirical Software Engineering S. Boutaib, M. Elarbi, S. Bechikh, F. Palomba, L. Ben Said.

Abstract. Code smells, also known as anti-patterns, are poor design or implementation choices that hinder program comprehensibility and maintainability. While several code smell detection methods have been proposed, Mantyla et al. identified the uncertainty issue as one of the major individual human factors that may affect developers' decisions about the smelliness of software classes: they may indeed have different opinions mainly due to their different knowledge and expertise. Unfortunately, almost all the existing approaches assume data perfection and neglect the uncertainty when identifying the labels of the software classes. Ignoring or rejecting any uncertainty form could lead to a considerable loss of information, which could significantly deteriorate the effectiveness of the detection and identification processes. Inspired by our previous works and motivated by the interesting performance of the PDT (Possibilistic Decision Tree) in classifying uncertain data, we propose ADIPE (Anti-pattern Detection and Identification using Possibilistic decision tree Evolution), a new tool that evolves and optimizes a set of detectors (PDTs) that could effectively deal with software class label uncertainty using some concepts from Possibility theory. ADIPE uses a PBE (Possibilistic Base of Examples: a dataset with possibilistic labels) that is built using a set of opinion-based classifiers (i.e., a set of probabilistic classifiers) with the aim of simulating human developers’ uncertainty. A set of advisors and probabilistic classifiers are employed in order to mimic the subjectivity and the doubtfulness of software engineers. A detailed experimental study is conducted to show the merits and outperformance of ADIPE in dealing with uncertainty in code smells detection and identification with respect to four relevant state-of-the-art methods, including the baseline PDT.
The experimental study was performed in uncertain and certain environments based on two suitable metrics: PF-measure_dist (Possibilistic F-measure_Distance) and IAC (Information Affinity Criterion), which correspond to the F-measure and Accuracy (PCC) for the certain case. The obtained results for the uncertain environment reveal that for the detection process, the PF-measure_dist of ADIPE ranges within [0.9047, 0.9285], and its IAC lies within [0.9288, 0.9557]; while for the identification process, the PF-measure_dist of ADIPE is in [0.8545, 0.9228], and its IAC lies within [0.8751, 0.933]. ADIPE is able to find 35% more code smells with uncertain data than the second best algorithm (i.e., BLOP). In addition, ADIPE succeeds in decreasing the number of false alarms (i.e., misclassified smelly instances) at a rate of 12%. Our proposed approach is also able to identify 43% more smell types than BLOP and decreases the number of false alarms at a rate of 32%. Similar results were obtained for the certain environment, which demonstrates the ability of ADIPE to also deal with the certain environment.

[J43] JSS 2022

Just-in-Time Software Vulnerability Detection: Are We There Yet?*

Elsevier's Journal of Systems and Software (JSS)

Journal Software Quality Empirical Software Engineering F. Lomio, E. Iannone, A. De Lucia, F. Palomba, V. Lenarduzzi.

Abstract. Background. Software vulnerabilities are weaknesses in source code that might be exploited to cause harm or loss. Previous work has proposed a number of automated machine learning approaches to detect them. Most of these techniques work at release-level, meaning that they aim at predicting the files that will potentially be vulnerable in a future release. Yet, researchers have shown that a commit-level identification of source code issues might better fit the developer’s needs, speeding up their resolution. Objective. To investigate how currently available machine learning-based vulnerability detection mechanisms can support developers in the detection of vulnerabilities at commit-level. Method. We perform an empirical study where we consider nine projects accounting for 8,991 commits and experiment with eight machine learners built using process, product, and textual metrics. Results. We point out three main findings: (1) basic machine learners rarely perform well; (2) the use of ensemble machine learning algorithms based on boosting can substantially improve the performance; and (3) the combination of more metrics does not necessarily improve the classification capabilities. Conclusion. Further research should focus on just-in-time vulnerability detection, especially with respect to the introduction of smart approaches for feature selection and training strategies.

[J42] EMSE 2022

On the Adequacy of Static Analysis Warnings with Respect to Code Smell Prediction.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Software Quality Empirical Software Engineering F. Pecorelli, S. Lujan, V. Lenarduzzi, F. Palomba, A. De Lucia.

Abstract. Code smells are poor implementation choices that developers apply while evolving source code and that affect program maintainability. Multiple automated code smell detectors have been proposed: while most of them relied on heuristics applied over software metrics, a recent trend concerns the definition of machine learning techniques. However, machine learning-based code smell detectors still suffer from low accuracy: one of the causes is the lack of adequate features to feed machine learners. In this paper, we face this issue by investigating the role of static analysis warnings generated by three state-of-the-art tools to be used as features of machine learning models for the detection of seven code smell types. We conduct a three-step study in which we (1) verify the relation between static analysis warnings and code smells and the potential predictive power of these warnings; (2) build code smell prediction models exploiting and combining the most relevant features coming from the first analysis; (3) compare and combine the performance of the best code smell prediction model with the one achieved by a state-of-the-art approach. The results reveal the low performance of the models exploiting static analysis warnings alone, while we observe significant improvements when combining the warnings with additional code metrics. Nonetheless, we still find that the best model does not perform better than a random model, hence leaving open the challenges related to the definition of ad-hoc features for code smell prediction.

[J41] TSE 2022

The Secret Life of Software Vulnerabilities: A Large-Scale Empirical Study.*

IEEE Transactions on Software Engineering (TSE)

Journal Software Quality Empirical Software Engineering E. Iannone, R. Guadagni, F. Ferrucci, A. De Lucia, F. Palomba.

Abstract. Software vulnerabilities are weaknesses in source code that can be potentially exploited to cause loss or harm. While researchers have been devising a number of methods to deal with vulnerabilities, there is still a noticeable lack of knowledge on their software engineering life cycle, for example how vulnerabilities are introduced and removed by developers. This information can be exploited to design more effective methods for vulnerability prevention and detection, as well as to understand the granularity at which these methods should aim. To investigate the life cycle of known software vulnerabilities, we focus on how, when, and under which circumstances the contributions to the introduction of vulnerabilities in software projects are made, as well as how long they survive and how they are removed. We consider 3,663 vulnerabilities with public patches from the National Vulnerability Database (pertaining to 1,096 open-source software projects on GitHub) and define an eight-step process involving both automated parts (e.g., using a procedure based on the SZZ algorithm to find the vulnerability-contributing commits) and manual analyses (e.g., how vulnerabilities were fixed). The investigated vulnerabilities can be classified into 144 categories, take on average at least 4 contributing commits before being introduced, and half of them remain unfixed for more than one year. Most of the contributions are done by developers with high workload, often when doing maintenance activities, and removed mostly with the addition of new source code aiming at implementing further checks on inputs. We conclude by distilling practical implications on how vulnerability detectors should work to assist developers in timely identifying these issues.

[J40] JSEP 2021

Evolving Software Forges: an Experience Report from Apache Allura.*

Wiley's Journal of Software: Evolution and Process (JSEP)

Journal Socio-Technical Analytics Empirical Software Engineering D. Tamburri, F. Palomba.

Abstract. The open-source phenomenon has reached unimaginable proportions to a point in which it is virtually impossible to find large applications that do not rely on open-source as well. However, such proportions may turn into a risk if the organisational and socio-technical aspects (e.g., the contribution and release schemes) behind open-source communities are not explicitly supported by open-source forges by-design. In an effort to make such aspects explicit and supported by-design in open-source forges, we conducted empirical software engineering as follows: (a) through online industrial surveying, we elicited organisational and social aspects relevant in open-source communities; (b) through action research, we extended a widely known open-source support system and top-level Apache project Allura; (c) through ethnography, we studied the Allura community and, learning from its social and organisational structure, (d) we elicited a metrics framework that supports more explicit organisational and socio-technical design principles around open-source communities. This article is an experience report on these results and the lessons we learned in obtaining them. We found that the extensions provided to Apache Allura formed the basis for community awareness by design, providing valuable and usable community characteristics. Ultimately, however, the extensions we provided to Apache Allura were de-activated by its core developers because of performance overheads. Our results and lessons learned allow us to provide recommendations for designing forges, like GitHub. Architecting a forge is a participatory process that requires active engagement, hence remarking the need for mechanisms enabling it. At the same time, we conclude that a more active support for the governance is required to avoid the failure of the forge.

[J39] EMSE 2021

Software Testing and Android Applications: A Large-Scale Empirical Study.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Software Quality Empirical Software Engineering F. Pecorelli, G. Catolino, F. Ferrucci, A. De Lucia, F. Palomba.

Abstract. These days, over three billion users rely on mobile applications (a.k.a. apps) on a daily basis to access high-speed connectivity and all kinds of services it enables, from social to emergency needs. Having high-quality apps is therefore a vital requirement for developers to keep staying on the market and acquire new users. For this reason, the research community has been devising automated strategies to better test these applications. Despite the effort spent so far, most developers write their test cases manually without the adoption of any tool. Nevertheless, we still observe a lack of knowledge on the quality of these manually written tests: an enhanced understanding of this aspect may provide evidence-based findings on the current status of testing in the wild and point out future research directions to better support the daily activities of mobile developers. We perform a large-scale empirical study targeting 1,693 open-source Android apps and aiming at assessing (1) the extent to which these apps are actually tested, (2) how well-designed are the available tests, (3) what is their effectiveness, and (4) how well manual tests can reduce the risk of having defects in production code. In addition, we conduct a focus group with 5 Android testing experts to discuss the findings achieved and gather insights into the next research avenues to undertake. The key results of our study show Android apps are poorly tested and the available tests have low (i) design quality, (ii) effectiveness, and (iii) ability to find defects in production code. Among the various suggestions, testing experts report the need for improved mechanisms to locate potential defects and deal with the complexity of creating tests that effectively exercise the production code.

[J38] IST 2021

On the Impact of Continuous Integration on Refactoring Practice: An Exploratory Study on TravisTorrent.*

Elsevier's Information and Software Technology (IST)

Journal Software Quality Empirical Software Engineering I. Saidani, A. Ouni, M. Mkaouer, F. Palomba.

Abstract. The ultimate goal of Continuous Integration (CI) is to support developers in integrating changes into production constantly and quickly through automated build process. While CI provides developers with prompt feedback on several quality dimensions after each change, such frequent and quick changes may in turn compromise software quality without refactoring. Indeed, recent work emphasized the potential of CI in changing the way developers perceive and apply refactoring. However, we still lack empirical evidence to confirm or refute this assumption. We aim to explore and understand the evolution of refactoring practices, in terms of frequency, size and involved developers, after the switch to CI in order to emphasize the role of this process in changing the way refactoring is applied. We collect a corpus of 99,545 commits and 89,926 refactoring operations extracted from 39 open-source GitHub projects that adopt Travis CI and analyze the changes using Multiple Regression Analysis (MRA). Our study delivers several important findings. We found that the adoption of CI is associated with a drop in the refactoring size as recommended, while refactoring frequency as well as the number (and its related rate) of developers that perform refactoring are estimated to decrease after the shift to CI, indicating that refactoring is less likely to be applied in CI context. Our study uncovers insights about CI theory and practice and adds evidence to existing knowledge about CI practices related especially to quality assurance. Software developers need more customized refactoring tool support in the context of CI to better maintain and evolve their software systems.

[J37] IST 2021

The Do's and Don'ts of Infrastructure Code: a Systematic Grey Literature Review.*

Elsevier's Information and Software Technology (IST)

Journal Empirical Software Engineering I. Kumara, M. Garriga, A. Romeu, D. Di Nucci, F. Palomba, D. Tamburri, W. J. van den Heuvel.

Abstract. Infrastructure-as-code (IaC) is the DevOps tactic of managing and provisioning software infrastructures through machine-readable definition files, rather than manual hardware configuration or interactive configuration tools. From a maintenance and evolution perspective, the topic has piqued the interest of practitioners and academics alike, given the relative scarcity of supporting patterns and practices in the academic literature. At the same time, a considerable amount of grey literature exists on IaC. Thus we aim to characterize IaC and compile a catalog of best and bad practices for widely used IaC languages, all using grey literature materials. In this paper, we systematically analyze the industrial grey literature on IaC, such as blog posts, tutorials, and white papers, using qualitative analysis techniques. We proposed a definition for IaC and distilled a broad catalog summarized in a taxonomy consisting of 10 and 4 primary categories for best practices and bad practices, respectively, both language-agnostic and language-specific ones, for three IaC languages, namely Ansible, Puppet, and Chef. The practices reflect implementation issues, design issues, and the violation of/adherence to the essential principles of IaC. Our findings reveal critical insights concerning the top languages as well as the best practices adopted by practitioners to address (some of) those challenges. We evidence that the field of IaC development and maintenance is in its infancy and deserves further attention.

[J36] TSE 2021

Within-project Defect Prediction of Infrastructure-as-Code Using Product and Process Metrics.*

IEEE Transactions on Software Engineering (TSE)

Journal Empirical Software Engineering S. Dalla Palma, D. Di Nucci, F. Palomba, D. Tamburri

Abstract. Infrastructure-as-code (IaC) is the DevOps practice enabling management and provisioning of infrastructure through the definition of machine-readable files, hereinafter referred to as IaC scripts. Similarly to other source code artefacts, these files may contain defects that can preclude their correct functioning. In this paper, we aim at assessing the role of product and process metrics when predicting defective IaC scripts. We propose a fully integrated machine-learning framework for IaC Defect Prediction that allows for repository crawling, metrics collection, model building, and evaluation. To evaluate it, we analyzed 104 projects and employed five machine-learning classifiers to compare their performance in flagging suspicious defective IaC scripts. The key results of the study report Random Forest as the best-performing model, with a median AUC-PR of 0.93 and MCC of 0.80. Furthermore, at least for the collected projects, product metrics identify defective IaC scripts more accurately than process metrics. Our findings set a baseline for investigating IaC Defect Prediction and the relationship between the product and process metrics and the quality of IaC scripts.

[J35] EMSE 2020

The Relation of Test-Related Factors to Software Quality: A Case Study on Apache Systems.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Testing Empirical Software Engineering F. Pecorelli, F. Palomba, A. De Lucia

Abstract. Testing represents a crucial activity to ensure software quality. Recent studies have shown that test-related factors (e.g., code coverage) can be reliable predictors of software code quality, as measured by post-release defects. While these studies provided initial compelling evidence on the relation between tests and post-release defects, they considered different test-related factors separately: as a consequence, there is still a lack of knowledge of whether these factors remain good predictors when considered all together. In this paper, we propose a comprehensive case study on how test-related factors relate to production code quality in Apache systems. We first investigated how the presence of tests relates to post-release defects; then, we analyzed the role played by the test-related factors previously shown as significantly related to post-release defects. The key findings of the study show that, when controlling for other metrics (e.g., size of the production class), test-related factors have a limited connection to post-release defects.

[J34] JSS 2020

Predicting the Emergence of Community Smells using Socio-Technical Metrics: A Machine-Learning Approach.*

Elsevier's Journal of Systems and Software (JSS)

Journal Socio-Technical Analytics Empirical Software Engineering F. Palomba, D. Tamburri

Abstract. Community smells represent sub-optimal conditions appearing within software development communities (e.g., non-communicating sub-teams, deviant contributors, etc.) that may lead to the emergence of social debt and increase the overall project's cost. Previous work has studied these smells under different perspectives, investigating their nature, diffuseness, and impact on technical aspects of source code. Furthermore, it has been shown that some socio-technical metrics like, for instance, the well-known socio-technical congruence, can potentially be employed to foresee their appearance. Yet, there is still a lack of knowledge of the actual predictive power of such socio-technical metrics. In this paper, we aim at tackling this problem by empirically investigating (i) the potential value of socio-technical metrics as predictors of community smells and (ii) the performance of within- and cross-project community smell prediction models based on socio-technical metrics. To this aim, we exploit a dataset composed of 60 open-source projects and consider four community smells, namely Organizational Silo, Black Cloud, Lone Wolf, and Bottleneck. The key results of our work report that a within-project solution can reach F-Measure and AUC-ROC of 77% and 78%, respectively, while cross-project models still require improvements, being however able to reach an F-Measure of 62% and overcome a random baseline. Among the metrics investigated, socio-technical congruence, communicability, and turnover-related metrics are the most powerful predictors of the emergence of community smells.

[J33] ESWA 2020

Code Smell Detection and Identification in Imbalanced Environments.*

Elsevier's Expert Systems with Applications (ESWA)

Journal Software Quality Empirical Software Engineering Sofien Boutaib, Slim Bechikh, F. Palomba, Maha Elarbi, Lamjed Ben Said

Abstract. Code smells are sub-optimal design choices that could lower software maintainability. Previous literature did not consider an important characteristic of the smell detection problem, namely data imbalance. When considering a high number of code smell types, the number of smelly classes is likely to largely exceed the number of non-smelly ones, and vice versa. Moreover, most studies did address the smell identification problem, which is more likely to present a higher imbalance as the number of smelly classes is relatively much less than the number of non-smelly ones. Furthermore, an additional research gap in the literature consists in the fact that the number of smell type identification methods is very small compared to the detection ones. The main challenges in smell detection and identification in an imbalanced environment are: (1) the structuring of the smell detector that should be able to deal with complex splitting boundaries and small disjuncts, (2) the design of the detector quality evaluation function that should take into account data imbalance, and (3) the efficient search for effective software metrics' thresholds that should well characterize the different smells. We propose ADIODE, an effective search-based engine that is able to deal with all the above-described challenges not only for the smell detection case but also for the identification one. Indeed, ADIODE is an EA (Evolutionary Algorithm) that evolves a population of detectors encoded as ODTs (Oblique Decision Trees) using the F-measure as a fitness function. This allows ADIODE to efficiently approximate globally-optimal detectors with effective oblique splitting hyper-planes and metrics’ thresholds. A comparative experimental study on six open-source software systems demonstrates the merits and the outperformance of our approach compared to four of the most representative and prominent baseline techniques available in literature. The detection results show that the F-measure of ADIODE ranges between 91.23% and 95.24%, and its AUC lies between 0.9273 and 0.9573. Similarly, the identification results indicate that the F-measure of ADIODE varies between 86.26% and 94.5%, and its AUC is between 0.8653 and 0.9531.

[J32] JSS 2020

Towards a Catalogue of Software Quality Metrics for Infrastructure Code.*

Elsevier's Journal of Systems and Software (JSS)

Journal Empirical Software Engineering S. Dalla Palma, D. Di Nucci, F. Palomba, D. Tamburri.

Abstract. Infrastructure-as-code (IaC) is a practice to implement continuous deployment by allowing management and provisioning of infrastructure through the definition of machine-readable files and automation around them, rather than physical hardware configuration or interactive configuration tools. On the one hand, although IaC is an increasingly widely adopted practice, still little is known concerning how to best maintain, speedily evolve, and continuously improve the code behind the IaC practice in a measurable fashion. On the other hand, source code measurements are often computed and analyzed to evaluate the different quality aspects of the software developed. However, unlike general-purpose programming languages (GPLs), IaC scripts use domain-specific languages, and metrics used for GPLs may not be applicable to IaC scripts. This article proposes a catalogue consisting of 46 metrics to identify IaC properties focusing on Ansible, one of the most popular IaC languages to date, and shows how they can be used to analyze IaC scripts.

[J31] SPE 2020

"The Canary in the Coal Mine..." A Cautionary Tale from the Decline of SourceForge.*

Wiley's Software: Practice and Experience (SPE)

Journal Socio-Technical Analytics Empirical Software Engineering D. Tamburri, K. Blincoe, F. Palomba, R. Kazman.

Abstract. Forges are online collaborative platforms to support the development of distributed open-source software. While once mighty keepers of open-source vitality, software forges are rapidly becoming less and less relevant. For example, of the top 10 forges in 2011, only one survives today, SourceForge, the biggest of them all, but its numbers are dropping and its community is tenuous at best. Through mixed-methods research, this manuscript chronicles and analyzes the software practice and experiences of the project's history, in particular its architectural and community/organizational decisions. We discovered a number of sub-optimal social and architectural decisions and circumstances that may have led to SourceForge's demise. In addition, we found evidence suggesting that the impact of such decisions could have been monitored, reduced, and possibly avoided altogether. The use of socio-technical insights needs to become a basic set of design and software/organization monitoring principles that tell a cautionary tale on what to measure and what not to do in the context of large-scale software forge and community design and management.

[J30] TEM 2020

Success and Failure in Software Engineering: A Followup Systematic Literature Review.*

IEEE Transactions on Engineering Management (TEM)

Journal Socio-Technical Analytics Systematic Literature Review D. Tamburri, F. Palomba, R. Kazman.

Abstract. Success and failure in software engineering are still among the least understood phenomena in the discipline. In a recent special journal issue on the topic, Mantyla et al. started discussing these topics from different angles; the authors focused their contributions on offering a general overview of both topics without deeper detail. Recognising the importance and impact of the topic, we have executed a followup, more in-depth systematic literature review with additional analyses beyond what was previously provided. These new analyses offer: (a) a grounded theory of success and failure factors, harvesting 500+ factors from the literature; (b) 14 manually-validated clusters of factors that provide relevant areas for success- and failure-specific measurement and risk-analysis; (c) a quality model composed of previously unmeasured organizational structure quantities which are germane to software product, process, and community quality. We show that the topics of success and failure deserve further study as well as further automated tool support, e.g., monitoring tools and metrics able to track the factors and patterns emerging from our study. This paper provides managers with risks as well as a more fine-grained analysis of the parameters that can be appraised to anticipate the risks.

[J29] JSS 2019

On the Performance of Method-Level Defect Prediction: A Negative Result.*

Elsevier's Journal of Systems and Software (JSS)

Journal Software Quality Empirical Software Engineering L. Pascarella, F. Palomba, A. Bacchelli.

Abstract. Bug prediction is aimed at identifying software artifacts that are more likely to be defective in the future. Most approaches defined so far target the prediction of bugs at class/file level. Nevertheless, past research has provided evidence that this granularity is too coarse-grained for its use in practice. As a consequence, researchers have started proposing defect prediction models targeting a finer granularity (particularly method-level granularity), providing promising evidence that it is possible to operate at this level. Particularly, models mixing product and process metrics provided the best results. We present a study in which we first replicate previous research on method-level bug-prediction, by using different systems and timespans. Afterwards, based on the limitations of existing research, we (1) re-evaluate method-level bug prediction models more realistically and (2) analyze whether alternative features based on textual aspects, code smells, and developer-related factors can be exploited to improve method-level bug prediction abilities. Key results of our study include that (1) the performance of the previously proposed models, tested using the same strategy but on different systems/timespans, is confirmed; but, (2) when evaluated with a more practical strategy, all the models show a dramatic drop in performance, with results close to that of a random classifier. Finally, we find that (3) the contribution of alternative features within such models is limited and unable to improve the prediction capabilities significantly. As a consequence, our replication and negative results indicate that method-level bug prediction is still an open challenge.

[J28] IEEE SW 2019

Gender Diversity and Community Smells: Insights from the Trenches.*

IEEE Software

Journal Socio-Technical Analytics Empirical Software Engineering G. Catolino, F. Palomba, D. A. Tamburri, A. Serebrenik, F. Ferrucci.

Abstract. Effective communication and organization within a software development team might influence the quality of both the software development process and the software created. It is estimated that the consequences of poor communication in terms of cost reached $37 billion for companies. This motivated research aimed at understanding so-called "social debt", meant as the presence of non-cohesive development communities whose members have communication or coordination issues, and at identifying community smells, namely socio-technical characteristics and patterns, which may lead to the emergence of social and technical debt. In this study, we triangulate the results previously obtained surveying 60 software practitioners to understand the dimensions and presumed importance of gender diversity, but also whether there are additional factors to consider to reduce community smells. As a result, we found that practitioners seem not to perceive the phenomenon of gender diversity as an important factor to mitigate the presence of community smells. Nevertheless, practitioners who consider this as an important factor tried to strongly motivate their considerations. Finally, as the main takeaway message from the survey, we found that most of the participants suggest taking into account communication skills when hiring and managing teams.

[J27] EMSE 2019

How Developers Engage with Static Analysis Tools in Different Contexts.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Software Quality Empirical Software Engineering C. Vassallo, S. Panichella, F. Palomba, S. Proksch, H. Gall, A. Zaidman.

Abstract. Automatic static analysis tools (ASATs) are instruments that support code quality assessment by automatically detecting defects and design issues. Despite their popularity, they are characterized by (i) a high false positive rate and (ii) the low comprehensibility of the generated warnings. However, no prior studies have investigated the usage of ASATs in different development contexts (e.g., code reviews, regular development), nor how open source projects integrate ASATs into their workflows. These perspectives are paramount to improve the prioritization of the identified warnings. To shed light on the actual ASAT usage practices, in this paper we first survey 56 developers (66% from industry and 34% from open source projects) and interview 11 industrial experts leveraging ASATs in their workflow with the aim of understanding how they use ASATs in different contexts. Furthermore, to investigate how ASATs are being used in the workflows of open source projects, we manually inspect the contribution guidelines of 176 open-source systems and extract the ASATs’ configuration and build files from their corresponding GitHub repositories. Our study highlights that (i) 71% of developers do pay attention to different warning categories depending on the development context; (ii) 63% of our respondents rely on specific factors (e.g., team policies and composition) when prioritizing warnings to fix during their programming; and (iii) 66% of the projects define how to use specific ASATs, but only 37% enforce their usage for new contributions. The perceived relevance of ASATs varies between different projects and domains, which is a sign that ASAT use is still not a common practice. In conclusion, this study confirms previous findings on the unwillingness of developers to configure ASATs and it emphasizes the necessity to improve existing strategies for the selection and prioritization of ASAT warnings that are shown to developers.

[J26] JSS 2019

Scented Since the Beginning: On the Diffuseness of Test Smells in Automatically Generated Test Code.*

Elsevier's Journal of Systems and Software (JSS)

Journal Software Testing Empirical Software Engineering G. Grano, F. Palomba, D. Di Nucci, A. De Lucia, H. Gall.

Abstract. Software testing represents a key software engineering practice to ensure source code quality and reliability. To support developers in this activity and reduce testing effort, several automated unit test generation tools have been proposed. Most of these approaches have the main goal of covering as many branches as possible. While these approaches have good performance, little is still known on the maintainability of the test code they produce, i.e., whether the generated tests have a good code quality and if they do not possibly introduce issues threatening their effectiveness. To bridge this gap, in this paper we study to what extent existing automated test case generation tools produce potentially problematic test code. We consider seven test smells, i.e., suboptimal design choices applied by programmers during the development of test cases, as a measure of the code quality of the generated tests, and evaluate their diffuseness in the unit test classes automatically generated by three state-of-the-art tools, namely Randoop, JTExpert, and Evosuite. Moreover, we investigate whether there are characteristics of test and production code influencing the generation of smelly tests. Our study shows that all the considered tools tend to generate a high quantity of two specific test smell types, i.e., Assertion Roulette and Eager Test, which are those that previous studies showed to negatively impact the reliability of production code. We also discover that test size is correlated with the generation of smelly tests. Based on our findings, we argue that more effective automated generation algorithms that explicitly take into account test code quality should be further investigated and devised.

[J25] EMSE 2019

Third-Party Libraries in Mobile Apps: When, How, and Why Developers Update Them.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Mobile Apps Evolution Empirical Software Engineering P. Salza, F. Palomba, D. Di Nucci, A. De Lucia, F. Ferrucci.

Abstract. When developing new software, third-party libraries are commonly used to reduce implementation efforts. However, even these libraries undergo evolution activities to offer new functionalities and fix bugs or security issues. The research community has mainly investigated third-party libraries in the context of desktop applications, while only little is known regarding the mobile context. In this paper, we bridge this gap by investigating when, how, and why mobile developers update third-party libraries. By mining 2752 mobile apps, we study (i) whether mobile developers update third-party libraries, (ii) how much such apps lag behind the latest version of their dependencies, (iii) which are the categories of libraries that are more prone to be updated, and (iv) what are the common patterns followed by developers when updating a library. Then, we perform a survey with 73 mobile developers that aims at shedding light on the reasons why they update (or not) third-party libraries. We find that mobile developers rarely update libraries, and when they do, they mainly tend to update libraries related to the Graphical User Interface. Avoiding bug propagation and making the app compatible with new Android releases are the top reasons why developers update their libraries.

[J24] EMSE 2019

Improving Change Prediction Models with Code Smells-Related Information.*

Springer's Journal of Empirical Software Engineering (EMSE)

Journal Software Quality Empirical Software Engineering G. Catolino, F. Palomba, F. Arcelli Fontana, A. De Lucia, A. Zaidman, F. Ferrucci.

Abstract. Code smells are sub-optimal implementation choices applied by developers that have the effect of negatively impacting, among others, the change-proneness of the affected classes. Based on this consideration, in this paper we conjecture that code smell-related information can be effectively exploited to improve the performance of change prediction models, i.e., models having the goal of indicating which classes are more likely to change in the future. We exploit the so-called intensity index, a previously defined metric that captures the severity of a code smell, and evaluate its contribution when added as an additional feature in the context of three state-of-the-art change prediction models based on product, process, and developer-based features. We also compare the performance achieved by the proposed model with a model based on previously defined antipattern metrics, a set of indicators computed considering the history of code smells in files. Our results report that (i) the prediction performance of the intensity-including models is statistically better than the baselines and (ii) the intensity is a better predictor than antipattern metrics. We observed some orthogonality between the set of change-prone and non-change-prone classes correctly classified by the models relying on intensity and antipattern metrics: for this reason, we also devise and evaluate a smell-aware combined change prediction model including product, process, developer-based, and smell-related features. We show that the F-Measure of this model is notably higher than other models.

[J23] SCP 2019

A Large-Scale Empirical Exploration on Refactoring Activities in Open Source Software Projects.*

Elsevier's Science of Computer Programming (SCP)

Journal Software Quality Empirical Software Engineering C. Vassallo, G. Grano, F. Palomba, H. Gall, A. Bacchelli.

Abstract. Refactoring is a well-established practice that aims at improving the internal structure of a software system without changing its external behavior. Existing literature provides evidence of how and why developers perform refactoring in practice. In this paper, we continue on this line of research by performing a large-scale empirical analysis of refactoring practices in 200 open source systems. Specifically, we analyze the change history of these systems at commit level to investigate: (i) whether developers perform refactoring operations and, if so, which are more diffused, (ii) when refactoring operations are applied, and (iii) which are the main developer-oriented factors leading to refactoring. Based on our results, future research can focus on enabling automatic support for less frequent refactorings and on recommending refactorings based on the developer’s workload, project’s maturity and developer’s commitment to the project.

@article{vassallo2019large,
  title={A large-scale empirical exploration on refactoring activities in open source software projects},
  author={Vassallo, Carmine and Grano, Giovanni and Palomba, Fabio and Gall, Harald C and Bacchelli, Alberto},
  journal={Science of Computer Programming},
  volume={180},
  pages={1--15},
  year={2019},
  publisher={Elsevier}
}
[J22] JSS 2019 Recommended

Not All Bugs Are the Same: Understanding, Characterizing, and Classifying Bug Types.*

Elsevier's Journal of Systems and Software (JSS)

Modern version control systems, e.g., GitHub, include bug tracking mechanisms that developers can use to highlight the presence of bugs. This is done by means of bug reports, i.e., textual descriptions reporting the problem and the steps that led to a failure.

Journal Software Quality Empirical Software Engineering G. Catolino, F. Palomba, A. Zaidman, F. Ferrucci.

Abstract. Modern version control systems, e.g., GitHub, include bug tracking mechanisms that developers can use to highlight the presence of bugs. This is done by means of bug reports, i.e., textual descriptions reporting the problem and the steps that led to a failure. In past and recent years, the research community deeply investigated methods for easing bug triage, that is, the process of assigning the fixing of a reported bug to the most qualified developer. Nevertheless, only a few studies have reported on how to support developers in the process of understanding the type of a reported bug, which is the first and most time-consuming step to perform before assigning a bug-fix operation. In this paper, we target this problem in two ways: first, we analyze 1,280 bug reports of 119 popular projects belonging to three ecosystems, namely Mozilla, Apache, and Eclipse, with the aim of building a taxonomy of the types of reported bugs; then, we devise and evaluate an automated classification model able to classify reported bugs according to the defined taxonomy. As a result, we found nine main common bug types over the considered systems. Moreover, our model achieves high F-Measure and AUC-ROC (64% and 74% overall, respectively).

BibTeX
@article{catolino2019not,
  title={Not all bugs are the same: Understanding, characterizing, and classifying bug types},
  author={Catolino, Gemma and Palomba, Fabio and Zaidman, Andy and Ferrucci, Filomena},
  journal={Journal of Systems and Software},
  volume={152},
  pages={165--181},
  year={2019},
  publisher={Elsevier}
}
[J21] TSE 2019 Recommended

Lightweight Assessment of Test-Case Effectiveness using Source-Code-Quality Indicators.*

IEEE Transactions on Software Engineering (TSE)

Test cases are crucial to help developers prevent the introduction of software faults. Unfortunately, not all the tests are properly designed or can effectively capture faults in production code.

Journal Software Testing Empirical Software Engineering G. Grano, F. Palomba, H. Gall.

Abstract. Test cases are crucial to help developers prevent the introduction of software faults. Unfortunately, not all the tests are properly designed or can effectively capture faults in production code. Some measures have been defined to assess test-case effectiveness: the most relevant one is the mutation score, which highlights the quality of a test by generating the so-called mutants, i.e., variations of the production code that make it faulty and that the test is supposed to identify. However, previous studies revealed that mutation analysis is extremely costly and hard to use in practice. The approaches proposed by researchers so far have not been able to provide practical gains in terms of mutation testing efficiency. This leaves the problem of efficiently assessing test-case effectiveness still open. In this paper, we investigate a novel, orthogonal, and lightweight methodology to assess test-case effectiveness: in particular, we study the feasibility of exploiting production and test-code-quality indicators to estimate the mutation score of a test case. We first select a set of 67 factors and study their relation with test-case effectiveness. Then, we devise a mutation score prediction model exploiting such factors and investigate its performance as well as its most relevant features. The key results of the study reveal that our prediction model, based only on static features, achieves 86% for both F-Measure and AUC-ROC. This means that we can estimate the test-case effectiveness, using source-code-quality indicators, with high accuracy and without executing the tests. As a consequence, we can provide a practical approach that is beyond the typical limitations of current mutation testing techniques.
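
As an illustration of the metric named in the abstract (and not of the paper's prediction model), the mutation score can be sketched as the fraction of mutants on which at least one test fails. All names below (`is_killed`, `mutation_score`, and the toy mutants of an `add` function) are hypothetical and chosen for this example only.

```python
from typing import Callable, Sequence

# A "program" here is any two-argument integer function;
# a test takes a program and returns True when it passes.
Program = Callable[[int, int], int]
Test = Callable[[Program], bool]


def is_killed(mutant: Program, tests: Sequence[Test]) -> bool:
    """A mutant is killed when at least one test fails on it."""
    return any(not test(mutant) for test in tests)


def mutation_score(mutants: Sequence[Program], tests: Sequence[Test]) -> float:
    """Fraction of mutants killed by the test suite (0.0 to 1.0)."""
    if not mutants:
        raise ValueError("at least one mutant is required")
    killed = sum(is_killed(m, tests) for m in mutants)
    return killed / len(mutants)


# Hypothetical production code: add(a, b) = a + b.
# Each mutant is a slightly faulty variation of it.
mutants: list[Program] = [
    lambda a, b: a - b,      # operator replaced
    lambda a, b: a * b,      # operator replaced
    lambda a, b: a + b + 1,  # constant injected
]

# Toy test suite written against the original add().
tests: list[Test] = [
    lambda f: f(2, 2) == 4,
    lambda f: f(0, 0) == 0,
]

score = mutation_score(mutants, tests)
```

In this toy setup the `a * b` mutant survives, because both tests use inputs on which multiplication and addition agree. Every test has to run against every mutant, which hints at why the abstract describes mutation analysis as costly.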

BibTeX
@article{grano2019lightweight,
  title={Lightweight Assessment of Test-Case Effectiveness using Source-Code-Quality Indicators},
  author={Grano, Giovanni and Palomba, Fabio and Gall, Harald C},
  journal={IEEE Transactions on Software Engineering},
  year={2019},
  publisher={IEEE}
}
[J20] TSE 2019

Exploring Community Smells in Open-Source: An Automated Approach.*

IEEE Transactions on Software Engineering

Software engineering is now more than ever a community effort. Its success often weighs on balancing distance, culture, global engineering practices and more.

Journal Socio-Technical Analytics Empirical Software Engineering D. A. Tamburri, F. Palomba, R. Kazman.

Abstract. Software engineering is now more than ever a community effort. Its success often weighs on balancing distance, culture, global engineering practices and more. In this scenario many unforeseen socio-technical events may result in additional project cost or “social” debt, e.g., sudden, collective employee turnover. With industrial research we discovered community smells, that is, sub-optimal patterns across the organisational and social structure in a software development community that are precursors of such nasty socio-technical events. To understand the impact of community smells at large, in this paper we first introduce CODEFACE4SMELLS, an automated approach able to identify four community smell types that reflect socio-technical issues that have been shown to be detrimental in both the software engineering and organisational research fields. Then, we perform a large-scale empirical study involving over 100 years' worth of releases and communication structures data of 60 open-source communities: we evaluate (i) their diffuseness, i.e., how widely they are distributed in open-source, (ii) how developers perceive them, to understand whether practitioners recognize their presence and their negative effects in practice, and (iii) how community smells relate to existing socio-technical factors, with the aim of assessing the inter-relations between them. The key findings of our study highlight that community smells are highly diffused in open-source and are perceived by developers as relevant problems for the evolution of software communities. Moreover, a number of state-of-the-art socio-technical indicators (e.g., socio-technical congruence) can be used to monitor how healthy a community is and possibly avoid the emergence of social debt.

BibTeX
@article{tamburri2019exploring,
  title={Exploring Community Smells in Open-Source: An Automated Approach},
  author={Tamburri, Damian Andrew and Palomba, Fabio and Kazman, Rick},
  journal={IEEE Transactions on Software Engineering},
  year={2019},
  publisher={IEEE}
}
[J19] IST 2019

Machine Learning Techniques for Code Smell Detection: A Systematic Literature Review and Meta-Analysis.*

Elsevier's Information and Software Technology

Code smells indicate suboptimal design or implementation choices in the source code that often lead it to be more change- and fault-prone.

Journal Software Quality Systematic Literature Review M. I. Azeem, F. Palomba, L. Shi, Q. Wang.

Abstract.
Background: Code smells indicate suboptimal design or implementation choices in the source code that often lead it to be more change- and fault-prone. Researchers defined dozens of code smell detectors, which exploit different sources of information to support developers when diagnosing design flaws. Despite their good accuracy, previous work pointed out three important limitations that might preclude the use of code smell detectors in practice: (i) subjectiveness of developers with respect to code smells detected by such tools, (ii) scarce agreement between different detectors, and (iii) difficulties in finding good thresholds to be used for detection. To overcome these limitations, the use of machine learning techniques represents an ever increasing research area.
Objective: While the research community carefully studied the methodologies applied by researchers when defining heuristic-based code smell detectors, there is still a noticeable lack of knowledge on how machine learning approaches have been adopted for code smell detection and whether there are points of improvement to allow a better detection of code smells. Our goal is to provide an overview and discuss the usage of machine learning approaches in the field of code smells.
Method: This paper presents a Systematic Literature Review (SLR) on Machine Learning Techniques for Code Smell Detection. Our work considers papers published between 2000 and 2017. Starting from an initial set of 2,456 papers, we found that 15 of them actually adopted machine learning approaches. We studied them under four different perspectives: (i) code smells considered, (ii) setup of machine learning approaches, (iii) design of the evaluation strategies, and (iv) a meta-analysis on the performance achieved by the models proposed so far.
Results: The analyses performed show that God Class, Long Method, Functional Decomposition, and Spaghetti Code have been heavily considered in the literature. Decision Trees and Support Vector Machines are the most commonly used machine learning algorithms for code smell detection. Models based on a large set of independent variables have performed well. JRip and Random Forest are the most effective classifiers in terms of performance. The analyses also reveal the existence of several open issues and challenges that the research community should focus on in the future.
Conclusion: Based on our findings, we argue that there is still room for the improvement of machine learning techniques in the context of code smell detection. The open issues that emerged in this study can serve as input for researchers interested in developing more powerful techniques.

BibTeX
@article{azeem2019machine,
  title={Machine learning techniques for code smell detection: A systematic literature review and meta-analysis},
  author={Azeem, Muhammad Ilyas and Palomba, Fabio and Shi, Lin and Wang, Qing},
  journal={Information and Software Technology},
  year={2019},
  publisher={Elsevier}
}
[J17] JSS 2019

Fine-Grained Just-In-Time Defect Prediction.*

Elsevier's Journal of Systems and Software (JSS)

Defect prediction models focus on identifying defect-prone code elements, for example to allow practitioners to allocate testing resources on specific subsystems and to provide assistance during code reviews.

Journal Software Quality Empirical Software Engineering L. Pascarella, F. Palomba, A. Bacchelli.

Abstract. Defect prediction models focus on identifying defect-prone code elements, for example to allow practitioners to allocate testing resources on specific subsystems and to provide assistance during code reviews. While the research community has been highly active in proposing metrics and methods to predict defects on long-term periods (i.e., at release time), a recent trend is represented by the so-called short-term defect prediction (i.e., at commit-level). Indeed, this strategy represents an effective alternative in terms of effort required to inspect files likely affected by defects. Nevertheless, the granularity considered by such models might still be too coarse. Indeed, existing commit-level models highlight an entire commit as defective even in cases where only specific files actually contain defects. In this paper, we first investigate to what extent commits are partially defective; then, we propose a novel fine-grained just-in-time defect prediction model to predict the specific files, contained in a commit, that are defective. Finally, we evaluate our model in terms of (i) performance and (ii) the extent to which it decreases the effort required to diagnose a defect. Our study highlights that: (1) defective commits are frequently composed of a mixture of defective and non-defective files, (2) our fine-grained model can accurately predict defective files with an AUC-ROC up to 82%, and (3) our model would allow practitioners to save inspection efforts with respect to standard just-in-time techniques.

BibTeX
@article{pascarella2019fine,
  title={Fine-grained just-in-time defect prediction},
  author={Pascarella, Luca and Palomba, Fabio and Bacchelli, Alberto},
  journal={Journal of Systems and Software},
  volume={150},
  pages={22--36},
  year={2019},
  publisher={Elsevier}
}
[J16] IST 2019

A Survey on Software Coupling Relations and Tools*

Elsevier's Information and Software Technology (IST)

Coupling relations reflect the dependencies between software entities and can be used to assess the quality of a program. For this reason, a vast number of them have been developed, together with tools to compute their related metrics.

Journal Software Quality Systematic Literature Review E. Fregnan, T. Baum, F. Palomba, A. Bacchelli.

Abstract.
Context: Coupling relations reflect the dependencies between software entities and can be used to assess the quality of a program. For this reason, a vast number of them have been developed, together with tools to compute their related metrics. However, this makes the coupling measures suitable for a given application challenging to find.
Goals: The first objective of this work is to provide a classification of the different kinds of coupling relations, together with the metrics to measure them. The second consists in presenting an overview of the tools proposed until now by the software engineering academic community to extract these metrics.
Method: This work constitutes a systematic literature review in software engineering. To retrieve the referenced publications, publicly available scientific research databases were used. These sources were queried using keywords inherent to software coupling. We included publications from the period 2002 to 2017 and highly cited earlier publications. A snowballing technique was used to retrieve further related material.
Results: Four groups of coupling relations were found: structural, dynamic, semantic and logical. A fifth set of coupling relations includes approaches too recent to be considered an independent group and measures developed for specific environments. The investigation also retrieved tools that extract the metrics belonging to each coupling group.
Conclusion: This study shows the directions followed by the research on software coupling: e.g., developing metrics for specific environments. Concerning the metric tools, three trends have emerged in recent years: use of visualization techniques, extensibility and scalability. Finally, some coupling metrics applications were presented (e.g., code smell detection), indicating possible future research directions.

BibTeX
@article{fregnan2018survey,
  title={A survey on software coupling relations and tools},
  author={Fregnan, Enrico and Baum, Tobias and Palomba, Fabio and Bacchelli, Alberto},
  journal={Information and Software Technology},
  year={2018},
  publisher={Elsevier}
}
[J15] TSE 2019

Beyond Technical Aspects: How Do Community Smells Influence the Intensity of Code Smells?*

IEEE Transactions on Software Engineering (TSE)

Code smells are poor implementation choices applied by developers during software evolution that often lead to critical flaws or failure. Much in the same way, community smells reflect the presence of organizational and socio-technical issues within a software community that may lead to additional project costs.

Journal Socio-Technical Analytics Empirical Software Engineering F. Palomba, D. A. Tamburri, F. Arcelli Fontana, R. Oliveto, A. Zaidman, A. Serebrenik.

Abstract. Code smells are poor implementation choices applied by developers during software evolution that often lead to critical flaws or failure. Much in the same way, community smells reflect the presence of organizational and socio-technical issues within a software community that may lead to additional project costs. Recent empirical studies provide evidence that community smells are often, if not always, connected to circumstances such as code smells. In this paper we look deeper into this connection by conducting a mixed-methods empirical study of 117 releases from 9 open-source systems. The qualitative and quantitative sides of our mixed-methods study were run in parallel and assume a mutually-confirmative connotation. On the one hand, we survey 162 developers of the 9 considered systems to investigate whether developers perceive a relationship between community smells and the code smells found in those projects. On the other hand, we perform a fine-grained analysis of the 117 releases of our dataset to measure the extent to which community smells impact code smell intensity (i.e., criticality). We then propose a code smell intensity prediction model that relies on both technical and community-related aspects. The results of both sides of our mixed-methods study lead to one conclusion: community-related factors contribute to the intensity of code smells. This conclusion supports the joint use of community and code smells detection as a mechanism for the joint management of technical and social problems around software development communities.

BibTeX
@article{palomba2018beyond,
  title={Beyond technical aspects: How do community smells influence the intensity of code smells?},
  author={Palomba, Fabio and Tamburri, Damian Andrew and Fontana, Francesca Arcelli and Oliveto, Rocco and Zaidman, Andy and Serebrenik, Alexander},
  journal={IEEE Transactions on Software Engineering},
  year={2018},
  publisher={IEEE}
}
[J14] EMSE 2019

Discovering Community Patterns in Open-Source: A Systematic Approach and Its Evaluation.*

Springer's Journal of Empirical Software Engineering (EMSE)

The open-source phenomenon has reached the point in which it is virtually impossible to find large applications that do not rely on it. Such grand adoption may turn into a risk if the community regulatory aspects behind open-source work (e.g., contribution guidelines or release schemas) are left implicit and their effect untracked.

Journal Socio-Technical Analytics Empirical Software Engineering D. A. Tamburri, F. Palomba, A. Serebrenik, A. Zaidman.

Abstract. “There can be no vulnerability without risk; there can be no community without vulnerability; there can be no peace, and ultimately no life, without community.” - [M. Scott Peck]
The open-source phenomenon has reached the point in which it is virtually impossible to find large applications that do not rely on it. Such grand adoption may turn into a risk if the community regulatory aspects behind open-source work (e.g., contribution guidelines or release schemas) are left implicit and their effect untracked. We advocate the explicit study and automated support of such aspects and propose Yoshi (Yielding Open-Source Health Information), a tool able to map open-source communities onto community patterns, sets of known organisational and social structure types and characteristics with measurable core attributes. This mapping is beneficial since it allows, for example, (a) further investigation of community health measuring established characteristics from organisations research, (b) reuse of pattern-specific best-practices from the same literature, and (c) diagnosis of organisational anti-patterns specific to open-source, if any. We evaluate the tool in a quantitative empirical study involving 25 open-source communities from GitHub, finding that the tool offers a valuable basis to monitor key community traits behind open-source development and may form an effective combination with web-portals such as OpenHub or Bitergia. We made the proposed tool open source and publicly available.

BibTeX
@article{tamburri2019discovering,
  title={Discovering community patterns in open-source: A systematic approach and its evaluation},
  author={Tamburri, Damian A and Palomba, Fabio and Serebrenik, Alexander and Zaidman, Andy},
  journal={Empirical Software Engineering},
  volume={24},
  number={3},
  pages={1369--1417},
  year={2019},
  publisher={Springer}
}
[J13] IST 2019

On the Impact of Code Smells on the Energy Consumption of Mobile Applications.*

Elsevier's Information and Software Technology (IST)

The demand for green software design is steadily growing higher especially in the context of mobile devices, where the computation is often limited by battery life. Previous studies found how wrong programming solutions have a strong impact on the energy consumption.

Journal Software Quality Empirical Software Engineering F. Palomba, D. Di Nucci, A. Panichella, A. Zaidman, A. De Lucia.

Abstract.
Context. The demand for green software design is steadily growing higher especially in the context of mobile devices, where the computation is often limited by battery life. Previous studies found how wrong programming solutions have a strong impact on the energy consumption.
Objective. Despite the efforts spent so far, little is known about the influence of code smells, i.e., symptoms of poor design or implementation choices, on the energy consumption of mobile applications.
Method. To provide a wider overview on the relationship between smells and energy efficiency, in this paper we conducted a large-scale empirical study on the influence of 9 Android-specific code smells on the energy consumption of 60 Android apps. In particular, we focus our attention on the design flaws that are theoretically supposed to be related to non-functional attributes of source code, such as performance and energy consumption.
Results. The results of the study highlight that methods affected by four code smell types, i.e., Internal Setter, Leaking Thread, Member Ignoring Method, and Slow Loop, consume up to 87 times more energy than methods affected by other code smells. Moreover, we found that refactoring these code smells reduces energy consumption in all cases.
Conclusions. Based on our findings, we argue that more research aimed at designing automatic refactoring approaches and tools for mobile apps is needed.

BibTeX
@article{palomba2019impact,
  title={On the impact of code smells on the energy consumption of mobile applications},
  author={Palomba, Fabio and Di Nucci, Dario and Panichella, Annibale and Zaidman, Andy and De Lucia, Andrea},
  journal={Information and Software Technology},
  volume={105},
  pages={43--55},
  year={2019},
  publisher={Elsevier}
}
[J12] JSS 2018

Enhancing Change Prediction Models using Developer-Related Factors.*

Elsevier's Journal of Systems and Software (JSS)

Continuous changes applied during software maintenance risk deteriorating the structure of a system and threatening its maintainability. In this context, predicting the portions of source code where specific maintenance operations should be focused on may be crucial for developers to prevent maintainability issues.

Journal Software Quality Empirical Software Engineering G. Catolino, F. Palomba, A. De Lucia, F. Ferrucci, A. Zaidman.

Abstract. Continuous changes applied during software maintenance risk deteriorating the structure of a system and threatening its maintainability. In this context, predicting the portions of source code where specific maintenance operations should be focused on may be crucial for developers to prevent maintainability issues. Researchers proposed change prediction models based on product metrics, while recent papers have shown the adaptability of process metrics to the same context. However, we believe that existing approaches still miss important information, i.e., developer-related factors that are able to capture how complex the development process is under different perspectives. In this paper, we first investigate three change prediction models that exploit developer-related factors (e.g., number of developers working on a class) as predictors of change-proneness of classes and then we compare them with existing models. Our findings reveal that these factors might improve in some cases the capabilities of change prediction models. Moreover, we observed interesting complementarities among the prediction models. For this reason, we devised a novel change prediction model exploiting the combination of developer-related factors and product and evolution metrics. The results show that such a model is up to 20% more effective than the single models in the identification of change-prone classes.

BibTeX
@article{catolino2018enhancing,
  title={Enhancing change prediction models using developer-related factors},
  author={Catolino, Gemma and Palomba, Fabio and De Lucia, Andrea and Ferrucci, Filomena and Zaidman, Andy},
  journal={Journal of Systems and Software},
  volume={143},
  pages={14--28},
  year={2018},
  publisher={Elsevier}
}
[J11] IST 2018

A Large-Scale Empirical Study on the Lifecycle of Code Smell Co-occurrences.*

Elsevier's Information and Software Technology (IST)

Code smells are suboptimal design or implementation choices made by programmers during the development of a software system that possibly lead to low code maintainability and higher maintenance costs.

Journal Software Quality Empirical Software Engineering F. Palomba, G. Bavota, M. Di Penta, F. Fasano, R. Oliveto, A. De Lucia.

Abstract.
Context. Code smells are suboptimal design or implementation choices made by programmers during the development of a software system that possibly lead to low code maintainability and higher maintenance costs.
Objective. Previous research mainly studied the characteristics of code smell instances affecting a source code file, while only a few studies analyzed the magnitude and effects of smell co-occurrence, i.e., the co-occurrence of different types of smells on the same code component. This paper aims at studying this phenomenon in detail.
Method. We analyzed 13 code smell types detected in 395 releases of 30 software systems to first assess the extent to which code smells co-occur, and then to analyze (i) which code smells co-occur together, and (ii) how and why they are introduced and removed by developers.
Results. 59% of smelly classes are affected by more than one smell, and in particular there are six pairs of smell types (e.g., Message Chains and Spaghetti Code) that frequently co-occur. Furthermore, we observed that method-level code smells may be the root cause for the introduction of class-level smells. Finally, code smell co-occurrences are generally removed together as a consequence of other maintenance activities causing the deletion of the affected code components (with a consequent removal of the code smell instances) as well as the result of a major restructuring or scheduled refactoring actions.
Conclusions. Based on our findings, we argue that more research aimed at designing co-occurrence-aware code smell detectors and refactoring approaches is needed.

BibTeX
@article{palomba2018large,
  title={A large-scale empirical study on the lifecycle of code smell co-occurrences},
  author={Palomba, Fabio and Bavota, Gabriele and Di Penta, Massimiliano and Fasano, Fausto and Oliveto, Rocco and De Lucia, Andrea},
  journal={Information and Software Technology},
  volume={99},
  pages={1--10},
  year={2018},
  publisher={Elsevier}
}
[J10] JSS 2018

Crowdsourcing User Reviews to Support the Evolution of Mobile Apps.*

Elsevier's Journal of Systems and Software (JSS)

In recent software development and distribution scenarios, app stores are playing a major role, especially for mobile apps. On one hand, app stores allow continuous releases of app updates. On the other hand, they have become the premier point of interaction between app providers and users.

Journal Mobile Apps Evolution Empirical Software Engineering F. Palomba, M. Linares Vasquez, G. Bavota, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia.

Abstract. In recent software development and distribution scenarios, app stores are playing a major role, especially for mobile apps. On one hand, app stores allow continuous releases of app updates. On the other hand, they have become the premier point of interaction between app providers and users. After installing/updating apps, users can post reviews and provide ratings, expressing their level of satisfaction with apps, and possibly pointing out bugs or desired features. In this paper we empirically investigate—by performing a study on the evolution of 100 open source Android apps and by surveying 73 developers—to what extent app developers take user reviews into account, and whether addressing them contributes to apps’ success in terms of ratings. In order to perform the study, as well as to provide a monitoring mechanism for developers and project managers, we devised an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings. The results of our study indicate that (i) on average, half of the informative reviews are addressed, and over 75% of the interviewed developers claimed to take them into account often or very often, and that (ii) developers implementing user reviews are rewarded in terms of significantly increased user ratings.

BibTeX
@article{palomba2018crowdsourcing,
  title={Crowdsourcing user reviews to support the evolution of mobile apps},
  author={Palomba, Fabio and Linares-V{\'a}squez, Mario and Bavota, Gabriele and Oliveto, Rocco and Di Penta, Massimiliano and Poshyvanyk, Denys and De Lucia, Andrea},
  journal={Journal of Systems and Software},
  volume={137},
  pages={143--162},
  year={2018},
  publisher={Elsevier}
}
[J9] EMSE 2018 Recommended

On the Diffuseness and the Impact on Maintainability of Code Smells: A Large Scale Empirical Study.*

Springer's Journal of Empirical Software Engineering (EMSE)

F. Palomba, G. Bavota, M. Di Penta, F. Fasano, R. Oliveto, A. De Lucia. Journal Recommended Software Quality Empirical Software Engineering

Abstract. Code smells are symptoms of poor design and implementation choices that may hinder code comprehensibility and maintainability. Despite the effort devoted by the research community in studying code smells, the extent to which code smells in software systems affect software maintainability remains unclear. In this paper we present a large scale empirical investigation on the diffuseness of code smells and their impact on code change- and fault-proneness. The study was conducted across a total of 395 releases of 30 open source projects and considered 17,350 manually validated instances of 13 different code smell kinds. The results show that smells characterized by long and/or complex code (e.g., Complex Class) are highly diffused, and that smelly classes have a higher change- and fault-proneness than smell-free classes.

BibTeX
@article{palomba2018diffuseness,
  title={On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation},
  author={Palomba, Fabio and Bavota, Gabriele and Di Penta, Massimiliano and Fasano, Fausto and Oliveto, Rocco and De Lucia, Andrea},
  journal={Empirical Software Engineering},
  volume={23},
  number={3},
  pages={1188--1221},
  year={2018},
  publisher={Springer}
}
[J8] TSE 2017

Toward a Smell-aware Bug Prediction Model.*

IEEE Transactions on Software Engineering (TSE)

F. Palomba, M. Zanoni, F. Arcelli Fontana, A. De Lucia, R. Oliveto. Journal Software Quality Empirical Software Engineering

Abstract. Code smells are symptoms of poor design and implementation choices. Previous studies empirically assessed the impact of smells on code quality and clearly indicate their negative impact on maintainability, including a higher bug-proneness of components affected by code smells. In this paper, we capture previous findings on bug-proneness to build a specialized bug prediction model for smelly classes. Specifically, we evaluate the contribution of a measure of the severity of code smells (i.e., code smell intensity) by adding it to existing bug prediction models based on both product and process metrics, and comparing the results of the new model against the baseline models. Results indicate that the accuracy of a bug prediction model increases by adding the code smell intensity as a predictor. We also compare the results achieved by the proposed model with the ones of an alternative technique which considers metrics about the history of code smells in files, finding that our model works generally better. However, we observed interesting complementarities between the set of buggy and smelly classes correctly classified by the two models. By evaluating the actual information gain provided by the intensity index with respect to the other metrics in the model, we found that the intensity index is a relevant feature for both product and process metrics-based models. At the same time, the metric counting the average number of code smells in previous versions of a class considered by the alternative model is also able to reduce the entropy of the model. On the basis of this result, we devise and evaluate a smell-aware combined bug prediction model that includes product, process, and smell-related features. We demonstrate that such a model classifies bug-prone code components with an F-Measure at least 13% higher than the existing state-of-the-art models.

BibTeX
@article{palomba2017toward,
  title={Toward a smell-aware bug prediction model},
  author={Palomba, Fabio and Zanoni, Marco and Fontana, Francesca Arcelli and De Lucia, Andrea and Oliveto, Rocco},
  journal={IEEE Transactions on Software Engineering},
  volume={45},
  number={2},
  pages={194--218},
  year={2017},
  publisher={IEEE}
}
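As a reading aid for the metric reported in the abstract above: the F-Measure is the harmonic mean of precision and recall. The following is a minimal sketch of how it can be computed for a binary bug predictor. The function name and the example labels are illustrative assumptions and are not taken from the paper.

```python
def f_measure(actual, predicted):
    """Return (precision, recall, F1) for binary labels, where True means 'buggy'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # buggy, predicted buggy
    fp = sum(1 for a, p in pairs if not a and p)    # clean, predicted buggy
    fn = sum(1 for a, p in pairs if a and not p)    # buggy, predicted clean
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions for six classes.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
precision, recall, f1 = f_measure(actual, predicted)
```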
[J7] TSE 2017

The Scent of a Smell: An Extensive Comparison between Textual and Structural Smells.*

IEEE Transactions on Software Engineering (TSE)

F. Palomba, A. Panichella, A. Zaidman, R. Oliveto, A. De Lucia. Journal Software Quality Empirical Software Engineering

Abstract. Code smells are symptoms of poor design or implementation choices that have a negative effect on several aspects of software maintenance and evolution, such as program comprehension or change- and fault-proneness. This is why researchers have spent a lot of effort on devising methods that help developers to automatically detect them in source code. Almost all the techniques presented in the literature are based on the analysis of structural properties extracted from source code, although alternative sources of information (e.g., textual analysis) for code smell detection have also been recently investigated. Nevertheless, some studies have indicated that code smells detected by existing tools based on the analysis of structural properties are generally ignored (and thus not refactored) by the developers. In this paper, we aim at understanding whether code smells detected using textual analysis are perceived and refactored by developers in the same way as, or differently from, code smells detected through structural analysis. To this aim, we set up two different experiments. We have first carried out a software repository mining study to analyze how developers act on textually or structurally detected code smells. Subsequently, we have conducted a user study with industrial developers and quality experts in order to qualitatively analyze how they perceive code smells identified using the two different sources of information. Results indicate that textually detected code smells are easier to identify and for this reason they are considered easier to refactor with respect to code smells detected using structural properties. On the other hand, the latter are often perceived as more severe, but more difficult to exactly identify and remove.

BibTeX
@article{palomba2017scent,
  title={The scent of a smell: An extensive comparison between textual and structural smells},
  author={Palomba, Fabio and Panichella, Annibale and Zaidman, Andy and Oliveto, Rocco and De Lucia, Andrea},
  journal={IEEE Transactions on Software Engineering},
  volume={44},
  number={10},
  pages={977--1000},
  year={2017},
  publisher={IEEE}
}
[J6] TETCI 2017

Dynamic Selection of Classifiers in Bug Prediction: An Adaptive Method.*

IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)

D. Di Nucci, F. Palomba, R. Oliveto, A. De Lucia. Journal Software Quality Empirical Software Engineering

Abstract. In the last decades the research community has devoted a lot of effort to the definition of approaches able to predict the defect proneness of source code files. Such approaches exploit several predictors (e.g., product or process metrics) and use machine learning classifiers to classify classes as buggy or not buggy, or provide the likelihood that a class will exhibit a fault in the near future. The empirical evaluation of all these approaches indicated that there is no machine learning classifier providing the best accuracy in any context, highlighting interesting complementarity among them. For these reasons ensemble methods have been proposed to estimate the bug-proneness of a class by combining the predictions of different classifiers. Following this line of research, in this paper we propose an adaptive method, named ASCI (Adaptive Selection of Classifiers in bug predIction), able to dynamically select among a set of machine learning classifiers the one which better predicts the bug proneness of a class based on its characteristics. An empirical study conducted on 30 software systems indicates that ASCI exhibits higher performances than 5 different classifiers used independently and combined with the majority voting ensemble method.

BibTeX
@article{di2017dynamic,
  title={Dynamic selection of classifiers in bug prediction: An adaptive method},
  author={Di Nucci, Dario and Palomba, Fabio and Oliveto, Rocco and De Lucia, Andrea},
  journal={IEEE Transactions on Emerging Topics in Computational Intelligence},
  volume={1},
  number={3},
  pages={202--212},
  year={2017},
  publisher={IEEE}
}
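The contrast drawn in the abstract above, between choosing one classifier per class and combining all of them by majority voting, can be illustrated with a small sketch. Everything in it is a hypothetical stand-in: the three rule-based classifiers, the two features (lines of code, number of changes), and the nearest-neighbour lookup used for the selection step. It is not the ASCI implementation, whose actual design is described in the paper.

```python
import math


def majority_vote(classifiers, x):
    """Baseline ensemble: predict 'buggy' if more than half of the classifiers say so."""
    votes = [clf(x) for clf in classifiers]
    return sum(votes) * 2 > len(votes)


def train_selector(classifiers, X, y):
    """For each training instance, remember the first classifier that labels it correctly."""
    memory = []
    for features, label in zip(X, y):
        for index, clf in enumerate(classifiers):
            if clf(features) == label:
                memory.append((features, index))
                break
    return memory


def select_and_predict(classifiers, memory, x):
    """Use the classifier remembered for the most similar training instance."""
    _, index = min(memory, key=lambda item: math.dist(item[0], x))
    return classifiers[index](x)


# Hypothetical classifiers over (lines_of_code, number_of_changes).
classifiers = [
    lambda x: x[0] > 300,                # flags large classes
    lambda x: x[1] > 10,                 # flags frequently changed classes
    lambda x: x[0] > 300 and x[1] > 10,  # flags classes that are both
]

X = [(500, 2), (100, 20), (400, 15), (50, 1)]
y = [True, True, True, False]
memory = train_selector(classifiers, X, y)

new_class = (120, 18)
adaptive = select_and_predict(classifiers, memory, new_class)
voted = majority_vote(classifiers, new_class)
```

On this toy data the two strategies disagree for `new_class`: the selector picks the change-based rule because the nearest training instance was best handled by it, while the majority of rules vote "not buggy".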
[J5] TSE 2017

A Developer Centered Bug Prediction Model.*

IEEE Transactions on Software Engineering (TSE)

D. Di Nucci, F. Palomba, G. De Rosa, G. Bavota, R. Oliveto, A. De Lucia. Journal Software Quality Empirical Software Engineering

Abstract. Several techniques have been proposed to accurately predict software defects. These techniques generally exploit characteristics of the code artefacts (e.g., size, complexity, etc.) and/or of the process adopted during their development and maintenance (e.g., the number of developers working on a component) to spot components likely to contain bugs. While these bug prediction models achieve good levels of accuracy, they mostly ignore the major role played by human-related factors in the introduction of bugs. Previous studies have demonstrated that focused developers are less prone to introduce defects than non-focused developers. According to this observation, software components changed by focused developers should also be less error prone than components changed by less focused developers. We capture this observation by measuring the scattering of changes performed by developers working on a component and use this information to build a bug prediction model. Such a model has been evaluated on 26 systems and compared with four competitive techniques. The achieved results show the superiority of our model, and its high complementarity with respect to predictors commonly used in the literature. Based on this result, we also show the results of a “hybrid” prediction model combining our predictors with the existing ones.

BibTeX
@article{di2017developer,
  title={A developer centered bug prediction model},
  author={Di Nucci, Dario and Palomba, Fabio and De Rosa, Giuseppe and Bavota, Gabriele and Oliveto, Rocco and De Lucia, Andrea},
  journal={IEEE Transactions on Software Engineering},
  volume={44},
  number={1},
  pages={5--24},
  year={2017},
  publisher={IEEE}
}
[J4] TSE 2017 Recommended

When and Why Your Code Starts to Smell Bad (and Whether the Smells Go Away).*

IEEE Transactions on Software Engineering (TSE)

M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, D. Poshyvanyk. Journal Recommended Software Quality Empirical Software Engineering

Abstract. Technical debt is a metaphor introduced by Cunningham to indicate “not quite right code which we postpone making it right”. One noticeable symptom of technical debt is represented by code smells, defined as symptoms of poor design and implementation choices. Previous studies showed the negative impact of code smells on the comprehensibility and maintainability of code. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced, what their survivability is, and how they are removed by developers. To empirically corroborate such anecdotal evidence, we conducted a large empirical study over the change history of 200 open source projects. This study required the development of a strategy to identify smell-introducing commits, the mining of over half a million commits, and the manual analysis and classification of over 10K of them. Our findings mostly contradict common wisdom, showing that most of the smell instances are introduced when an artifact is created and not as a result of its evolution. At the same time, 80% of smells survive in the system. Also, among the 20% of removed instances, only 9% are removed as a direct consequence of refactoring operations.

BibTeX
@article{tufano2017and,
  title={When and why your code starts to smell bad (and whether the smells go away)},
  author={Tufano, Michele and Palomba, Fabio and Bavota, Gabriele and Oliveto, Rocco and Di Penta, Massimiliano and De Lucia, Andrea and Poshyvanyk, Denys},
  journal={IEEE Transactions on Software Engineering},
  volume={43},
  number={11},
  pages={1063--1088},
  year={2017},
  publisher={IEEE}
}
[J3] JSEP 2017

There and Back Again: Can you Compile that Snapshot?.*

Wiley's Journal of Software: Evolution and Process (JSEP)

M. Tufano, F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk. Journal Empirical Software Engineering

Abstract. A broken snapshot represents a snapshot from a project’s change history that cannot be compiled. Broken snapshots can have significant implications for researchers, as they could hinder any analysis of the past project history that requires code to be compiled. Noticeably, while some broken snapshots may be observable in change history repositories (e.g., no longer available dependencies), some of them may not necessarily happen during the actual development. In this paper we systematically study the compilability of broken snapshots in 219,395 snapshots belonging to 100 Java projects from the Apache Software Foundation, all relying on Maven as an automated build tool. We investigated broken snapshots from two different perspectives: (i) how frequently they happen and (ii) likely causes behind them. The empirical results indicate that broken snapshots occur in most (96%) of the projects we studied and that they are mainly due to problems related to the resolution of dependencies. On average, only 38% of the change history of project systems is currently successfully compilable.

BibTeX
@article{tufano2017there,
  title={There and back again: Can you compile that snapshot?},
  author={Tufano, Michele and Palomba, Fabio and Bavota, Gabriele and Di Penta, Massimiliano and Oliveto, Rocco and De Lucia, Andrea and Poshyvanyk, Denys},
  journal={Journal of Software: Evolution and Process},
  volume={29},
  number={4},
  pages={e1838},
  year={2017},
  publisher={Wiley Online Library}
}
[J2] JSS 2015

An Experimental Investigation on the Innate Relationship between Quality and Refactoring.*

Elsevier's Journal of Systems and Software (JSS)

G. Bavota, A. De Lucia, M. Di Penta, R. Oliveto, F. Palomba. Journal Software Quality Empirical Software Engineering

Abstract. Previous studies have investigated the reasons behind refactoring operations performed by developers, and proposed methods and tools to recommend refactorings based on quality metric profiles, or on the presence of poor design and implementation choices, i.e., code smells. Nevertheless, the existing literature lacks observations about the relations between metrics/code smells and refactoring activities performed by developers. In other words, the characteristics of code components increasing/decreasing their chances of being the object of refactoring operations are still unknown. This paper aims at bridging this gap. Specifically, we mined the evolution history of three Java open source projects to investigate whether refactoring activities occur on code components for which certain indicators (such as quality metrics or the presence of smells as detected by tools) suggest there might be a need for refactoring operations. Results indicate that, more often than not, quality metrics do not show a clear relationship with refactoring. In other words, refactoring operations are generally focused on code components for which quality metrics do not suggest there might be a need for refactoring operations. Finally, 42% of refactoring operations are performed on code entities affected by code smells. However, only 7% of the performed operations actually remove the code smells from the affected class.

BibTeX
@article{bavota2015experimental,
  title={An experimental investigation on the innate relationship between quality and refactoring},
  author={Bavota, Gabriele and De Lucia, Andrea and Di Penta, Massimiliano and Oliveto, Rocco and Palomba, Fabio},
  journal={Journal of Systems and Software},
  volume={107},
  pages={1--14},
  year={2015},
  publisher={Elsevier}
}
[J1] TSE 2015 Recommended

Mining Version Histories for Detecting Code Smells.*

IEEE Transactions on Software Engineering (TSE)

F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, D. Poshyvanyk, A. De Lucia. Journal Recommended Software Quality Empirical Software Engineering

Abstract. Code smells are symptoms of poor design and implementation choices that may hinder code comprehension, and possibly increase change- and fault-proneness. While most of the detection techniques just rely on structural information, many code smells are intrinsically characterized by how code elements change over time. In this paper, we propose HIST (Historical Information for Smell deTection), an approach exploiting change history information to detect instances of five different code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy. We evaluate HIST in two empirical studies. The first, conducted on twenty open source projects, aimed at assessing the accuracy of HIST in detecting instances of the code smells mentioned above. The results indicate that the precision of HIST ranges between 72% and 86%, and its recall ranges between 58% and 100%. Also, results of the first study indicate that HIST is able to identify code smells that cannot be identified by competitive approaches solely based on code analysis of a single system’s snapshot. Then, we conducted a second study aimed at investigating to what extent the code smells detected by HIST (and by competitive code analysis techniques) reflect developers’ perception of poor design and implementation choices. We involved twelve developers of four open source projects, who recognized more than 75% of the code smell instances identified by HIST as actual design/implementation problems.

BibTeX
@article{palomba2014mining,
  title={Mining version histories for detecting code smells},
  author={Palomba, Fabio and Bavota, Gabriele and Di Penta, Massimiliano and Oliveto, Rocco and Poshyvanyk, Denys and De Lucia, Andrea},
  journal={IEEE Transactions on Software Engineering},
  volume={41},
  number={5},
  pages={462--489},
  year={2014},
  publisher={IEEE}
}
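One way to picture the kind of change-history information the abstract above refers to is to count how often pairs of classes are modified in the same commit. The sketch below is an illustration only: the commit data, the support threshold, and the function name are assumptions, and the detection rules HIST actually applies are those defined in the paper.

```python
from collections import Counter
from itertools import combinations


def co_change_pairs(commits, min_support):
    """Return the class pairs that changed together in at least `min_support` commits."""
    counts = Counter()
    for changed in commits:
        # Sort and deduplicate so each pair has one canonical key per commit.
        for pair in combinations(sorted(set(changed)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical change history: each entry lists the classes touched by one commit.
commits = [
    ["Order", "Invoice", "Report"],
    ["Order", "Invoice"],
    ["Order", "Invoice", "Mailer"],
    ["Report"],
]
frequent = co_change_pairs(commits, min_support=3)
```

Here only `Invoice` and `Order` change together often enough to pass the threshold, which is the kind of signal a history-based detector could inspect further.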
[C86] SEAA 2024

An Empirical Study on the Relation between Programming Languages and the Emergence of Community Smells.*

50th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Paris, France.

G. Annunziata, C. Ferrara, S. Lambiase, F. Palomba, G. Catolino, F. Ferrucci, A. De Lucia. Conference Socio-Technical Analysis

Abstract. To provide a measurable representation of social issues in software teams, the research community defined a set of anti-patterns that may lead to the emergence of both social and technical debt, i.e., "community smells". Researchers have investigated community smells from different perspectives; in particular, they have analyzed how product-related aspects of software development, such as architecture and introducing a new language, could influence community smells. However, how technical project characteristics may be related to the emergence of community smells is still unknown. Different from those works, we aim to investigate how adopting specific programming languages might influence the socio-technical alignment and congruence of the development community, possibly affecting its overall ability to communicate and collaborate and leading to the emergence of social anti-patterns, i.e., community smells. We studied the relationship between the most used programming languages and the community smells in 100 open-source projects on GitHub. Key results of the study show a low statistical correlation for specific community smells like Prima Donna Effects, Solution Defiance, and Organizational Skirmish, highlighting the fact that, for some programming languages, their adoption may not be an indicator of the presence or absence of community smells.

[C85] SEAA 2024

AGORA: An Approach for Generating Acceptance Test Cases from Use Cases.*

50th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Paris, France.

G. De Vito, G. Vassallo, F. Palomba, F. Ferrucci. Conference Software Testing Empirical Software Engineering

Abstract. This paper introduces AGORA, an innovative approach that leverages Large Language Models to automate the definition of acceptance test cases from use cases. AGORA consists of two phases that exploit prompt engineering to 1) identify test cases for specific use cases and 2) generate detailed acceptance test cases. AGORA was evaluated through a controlled experiment involving industry professionals, comparing the effectiveness and efficiency of the proposed approach with the manual method. The results showed that AGORA can generate acceptance test cases with a quality comparable to that obtained manually while improving the process efficiency by over 90%. Furthermore, user feedback indicated high satisfaction with using the proposed approach. These findings underscore the potential of AGORA as a tool to enhance the efficiency and quality of the software testing process.

[C84] UKDE 2024

Collecting and Implementing Ethical Guidelines for Emotion Recognition in an Educational Metaverse.*

International Conference on User-Centered Practices of Knowledge Discovery in Educational Data (UKDE 2024), Cagliari, Italy, 2024.

D. Di Dario, V. Pentangelo, M. Colella, F. Palomba, C. Gravino. Conference Socio-Technical Analysis

Abstract. The metaverse represents a persistent, online 3D universe where people can interact, socialize, and work toward common goals. Education represents a key application domain, as it has the potential to enhance experiential learning and collaboration between learners and between learners and educators. However, challenges to the widespread adoption of educational metaverses persist. This paper focuses on emotional isolation, i.e., the feeling of emotional disconnection or loneliness, which can hinder learners' motivation and participation. Machine learning-enabled emotion recognition systems have the potential to address this challenge, offering educators feedback on the emotional states of learners within the metaverse. Yet, the integration of emotion recognition systems raises ethical concerns regarding consent, privacy, and algorithmic bias. In this paper, we first conduct a literature review on the ethical considerations surrounding the deployment of emotion recognition technology within educational metaverses. Then, we report on the implementation of these guidelines within SENEM, an educational metaverse platform available in the literature. Through this research, we aim to contribute to the responsible deployment of emotion recognition technology in educational settings, ultimately fostering a supportive and inclusive learning environment for all learners.

[C83] FORGE 2024

Is Attention All You Need? Toward a Conceptual Model for Social Awareness in Large Language Models.*

International Conference on AI Foundation Models and Software Engineering (FORGE 2024), Lisbon, Portugal, 2024.

G. Voria, G. Catolino, F. Palomba. Conference Socio-Technical Analysis

Abstract. Large Language Models (LLMs) are revolutionizing the landscape of Artificial Intelligence (AI) due to recent technological breakthroughs. Their remarkable success in aiding various Software Engineering (SE) tasks through AI-powered tools and assistants has led to the integration of LLMs as active contributors within development teams, ushering in novel modes of communication and collaboration. However, great power comes with great responsibility: ensuring that these models meet fundamental ethical principles such as fairness is still an open challenge. In this light, our vision paper analyzes the existing body of knowledge to propose a conceptual model designed to frame ethical, social, and cultural considerations that researchers and practitioners should consider when defining, employing, and validating LLM-based approaches for software engineering tasks.

[C82] CAIN 2024

Unmasking Data Secrets: An Empirical Investigation into Data Smells and Their Impact on Data Quality.*

International Conference on AI Engineering – Software Engineering for AI (CAIN 2024), Lisbon, Portugal, 2024.

Conference Software Quality Empirical Software Engineering G. Recupito, R. Rapacciuolo, D. Di Nucci, F. Palomba.

Abstract. Artificial Intelligence (AI) is rapidly advancing with a data-centered approach suitable for various domains. Nevertheless, AI faces significant challenges, particularly in data quality. Data collection from diverse sources can introduce quality issues that may threaten the development of AI-enabled systems. A growing concern in this context is the emergence of data smells – issues specific to the data used in building AI models, which can have long-term consequences. In this paper, we aim to enlarge the current body of knowledge on data smells by proposing a two-step investigation into the matter. First, we update an existing literature review to catalogue the currently existing data smells and the tools to detect them. Afterward, we assess the prevalence of data smells and their correlation with data quality metrics. As a first result, we identify a novel set of 12 data smells distributed across three additional categories. Second, we observe a pronounced correlation between data smells and data quality, especially for highly diffused data smell instances. This research sheds light on the complex relationship between data smells and data quality, providing valuable insights into the challenges of maintaining AI-enabled systems.

[C81] ICSE 2024

ReFAIR: Toward a Context-Aware Recommender for Fairness Requirements Engineering.*

IEEE/ACM International Conference on Software Engineering (ICSE 2024), Lisbon, Portugal, 2024.

Conference Socio-Technical Analytics Empirical Software Engineering C. Ferrara, F. Casillo, C. Gravino, A. De Lucia, F. Palomba.

Abstract. Machine learning (ML) is increasingly being used as a key component of most software systems, yet serious concerns have been raised about the fairness of ML predictions. Researchers have been proposing novel methods to support the development of fair machine learning solutions. Nonetheless, most of them can only be used in late development stages, e.g., during model training, while there is a lack of methods that may provide practitioners with early fairness analytics enabling the treatment of fairness throughout the development lifecycle. This paper proposes ReFair, a novel context-aware requirements engineering framework that classifies sensitive features from User Stories. By exploiting natural language processing and word embedding techniques, our framework first identifies both the use case domain and the machine learning task to be performed in the system being developed; afterward, it recommends the context-specific sensitive features to be considered during the implementation. We assess the capabilities of ReFair by evaluating it against a synthetic dataset, which we built as part of our research, composed of 12,401 User Stories related to 34 application domains. Our findings showcase the high accuracy of ReFair, while also highlighting its current limitations.

[C80] ICSE 2024

SERGE – Serious Game for the Education of Risk Management in Software Project Management.*

IEEE/ACM International Conference on Software Engineering (ICSE 2024) - Software Engineering Education and Training Track, Lisbon, Portugal, 2024.

Conference Empirical Software Engineering G. Annunziata, S. Lambiase, F. Palomba, F. Ferrucci.

Abstract. Software Project Management is the systematic and disciplined approach for planning, executing, monitoring, controlling, and closing software development projects. It plays a critical role in the success of software projects and encompasses several processes for ensuring their successful completion. Among them, risk management emerges as a critical pivot for reacting to the unpredictable events that often affect software projects. Teaching risk management is vital to equip individuals and organizations with the skills needed to prevent and monitor challenges and potential issues. In this paper, we propose a serious game named Serge, conceived to involve students in learning risk management and improve their skills through gamification and simulation of a real-world application context. The features for the design of Serge were identified through a literature review. An iterative Game Design Phase was employed to build, test, and refine the design of Serge. Finally, the proposed approach was assessed by conducting a controlled experiment to compare risk management skills acquired through a traditional lecture and using Serge. The results show that adopting a serious game such as Serge, which actively involves the students, can improve the acquisition of risk management skills.

[C79] ICSE 2024

Dealing With Cultural Dispersion: a Novel Theoretical Framework for Software Engineering Research and Practice.*

IEEE/ACM International Conference on Software Engineering (ICSE 2024) - Software Engineering in Society Track, Lisbon, Portugal, 2024.

Conference Socio-Technical Analytics Empirical Software Engineering S. Lambiase, G. Catolino, B. Della Piana, F. Ferrucci, F. Palomba.

Abstract. Software development is fundamentally a team-driven process; researchers in software engineering have identified various human and social factors that can significantly impact it. Culture emerged as a critical element, and the diversity deriving from cultural differences can be highly impactful both positively and negatively. Despite existing knowledge about how culture influences software development, limitations persist. Most importantly, a unified and comprehensive (grounded) theory of how cultural differences influence and are managed in software development does not yet exist. This lack has two significant consequences: (1) it makes research on culture fragmented, leading to the continual definition of new concepts that do not allow the state of the art to advance significantly, and (2) it reduces the ability of the research to be transferred to practitioners, since there is no framework designed to be understood and used by them. To address this limitation, this work proposes a theoretical framework of "Dealing With Cultural Dispersion", which focuses on challenges and benefits originating from cultural differences and strategies for dealing with them. The framework was developed through a qualitative study using an iterative research approach, including interviews and socio-technical grounded theory for data analysis. The proposed framework was designed to reveal the tangible effects of practitioners' culture in software development, allowing software teams to (1) clearly understand the problem and (2) implement the correct strategy for addressing it. Additionally, researchers can use this framework as a foundation to (deductively) develop a more robust and comprehensive theory in this field.

[C78] SCAM 2023

Automating Test-Specific Refactoring Mining: A Mixed-Method Investigation.*

23rd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), Bogotá, Colombia.

Conference Software Testing Empirical Software Engineering L. Martins, H. Costa, M. Ribeiro, F. Palomba, I. Machado.

Abstract. Refactoring is a practice commonly used by developers to restructure the source code without changing its external behavior. Over the last decades, the software engineering research community has been making use of mining software repository techniques to investigate refactoring from multiple perspectives, identifying the properties and impact of this practice on source code quality, as well as using refactoring data coming from software repositories to build automated recommendation systems. While the current state of the art proposes various automated tools to mine refactoring data, there is still a lack of instruments that may help researchers when mining test-specific refactoring data. The availability of those instruments may enable additional, specialized techniques to support developers while refactoring test code. In this paper, we introduce an approach that extends RefactoringMiner, a well-established refactoring mining tool with high precision and recall scores, and is able to detect seven test-specific refactoring operations. We perform mixed-method research to assess the capabilities and usefulness of the approach. First, we compare the test-specific refactoring data extracted by the approach against an oracle of 375 test-specific refactorings. Second, we engage with 15 software engineering researchers and apply a technology acceptance model to investigate how they would benefit from our approach. The key results of the study show that our approach reaches 100% precision and 92.5% recall. In addition, the approach is considered useful and suitable for various research tasks, including the definition of novel learning models able to recommend test-specific refactoring actions.

[C77] MENSURA 2023

Please, Be Realistic! An Empirical Study on the Performance of Vulnerability Prediction Models.*

17th International Conference on Software Process and Product Measurement (MENSURA), Rome, Italy.

Conference Software Quality Empirical Software Engineering G. Sellitto, A. Sheykina, F. Palomba, A. De Lucia.

Abstract. Software vulnerabilities are infamous threats to the security of computing systems, and it is vital to detect and correct them before releasing any piece of software to the public. Many approaches for the detection of vulnerabilities have been proposed in the literature; in particular, those leveraging machine learning techniques, i.e., vulnerability prediction models, seem quite promising. However, recent work has warned that most models have only been evaluated in in-vitro settings, under certain assumptions that do not resemble the real scenarios in which such approaches are supposed to be employed. This observation raises the risk that the encouraging results obtained in previous literature may not hold as well in practice. Recognizing the danger of biased and unrealistic evaluations, we aim to dive deep into the problem by investigating whether and to what extent vulnerability prediction models' performance changes when measured in realistic settings. To do this, we perform an empirical study evaluating the performance of a vulnerability prediction model, configured with three data balancing techniques, executed at three different degrees of realism, leveraging two datasets. Our findings highlight that the outcome of any measurement strictly depends on the experiment setting, calling on researchers to take into account the actuality and applicability in practice of the approaches they propose and evaluate.

[C76] MENSURA 2023

Understanding Developer Practices and Code Smells Diffusion in AI-Enabled Software: A Preliminary Study.*

17th International Conference on Software Process and Product Measurement (MENSURA), Rome, Italy.

Conference Software Quality Empirical Software Engineering G. Giordano, G. Annunziata, A. De Lucia, F. Palomba.

Abstract. To deal with continuous change requests and the strict time-to-market, practitioners and big companies constantly update their software systems to meet users' requirements. This practice forces developers to release immature products, neglecting best practices to reduce delivery times. As a possible result, technical debt can arise, i.e., potential design issues that can negatively impact software maintenance and evolution and, in turn, increase both the time-to-market and costs. Code smells, i.e., sub-optimal design decisions identifiable by computing software metrics and providing a general overview of code quality, are common symptoms of technical debt. While previous research focused on code smells primarily in the context of Java, the growing popularity of Python, particularly for developing artificial intelligence (AI)-enabled systems, calls for additional investigations. This preliminary analysis addresses this gap by exploring the diffusion of Python-specific code smells and the developer activities that induce the introduction of code smells in their systems. To perform our preliminary investigation, we selected 200 AI-enabled systems available in the Niche dataset; we extracted information on 10,611 releases using PyDriller and used PySmell to extract information about code smells. The results reveal several insights: 1) Code smells related to object-oriented principles are rarely detected in Python; 2) Complex List Comprehension is the most prevalent and the longest-lived smell; 3) The main activities that can induce code smells are evolutionary. This study fills a critical gap in the literature by providing empirical evidence on the evolution of code smells in Python-based AI-enabled systems.

[C75] SEAA 2023

The Yin and Yang of Software Quality: On the Relationship between Design Patterns and Code Smells.*

49th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Durres, Albania.

 Best Paper Award

Conference Software Quality Empirical Software Engineering G. Giordano, G. Sellitto, A. Sepe, F. Palomba, F. Ferrucci.

Abstract. Software reuse is considered the silver bullet of software engineering. It has been largely demonstrated that the proper implementation of design and reuse principles can substantially reduce the effort, time, and costs required to develop software systems. Design patterns are one of the most established techniques for source code reuse. While previous work pointed out their benefits in terms of maintainability and understandability, some studies raise the opposite concern, suggesting that they can negatively impact code quality from the developers' perspectives. We recognize this discrepancy in the literature, and we aim to fill the gap by investigating whether and how design patterns are related to the emergence of issues compromising code understandability, namely the Complex Class, God Class, and Spaghetti Code smells, which have also been shown to increase the change- and fault-proneness of code. We perform an empirical evaluation on 15 Java projects evolving over 542 releases, and we find that, although design patterns are supposed to improve code quality without prejudice, they can be related to dangerous issues, as we observe the emergence of code smells in the classes participating in their implementation. From our findings, we distill a number of implications for developers and project managers to support them in dealing with design patterns.

[C74] SEAA 2023

Toward a Secure Educational Metaverse: A Tail of Blockchain Design for Educational Environments.*

49th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Durres, Albania.

Conference Software Quality Empirical Software Engineering D. Di Dario, U. Bilotti, M. Sibilio, C. Gravino, F. Palomba.

Abstract. In the era of social distancing, distance learning represents a crucial educational challenge. Several 2D information technologies have been provided, yet these share multiple limitations and have negative social, educational, and psychological implications for learners. The metaverse promises to revolutionize education as we know it: it is a persistent, virtual, three-dimensional environment that is supposed to address most of the limitations of 2D information technologies. Nonetheless, there are still software engineering challenges to face to enable such a metaverse, especially when turning to software security and privacy. In this paper, we aim to take the first steps toward an improved understanding of the security perspective of the educational metaverse by analyzing how blockchain can be employed within educational environments and how applications may be designed. Our ultimate goal is to provide insights into how blockchain can be further tailored in the context of the educational metaverse. We conduct a systematic literature review, which targets 20 primary studies. The key findings of the study showcase the use of blockchain in 3 educational tasks, and describe the blockchain design approaches, the protocols they commonly use, and the associated limitations. We conclude by developing a conceptualization of a blockchain-based educational metaverse.

[C73] SEAA 2023

Meet C4SE: Your New Collaborator for Software Engineering Tasks.*

49th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Durres, Albania.

Conference Software Quality Empirical Software Engineering G. De Vito, S. Lambiase, F. Palomba, F. Ferrucci.

Abstract. The software industry has rapidly increased in complexity and scale, leading to challenges in managing information and tasks among developer teams, often resulting in inefficiencies, misunderstandings, and delays. Moreover, the increasing search for automated tasks led to the extensive adoption of chatbots (a.k.a. conversational agents) for software development purposes. However, despite their undoubted positive contributions, practitioners started to identify numerous issues deriving from their adoption, both technical and social, first among them the limited usefulness of the provided support due to the bot’s lack of full working context. To address this limitation, we propose C4SE, a chatbot designed to assist software engineers and managers in performing several tasks. The idea behind the bot is to collect information from the different tasks that could be useful for others, in order to provide better support and tailor the bot to the specific operational context, i.e., the development team using it. To enable such task heterogeneity and contextual persistence, we operationalize the GPT 3.5 model for understanding the user’s intent and a specialized data store based on a vector database as long-term memory for maintaining contextual information. With these characteristics, C4SE can provide benefits to the entire software development lifecycle, increasing practitioners' productivity. We present a prototype of the tool able to perform code suggestions, code reviews, GitHub API operationalization, and unit and acceptance test case generation. A preliminary evaluation was carried out, reporting encouraging results.

[C72] SEAA 2023

ECHO: An Approach to Enhance Use Case Quality Exploiting Large Language Models.*

49th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Durres, Albania.

Conference Empirical Software Engineering G. De Vito, F. Palomba, C. Gravino, S. Di Martino, F. Ferrucci.

Abstract. UML use cases are commonly used in software engineering to specify the functional requirements of a system since they are an effective tool for interacting with stakeholders thanks to the use of natural languages. However, producing high-quality use cases can be challenging due to the lack of precise guidelines and suitable tools. This can lead to problems, e.g., inaccuracy and incompleteness, in the derived software artifacts and the final product. Recent advancements in Natural Language Processing and Large Language Models (LLMs) can provide the premises for developing tools supporting activities based on natural languages. In this paper, we propose ECHO, a novel approach for supporting software engineers in enhancing the quality of UML use cases using LLMs. Our approach consists of a co-prompt engineering approach and an iterative and interactive process with the LLM to improve the quality of use cases, based on practitioners’ feedback. To prove the feasibility of the proposal, we instantiated the approach using ChatGPT and performed a controlled experiment to assess its effectiveness by involving seven software engineering professionals. Three were part of the experimental group and used ECHO to improve the quality of the use cases. Three others formed the control group and enhanced the quality of use cases manually. Finally, the last participant acted as an oracle, blind w.r.t. the groups, and evaluated the quality of the enhanced use cases, both qualitatively, by means of a questionnaire, and quantitatively, by means of the Use Case Points metric. Results show that ECHO can effectively support software engineers in improving use cases’ quality thanks to the prompts suitably designed to interact with ChatGPT.

[C71] SEAA 2023

Security Testing in the Wild: An Interview Study.*

49th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Durres, Albania.

Conference Software Testing Empirical Software Engineering D. Di Dario, V. Pontillo, S. Lambiase, F. Ferrucci, F. Palomba.

Abstract. Modern software systems are increasingly complex and the risk of falling into security concerns is high if these systems are not developed with a proper security mindset. Despite the empirical studies and security-oriented approaches proposed by researchers and tool vendors, we still point out a lack of knowledge on the security testing processes applied by companies to reduce risks connected to software security. In this paper, we aim to bridge this gap of knowledge by performing an interview-based study with 19 security experts to understand how companies arrange security testing and how the process of security testing is actually performed in practice. Our results highlight that some companies incorporated the figure of the security tester in the software life cycle, yet practitioners reported a lack of standardized guidelines for security testing. From a management perspective, our results suggest that the introduction of formal communication between development and security testing teams may lead to better performance.

[C70] SEAA 2022

"There and Back Again?" On the Influence of Software Community Dispersion Over Productivity.*

48th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Gran Canaria, Spain.

 Best Paper Award

S. Lambiase, G. Catolino, F. Pecorelli, D. Tamburri, F. Palomba, W.J. van den Heuvel, F. Ferrucci. Conference Socio-Technical Analytics Empirical Software Engineering

Abstract. Estimating and understanding productivity still represents a crucial task for researchers and practitioners. Researchers have spent significant effort identifying the factors that influence software developers' productivity, providing several approaches for analyzing and predicting such a metric. Although different works focused on evaluating the impact of human factors on productivity, little is known about the influence of cultural/geographical diversity in software development communities. Indeed, in previous studies, researchers treated cultural aspects as an abstract concept without providing a quantitative representation. This work provides an empirical assessment of the relationship between the cultural and geographical dispersion of a development community (namely, how diverse a community is in terms of cultural attitudes and geographical collocation of the members who belong to it) and its productivity. To reach our aim, we built a statistical model that contained product and socio-technical factors as independent variables to assess the correlation with productivity, i.e., the number of commits performed in a given time. Then, we ran our model considering data of 25 open-source communities on GitHub. Results of our study indicate that cultural and geographical dispersion impact productivity, thus encouraging managers and practitioners to consider such aspects during all the phases of the software development lifecycle.

[C69] SEAA 2022

A Multivocal Literature Review of MLOps Tools and Features.*

48th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Gran Canaria, Spain.

Conference Software Quality Empirical Software Engineering G. Recupito, F. Pecorelli, G. Catolino, S. Moreschini, D. Di Nucci, F. Palomba, D. Tamburri.

Abstract. DevOps has become increasingly widespread, with companies employing its methods in different fields. In this context, MLOps automates Machine Learning pipelines by applying DevOps practices. Despite the high number of tools available and practitioners' strong interest in tool support for automating the steps of Machine Learning pipelines, little is known concerning MLOps tools and their functionalities. To fill this gap, we conducted a Multivocal Literature Review (MLR) to (i) extract tools that allow for and support the creation of MLOps pipelines and (ii) analyze their main characteristics and features to provide a comprehensive overview of their value. Overall, we investigate the functionalities of 13 MLOps tools. Our results show that most MLOps tools support the same features but apply different approaches that can bring different advantages, depending on user requirements.

[C68] SEAA 2022

A Preliminary Conceptualization and Analysis on Automated Static Analysis Tools for Vulnerability Detection in Android Apps.*

48th Euromicro Conference Series on Software Engineering and Advanced Applications (SEAA), Gran Canaria, Spain.

Conference Software Quality Empirical Software Engineering G. Giordano, F. Palomba, F. Ferrucci.

Abstract. The availability of dependable mobile apps is a crucial need for over three billion people who use apps daily for social and emergency connectivity. A key challenge for mobile developers concerns the detection of security-related issues. While a number of tools have been proposed over the years (especially for the Android operating system), we point out a lack of empirical investigations on the actual support provided by these tools; such investigations might guide developers in selecting the most appropriate instruments to improve their apps. In this paper, we propose a preliminary conceptualization of the vulnerabilities detected by three automated static analysis tools, namely AndroBugs2, Trueseeing, and Insider. We first derive a taxonomy of the issues detectable by the tools. Then, we run the tools against a dataset composed of 6,500 Android apps to investigate their detection capabilities in terms of frequency of detection of vulnerabilities and complementarity among tools. Key findings of the study show that current tools identify similar concerns, but they use different naming conventions. Perhaps more importantly, the tools only partially cover the most common vulnerabilities classified by the Open Web Application Security Project (OWASP) Foundation.

[C67] HCII 2022

AI-based Emotion Recognition to Study Users' Perception of Dark Patterns.*

24th International Conference on Human-Computer Interaction (HCII 2022), Virtual, 2022.

Conference Computer-Human Interaction S. Avolicino, M. Di Gregorio, M. Romano, G. Vitiello, F. Palomba, M. Sebillo.

Abstract. Dark Patterns are design patterns used to trick users into acting against their real interest. The web provides an infinite number of services accessible to anyone, which do not always promote a good user experience and are often structured with the aim of leading users to perform unwanted actions or discouraging them from making decisions that could damage the company. This is a very common practice, especially in neuromarketing. Human behavioral and perceptual patterns are cleverly exploited to achieve a specific goal. For this reason, dark pattern developers try to create an environment that encourages as many purchases as possible by stimulating the customer's unconscious. Among the areas in which these strategies are adopted is tourism: online travel agency websites promote "fake discounts" for the products/services they are selling and display inaccurate pricing information leading to incorrect pricing assumptions, thus misleading consumers. One of the goals of this work is to identify which dark patterns are most used in online travel agencies; once identified, they will be used to run scenarios that simulate booking a vacation online. During the execution of the tests, users will be filmed via webcam, tracking their expressions and emotions through AI-based facial recognition. Finally, the data obtained from the tests will be analyzed to study the emotions and feelings that users experience when confronted with dark patterns, to understand which users are more at risk and which types of dark patterns they are more vulnerable to.

[C66] GECCO 2022

A Bi-level Evolutionary Approach for the Multi-label Detection of Smelly Classes.*

The Genetic and Evolutionary Computation Conference (GECCO 2022), Boston, USA, 2022.

Conference Software Quality Empirical Software Engineering S. Boutaib, M. Elarbi, S. Bechikh, F. Palomba, L. Ben Said.

Abstract. This paper presents a new evolutionary method and tool called BMLDS (Bi-level Multi-Label Detection of Smells) that optimizes a population of classifier chains for the multi-label detection of smells. As the chain is sensitive to the labels' (i.e., smell types) order, the chains induction task is framed as a bi-level optimization problem, where the upper-level role is to search for the optimal order of each considered chain while the lower-level one is to generate the chains. This allows taking into consideration the interactions between smells in the multi-label detection process. The statistical analysis of the experimental results reveals the merits of our proposal with respect to several existing works.

[C65] CHASE 2022

A Preliminary Study on the Assignment of GitHub Issues to Issue Commenters and the Relationship with Social Smells.*

International Conference on Cooperative and Human Aspects of Software Engineering (CHASE 2022), Pittsburgh, USA, 2022.

Conference Socio-Technical Analytics Empirical Software Engineering H. Mumtaz, C. Paradis, F. Palomba, D. Tamburri, R. Kazman, K. Blincoe.

Abstract. GitHub is the world's largest software hosting platform. Its features affect millions of developers. Investigating the impact of GitHub features on software teams is essential to gain insights into features' usefulness. As a preliminary step in this direction, this paper explores the relationship between the use of one GitHub feature and the social structure of the projects that adopt the feature. We explore whether the feature is used and whether the feature is associated with positive or negative changes in the team’s social structure. In this paper, we report on a preliminary study of 13 projects that used the GitHub "assign issues to issue commenters" feature. We examine the social smells in the software teams before and after the introduction of this new feature using statistical and temporal analysis. Our results indicate that the usage of this feature varied across the analyzed projects. We also find that social smells that reflect low or missing communications (Organizational Silo and Missing Links) decrease in most of the projects that used the feature consistently. The results suggest that the social structure of the teams has a positive relationship with the feature adoption. Still, future research should study the feature’s impact (and its use cases) on other aspects and over longer time periods to learn its diverse and long-term benefits on the social structure of software projects.

[C64] ICPC 2022

Regularity or Anomaly? On The Use of Anomaly Detection for Fine-Grained Just-in-Time Defect Prediction.*

IEEE/ACM International Conference on Program Comprehension (ICPC 2022), Pittsburgh, USA, 2022.

Conference Software Quality Empirical Software Engineering F. Lomio, L. Pascarella, F. Palomba, V. Lenarduzzi.

Abstract. Fine-grained just-in-time defect prediction aims at identifying likely defective files within new commits pushed by developers onto a shared repository. Most of the techniques proposed in the literature are based on supervised learning, where machine learning algorithms are fed with historical data. One of the limitations of these techniques concerns the use of imbalanced data that contain too few defective samples to enable a proper learning phase. To overcome this problem, recent work has shown that anomaly detection methods can be used as an alternative to supervised learning, given that these do not necessarily need labelled samples. We aim at assessing how anomaly detection methods can be employed for the problem of fine-grained just-in-time defect prediction. We conduct an empirical investigation on 32 open-source projects, designing and evaluating three anomaly detection methods for fine-grained just-in-time defect prediction. Our results are negative: anomaly detection methods, taken alone, do not surpass the prediction performance of existing machine learning solutions.

[C63] ICSE 2022

Good Fences Make Good Neighbours? On the Impact of Cultural and Geographical Dispersion on Community Smells.*

IEEE/ACM International Conference on Software Engineering (ICSE 2022) - Software Engineering in Society Track, Pittsburgh, USA, 2022.

Conference Socio-Technical Analytics Empirical Software Engineering S. Lambiase, G. Catolino, D. Tamburri, A. Serebrenik, F. Palomba, F. Ferrucci.

Abstract. Software development is de facto a social activity that often involves people from all places joining forces globally. In such common instances, project managers must face social challenges, e.g., personality conflicts and language barriers, which often amount literally to "culture shock". In this paper, we seek to analyze and illustrate how cultural and geographical dispersion (that is, how diverse a community is in terms of its members' cultural attitudes and geographical collocation) influences the emergence of collaboration and communication problems in open-source communities, a.k.a. community smells, the socio-technical precursors of unforeseen, often nasty organizational conditions amounting collectively to the phenomenon called social debt. We perform an extensive empirical study on cultural characteristics of GitHub developers, and build a regression model relating the two types of dispersion (cultural and geographical) with the emergence of four types of community smells, i.e., Organizational Silo, Lone Wolf, Radio Silence, and Black Cloud. Results indicate that cultural and geographical factors influence collaboration and communication within open-source communities, to an extent which incites (or, even more interestingly, mitigates in some cases) community smells, e.g., Lone Wolf, in development teams. Managers can use these findings to address their own organizational structure and tentatively diagnose any nasty phenomena related to the conditions under study.

[C62] SANER 2022

Toward Understanding the Impact of Refactoring on Program Comprehension.*

IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Hawaii, USA, 2022.

 IEEE/TCSE Distinguished Paper Award

Conference Software Quality Empirical Software Engineering G. Sellitto, E. Iannone, Z. Codabux, V. Lenarduzzi, A. De Lucia, F. Palomba, F. Ferrucci.

Abstract. Software refactoring is the activity associated with developers changing the internal structure of source code without modifying its external behavior. The literature argues that refactoring might have beneficial and harmful implications for software maintainability, primarily when performed without the support of automated tools. This paper continues the narrative on the effects of refactoring by exploring the dimension of program comprehension, namely the property that describes how easy it is for developers to understand source code. We start our investigation by assessing the basic unit of program comprehension, namely program readability. Next, we set up a large-scale empirical investigation – conducted on 156 open-source projects – to quantify the impact of refactoring on program readability. First, we mine refactoring data and, for each commit involving a refactoring, we compute (i) the amount and type(s) of refactoring actions performed and (ii) eight state-of-the-art program comprehension metrics. Afterwards, we build statistical models relating the various refactoring operations to each of the readability metrics considered to quantify the extent to which each refactoring impacts the metrics in either a positive or negative manner. The key results are that refactoring has a notable impact on most of the readability metrics considered.

[C61] SANER 2022

On the Evolution of Inheritance and Delegation Mechanisms and Their Impact on Code Quality.*

IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Hawaii, USA, 2022.

Conference Software Quality Empirical Software Engineering G. Giordano, A. Fasulo, G. Catolino, F. Palomba, F. Ferrucci, C. Gravino.

Abstract. Source code reuse is considered one of the holy grails of modern software development. Indeed, it has been widely demonstrated that this activity decreases software development and maintenance costs while increasing its overall trustworthiness. The Object-Oriented (OO) paradigm provides different internal mechanisms to favor code reuse, i.e., specification inheritance, implementation inheritance, and delegation. While previous studies investigated how inheritance relations impact source code quality, there is still a lack of understanding of their evolutionary aspects and, more particularly, of how these mechanisms may impact source code quality over time. To bridge this gap of knowledge, this paper proposes an empirical investigation into the evolution of specification inheritance, implementation inheritance, and delegation and their impact on the variability of source code quality attributes. First, we assess how the implementation of those mechanisms varies over 15 releases of three software systems. Second, we devise a statistical approach with the aim of understanding how inheritance and delegation let source code quality (as indicated by the severity of code smells) vary in either a positive or negative manner. The key results of the study indicate that inheritance and delegation evolve over time, but not in a statistically significant manner. At the same time, their evolution often leads code smell severity to be reduced, hence possibly contributing to improved code maintainability.

[C60] SANER 2022

Gender Diversity and Community Smells: A Double-Replication Study on Brazilian Software Teams.*

IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Hawaii, USA, 2022.

Conference Socio-Technical Analytics Empirical Software Engineering C. Sarmento, T. Massoni, A. Serebrenik, G. Catolino, D. Tamburri, F. Palomba.

Abstract. Social debts in software teams are gaining increasing attention from the research community due to their potential adverse effects on software quality. For instance, community smells are indicators of sub-optimal organizational structures and may well lead to the emergence of social debt. Previous studies analyzed which factors influence the emergence/mitigation of such smells. In particular, studies by Catolino et al. showed how factors related to team composition, particularly gender diversity, correlated with the mitigation of community smells. However, a confirmation survey of 60 practitioners suggested that these results were not aligned with the experts’ perceptions. In addition, in a separate survey, Catolino et al. collected the most common team refactoring strategies for those community smells. In this work we replicate two studies by those authors, focusing on Brazilian software teams; culture-specific expectations on the behavior of people of different genders might have affected the perception of the importance of gender diversity and refactoring strategies when mitigating community smells. We translated the survey instrument used by Catolino et al. to Brazilian Portuguese and recruited 184 Brazilian developers. Results did not show significant differences from the original study; indeed, participants perceived gender diversity as less valuable to mitigate community smells than factors such as experience or team size. Additionally, we performed a qualitative analysis of an open question within the questionnaire for the refactoring strategies. Brazilian developers agree with the original studies for most smells, mainly promoting restructuring communities, creating a communication plan, and mentoring. We believe these results provide further evidence on the problem and its implications when managing software teams, avoiding technical debt and maintenance issues due to team communication and coordination problems.

[C59] QRS 2021

A Possibilistic Evolutionary Approach to Handle the Uncertainty of Software Metrics Thresholds in Code Smells Detection.*

IEEE International Conference on Software Quality, Reliability, and Security, Hainan Island, China, 2021.

Conference Software Quality Empirical Software Engineering S. Boutaib, M. Elarbi, S. Bechikh, F. Palomba, L. Ben Said.

Abstract. A code smell detection rule is a combination of metrics with their corresponding crisp thresholds and labels. The goal of this paper is to deal with the uncertainty of metrics' thresholds; usually, such thresholds cannot be determined exactly to judge the smelliness of a particular software class. To deal with this issue, we first propose to encode each metric value into a binary possibility distribution with respect to a threshold computed from a discretization technique, using the Possibilistic C-means classifier. Then, we propose ADIPOK-UMT as an evolutionary algorithm that evolves a population of PK-NN classifiers for the detection of smells under thresholds' uncertainty. The experimental results reveal that the possibility distribution-based encoding allows the implicit weighting of software metrics (features) with respect to their computed discretization thresholds. Moreover, ADIPOK-UMT is shown to outperform four relevant state-of-the-art approaches on a set of commonly adopted benchmark software systems.

[C58] ICSE 2021

Understanding Community Smells Variability: A Statistical Approach.*

IEEE/ACM International Conference on Software Engineering (ICSE 2021) - Software Engineering in Society Track, Madrid, Spain, 2021.

G. Catolino, F. Palomba, D. Tamburri, A. Serebrenik. Conference Socio-Technical Analytics Empirical Software Engineering

Abstract. Social debt has been defined as the presence in a project of costly sub-optimal organizational conditions, e.g., non-cohesive development communities whose members have communication or coordination issues. Community smells are indicators of such sub-optimal organizational structures and may well lead to social debt. Recently, several studies analyzed actors affecting presence of community smells and their harmfulness, or proposed refactoring strategies to mitigate them. However, to the best of our knowledge, there is still a limited understanding of the factors influencing the variability of community smells, namely how they increase/decrease in magnitude over time. In this paper, we aim at conducting the first statistical experimentation on the matter, by analyzing how a set of 40 socio-technical factors, e.g., turnover or communicability, impact the variability of four community smells on a dataset composed of 60 open-source communities. The results of the study reveal that communicability is, in most cases, important to reduce the risk of an increase of community smell instances, while broadening the collaboration network does not always have a positive effect.

[C57] ESEC/FSE 2020

tsDetect: An Open Source Test Smells Detection Tool.*

ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Sacramento (California), USA, 2020.

Conference Software Testing Software Quality A. Peruma, K. Almalki, C. Newman, M. Mkaouer, A. Ouni, F. Palomba.
A. Peruma, K. Almalki, C. Newman, M. Mkaouer, A. Ouni, F. Palomba. Conference Software Testing Empirical Software Engineering

Abstract. The test code, just like production source code, is subject to bad design and programming practices, also known as smells. The presence of test smells in a software project may affect the quality, maintainability, and extendability of test suites making them less effective in finding potential faults and quality issues in the project's production code. In this paper, we introduce tsDetect, an automated test smell detection tool for Java software systems that uses a set of detection rules to locate existing test smells in test code. We evaluate the effectiveness of tsDetect on a benchmark of 65 unit test files containing instances of 19 test smell types. Results show that tsDetect achieves a high detection accuracy with an average precision score of 96% and an average recall score of 97%. tsDetect is publicly available, with a demo video, at: https://testsmells.github.io/

[C56] ICSME 2020

Pizza versus Pinsa: On the Perception and Measurability of Unit Test Code Quality.*

IEEE International Conference on Software Maintenance and Evolution, Adelaide, Australia, 2020.

G. Grano, C. De Iaco, F. Palomba, H. Gall. Conference Software Testing Empirical Software Engineering

Abstract. Test cases are an essential asset to evaluate software quality. The research community has provided various alternatives to help developers assess the quality of tests, like code or mutation coverage. Despite the effort spent so far, however, little is known on how practitioners perceive unit test code quality and whether the existing metrics reflect their perception. This paper aims at addressing this gap of knowledge. We first conduct semi-structured interviews and surveys with practitioners to establish a taxonomy of relevant factors for unit test quality and collect a dataset of tests rated by developers based on their perceived quality. Then, we devise a statistical model to measure how the metrics available in literature reflect the perceived quality of test cases. The findings of our study show that readability and maintainability are the key aspects for developers to diagnose the outcome of test cases and drive debugging activities. On the contrary, code coverage metrics are necessary but not sufficient to evaluate the capability of tests. Finally, we discover that available metrics are effective in characterizing poor-quality tests, while limited when distinguishing high-quality ones.

[C55] ICSME 2020

The Making of Accessible Android Applications: An Empirical Study on the State of the Practice.*

IEEE International Conference on Software Maintenance and Evolution - Registered Report, Adelaide, Australia, 2020.

M. Di Gregorio, D. Di Nucci, F. Palomba, G. Vitiello. Conference Software Quality Computer-Human Interaction

Abstract. Context. Nowadays, mobile applications represent the principal means to enable human interaction. Being so pervasive, these applications should be made usable for all users: accessibility collects the guidelines that developers should follow to include features allowing users with disabilities (e.g., visual impairments) to better interact with an application. Problem. While research in this field is gaining interest, there is still a notable lack of knowledge on how developers practically deal with the problem: (i) whether they are aware and take accessibility guidelines into account when developing apps, (ii) which guidelines are harder for them to implement, and (iii) which tools they use to be supported in this task. Objective. To bridge the gap of knowledge on the state of the practice concerning the accessibility of mobile applications. Method. Adopting a mixed-method research approach, we aim to (i) verify how accessibility guidelines are implemented in mobile applications through a coding strategy and (ii) survey mobile developers on the issues and challenges of dealing with accessibility in practice. Limitations. Threats are represented by the size of the app sample and the number of answers to our survey study.

[C54] AVI 2020

VITRuM - A Plug-In for the Visualization of Test-Related Metrics.*

ACM International Conference on Advanced Visual Interfaces, Ischia, Italy, 2020.

F. Pecorelli, G. Di Lillo, F. Palomba, A. De Lucia. Conference Software Testing Computer-Human Interaction

Abstract. Software testing is the first weapon against software faults, used by developers to preventively locate implementation errors in the exercised production code that may cause critical failures to the inner-working of software systems. According to recent findings, the effectiveness of testing might be due not only to its ability to cover the production code but also to some other properties, like code quality. Among other aspects, the literature reported that an advanced visualization of test-related metrics, e.g., test code coverage on production code, turns out to be a key strength for developers when dealing with software faults. In this paper, we propose VITRuM (VIsualization of Test-Related Metrics), an IntelliJ plug-in able to provide developers with an advanced visual interface of both static and dynamic test-related metrics that has the potential of making them more able to diagnose production code faults. The plug-in is available in the official JetBrains Plugins Repository. A video showing the tool in action is available at https://youtu.be/kFE81eYPgUg.

[C53] AVI 2020

cASpER: A Plug-in for Automated Code Smell Detection and Refactoring.*

ACM International Conference on Advanced Visual Interfaces, Ischia, Italy, 2020.

M. De Stefano, M. Gambardella, F. Pecorelli, F. Palomba, A. De Lucia. Conference Software Quality Computer-Human Interaction

Abstract. During software evolution, code is inevitably subject to continuous changes that are often performed by developers within short and strict deadlines. As a consequence, good design practices are often sacrificed, possibly leading to the introduction of sub-optimal design or implementation solutions, the so-called code smells. Several studies have shown that the presence of code smells makes the source code more change- and fault-prone, reduces productivity, and causes greater rework and more significant design efforts for developers. Refactoring is the practice that developers may use to remove code smells without changing the external behavior of the source code. However, it requires much time and effort and is poorly automated, often leading developers to prefer keeping low-quality code instead of spending time in designing and performing refactoring operations. To mitigate this problem and support developers throughout the process of code smell identification and refactoring, in this paper we present cASpER, an IntelliJ IDEA plugin that provides visual and semi-automatic support for detecting and refactoring four different types of code smells.

[C52] AVI 2020

Counterterrorism for Cyber-Physical Spaces: A Computer Vision Approach.*

ACM International Conference on Advanced Visual Interfaces, Ischia, Italy, 2020.

G. Cascavilla, J. Slabber, F. Palomba, D. Di Nucci, D. Tamburri, W.J. van den Heuvel. Conference Computer-Human Interaction

Abstract. Simulating terrorist scenarios in cyber-physical spaces, that is, urban open or (semi-)closed spaces combined with a cyber-physical systems counterpart, is challenging given the context and variables therein. This paper addresses the aforementioned issue with Alter, a framework featuring computer vision and Generative Adversarial Neural Networks (GANs) over terrorist scenarios. We obtained the data for the terrorist scenarios by creating a synthetic dataset, exploiting the Grand Theft Auto V (GTAV) videogame, and the Unreal Game Engine behind it, in combination with OpenStreetMap data. The results of the proposed approach show its feasibility to predict criminal activities in cyber-physical spaces. Moreover, the usage of our synthetic scenarios elicited from GTAV is promising in building datasets for cybersecurity and Cyber-Threat Intelligence (CTI) featuring simulated videogaming platforms. We learned that local authorities can simulate terrorist scenarios for their own cities based on previous or related reference and this helps them in three ways: (1) better determine the necessary security measures; (2) better use the expertise of the authorities; (3) refine preparedness scenarios and drills for sensitive areas.

[C51] ICPC 2020

Just-In-Time Test Smell Detection and Refactoring: The DARTS Project.*

IEEE/ACM International Conference on Program Comprehension (ICPC 2020) - Tool Demo Track, Seoul, South Korea, 2020.

S. Lambiase, A. Cupito, F. Pecorelli, A. De Lucia, F. Palomba. Conference Software Quality

Abstract. Test smells represent sub-optimal design or implementation solutions applied when developing test cases. Previous research has shown that these smells may decrease both maintainability and effectiveness of tests and, as such, researchers have been devising methods to automatically detect them. Nevertheless, there is still a lack of tools that developers can use within their integrated development environment to identify test smells and refactor them. In this paper, we present DARTS (Detection And Refactoring of Test Smells), an IntelliJ plug-in which (1) implements a state-of-the-art detection mechanism to detect instances of three test smell types, i.e., General Fixture, Eager Test, and Lack of Cohesion of Test Methods, at commit-level and (2) enables their automated refactoring through the integrated APIs provided by IntelliJ.

[C50] ICPC 2020

Refactoring Android-specific Energy Smells: A Plugin for Android Studio.*

IEEE/ACM International Conference on Program Comprehension (ICPC 2020) - Tool Demo Track, Seoul, South Korea, 2020.

E. Iannone, F. Pecorelli, D. Di Nucci, F. Palomba, A. De Lucia. Conference Mobile Apps Evolution Software Quality

Abstract. Mobile applications are major means to perform daily actions, including social and emergency connectivity. However, their usability is threatened by energy consumption that may be impacted by code smells, i.e., symptoms of bad implementation and design practices. In particular, researchers derived a set of mobile-specific code smells resulting in increased energy consumption of mobile apps and removing such smells through refactoring can mitigate the problem. In this paper, we extend and revise aDoctor, a tool that we previously implemented to identify energy-related smells. On the one hand, we present and implement automated refactoring solutions to those smells. On the other hand, we make the tool completely open-source and available in Android Studio as a plugin published in the official store. The video showing the tool in action is available at: https://www.youtube.com/watch?v=1c2EhVXiKis

[C49] ICPC 2020

OpenSZZ: A Free, Open-Source, Web-Accessible Implementation of the SZZ Algorithm.*

IEEE/ACM International Conference on Program Comprehension (ICPC 2020) - Tool Demo Track, Seoul, South Korea, 2020.

V. Lenarduzzi, F. Palomba, D. Taibi, D. Tamburri. Conference Software Quality

Abstract. The accurate identification of defect-inducing commits represents a key problem for researchers interested in studying the naturalness of defects and defining defect prediction models. To tackle this problem, software engineering researchers have relied on and proposed several implementations of the well-known Sliwerski-Zimmermann-Zeller (SZZ) algorithm. Despite its popularity and wide usage, no open-source, publicly available, and web-accessible implementation of the algorithm has been proposed so far. In this paper, we prototype and make available one such implementation for further use by practitioners and researchers alike. The evaluation of the proposed prototype showed competitive results and lays the foundation for future work. This paper outlines our prototype, illustrating its usage and reporting on its evaluation in action.

[C48] MSR 2020

Developer-Driven Code Smell Prioritization.*

IEEE/ACM International Conference on Mining Software Repositories (MSR 2020), Seoul, South Korea, 2020.

F. Pecorelli, F. Palomba, F. Khomh, A. De Lucia. Conference Software Quality Empirical Software Engineering

Abstract. Code smells are symptoms of poor implementation choices applied during software evolution. While previous research has devoted effort in the definition of automated solutions to detect them, still little is known on how to support developers when prioritizing them. Some works attempted to deliver solutions that can rank smell instances based on their severity, computed on the basis of software metrics. However, this may not be enough since it has been shown that the recommendations provided by current approaches do not take the developer's perception of design issues into account. In this paper, we perform a first step toward the concept of developer-driven code smell prioritization and propose an approach based on machine learning able to rank code smells according to the perceived criticality that developers assign to them. We evaluate our technique in an empirical study to investigate its accuracy and the features that are more relevant for classifying the developer's perception. Finally, we compare our approach with a state-of-the-art technique. Key findings show that our solution has an F-Measure up to 85% and outperforms the baseline approach.

[C47] ICPC 2020

Testing of Mobile Applications in the Wild: A Large-Scale Empirical Study on Android Apps.*

IEEE/ACM International Conference on Program Comprehension (ICPC 2020), Seoul, South Korea, 2020.

F. Pecorelli, G. Catolino, F. Ferrucci, A. De Lucia, F. Palomba. Conference Mobile Apps Evolution Empirical Software Engineering

Abstract. Nowadays, mobile applications (a.k.a., apps) are used by over two billion users for every type of need, including social and emergency connectivity. Their pervasiveness in today's world has inspired the software testing research community in devising approaches to allow developers to better test their apps and improve the quality of the tests being developed. In spite of this research effort, we still notice a lack of empirical studies aiming at assessing the actual quality of test cases developed by mobile developers: this perspective could provide evidence-based findings on the current status of testing in the wild as well as on the future research directions in the field. As such, we performed a large-scale empirical study targeting 1,780 open-source Android apps and aiming at assessing (1) the extent to which these apps are actually tested, (2) how well-designed are the available tests, and (3) what is their effectiveness. The key results of our study show that mobile developers still tend not to properly test their apps. Furthermore, we discovered that the test cases of the considered apps have a low (i) design quality, both in terms of test code metrics and test smells, and (ii) effectiveness when considering code coverage as well as assertion density.

[C46] ICSE 2020

Refactoring Community Smells in the Wild: The Practitioner's Field Manual.*

IEEE/ACM International Conference on Software Engineering (ICSE 2020) - Software Engineering in Society Track, Seoul, South Korea, 2020.

G. Catolino, F. Palomba, D. Tamburri, A. Serebrenik, F. Ferrucci. Conference Socio-Technical Analytics Empirical Software Engineering

Abstract. Community smells have been defined as sub-optimal organizational structures that may lead to social debt. Previous studies have shown that they are highly diffused in both open- and closed-source projects, are perceived as harmful by practitioners, and can even lead to the introduction of technical debt in source code. Despite the presence of this body of research, little is known on the practitioners’ perceived prominence of community smells in practice as well as on the strategies adopted to deal with them. This paper aims at bridging this gap by proposing an empirical study in which 76 software practitioners are inquired on (i) the prominence of four well-known community smells, i.e., Organizational Silo, Black Cloud, Lone Wolf, and Radio Silence, in their contexts and (ii) the methods they adopted to "refactor" them. Our results first reveal that community smells frequently manifest themselves in software projects and, more importantly, there exist specific refactoring practices to deal with each of the considered community smells.

[C45] CHI 2020

UI Dark Patterns and Where to Find Them: A Study on Mobile Applications and User Perception.*

38th ACM CHI Conference on Human Factors in Computing Systems, Honolulu (Hawaii), USA, 2020.

L. Di Geronimo, L. Braz, E. Fregnan, F. Palomba, A. Bacchelli. Conference Mobile Apps Evolution Computer-Human Interaction

Abstract. A Dark Pattern (DP) is an interface maliciously crafted to deceive users into performing actions they did not mean to do. Although design experts have reported on DPs extensively, little effort has been made to study how pervasive they are, especially in mobile applications. In this work, we analyze DPs in 240 popular apps and conduct an online study with 589 users on how they perceive DPs. The results of the analysis showed that 95% of apps contain one or more forms of DPs and, on average, popular applications include at least seven different types of deceiving UIs. The online study shows that most users do not recognize DPs, and they would change their behavior on app usage once informed about them. We discuss the impact of our work and what measures could be applied to alleviate malicious design issues.

[C44] CASCON 2019

On the Distribution of Test Smells in Open Source Android Applications: An Exploratory Study.*

29th International Conference on Computer Science and Software Engineering, Ontario, Canada, 2019.

A. Peruma, K. Almalki, C. Newman, M. Mkaouer, A. Ouni, F. Palomba. Conference Software Testing Empirical Software Engineering

Abstract. The impact of bad programming practices, such as code smells, in production code has been the focus of numerous studies in software engineering. Like production code, unit tests are also affected by bad programming practices which can have a negative impact on the quality and maintenance of a software system. While several studies addressed code and test smells in desktop applications, there is little knowledge of test smells in the context of mobile applications. In this study, we extend the existing catalog of test smells by identifying and defining new smells and survey over 40 developers who confirm that our proposed smells are bad programming practices in test suites. Additionally, we perform an empirical study on the occurrences and distribution of the proposed smells on 656 open-source Android apps. Our findings show a widespread occurrence of test smells in apps. We also show that apps tend to exhibit test smells early in their lifetime with different degrees of co-occurrences on different smell types. This empirical study demonstrates that test smells can be used as an indicator for necessary preventive software maintenance for test suites.

[C43] ICSME 2019

How the Experience of Development Teams Relates to Assertion Density of Test Classes.*

35th IEEE International Conference on Software Maintenance and Evolution (ICSME), Cleveland, USA, 2019.

G. Catolino, F. Palomba, A. Zaidman, F. Ferrucci. Conference Software Testing Empirical Software Engineering

Abstract. The impact of developers’ experience on several development practices has been widely investigated in the past. One of the most promising research fields is software testing, as many researchers found significant correlations between developers’ experience and testing effectiveness. In this paper, we aim at further studying this relation, by focusing on how development teams’ experience is associated with the assertion density, i.e., the number of assertions per test class KLOC, that has previously been shown as an effective way to decrease fault density. We perform a mixed-methods empirical study. First, we devise a statistical model relating development teams’ experience and other control factors to the assertion density of test classes belonging to 12 software projects. This model enables us to investigate whether experience comes out as a statistically significant factor to explain assertion density. Second, we contrast the statistical findings with a survey study conducted with 57 developers, who were asked their opinions on how developer’s experience is related to the way they add assertions in test code. Our findings suggest the existence of a relationship: on the one hand, the development team’s experience is a statistically significant factor in most of the systems that we have investigated; on the other hand, developers confirm the importance of experience and team composition for the effective testing of production code.

[C42] ICSME 2019

Adoption, Support, and Challenges of Infrastructure-as-Code: Insights from Industry.*

35th IEEE International Conference on Software Maintenance and Evolution (ICSME), Industrial Track, Cleveland, USA, 2019.

M. Guerriero, M. Garriga, D. A. Tamburri, F. Palomba. Conference Empirical Software Engineering

Abstract. Infrastructure-as-code (IaC) is the DevOps tactic of managing and provisioning infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. From a maintenance and evolution perspective, the topic has piqued the interest of practitioners and academics alike, given the relative scarcity of supporting patterns, best practices, tools, and software engineering techniques. Using the data coming from 44 semi-structured interviews in as many companies, in this paper we shed light on the state of the practice in the adoption of IaC and the key software engineering challenges in the field. Particularly, we investigate (i) how practitioners adopt and develop IaC, (ii) which support is currently available, i.e., the typically used tools and their advantages/disadvantages, and (iii) what are the practitioner’s needs when dealing with IaC development, maintenance, and evolution. Our findings clearly highlight the need for more research in the field: the support provided by currently available tools is still limited, and developers feel the need of novel techniques for testing and maintaining IaC code.

[C41] ESEC/FSE 2019 Recommended

Understanding Flaky Tests: The Developer's Perspective.*

27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Tallinn, Estonia, 2019.

Conference Software Testing Empirical Software Engineering M. Eck, F. Palomba, M. Castelluccio, A. Bacchelli.

Abstract. Flaky tests are software tests that exhibit a seemingly random outcome (pass or fail) when run against the same, identical code. Previous work has examined fixes to flaky tests and has proposed automated solutions to locate as well as fix flaky tests; we complement it by examining the perceptions of software developers about the nature, relevance, and challenges of this phenomenon. We asked 21 professional developers to classify 200 flaky tests they previously fixed, in terms of the nature of the flakiness, the origin of the flakiness, and the fixing effort. We complement this analysis with information about the fixing strategy. Subsequently, we conducted an online survey with 121 developers with a median industrial programming experience of five years. Our research shows that: (i) flakiness is due to several different causes, four of which have never been reported before, despite being the most costly to fix; (ii) flakiness is perceived as significant by the vast majority of developers, regardless of their team’s size and project’s domain, and it can have effects on resource allocation, scheduling, and the perceived reliability of the test suite; and (iii) the challenges developers report facing mostly regard the reproduction of the flaky behavior and the identification of the cause of the flakiness.
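
To make the definition of flakiness concrete, the sketch below reruns a test several times against unchanged code and flags it when both outcomes are observed. This rerun check is only a common illustration, not the method used in the paper; `is_flaky` and `make_state_dependent_test` are hypothetical names introduced for the example.

```python
from typing import Callable


def is_flaky(test: Callable[[], bool], runs: int = 10) -> bool:
    """Return True if repeated runs of `test` yield both pass and fail."""
    outcomes = {test() for _ in range(runs)}
    return len(outcomes) > 1


def make_state_dependent_test() -> Callable[[], bool]:
    """Build a hypothetical test whose result depends on leftover state."""
    state = {"calls": 0}

    def test() -> bool:
        state["calls"] += 1
        # Fails on every third execution, although the code never changes.
        return state["calls"] % 3 != 0

    return test


print(is_flaky(make_state_dependent_test()))  # True
print(is_flaky(lambda: True))                 # False
```

A rerun check of this kind can only confirm flakiness; a test that passes in every rerun may still be flaky under conditions the reruns did not exercise.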

BibTeX
@article{eck2019understanding,
  title={Understanding Flaky Tests: The Developer’s Perspective},
  author={Eck, Moritz and Palomba, Fabio and Castelluccio, Marco and Bacchelli, Alberto},
  year={2019}
}
[C40] MSR 2019

On the Effectiveness of Manual and Automatic Unit Test Generation: Ten Years Later.*

IEEE/ACM Working Conference on Mining Software Repositories (MSR 2019), Montreal, Canada, 2019.

Conference Software Testing Empirical Software Engineering D. Serra, G. Grano, F. Palomba, F. Ferrucci, H. Gall, A. Bacchelli.

Abstract. Good unit tests play a paramount role when it comes to fostering and evaluating software quality. However, writing effective tests is an extremely costly and time-consuming practice. To reduce such a burden for developers, researchers devised ingenious techniques to automatically generate test suites for existing code bases. Nevertheless, how automatically generated test cases fare against manually written ones is an open research question. In 2008, Bacchelli et al. conducted an initial case study comparing automatically and manually generated test suites. Since in the last ten years we have witnessed a huge amount of work on novel approaches and tools for automatic test generation, in this paper we revisit their study using current tools and complement their research method by evaluating these tools’ ability to find regressions.

BibTeX
@inproceedings{serra2019effectiveness,
  title={On the effectiveness of manual and automatic unit test generation: ten years later},
  author={Serra, Domenico and Grano, Giovanni and Palomba, Fabio and Ferrucci, Filomena and Gall, Harald C and Bacchelli, Alberto},
  booktitle={Proceedings of the 16th International Conference on Mining Software Repositories},
  pages={121--125},
  year={2019},
  organization={IEEE Press}
}
[C39] ICPC 2019

Comparing Machine Learning and Heuristic Approaches for Metric-Based Code Smell Detection.*

IEEE/ACM International Conference on Program Comprehension (ICPC 2019), Montreal, Canada, 2019.

Conference Software Quality Empirical Software Engineering F. Pecorelli, F. Palomba, D. Di Nucci, A. De Lucia.

Abstract. Code smells represent poor implementation choices performed by developers when enhancing source code. Their negative impact on source code maintainability and comprehensibility has been widely shown in the past and several techniques to automatically detect them have been devised. Most of these techniques are based on heuristics, that is, they compute a set of code metrics and combine them by creating detection rules; while they have a reasonable accuracy, a recent trend is represented by the use of machine learning where code metrics are used as predictors of the smelliness of code artefacts. Despite the recent advances in the field, there is still a noticeable lack of knowledge of whether machine learning can actually be more accurate than traditional heuristic-based approaches. To fill this gap, in this paper we propose a large-scale study to empirically compare the performance of heuristic-based and machine-learning-based techniques for metric-based code smell detection. We consider five code smell types and compare machine learning models with DECOR, a state-of-the-art heuristic-based approach. Key findings emphasize the need for further research aimed at improving the effectiveness of both machine learning and heuristic approaches for code smell detection: while DECOR generally achieves better performance than a machine learning baseline, its precision is still too low to make it usable in practice.

[C38] ICSE 2019

Gender Diversity and Women in Software Teams: How Do They Affect Community Smells?*

IEEE/ACM International Conference on Software Engineering (ICSE 2019) - Software Engineering in Society Track, Montreal, Canada, 2019.

Invited for the Special Issue

Conference Socio-Technical Analytics Empirical Software Engineering G. Catolino, F. Palomba, D. A. Tamburri, A. Serebrenik, F. Ferrucci.

Abstract. As social as software engineers are, there is a known and established gender imbalance in our community structures, regardless of their open- or closed-source nature. To shed light on the actual benefits of achieving such balance, this empirical study looks into the relations between such balance and the occurrence of community smells, that is, sub-optimal circumstances and patterns across the software organizational structure. Examples of community smells are Organizational Silo effects (overly disconnected sub-groups) or Lone Wolves (defiant community members). Results indicate that the presence of women generally reduces the amount of community smells. We conclude that women are instrumental to reducing community smells in software development teams.

BibTeX
@inproceedings{catolino2019gender,
  title={Gender diversity and women in software teams: How do they affect community smells?},
  author={Catolino, Gemma and Palomba, Fabio and Tamburri, Damian A and Serebrenik, Alexander and Ferrucci, Filomena},
  booktitle={Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Society},
  pages={11--20},
  year={2019},
  organization={IEEE Press}
}
[C37] ICSE 2019

Test-Driven Code Review: An Empirical Study.*

IEEE/ACM International Conference on Software Engineering (ICSE 2019), Montreal, Canada, 2019.

Conference Software Testing Empirical Software Engineering D. Spadini, F. Palomba, T. Baum, S. Hanenberg, M. Bruntink, A. Bacchelli.

Abstract. Test-Driven Code Review (TDR) is a code review practice in which a reviewer inspects a patch by examining the changed test code before the changed production code. Although this practice has been mentioned positively by practitioners in informal literature and interviews, there is no systematic knowledge on its effects, prevalence, problems, and advantages. In this paper, we aim at empirically understanding whether this practice has an effect on code review effectiveness and how developers perceive TDR. We conduct (i) a controlled experiment with 93 developers who perform more than 150 reviews, and (ii) 9 semi-structured interviews and a survey with 103 respondents to gather information on how TDR is perceived. Key results from the experiment show that developers adopting TDR find the same proportion of defects in production code, but more in test code, at the expense of fewer maintainability issues in production code. Furthermore, we found that most developers prefer to review production code as they deem it more important and tests should follow from it. Moreover, widespread poor test code quality and the lack of tool support hinder the adoption of TDR.

[C36] CSCW 2018

Information Needs in Contemporary Code Review.*

ACM Conference on Computer Supported Cooperative Work (CSCW 2018), New York, USA, 2018.

CSCW 2018 Best Paper Honorable Mention

Conference Software Quality L. Pascarella, D. Spadini, F. Palomba, M. Bruntink, A. Bacchelli.

Abstract. Contemporary code review is a widespread practice used by software engineers to maintain high software quality and share project knowledge. However, conducting proper code review takes time and developers often have limited time for review. In this paper, we aim at investigating the information that reviewers need to conduct a proper code review, to better understand this process and how research and tool support can make developers become more effective and efficient reviewers. Previous work has provided evidence that a successful code review process is one in which reviewers and authors actively participate and collaborate. In these cases, the threads of discussions that are saved by code review tools are a precious source of information that can be later exploited for research and practice. In this paper, we focus on this source of information as a way to gather reliable data on the aforementioned reviewers’ needs. We manually analyze 900 code review comments from three large open-source projects and organize them in categories by means of a card sort. Our results highlight the presence of seven high-level information needs, such as knowing the uses of methods and variables declared/modified in the code under review. Based on these results we suggest ways in which future code review tools can better support collaboration and the reviewing task.

BibTeX
@article{pascarella2018information,
  title={Information needs in contemporary code review},
  author={Pascarella, Luca and Spadini, Davide and Palomba, Fabio and Bruntink, Magiel and Bacchelli, Alberto},
  journal={Proceedings of the ACM on Human-Computer Interaction},
  volume={2},
  number={CSCW},
  pages={135},
  year={2018},
  publisher={ACM}
}
[C35] ASE 2018

Mining File Histories: Should We Consider Branches?*

International Conference on Automated Software Engineering (ASE 2018), Montpellier, France, 2018.

Conference Empirical Software Engineering V. Kovalenko, F. Palomba, A. Bacchelli.

Abstract. Modern distributed version control systems, such as Git, offer support for branching — the possibility to develop parts of software outside the master trunk. Consideration of the repository structure in Mining Software Repository (MSR) studies requires a thorough approach to mining, but there is no well-documented, widespread methodology regarding the handling of merge commits and branches. Moreover, there is still a lack of knowledge of the extent to which considering branches during MSR studies impacts the results of the studies. In this study, we set out to evaluate the importance of proper handling of branches when calculating file modification histories. We analyze over 1,400 Git repositories of four open source ecosystems and compute modification histories for over two million files, using two different algorithms. One algorithm only follows the first parent of each commit when traversing the repository, the other returns the full modification history of a file across all branches. We show that the two algorithms consistently deliver different results, but the scale of the difference varies across projects and ecosystems. Further, we evaluate the importance of accurate mining of file histories by comparing the performance of common techniques that rely on file modification history — reviewer recommendation, change recommendation, and defect prediction — for two algorithms of file history retrieval. We find that considering full file histories leads to an increase in the techniques’ performance that is rather modest.
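
The difference between the two traversal strategies described in the abstract can be sketched on a toy commit graph. The commit identifiers, file names, and function names below are assumptions made for illustration; the sketch does not reproduce the study's actual mining algorithms.

```python
from typing import Dict, List, Set

# Hypothetical toy history: commit id -> parent ids (first parent listed first).
PARENTS: Dict[str, List[str]] = {
    "c1": [],
    "c2": ["c1"],
    "b1": ["c1"],        # commit made on a side branch
    "m1": ["c2", "b1"],  # merge commit whose first parent is the trunk
}

# Hypothetical mapping: commit id -> files modified by that commit.
TOUCHED: Dict[str, Set[str]] = {
    "c1": {"A.java"},
    "c2": {"B.java"},
    "b1": {"A.java"},
    "m1": set(),
}


def first_parent_history(head: str, path: str) -> List[str]:
    """Collect commits touching `path`, following only first parents."""
    commits: List[str] = []
    current = head
    while current is not None:
        if path in TOUCHED[current]:
            commits.append(current)
        parents = PARENTS[current]
        current = parents[0] if parents else None
    return commits


def full_history(head: str, path: str) -> List[str]:
    """Collect commits touching `path` across all branches reachable from head."""
    seen: Set[str] = set()
    stack = [head]
    commits: List[str] = []
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if path in TOUCHED[commit]:
            commits.append(commit)
        stack.extend(PARENTS[commit])
    return sorted(commits)


print(first_parent_history("m1", "A.java"))  # ['c1']
print(full_history("m1", "A.java"))          # ['b1', 'c1']
```

In this toy graph the first-parent traversal misses the change made on the side branch (`b1`), which is the kind of discrepancy the study quantifies on real repositories.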

BibTeX
@inproceedings{kovalenko2018mining,
  title={Mining file histories: should we consider branches?},
  author={Kovalenko, Vladimir and Palomba, Fabio and Bacchelli, Alberto},
  booktitle={Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering},
  pages={202--213},
  year={2018},
  organization={ACM}
}
[C34] ASE 2018

Continuous Code Quality: Are We (Really) Doing That?*

International Conference on Automated Software Engineering (ASE 2018), Montpellier, France, 2018.

Conference Software Quality Empirical Software Engineering C. Vassallo, F. Palomba, A. Bacchelli, H. Gall.

Abstract. Continuous Integration (CI) is a software engineering practice where developers constantly integrate their changes to a project through an automated build process. The goal of CI is to provide developers with prompt feedback on several quality dimensions after each change. Indeed, previous studies provided empirical evidence on a positive association between properly following CI principles and source code quality. A core principle behind CI, Continuous Code Quality (also known as CCQ, which includes automated testing and automated code inspection), may appear simple and effective, yet we know little about its practical adoption. In this paper, we propose a preliminary empirical investigation aimed at understanding how rigorously practitioners follow CCQ. Our study reveals a strong dichotomy between theory and practice: developers do not perform continuous inspection but rather control for quality only at the end of a sprint and most of the time only on the release branch.

BibTeX
@inproceedings{vassallo2018continuous,
  title={Continuous code quality: are we (really) doing that?},
  author={Vassallo, Carmine and Palomba, Fabio and Bacchelli, Alberto and Gall, Harald C},
  booktitle={Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering},
  pages={790--795},
  year={2018},
  organization={ACM}
}
[C33] ICSME 2018

Continuous Refactoring in CI: A Preliminary Study On the Perceived Advantages and Barriers*

International Conference on Software Maintenance and Evolution (ICSME 2018), Madrid, Spain, 2018.

Conference Software Quality Empirical Software Engineering C. Vassallo, F. Palomba, H. Gall.

Abstract. By definition, the practice of Continuous Integration (CI) promotes continuous software quality improvement. In systems adopting such a practice, quality assurance is usually performed by using static and dynamic analysis tools (e.g., SonarQube) that compute overall metrics such as maintainability or reliability measures. Furthermore, developers usually define quality gates, i.e., source code quality thresholds that must be reached by the software product after every newly committed change. If a quality gate fails (e.g., a maintainability metric is below a certain threshold), developers should refactor the code, possibly addressing some of the proposed warnings. While previous research findings showed that refactoring is often not done in practice, it is still unclear whether and how the adoption of a CI philosophy has changed the way developers perceive and adopt refactoring. In this paper, we preliminarily study, through a survey involving 31 developers, how developers perform refactoring in CI, which needs they have, and the barriers they face while continuously refactoring source code.
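
The notion of a quality gate described in the abstract can be illustrated with a minimal threshold check. The metric names, threshold values, and the `failed_conditions` helper are assumptions made for this sketch; it does not use or mirror the SonarQube API.

```python
from typing import Dict, List

# Hypothetical gate: metric name -> minimum acceptable value.
GATE: Dict[str, float] = {"maintainability": 0.70, "reliability": 0.80}


def failed_conditions(metrics: Dict[str, float],
                      gate: Dict[str, float] = GATE) -> List[str]:
    """Return the gate conditions that the measured metrics do not meet."""
    # A metric that was not measured is treated as failing its condition.
    return sorted(
        name for name, minimum in gate.items()
        if metrics.get(name, 0.0) < minimum
    )


# Hypothetical measurements taken after a newly committed change.
after_commit = {"maintainability": 0.65, "reliability": 0.90}
print(failed_conditions(after_commit))  # ['maintainability']
```

An empty result means the gate passes; a non-empty one lists the metrics below their thresholds, which is the situation in which the abstract says developers should refactor.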

BibTeX
@inproceedings{vassallo2018refactoring,
  title={Continuous refactoring in ci: A preliminary study on the perceived advantages and barriers},
  author={Vassallo, Carmine and Palomba, Fabio and Gall, Harald C},
  booktitle={2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
  pages={564--568},
  year={2018},
  organization={IEEE}
}
[C32] ICSME 2018

On The Relation of Test Smells to Software Code Quality*

International Conference on Software Maintenance and Evolution (ICSME 2018), Madrid, Spain, 2018.

Conference Software Testing Empirical Software Engineering D. Spadini, F. Palomba, A. Zaidman, M. Bruntink, A. Bacchelli.

Abstract. Test smells are sub-optimal design choices in the implementation of test code. As reported by recent studies, their presence might not only negatively affect the comprehension of test suites, but can also lead to test cases being less effective in finding bugs in production code. Although these studies are important steps toward understanding test smells, there is still a notable absence of studies assessing their association with software quality. In this paper, we investigate the relationship between the presence of test smells and the change- and defect-proneness of test code, as well as the defect-proneness of the production code being tested. To this aim, we collect data pertaining to 221 releases of ten software systems and we analyze more than a million test cases to investigate the association of six test smells and their co-occurrence with software quality. Key results of our study include: (i) tests with smells are more change- and defect-prone, (ii) ‘Indirect Testing’, ‘Eager Test’, and ‘Assertion Roulette’ are the most significant smells for change-proneness, and (iii) production code is more defect-prone when tested by smelly tests.

BibTeX
@inproceedings{spadini2018relation,
  title={On the relation of test smells to software code quality},
  author={Spadini, Davide and Palomba, Fabio and Zaidman, Andy and Bruntink, Magiel and Bacchelli, Alberto},
  booktitle={2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
  pages={1--12},
  year={2018},
  organization={IEEE}
}
[C31] ICSME 2018

Automatic Test Smell Detection Using Information Retrieval Techniques*

International Conference on Software Maintenance and Evolution (ICSME 2018), Madrid, Spain, 2018.

Conference Software Testing Empirical Software Engineering F. Palomba, A. Zaidman, A. De Lucia.

Abstract. Software testing is a key activity to control the reliability of production code. Unfortunately, the effectiveness of test cases can be threatened by the presence of faults. Recent work showed that static indicators can be exploited to identify test-related issues. In particular, test smells, i.e., sub-optimal design choices applied by developers when implementing test cases, have been shown to be related to test case effectiveness. While some approaches for the automatic detection of test smells have been proposed so far, they generally suffer from poor performance: as a consequence, current detectors cannot properly provide support to developers when diagnosing the quality of test cases. In this paper, we aim at taking a step forward toward the automated detection of test smells by devising a novel textual-based detector, coined TASTE (Textual AnalySis for Test smEll detection), with the aim of evaluating the usefulness of textual analysis for detecting three test smell types: General Fixture, Eager Test, and Lack of Cohesion of Methods. We evaluate TASTE in an empirical study that involves a manually built dataset composed of 494 test smell instances belonging to 12 software projects, comparing the capabilities of our detector with those of two code metrics-based techniques proposed by Van Rompaey et al. and Greiler et al. Our results show that the structural-based detection applied by existing approaches cannot identify most of the test smells in our dataset, while TASTE is up to 44% more effective. Finally, we find that textual and structural approaches can identify different sets of test smells, thereby indicating complementarity.

BibTeX
@inproceedings{palomba2018automatic,
  title={Automatic test smell detection using information retrieval techniques},
  author={Palomba, Fabio and Zaidman, Andy and De Lucia, Andrea},
  booktitle={2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
  pages={311--322},
  year={2018},
  organization={IEEE}
}
[C30] SANER 2018

BECLoMA: Augmenting Stack Traces with User Review Information.*

International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018) - Formal Tool Demo, Campobasso, Italy, 2018.

SANER 2018 Best Tool Demo Paper Award

Conference Mobile Apps Evolution Tool Demo L. Pelloni, G. Grano, A. Ciurumelea, S. Panichella, F. Palomba, H. Gall.

Abstract. Mobile devices such as smartphones, tablets and wearables are changing the way we do things, radically modifying our approach to technology. To sustain the high competition characterizing the mobile market, developers need to deliver high quality applications in a short release cycle. To reveal and fix bugs as soon as possible, researchers and practitioners proposed tools to automate the testing process. However, such tools generate a high number of redundant inputs, lack contextual information, and produce reports that are difficult to analyze. In this context, the content of user reviews represents an unmatched source for developers seeking defects in their applications. However, no prior work explored the adoption of information available in user reviews for testing purposes. In this demo we present BECLoMA, a tool to enable the integration of user feedback in the testing process of mobile apps. BECLoMA links information from testing tools and user reviews, presenting to developers an augmented testing report combining stack traces with user review information referring to the same crash. We show that BECLoMA not only facilitates the diagnosis and fix of app bugs, but also presents additional benefits: it eases the usage of testing tools and automates the analysis of user reviews from the Google Play Store.

BibTeX
@inproceedings{pelloni2018becloma,
  title={Becloma: Augmenting stack traces with user review information},
  author={Pelloni, Lucas and Grano, Giovanni and Ciurumelea, Adelina and Panichella, Sebastiano and Palomba, Fabio and Gall, Harald C},
  booktitle={2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={522--526},
  year={2018},
  organization={IEEE}
}
[C29] ICPC 2018

Do Developers Update Third-Party Libraries in Mobile Apps?*

International Conference on Program Comprehension (ICPC 2018), Gothenburg, Sweden, 2018.

Invited for the Special Issue

Conference Mobile Apps Evolution Empirical Software Engineering P. Salza, F. Palomba, D. Di Nucci, C. D'Uva, F. Ferrucci, A. De Lucia.

Abstract. One of the most common strategies to develop new software is to take advantage of existing source code, which is available in comprehensive packages called third-party libraries. As for all software systems, even these libraries change to offer new functionalities and fix bugs or security issues. The way the changes are propagated has been studied by researchers, interested in understanding their impact on the non-functional attributes of the systems’ source code. While the research community mainly focused on the change propagation phenomenon in the context of traditional applications, little is known regarding the mobile context. In this paper, we aim at bridging this gap by conducting an empirical study on the evolution history of 291 mobile apps, by investigating (i) whether mobile developers actually update third-party libraries, (ii) which are the categories of libraries with respect to the developers’ proneness to update their apps, (iii) what are the common patterns followed by developers when updating a software library, and (iv) whether high- and low-rated apps present peculiar update patterns. The results of the study showed that mobile developers rarely update their apps with respect to the used libraries, and when they do, they mainly tend to update the libraries related to the Graphical User Interface, with the aim of keeping the mobile apps updated with the latest design tendencies. In some cases developers ignore updates because of a poor awareness of the benefits, or too high a cost/benefit ratio. Finally, high- and low-rated apps present strong differences.

BibTeX
@inproceedings{salza2018developers,
  title={Do developers update third-party libraries in mobile apps?},
  author={Salza, Pasquale and Palomba, Fabio and Di Nucci, Dario and D'Uva, Cosmo and De Lucia, Andrea and Ferrucci, Filomena},
  booktitle={Proceedings of the 26th Conference on Program Comprehension},
  pages={255--265},
  year={2018},
  organization={ACM}
}
[C28] MSR 2018

How Is Video Game Development Different from Software Development in Open Source?*

IEEE/ACM Working Conference on Mining Software Repositories (MSR 2018), Gothenburg, Sweden, 2018.

Conference Software Quality Empirical Software Engineering L. Pascarella, F. Palomba, M. Di Penta, A. Bacchelli.

Abstract. Recent research has provided evidence that, in the industrial context, developing video games diverges from developing software systems in other domains, such as office suites and system utilities. In this paper, we consider video game development in the open source system (OSS) context. Specifically, we investigate how developers contribute to video games vs. non-games by working on different kinds of artifacts, how they handle malfunctions, and how they perceive the development process of their projects. To this purpose, we conducted a mixed, qualitative and quantitative study on a broad suite of 60 OSS projects. Our results confirm the existence of significant differences between game and non-game development, in terms of how project resources are organized and in the diversity of developers’ specializations. Moreover, game developers responding to our survey perceive more difficulties than other developers when reusing code as well as performing automated testing, and they lack a clear overview of their system’s requirements.

BibTeX
@inproceedings{pascarella2018video,
  title={How is video game development different from software development in open source?},
  author={Pascarella, Luca and Palomba, Fabio and Di Penta, Massimiliano and Bacchelli, Alberto},
  booktitle={2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)},
  pages={392--402},
  year={2018},
  organization={IEEE}
}
[C27] MSR 2018

A Graph-based Dataset of Commit History of Real-World Android apps.*

IEEE/ACM Working Conference on Mining Software Repositories (MSR 2018), Gothenburg, Sweden, 2018.

Empirical studies on the engineering of Android apps need to be based on open datasets and tools to allow comparisons, improve generalizability, and enable replicability. However, obtaining a good dataset is problematic and this state of things slows down empirical research on this topic.

Conference Software Quality Dataset F. Geiger, I. Malavolta, L. Pascarella, F. Palomba, D. Di Nucci, A. Bacchelli.

Abstract. Empirical studies on the engineering of Android apps need to be based on open datasets and tools to allow comparisons, improve generalizability, and enable replicability. However, obtaining a good dataset is problematic and this state of things slows down empirical research on this topic. In this paper, we contribute to overcoming this challenge by presenting the first self-contained, publicly available dataset weaving spread-out data sources about real-world, open-source Android apps. Our dataset is encoded as a graph-based database and contains the following information about 8,431 real open-source Android apps: (i) metadata about their GitHub projects, (ii) Git repositories with full commit history, and (iii) metadata extracted from the Google Play store, such as app ratings and permissions. The dataset is available in Docker images to ease adoption.

BibTeX
@inproceedings{geiger2018graph,
  title={A graph-based dataset of commit history of real-world android apps},
  author={Geiger, Franz-Xaver and Malavolta, Ivano and Pascarella, Luca and Palomba, Fabio and Di Nucci, Dario and Bacchelli, Alberto},
  booktitle={Proceedings of the 15th International Conference on Mining Software Repositories},
  pages={30--33},
  year={2018},
  organization={ACM}
}
[C26] MOBILE SOFT 2018

Self-Reported Activities of Android Developers.*

International Conference on Mobile Software Engineering and Systems (MobileSoft 2018), Gothenburg, Sweden, 2018.

To gain a deeper empirical understanding of how developers work on Android apps, we investigate self-reported activities of Android developers and to what extent these activities can be classified with machine learning techniques.

Conference Mobile Apps Evolution Empirical Software Engineering L. Pascarella, F. Geiger, F. Palomba, D. Di Nucci, I. Malavolta, A. Bacchelli.

Abstract. To gain a deeper empirical understanding of how developers work on Android apps, we investigate self-reported activities of Android developers and to what extent these activities can be classified with machine learning techniques. To this aim, we first create a taxonomy of self-reported activities coming from the manual analysis of 5,000 commit messages from 8,280 Android apps. Then, we study the frequency of each category of self-reported activities identified in the taxonomy, and investigate the feasibility of an automated classification approach. Our findings can be used by both practitioners and researchers to make informed decisions or support other software engineering activities.

BibTeX
@inproceedings{pascarella2018self,
  title={Self-reported activities of android developers},
  author={Pascarella, Luca and Geiger, Franz-Xaver and Palomba, Fabio and Di Nucci, Dario and Malavolta, Ivano and Bacchelli, Alberto},
  booktitle={2018 IEEE/ACM 5th International Conference on Mobile Software Engineering and Systems (MOBILESoft)},
  pages={144--155},
  year={2018},
  organization={IEEE}
}
[C25] SANER 2018

Re-evaluating Method-Level Bug Prediction.*

International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018 - RENE Track), Campobasso, Italy, 2018.

Bug prediction is aimed at supporting developers in the identification of code artifacts more likely to be defective. Most approaches defined so far target the prediction of bugs at class-level, thus pinpointing the presence of a bug in an entire source file.

Conference Software Quality Empirical Software Engineering L. Pascarella, F. Palomba, A. Bacchelli.

Abstract. Bug prediction is aimed at supporting developers in the identification of code artifacts more likely to be defective. Most approaches defined so far target the prediction of bugs at class-level, thus pinpointing the presence of a bug in an entire source file. Nevertheless, past research has provided evidence that this granularity might be too coarse-grained, thus reducing the usability of bug prediction in practice. As a consequence, researchers have started proposing method-level bug prediction models, showing promising evidence that it is possible to operate at this level of granularity. In this study, we first replicate previous research on method-level bug prediction on different systems/timespans. Afterwards, we reflect on the evaluation strategy and propose a more realistic one. Key results of our study show that the performance of the method-level bug prediction model is similar to what was previously reported, also for different systems/timespans, when evaluated with the same strategy. However, when evaluated with a more realistic strategy, all the models show a dramatic drop in performance, with results close to those of a random classifier. Our replication and negative results indicate that method-level bug prediction is still an open challenge.

BibTeX
@inproceedings{pascarella2018re,
  title={Re-evaluating method-level bug prediction},
  author={Pascarella, Luca and Palomba, Fabio and Bacchelli, Alberto},
  booktitle={2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={592--601},
  year={2018},
  organization={IEEE}
}
[C24] SANER 2018

Detecting Code Smells using Machine Learning Techniques: Are We There Yet?*

International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018 - RENE Track), Campobasso, Italy, 2018.

Code smells are symptoms of poor design and implementation choices weighing heavily on the quality of produced source code. During the last decades several code smell detection tools have been proposed.

Conference Software Quality Empirical Software Engineering D. Di Nucci, F. Palomba, D. A. Tamburri, A. Serebrenik, A. De Lucia.

Abstract. Code smells are symptoms of poor design and implementation choices weighing heavily on the quality of produced source code. During the last decades several code smell detection tools have been proposed. However, the literature shows that the results of these tools can be subjective and are intrinsically tied to the nature and approach of the detection. In a recent work, Arcelli Fontana et al. [1] proposed the use of Machine-Learning (ML) techniques for code smell detection, possibly solving the issue of tool subjectivity by giving a learner the ability to discern between smelly and non-smelly source code elements. While this work opened a new perspective for code smell detection, in the context of our research we found a number of possible limitations that might threaten the results of this study. The most important issue is related to the metric distribution of smelly instances in the used dataset, which is strongly different from that of non-smelly instances. In this work, we investigate this issue and our findings show that the high performance achieved in the study by Arcelli Fontana et al. was in fact due to the specific dataset employed rather than the actual capabilities of machine-learning techniques for code smell detection.

BibTeX
@inproceedings{di2018detecting,
  title={Detecting code smells using machine learning techniques: are we there yet?},
  author={Di Nucci, Dario and Palomba, Fabio and Tamburri, Damian A and Serebrenik, Alexander and De Lucia, Andrea},
  booktitle={2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={612--621},
  year={2018},
  organization={IEEE}
}
[C23] SANER 2018

Context Is King: The Developer Perspective on the Usage of Static Analysis Tools.*

International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018), Campobasso, Italy, 2018.

Automatic static analysis tools (ASATs) are tools that support automatic code quality evaluation of software systems with the aim of (i) avoiding and/or removing bugs and (ii) spotting design issues. Hindering their wide-spread acceptance are their (i) high false positive rates and (ii) low comprehensibility of the generated warnings.

 Invited for the Special Issue

Conference Software Quality Empirical Software Engineering C. Vassallo, S. Panichella, F. Palomba, S. Proksch, A. Zaidman, H. Gall.

Abstract. Automatic static analysis tools (ASATs) are tools that support automatic code quality evaluation of software systems with the aim of (i) avoiding and/or removing bugs and (ii) spotting design issues. Hindering their wide-spread acceptance are their (i) high false positive rates and (ii) low comprehensibility of the generated warnings. Researchers and ASAT vendors have proposed solutions to prioritize such warnings with the aim of guiding developers toward the most severe ones. However, none of the proposed solutions considers the development context in which an ASAT is being used to further improve the selection of relevant warnings. To shed light on the impact of such contexts on warning configuration, usage, and adopted prioritization strategies, we surveyed 42 developers (69% in industry and 31% in open source projects) and interviewed 11 industrial experts that integrate ASATs in their workflow. While we can confirm previous findings on the reluctance of developers to configure ASATs, our study highlights that (i) 71% of developers do pay attention to different warning categories depending on the development context, and (ii) 63% of our respondents rely on specific factors (e.g., team policies and composition) when prioritizing warnings to fix during their programming. Our results clearly indicate ways to better assist developers by improving existing warning selection and prioritization strategies.

BibTeX
@inproceedings{vassallo2018context,
  title={Context is king: The developer perspective on the usage of static analysis tools},
  author={Vassallo, Carmine and Panichella, Sebastiano and Palomba, Fabio and Proksch, Sebastian and Zaidman, Andy and Gall, Harald C},
  booktitle={2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={38--49},
  year={2018},
  organization={IEEE}
}
[C22] SANER 2018

Exploring the Integration of User Feedback in Automated Testing of Android Applications.*

International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018), Campobasso, Italy, 2018.

The intense competition characterizing mobile application’s marketplaces forces developers to create and maintain high-quality mobile apps in order to ensure their commercial success and acquire new users. This motivated the research community to propose solutions that automate the testing process of mobile apps.

 Invited for the Special Issue

Conference Software Testing Empirical Software Engineering G. Grano, A. Ciurumelea, S. Panichella, F. Palomba, H. Gall.

Abstract. The intense competition characterizing mobile application marketplaces forces developers to create and maintain high-quality mobile apps in order to ensure their commercial success and acquire new users. This motivated the research community to propose solutions that automate the testing process of mobile apps. However, the main problem of current testing tools is that they generate redundant and random inputs that are insufficient to properly simulate human behavior, thus leaving feature and crash bugs undetected until they are encountered by users. To cope with this problem, we conjecture that information available in user reviews (which previous work showed to be effective for maintenance and evolution problems) can be successfully exploited to identify the main issues users experience while using mobile applications, e.g., GUI problems and crashes. In this paper we provide initial insights into this direction, investigating (i) what type of user feedback can actually be exploited for testing purposes, (ii) how complementary user feedback and automated testing tools are when detecting crash bugs or errors, and (iii) whether an automated system able to monitor crash-related information reported in user feedback is sufficiently accurate. Results of our study, involving 11,296 reviews of 8 mobile applications, show that user feedback can be exploited to provide contextual details about errors or exceptions detected by automated testing tools. Moreover, they also help detect bugs that would remain uncovered when relying on testing tools only. Finally, the accuracy of the proposed automated monitoring system demonstrates the feasibility of our vision, i.e., integrating user feedback into the testing process.

BibTeX
@inproceedings{grano2018exploring,
  title={Exploring the integration of user feedback in automated testing of android applications},
  author={Grano, Giovanni and Ciurumelea, Adelina and Panichella, Sebastiano and Palomba, Fabio and Gall, Harald C},
  booktitle={2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={72--83},
  year={2018},
  organization={IEEE}
}
[C20] ICPC 2017

Developer-Related Factors in Change Prediction: An Empirical Assessment.*

25th International Conference on Program Comprehension (ICPC 2017), Buenos Aires, Argentina, 2017.

Predicting the areas of the source code having a higher likelihood to change in the future is a crucial activity to allow developers to plan preventive maintenance operations such as refactoring or peer-code reviews. In the past the research community was active in devising change prediction models based on structural metrics extracted from the source code.

Conference Software Quality Empirical Software Engineering G. Catolino, F. Palomba, A. De Lucia, F. Ferrucci, A. Zaidman.

Abstract. Predicting the areas of the source code having a higher likelihood to change in the future is a crucial activity to allow developers to plan preventive maintenance operations such as refactoring or peer-code reviews. In the past the research community was active in devising change prediction models based on structural metrics extracted from the source code. More recently, Elish et al. showed how evolution metrics can be more efficient for predicting change-prone classes. In this paper, we aim at making a further step ahead by investigating the role of different developer-related factors, which are able to capture the complexity of the development process under different perspectives, in the context of change prediction. We also compared such models with existing change-prediction models based on evolution and code metrics. Our findings reveal the capabilities of developer-based metrics in identifying classes of a software system more likely to be changed in the future. Moreover, we observed interesting complementarities among the experimented prediction models, that may possibly lead to the definition of new combined models exploiting developer-related factors as well as product and evolution metrics.

BibTeX
@inproceedings{catolino2017developer,
  title={Developer-related factors in change prediction: an empirical assessment},
  author={Catolino, Gemma and Palomba, Fabio and De Lucia, Andrea and Ferrucci, Filomena and Zaidman, Andy},
  booktitle={Proceedings of the 25th International Conference on Program Comprehension},
  pages={186--195},
  year={2017},
  organization={IEEE Press}
}
[C19] ICPC 2017

An Exploratory Study on the Relationship between Changes and Refactoring.*

25th International Conference on Program Comprehension (ICPC 2017), Buenos Aires, Argentina, 2017.

Refactoring aims at improving the internal structure of a software system without changing its external behavior. Previous studies empirically assessed, on the one hand, the benefits of refactoring in terms of code quality and developers’ productivity, and on the other hand, the underlying reasons that push programmers to apply refactoring.

Conference Software Quality Empirical Software Engineering F. Palomba, A. Zaidman, R. Oliveto, A. De Lucia.

Abstract. Refactoring aims at improving the internal structure of a software system without changing its external behavior. Previous studies empirically assessed, on the one hand, the benefits of refactoring in terms of code quality and developers’ productivity, and on the other hand, the underlying reasons that push programmers to apply refactoring. Results achieved in the latter investigations indicate that besides personal motivation such as the responsibility concerned with code authorship, refactoring is mainly performed as a consequence of changes in the requirements rather than driven by software quality. However, these findings have been derived by surveying developers, and therefore no studies performed on the actual modifications made on software repositories have been carried out to corroborate the achieved findings. To bridge this gap, we provide a quantitative investigation on the relationship between different types of code changes (i.e., Fault Repairing Modification, Feature Introduction Modification, and General Maintenance Modification) and 28 different refactoring types coming from 3 open source projects. Results showed that developers tend to apply a higher number of refactoring operations aimed at improving maintainability and comprehensibility of the source code when fixing bugs. Instead, when new features are implemented, more complex refactoring operations are performed to improve code cohesion. Most of the time, the underlying reasons behind the application of such refactoring operations are represented by the presence of duplicate code or previously introduced self-admitted technical debt.

BibTeX
@inproceedings{palomba2017exploratory,
  title={An exploratory study on the relationship between changes and refactoring},
  author={Palomba, Fabio and Zaidman, Andy and Oliveto, Rocco and De Lucia, Andrea},
  booktitle={2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC)},
  pages={176--185},
  year={2017},
  organization={IEEE}
}

[C18] ICSE 2017

PETrA: a Software-Based Tool for Estimating the Energy Profile of Android Applications.*

39th International Conference on Software Engineering (ICSE 2017) - Formal Tool Demo, Buenos Aires, Argentina, 2017.

Energy efficiency is a vital characteristic of any mobile application, and indeed is becoming an important factor for user satisfaction. For this reason, in recent years several approaches and tools for measuring the energy consumption of mobile devices have been proposed.

Conference Mobile Apps Evolution Tool Demo D. Di Nucci, F. Palomba, A. Prota, A. Panichella, A. Zaidman, A. De Lucia.

Abstract. Energy efficiency is a vital characteristic of any mobile application, and indeed is becoming an important factor for user satisfaction. For this reason, in recent years several approaches and tools for measuring the energy consumption of mobile devices have been proposed. Hardware-based solutions are highly precise, but at the same time they require costly hardware toolkits. Model-based techniques require a possibly difficult calibration of the parameters needed to correctly create a model on a specific hardware device. Finally, software-based solutions are easier to use, but they are possibly less precise than hardware-based solutions. In this demo, we present PETRA, a novel software-based tool for measuring the energy consumption of Android apps. With respect to other tools, PETRA is compatible with all the smartphones with Android 5.0 or higher. We also provide evidence that our tool is able to perform similarly to hardware-based solutions.

BibTeX
@inproceedings{di2017petra,
  title={Petra: a software-based tool for estimating the energy profile of android applications},
  author={Di Nucci, Dario and Palomba, Fabio and Prota, Antonio and Panichella, Annibale and Zaidman, Andy and De Lucia, Andrea},
  booktitle={2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C)},
  pages={3--6},
  year={2017},
  organization={IEEE}
}
[C17] ICSE 2017

Recommending and Localizing Change Requests for Mobile Apps based on User Reviews.*

39th International Conference on Software Engineering (ICSE 2017), Buenos Aires, Argentina, 2017.

Researchers have proposed several approaches to extract information from user reviews useful for maintaining and evolving mobile apps. However, most of them just perform automatic classification of user reviews according to specific keywords (e.g., bugs, features).

Conference Mobile Apps Evolution Empirical Software Engineering F. Palomba, P. Salza, A. Ciurumelea, S. Panichella, H. Gall, F. Ferrucci, A. De Lucia.

Abstract. Researchers have proposed several approaches to extract information from user reviews useful for maintaining and evolving mobile apps. However, most of them just perform automatic classification of user reviews according to specific keywords (e.g., bugs, features). Moreover, they do not provide any support for linking user feedback to the source code components to be changed, thus requiring a manual, time-consuming, and error-prone task. In this paper, we introduce CHANGEADVISOR, a novel approach that analyzes the structure, semantics, and sentiments of sentences contained in user reviews to extract useful (user) feedback from maintenance perspectives and recommend to developers changes to software artifacts. It relies on natural language processing and clustering algorithms to group user reviews around similar user needs and suggestions for change. Then, it involves textual-based heuristics to determine the code artifacts that need to be maintained according to the recommended software changes. The quantitative and qualitative studies carried out on 44,683 user reviews of 10 open source mobile apps and their original developers showed a high accuracy of CHANGEADVISOR in (i) clustering similar user change requests and (ii) identifying the code components impacted by the suggested changes. Moreover, the obtained results show that CHANGEADVISOR is more accurate than a baseline approach for linking user feedback clusters to the source code in terms of both precision (+47%) and recall (+38%).

BibTeX
@inproceedings{palomba2017recommending,
  title={Recommending and localizing change requests for mobile apps based on user reviews},
  author={Palomba, Fabio and Salza, Pasquale and Ciurumelea, Adelina and Panichella, Sebastiano and Gall, Harald and Ferrucci, Filomena and De Lucia, Andrea},
  booktitle={Proceedings of the 39th international conference on software engineering},
  pages={106--117},
  year={2017},
  organization={IEEE Press}
}
[C16] SANER 2017

Lightweight Detection of Android-specific Code Smells: the aDoctor Project.*

International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017) - Tool Track, Klagenfurt, Austria, 2017.

Code smells are symptoms of poor design solutions applied by programmers during the development of software systems. While the research community devoted a lot of effort to studying and devising approaches for detecting the traditional code smells defined by Fowler, little knowledge and support is available for an emerging category of Mobile app code smells.

Conference Mobile Apps Evolution Tool Demo F. Palomba, D. Di Nucci, A. Panichella, A. Zaidman, A. De Lucia.

Abstract. Code smells are symptoms of poor design solutions applied by programmers during the development of software systems. While the research community devoted a lot of effort to studying and devising approaches for detecting the traditional code smells defined by Fowler, little knowledge and support is available for an emerging category of Mobile app code smells. Recently, Reimann et al. proposed a new catalogue of Android-specific code smells that may be a threat for the maintainability and the efficiency of Android applications. However, current tools working in the context of Mobile apps provide limited support and, more importantly, are not available for developers interested in monitoring the quality of their apps. To overcome these limitations, we propose a fully automated tool, coined ADOCTOR, able to identify 15 Android-specific code smells from the catalogue by Reimann et al. An empirical study conducted on the source code of 18 Android applications reveals that the proposed tool reaches, on average, 98% of precision and 98% of recall. We made ADOCTOR publicly available.

BibTeX
@inproceedings{palomba2017lightweight,
  title={Lightweight detection of Android-specific code smells: The aDoctor project},
  author={Palomba, Fabio and Di Nucci, Dario and Panichella, Annibale and Zaidman, Andy and De Lucia, Andrea},
  booktitle={2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  pages={487--491},
  year={2017},
  organization={IEEE}
}
[C15] SANER 2017

Software-based Energy Profiling of Android Apps: Simple, Efficient and Reliable?*

International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017), Klagenfurt, Austria, 2017.

Modeling the power profile of mobile applications is a crucial activity to identify the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile.

Conference Mobile Apps Evolution Empirical Software Engineering D. Di Nucci, F. Palomba, A. Prota, A. Panichella, A. Zaidman, A. De Lucia.

Abstract. Modeling the power profile of mobile applications is a crucial activity to identify the causes behind energy leaks. To this aim, researchers have proposed hardware-based tools as well as model-based and software-based techniques to approximate the actual energy profile. However, all these solutions present their own advantages and disadvantages. Hardware-based tools are highly precise, but at the same time their use is bound to the acquisition of costly hardware components. Model-based tools require the calibration of parameters needed to correctly create a model on a specific hardware device. Software-based approaches are cheaper and easier to use than hardware-based tools, but they are believed to be less precise. In this paper, we take a deeper look at the pros and cons of software-based solutions, investigating to what extent their measurements depart from hardware-based solutions. To this aim, we propose a software-based tool named PETRA that we compare with the hardware-based MONSOON toolkit on 54 Android apps. The results show that PETRA performs similarly to MONSOON despite not using any sophisticated hardware components. In fact, the mean relative error with respect to MONSOON is always lower than 0.05. Moreover, 95% of the estimation errors are within 5% of the actual values measured using the hardware-based toolkit.

BibTeX
@inproceedings{di2017software,
  title={Software-based energy profiling of android apps: Simple, efficient and reliable?},
  author={Di Nucci, Dario and Palomba, Fabio and Prota, Antonio and Panichella, Annibale and Zaidman, Andy and De Lucia, Andrea},
  booktitle={2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER)},
  pages={103--114},
  year={2017},
  organization={IEEE}
}
[C14] ASE 2016

An Empirical Investigation into the Nature of Test Smells.*

International Conference on Automated Software Engineering (ASE 2016), Singapore, Singapore, 2016.

Test smells have been defined as poorly designed tests and, as reported by recent empirical studies, their presence may negatively affect comprehension and consequently maintenance of test suites. Despite this, there are no available automated tools to support identification and removal of test smells.

Conference Software Testing Empirical Software Engineering M. Tufano, F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk.

Abstract. Test smells have been defined as poorly designed tests and, as reported by recent empirical studies, their presence may negatively affect comprehension and consequently maintenance of test suites. Despite this, there are no available automated tools to support identification and removal of test smells. In this paper, we first investigate developers’ perception of test smells in a study with 19 developers. The results show that developers generally do not recognize (potentially harmful) test smells, highlighting that automated tools for identifying such smells are much needed. However, to build effective tools, deeper insights into the test smells phenomenon are required. To this aim, we conducted a large-scale empirical investigation aimed at analyzing (i) when test smells occur in source code, (ii) what their survivability is, and (iii) whether their presence is associated with the presence of design problems in production code (code smells). The results indicate that test smells are usually introduced when the corresponding test code is committed in the repository for the first time, and they tend to remain in a system for a long time. Moreover, we found various unexpected relationships between test and code smells. Finally, we show how the results of this study can be used to build effective automated tools for test smell detection and refactoring.

BibTeX
@inproceedings{tufano2016empirical,
  title={An empirical investigation into the nature of test smells},
  author={Tufano, Michele and Palomba, Fabio and Bavota, Gabriele and Di Penta, Massimiliano and Oliveto, Rocco and De Lucia, Andrea and Poshyvanyk, Denys},
  booktitle={2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE)},
  pages={4--15},
  year={2016},
  organization={IEEE}
}

[C13] ICSME 2016

Alternative Sources of Information for Code Smell Detection: Postcards from Far Away.*

International Conference on Software Maintenance and Evolution (ICSME 2016) - Doctoral Symposium, Raleigh, USA, 2016.

F. Palomba. Conference Software Quality

Abstract. Code smells have been defined as symptoms of poor design and implementation choices. Previous studies showed the negative impact of code smells on the comprehensibility and maintainability of code. For these reasons, several detection techniques have been proposed. Most of them rely on the analysis of the properties extractable from the source code. In the context of this work, we highlight several aspects that can possibly contribute to the improvement of the current state of the art and propose our solutions, based on the analysis of how code smells are actually introduced as well as the usefulness of historical and textual information to realize more reliable code smell detectors. Finally, we present an overview of the open issues and challenges related to code smell detection and management that the research community should focus on in the near future.

BibTeX
@inproceedings{palomba2016alternative,
  title={Alternative sources of information for code smell detection: Postcards from far away},
  author={Palomba, Fabio},
  booktitle={2016 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
  pages={636--640},
  year={2016},
  organization={IEEE}
}
[C12] ICSME 2016

Smells like Teen Spirit: Improving Bug Prediction Performance Using the Intensity of Code Smells.*

International Conference on Software Maintenance and Evolution (ICSME 2016), Raleigh, USA, 2016.

F. Palomba, M. Zanoni, F. Arcelli Fontana, A. De Lucia, R. Oliveto. Conference Software Quality Empirical Software Engineering

Abstract. Code smells are symptoms of poor design and implementation choices. Previous studies empirically assessed the impact of smells on code quality and clearly indicate their negative impact on maintainability, including a higher bug-proneness of components affected by code smells. In this paper, we capture previous findings on bug-proneness to build a specialized bug prediction model for smelly classes. Specifically, we evaluate the contribution of a measure of the severity of code smells (i.e., code smell intensity) by adding it to existing bug prediction models and comparing the results of the new model against the baseline model. Results indicate that the accuracy of a bug prediction model increases by adding the code smell intensity as a predictor. We also evaluate the actual gain provided by the intensity index with respect to the other metrics in the model, including the ones used to compute the code smell intensity. We observe that the intensity index is much more important as compared to other metrics used for predicting the bugginess of smelly classes.

BibTeX
@inproceedings{palomba2016smells,
  title={Smells like teen spirit: Improving bug prediction performance using the intensity of code smells},
  author={Palomba, Fabio and Zanoni, Marco and Fontana, Francesca Arcelli and De Lucia, Andrea and Oliveto, Rocco},
  booktitle={2016 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
  pages={244--255},
  year={2016},
  organization={IEEE}
}
[C11] ISSTA 2016

Automatic Test Case Generation: What if Test Code Quality Matters?*

International Symposium on Software Testing and Analysis (ISSTA 2016), Saarbrücken, Germany, 2016.

F. Palomba, A. Panichella, A. Zaidman, R. Oliveto, A. De Lucia. Conference Software Testing Empirical Software Engineering

Abstract. Test case generation tools that optimize code coverage have been extensively investigated. Recently, researchers have suggested adding other non-coverage criteria, such as memory consumption or readability, to increase the practical usefulness of generated tests. In this paper, we observe that test code quality metrics, and test cohesion and coupling in particular, are valuable candidates as additional criteria. Indeed, tests with low cohesion and/or high coupling have been shown to have a negative impact on future maintenance activities. In an exploratory investigation we show that most generated tests are indeed affected by poor test code quality. For this reason, we incorporate cohesion and coupling metrics into the main loop of a search-based algorithm for test case generation. Through an empirical study we show that our approach is not only able to generate tests that are more cohesive and less coupled, but can (i) increase branch coverage by up to 10% when enough time is given to the search and (ii) result in statistically shorter tests.

BibTeX
@inproceedings{palomba2016automatic,
  title={Automatic test case generation: What if test code quality matters?},
  author={Palomba, Fabio and Panichella, Annibale and Zaidman, Andy and Oliveto, Rocco and De Lucia, Andrea},
  booktitle={Proceedings of the 25th International Symposium on Software Testing and Analysis},
  pages={130--141},
  year={2016},
  organization={ACM}
}
[C10] ICPC 2016

A Textual-based Technique for Smell Detection.*

24th International Conference on Program Comprehension (ICPC 2016), Austin, USA, 2016.

 Invited for the Special Issue

F. Palomba, A. Panichella, A. De Lucia, R. Oliveto, A. Zaidman. Conference Software Quality Empirical Software Engineering

Abstract. In this paper, we present TACO (Textual Analysis for Code Smell Detection), a technique that exploits textual analysis to detect a family of smells of different nature and different levels of granularity. We run TACO on 10 open source projects, comparing its performance with existing smell detectors purely based on structural information extracted from code components. The analysis of the results indicates that TACO’s precision ranges between 67% and 77%, while its recall ranges between 72% and 84%. Also, TACO often outperforms alternative structural approaches confirming, once again, the usefulness of information that can be derived from the textual part of code components.

BibTeX
@inproceedings{palomba2016textual,
  title={A textual-based technique for smell detection},
  author={Palomba, Fabio and Panichella, Annibale and De Lucia, Andrea and Oliveto, Rocco and Zaidman, Andy},
  booktitle={2016 IEEE 24th international conference on program comprehension (ICPC)},
  pages={1--10},
  year={2016},
  organization={IEEE}
}
[C9] ICSME 2015

User Reviews Matter! Tracking Crowdsourced Reviews to Support Evolution of Successful Apps.*

31st IEEE International Conference on Software Maintenance and Evolution (ICSME 2015), Bremen, Germany, 2015.

F. Palomba, M. Linares Vasquez, G. Bavota, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia. Conference Mobile Apps Evolution Empirical Software Engineering

Abstract. Nowadays software applications, and especially mobile apps, undergo frequent release updates through app stores. After installing/updating apps, users can post reviews and provide ratings, expressing their level of satisfaction with apps, and possibly pointing out bugs or desired features. In this paper we show, by performing a study on 100 Android apps, how applications addressing user reviews increase their success in terms of rating. Specifically, we devise an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings. The results indicate that developers implementing user reviews are rewarded in terms of ratings. This poses the need for specialized recommendation systems aimed at analyzing informative crowd reviews and prioritizing feedback to be satisfied in order to increase the apps' success.

BibTeX
@inproceedings{palomba2015user,
  title={User reviews matter! tracking crowdsourced reviews to support evolution of successful apps},
  author={Palomba, Fabio and Linares-Vasquez, Mario and Bavota, Gabriele and Oliveto, Rocco and Di Penta, Massimiliano and Poshyvanyk, Denys and De Lucia, Andrea},
  booktitle={2015 IEEE international conference on software maintenance and evolution (ICSME)},
  pages={291--300},
  year={2015},
  organization={IEEE}
}
[C8] ICSME 2015

On the Role of Developer’s Scattered Chan