1. Identity statement | |
Reference Type | Journal Article |
Site | mtc-m21c.sid.inpe.br |
Holder Code | isadg {BR SPINPE} ibi 8JMKD3MGPCW/3DT298S |
Identifier | 8JMKD3MGP3W34R/43BQLBH |
Repository | sid.inpe.br/mtc-m21c/2020/10.02.16.01 (restricted access) |
Last Update | 2020:10.02.16.01.04 (UTC) simone |
Metadata Repository | sid.inpe.br/mtc-m21c/2020/10.02.16.01.04 |
Metadata Last Update | 2022:01.04.01.35.26 (UTC) administrator |
DOI | 10.1016/j.infsof.2020.106395 |
ISSN | 0950-5849 |
Citation Key | WatanabeFeCaSoCaVi:2020:ReEfSo |
Title | Reducing efforts of software engineering systematic literature reviews updates using text classification |
Year | 2020 |
Month | Dec. |
Access Date | 2024, July 26 |
Type of Work | journal article |
Secondary Type | PRE PI |
Number of Files | 1 |
Size | 1242 KiB |
|
2. Context | |
Author | 1 Watanabe, Willian Massami 2 Felizardo, Katia Romero 3 Candido Júnior, Arnaldo Candido 4 Souza, Érica Ferreira de 5 Campos Neto, José Ede de 6 Vijaykumar, Nandamudi Lankalapalli |
Resume Identifier | 1 2 3 4 5 6 8JMKD3MGP5W/3C9JHTU |
Group | 1 2 3 4 5 6 LABAC-COCTE-INPE-MCTIC-GOV-BR |
Affiliation | 1 Universidade Tecnológica Federal do Paraná (UTFPR) 2 Universidade Tecnológica Federal do Paraná (UTFPR) 3 Universidade Tecnológica Federal do Paraná (UTFPR) 4 Universidade Tecnológica Federal do Paraná (UTFPR) 5 Universidade Tecnológica Federal do Paraná (UTFPR) 6 Instituto Nacional de Pesquisas Espaciais (INPE) |
Author e-Mail Address | 1 wwatanabe@utfpr.edu.br 2 katiascannavino@utfpr.edu.br 3 arnaldoc@utfpr.edu.br 4 ericasouza@utfpr.edu.br 5 6 vijay.nl@inpe.br |
Journal | Information and Software Technology |
Volume | 128 |
Pages | e106395 |
Secondary Mark | A2_MEDICINA_I A2_CIÊNCIA_DA_COMPUTAÇÃO B1_INTERDISCIPLINAR B2_SOCIOLOGIA |
History (UTC) | 2020-10-02 16:01:58 :: simone -> administrator :: 2020 2022-01-04 01:35:26 :: administrator -> simone :: 2020 |
|
3. Content and structure | |
Is the master or a copy? | is the master |
Content Stage | completed |
Transferable | 1 |
Content Type | External Contribution |
Version Type | publisher |
Keywords | Systematic literature review SLR Automatic selection Review update Text classification Document classification Text categorization |
Abstract | Context: Systematic Literature Reviews (SLRs) are frequently used to synthesize evidence in Software Engineering (SE); however, replicating SLRs and keeping them up to date is a major challenge. The study selection activity in an SLR is labor intensive due to the large number of studies that must be analyzed. Different approaches have been investigated to support SLR processes, such as Visual Text Mining and Text Classification, but acquiring the initial dataset is time-consuming and labor intensive. Objective: In this work, we proposed and evaluated the use of Text Classification to support the selection of new evidence when updating SLRs in SE. Method: We applied Text Classification techniques to investigate how effective they are and how much effort could be spared during the study selection phase of an SLR update. In the SLR update scenario, the studies analyzed in the primary SLR can be used as a classified dataset to train Supervised Machine Learning algorithms. We conducted an experiment with 8 Software Engineering SLRs. In the experiments, we investigated the use of multiple preprocessing and feature extraction tasks, such as tokenization, stop word removal, word lemmatization, and TF-IDF (Term Frequency/Inverse Document Frequency), with Decision Tree and Support Vector Machines as classification algorithms. Furthermore, we configured the classifier activation threshold to maximize Recall, hence reducing the number of Missed selected studies. Results: The accuracy of the techniques was measured; on average, the results achieved an F-Score of 0.92 and an exclusion rate of 62% when varying the activation threshold of the classifiers, with an average of 4% Missed selected studies. Both the Exclusion rate and the number of Missed selected studies were significantly different when compared to classifiers that did not use the configuration of the activation threshold. Conclusion: The results showed the potential of the techniques in reducing the effort required for SLR updates. |
Area | COMP |
Arrangement | urlib.net > BDMCI > Fonds > Produção anterior à 2021 > LABAC > Reducing efforts of... |
doc Directory Content | access |
source Directory Content | there are no files |
agreement Directory Content | |
|
4. Conditions of access and use | |
Language | en |
Target File | watanabe_reducing.pdf |
User Group | simone |
Reader Group | administrator simone |
Visibility | shown |
Read Permission | deny from all and allow from 150.163 |
Update Permission | not transferred |
|
5. Allied materials | |
Next Higher Units | 8JMKD3MGPCW/3ESGTTP |
Citing Item List | sid.inpe.br/bibdigital/2013/09.22.23.14 4 sid.inpe.br/mtc-m21/2012/07.13.14.56.50 3 |
Host Collection | urlib.net/www/2017/11.22.19.04 |
|
6. Notes | |
Empty Fields | alternatejournal archivingpolicy archivist callnumber copyholder copyright creatorhistory descriptionlevel dissemination e-mailaddress format isbn label lineage mark mirrorrepository nextedition notes number orcid parameterlist parentrepositories previousedition previouslowerunit progress project rightsholder schedulinginformation secondarydate secondarykey session shorttitle sponsor subject tertiarymark tertiarytype url |
|
7. Description control | |
e-Mail (login) | simone |
|