Using Deep Learning Transformer Networks to Identify Symptoms Associated with COVID-19 on Twitter

Vítor Machado, Clécio R Bom, Kary Ocaña, Rafael Terra, Miriam B.F. Chaves

Abstract


This study presents a methodology for identifying predefined COVID-19 symptoms in Twitter posts
using Deep Learning techniques, namely Transformer networks. The proposed approach
was evaluated on a public Twitter database in Brazilian Portuguese containing user reports of COVID-19 symptoms.
We mine the Twitter database, extract phrases that mention symptoms, compare distributions, and build a dataset
for training high-accuracy Deep Learning networks that identify symptoms. We evaluate model performance with a
cross-validation procedure and interpret the results using the Local
Interpretable Model-Agnostic Explanations (LIME) algorithm. We identified 907 tweets containing one or more
of the 14 previously chosen COVID-19 symptoms. Among users who reported at least one symptom, the most
frequently reported symptoms were cough (392), headache (154), runny nose (143), fever (124), nausea (106),
and diarrhea (105). The BERT architecture identified all 14 symptoms reported in Portuguese Twitter phrases,
with each symptom identified at over 97% accuracy and an AUC-ROC above 0.95 on the test dataset.
This project is a step towards a complementary tool for identifying symptoms in future automated clinical settings,
e.g., medical chatbots, to support faster clinical assessment in Portuguese.
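
The mining step described above (extracting tweets that mention predefined symptoms and tallying them) could be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword lexicon is a hypothetical four-symptom subset, not the study's 14-symptom list, the example tweets are invented, and plain keyword matching stands in for the data-collection stage only, not for the BERT classifier the study trains.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical lexicon: an illustrative subset, NOT the study's 14 symptoms.
SYMPTOM_KEYWORDS = {
    "cough": ["tosse"],
    "headache": ["dor de cabeça"],
    "fever": ["febre"],
    "runny nose": ["coriza", "nariz escorrendo"],
}


def normalize(text):
    """Lowercase and strip accents so 'cabeça' and 'cabeca' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_symptoms(tweet):
    """Return the set of symptom labels whose keywords occur in the tweet."""
    clean = normalize(tweet)
    found = set()
    for label, keywords in SYMPTOM_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries avoid matching inside longer words.
            pattern = r"\b" + re.escape(normalize(keyword)) + r"\b"
            if re.search(pattern, clean):
                found.add(label)
                break
    return found


def count_symptoms(tweets):
    """Count how many tweets mention each symptom (multi-label)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(find_symptoms(tweet))
    return counts


# Invented example tweets, for illustration only.
tweets = [
    "Acordei com febre e muita tosse hoje",
    "Dor de cabeca o dia inteiro",
    "Que dia lindo no Rio!",
]
counts = count_symptoms(tweets)
```

Because a single tweet can mention several symptoms, each tweet yields a set of labels rather than one class, which matches the multi-label framing of identifying each of the symptoms separately.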


