Open Access

Public Reaction to Scientific Research via Twitter Sentiment Prediction



Figure 1

Number of tweets related to research articles for the years 2011–2017.

Figure 2

Number of tweets for each Scopus subject.

Figure 3

Number of articles for each Scopus subject.

Figure 4

Correlation matrix of features with two class labels – case 4.

Figure 5

Performance of classification models with two class labels – case 4.

Figure 6

Important features for two-class label classification.

Figure 7

Correlation matrix of features with three class labels – case 4.

Figure 8

Performance of classification models with three class labels – case 4.

Figure 9

Important features for three-class label classification.

Best results for cases 1–3 with two-class labels.

Dataset A: Tweets with article titles

| Case number | Model | Accuracy | F1 score |
|---|---|---|---|
| 1 | Random Forest | 0.81 | 0.81 |
| 2 | Random Forest | 0.83 | 0.83 |
| 3 | Random Forest | 0.85 | 0.85 |

Sentiment distribution of articles using SentiStrength and Sentiment140 libraries.

| Sentiment library | Metric for multiple sentiments | Number of positive sentiments | Number of negative sentiments | Number of neutral sentiments |
|---|---|---|---|---|
| SentiStrength | mean | 11,443 (≈ 7.7%) | 31,212 (≈ 21%) | 106,057 (≈ 71.3%) |
| SentiStrength | median | 14,905 (≈ 10%) | 39,091 (≈ 26.3%) | 94,716 (≈ 63.7%) |
| Sentiment140 | mean | 3,528 (≈ 2.4%) | 6,254 (≈ 4.2%) | 138,930 (≈ 93.4%) |
| Sentiment140 | median | 3,544 (≈ 2.4%) | 3,168 (≈ 2.1%) | 142,000 (≈ 95.5%) |

Best results for cases 1–3 with three-class labels.

Dataset A: Tweets with article titles

| Case number | Model | Accuracy | F1 score |
|---|---|---|---|
| 1 | Random Forest | 0.46 | 0.46 |
| 2 | Random Forest | 0.49 | 0.45 |
| 3 | Random Forest | 0.68 | 0.66 |

Sentiments on dataset B using different libraries and metrics.

| Experiment | Sentiment library | Metric for multiple sentiments | Number of positive sentiments | Number of negative sentiments | Number of neutral sentiments |
|---|---|---|---|---|---|
| case 1 | VADER | mean | 44,866 (≈ 42.4%) | 26,664 (≈ 25.1%) | 34,304 (≈ 32.4%) |
| case 2 | VADER | median | 38,038 (≈ 35.9%) | 23,124 (≈ 21.8%) | 44,672 (≈ 42.2%) |
| case 3 | TextBlob | mean | 54,169 (≈ 51.1%) | 11,841 (≈ 11.1%) | 39,824 (≈ 37.6%) |
| case 4 | TextBlob | median | 45,254 (≈ 42.7%) | 9,551 (≈ 9%) | 51,029 (≈ 48.2%) |

Results of the regression models.

Dataset A: Tweets with article titles

| Model | Mean squared error | R-squared |
|---|---|---|
| Multiple Linear Regression | 0.091 | 0.008 |
| Decision Tree | 0.189 | −1.051 |
| Random Forest | 0.104 | −0.130 |
| Support Vector Regression | 0.093 | −0.014 |
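
The negative R-squared values in the table above are easier to read with the two formulas in view. The following is a minimal sketch on made-up toy values (not the paper's data), using only plain Python; the helper names `mse` and `r_squared` are illustrative. It shows that a model that predicts worse than a constant "always predict the mean" baseline gets an R-squared below zero.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy sentiment scores, invented for illustration only.
y_true = [-0.5, 0.0, 0.5, 1.0]
baseline = [0.25] * 4               # always predict the mean of y_true
reversed_pred = [1.0, 0.5, 0.0, -0.5]  # systematically wrong predictions

r2_baseline = r_squared(y_true, baseline)
r2_reversed = r_squared(y_true, reversed_pred)
mse_reversed = mse(y_true, reversed_pred)
print(r2_baseline, r2_reversed, mse_reversed)
```

Under this reading, an R-squared near zero (as for Multiple Linear Regression and Support Vector Regression) indicates predictions roughly as good as the mean baseline, and a value near −1 (Decision Tree) indicates predictions noticeably worse than it.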

Mapping of sentiment scores to class labels.

| Score range | Sentiment |
|---|---|
| [−1, 0) | Negative |
| 0 | Neutral |
| (0, 1] | Positive |
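
The score ranges above translate directly into a small lookup function. This is a minimal sketch, assuming scores are floats in [−1, 1]; the function name `sentiment_label` is illustrative and not taken from the paper.

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score in [-1, 1] to a class label.

    [-1, 0) -> Negative, exactly 0 -> Neutral, (0, 1] -> Positive.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [-1, 1]")
    if score < 0:
        return "Negative"
    if score > 0:
        return "Positive"
    return "Neutral"


labels = [sentiment_label(s) for s in (-0.7184, 0.0, 0.9022)]
print(labels)
```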

Examples of sentiment label assignment.

| Article | 1st Tweet and Sentiment | 2nd Tweet and Sentiment | 3rd Tweet and Sentiment | Mean of tweets’ sentiment | Final sentiment class label |
|---|---|---|---|---|---|
| Article 1 | Researchers in Norway investigate mortality risk of individuals after the death of a spouse (−0.7184) | Can you die of a broken heart? If your spouse dies, your death risk substantially increases (−0.9186) | A sad study: spouses much more likely to die after being widowed (−0.885) | −0.8407 | Negative |
| Article 2 | Presentation of the ABC Best Paper Award 2013 to Sherrie Elzey. Read the winning paper (0.9022) | ABC Best Paper Award 2013 goes to lead authors Sherrie Elzey and De-Hao Tsai. Read their article for free (0.9001) | NA | 0.90115 | Positive |
| Article 3 | Latest article from our research team has been published about using School Function Assessment! (0) | Article on using School Function Assessment now online (0) | NA | 0 | Neutral |
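
The three examples above can be reproduced in a few lines. The sketch below takes the per-tweet scores from the table, averages them per article, and applies the score-range rule (negative below 0, neutral at 0, positive above 0). Missing tweets ("NA") are simply left out of the list; the variable and function names are illustrative.

```python
from statistics import mean


def sentiment_label(score: float) -> str:
    """Negative for score < 0, Neutral for 0, Positive for score > 0."""
    if score < 0:
        return "Negative"
    if score > 0:
        return "Positive"
    return "Neutral"


# Per-tweet sentiment scores copied from the example table.
tweet_scores = {
    "Article 1": [-0.7184, -0.9186, -0.885],
    "Article 2": [0.9022, 0.9001],
    "Article 3": [0.0, 0.0],
}

article_labels = {}
for article, scores in tweet_scores.items():
    avg = mean(scores)  # one aggregated score per article
    article_labels[article] = (avg, sentiment_label(avg))
    print(article, round(avg, 5), article_labels[article][1])
```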

Selected features from the Altmetrics dataset.

| Feature | Description |
|---|---|
| Scopus subject | Subject of a research article. |
| Article title | Title of a research article. |
| Article abstract | Abstract of a research article. |
| Abstract length | Number of words in the abstract of a research paper. |
| Follower count | Number of followers a Twitter user has. |
| Author count | Number of authors credited on the research article. |
| Tweet | Tweet about a research article. |

Derived features from the dataset.

| Original feature | Derived feature | Description |
|---|---|---|
| Article title | Title sentiment | Sentiment score of the title of a research article. |
| Article abstract | Abstract sentiment | Sentiment score of a research article abstract. |
| Follower count | Tweet reach | Mean number of followers across the users who tweeted about the research article (one article can be tweeted by many users, whose follower counts may differ). |
| Tweet | Tweet sentiment | Sentiment score of a tweet related to a research article. |
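
As an illustration of the "Tweet reach" feature described above, the sketch below groups tweets by article and averages the follower counts of the tweeting users. The record layout `(article_id, follower_count)` and the sample values are hypothetical, chosen only to make the example runnable.

```python
from collections import defaultdict
from statistics import mean


def tweet_reach(records):
    """Return the mean follower count per article.

    `records` is an iterable of (article_id, follower_count) pairs,
    one pair per tweet about the article.
    """
    followers = defaultdict(list)
    for article_id, follower_count in records:
        followers[article_id].append(follower_count)
    return {article_id: mean(counts) for article_id, counts in followers.items()}


# Hypothetical tweets: article "a1" was tweeted by three users, "a2" by one.
records = [("a1", 120), ("a1", 4500), ("a1", 80), ("a2", 300)]
reach = tweet_reach(records)
print(reach)
```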

Sentiments on dataset A using different libraries and metrics.

| Experiment | Sentiment library | Metric for multiple sentiments | Number of positive sentiments | Number of negative sentiments | Number of neutral sentiments |
|---|---|---|---|---|---|
| case 1 | VADER | mean | 55,833 (≈ 37.5%) | 37,957 (≈ 25.5%) | 54,922 (≈ 36.9%) |
| case 2 | VADER | median | 45,606 (≈ 30.6%) | 32,754 (≈ 22%) | 70,352 (≈ 47.3%) |
| case 3 | TextBlob | mean | 67,035 (≈ 45%) | 16,881 (≈ 11.3%) | 64,796 (≈ 43.6%) |
| case 4 | TextBlob | median | 53,466 (≈ 36%) | 13,748 (≈ 9.2%) | 81,498 (≈ 54.8%) |
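
In the table above, the median-based cases yield more neutral articles than the mean-based ones. One plausible reason, offered here as an assumption rather than a finding of the paper, is that an article with mostly neutral tweets and one opinionated tweet has a median of exactly 0 but a non-zero mean. A minimal sketch with made-up scores:

```python
from statistics import mean, median


def sentiment_label(score: float) -> str:
    """Negative for score < 0, Neutral for 0, Positive for score > 0."""
    if score < 0:
        return "Negative"
    if score > 0:
        return "Positive"
    return "Neutral"


# Invented example: two neutral tweets and one positive tweet about one article.
tweet_scores = [0.0, 0.0, 0.6]

mean_label = sentiment_label(mean(tweet_scores))      # mean is about 0.2
median_label = sentiment_label(median(tweet_scores))  # median is 0.0
print(mean_label, median_label)
```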

Top 25 positive and negative words in title, abstract, and tweets of research articles.

| Title: positive | Title: negative | Abstract: positive | Abstract: negative | Tweets: positive | Tweets: negative |
|---|---|---|---|---|---|
| best | boring | awesome | awful | awesome | awful |
| delicious | devastating | best | bleak | best | bleak |
| excellent | disgusting | delicious | boring | breathtaking | boring |
| greatest | evil | excellent | cruel | delicious | cruel |
| perfect | grim | exquisite | devastating | delightful | devastating |
| superb | vicious | flawless | disgusted | excellent | disgusting |
| wonderful | worst | greatest | dreadful | exquisite | dreadful |
| brilliant | fearful | impressed | evil | greatest | evil |
| ideal | repellent | legendary | grim | impressed | grim |
| incredible | retard | magnificent | gruesome | legendary | gruesome |
| beautiful | base | marvelous | horrible | magnificent | horrible |
| splendid | bloody | masterful | horrific | marvelous | horrific |
| attractive | doubtful | perfect | hysterical | masterful | hysterical |
| experienced | filthy | superb | insane | perfect | insane |
| expressive | grief | wonderful | insulting | priceless | insulting |
| favored | hate | artesian | menacing | superb | miserable |
| great | violent | brilliant | outrageous | wonderful | nasty |
| happy | stupid | ideal | ruthless | brilliant | outrageous |
| intelligent | tragic | incredible | shocking | ideal | pathetic |
| joy | sick | beautiful | terrible | incredible | shocking |
| proud | anger | attractive | terrifying | beautiful | terrible |
| uncommon | crude | brave | vicious | splendid | terrifying |
| unforgettable | frustrated | elect | worst | attractive | vicious |
| win | painful | experienced | fearful | brave | worst |
| remarkable | shocked | expressive | hated | elect | fearful |

eISSN: 2543-683X
Language: English
Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining