Friday, February 15, 2019

Follow the white robot - Exploring retweets of Austrian politicians with Botometer in R


Hi folks!

I guess you are aware that social media bots are becoming more and more relevant for politics. These bots are mainly used to influence voters by systematically spreading misinformation, aka fake news. If you want to know more about this topic in general, then this Scientific American article is a good starting point. Living in Austria, I want to explore a little whether our politicians can be associated with any bots. To do so, we will look at the Twitter accounts of one top politician per party. From the government, I have selected our chancellor Sebastian Kurz @sebastiankurz for the ÖVP and the vice-chancellor H.C. Strache @HCStracheFP for the FPÖ. Those two were easy, but picking good representatives of the opposition parties was a little harder because there has been quite a lot of change in the top positions. Since the new chairperson of the SPÖ, P. Rendi-Wagner, is not very active on Twitter, we will use the managing director, Thomas Drozda @thomasdrozda, instead. For the GRÜNE I have chosen the best-known Green at the moment, our president, A. Van der Bellen @vanderbellen. For the NEOS we will use their new head, Beate Meinl-Reisinger @BMeinl, and for JETZT their founder, Peter Pilz @Peter_Pilz.

With the help of the program Botometer, which was developed at Indiana University and scores Twitter accounts between 0 (surely a human) and 5 (surely a bot), we will check whether

  1. bots supported the politicians and
  2. politicians supported bots

by retweeting.



If you are only here for the juicy differences between politicians, then you can stop reading right now. In a nutshell, I have not found any association between their retweets and bots. However, if you are here for seeing how I have come to this conclusion, then please: read on!

Preparations

Let’s start by loading some packages we will need.

library(tidyverse)
library(magrittr)
library(twitteR)

Next, we need access to the Twitter and the Botometer API.

Connect R to Twitter

To retrieve tweets you need a) a Twitter account and b) to register as a Twitter developer and create an app.

After filling out some basic information about your app (its name and how you plan to use it), you can get the needed OAuth credentials from Keys and Access Tokens.

Assign your credentials accordingly:

consumer_key <- "your_consumer_key"
consumer_secret <- "your_consumer_secret"
access_token <- "your_access_token"
access_secret <- "your_access_secret"

And use them to connect to Twitter:

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
## [1] "Using direct authentication"
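
Hard-coding credentials in a script makes it easy to leak them by accident when sharing the code. Here is a minimal sketch of an alternative, assuming the values are stored in environment variables. The variable names and the helper read_credentials() are my own invention, not something twitteR requires.

```r
## Read credentials from environment variables instead of hard-coding them.
## The variable names are hypothetical; use whatever names you set yourself,
## e.g. in ~/.Renviron.
read_credentials <- function(vars) {
  values <- Sys.getenv(vars, names = TRUE)
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  values
}

## Demo with dummy values so that the sketch runs on its own
Sys.setenv(TWITTER_CONSUMER_KEY = "dummy_key",
           TWITTER_CONSUMER_SECRET = "dummy_secret")
creds <- read_credentials(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET"))
```

The resulting values can then be passed to setup_twitter_oauth() in place of the literal strings.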

Connect R to Botometer

We will use the R client library botcheck provided by Joey Marshall to access the Botometer API.

Let’s start by installing the package.

devtools::install_github("marsha5813/botcheck")

Next, we load the package …

library(botcheck)

… and its dependencies.

library(httr)
library(xml2)
library(RJSONIO)

Then, we need to set the Mashape key (the Botometer API is hosted at Mashape; you can get your key after signing up for free at https://market.mashape.com/).

Mashape_key = "your_mashape_key"

Finally, we connect our twitter app for botcheck.

myapp = oauth_app("twitter", key=consumer_key, secret=consumer_secret)
sig = sign_oauth1.0(myapp, token=access_token, token_secret=access_secret)

Let’s test whether it worked, using my own Twitter handle (I am human, so my score should be close to 0).

botcheck("b_piskernik")
## [1] 0.7565855

Seems human enough.

Get the Tweets

Next, we retrieve the tweet timeline of the selected politicians.

Let’s get the data.

(dat_polit <-
  ## First, we enter their twitter-handles ...
  tibble(handle =
          c("sebastiankurz",
            "HCStracheFP",
            "thomasdrozda",
            "vanderbellen",
            "BMeinl",
            "Peter_Pilz"
            )) %>%
  mutate(
    ## ... next, we get the user profiles ...
    user = map(handle, getUser),
    ## ... and finally retrieve their timelines.
    tweets = map(user, userTimeline,
                 n=3200, ## max number of tweets
                 includeRts = TRUE, ## include retweets
                 excludeReplies = TRUE ## ignore replies
                 )
  ))
## # A tibble: 6 x 3
##   handle        user       tweets
##   <chr>         <list>     <list>
## 1 sebastiankurz <S4: user> <list [2,799]>
## 2 HCStracheFP   <S4: user> <list [3,175]>
## 3 thomasdrozda  <S4: user> <list [1,396]>
## 4 vanderbellen  <S4: user> <list [1,972]>
## 5 BMeinl        <S4: user> <list [1,172]>
## 6 Peter_Pilz    <S4: user> <list [2,551]>

HC Strache seems to have been more active than the others, but even the least active accounts provide enough tweets for our purposes.

In the next step, we transform the data into a more usable state.

dat_tweets <- dat_polit %>%
  mutate(
    tweets_df = map(tweets, twListToDF)
  ) %>%
   unnest(tweets_df)

Let’s take a glimpse at the result.

dat_tweets %>% glimpse()
## Observations: 13,065
## Variables: 17
## $ handle        <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ text          <chr> "RT @k_edtstadler: Hier können Sie alle Maßnahmen …
## $ favorited     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ favoriteCount <dbl> 0, 0, 0, 63, 0, 0, 114, 0, 0, 0, 0, 0, 0, 0, 0, 30…
## $ replyToSN     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ created       <dttm> 2019-02-13 17:49:20, 2019-02-13 17:49:18, 2019-02…
## $ truncated     <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FAL…
## $ replyToSID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ id            <chr> "1095741729297227776", "1095741719620931584", "109…
## $ replyToUID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ statusSource  <chr> "<a href=\"http://twitter.com/download/iphone\" re…
## $ screenName    <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ retweetCount  <dbl> 1, 1, 2, 19, 9, 8, 26, 14, 16, 50, 32, 20, 15, 11,…
## $ isRetweet     <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, …
## $ retweeted     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ longitude     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ latitude      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

Visualization data

Now that we have the data in a suitable format, the last preparatory step is to create a tibble with the names and colors of the politicians. This will come in handy for later visualizations.

(polit_info <- tibble(
    handle = dat_polit$handle,
    name = c(
      "Sebastian Kurz",
      "HC Strache",
      "Thomas Drozda",
      "A. Van der Bellen",
      "Beate Meinl-Reisinger",
      "Peter Pilz"),
    col = c(
    "#63C3D0",
    "#165D99",
    "#E31E2D",
    "#51A51E",
    "#D11E68",
    "#CEC234"
    )
  ) %>%
    ## convert name to factor to keep order
    mutate(name = as_factor(name))
)
## # A tibble: 6 x 3
##   handle        name                  col
##   <chr>         <fct>                 <chr>
## 1 sebastiankurz Sebastian Kurz        #63C3D0
## 2 HCStracheFP   HC Strache            #165D99
## 3 thomasdrozda  Thomas Drozda         #E31E2D
## 4 vanderbellen  A. Van der Bellen     #51A51E
## 5 BMeinl        Beate Meinl-Reisinger #D11E68
## 6 Peter_Pilz    Peter Pilz            #CEC234

With this helper-data at the ready we move to the analysis.

Analysis

Are the politicians human?

Before we test whether those who spread the tweets of our beloved representatives are humans, let’s check whether the politicians qualify as humans themselves.

polit_info %>%
  mutate(
    human_bot = map_dbl(handle, botcheck)
  )
## # A tibble: 6 x 4
##   handle        name                  col     human_bot
##   <chr>         <fct>                 <chr>       <dbl>
## 1 sebastiankurz Sebastian Kurz        #63C3D0    0.0355
## 2 HCStracheFP   HC Strache            #165D99    0.0533
## 3 thomasdrozda  Thomas Drozda         #E31E2D    0.0301
## 4 vanderbellen  A. Van der Bellen     #51A51E    0.0355
## 5 BMeinl        Beate Meinl-Reisinger #D11E68    0.0533
## 6 Peter_Pilz    Peter Pilz            #CEC234    0.0418

All of them have scores close to zero, so Botometer is confident that whoever operates those accounts is human.

Do bots support the politicians?

We will take the entry with most retweets per politician and check the human/bot status of those who retweeted it.

dat_rt_top <- dat_tweets %>%
  ## get the most retweets per handle
  group_by(handle) %>%
  dplyr::filter(!isRetweet) %>%
  top_n(1, retweetCount)

Let’s have a look at it:

dat_rt_top %>%
  select(handle, text, created, retweetCount)
## # A tibble: 6 x 4
## # Groups:   handle [6]
##   handle    text                           created             retweetCount
##   <chr>     <chr>                          <dttm>                     <dbl>
## 1 sebastia… El régimen de #Maduro se ha n… 2019-02-04 09:13:29         5871
## 2 HCStrach… Italiens Innenminister @matte… 2019-01-25 10:19:53          775
## 3 thomasdr… Diese Foto ist offensichtlich… 2019-01-17 15:05:57          195
## 4 vanderbe… Ich freue mich sehr über die … 2018-10-17 09:42:48         2008
## 5 BMeinl    Nicht so ideal der Überschrif… 2018-09-21 11:13:54          232
## 6 Peter_Pi… Regierung - Stillstand - @seb… 2017-10-01 08:03:41          145

Hm, the numbers are quite different and, unfortunately, that is a problem. In the next step we would look up the retweeters, but retweeters() returns no more than the last 100. However, bots react automatically and therefore probably faster than most human twitter users. Accordingly, the proportion of bots in the last 100 retweets out of several thousand should be lower than out of a total sample not larger than a few hundred (that is just a hypothesis of mine and might be wrong - if you test it, please let me know the result).

So let’s see if we can find a set of tweets that is more suitable for comparison.

dat_rt_comp <- dat_tweets %>%
  dplyr::filter(
    !isRetweet,
    retweetCount >=100,
    retweetCount < 150
    ) %>%
  group_by(handle) %>%
  top_n(1, created)

Now we have limited our selection to tweets with at least 100 and fewer than 150 retweets and selected the most recent one per politician. Let’s have a look at them:

dat_rt_comp %>%
  select(handle, text, created, retweetCount)
## # A tibble: 6 x 4
## # Groups:   handle [6]
##   handle    text                           created             retweetCount
##   <chr>     <chr>                          <dttm>                     <dbl>
## 1 sebastia… Ich möchte mein tiefempfunden… 2019-02-07 09:45:18          101
## 2 HCStrach… Wir wünschen Euch noch einen … 2018-12-24 21:05:29          104
## 3 thomasdr… „Herbert Kickl muss gehen, un… 2019-01-25 14:00:29          109
## 4 vanderbe… "#HolocaustMemorialDay \nUnse… 2019-01-27 13:05:23          109
## 5 BMeinl    "Die FPÖ stolpert von einer a… 2019-01-30 21:50:38          146
## 6 Peter_Pi… Regierung - Stillstand - @seb… 2017-10-01 08:03:41          145
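
For readers who want to try the selection logic without API access, here is a small base-R sketch on made-up data (handles, counts, and dates are all invented). It keeps tweets with at least 100 and fewer than 150 retweets and then picks the most recent one per handle, which mirrors what the filter() and top_n() calls above do.

```r
## Toy data: all values are made up for illustration
toy <- data.frame(
  handle       = c("a", "a", "a", "b", "b"),
  retweetCount = c(120, 300, 105, 99, 140),
  created      = as.POSIXct(c("2019-01-01", "2019-01-05", "2019-01-10",
                              "2019-01-02", "2019-01-03"), tz = "UTC"),
  stringsAsFactors = FALSE
)

## Keep tweets inside the retweet window: 100 <= count < 150
in_window <- toy[toy$retweetCount >= 100 & toy$retweetCount < 150, ]

## Most recent tweet per handle
latest <- do.call(rbind, lapply(split(in_window, in_window$handle),
                                function(d) d[which.max(as.numeric(d$created)), ]))
rownames(latest) <- NULL
latest
```

One difference worth knowing: which.max() returns a single row per handle, whereas top_n() keeps all rows that tie for first place.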

Next, we get the retweeters, …

dat_rt <- dat_rt_comp %>%
  mutate(
    ## get retweet ids
    rt_id = map(id, retweeters, n = 100)
  ) %>%
  unnest(rt_id)

… take a glimpse at the result, …

dat_rt %>% glimpse()
## Observations: 563
## Variables: 18
## Groups: handle [6]
## $ handle        <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ text          <chr> "Ich möchte mein tiefempfundenes Mitgefühl der Fam…
## $ favorited     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ favoriteCount <dbl> 469, 469, 469, 469, 469, 469, 469, 469, 469, 469, …
## $ replyToSN     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ created       <dttm> 2019-02-07 09:45:18, 2019-02-07 09:45:18, 2019-02…
## $ truncated     <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
## $ replyToSID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ id            <chr> "1093445589512146944", "1093445589512146944", "109…
## $ replyToUID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ statusSource  <chr> "<a href=\"http://twitter.com/download/iphone\" re…
## $ screenName    <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ retweetCount  <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, …
## $ isRetweet     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ retweeted     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ longitude     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ latitude      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ rt_id         <chr> "1063042812269289472", "880200336232919040", "1056…

… and then look up the users.

## we use lookupUsers() to reduce the API call load
dat_rt_user <- lookupUsers(dat_rt$rt_id) %>%
  twListToDF() %>%
  as_tibble() %>%
  dplyr::rename(rt_id = id) 

By sending their screen names to Botometer we get their human/bot-scores. Note: this will probably take some time.

## create botcheck-wrapper to avoid NULL-return
botcheck_save <- function(x){
  res <- botcheck(x)
  ifelse(is.double(res),
          res,
          NA_real_)
}

dat_rt_bot <- dat_rt_user %>%
  mutate(human_bot = map_dbl(screenName, botcheck_save))
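
The wrapper above guards against NULL returns. If the API call itself throws an error (for example on a timeout), the whole map_dbl() run would still abort. Here is a hedged sketch of a more defensive variant. Both score_safely() and fake_checker() are names I made up; the latter is only a stand-in so that the example runs without API access, and in practice one would pass botcheck instead.

```r
## Defensive wrapper: returns NA_real_ on NULL, non-numeric results, or errors.
## `checker` is the scoring function; botcheck would be passed here in practice.
score_safely <- function(x, checker) {
  res <- tryCatch(checker(x), error = function(e) NULL)
  if (is.numeric(res) && length(res) == 1) as.numeric(res) else NA_real_
}

## Made-up stand-in for the API call, for demonstration only
fake_checker <- function(x) {
  if (x == "private_account") return(NULL)
  if (x == "timeout") stop("request failed")
  0.25
}

scores <- vapply(c("someone", "private_account", "timeout"),
                 score_safely, numeric(1), checker = fake_checker)
```

Accounts that could not be scored end up as NA and can be counted later, just like in the missing-rate table further down.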

Let’s have a look at the result.

dat_rt_bot %>%
  select(screenName, name, location, human_bot)
## # A tibble: 458 x 4
##    screenName    name                location               human_bot
##    <chr>         <chr>               <chr>                      <dbl>
##  1 PolitikerAT   Politiker Vergleich Wien, Österreich          0.918
##  2 CrusilleauL   Matteo Salvini LEGA MILAN / ROME ( ITALIE)    0.194
##  3 BUNDgeg_Hass  #KATHOLISCH         ""                        0.0859
##  4 AngronIsAngry AngronIsAngry       Not Here                  0.0327
##  5 EugenPlesz    Eugen Plesz         Berlin, Deutschland       0.253
##  6 TenevaNina    Nina Teneva         Bulgaria                  0.253
##  7 Omi1937       OmiG                London                    0.117
##  8 MehDem7       MehdiDeğirmenci     Istanbul, Türkei          0.0492
##  9 osthollandia  osthollandia        ""                        0.0734
## 10 juzzl2        17.                 Tokio                     0.157
## # … with 448 more rows

Finally, we combine our data.

(dat_quest1 <- polit_info %>%
  full_join(dat_rt %>%
              select(handle, rt_id)) %>%
  full_join(dat_rt_bot %>%
              select(rt_id, human_bot)))
## Joining, by = "handle"
## Joining, by = "rt_id"
## # A tibble: 563 x 5
##    handle        name           col     rt_id               human_bot
##    <chr>         <fct>          <chr>   <chr>                   <dbl>
##  1 sebastiankurz Sebastian Kurz #63C3D0 1063042812269289472   NA
##  2 sebastiankurz Sebastian Kurz #63C3D0 880200336232919040     0.918
##  3 sebastiankurz Sebastian Kurz #63C3D0 1056942381457723392    0.194
##  4 sebastiankurz Sebastian Kurz #63C3D0 709677999990378496     0.0859
##  5 sebastiankurz Sebastian Kurz #63C3D0 1695839551             0.0327
##  6 sebastiankurz Sebastian Kurz #63C3D0 1027875113813848064    0.253
##  7 sebastiankurz Sebastian Kurz #63C3D0 958411854132469760     0.253
##  8 sebastiankurz Sebastian Kurz #63C3D0 471043083              0.117
##  9 sebastiankurz Sebastian Kurz #63C3D0 3303063784             0.0492
## 10 sebastiankurz Sebastian Kurz #63C3D0 990242146174332928     0.0734
## # … with 553 more rows

Before we look at the data we got, let’s check how much we did not.

dat_quest1 %>%
  group_by(name) %>%
  summarize(
    `% missing` = round(mean(is.na(human_bot)*100), digits = 1)
  ) 
## # A tibble: 6 x 2
##   name                  `% missing`
##   <fct>                       <dbl>
## 1 Sebastian Kurz               18.1
## 2 HC Strache                   26.4
## 3 Thomas Drozda                11.7
## 4 A. Van der Bellen            11.8
## 5 Beate Meinl-Reisinger        12.6
## 6 Peter Pilz                    4.2

Hm, missing values are probably mostly due to private accounts, which cannot be retrieved by lookupUsers() if you are not a friend of the particular account. Overall the missing rate is what I would expect in a random sample of Twitter users, except for the retweeters of HC Strache. Let’s check whether this deviation is explainable by chance.

dat_quest1 %>%
  mutate(
    missing = is.na(human_bot)
  ) %$%
  chisq.test(name, missing)
##
##  Pearson's Chi-squared test
##
## data:  name and missing
## X-squared = 21.468, df = 5, p-value = 0.0006607

With p ≈ 0.00066, chance does not seem a very likely explanation. The high missing rate might be a reaction to the exposure of several FPÖ operatives (but better call them isolated cases) for tweeting or posting racist and rabble-rousing rubbish. Changing the account to private and sticking with one’s kind is probably good protection against further exposure to the public. Of course, this is just a guess, and maybe there is another reason for the large number of friends-only accounts. Anyway, I doubt that the hidden accounts are bots because being hidden would decrease their effectiveness. So for today’s topic, bot or human, they are probably no problem.
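
The chi-squared test only says that the missing rates differ somewhere; it does not say which politician drives the difference. One common follow-up, sketched here on made-up counts (not the data from this post), is to inspect the standardized residuals that chisq.test() returns.

```r
## Made-up counts of retweeters with and without a score; NOT the real data
tab <- matrix(c(90, 10,
                70, 30,
                90, 10),
              ncol = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B", "C"),
                              missing = c("FALSE", "TRUE")))

res <- chisq.test(tab)
res$p.value           # small p-value: the rates differ between groups
round(res$stdres, 2)  # cells with |residual| > 2 deviate most from independence
```

In this toy table, group B has the largest positive residual in the "TRUE" column, which is the pattern one would look for to confirm that a single group stands out.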

OK, next we check who (bot or human) retweeted the tweets of our politicians.

dat_quest1 %>%
  group_by(name) %>%
  summarize(
    mean = mean(human_bot, na.rm = TRUE),
    `% > 2.5` = mean(human_bot > 2.5, na.rm = TRUE)*100
  )
## # A tibble: 6 x 3
##   name                   mean `% > 2.5`
##   <fct>                 <dbl>     <dbl>
## 1 Sebastian Kurz        0.200         0
## 2 HC Strache            0.269         0
## 3 Thomas Drozda         0.266         0
## 4 A. Van der Bellen     0.276         0
## 5 Beate Meinl-Reisinger 0.231         0
## 6 Peter Pilz            0.223         0

Seeing the results, it seems highly likely that the Twitter supporters are all humans. The mean human/bot-score is very low in general, and not a single retweeter scored higher than 2.5.
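
A word of caution on the cutoff: as pointed out in the comments at the end of this post, botcheck returns scores scaled to [0, 1], whereas the Botometer website displays them on a 0 to 5 scale (the commenter's 0.82 corresponds to 4.1 there). Assuming the two scales differ only by this factor of 5, which is my reading of that example rather than documented behaviour, the midpoint of 2.5 translates to 0.5. Here is a small sketch using the ten scores printed in the retweeter table above; to_display_scale() is a helper I made up.

```r
## The first ten human/bot-scores shown in the output above (0-1 scale)
scores <- c(0.918, 0.194, 0.0859, 0.0327, 0.253,
            0.253, 0.117, 0.0492, 0.0734, 0.157)

## Assumed conversion: website scale = 5 * botcheck scale
to_display_scale <- function(score) score * 5

## Share of accounts above the midpoint, with and without rescaling
pct_raw_cutoff    <- mean(scores > 2.5) * 100                    # cutoff is never reached
pct_scaled_cutoff <- mean(to_display_scale(scores) > 2.5) * 100  # same as scores > 0.5
```

Among these ten values, one account (0.918) lies above the midpoint once the scale is taken into account, so the `% > 2.5` column would be worth recomputing with a cutoff of 0.5.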

Do the politicians support bots?

Next, we check the human/bot-score of the accounts that got retweeted by the politicians.

Let’s start by comparing the rt-rates in the data-set.

dat_tweets %>%
  group_by(handle) %>%
  summarize(
    `# of rt` = sum(isRetweet, na.rm = TRUE),
    `% rt` = round(mean(isRetweet, na.rm = TRUE)*100, digits = 1)
  )
## # A tibble: 6 x 3
##   handle        `# of rt` `% rt`
##   <chr>             <int>  <dbl>
## 1 BMeinl              666   56.8
## 2 HCStracheFP          93    2.9
## 3 Peter_Pilz         1548   60.7
## 4 sebastiankurz      1298   46.4
## 5 thomasdrozda        283   20.3
## 6 vanderbellen        332   16.8

OK, the retweet rate differs quite tremendously between the politicians. While Chancellor Kurz, Ms Meinl-Reisinger, and Mr Pilz retweet a lot, Mr Strache mainly tweets original content.

Before we look at the human/bot-scores of the accounts that got retweets from politicians we have to prepare our data a little. We start by limiting our data to retweets, extracting the user-name of the original account, and merging the data with polit_info.

dat_retweets <- dat_tweets %>%
  filter(isRetweet) %>%
  mutate(
    rt_screenname = str_replace(text,
                                "RT @([[:alnum:]_]+):\\s[[:alpha:][:print:][:control:]]*",
                                "\\1")
  ) %>%
  full_join(polit_info)
## Joining, by = "handle"

Next, we extract all retweeted accounts (stored as dat_retweeters), fill in the already known human/bot-scores, and retrieve the still missing human/bot-scores.

Note: We use the fact that there is some reciprocity when it comes to retweets and fill in the human/bot-scores we had already retrieved when we were checking who retweeted the tweets of the politicians. This saves us some time because Botometer is not too fast and furthermore has a daily limit of 2000 checks. Still, we need to send a large number of requests to Botometer, so this will take time.

dat_retweeters <- dat_retweets %>%
  select(rt_screenname) %>%
  distinct() %>%
  left_join(dat_rt_bot %>%
              select(screenName,human_bot),
            by = c("rt_screenname" = "screenName")
            ) %>%
  mutate(
    human_bot = map2_dbl(human_bot, rt_screenname,
                         function(x,y) ifelse(
                           is.na(x),
                           botcheck_save(y),
                           x
                         ))
    )
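
Because of the daily limit, it can be useful to cap the number of requests per run and leave the rest for the next day. The following is only a sketch of that idea in base R: fill_missing_scores() and counting_checker() are hypothetical names, and the checker is a made-up stand-in for botcheck so that the example runs offline.

```r
## Fill in only the missing scores and stop once a call budget is used up.
## `checker` is the scoring function (botcheck in the post); `max_calls`
## mirrors the daily limit of 2000 mentioned above.
fill_missing_scores <- function(scores, screen_names, checker, max_calls = 2000) {
  todo <- which(is.na(scores))
  todo <- head(todo, max_calls)  # anything beyond the budget stays NA
  for (i in todo) {
    scores[i] <- checker(screen_names[i])
  }
  scores
}

## Demo with a made-up checker that counts how often it is called
n_calls <- 0
counting_checker <- function(x) {
  n_calls <<- n_calls + 1
  0.1
}

known  <- c(0.3, NA, 0.2, NA, NA)
filled <- fill_missing_scores(known, c("a", "b", "c", "d", "e"),
                              counting_checker, max_calls = 2)
```

Known scores are never requested again, and entries that are still NA after a run can simply be passed through the function once more when the quota has been reset.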

Let’s get a quick impression of the overall humanness.

dat_retweeters %>%
  ggplot(aes(x=human_bot)) +
    geom_density() +
    scale_x_continuous(name="human/bot-score") +
    theme_classic()

OK, if we only cared about the answer and not about the way to get it, we could stop right now. Botometer's score ranges from 0 to 5, but we see not a single value larger than 1; as noted in the comments below, this is because botcheck reports the score rescaled to [0, 1]. Still, most values are around 0.1, so even without further analyses we can conclude that the politicians, by and large, did not retweet tweets of bots. However, this is no data-journalism blog but one about data science, so we finish the job to learn how it would be done.

So let’s try to get a more differentiated picture than just an overall humanness-graph. For that, we need to combine dat_retweets with dat_retweeters.

dat_quest2 <- dat_retweets %>%
  left_join(dat_retweeters)
## Joining, by = "rt_screenname"

With the combined data-set we can create a diversified graph.

dat_quest2 %>%
  ggplot(aes(x=name, y = human_bot, fill=name)) +
    geom_boxplot() +
    scale_x_discrete(name="politician") +
    scale_y_continuous(name="human/bot-score")+
    scale_fill_manual(values = polit_info$col) +
    theme_classic() +
    theme(
      ## remove legend
      legend.position = "none",
      ## rotate names
      axis.text.x = element_text(angle = 30,
                                 vjust = 1,
                                 hjust = 1)
    )

If the data were spread over the whole range of the human/bot-score, another visualization option would be to cut the scores into segments with, e.g., cut(human_bot, 0:5) and illustrate their proportions per politician as stacked bar graphs. With our data, however, this is pointless. Instead, we try out something else and inspect the effect of the date.

dat_quest2 %>%
  ggplot(aes(x=created, y = human_bot, color=name, fill=name)) +
  geom_smooth() +
  scale_x_datetime(name="date",
                     limits = c(as.POSIXct("2017-01-01"), NA)) +
  scale_y_continuous(name="human/bot-score") +
  scale_color_manual(values = polit_info$col) +
  scale_fill_manual(values = polit_info$col) +
  theme_classic() +
  theme(
    legend.title = element_blank()
  )
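
For completeness, the binned view mentioned before this plot, cut(human_bot, 0:5) with proportions per politician, could look roughly like this. The data are made up and assumed to be on the 0 to 5 scale; for scores on a 0 to 1 scale the breaks would have to be adapted, e.g. to seq(0, 1, by = 0.2).

```r
## Made-up scores on the 0-5 scale for two hypothetical politicians
toy <- data.frame(
  name  = rep(c("Politician A", "Politician B"), each = 4),
  score = c(0.2, 0.8, 1.5, 4.6,
            0.1, 0.4, 2.2, 3.9)
)

## Bin the scores; include.lowest = TRUE keeps exact zeros in the first bin
toy$segment <- cut(toy$score, breaks = 0:5, include.lowest = TRUE)

## Row-wise proportions: share of each segment per politician
props <- prop.table(table(toy$name, toy$segment), margin = 1)
round(props, 2)
```

The resulting table can be drawn as a stacked bar chart, for instance with barplot(t(props)) or, after reshaping, with geom_col(position = "fill").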

Next, we could add annotations to the plot to highlight election dates and other interesting dates, but since we have already concluded that there are no bots involved, we will end now without doing so.

Summary

Overall, bots do not seem to play a direct role in the retweeting behavior of the selected politicians. Regardless of the political party, the selected politicians neither retweeted bots nor got retweeted by them. Honestly, I would have been surprised if such an obvious association could have been found, but on the other hand, I have seen stranger things. Of course, this does not mean that Twitter bots do not play a role in Austrian politics, just that the retweeting behavior of the selected politicians is not affected by bots.

Closing Remarks

I hope you have enjoyed our little digression into evaluating whether a tweeter is a human or a bot. In this case, we have not found any bots, but that does not mean that they are not out there. What I am interested in next is whether people with different political affinities differ in their likelihood of following bots. This should not be too hard: one could take a sample of each politician's followers and check who else they follow. So if you want to try out Botometer yourself, this might be something to investigate. If you do, please let me know about the results.


If something is not working as outlined here, please check the package versions you are using. The system I used was:

sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=de_AT.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
##  [1] bindrcpp_0.2.2      RJSONIO_1.3-1.1     xml2_1.2.0
##  [4] httr_1.4.0          botcheck_0.0.0.9000 twitteR_1.1.9
##  [7] magrittr_1.5        forcats_0.3.0       stringr_1.4.0
## [10] dplyr_0.7.8         purrr_0.3.0         readr_1.3.1
## [13] tidyr_0.8.2         tibble_2.0.1        ggplot2_3.1.0
## [16] tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.5 xfun_0.4         haven_2.0.0      lattice_0.20-38
##  [5] colorspace_1.4-0 generics_0.0.2   htmltools_0.3.6  yaml_2.2.0
##  [9] rlang_0.3.1      pillar_1.3.1     DBI_1.0.0        glue_1.3.0
## [13] withr_2.1.2      bit64_0.9-7      modelr_0.1.3     readxl_1.2.0
## [17] bindr_0.1.1      plyr_1.8.4       munsell_0.5.0    gtable_0.2.0
## [21] cellranger_1.1.0 rvest_0.3.2      evaluate_0.13    knitr_1.21
## [25] curl_3.3         broom_0.5.1      Rcpp_1.0.0       openssl_1.2.1
## [29] scales_1.0.0     backports_1.1.3  jsonlite_1.6     bit_1.1-14
## [33] askpass_1.1      rjson_0.2.20     hms_0.4.2        digest_0.6.18
## [37] stringi_1.2.4    grid_3.5.2       cli_1.0.1        tools_3.5.2
## [41] lazyeval_0.2.1   crayon_1.3.4     pkgconfig_2.0.2  lubridate_1.7.4
## [45] assertthat_0.2.0 rmarkdown_1.11   rstudioapi_0.9.0 R6_2.3.0
## [49] nlme_3.1-137     compiler_3.5.2

3 comments:

  1. Hi Bernhard,

    Excellent post, thank you for sharing it!

    While reading it, I was wondering what the scale for the Botometer scores is. It struck me that none of the scores in one of your analyses is higher than 1 (the same is true in my own analysis). So I cross-checked scores from the botcheck package with the ones from the Botometer website for several users.

    It seems that although the original botometer scores are between 1-5, botcheck scale is between 0-1. For example, my own account @gabrielaczarnek has 4.1 score on botometer score but 0.82 through botcheck (I think it tells more about my tweeting style rather than the botometer model :)).

    Anyway, the question is, if I am correct, why you got scores higher than 1 in your first analysis of followers?

    Replies
    1. Hi Gabcza,

      you are absolutely right, and I should have noticed myself that the scores are scaled to [0,1]. This makes my Twitter account quite bot-like.
      I just skimmed over my results and cannot find any value greater than 1.0. Could you please give a more precise pointer so that I can look into the matter?
      THX!

    2. Hi again,

      re scores > 1, I think I misunderstood your sentence: "The mean human/bot-score is very low in general, and not a single retweeter scored higher than 2.5".

      You probably meant "no one had scores higher than the midpoint of a 1-5 scale", whereas I read it as "the highest (observed) score was 2.5". My bad!

