Estimate the probability that one or more Twitter accounts are "bots"
predict_bot_score() returns a numeric vector of bot probabilities matched to the input vector (or data frame with user_id or screen_name) of users. It differs from predict_bot() in that it returns only the bot probabilities, without the user ID or screen name information.

    predict_bot(x, batch_size = 100, ...)
    predict_bot_score(x, batch_size = 100, ...)

Argument | Description |
---|---|
x | Input data: either a character vector of Twitter identifiers (user IDs or screen names) or a data frame of Twitter data. |
batch_size | Number of users to process per batch. Relevant if x contains user names or timeline data for more than 100 Twitter users. Because the data processing involves user-level aggregation (grouping by user), it can create computational bottlenecks that are easily avoided by breaking the data into batches of users. Changing this number may speed up or slow down processing, but for most jobs the difference is likely negligible, so this argument is mainly useful on a very slow/low-memory machine or a very fast/high-memory machine. Default is 100. |
... | Other arguments are passed on to rtweet functions. This is mostly just to allow users to specify the Twitter API token, e.g., |
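
To make the role of batch_size more concrete, the following is a minimal base R sketch of splitting a set of users into batches. It is only an illustration: the `users` vector is made up, and this is not claimed to be how predict_bot() batches its input internally.

```r
## Hypothetical illustration only: split a vector of users into batches.
## This is NOT the package's internal code, and `users` is made-up data.
users <- paste0("user_", seq_len(250))
batch_size <- 100

## Assign each user a batch index (1, 1, ..., 2, 2, ...), then split on it
batch_id <- ceiling(seq_along(users) / batch_size)
batches <- split(users, batch_id)

## Number of users in each batch
lengths(batches)
```

With 250 users and a batch size of 100, this yields three batches, the last one smaller than the others. Any per-user aggregation would then run on at most batch_size users at a time.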
predict_bot: a data frame (data.table) with the user ID, screen name, and estimated probability of being a bot.
predict_bot_score: a numeric vector of bot probabilities.
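
The relationship between the two return values can be sketched in base R. The `preds` data frame below is a made-up stand-in for a predict_bot() result, and the lookup rule (case-insensitive matching on screen name, with NA for inputs that have no match) is an assumption made for illustration, not confirmed package behavior.

```r
## Made-up stand-in for a predict_bot() result; values are illustrative only
preds <- data.frame(
  user_id = c("1", "2", "3"),
  screen_name = c("jack", "SHAQ", "netflix_bot"),
  prob_bot = c(0.01, 0.02, 0.99),
  stringsAsFactors = FALSE
)

## Input with a duplicate, an unknown name, and a missing value
x <- c("netflix_bot", "notarealuser", "jack", NA_character_, "netflix_bot")

## Assumed lookup rule: match each input to a screen name, ignoring case;
## inputs without a match (including NA) get NA
scores <- preds$prob_bot[match(tolower(x), tolower(preds$screen_name))]
scores
```

The resulting vector has one element per input, in input order, which is what "matched to the input vector" suggests for predict_bot_score().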

    if (FALSE) {
      ## vector of screen names
      x <- c("netflix_bot", "aasfdiouyasdoifu", "madeupusernamethatiswrong",
        "a_quilt_bot", "jack", "SHAQ", "aasfdiouyasdoifu5", NA_character_,
        "madeupusernamethatiswrong", "a_quilt_bot")

      ## predict_bot - returns data.table (with user_id, screen_name, prob_bot)
      (p1 <- predict_bot(x))

      ## predict_bot_score - returns scores (prob_bot as a numeric vector)
      (p2 <- predict_bot_score(x))
    }