Estimate the probability that one or more Twitter accounts are "bots"

predict_bot_score() returns a numeric vector of bot probabilities matched to the input vector of users (or to a data frame with user_id or screen_name). It differs from predict_bot() in that it returns only the bot probabilities, without the user ID or screen name information.

predict_bot(x, batch_size = 100, ...)

predict_bot_score(x, batch_size = 100, ...)

Arguments

x

Input data: either a character vector of Twitter identifiers (user IDs or screen names) or a data frame of Twitter data.

batch_size

Number of users to process per batch. Default is 100. Relevant only if x contains user names or timeline data for more than 100 Twitter users. Because the data processing involves user-level aggregation (grouping by user), it can create computational bottlenecks that are easily avoided by breaking the data into batches of users. Changing this number may speed up or slow down processing, but for most jobs the difference is likely negligible, so this argument is mainly useful on either a very slow/low-memory machine or a very fast/high-memory machine.
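To make the idea of batching concrete, the following is a minimal sketch in base R of how a vector of users could be split into batches of at most batch_size elements. The helper split_into_batches() is hypothetical and written only for illustration; it is not part of the package, and the package's internal batching may work differently.

```r
# Hypothetical helper (not part of the package): split a vector of users
# into consecutive batches containing at most `batch_size` users each.
split_into_batches <- function(users, batch_size = 100) {
  stopifnot(is.numeric(batch_size), batch_size >= 1)
  if (length(users) == 0) {
    return(list())
  }
  # Assign each position to a batch number: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(users) / batch_size)
  # split() groups by batch number; unname() drops the "1", "2", ... names
  unname(split(users, batch_id))
}

# 250 made-up screen names, used purely as placeholder input
users <- paste0("user_", seq_len(250))
batches <- split_into_batches(users, batch_size = 100)

# Three batches containing 100, 100, and 50 users
lengths(batches)
```

Each batch could then be processed on its own and the results combined, which keeps the per-user grouping step working on a bounded number of users at a time.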

...

Other arguments are passed on to rtweet functions. This is mostly just to allow users to specify the Twitter API token, e.g., predict_bot("kearneymw", token = token) or predict_bot("kearneymw", token = rtweet::bearer_token()).

Value

predict_bot: A data frame (data.table) with the user ID (user_id), screen name (screen_name), and estimated probability of being a bot (prob_bot).

predict_bot_score: A numeric vector of bot probabilities.

Examples

if (FALSE) {
  ## vector of screen names
  x <- c("netflix_bot", "aasfdiouyasdoifu", "madeupusernamethatiswrong",
    "a_quilt_bot", "jack", "SHAQ", "aasfdiouyasdoifu5", NA_character_,
    "madeupusernamethatiswrong", "a_quilt_bot")

  ## predict_bot - returns data.table (with user_id, screen_name, prob_bot)
  (p1 <- predict_bot(x))

  ## predict_bot_score - returns scores (prob_bot as a numeric vector)
  (p2 <- predict_bot_score(x))
}