functional programming – dplyr-friendly function for computing a moving average

The function computes a moving average and offers a few minor conveniences, such as returning the data frame in its original order or automatically deriving the variable name for the moving average. In the majority of actual use cases, similar results could easily be achieved using combinations of the usual suspects like across, mutate and so on. However, I was interested in an implementation that:

  1. Is fairly generic and easily applicable to multiple data sets
  2. Makes use of fairly standard dplyr verbs for easier usage in sparklyr, especially when working on old Spark backends that do not support apply. The function proved convoluted, so I feel that I’ve failed in that goal, but that’s a side point.
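
For reference, here is a minimal sketch of the plain dplyr equivalent for one fixed case. It assumes mtcars sorted by am and gear and a window of the two preceding rows; `mtcars_mavg` and `disp_mavg` are illustrative names only. Like the function further down, it averages the lagged values and leaves out the current row.

```r
library(dplyr)

# Sort first, then average the two preceding values of disp.
# The first two rows have no complete window, so they come out as NA.
mtcars_mavg <- mtcars %>%
  arrange(am, gear) %>%
  mutate(disp_mavg = (lag(disp, 1) + lag(disp, 2)) / 2)

head(mtcars_mavg[, c("am", "gear", "disp", "disp_mavg")])
```

This is fine for a one-off, but the column name, the sort columns and the number of lags are all hard-coded, which is what the function tries to generalise.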

Sought feedback

  1. Better approach to generating multiple lag calls. Through string manipulation I arrive at lag(var, n)... and then do:

    dplyr::mutate(data_sorted,"{{val}}_mavg" := !!rlang::parse_expr(lag_call))

    This feels naff and I would be grateful for suggestions on improvement.

  2. Is there an elegant way of forcing evaluation on the left-hand side of :=? Currently I’m doing:

    if (res_val != "{{val}}_mavg") {
        data_avg <- dplyr::rename(data_avg, res_val = "{{val}}_mavg")
    }

    This works, but a one-liner that evaluates res_val and then does any necessary magic with curly-curly would be nice.

  3. Any other observations
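
To make points 1 and 2 concrete, the direction I have in mind looks roughly like the sketch below. It is not part of the function: `val_sym`, `res_val` and the other names are stand-ins chosen for illustration, and I have only tried the idea on a local data frame, not on a Spark backend. The lag calls are built as expressions with `rlang::expr()` and `rlang::call2()` instead of parsed strings, and `!!` on the left-hand side of `:=` supplies the column name.

```r
library(dplyr)
library(rlang)

val_sym   <- sym("disp")          # stand-in for the `val` argument
intervals <- 3
res_val   <- "disp_trailing"      # stand-in for a custom result name

# One lag() call per interval, built as expressions rather than strings
lag_exprs <- lapply(
  seq_len(intervals),
  function(n) expr(dplyr::lag(!!val_sym, !!n))
)

# Fold the calls into lag1 + lag2 + ..., then divide by the number of intervals
lag_sum   <- Reduce(function(lhs, rhs) call2("+", lhs, rhs), lag_exprs)
mavg_expr <- call2("/", lag_sum, intervals)

# `!!` on the left-hand side of := evaluates res_val to get the column name
res <- mutate(arrange(mtcars, am, gear), !!res_val := !!mavg_expr)
```

As far as I understand, recent rlang versions also accept the glue form `"{res_val}" :=` for a variable holding a string, which would remove the separate rename step, but I have not checked how either form behaves once translated by sparklyr.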

#' Add Moving Average for an Arbitrary Number of Intervals
#'
#' The function adds a moving average for an arbitrary number of intervals. The
#'   data can be returned sorted or according to the original order.
#'
#' @details The function can be used independently or within a dplyr pipeline.
#'
#' @param .data A tibble or data frame.
#' @param sort_cols Columns used for sorting passed in a manner consistent with
#'   \code{\link[dplyr]{arrange}}
#' @param val Column used to calculate moving average passed as bare column
#'   name or a character string.
#' @param intervals Number of preceding rows to average over, defaults to 2.
#' @param res_val Resulting moving average, defaults to name of \code{val}
#'   suffixed with \code{_mavg}.
#' @param restore_order A logical, defaults to \code{FALSE} if \code{TRUE} it
#'   will restore original data order.
#'
#' @return A tibble with appended moving average.
#' @export
#'
#' @examples
#' add_moving_average(mtcars, sort_cols = c("mpg", "cyl"), val = hp, intervals = 2)
add_moving_average <-
    function(.data,
             sort_cols,
             val,
             intervals = 2,
             res_val = "{{val}}_mavg",
             restore_order = FALSE) {

        # Temporary row index whose name cannot clash with existing columns
        unique_id_name <- tail(make.unique(c(colnames(.data), "ID")), 1)
        data_w_index <- dplyr::mutate(.data, {{unique_id_name}} := dplyr::row_number())

        index_col_name <- tail(names(data_w_index), 1)

        # Create desired number of calls to get moving average calculation;
        # lag() is namespaced so stats::lag() is not picked up when dplyr is not attached
        lag_calls <- paste0("dplyr::lag(", rlang::as_string(rlang::ensym(val)), ", ", 1:intervals, ")")
        lag_call <- paste(lag_calls, collapse = " + ")
        lag_call <- paste0("(", lag_call, ") / ", intervals)

        data_sorted <- dplyr::arrange(data_w_index, dplyr::across(dplyr::all_of(sort_cols)))

        data_avg <- dplyr::mutate(data_sorted, "{{val}}_mavg" := !!rlang::parse_expr(lag_call))

        if (res_val != "{{val}}_mavg") {
            data_avg <- dplyr::rename(data_avg, res_val = "{{val}}_mavg")
        }

        if (restore_order) {
            data_avg <- dplyr::arrange(data_avg, !!rlang::sym(index_col_name))
        }

        # Drop the temporary index column (second to last) and return the result visibly
        dplyr::select(data_avg, -dplyr::last_col(1))
    }

add_moving_average(.data = mtcars, sort_cols = c("am", "gear"), val = disp, intervals = 3)
add_moving_average(.data = mtcars, sort_cols = c("am", "gear"), val = disp, intervals = 3,
                   restore_order = TRUE)