#5: 8 Tips for a production-ready Shiny Application

Setting up a Shiny application is extremely simple, and it can be done in a very free and versatile way. However, as the complexity of the project increases, this freedom is no longer enough: you can lose the overview, introduce accidental errors, and struggle to analyze existing problems.

You need a clear idea of the issues involved in release, reproducibility, and maintenance, so that you can plan strategies ahead of time and structure the project to manage them effectively.

Development and release

Divide and conquer

An application, in this case a Shiny one, aims to perform a task: do something useful, produce value. This task is accomplished by creating features which, over the course of the project, generally grow in number and in their interactions with each other. This brings up the first challenge: managing the increasing complexity of the application. The solution is a divide and conquer strategy: split a complex system into smaller parts that are easier to understand and to manage. The criterion for the split is the separation of concerns, which can take the form of a division into components, or of coding elementary objects that are reused throughout the interface to treat similar parts consistently.

For example, the components can be the tabs or pages that make up a web interface: this way I can examine one object separately from the others and see the perimeter (we call it the interface) through which it relates to the other components. The elementary objects, on the other hand, can be a particular type of table, graph, or widget that is used across the whole application to keep it consistent.

Use Shiny Modules

A first level of separation of concerns is the division of the application into Shiny modules. These can be thought of as miniature Shiny applications, because each has a front-end part (User Interface) and a back-end part (Server) that contains reactive objects.

One can imagine having such a subdivision:

app_server.R
app_ui.R
mod_tab_1.R
mod_tab_2.R

where app_server.R contains the lines:

callModule(mod_tab_1_server, "tab_1",
           data = common_data)
callModule(mod_tab_2_server, "tab_2",
           data = common_data)

and app_ui.R:

dashboardBody(
  tabItems(
    # First tab content
    tabItem(tabName = "tab_1",
            mod_tab_1_ui("tab_1")),
    # Second tab content
    tabItem(tabName = "tab_2",
            mod_tab_2_ui("tab_2"))
  )
)

and each mod_* file contains both the UI (mod_*_ui() functions) and the Server (mod_*_server() functions) of the miniature application. common_data is an example of an interface: the application passes the same object to both modules.

mod_tab_1_ui <- function(id) {
  ...
}

mod_tab_1_server <- function(input, output, session,
                             data) {
  ...
}
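
To make the elided bodies more concrete, here is a minimal sketch of what one module file could contain. The input and output names ("column", "summary") are hypothetical, and it assumes that data is a plain data frame rather than a reactive object. It keeps the callModule() style used above; since Shiny 1.5.0 the same module can also be written with moduleServer().

```r
library(shiny)

# UI part of the module: every input/output ID is wrapped with ns(),
# so that it cannot clash with IDs used in other modules
mod_tab_1_ui <- function(id) {
  ns <- NS(id)
  tagList(
    selectInput(ns("column"), "Column", choices = character(0)),
    verbatimTextOutput(ns("summary"))
  )
}

# Server part: `data` is the interface through which the application
# passes the shared data to the module (assumed to be a data frame)
mod_tab_1_server <- function(input, output, session, data) {
  # Fill the selector with the column names once the session starts
  observe({
    updateSelectInput(session, "column", choices = names(data))
  })

  output$summary <- renderPrint({
    req(input$column)  # wait until a column has been selected
    summary(data[[input$column]])
  })
}
```

Inside the module the IDs are written without a prefix; the namespace "tab_1" is added by NS() in the UI and by the module session in the server.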

Separate Shiny and Simple R code into different files

The second level of separation of concerns splits the Shiny part from the plain R code. This practice is very important: it separates the update logic (reactive), contained in the Shiny module, from the calculation logic, which lives in R functions. These functions do not need a reactive context, so they can be used and tested independently.

The source files become:

app_server.R
app_ui.R
mod_tab_1.R
mod_tab_2.R

fun_tab_1.R
fun_tab_2.R

where the fun_* files contain the R code used in the respective mod_* files. This is only one example: the R part can be organized in other ways, for instance by classes (S3 or R6).
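
As an illustration, a fun_* file could contain a function like the one sketched below. The function name and its arguments are hypothetical; the point is only that nothing in it depends on Shiny.

```r
# fun_tab_1.R: plain R, no reactive objects and no dependency on shiny
# (hypothetical helper, the name and the output columns are illustrative)
summarise_column <- function(data, column) {
  stopifnot(is.data.frame(data), column %in% names(data))
  values <- data[[column]]
  data.frame(
    column = column,
    n      = sum(!is.na(values)),       # number of non-missing values
    mean   = mean(values, na.rm = TRUE)
  )
}

# The function runs in a plain R session, outside any Shiny application
summarise_column(data.frame(x = c(1, 2, NA)), "x")
```

The module then only wires the function to the inputs, for example with a call such as renderTable(summarise_column(data, input$column)) in its server part.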

Deploy Shiny application as an R (Golem) package

How do we gather the code files, and how do we distribute them?

Shiny gives us great freedom of organization. Just create a directory with a few files with standard names, and the rest is up to us: a data folder, a folder for the base R code, etc. However, this freedom comes with an obligation: depending on the choices made, we have to invent procedures for tasks that are actually very common. For example, how do I organize the project folder? How do I declare code dependencies and make sure they are installed on the system? These tasks already have a fully standard solution, encoded in the idea of the R package. In fact, creating a package means having:

a standard layout for the project folder
a standard documentation format for individual functions and for vignettes
a declaration of dependencies (and their versions), not only at the package level but also at a lower level, by choosing the individual functions to import
a standard and robust installation procedure that works from a wide variety of sources (tar.gz or zip files, Git repositories, packages on CRAN, etc.)
other standard procedures, for example testing (which I will talk about later) and the provision of sample datasets
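
The documentation format and the function-level imports can be sketched on a single function. The example assumes that the package is documented with roxygen2 comments, and the function itself is hypothetical.

```r
#' Median of the non-missing values in a column
#'
#' @param data A data frame.
#' @param column Name of a numeric column in `data`.
#' @return A single numeric value.
#' @importFrom stats median
#' @export
column_median <- function(data, column) {
  # median() is imported individually from the stats package
  # through the @importFrom tag above
  median(data[[column]], na.rm = TRUE)
}
```

When the documentation is generated, the @importFrom tag becomes an importFrom(stats, median) directive in the NAMESPACE file, while the packages themselves (and, where needed, their minimum versions) are listed in the Imports field of the DESCRIPTION file.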

The system for creating a package for a Shiny application or, better said, the framework for it, is [Golem](https://github.com/ThinkR-open/golem).

Do Unit Tests (testthat)

As the size and complexity of the application increases, the number of possible breaking points increases.

Implementing a new feature can break an existing one. Especially in a team, I can break features developed by colleagues, precisely because I am less aware of the details of how they work; as a result, the new bug can escape my control and reach production. This is called a functionality regression: by publishing a new feature we lose, without realizing it, other features that were taken for granted. This situation is even more serious than a new feature that is not implemented perfectly, because the regression takes away a service that users already rely on.

To avoid these situations, all the features need to be fully checked before each release. To do this, it is useful to set up a strategy of automatic unit tests: the tests become part of the code base and everyone can test everything with one click, in seconds. This prevents regressions and gives peace of mind at release time.

The tests provide other advantages: testing a single component narrows the bug search zone in case of failure, and the test itself is working documentation of that unit.
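
A unit test with testthat could look like the sketch below. The function under test is hypothetical and is defined inline only to keep the example self-contained; in a package it would live under R/ and the tests under tests/testthat/.

```r
library(testthat)

# Hypothetical calculation function (would normally live in R/fun_tab_1.R)
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# In a package this part would live in tests/testthat/test-fun_tab_1.R
test_that("rescale_01 maps values onto [0, 1]", {
  expect_equal(rescale_01(c(10, 15, 20)), c(0, 0.5, 1))
  expect_equal(range(rescale_01(c(-3, 0, 7))), c(0, 1))
})

test_that("rescale_01 keeps missing values in place", {
  expect_equal(rescale_01(c(1, NA, 3)), c(0, NA, 1))
})
```

Each test_that() block documents one expected behaviour of the unit, and in a package the whole suite can be run in one go with devtools::test().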

Reproducibility and configuration

Both during development and in production, the application must run on different systems: your colleague’s computer or the production server.
For this, it is important that the application is reproducible, meaning that it works on different systems. At least two conditions must hold: a compatible environment is installed, and the application is configured to find the external resources it needs on the system it is running on.

Track used dependencies and reproduce environment (with “renv”)

For the first condition, I recommend using “renv” or Docker. The former is a package that creates a virtual environment: it tracks the versions of the packages in use and reproduces them on another machine. The latter virtualizes the whole application, including, in addition to the R packages, the file system of the operating system and the network interface. It is certainly a more complex choice.

This is an example of the renv.lock file, where “renv” tracks the version of R and of the packages in use:

{
  "R": {
    "Version": "3.6.2",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      }
    ]
  },
  "Packages": {
    "BH": {
      "Package": "BH",
      "Version": "1.72.0-3",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "8f9ce74c6417d61f0782cbae5fd2b7b0"
    },
    "DT": {
      "Package": "DT",
      "Version": "0.13",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "a16cb2832248c7cff5a6f505e2aea45b"
    },
...
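
The lock file is not written by hand. A minimal sketch of the usual workflow, assuming the commands are run from the root directory of the project:

```r
# Run once in the project directory: creates a project-local library
# and an initial renv.lock
renv::init()

# After installing or upgrading packages: record the versions in use
renv::snapshot()

# On another machine (a colleague's computer, the production server):
# reinstall the versions recorded in renv.lock
renv::restore()
```

The renv.lock file is meant to be committed to version control together with the code, so that every checkout carries the description of its own environment.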

Use configuration files

The second condition can be met through a configuration file: a plain text file holding all the information needed to connect the application to its working environment. For example: the addresses of the resources to access, the parts of the file system to write to, the users to connect as, the credentials for the databases, etc.

This is an example of a file that holds configurations for different environments: default settings, plus specific settings for the production and development environments.

default:
  golem_name: myApp
  golem_version: 0.0.0.9000
production:
  app_prod: yes
  log_file: /var/log/.../file.log
  db: 10.0.0.2
  db_user: "John"
  db_password: "XXXXX"
dev:
  app_prod: no
  log_file: ./file.log
  db: 10.0.0.102
  db_user: "Doe"
  db_password: "YYYYYY"
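
A file with this structure can be read with the config package. The sketch below is an assumption about how the values might be consumed, not the only way to do it; it writes a reduced copy of the file to a temporary location only to be self-contained.

```r
# Reduced copy of the configuration file, written to a temporary path
# (in a real project the file already exists in the repository)
path <- tempfile(fileext = ".yml")
writeLines(c(
  "default:",
  "  golem_name: myApp",
  "production:",
  "  db: 10.0.0.2",
  "dev:",
  "  db: 10.0.0.102"
), path)

# Read the whole configuration for one environment; config::get() is
# called with its namespace because attaching the package would mask
# base::get()
dev_cfg <- config::get(config = "dev", file = path)
dev_cfg$db          # value defined in the dev section
dev_cfg$golem_name  # value inherited from the default section

# Read a single value for another environment
prod_db <- config::get("db", config = "production", file = path)
```

When the config argument is omitted, the active environment is taken from the R_CONFIG_ACTIVE environment variable, so the same code can run unchanged in development and in production.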

Maintenance

Create log files per application and distinguish message severity (futile.logger)

When the application is on a server, monitoring its behavior becomes complicated.

You are not in the position of the user, who is in front of the interface and knows what data are loaded and which operations were performed last
You are not in the position of the developer, who sees the application’s messages and is aware of the operations performed
You are not connected to the application instance in real time, nor can you interact with it, and when it ends all the in-memory variables associated with it are lost

The main solution is a system that records all the information needed to understand whether the various instances of the application are working correctly and, if they are not, what the cause of the malfunction is and how to intervene.

Here, too, Shiny comes with a standard system, which writes what we are used to seeing in the console to a system log file. However, it is not enough. Logs from different instances must not get mixed up, and it is important to be able to choose the quantity and granularity of the messages. For this I suggest using a library such as futile.logger, which adds a timestamp and the severity of the message to each log line. It also lets you configure the severity threshold above which messages are recorded and the medium (file or console) they are written to, and it has a number of other useful functions for this purpose.

Log file example:

INFO [2020-11-19 13:48:42] New run started. ---------------------
INFO [2020-11-19 13:48:42] Version: 0.5.3
INFO [2020-11-19 13:48:42] mode: dev
INFO [2020-11-19 14:49:24] optimx run 1
WARN [2020-11-19 14:49:25] .....
...
INFO [2020-11-19 14:59:14] optimx run 2
ERROR [2020-11-19 14:59:39] .....
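
A log in this format could be produced with a setup like the following sketch. The file name, the use of the process ID to tell instances apart, and the message texts are illustrative assumptions; in a Shiny application a per-session identifier could be used instead.

```r
library(futile.logger)

# One log file per application instance: timestamp and process ID in the
# file name are an illustrative choice, not something the package prescribes
log_file <- file.path(
  tempdir(),
  sprintf("myApp_%s_%s.log", format(Sys.time(), "%Y%m%d%H%M%S"), Sys.getpid())
)

flog.appender(appender.file(log_file))  # write to the file, not the console
flog.threshold(INFO)                    # drop messages below INFO

flog.info("New run started. ---------------------")
flog.info("Version: %s", "0.5.3")
flog.debug("This message is below the threshold and is not written")
flog.warn("A hypothetical warning message")
flog.error("A hypothetical error message")

# Inspect what was written
log_lines <- readLines(log_file)
```

Lowering the threshold to DEBUG during development, and raising it to WARN in production, changes the volume of the log without touching the individual logging calls.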

Workshop

I held a workshop on these topics at eRum 2020. The related working application is here (give the repo a star if you like it!)

Next level

As far as writing code is concerned, these were the basic tips. You can also consider advanced Shiny programming techniques, for example using R6. But perhaps it is better to improve in other directions too. For example, the Git workflow: managing collaboration in a structured and advanced way (who does what, sharing and discussing specifications, kanban boards associated with the “branches” of the project), continuous integration and deployment. It is also good to have a strategy for understanding how users actually use the application, for improving the user experience, and for delivering in-line documentation with the application. I plan to address these topics in future articles.
