Chapter 2 R project setup

There are a few ways you can set up an R project. We will use the devtools package.

2.1 Selecting a name for your package

Select a name for your package.

The official CRAN manual, Writing R Extensions, gives the following constraints:
The mandatory ‘Package’ field gives the name of the package. This should contain only (ASCII) letters, numbers and dot, have at least two characters and start with a letter and not end in a dot.

Please note that the underscore character _ is not allowed.
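
The rule quoted above can be checked mechanically. The following is a minimal sketch, not the check that CRAN itself runs; the helper name is_valid_pkg_name() and the regular expression are illustrative assumptions derived from the quoted sentence:

```r
# Hypothetical helper: TRUE if `name` satisfies the quoted naming rule
# (ASCII letters, digits and dots; at least two characters;
#  starts with a letter; does not end in a dot).
is_valid_pkg_name <- function(name) {
  # perl = TRUE keeps the character ranges restricted to ASCII
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

candidates <- c("myutils", "my_utils", "my.utils", "2fast", "a", "utils.")
valid <- is_valid_pkg_name(candidates)

# Names that pass the check
candidates[valid]
```

Only "myutils" and "my.utils" pass: the other candidates contain an underscore, start with a digit, are too short, or end in a dot.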

More than 70% of R packages are named using lowercase only, but you can use uppercase or camel case if the name of your package is an abbreviation or consists of multiple words.

Avoid names that already exist on CRAN.

# get all package names in CRAN
options(repos = c(CRAN = "https://cran.rstudio.com/"))
pkgs <- available.packages(filters = c("CRAN", "duplicates"))[,'Package']

# check if pkgs vector contains a name "myutils"
"myutils" %in% pkgs

If you plan to use Bioconductor, check whether the package name already exists there. It is a good idea to check anyway. You can do so by looking up the name in the Bioconductor package list.
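
When comparing a candidate name against a list of existing packages, it is safer to ignore case, since the CRAN Repository Policy asks that names not conflict with existing ones irrespective of case. Below is a minimal sketch under stated assumptions: name_taken() is a hypothetical helper and pkgs_demo is a small stand-in vector; in a real session you would pass the pkgs vector obtained from available.packages() instead.

```r
# Stand-in for the vector of existing package names (hypothetical values)
pkgs_demo <- c("ggplot2", "dplyr", "MASS", "Matrix")

# Hypothetical helper: case-insensitive membership test
name_taken <- function(name, existing) {
  tolower(name) %in% tolower(existing)
}

name_taken("mass", pkgs_demo)     # differs from "MASS" only by case
name_taken("myutils", pkgs_demo)  # not in the stand-in list
```

A plain %in% test would report "mass" as free even though it differs from "MASS" only in case.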

Alternatively, you can install the available package and use its function of the same name:

available::available("myutils")
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories'
for details

replacement repositories:
    CRAN: https://cran.us.r-project.org

Urban Dictionary can contain potentially offensive results,
  should they be included? [Y]es / [N]o:
1: Y
── myutils ──────────────────────────────────────────────────────────────────────────────
Name valid: ✔
Available on CRAN: ✔ 
Available on Bioconductor: ✔
Available on GitHub:  ✔ 
Abbreviations: http://www.abbreviations.com/myutils
Wikipedia: https://en.wikipedia.org/wiki/myutils
Wiktionary: https://en.wiktionary.org/wiki/myutils
Urban Dictionary:
  Not found.
Sentiment:???

2.2 Package Initialization

Make sure you set your current directory. The R package will be created in a sub-directory.

# set current directory where the package will reside
setwd("/Users/ktrn/Dropbox (BOSTON UNIVERSITY)/Projects/R/")

# create a new package
usethis::create_package("myutils")

Alternatively, you can use the RStudio menu: File > New Project > New Directory > R Package.

Opening an R package project

The create_package() function creates a sub-folder named after the package. It also creates an R project and a few files and folders there:

dir("myutils", all.files=TRUE)
##  [1] "."             ".."            ".gitignore"    ".Rbuildignore"
##  [5] ".Rhistory"     ".Rproj.user"   "DESCRIPTION"   "LICENSE.md"   
##  [9] "myutils.Rproj" "NAMESPACE"     "R"

Let’s open the myutils.Rproj project in RStudio.

2.3 R package structure

An R package usually includes the following components:

  • Functions
  • Data (optional)
  • Documentation (function manuals and examples of how to use the functions included in the package)
  • Vignettes (longer descriptions of package function usage)
  • Tests (R scripts to test the package functions)

R project directory structure:
| - DESCRIPTION
| - LICENSE
| - NAMESPACE
| - R
|   | - script1.R
|   | - script2.R
| - man
|   | - function1.Rd
|   | - function2.Rd
|   | - function3.Rd
| - myutils.Rproj



DESCRIPTION - file containing metadata about your package: authors, current version, dependencies
LICENSE - file describing the package usage agreement
NAMESPACE - file listing the functions your package exports and the functions it imports from other packages
R - directory containing R scripts
man - directory containing documentation files
myutils.Rproj - R project file associated with the package

2.4 DESCRIPTION file: R package information

Let’s open the DESCRIPTION file created by the create_package() function. By default it looks like this:

Package: myutils
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R: 
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"),
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2

Each line consists of a field name and a value, separated by a colon. When values span multiple lines, they need to be indented.
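
This "field: value" layout can be inspected from R with the base function read.dcf(). The sketch below is only an illustration: it writes a small, made-up DESCRIPTION (the Title and Description texts are assumptions, not the real file) to a temporary file and reads it back.

```r
# A made-up DESCRIPTION with one value that spans two lines
desc_lines <- c(
  "Package: myutils",
  "Title: Summaries for Numeric and Character Vectors",
  "Version: 0.0.0.9000",
  "Description: Provides small helper functions that summarize numeric",
  "    and character vectors."
)

desc_file <- tempfile()
writeLines(desc_lines, desc_file)

# read.dcf() returns a character matrix with one column per field
fields <- read.dcf(desc_file)
colnames(fields)

# The indented line is read as part of the Description value
fields[1, "Description"]
```

If the continuation line were not indented, it would no longer be read as part of the Description field.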

It is important to set these fields appropriately and to provide a good Title and Description for your package, as this information will be displayed on the package's CRAN web page.

There are two other fields that are not present in the DESCRIPTION file but which we might need to add later: Imports and Suggests. If our package uses functions from other R packages (e.g. ggplot2, dplyr, etc.), we will need to list those packages in the Imports field. If there is some optional functionality in our package that uses other packages, they need to be listed in the Suggests field.
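
Packages listed in Suggests are not guaranteed to be installed on the user's machine, so code that relies on them is usually guarded with requireNamespace(). A minimal sketch of that pattern follows; the helper name has_optional() and the package names passed to it are illustrative assumptions.

```r
# Hypothetical helper: TRUE if an optional (suggested) package is installed
has_optional <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; skipping the optional feature.")
    return(FALSE)
  }
  TRUE
}

has_optional("stats")                # ships with R, so it is always available
has_optional("notARealPackage12345") # assumed not to exist; prints a message
```

Inside a package function, the same test would decide whether to run the optional code path or to stop with an informative error.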

2.4.1 Title and Description

The Title field describes what the package does in one line, while Description is a one-paragraph text that gives a more detailed overview of the package functionality.

Do not include the package name in the Title. Do not start the Title or Description with “This package …” or similar words. Take a look at the list of CRAN packages to view examples of titles of existing R packages.

Each line in the Description field should contain no more than 80 characters, and continuation lines should be indented with 4 spaces.
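
One way to produce such a layout is the base function strwrap(). The sketch below is an illustration only; the description text is made up for the example.

```r
# Made-up one-paragraph description, including the field name
description <- paste(
  "Description: Provides helper functions that compute summary statistics",
  "for numeric and character vectors, including the number of missing",
  "values and the number of unique values."
)

# Wrap below 80 characters; indent every continuation line by 4 spaces
wrapped <- strwrap(description, width = 80, exdent = 4)
writeLines(wrapped)
```

The resulting lines can be pasted into the DESCRIPTION file as they are.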

2.4.2 Version

The version must include at least two integers separated by dot . and/or dash - symbols. Usually, the version contains three or four integers:
<major>.<minor>.<patch> or <major>.<minor>.<patch>.<development>

To increase any component of the version, you can use the use_version() function from the usethis package, for example usethis::use_version("minor").

You can read more about how to set the version at Semantic Versioning.
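
To see how R orders version strings, you can experiment with the base function package_version(). This is a small sketch; the version numbers are arbitrary examples.

```r
dev     <- package_version("0.0.0.9000")  # development version
release <- package_version("0.1.0")       # first minor release

# Components are compared as integers, from left to right
is_older <- dev < release

# Integer comparison, not text comparison: 10 is greater than 9
is_newer <- package_version("1.10.0") > package_version("1.9.0")

# Dot and dash separators are equivalent
same <- package_version("1.2-3") == package_version("1.2.3")

c(is_older = is_older, is_newer = is_newer, same = same)
```

All three comparisons evaluate to TRUE, which is why a development version such as 0.0.0.9000 sorts before the first release.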

2.4.3 Authors

To list the package authors and the maintainer, use the Authors@R field:

Authors@R:
    c(person(given = "John",
             family = "Smith",
             role = c("aut", "cre"),
             email = "john_smith@gmail.com",
             comment = c(ORCID = "0000-0001-1234-5678")),
      person(given = "Elizabeth",
             family = "Miller",
             role = c("ctb"),
             email = "emiller@gmail.com",
             comment = c(ORCID = "0000-0001-0314-1592")))
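
The value of Authors@R is ordinary R code, so it can be tried out in the console before it is pasted into DESCRIPTION. The sketch below reuses the example names from above (the ORCID comments are left out to keep it short) and checks that one person carries the "cre" (maintainer) role; the variable names are illustrative.

```r
# person() comes from the utils package, which is attached by default
authors <- c(
  person(given = "John", family = "Smith",
         role = c("aut", "cre"), email = "john_smith@gmail.com"),
  person(given = "Elizabeth", family = "Miller",
         role = "ctb", email = "emiller@gmail.com")
)

# Show the authors as R would format them
print(authors)

# Count how many people have the maintainer role
is_maintainer <- vapply(
  seq_along(authors),
  function(i) "cre" %in% authors[i]$role,
  logical(1)
)
sum(is_maintainer)
```

A package has a single maintainer, so the count should be 1.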

A partial list of roles:

Symbol | Meaning          | Description
aut    | Author           | Authors who provided substantial contributions to the package
com    | Compiler         | Person who collected code but did not make a substantial contribution
ctb    | Contributor      | Authors who have made smaller contributions (e.g. patches)
cph    | Copyright holder | Copyright holders
cre    | Creator          | Package maintainer
ths    | Thesis advisor   | Thesis advisor (if the package is part of a thesis)
trl    | Translator       | Person who translated all or part of the package code to R from another language
fnd    | Funder           | People or organizations that provided financial support

For the list of other roles, see the help topic for the person() function.

2.4.4 License

In the US, copyright is automatic: if you don’t choose a license for your software, no one else can use it!

There are two major types of open-source licenses - permissive and copyleft.

Permissive licenses allow the code to be copied, modified in any way, and published as long as the license is preserved. The modified code does not have to be distributed as open source. A popular example is the MIT license.
To create an MIT license, execute: usethis::use_mit_license("Company Name")

Copyleft licenses allow you to copy and modify the code for personal use. An example of a copyleft license is the GPL. If the modified code is then published or distributed, it must also be licensed under the GPL and distributed as open source. To create a GPL license, execute: usethis::use_gpl_license(version = 3, include_future = TRUE)

Most CRAN packages use copyleft licenses.

You can read more about R licenses at www.r-project.org/Licenses/.

2.4.5 .Rbuildignore

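The .Rbuildignore file tells R which files to ignore when the package is built. Each line is a regular expression that is matched against file and directory names relative to the package root, so a literal file name has to be anchored and have its dots escaped: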

^.*\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
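
To get a feeling for which paths these patterns exclude, you can test them with grepl(). The sketch below only approximates what the build step does; it assumes Perl-style, case-insensitive matching as described in Writing R Extensions, and the helper name is_ignored() and the list of paths are illustrative.

```r
# The three patterns from above (backslashes doubled inside R strings)
patterns <- c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^LICENSE\\.md$")

# Hypothetical helper: TRUE if `path` matches at least one pattern
is_ignored <- function(path, patterns) {
  any(vapply(
    patterns,
    function(p) grepl(p, path, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

paths <- c("myutils.Rproj", ".Rproj.user", "LICENSE.md",
           "DESCRIPTION", "R/my_summaries.R")
ignored <- vapply(paths, is_ignored, logical(1), patterns = patterns)

# Paths that would be left out of the built package
paths[ignored]
```

The first three paths match a pattern and are excluded, while DESCRIPTION and the R script are kept.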

2.5 Adding R functions to the package

Let’s now create a new R script and add a couple of R functions to it. Make sure you have the devtools package loaded, as we will use it throughout the workshop. We will place our R functions in a file named my_summaries.R, which should be located in the R sub-folder. We will use the use_r() function from the usethis package (which is attached when we load devtools):

library(devtools)
use_r("my_summaries")

Once the file is created, we can open it and add a couple of functions (the first draft of our functions):

# This function creates a summary for a numeric vector
numeric_summary <- function(x, na.rm = FALSE) {

  min <- min(x, na.rm = na.rm)
  max <- max(x, na.rm = na.rm)
  mean <- mean(x, na.rm = na.rm)
  sd <- sd(x, na.rm = na.rm)
  length <- length(x)
  Nmiss <- sum(is.na(x))

  c(min = min, max = max, mean = mean, sd = sd, length = length, Nmiss = Nmiss)

}

# This function creates a summary for a character vector
# (na.rm is accepted but not used in this first draft)
char_summary <- function(x, na.rm = FALSE) {

  length <- length(x)
  Nmiss <- sum(is.na(x))
  Nunique <- length(unique(x))

  c(length = length,
    Nmiss = Nmiss,
    Nunique = Nunique)

}

To use these functions from within the package, we will not use the source() function as we usually do. Instead, we will use the load_all() function from the devtools package:

load_all()

cvec <- c("Boston", NA, "Brookline", "Brighton", NA, "Boston")
nvec <- c(2022, 2021, NA, 2021, 2021, NA, 2022)

char_summary(cvec)
numeric_summary(nvec, na.rm=TRUE)

We probably want to modify our functions, for example, to improve the output format. But we will return to this later.

2.6 Summary: Create new R Package workflow

  1. Set the working directory
  2. library(devtools) - Load the devtools package
  3. create("package_name") - Create an R project that contains the main components of an R package
  4. Set up the information in the DESCRIPTION file
  5. use_r("r_file_name") - Create an R script and place it into the R sub-folder
  6. Open the r_file_name.R script and add the desired functions
  7. load_all() - Load all functions in the package

Once the package is created, you can continue to add new R scripts using use_r() and load them into the environment using the load_all() function.