source - r package license




Listing R Package Dependencies Without Installing Packages (2)

Is there a simple way to get a list of R package dependencies (all recursive dependencies) for a given package, without installing the package and it's dependencies? Something similar to a fake install in portupgrade or apt.


Another neat and simple solution is the internal function recursivePackageDependencies from the library packrat. However, the package must be installed in some library on your machine. The advantage is that it works with selfmade non-CRAN packages as well. Example:

packrat:::recursivePackageDependencies("ggplot2",lib.loc = .libPaths()[1])

giving:

 [1] "R6"           "RColorBrewer" "Rcpp"         "colorspace"   "dichromat"    "digest"       "gtable"      
 [8] "labeling"     "lazyeval"     "magrittr"     "munsell"      "plyr"         "reshape2"     "rlang"       
 [15] "scales"       "stringi"      "stringr"      "tibble"       "viridisLite" 

I do not have R installed and I needed to find out which R Packages were dependencies upon a list of R Packages being requested for usage at my company.

I wrote a bash script that iterates over a list of R Packages in a file and will recursively discover dependencies.

The script uses a file named rinput_orig.txt as input (example below). The script will create a file named rinput.txt as it does its work.

The script will create the following files:

  • rdepsfound.txt - Lists dependencies found including the R Package that is dependent upon it (example below).
  • routput.txt - Lists all R Packages (from original list and list of dependencies) along with the license and CRAN URL (example below).
  • r404.txt - List of R Packages where a 404 was received when trying to curl. This is handy if your original list has any typos.

Bash script:

#!/bin/bash

# CLEANUP
rm routput.txt
rm rdepsfound.txt
rm r404.txt

# COPY ORIGINAL INPUT TO WORKING INPUT
cp rinput_orig.txt rinput.txt

IFS=","
while read PACKAGE; do
    echo Processing $PACKAGE...

    PACKAGEURL="http://cran.r-project.org/web/packages/${PACKAGE}/index.html"

    if [ `curl -o /dev/null --silent --head --write-out '%{http_code}\n' ${PACKAGEURL}` != 404 ]; then
        # GET LICENSE INFO OF PACKAGE
        LICENSEINFO=$(curl ${PACKAGEURL} 2>/dev/null | grep -A1 "License:" | grep -v "License:" | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[0]}' | sed "s/|/,/g" | sed "s/+/,/g")
        for x in ${LICENSEINFO[*]}
        do
            # SAVE LICENSE
            LICENSE=$(echo ${x} | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[1]}')
            break
        done

        # WRITE PACKAGE AND LICENSE TO OUTPUT FILE
        echo $PACKAGE $LICENSE $PACKAGEURL >> routput.txt

        # GET DEPENDENCIES OF PACKAGE
        DEPS=$(curl ${PACKAGEURL} 2>/dev/null | grep -A1 "Depends:" | grep -v "Depends:" | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[0]}')
        for x in ${DEPS[*]}
        do
            FOUNDDEP=$(echo "${x}" | gawk 'match($0, /<a href=".*">(.*)<\/a>/, a) {print a[1]}' | sed "s/<\/span>//g")
            if [ "$FOUNDDEP" != "" ]; then
                echo Found dependency $FOUNDDEP for $PACKAGE...
                grep $FOUNDDEP rinput.txt > /dev/null
                if [ "$?" = "0" ]; then
                    echo $FOUNDDEP already exists in package list...
                else
                    echo Adding $FOUNDDEP to package list...
                    # SAVE FOUND DEPENDENCY BACK TO INPUT LIST
                    echo $FOUNDDEP >> rinput.txt
                    # SAVE FOUND DEPENDENCY TO DEPENDENCY LIST FOR EASY VIEWING OF ALL FOUND DEPENDENCIES
                    echo $FOUNDDEP is a dependency of $PACKAGE >> rdepsfound.txt
                fi
            fi
        done
    else
        echo Skipping $PACKAGE because 404 was received...
        echo $PACKAGE $PACKAGEURL >> r404.txt
    fi

done < rinput.txt
echo -e "\nRESULT:"
sort -u routput.txt

Example rinput_orig.txt:

shiny
rmarkdown
xtable
RODBC
RJDBC
XLConnect
openxlsx
xlsx
Rcpp

Example console output when running script:

Processing shiny...
Processing rmarkdown...
Processing xtable...
Processing RODBC...
Processing RJDBC...
Found dependency DBI for RJDBC...
Adding DBI to package list...
Found dependency rJava for RJDBC...
Adding rJava to package list...
Processing XLConnect...
Found dependency XLConnectJars for XLConnect...
Adding XLConnectJars to package list...
Processing openxlsx...
Processing xlsx...
Found dependency rJava for xlsx...
rJava already exists in package list...
Found dependency xlsxjars for xlsx...
Adding xlsxjars to package list...
Processing Rcpp...
Processing DBI...
Processing rJava...
Processing XLConnectJars...
Processing xlsxjars...
Found dependency rJava for xlsxjars...
rJava already exists in package list...

Example rdepsfound.txt:

DBI is a dependency of RJDBC
rJava is a dependency of RJDBC
XLConnectJars is a dependency of XLConnect
xlsxjars is a dependency of xlsx

Example routput.txt:

shiny GPL-3 http://cran.r-project.org/web/packages/shiny/index.html
rmarkdown GPL-3 http://cran.r-project.org/web/packages/rmarkdown/index.html
xtable GPL-2 http://cran.r-project.org/web/packages/xtable/index.html
RODBC GPL-2 http://cran.r-project.org/web/packages/RODBC/index.html
RJDBC GPL-2 http://cran.r-project.org/web/packages/RJDBC/index.html
XLConnect GPL-3 http://cran.r-project.org/web/packages/XLConnect/index.html
openxlsx GPL-3 http://cran.r-project.org/web/packages/openxlsx/index.html
xlsx GPL-3 http://cran.r-project.org/web/packages/xlsx/index.html
Rcpp GPL-2 http://cran.r-project.org/web/packages/Rcpp/index.html
DBI LGPL-2 http://cran.r-project.org/web/packages/DBI/index.html
rJava GPL-2 http://cran.r-project.org/web/packages/rJava/index.html
XLConnectJars GPL-3 http://cran.r-project.org/web/packages/XLConnectJars/index.html
xlsxjars GPL-3 http://cran.r-project.org/web/packages/xlsxjars/index.html

I hope this helps someone!





packages