Downloading Python source files from GitHub

This project incorporates Python modules and functions that are used in multiple notebooks. Most of these are simple convenience functions for accessing device hardware. But whatever the use, repeating the same source code in multiple notebooks complicates maintenance and has little value for the reader. For these reasons, it is much better to maintain code in the project’s repository and import as needed for use in the notebooks.

Unfortunately, GitHub's raw-content links serve one file at a time, and GitHub offers no direct way to download a single directory of a repository. There are libraries circulating in the Python community designed to circumvent this limitation.

Here we demonstrate three techniques:

  1. Using wget to selectively download individual Python source files into the current working directory.

  2. Using git clone to download the entire repository, then adding a Python source directory to the import path. Changes to the code can be committed and pushed back to the git repository.

  3. Using pip install to install Python packages from a GitHub repository. This is convenient for the notebook user, but requires a properly configured setup.py in the repository.

Method 1. Downloading individual Python files with wget

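The file hello_world.py is located in the top-level src directory of a GitHub repository. To download it, run the shell command wget with an https link to the file's raw content on the main branch. The exclamation mark (bang) prefix ! causes the rest of the line to be executed by the system shell rather than the Python kernel; IPython substitutes the values of Python expressions written in braces, such as {url}. The --no-cache option asks the server not to return a cached copy, so the latest version is downloaded.

The --backups=1 option renames any existing copy of the file to hello_world.py.1 before saving the new download under the original name. Without it, wget would leave the old file in place and save the new download as hello_world.py.1.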
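Where wget is not installed, the same download can be sketched with the standard library's urllib.request. The helpers below are hypothetical, not part of this repository: raw_github_url builds the raw-content URL from the same user, repo, and path pieces, and download_file renames an existing copy to a .1 backup in the spirit of --backups=1 before saving the new file. The Cache-Control request header is only an approximation of wget's --no-cache.

```python
import os
import urllib.request


def raw_github_url(user, repo, path, branch="main"):
    """Build a raw.githubusercontent.com URL for one file in a repository."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"


def download_file(url, dest=None):
    """Download url to dest, keeping one backup of any existing copy.

    A rough stand-in for `wget --no-cache --backups=1 url` (hypothetical helper).
    """
    if dest is None:
        # save under the file name at the end of the URL, as wget does
        dest = os.path.basename(url)
    # ask the server and any proxies for a fresh copy rather than a cached one
    request = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    with urllib.request.urlopen(request) as response:
        data = response.read()
    # keep the previous version as dest + ".1", like wget --backups=1
    if os.path.exists(dest):
        os.replace(dest, dest + ".1")
    with open(dest, "wb") as f:
        f.write(data)
    return dest
```

For example, `download_file(raw_github_url("jckantor", "cbe-virtual-laboratory", "src/hello_world.py"))` would save hello_world.py in the current working directory.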

user = "jckantor"
repo = "cbe-virtual-laboratory"
src_dir = "src"
pyfile = "hello_world.py"

url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src_dir}/{pyfile}"
!wget --no-cache --backups=1 {url}
--2020-11-01 19:11:46--  https://raw.githubusercontent.com/jckantor/cbe-virtual-laboratory/main/src/hello_world.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123 [text/plain]
Saving to: ‘hello_world.py’

hello_world.py      100%[===================>]     123  --.-KB/s    in 0s      

2020-11-01 19:11:46 (8.21 MB/s) - ‘hello_world.py’ saved [123/123]
import subprocess

result = subprocess.run(["wget", "--no-cache", "--backups=1", url], stderr=subprocess.PIPE, stdout=subprocess.PIPE)
print(result.stderr.decode("utf-8"))
--2020-11-01 19:11:46--  https://raw.githubusercontent.com/jckantor/cbe-virtual-laboratory/main/src/hello_world.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123 [text/plain]
Saving to: ‘hello_world.py’

     0K                                                       100% 7.32M=0s

2020-11-01 19:11:46 (7.32 MB/s) - ‘hello_world.py’ saved [123/123]

Let’s make a listing of the file’s content.

with open(pyfile, 'r') as f:
    print(f.read())
def hello():
    """Print hello, world to demonstrate use of the source library."""
    print("Hello, World")
    return
  

Let’s import the file as a Python module and use the embedded functions. If the name of the file is fixed and known, then the usual Python import statement will do the job.

import hello_world
help(hello_world)
hello_world.hello()
Help on module hello_world:

NAME
    hello_world

FUNCTIONS
    hello()
        Print hello, world to demonstrate use of the source library.

FILE
    /content/hello_world.py


Hello, World

If the name of the Python file is stored in a string variable, the standard library module importlib can import it. Note that the .py extension must be removed from the file name to obtain the module name.

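import importlib
import os

# os.path.splitext removes the ".py" extension; str.rstrip(".py") would strip
# any trailing '.', 'p', or 'y' characters, not just the suffix
mymodule = importlib.import_module(os.path.splitext(pyfile)[0])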
help(mymodule)
mymodule.hello()
Help on module hello_world:

NAME
    hello_world

FUNCTIONS
    hello()
        Print hello, world to demonstrate use of the source library.

FILE
    /content/hello_world.py


Hello, World
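Alternatively, importlib can load a module straight from a file path, without the file having to sit in the working directory or on sys.path. This is a sketch using the standard importlib.util API; the helper name import_from_path is hypothetical.

```python
import importlib.util
import os
import sys


def import_from_path(file_path, module_name=None):
    """Import a Python source file as a module, given its path."""
    if module_name is None:
        # "src/hello_world.py" -> "hello_world"
        module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    # register the module before executing it, as the import system does
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

With the file downloaded above, `import_from_path("hello_world.py").hello()` would print the same greeting.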

For platforms where the shell escape ! might fail, the same download can be run with the standard Python subprocess library instead, as in the subprocess cell that follows the wget output above.
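A slightly more defensive version of that subprocess call is sketched below. The helper run_command is hypothetical and relies only on subprocess.run. Passing check=True makes a failed command (a nonzero exit status from wget, for example) raise CalledProcessError instead of passing silently, and universal_newlines=True returns str rather than bytes (the text= alias requires Python 3.7 or later).

```python
import subprocess


def run_command(args):
    """Run a command without a shell and return its stderr as text.

    Raises subprocess.CalledProcessError if the command exits with a nonzero status.
    """
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
        check=True,
    )
    return result.stderr
```

Since wget writes its progress log to stderr, `print(run_command(["wget", "--no-cache", "--backups=1", url]))` reproduces the log shown above.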

Method 2. Cloning a git repository

Downloading a collection of files from a git repository with wget (or curl) can be cumbersome, particularly if the names of the individual files are unknown or subject to change. And, unfortunately, GitHub's raw-content links serve only individual files, not folders.

For these situations, an alternative is to simply clone the git repository to a local directory.

import os

user = "jckantor"
repo = "cbe-virtual-laboratory"

# remove local directory if it already exists
if os.path.isdir(repo):
    !rm -rf {repo}

!git clone https://github.com/{user}/{repo}.git
Cloning into 'cbe-virtual-laboratory'...
remote: Enumerating objects: 518, done.
remote: Counting objects: 100% (518/518), done.
remote: Compressing objects: 100% (286/286), done.
remote: Total 518 (delta 407), reused 318 (delta 221), pack-reused 0
Receiving objects: 100% (518/518), 406.64 KiB | 2.01 MiB/s, done.
Resolving deltas: 100% (407/407), done.
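The two shell escapes in the cell above can also be written with the standard library, which avoids both ! and rm -rf. In this sketch, clone_fresh is a hypothetical helper: shutil.rmtree removes an old checkout and subprocess.run calls git clone, raising an error if the clone fails. Like the shell commands above, it assumes git is installed.

```python
import os
import shutil
import subprocess


def clone_fresh(url, dest):
    """Clone url into dest, deleting any existing copy of dest first."""
    if os.path.isdir(dest):
        shutil.rmtree(dest)
    subprocess.run(["git", "clone", "--quiet", url, dest], check=True)
    return dest
```

In this notebook the call would be `clone_fresh(f"https://github.com/{user}/{repo}.git", repo)`.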

With the repository cloned to a local subdirectory of the same name, there are several useful strategies for importing from the source directory. The following cell demonstrates how to insert the repository's source directory into the Python path (if it isn't there already).

import sys

src_dir = "src"

path = f"{repo}/{src_dir}"
if path not in sys.path:
    sys.path.insert(1, path)

# list all directories in the Python path
print("\n".join(["'" + p + "'" for p in sys.path]))
''
'cbe-virtual-laboratory/src'
'/env/python'
'/usr/lib/python36.zip'
'/usr/lib/python3.6'
'/usr/lib/python3.6/lib-dynload'
'/usr/local/lib/python3.6/dist-packages'
'/usr/lib/python3/dist-packages'
'/usr/local/lib/python3.6/dist-packages/IPython/extensions'
'/root/.ipython'
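An insertion into sys.path lasts for the rest of the session. If the source directory is only needed for an import or two, a context manager can add it temporarily. This is a sketch built on contextlib.contextmanager; the name temporary_path is hypothetical. Modules imported inside the with block remain available afterwards, because Python caches them in sys.modules.

```python
import contextlib
import sys


@contextlib.contextmanager
def temporary_path(directory, position=1):
    """Put directory on sys.path for the duration of a with block."""
    added = directory not in sys.path
    if added:
        sys.path.insert(position, directory)
    try:
        yield
    finally:
        # remove only the entry this context manager added
        if added and directory in sys.path:
            sys.path.remove(directory)
```

Usage would look like `with temporary_path(f"{repo}/{src_dir}"): import hello_world`, written on two lines in a notebook cell.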

The next step is to import a Python module from the source directory.

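import sys

src_dir = "src"

# add the source directory only once, even if this cell is run repeatedly
path = f"{repo}/{src_dir}"
if path not in sys.path:
    sys.path.insert(1, path)
import hello_world
hello_world.hello()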
Hello, World

The following cell summarizes these steps in a single cell that can be copied into a new notebook.

import os, sys, importlib

user = "jckantor"
repo = "cbe-virtual-laboratory"
src_dir = "src"
pyfile = "hello_world.py"

if os.path.isdir(repo):
    !rm -rf {repo}

!git clone https://github.com/{user}/{repo}.git

path = f"{repo}/{src_dir}"
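if path not in sys.path:
    sys.path.insert(1, path)

# os.path.splitext removes the ".py" extension; str.rstrip(".py") strips characters, not a suffix
mymodule = importlib.import_module(os.path.splitext(pyfile)[0])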
help(mymodule)
Cloning into 'cbe-virtual-laboratory'...
remote: Enumerating objects: 518, done.
remote: Counting objects: 100% (518/518), done.
remote: Compressing objects: 100% (286/286), done.
remote: Total 518 (delta 407), reused 318 (delta 221), pack-reused 0
Receiving objects: 100% (518/518), 406.64 KiB | 2.15 MiB/s, done.
Resolving deltas: 100% (407/407), done.
Help on module hello_world:

NAME
    hello_world

FUNCTIONS
    hello()
        Print hello, world to demonstrate use of the source library.

FILE
    /content/hello_world.py
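The FILE line in the help output above reports /content/hello_world.py, which is the copy downloaded with wget in Method 1, not the one in the cloned repository. Python caches imported modules in sys.modules, so importing hello_world again returns the module already loaded. Also, because the working directory ('') precedes the inserted entry in sys.path, even a first import would find the wget copy. The sketch below is one way to force loading from a specific directory: it drops the cached module, puts the directory first on sys.path, and imports again. The helper import_fresh is hypothetical.

```python
import importlib
import sys


def import_fresh(module_name, directory):
    """Import module_name from directory, bypassing any cached copy."""
    # forget any previously imported version of the module
    sys.modules.pop(module_name, None)
    # search this directory before everything else, including ''
    if directory in sys.path:
        sys.path.remove(directory)
    sys.path.insert(0, directory)
    # clear finder caches in case files changed since the last import
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

After `mymodule = import_fresh("hello_world", f"{repo}/{src_dir}")`, printing `mymodule.__file__` should show the path inside the cloned repository.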

Commit and push changes

A potential use case for cloning a repository is editing the source code directly from a Jupyter notebook. In this case, the changes can be committed and pushed back to the repository using standard git commands.

Be sure you know what you’re doing before attempting this. This code has been commented out to avoid inadvertent changes to this repository’s source code.

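import os
from getpass import getpass
import urllib.parse

# GitHub no longer accepts account passwords for git over HTTPS; use a personal access token
#token = getpass('Personal access token: ')
#token = urllib.parse.quote(token, safe="")

# stage modified tracked files and commit them
#!git -C /content/cbe-virtual-laboratory commit -a -m "update"

# -C takes the local repository path; the authenticated URL is an argument to push
#cmd_str = f"git -C /content/{repo} push https://{user}:{token}@github.com/{user}/{repo}.git main"
#os.system(cmd_str)

#cmd_str, token = "", ""  # remove the token from the variables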

Method 3. Using pip to install from a GitHub repository

The methods presented above assume the user knows in detail how functions are organized into modules in the repository's source directory. For simple applications that may be satisfactory, and those methods are fast and work well. For more complex applications, however, it is helpful to organize the code following the standard conventions for Python software packages.

For this case we assume the top-level directory of the repository includes a setup.py file that uses the setuptools library to specify how packages are organized into source directories.

Assuming setup.py is present and the usual conventions for creating Python packages have been followed, the packages can be installed directly from GitHub as shown in the following cell.
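For reference, a minimal setup.py for a hypothetical layout is sketched below; it is not necessarily the file used by this repository. It assumes the src directory holds hello_world.py (as in Method 1) plus an __init__.py, and maps that directory to a package named cbelaboratory, which would match the import cbelaboratory.hello_world used below. The name and version are taken from the pip output; package_dir and packages are standard setuptools arguments.

```python
# setup.py: minimal sketch for a hypothetical layout
#   src/__init__.py
#   src/hello_world.py
from setuptools import setup

setup(
    name="cbelaboratory",
    version="0.0.0",
    packages=["cbelaboratory"],
    package_dir={"cbelaboratory": "src"},  # the package's modules live in src/
)
```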

user = 'jckantor'
repo = 'cbe-virtual-laboratory'

url = f"git+https://github.com/{user}/{repo}.git"
!pip install --upgrade {url}
Collecting git+https://github.com/jckantor/cbe-virtual-laboratory.git
  Cloning https://github.com/jckantor/cbe-virtual-laboratory.git to /tmp/pip-req-build-4g0kdhj4
  Running command git clone -q https://github.com/jckantor/cbe-virtual-laboratory.git /tmp/pip-req-build-4g0kdhj4
Building wheels for collected packages: cbelaboratory
  Building wheel for cbelaboratory (setup.py) ... done
  Created wheel for cbelaboratory: filename=cbelaboratory-0.0.0-cp36-none-any.whl size=2347 sha256=0246cdb88b2feb591c7b95ccdbb82e1a043f9fa7cfba4a03a8a4950c91028554
  Stored in directory: /tmp/pip-ephem-wheel-cache-jbbfspkb/wheels/c9/9d/5c/f86f44683b875e91e4843a17cfa5b3f69cbf419d35ca09f247
Successfully built cbelaboratory
Installing collected packages: cbelaboratory
  Found existing installation: cbelaboratory 0.0.0
    Uninstalling cbelaboratory-0.0.0:
      Successfully uninstalled cbelaboratory-0.0.0
Successfully installed cbelaboratory-0.0.0
from cbelaboratory.hello_world import hello
hello()
Hello, World
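Running pip install --upgrade in every session re-clones and rebuilds the package, as the output above shows. A notebook can instead install only when the package cannot be imported. This sketch uses importlib.util.find_spec to test for the package and runs pip as `python -m pip` with sys.executable, so the package is installed into the same interpreter the notebook is using. The helper name ensure_installed is hypothetical.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(import_name, pip_target):
    """Install pip_target with pip unless import_name is already importable.

    Returns True if pip was run, False if the package was already available.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    subprocess.run([sys.executable, "-m", "pip", "install", pip_target], check=True)
    # make the newly installed package visible to the import system
    importlib.invalidate_caches()
    return True
```

In this notebook the call would be `ensure_installed("cbelaboratory", f"git+https://github.com/{user}/{repo}.git")`, followed by `from cbelaboratory.hello_world import hello`.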