Downloading Python source files from github#
This project incorporates Python modules and functions that are used in multiple notebooks. Most of these are simple convenience functions for accessing device hardware. But whatever the use, repeating the same source code in multiple notebooks complicates maintenance and has little value for the reader. For these reasons, it is much better to maintain code in the project’s repository and import as needed for use in the notebooks.
Unfortunately, GitHub stores code files in a database for which the standard API does not provide direct access to whole directories. Libraries circulating in the Python community are designed to work around this limitation.
Here we demonstrate three techniques:
Use of wget to selectively download individual Python source files to the current working directory.
Use of git clone to download the entire repository and then add a Python source directory to the import path. Changes to the code can be committed and pushed back to the git repository.
Use of pip install to install Python packages from a GitHub repository. This is convenient for the notebook user, but requires a properly configured setup.py in the repository.
Method 1. Downloading individual Python files with wget#
The file hello_world.py is located in the top-level src directory of a GitHub repository. To access the file, use the shell command wget with an https link to the raw content of the main branch. The exclamation mark (bang) prefix ! causes the rest of the line to be executed by the system shell rather than the Python kernel. The --no-cache option ensures the latest version is downloaded. The --backups=1 option saves any prior version of the same code file to a backup.
user = "jckantor"
repo = "cbe-virtual-laboratory"
src_dir = "src"
pyfile = "hello_world.py"
url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src_dir}/{pyfile}"
!wget --no-cache --backups=1 {url}
--2020-11-01 19:11:46-- https://raw.githubusercontent.com/jckantor/cbe-virtual-laboratory/main/src/hello_world.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123 [text/plain]
Saving to: ‘hello_world.py’
hello_world.py 100%[===================>] 123 --.-KB/s in 0s
2020-11-01 19:11:46 (8.21 MB/s) - ‘hello_world.py’ saved [123/123]
import subprocess
# run wget without the ! shell escape; wget writes its progress log to stderr
result = subprocess.run(["wget", "--no-cache", "--backups=1", url], stderr=subprocess.PIPE, stdout=subprocess.PIPE)
print(result.stderr.decode("utf-8"))
--2020-11-01 19:11:46-- https://raw.githubusercontent.com/jckantor/cbe-virtual-laboratory/main/src/hello_world.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123 [text/plain]
Saving to: ‘hello_world.py’
0K 100% 7.32M=0s
2020-11-01 19:11:46 (7.32 MB/s) - ‘hello_world.py’ saved [123/123]
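Where wget itself is unavailable, the same download can be sketched with the standard library alone. The helper names raw_url and download below are hypothetical, not part of the repository, and the backup step only approximates the behavior of --backups=1 by copying an existing file to a .1 suffix before overwriting it.

```python
import shutil
import urllib.request
from pathlib import Path


def raw_url(user, repo, path, branch="main"):
    """Build the raw-content link for one file in a GitHub repository."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"


def download(url, dest):
    """Save the content at url to dest, keeping one backup of any prior copy."""
    dest = Path(dest)
    if dest.exists():
        # approximate wget's --backups=1: keep the old file as <name>.1
        shutil.copy2(dest, dest.with_name(dest.name + ".1"))
    with urllib.request.urlopen(url) as response, open(dest, "wb") as f:
        shutil.copyfileobj(response, f)
    return dest


url = raw_url("jckantor", "cbe-virtual-laboratory", "src/hello_world.py")
print(url)
# download(url, "hello_world.py")  # uncomment to fetch the file (needs network access)
```

The download call is left commented out so the cell can be run without network access.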
Let’s make a listing of the file’s content.
with open(pyfile, 'r') as f:
    print(f.read())
def hello():
    """Print hello, world to demonstrate use of the source library."""
    print("Hello, World")
    return
Let’s import the file as a Python module and use the embedded functions. If the name of the file is fixed and known, then the usual Python import
statement will do the job.
import hello_world
help(hello_world)
hello_world.hello()
Help on module hello_world:
NAME
hello_world
FUNCTIONS
hello()
Print hello, world to demonstrate use of the source library.
FILE
/content/hello_world.py
Hello, World
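One caveat when a file is downloaded again during a session: a second import statement does not re-read a module that is already loaded. A minimal sketch of a workaround uses importlib.reload from the standard library; the helper name fresh_import is hypothetical.

```python
import importlib
import sys


def fresh_import(name):
    """Import module `name`, re-reading its source if it was imported before."""
    importlib.invalidate_caches()  # let the import system notice newly written files
    if name in sys.modules:
        return importlib.reload(sys.modules[name])
    return importlib.import_module(name)
```

For example, running hello_world = fresh_import("hello_world") after re-running the wget cell should pick up the newly downloaded version.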
If the name of the Python file is held in a string variable, then the standard library module importlib may be used. Note the need to strip the .py suffix from the file name.
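import importlib
import os
# splitext removes only the extension; rstrip(".py") would strip any trailing '.', 'p', or 'y'
mymodule = importlib.import_module(os.path.splitext(pyfile)[0])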
help(mymodule)
mymodule.hello()
Help on module hello_world:
NAME
hello_world
FUNCTIONS
hello()
Print hello, world to demonstrate use of the source library.
FILE
/content/hello_world.py
Hello, World
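A caution about stripping the suffix: str.rstrip treats its argument as a set of characters, not as a literal suffix, so it can remove more than the extension. A small sketch using pathlib shows the difference; the helper name module_name and the file name happy.py are made up for illustration.

```python
from pathlib import Path


def module_name(filename):
    """Return the importable module name for a Python source file name."""
    return Path(filename).stem


print(module_name("hello_world.py"))  # hello_world
print(module_name("happy.py"))        # happy
print("happy.py".rstrip(".py"))       # ha  (every trailing 'p', 'y', '.' is stripped)
```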
For platforms where the shell escape !
might fail, an alternative is to use the standard Python subprocess
library.
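A minimal sketch of that alternative, with error checking added, might look as follows. The helper name run_command is hypothetical, and the Python interpreter stands in for wget so that the cell runs on any platform.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of strings and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits with a nonzero status.
    """
    result = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True
    )
    return result.stdout.decode("utf-8")


# stand-in for e.g. run_command(["wget", "--no-cache", "--backups=1", url])
print(run_command([sys.executable, "-c", "print('Hello, World')"]))
```

Passing check=True turns a failed download into an exception rather than a silent failure, which is usually easier to notice in a notebook.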
Method 2. Cloning a git repository#
Downloading a collection of files from a git repository with wget (or curl) can be cumbersome, particularly if the names of the individual files are unknown or subject to change. And, unfortunately, the raw-content links used above fetch one file at a time, with no option to fetch a whole folder.
For these situations, an alternative is to simply clone the git repository to a local directory.
import os
user = "jckantor"
repo = "cbe-virtual-laboratory"
# remove local directory if it already exists
if os.path.isdir(repo):
    !rm -rf {repo}
!git clone https://github.com/{user}/{repo}.git
Cloning into 'cbe-virtual-laboratory'...
remote: Enumerating objects: 518, done.
remote: Counting objects: 100% (518/518), done.
remote: Compressing objects: 100% (286/286), done.
remote: Total 518 (delta 407), reused 318 (delta 221), pack-reused 0
Receiving objects: 100% (518/518), 406.64 KiB | 2.01 MiB/s, done.
Resolving deltas: 100% (407/407), done.
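Deleting and re-cloning on every run transfers the whole repository each time. As a sketch of an alternative, assuming local edits do not need to be discarded, the hypothetical helpers below choose between a shallow clone and a fast-forward pull depending on whether a clone already exists.

```python
import os
import subprocess


def sync_command(user, repo, dest=None):
    """Return the git command that brings a local copy of the repository up to date."""
    dest = dest or repo
    if os.path.isdir(os.path.join(dest, ".git")):
        # a clone already exists: fetch and fast-forward instead of re-downloading
        return ["git", "-C", dest, "pull", "--ff-only"]
    # no clone yet: --depth 1 fetches only the latest commit
    return ["git", "clone", "--depth", "1", f"https://github.com/{user}/{repo}.git", dest]


def sync(user, repo, dest=None):
    """Run the command chosen by sync_command, raising an error if git fails."""
    subprocess.run(sync_command(user, repo, dest), check=True)


print(" ".join(sync_command("jckantor", "cbe-virtual-laboratory")))
```

A shallow clone keeps the download small; drop --depth 1 if the full history is needed.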
With the repository cloned to a local subdirectory of the same name, there are several useful strategies for importing from the source directory. The following cell demonstrates how to insert a repository source directory into the Python path (if it doesn’t appear there already).
import sys
src_dir = "src"
path = f"{repo}/{src_dir}"
if path not in sys.path:
    sys.path.insert(1, path)
# list all directories in the Python path
print("\n".join(["'" + p + "'" for p in sys.path]))
''
'cbe-virtual-laboratory/src'
'/env/python'
'/usr/lib/python36.zip'
'/usr/lib/python3.6'
'/usr/lib/python3.6/lib-dynload'
'/usr/local/lib/python3.6/dist-packages'
'/usr/lib/python3/dist-packages'
'/usr/local/lib/python3.6/dist-packages/IPython/extensions'
'/root/.ipython'
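One design consideration: a relative entry such as cbe-virtual-laboratory/src may stop resolving if the notebook later changes its working directory. A minimal sketch that stores an absolute path instead, using a hypothetical helper add_to_path:

```python
import os
import sys


def add_to_path(directory, position=1):
    """Insert the absolute form of directory into sys.path unless already present.

    Returns True if the path was added, False if it was already there.
    """
    path = os.path.abspath(directory)
    if path in sys.path:
        return False
    sys.path.insert(position, path)
    return True


added = add_to_path(os.path.join("cbe-virtual-laboratory", "src"))
print(added, sys.path[:3])
```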
The next step is to import a Python module from the source directory.
import sys
src_dir = "src"
sys.path.insert(1, f"{repo}/{src_dir}")
import hello_world
hello_world.hello()
Hello, World
The following cell combines these steps into a single cell that can be copied into a new notebook.
import os, sys, importlib
user = "jckantor"
repo = "cbe-virtual-laboratory"
src_dir = "src"
pyfile = "hello_world.py"
if os.path.isdir(repo):
    !rm -rf {repo}
!git clone https://github.com/{user}/{repo}.git
path = f"{repo}/{src_dir}"
if path not in sys.path:
    sys.path.insert(1, path)
# splitext removes only the extension; rstrip(".py") would strip any trailing '.', 'p', or 'y'
mymodule = importlib.import_module(os.path.splitext(pyfile)[0])
help(mymodule)
Cloning into 'cbe-virtual-laboratory'...
remote: Enumerating objects: 518, done.
remote: Counting objects: 100% (518/518), done.
remote: Compressing objects: 100% (286/286), done.
remote: Total 518 (delta 407), reused 318 (delta 221), pack-reused 0
Receiving objects: 100% (518/518), 406.64 KiB | 2.15 MiB/s, done.
Resolving deltas: 100% (407/407), done.
Help on module hello_world:
NAME
hello_world
FUNCTIONS
hello()
Print hello, world to demonstrate use of the source library.
FILE
/content/hello_world.py
Commit and push changes#
A potential use case for cloning a repository is to allow for editing the source code directly from a Jupyter notebook. In this case, the code can be committed and pushed back to the repository using standard git commands.
Be sure you know what you’re doing before attempting this. This code has been commented out to avoid inadvertent changes to this repository’s source code.
import os
from getpass import getpass
import urllib.parse
# GitHub expects a personal access token here rather than an account password
#password = getpass('Password: ')
#password = urllib.parse.quote(password)
#!git -C /content/cbe-virtual-laboratory commit -m "update"
#cmd_str = f"git -C /content/{repo} push https://{user}:{password}@github.com/{user}/{repo}.git"
#os.system(cmd_str)
# or, if credentials are already configured:
#!git -C /content/cbe-virtual-laboratory push
#cmd_str, password = "", "" # removing the password from the variables
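The quoting step matters because characters such as @ or / in a credential would otherwise be read as part of the URL structure. A small sketch of building such a URL follows, with a hypothetical helper name and a made-up credential. Note that GitHub now expects a personal access token rather than an account password for git operations over HTTPS.

```python
import urllib.parse


def authenticated_url(user, secret, repo):
    """Return an https remote URL with the credential percent-encoded."""
    quoted = urllib.parse.quote(secret, safe="")  # safe="" also encodes '/'
    return f"https://{user}:{quoted}@github.com/{user}/{repo}.git"


# made-up credential, for illustration only
print(authenticated_url("jckantor", "p@ss/word", "cbe-virtual-laboratory"))
# https://jckantor:p%40ss%2Fword@github.com/jckantor/cbe-virtual-laboratory.git
```

A credential embedded in a command string can remain visible in process listings, so clearing the variables afterward is only a partial safeguard.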
Method 3. Using pip to install from a github repository#
The methods presented above assume the user has detailed knowledge of how functions have been organized into modules in the repository’s source directory. For simple applications, that may be satisfactory and those methods are fast and can work well. For more complex applications, however, it will be helpful to use common methods for creating Python software packages.
For this case we assume a file setup.py has been included in the top-level directory of the repository. Using the setuptools library, it specifies how packages have been organized into source directories.
Assuming setup.py is present and that the usual conventions for creating Python packages have been followed, the packages can be installed directly from GitHub as shown in the following cell.
user = 'jckantor'
repo = 'cbe-virtual-laboratory'
url = f"git+https://github.com/{user}/{repo}.git"
!pip install --upgrade {url}
Collecting git+https://github.com/jckantor/cbe-virtual-laboratory.git
Cloning https://github.com/jckantor/cbe-virtual-laboratory.git to /tmp/pip-req-build-4g0kdhj4
Running command git clone -q https://github.com/jckantor/cbe-virtual-laboratory.git /tmp/pip-req-build-4g0kdhj4
Building wheels for collected packages: cbelaboratory
Building wheel for cbelaboratory (setup.py) ... done
Created wheel for cbelaboratory: filename=cbelaboratory-0.0.0-cp36-none-any.whl size=2347 sha256=0246cdb88b2feb591c7b95ccdbb82e1a043f9fa7cfba4a03a8a4950c91028554
Stored in directory: /tmp/pip-ephem-wheel-cache-jbbfspkb/wheels/c9/9d/5c/f86f44683b875e91e4843a17cfa5b3f69cbf419d35ca09f247
Successfully built cbelaboratory
Installing collected packages: cbelaboratory
Found existing installation: cbelaboratory 0.0.0
Uninstalling cbelaboratory-0.0.0:
Successfully uninstalled cbelaboratory-0.0.0
Successfully installed cbelaboratory-0.0.0
from cbelaboratory.hello_world import hello
hello()
Hello, World
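Two refinements are sometimes useful here, sketched below with hypothetical helper names. The first checks whether a package is already importable before installing. The second builds the install command around sys.executable so that pip targets the same interpreter as the running kernel, and optionally pins a branch, tag, or commit using the @ref suffix that pip accepts on VCS URLs.

```python
import importlib.util
import sys


def is_installed(package):
    """Return True if a top-level package of this name can be imported."""
    return importlib.util.find_spec(package) is not None


def pip_command(user, repo, ref=None):
    """Return the pip command that installs a package from a GitHub repository."""
    url = f"git+https://github.com/{user}/{repo}.git"
    if ref:
        url += f"@{ref}"  # pin to a branch, tag, or commit
    return [sys.executable, "-m", "pip", "install", "--upgrade", url]


if not is_installed("cbelaboratory"):
    print(" ".join(pip_command("jckantor", "cbe-virtual-laboratory", ref="main")))
```

The returned list can be passed to subprocess.run to perform the installation.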
Summary and recommended practices#
Which of these methods should one use? While there is overlap in functionality, some recommendations can be made.
If you need to import just a few Python files, the wget method is easy to use and minimizes the amount of transmitted data.
If you wish to import whole folders of source code, creating a local clone of the repository is easy to code with git. Moreover, it is possible to edit, commit, and push code back to the repository directly from a notebook.
For more complex projects where the organization of the source code should be decoupled from its use, the conventional packaging methods of Python should be used. The packages can be installed from the GitHub repository using pip.