Interacting with the GitHub API - a how-to guide

2022-08-18|By Łukasz Kopicki|Code

GitHub is one of the best collaboration platforms, with the Git version control system at its heart. Nowadays, GitHub is used not only by IT teams but also by people outside IT, such as book writers, researchers and many others. However, in this article I will not explain what GitHub is or what it is used for. Instead, I will focus on automating some everyday GitHub use cases.

Simple or not?

At first glance GitHub seems easy to use… You go to the GitHub website, create an account (if you don't already have one), create a repository, clone it locally and voilà - you are ready to write your code. Creating one repository looks cool, but what if you have 100 repositories, 20 teams and 200 members to create... Holy cow! It could take you a week or more.

And now you are probably thinking “Hmm... there are probably some ways to automate such actions”. Yeah, you are right. You can use tools that interact directly with the GitHub API. There are several options to consider; I will focus on two solutions, namely:

  • Terraform GitHub provider
  • GitHub SDK PyGithub

What is the GitHub API?

GitHub provides a powerful API that tools and engineers can use to interact with the platform. There are currently two stable versions of the GitHub API:

  • REST API
  • GraphQL API

This article deals exclusively with the GitHub REST API. Using it, you can create and manage GitHub resources such as repositories, branches, issues, pull requests, teams, and many more.
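To make the idea of "talking to the REST API" concrete, here is a minimal sketch that uses only the Python standard library. It prepares a request for the organization repository listing endpoint (GET /orgs/{org}/repos). The helper names build_list_repos_request and list_repo_names are made up for this illustration, and the sketch reads only the first page of results.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_list_repos_request(org: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET /orgs/{org}/repos request."""
    return urllib.request.Request(
        url=f"{API_ROOT}/orgs/{org}/repos",
        headers={
            # Media type recommended by GitHub for REST API calls.
            "Accept": "application/vnd.github+json",
            # The token is passed as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def list_repo_names(org: str, token: str) -> list:
    """Send the request and return repository names from the first page only."""
    with urllib.request.urlopen(build_list_repos_request(org, token)) as resp:
        return [repo["name"] for repo in json.load(resp)]
```

Both tools discussed later wrap calls of this kind, so you rarely need to build such requests by hand.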

Prerequisites

Authentication

First, you need to authenticate with the GitHub API. When it comes to authentication and authorization, there are several methods to consider. I'm going to cover the OAuth Personal Access Token (PAT) option. A PAT can easily be used by both tools discussed below.

Moreover, the official documentation states:

While the API provides multiple methods for authentication, we strongly recommend using OAuth for production applications.

How to generate a PAT in GitHub?

A PAT can be generated in a few simple ClickOps steps:


Step 1: Go to your GitHub profile Settings and then to Developer settings.

Note: For production environments, it is good practice to create a dedicated GitHub service account and generate the token for that account.

Step 2: Choose the Personal access tokens pane and click Generate new token.

Figure: Generate new Personal Access Token

Step 3: Provide relevant PAT details:

  • Fill in the Note field.
  • For additional security, set an expiration date for the PAT.
    Figure: Personal Access Token details

  • Select the proper scopes (access level) for the PAT. For this post, the PAT should have the repo and admin:org scopes.
    Figure: Personal Access Token scopes


Note: Choose the scopes carefully, as some of them allow destructive operations.

Step 4: After selecting the scopes, scroll down, click Generate token and save the token in a password manager. It will be used in the upcoming parts of the article.


A more detailed how-to can be found in GitHub's documentation on creating a personal access token.
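Once the token exists, it can be handy to double-check that it carries the scopes this post relies on. For classic PATs, GitHub reports the granted scopes in the X-OAuth-Scopes response header as a comma-separated list. The sketch below only parses such a header value. The function name missing_scopes is an assumption made for this example, and the check is deliberately naive: it compares names literally and does not know that a broader scope may imply a narrower one.

```python
# Scopes this article asks for.
REQUIRED_SCOPES = {"repo", "admin:org"}


def missing_scopes(x_oauth_scopes: str, required=REQUIRED_SCOPES) -> set:
    """Return the required scopes that are absent from an X-OAuth-Scopes value."""
    granted = {scope.strip() for scope in x_oauth_scopes.split(",") if scope.strip()}
    return set(required) - granted


if __name__ == "__main__":
    # Header value as it might be copied from an API response.
    print(missing_scopes("repo, gist"))
```

An empty result means that every required scope was listed in the header.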

Environment variables

After successfully generating the token, you are probably wondering how it should be used... It won't come as a big surprise: we simply pass it to the tools through the appropriate environment variables.

In the examples section, we will focus on creating objects that are part of a GitHub organization, so the following variables should be declared:

  • GITHUB_TOKEN - previously generated PAT
  • GITHUB_OWNER - your GitHub organization name

Note: For real-life projects, set these variables in your CI/CD tool (e.g. Spacelift).
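Because both tools depend on these two variables, a script can fail early with a clear message when one of them is missing. The following is a minimal sketch of such a check. The helper name load_github_settings is hypothetical and is not part of Terraform or PyGithub.

```python
import os


def load_github_settings(environ=os.environ) -> dict:
    """Return GITHUB_TOKEN and GITHUB_OWNER, or raise if either is unset or empty."""
    settings = {}
    missing = []
    for name in ("GITHUB_TOKEN", "GITHUB_OWNER"):
        value = environ.get(name, "").strip()
        if value:
            settings[name] = value
        else:
            missing.append(name)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return settings
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.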

Terraform GitHub provider

OK, now that all the prerequisites have been met, we can move on to more interesting things. The first tool we will look at is the Terraform GitHub provider, and the good news is that it's really easy to get started.

For this post, the GitHub provider will be used to perform the following actions:

  • add two GitHub users (members) to organization;
  • add organization team;
  • assign users to a team;
  • create two organization repositories.

First, create the Terraform GitHub provider definition:

terraform {
  required_providers {
    github = {
      source  = "integrations/github"
      version = "~> 4.26"
    }
  }
}

provider "github" {}

Note: For simplicity, the Terraform state is kept locally, but serious projects should use remote state.

GitHub organization members

Can you imagine organizations without users? Yeah, me neither. Come on, let's add some users to your GitHub organization.

resource "github_membership" "this" {
  for_each = var.users

  username = each.key
  role     = each.value
}

Example input data for var.users:

users = {
  bob-hamster   = "admin"
  alice-alpaka  = "member"
}

GitHub organization team

To simplify the management of users and permissions, let's create a team.

resource "github_team" "this" {
  for_each = var.teams

  name        = each.key
  description = each.value.description
  privacy     = each.value.privacy
}

Example input data for var.teams:

teams = {
  cloud-team = {
    description = "Team for cloud ninjas."
    privacy     = "closed"
  }
}

Team membership

Once we have users and a team, we can proceed to the next step and assign users to this team.

resource "github_team_membership" "this" {
  for_each = { for membership in var.team_membership : "${membership.team}-${membership.user}" => membership }

  team_id  = github_team.this[each.value.team].id
  username = each.value.user
  role     = each.value.role
}

Example input data for var.team_membership:

team_membership = [
  {
    user = "bob-hamster"
    role = "maintainer"
    team = "cloud-team"
  },
  {
    user = "alice-alpaka"
    role = "member"
    team = "cloud-team"
  }
]
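The for_each expression in github_team_membership turns the list into a map keyed by "team-user". If that expression looks opaque, the Python sketch below mirrors the same transformation on the example data. The function name index_memberships is made up for this illustration. Like Terraform, it refuses duplicate keys.

```python
team_membership = [
    {"user": "bob-hamster", "role": "maintainer", "team": "cloud-team"},
    {"user": "alice-alpaka", "role": "member", "team": "cloud-team"},
]


def index_memberships(memberships: list) -> dict:
    """Key each membership by '<team>-<user>', as the for expression does."""
    indexed = {}
    for membership in memberships:
        key = f"{membership['team']}-{membership['user']}"
        if key in indexed:
            # A Terraform for expression also fails when two items yield the same key.
            raise ValueError(f"duplicate membership key: {key}")
        indexed[key] = membership
    return indexed


if __name__ == "__main__":
    for key in index_memberships(team_membership):
        print(key)
```

Each resulting key identifies one github_team_membership instance, so adding or removing a list entry touches only that instance.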

Repositories

The next step is to create repositories where your code can be stored.

resource "github_repository" "this" {
  for_each = var.repositories

  name                   = each.key
  description            = each.value.description
  visibility             = each.value.visibility
  delete_branch_on_merge = each.value.delete_branch_on_merge
  allow_rebase_merge     = each.value.allow_rebase_merge
}

Example input data for var.repositories:

repositories = {
  tf-muflon-app-dev = {
    description            = "Dev repo for muflon app."
    visibility             = "public"
    delete_branch_on_merge = true
    allow_rebase_merge     = true
  }
  tf-muflon-app-prod = {
    description            = "Prod repo for muflon app."
    visibility             = "private"
    delete_branch_on_merge = true
    allow_rebase_merge     = false
  }
}

Terraform init, plan & apply

Finally, when all resources are declared, Terraform's Big Three can be used.

Initialize the Terraform working directory with the following command:

terraform init

To see what changes Terraform plans to make on GitHub, run:

terraform plan

To apply the changes on GitHub, run:

terraform apply
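In a pipeline, these three commands are usually run non-interactively. The sketch below shows one possible way to chain them from Python. It assumes that the terraform binary is on PATH and that saving the plan to a file named tfplan is acceptable. The function names are illustrative only.

```python
import subprocess


def terraform_commands(plan_file: str = "tfplan") -> list:
    """Return the init, plan and apply commands in execution order."""
    return [
        ["terraform", "init", "-input=false"],
        # Save the plan so that apply executes exactly what was reviewed.
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        ["terraform", "apply", "-input=false", plan_file],
    ]


def run_terraform(workdir: str, runner=subprocess.run) -> None:
    """Run each command in workdir and stop at the first failure."""
    for cmd in terraform_commands():
        runner(cmd, cwd=workdir, check=True)
```

Applying a saved plan file skips the interactive approval prompt, so the review has to happen between the plan and apply steps.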

GitHub SDK PyGithub

After a quick demo of using the Terraform GitHub provider, let's move on to an example of creating similar objects using PyGithub.

Because PyGithub requires more code than Terraform to create GitHub objects, the deployment will be limited to the following steps:

  • add a GitHub user (member) to the organization;
  • create two organization repositories;
  • add a branch to a repository.

What is PyGithub?

For a quick and clear answer on what PyGithub is, see the official documentation:

PyGithub is a Python library to use the Github API v3. With it, you can manage your Github resources (repositories, user profiles, organizations, etc.) from Python scripts.

To start, we need to install the PyGithub package (for this post version 1.55 was used).

pip install PyGithub==1.55

Import the necessary packages.

import os
from github import Github, GithubException

Next, we add the following function to the codebase. It is responsible for adding a user with the appropriate role to the GitHub organization.

def add_org_member(gh_token: str, org_name: str, user_config: dict) -> None:
    gh = Github(gh_token)

    user_name = user_config.get('user_name')
    role = user_config.get('role')
    try:
        gh_identity = gh.get_organization(org_name)
        # Resolve the login to a NamedUser object, which add_to_members() expects.
        member = gh.get_user(user_name)
        gh_identity.add_to_members(member=member, role=role)
    except GithubException as err:
        print(f'Error: {err.data}')

Perfect. Now, we can add a function that will create the desired repository in the GitHub organization.

def create_org_repo(gh_token: str, org_name: str, repo_config: dict) -> None:
    gh = Github(gh_token)

    # Work on a copy so the caller's dict is not modified.
    repo_config = dict(repo_config)
    # 'protect_default_branch' is our own option, not a create_repo() argument.
    protect_default_branch = repo_config.pop('protect_default_branch', False)
    auto_init = repo_config.get('auto_init', False)
    try:
        gh_identity = gh.get_organization(org_name)
        repo = gh_identity.create_repo(**repo_config)
        # A branch exists (and can be protected) only if the repo was initialized.
        if auto_init and protect_default_branch:
            branch = repo.get_branch(repo.default_branch)
            branch.edit_protection(strict=True)
    except GithubException as err:
        print(f'Error: {err.data}')

Note: As you can see, the repository can be initialized and its default branch protected directly in the code.

To demonstrate the power of the PyGithub library, we will declare a function that creates a branch in the selected repository.

def add_repo_branch(gh_token: str, org_name: str, repo_name: str, source_branch: str, target_branch: str) -> None:
    gh = Github(gh_token)
    
    try:
        gh_identity = gh.get_organization(org_name)
        repo = gh_identity.get_repo(repo_name)
        src_branch = repo.get_branch(source_branch)
        repo.create_git_ref(ref=f'refs/heads/{target_branch}', sha=src_branch.commit.sha)
    except GithubException as err:
        print(f'Error: {err.data}')

Finally, we write the main function. It performs all the intended steps using the functions declared above.

def main() -> None:
    # Get environment variables
    GITHUB_TOKEN = os.getenv('GITHUB_TOKEN', '')
    GITHUB_OWNER = os.getenv('GITHUB_OWNER', '')

    # Input values
    user = {
        'user_name': 'john-labrador',
        'role': 'member'
    }
    repos = [
        {
            'name': 'py-muflon-app-dev',
            'private': False,
            'description': 'Dev repo for muflon app.',
            'delete_branch_on_merge': True,
            'allow_rebase_merge': True,
            'auto_init': True,
            'protect_default_branch': True
        },
        {
            'name': 'py-muflon-app-prod',
            'private': True,
            'description': 'Prod repo for muflon app.',
            'delete_branch_on_merge': True,
            'allow_rebase_merge': False,
            'auto_init': False,
            'protect_default_branch': True
        }
    ]
    # Add member to organization
    add_org_member(gh_token=GITHUB_TOKEN, org_name=GITHUB_OWNER, user_config=user)
    # Create provided repositories
    for repo in repos:
        create_org_repo(gh_token=GITHUB_TOKEN, org_name=GITHUB_OWNER, repo_config=repo)
    # Add 'dev' branch to 'py-muflon-app-dev' repo
    add_repo_branch(gh_token=GITHUB_TOKEN, org_name=GITHUB_OWNER, repo_name='py-muflon-app-dev', source_branch='main', target_branch='dev')

The main function can be invoked inside the script with the following code.

if __name__ == '__main__':
    main()

Conclusion

As usually happens when comparing two tools, both have pros and cons. The choice should depend on your use case. If you want to perform a quick one-time operation, you will probably choose the PyGithub library. On the other hand, if GitHub objects are to be provisioned and maintained the IaC way, you will choose Terraform.


Implementing configuration changes can be much easier with Terraform, as coding them with PyGithub takes a lot of time. Moreover, with PyGithub you would need to implement some form of object state management yourself, which Terraform provides out of the box.
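To give a feel for what "state management" means in practice, here is a deliberately tiny sketch of the comparison a script would have to do on its own: desired repositories versus those that already exist. The function name plan_repositories is hypothetical. The set of existing names is assumed to be collected beforehand, for example by iterating over the organization's repositories with PyGithub.

```python
def plan_repositories(desired: dict, existing: set) -> dict:
    """Compare desired repository names with existing ones, Terraform-plan style."""
    desired_names = set(desired)
    return {
        # Declared in the config but not present on GitHub yet.
        "create": sorted(desired_names - existing),
        # Present on GitHub but not declared in the config.
        "unmanaged": sorted(existing - desired_names),
        # Present on both sides; their settings would still need to be compared.
        "keep": sorted(desired_names & existing),
    }


if __name__ == "__main__":
    desired = {"py-muflon-app-dev": {}, "py-muflon-app-prod": {}}
    print(plan_repositories(desired, {"py-muflon-app-dev", "legacy-repo"}))
```

This covers names only. Detecting changed settings, handling renames and recording what the script created are the parts that Terraform's state handles for you.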

And what is most interesting: both tools can be used together. You can write Python scripts that interact with the GitHub API, then retrieve and parse data into a format that can later be used as input values in a Terraform config. This way, the PyGithub library can be invaluable for codifying an existing GitHub organization in Terraform.
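As a rough illustration of that combination, the sketch below converts repository records into the shape of var.repositories used earlier and renders them as JSON, which Terraform can load from a file such as terraform.tfvars.json. The input is assumed to be a list of plain dictionaries already fetched from GitHub (for example with PyGithub), and the function names are invented for this example. Only the public and private visibilities are handled.

```python
import json


def to_terraform_repositories(repos: list) -> dict:
    """Map repository records onto the structure expected by var.repositories."""
    repositories = {}
    for repo in repos:
        repositories[repo["name"]] = {
            "description": repo.get("description") or "",
            # Simplification: 'internal' repositories are not distinguished here.
            "visibility": "private" if repo.get("private") else "public",
            "delete_branch_on_merge": bool(repo.get("delete_branch_on_merge", False)),
            "allow_rebase_merge": bool(repo.get("allow_rebase_merge", True)),
        }
    return {"repositories": repositories}


def render_tfvars_json(repos: list) -> str:
    """Render the variables as JSON text suitable for a *.tfvars.json file."""
    return json.dumps(to_terraform_repositories(repos), indent=2, sort_keys=True)
```

Generating the variable file is only half of the job: repositories that already exist would still have to be imported into the Terraform state before Terraform can manage them.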
