How to Remove Sensitive Data and Plaintext Secrets From GitHub | by Miguel A. Calles MBA | Jul, 2022

Keep your repository clean to avoid getting hacked

Some developers like to code fast and cut corners, and I’m guilty of it too! This means sensitive data (e.g., plaintext secrets, Application Programming Interface [API] keys, passwords, etc.) might accidentally get committed to your git repository. This might be fine when you’re developing locally, but it can be a problem when using a hosted service like GitHub.

Read the Secjuice Squeeze Volume 7, which contains a story about a Starbucks API key being found on GitHub.

I was working on multiple git repositories, most of which had sensitive data committed to them. They had API keys, AWS keys, passwords, you name it! As a security engineer, I wanted to remedy this. It seemed like a lot of work. Not only were there multiple repositories, but they had these “dirty” commits going back years. I’ll share how I cleaned the dirty commits by removing their secrets.

I used the official documentation from GitHub to get started. I read through it, and it seemed simple enough. I just needed to use BFG Repo-Cleaner and ask the developers to delete the repository and clone it again. Piece of cake! Or so I thought.

https://help.github.com/en/articles/removing-sensitive-data-from-a-repository

BFG Repo-Cleaner is a program that runs on the Java Virtual Machine and serves as a simpler alternative to git filter-branch for rewriting existing commits and replacing their content. Git filter-branch is a rather tedious process (see the GitHub help doc above), so I’m glad BFG simplifies it.

Below is the process I used to clean my commits.

1) I downloaded the Java application to my ~/Downloads folder.

2) I created a ~/Documents/bfg-secrets-all.txt file. I made sure to put this outside of my git repositories to avoid committing it by accident and defeating the purpose of this exercise!

I added one line for each secret I wanted to clean. BFG treats each line as literal text by default, but a line can start with regex: or glob: to change how it is matched. I decided to use the regex: prefix for simplicity and familiarity.

regex:8cea3229-09cd-4b89-9dce-f0f9b0697406
regex:815e9bc4-d795-4961-ab8b-50ddf8a391fe

I searched for specific secrets, but I could have used actual regular expressions.

regex:\w{8}-\w{4}-\w{4}-\w{4}-\w{12}
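To illustrate what a pattern of that shape matches, here is a minimal JavaScript sketch. BFG evaluates its expressions with Java's regex engine, so this is only an approximation for experimenting in Node.js; the names `secretPattern` and `looksLikeSecret` are made up for the example.

```javascript
// Hypothetical helper: checks whether a string contains a UUID-shaped secret.
// \w{8} means "exactly eight word characters", and so on for each group.
const secretPattern = /\w{8}-\w{4}-\w{4}-\w{4}-\w{12}/;

function looksLikeSecret(text) {
  return secretPattern.test(text);
}

console.log(looksLikeSecret("apiKey = '8cea3229-09cd-4b89-9dce-f0f9b0697406';")); // true
console.log(looksLikeSecret("apiKey = 'placeholder';")); // false
```

A broad pattern like this can also match harmless identifiers with the same shape, so it is worth reviewing what it hits before rewriting history with it.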

3) I painstakingly removed all the secrets from each repository. I leveraged environment variables, AWS Key Management Service, and dot files to move the sensitive data out of the committed files.
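As one possible shape for the environment-variable approach, the sketch below reads a key at runtime instead of hardcoding it. The variable name `API_KEY` and the function name `getApiKey` are assumptions for illustration, not names from the repositories described here.

```javascript
// Hypothetical helper: reads a secret from the environment instead of source code.
// The variable name API_KEY is an assumption made for this example.
function getApiKey(env = process.env) {
  const apiKey = env.API_KEY;
  if (!apiKey) {
    // Failing fast makes a missing secret obvious at startup.
    throw new Error('API_KEY is not set');
  }
  return apiKey;
}
```

The committed code then only contains the lookup, and the value itself is supplied by the shell, a dot file kept out of git, or the deployment environment.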

4) I went to the branch protection rules in the GitHub repository settings and enabled force pushes.

5) I ran the following command to check for dirty commits.

java -jar ~/Downloads/bfg-1.13.0.jar --replace-text ~/Documents/bfg-secrets-all.txt

It would either say there are no dirty commits or print out a list of dirty commits. See the sanitized example output.

Commit Tree-Dirt History
------------------------

Earliest                                              Latest
|                                                          |
..DDDDDDDDDDDDDDDDDDDDDDDDDDDDmmDmmDDDDmDDDDDDmmmmmmmmmmmmDD

D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)

                        Before     After
-------------------------------------------
First modified commit | 06f9e3e4 | cc990b18
Last dirty commit     | e587f82e | f7ded7dc

6) I then pushed up all the changes.

git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push origin --force --all
git push origin --force --tags

7) I asked the developers to delete the repository and clone it again.

8) I visited the repository on GitHub and made sure a commit that used to look like this:

- apiKey = 'placeholder';
+ apiKey = '8cea3229-09cd-4b89-9dce-f0f9b0697406';

now looked like this:

- apiKey = 'placeholder';
+ apiKey = '***REMOVED***';
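The effect of the replacement can be mimicked with a few lines of JavaScript. This is a conceptual sketch only, not how BFG is implemented; the function name `redact` is made up, and `***REMOVED***` is the placeholder text seen in the cleaned commit.

```javascript
// Conceptual sketch only: this is not how BFG is implemented.
// It replaces every match of each pattern with the placeholder text.
const PLACEHOLDER = '***REMOVED***';

function redact(line, patterns) {
  return patterns.reduce(
    (text, pattern) => text.replace(new RegExp(pattern, 'g'), PLACEHOLDER),
    line
  );
}

const patterns = ['8cea3229-09cd-4b89-9dce-f0f9b0697406'];
console.log(redact("apiKey = '8cea3229-09cd-4b89-9dce-f0f9b0697406';", patterns));
// apiKey = '***REMOVED***';
```

Only the matched text changes, which is why the surrounding code in the diff stays the same.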

9) I celebrated because I thought I was done.

A short while later, I switched to a different, older branch. I happened to see an API key in plaintext in the commit history. BFG Repo-Cleaner said it cleaned the commit history!

I realized BFG Repo-Cleaner only cleans the checked-out git branch. Makes sense. This is consistent with the overall git workflow.

I had to repeat the BFG process for every branch and ask the developers to delete their repositories and clone them again.

At least now all the branches were cleaned. My worries were finally over.

I was visiting an old pull request (PR) and saw an API key in plaintext in the commit history. Again! I had cleaned every branch with BFG Repo-Cleaner. What was going on?! Fool me once; shame on you. Fool me twice; shame on me.

It turns out GitHub PRs are independent of the git repository. This seems obvious because a PR is an external document allowing reviewers to approve whether one branch should merge into another. When the PR is approved and merged, GitHub performs the git merge function.

BFG Repo-Cleaner was designed for git repositories and not GitHub pull requests. As a result, I needed to remove all those PRs and their commits manually.

Two PRs and 40 commits later, I realized manually checking hundreds of PRs and thousands of commits for secrets would be difficult and prone to human error.

I decided I needed an automated way to find all the PRs and the commits that contained secrets. I decided to use the GitHub API.

I cannot share the script I wrote due to copyright reasons. Instead, I am describing the thought process I used to build the script.

1) I created a personal access token.

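2) I created a Node.js script to test the token.

mkdir myscript
cd myscript
npm init -y
npm install github-api
touch index.js
/* index.js */
'use strict';
const GitHub = require('github-api');
// Assumption: the personal access token is exported as GITHUB_TOKEN
const token = process.env.GITHUB_TOKEN;
const gh = new GitHub({ token });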

3) I was using a GitHub organization for all the repositories. I updated the script to get all the repositories, and it listed them.

// github-api methods return promises, so run this inside an async function
const org = gh.getOrganization(orgName);
const { data: repos } = await org.getRepos();
console.log(repos);

4) I picked one repository.

const repo = gh.getRepo(repos[0].owner.login, repos[0].name);

5) I got all its PRs.

const { data: prs } = await repo.listPullRequests(options);

6) I picked one PR.

const pr = prs[0];

7) I got all the files created, modified, and deleted in the PR.

const { data: files } = await repo.listPullRequestFiles(pr.number);

8) I read the bfg-secrets-all.txt file I used earlier.
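One way to turn that file into something a script can search with is sketched below. This is not the script I wrote; `parseSecretPatterns` is a hypothetical name, and the handling of unprefixed and glob: lines is a simplification chosen for the example.

```javascript
// Hypothetical parser for the bfg-secrets-all.txt format described earlier.
// Lines prefixed with "regex:" become regular expressions; other lines are
// treated as literal text. "glob:" lines are skipped to keep the sketch short.
function parseSecretPatterns(fileContent) {
  const escapeLiteral = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return fileContent
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('glob:'))
    .map((line) =>
      line.startsWith('regex:')
        ? new RegExp(line.slice('regex:'.length))
        : new RegExp(escapeLiteral(line))
    );
}

// Usage, assuming the file lives in ~/Documents:
// const fs = require('fs');
// const os = require('os');
// const path = require('path');
// const file = path.join(os.homedir(), 'Documents', 'bfg-secrets-all.txt');
// const patterns = parseSecretPatterns(fs.readFileSync(file, 'utf8'));
```

Java and JavaScript regular expressions differ in some advanced features, so a pattern that works in BFG may need adjusting before it is reused this way.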

9) I searched the diffs in each file using the bfg-secrets-all.txt file I used earlier, and created a CSV output.
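The search-and-report step might look roughly like the sketch below. It assumes each file object carries `filename` and `patch` fields, as the objects returned by GitHub's pull request files endpoint do; the function names and CSV columns are invented for the example.

```javascript
// Hypothetical scanner: `files` mimics the objects returned by GitHub's
// "list pull request files" endpoint, which carry `filename` and `patch`.
// `patterns` is an array of RegExp objects created without the "g" flag.
function findSecretsInFiles(repoName, prNumber, files, patterns) {
  const rows = [];
  for (const file of files) {
    const patch = file.patch || ''; // binary or very large files have no patch
    const changedLines = patch.split('\n').filter((line) => /^[+-]/.test(line));
    const hit = changedLines.some((line) =>
      patterns.some((pattern) => pattern.test(line))
    );
    if (hit) {
      rows.push([repoName, prNumber, file.filename]); // one row per flagged file
    }
  }
  return rows;
}

// Quote each field so commas or quotes in file names do not break the CSV.
function toCsv(rows) {
  const header = ['repository', 'pull_request', 'file'];
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  return [header, ...rows].map((row) => row.map(quote).join(',')).join('\n');
}
```

The report deliberately records where a secret was found rather than the secret itself, so the CSV does not become another place that leaks it.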

10) I updated the code to iterate through every repository, PR, and file.
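The nested iteration could be sketched as follows. It assumes `gh` is a github-api client like the one created in step 2, and `scanFile` is a hypothetical callback that inspects one file. Pagination of the pull request list and API rate limits are not handled here.

```javascript
// Sketch of the nested loop. `gh` is assumed to be a github-api client;
// `scanFile` is a hypothetical callback that inspects one changed file.
async function scanOrganization(gh, orgName, scanFile) {
  const { data: repos } = await gh.getOrganization(orgName).getRepos();
  for (const { owner, name } of repos) {
    const repo = gh.getRepo(owner.login, name);
    // state: 'all' asks for open and closed pull requests alike
    const { data: prs } = await repo.listPullRequests({ state: 'all' });
    for (const pr of prs) {
      const { data: files } = await repo.listPullRequestFiles(pr.number);
      for (const file of files) {
        scanFile(name, pr.number, file);
      }
    }
  }
}
```

Running the requests one after another is slow for a large organization, but it keeps the order of the results predictable and is gentler on the API.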

11) I contacted GitHub Support to delete either the entire PR or the tracking refs. You can use the GitHub API to get that information too.
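GitHub stores each pull request's commits under refs named after the PR number, so a list of flagged PR numbers can be turned into the ref names to include in a support request. The helper below is a hypothetical sketch of that formatting step.

```javascript
// GitHub keeps each pull request's commits under refs/pull/<number>/head
// (and refs/pull/<number>/merge while a test merge exists).
// Hypothetical helper: formats those ref names for a list of PR numbers.
function pullRequestRefs(prNumbers) {
  return prNumbers.flatMap((number) => [
    `refs/pull/${number}/head`,
    `refs/pull/${number}/merge`,
  ]);
}

console.log(pullRequestRefs([12, 40]).join('\n'));
```

Running git ls-remote origin 'refs/pull/*' against the repository shows which of these refs actually exist on GitHub.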

I waited a couple of weeks after I cleaned the repositories and ran BFG Repo-Cleaner on the repositories again. I found some repositories had sensitive data again. I learned that a developer forgot to delete the repository and pushed a commit using the uncleaned repository.

It’s a good idea to check the repositories after time passes to ensure they’re clean.

It seems this could be a never-ending battle: I clean, a developer accidentally commits a secret, I find it by accident, I clean again, and the cycle repeats. I wanted a process to help prevent this in the first place.

I decided to use git hooks to check a commit before it is created, and I settled on the pre-commit hook.

1) I created an executable pre-commit script.

touch .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

#!/bin/sh
# pre-commit
if grep -rqE "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}" * ; then
  echo 'Found a matching secret'
  exit 1
fi

2) I created a file with a secret to test it.

echo 8cea3229-09cd-4b89-9dce-f0f9b0697406 > secrets.txt
git commit -a -m 'Testing'

I got the following output, and the file was not committed.

Found a matching secret
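The grep call searches every file in the working directory, which can be slow or noisy in a Node.js project with a large node_modules folder. One possible refinement, sketched here as an assumption rather than what I deployed, is a small Node.js check that only looks at the files it is given. The file name check-secrets.js and the function name are made up.

```javascript
const fs = require('fs');

// Hypothetical alternative to the grep call: returns the paths whose content
// matches the UUID-shaped pattern. The caller decides which paths to pass in,
// for example the output of `git diff --cached --name-only`.
const SECRET_PATTERN = /\w{8}-\w{4}-\w{4}-\w{4}-\w{12}/;

function filesWithSecrets(paths, read = (p) => fs.readFileSync(p, 'utf8')) {
  return paths.filter((p) => SECRET_PATTERN.test(read(p)));
}

// Possible use at the bottom of a check-secrets.js script called by the hook:
// const offenders = filesWithSecrets(process.argv.slice(2));
// if (offenders.length > 0) {
//   console.error('Found a matching secret in: ' + offenders.join(', '));
//   process.exit(1);
// }
```

The `read` parameter exists only so the function can be exercised without touching the disk.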

3) I needed a way to make this part of the repository. At the moment, it would only work on my machine. I took advantage of the fact that each repository was a Node.js project. I added a post-install script to ensure the git hook script would work on every developer’s machine.

I updated the package.json file.

"scripts": {
  "postinstall": "git config core.hooksPath .githooks"
}

4) I moved the git hook script to a committable directory and committed it; you cannot commit files in the .git directory.

mkdir .githooks
mv .git/hooks/pre-commit .githooks
git add .githooks/pre-commit
git commit -m "Added pre-commit hook script."

5) All the developers need to pull the latest code and run the npm install command on their machines.

6) Another approach is to let npm install copy the hook to the .git/hooks directory.

"scripts": {
  "postinstall": "cp .githooks/* .git/hooks"
}
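The cp command is not available in every shell, for example the default Windows command prompt. A small Node.js script can do the same copy on any platform. This is a sketch under assumptions: the script name install-hooks.js and the function name `installHooks` are made up, while the directory names follow the steps above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical install-hooks.js: copies every file from .githooks into
// .git/hooks and marks it executable. The directory names follow the article;
// the script name and function name are made up for this sketch.
function installHooks(sourceDir = '.githooks', targetDir = path.join('.git', 'hooks')) {
  fs.mkdirSync(targetDir, { recursive: true });
  const installed = [];
  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // skip sub-directories
    const target = path.join(targetDir, entry.name);
    fs.copyFileSync(path.join(sourceDir, entry.name), target);
    fs.chmodSync(target, 0o755); // git only runs hooks that are executable
    installed.push(entry.name);
  }
  return installed;
}
```

With a call to installHooks() added at the end of the file, the postinstall entry would become "node install-hooks.js".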

Committing sensitive data and plaintext secrets to a GitHub repository can weaken your security posture, and it takes effort to clean it after the fact.

You can use the BFG Repo-Cleaner to clean the secrets in your commit history. Make sure to clean every branch, force push the changes, and run BFG again after time passes to ensure sensitive data was not re-introduced.

You may find sensitive data in GitHub pull requests after using BFG. You can use the GitHub API to find pull requests with sensitive data. Send those findings to GitHub Support and ask them to delete the pull requests or their tracking references.

You can use a git pre-commit hook to help prevent committing sensitive data.
