Saturday, October 17, 2020

Effectively Internet Filtering in 2020

(To skip my rambling intro and get to the nitty gritties, search this page for, "After that long introduction")

In college, back when the internet was young, I hated the clumsy, ineffective internet filtering that was in place on campus. It often blocked sites that were perfectly fine, and did not catch all the sites of the type it was trying to block. Fast forward 10 years or so and I saw my children stumbling upon some content that I didn't want them to see on my unfiltered home internet, and my attitude changed a bit. Back then web filtering was pretty easy. Nothing was encrypted and DansGuardian was the go-to tool. You set up a transparent web proxy and DansGuardian would scan the entire content of every website that you downloaded in your home. Incriminating words and phrases would trigger its blocking and it would replace the website you were downloading with an explanatory message. The beauty was that there was no need to scour the web, categorize every website in the world, and maintain lists. It still had its false positives, and if a website had objectionable images but otherwise benign text there was nothing it could do, but it took the edge off the raw internet.

Today, it's not so easy. The HTTPS Everywhere campaign bothered me at first. It felt unnecessary, and it most definitely broke my DansGuardian filtering. I have since come to understand the importance and necessity of HTTPS and I'm very glad that Let's Encrypt has made it easy for all of us to use it. But I do still have kids.

DNS filtering came to the rescue, first with OpenDNS, and now I use CleanBrowsing. It's pretty good, but sometimes I want more control. One night our school had a parent-night presentation about internet safety for kids and they had invited some vendors to pitch their wares. One of them was RouterLimits. They had a small box that you simply connected to your network and it would filter internet traffic based on categories or individual sites you listed. No software or configuration of other hosts or your router required. It could also enforce time windows with no internet. "How is this possible when this box is just another client on the LAN?" I pressed their salesman. It was a small company and I think he was also an engineer because he realized I was one, and he slyly said to me, "ARP spoofing."

"That's evil!" I instinctively replied. And then I thought a little more about it and realized it was evil. Evil genius! I bought one right there. Their model was great. Pay $80 for $5 worth of hardware and you get their service for life. Plug the little box into your LAN and connect to their web service. The little box collects a list of hosts on the LAN by paying attention to broadcast traffic, then it floods each host with ARP replies to tell them all that it is the gateway and begins its Man in the Middle attack. If a kid tries to visit a blocked site, or any site during the time window when internet is configured to be off for their device, the RouterLimits box sees that and just drops the packet. If a kid tries to visit an allowed site, the RouterLimits box simply forwards the packet along to the actual gateway. Simple and very effective.
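The ARP reply such a box floods is a tiny, fixed-format packet. Here's a sketch in Python (standard library only) that builds one. Every MAC and IP address below is a made-up example, actually transmitting the frame would require a raw socket and root, and this is purely illustrative, not RouterLimits' real code:

```python
import struct
from ipaddress import IPv4Address

def arp_reply(attacker_mac, victim_mac, spoofed_ip, victim_ip):
    """Build an Ethernet frame carrying the ARP reply 'spoofed_ip is-at attacker_mac'."""
    eth = victim_mac + attacker_mac + b'\x08\x06'     # dst MAC, src MAC, EtherType=ARP
    arp = struct.pack('!HHBBH', 1, 0x0800, 6, 4, 2)   # Ethernet/IPv4, opcode 2 = reply
    arp += attacker_mac + IPv4Address(spoofed_ip).packed  # "sender": us, posing as the gateway
    arp += victim_mac + IPv4Address(victim_ip).packed     # target: the victim
    return eth + arp

frame = arp_reply(attacker_mac=b'\xaa\xbb\xcc\xdd\xee\xff',
                  victim_mac=b'\x11\x22\x33\x44\x55\x66',
                  spoofed_ip='192.168.1.1', victim_ip='192.168.1.50')
# frame is a 42-byte packet; once it's sent, the victim's ARP cache maps the
# gateway's IP to the attacker's MAC, so all its outbound traffic flows
# through the spoofing box.
```

Flood that at every host on the LAN every few seconds and you are the gateway, as far as they know.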

The schedule was the thing I loved the most. I had never had that with DansGuardian or CleanBrowsing alone. Sadly, RouterLimits was bought by a bigger company that changed the business model to a yearly subscription. Also, right about the same time, the RouterLimits box lost its ability to block my Roku for some reason. Kids were watching Netflix late into the night on school nights again, dang it. I worked with the RouterLimits support team a bit, but they couldn't figure out what was going on. I wasn't super motivated to debug it myself, because I didn't want to start paying a regular fee for this service anyway.

I still wanted my kids kicked off the internet at a decent time on school nights, though, so I started looking for solutions. The first thing I tried was a pi-hole. It doesn't have scheduling built-in, but I was able to hack together a script that modified the pi-hole database directly to put my kids' devices into a group that had a blocklist that filtered everything. That mostly worked, but it was really a hack. And then my raspberry pi's SD card died and I didn't have a backup. I started looking for another solution. I remembered ARP spoofing and did a little research. Sure enough, there is a tool called ettercap that makes it pretty easy, especially if you just want to block everything.
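For the curious, the pi-hole hack was roughly this shape. This is a reconstruction, not my original script, and the table and column names (`"group"`, `client`, `client_by_group`) are my understanding of pi-hole's gravity.db schema; treat them as assumptions and check your own database before poking at it:

```python
import sqlite3

def move_client_to_group(db_path, client_ip, group_name):
    """Reassign a client (looked up by IP) to the named pi-hole group."""
    con = sqlite3.connect(db_path)
    cur = con.cursor()
    # "group" is a reserved word in SQL, hence the quoting
    group_id = cur.execute('SELECT id FROM "group" WHERE name = ?',
                           (group_name,)).fetchone()[0]
    client_id = cur.execute('SELECT id FROM client WHERE ip = ?',
                            (client_ip,)).fetchone()[0]
    # drop any existing group memberships, then add the new one
    cur.execute('DELETE FROM client_by_group WHERE client_id = ?', (client_id,))
    cur.execute('INSERT INTO client_by_group (client_id, group_id) VALUES (?, ?)',
                (client_id, group_id))
    con.commit()
    con.close()
```

Run something like that from cron at bedtime to shove each device into a block-everything group, and run the reverse in the morning.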

After that long introduction, some nitty gritties. To run ettercap in text-mode and see what it can do, run this command:

sudo ettercap -Tq

Play around with it a bit, it's pretty cool.

To filter (perform a Man in the Middle Attack), you'll want to scan and save a list of hosts on the LAN, like so (change the 1000 to your user ID):

sudo env EC_UID=1000 ettercap -Tqk lan-hosts

To man-in-the-middle a host when you know its IP and your gateway's IP, run something like this (192.168.1.50 and 192.168.1.1 here are stand-in example addresses; substitute your own, and note the exact target syntax can vary between ettercap versions):

sudo ettercap -Tq -M arp:remote /192.168.1.50// /192.168.1.1//

For me that didn't really do anything because it simply forwarded the packets it was intercepting on to the gateway. To do something with the packets ettercap is intercepting, you need to create a filter. My filter is simple, just drop every packet:

drop();
Put that in a text file named drop-all.ecf and run this to compile the filter:

etterfilter drop-all.ecf -o drop-all.ef

You can read the etterfilter man page for more information about what you can do. I imagine the RouterLimits box had some more interesting filters (assuming they were using ettercap).

Once you have your filter compiled, add it to the above ettercap command like so:

sudo ettercap -Tq --filter drop-all.ef -M arp:remote /192.168.1.50// /192.168.1.1//

You have successfully performed a Denial of Service attack against 192.168.1.50 (or whichever host you targeted). If you have, for example, two kids' devices you want to block, you need the lan-hosts file you made earlier, and you do this (again, substitute your own addresses):

sudo ettercap -Tz -j lan-hosts --filter drop-all.ef -M arp:remote /192.168.1.50\;192.168.1.51// /192.168.1.1//

You can add as many IP addresses as you like to the list, separated by (escaped) semi-colons. As far as I can tell, they all need to be listed in lan-hosts too. I believe you could use MAC addresses instead of IP addresses, but I have my router giving out fixed IP addresses to all my kids' devices (that was to make the pi-hole hack work), so I just use the IP addresses.

All that's left is to run ettercap with --daemon, make a cron job or systemd timer to start and stop it at the times you want to block your kids' internet access, and you are done! It just so happens that I have written an ansible playbook that does all this for you. You'll have to modify lan-hosts and the internet-stop.service to use your own devices' MAC and IP addresses, then run ansible-playbook to deploy this to a raspberry pi (or some other linux box on your LAN that you leave on all the time) and you are good to go.
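For reference, the scheduling half can be as simple as a pair of cron entries. The times here are illustrative, not lifted from the playbook, so adjust to taste:

```
# /etc/cron.d/internet-stop -- times are examples only
# Kick the kids' devices off at 9pm on school nights (Sunday-Thursday)...
0 21 * * 0-4  root  systemctl start internet-stop.service
# ...and restore the internet at 6am every morning.
0 6 * * *     root  systemctl stop internet-stop.service
```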

P.S. This even blocks the Roku. ettercap couldn't detect the Roku on my LAN like it could other hosts for some reason, so that's probably why RouterLimits couldn't block it, but once I manually entered the Roku's IP and MAC into the lan-hosts file, ettercap was able to DoS it just like all the other hosts.

Tuesday, March 24, 2020

How To Retroactively Annex Files Already in a Git Repo

In my last post I talked about how surprisingly easy it is to use git annex to manage your large binary files (or even small ones). In this post, I'm going to show how hard it is to go back and fix the mistake you made when you decided not to learn and use git annex at the start of your project. Learn from my mistake!

When I started developing the website for my business, I figured that editing history in git is easy, and I could just check in binary files (like the images) for now and fix it later. Well, it was starting to get a little sluggish, and I had some bigger binary files that I wanted to start keeping with the website code, so I figured the time had come. Once I decided on git annex, it was time to go edit that history.

First Tries: filter-branch, filter-repo

There is a very old page of instructions for doing this using git filter-branch. The first thing I noticed when I tried that was this message from git:

WARNING: git-filter-branch has a glut of gotchas generating mangled history
         rewrites.  Hit Ctrl-C before proceeding to abort, then use an
         alternative filtering tool such as 'git filter-repo'
         (https://github.com/newren/git-filter-repo/) instead.  See the
         filter-branch manual page for more details; to squelch this warning,
         set FILTER_BRANCH_SQUELCH_WARNING=1.

Yikes! A warning like that from a tool (git) that is already known for its gotchas is one I decided to take seriously. Besides, I'm always down to try the new hotness, so I started reading about git-filter-repo. The more I read and experimented, even dug into the source code, the more I came to understand that it could not do what I needed, sadly. Maybe someone will read this and correct me.

Success with git rebase --interactive

Not seeing a nice pre-built tool or command that could do this for me, I set out to manually edit the repository history using good ol' git rebase --interactive. First, I had to find all the binary files that are in the repo (not just the ones in the current revision). Here's how I did it:

# The --stat=1000 is so it doesn't truncate anything
git log --stat=1000 | grep Bin | sort | uniq > binary-files

Note the comment. Isn't it cute that git log truncates long lines even when stdout is not connected to your terminal? There are lots of little annoying gotchas like that throughout this process. Makes me miss mercurial, but don't worry, I will try not to mention mercurial again.

Now, you'll still have duplicates in binary-files because of the other stuff that git log --stat spits out on each line. I personally used some emacs commands to remove everything but the filename from each line of the binary-files file, and then did a sort and uniq again.
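If you'd rather not hand-edit (or don't use emacs), a small Python sketch can do that cleanup instead; it assumes each saved line has the `--stat` shape ` path/to/file | Bin 123 -> 456 bytes`:

```python
def binary_filenames(stat_lines):
    """Extract the unique filenames from git log --stat lines mentioning Bin."""
    names = set()
    for line in stat_lines:
        if '|' in line and 'Bin' in line:
            # everything before the | is the (whitespace-padded) filename
            names.add(line.split('|')[0].strip())
    return sorted(names)

sample = [' img/a.jpg       | Bin 0 -> 5000 bytes',
          ' img/a.jpg       | Bin 5000 -> 100 bytes',
          ' img/b.png       | Bin 0 -> 9 bytes']
print(binary_filenames(sample))  # → ['img/a.jpg', 'img/b.png']
```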

Next, I had to find each commit that modified any of these binary files. Here's how I did that:

for file in $(cat binary-files); do
    git log --pretty=oneline --follow -- $file >> commits;
done

Then I did another sort and uniq on that. Luckily there were only about 15 commits. Phew.

Next I tried to find the earliest commit in the list I had, but that was a pain (don't…mention…mercurial…), so I just ran git rebase --interactive and gave it one of the first commits I made in the repository. I actually used emacs magit to start the rebase, but the surgery required throughout the process made me drop to the command-line for most of it. magit did make it really easy to mark the 15 commits from my commits file with an e though.

OK, once the rebase got rolling I ran into a few different scenarios. Commits that added a new binary file, commits that deleted binary files, commits that modified binary files, and a commit that moved binary files.

Added binary files

When a binary file was added, git would act like I have always seen rebase interactive work, it would show the normal thing:

Stopped at 53fc550...  some commit message here
You can amend the commit now, with

  git commit --amend 

Once you are satisfied with your changes, run

  git rebase --continue

In that case I did this:

git show --stat=1000 # to see binary (Bin) files
git rm --cached <the-binary-files>
git add <the-binary-files> # git annex will annex them
git commit --amend
git rebase --continue

Easy peasy, as long as you have set up annex like my previous post explains so that annexing happens automatically.

Deleted binary files

When a binary file was deleted, git would throw up a message like this:

$ git rebase --continue
[detached HEAD 130bcc4] banner on each page now
 21 files changed, 190 insertions(+), 42 deletions(-)
 create mode 100644 msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg
 delete mode 100644 msd/webshop/static/webshop/img/common/file-icons.png
 create mode 100644 msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg
CONFLICT (modify/delete): msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg deleted in 90d71fb... refactored banners in pricing.css to reduce code duplication and modified in HEAD. Version HEAD of msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg left in tree.
error: could not apply 90d71fb... refactored banners in pricing.css to reduce code duplication
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 90d71fb... refactored banners in pricing.css to reduce code duplication

I guess in this case it was that I had added some new files too, so the message was extra verbose. The key message in all that was: "msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg deleted…" Here's what you do in this case:

git rm msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg
git diff --stat=1000 --staged # to find full paths for any Bin files
git restore --staged <binary-files>
git add <binary-files>
git diff --stat --staged # just to double check there are no Bin files now
git rebase --continue

Looks so simple (heh), but it took me a decent amount of web searching and experimentation to figure it out. All for you, dear reader, all for you.

Modified binary files

Here's one where I resized several images, git helpfully uttered:

$ git rebase --continue
[detached HEAD 7dfb28c] refactored banners in pricing.css to reduce code duplication
 4 files changed, 28 insertions(+), 75 deletions(-)
 create mode 100644 msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg
 delete mode 100644 msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
Auto-merging msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg
error: could not apply a90710f... scaled images down to max width of 1920 pixels
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply a90710f... scaled images down to max width of 1920 pixels

The trick to fixing this is to notice which commit it's trying to let you edit, which is in the last line of that message, and then checkout that version of each of the unmerged binary files it mentions, like so:

git status # to get the names of the unmerged binary files
git checkout a90710f <filenames>

Now you can do the same thing you did for the deleted file:

git restore --staged <filenames>
git add <filenames>
git diff --stat --staged # just to double check there are no Bin files now
git rebase --continue

Moved binary files

When I ran git log --follow to find all the commits that modified binary files, it flagged one where I had moved them. I'm not sure I actually had to edit that commit and I wonder if I would not have had this weird situation if I had not edited it. But for completeness, here's what I saw. Git rebase stopped to let me edit the commit and git annex printed out this message for every file that was moved:

git-annex: git status will show <filename> to be modified, since content availability has changed and git-annex was unable to update the index. This is only a cosmetic problem affecting git status; git add, git commit, etc won't be affected. To fix the git status display, you can run: git update-index -q --refresh <filename>

Sounds…quite weird. But git rebase would not continue until I did run the suggested command:

git update-index -q --refresh <filenames>
git rebase --continue

Dealing with Tags

Once the rebase was done I noticed that the tags I had all still pointed to the original commits. Oops. A quick internet search led me to this post about rebasing and moving tags to the new commits (written by a former co-worker, it just so happens). Too bad I didn't look for that before I rebased. I thought about redoing the whole rebase, but in the end I just wrote my own quick python script (using snippets from Nacho's) to take care of my specific situation. Here it is:

#! /usr/bin/env python
from subprocess import run, PIPE

tags = run(['git', 'show-ref', '--tags'],
           stdout=PIPE).stdout.decode().splitlines()

tags_with_comments = {}
for tag in tags:
    tag_hash, tag_name = tag.split(' ')
    tag_name = tag_name.split('/')[-1]
    comment = run(['git', '--no-pager', 'show', '-s',
                   '--format=%s', tag_hash],
                  stdout=PIPE).stdout.decode().strip()
    print(f'{tag_name}: {comment}')
    tags_with_comments[tag_name] = comment

commits = run(['git', 'log', '--oneline'],
              stdout=PIPE).stdout.decode().splitlines()

for tag_name in tags_with_comments:
    for c in commits:
        commit_hash = c.split(' ')[0]
        comment = c.split(' ')[1:]
        comment = ' '.join(comment)
        if comment == tags_with_comments[tag_name]:
            run(['git', 'tag', '--force', tag_name, commit_hash])

Clean Up and Results

Well, with all that done, it was time to see how it all turned out. My original git repo was sitting at about 1.4 GB. This new repo was…3 GB!? Something wasn't right. Here are some steps I took to clean it up after making sure there weren't any old branches or remotes lying around:

git clean -fdx
git annex fsck
git fsck
git reflog expire --verbose --expire=0 --all
git gc --prune=0

The git clean command showed that I had a weird leftover .git directory in another directory somehow, so I deleted that. I don't think the fsck commands really did anything, but the gc definitely did. Size was now down to 985 MB. Much better. Wait a minute, what if I did a git gc on the original repo? Its size went down to 984 MB. Oh shoot. I guess it makes sense though, if both git and git annex are storing full versions of each binary file they would end up the same size. The real win is the faster git operations, especially clones.

A local git clone now happens in the blink of an eye, and its size is only 153 MB. Now, that's a little unfair because it doesn't have any of the binary files. After a git annex get to get the binary files for the current checkout it goes up to 943 MB. Not a huge savings, but it only gets better as time goes on and more edits happen. Right? This was all worth it, wasn't it?!

Let me know in the comments if this is helpful, hurtful, or if I did this totally wrong.

Git Annex is Great

I'm developing the website for my business and I have a mix of code and images in my git repository. Since everyone seems to know that you shouldn't keep large binary files in your git repo, I decided to see what the current solutions to that problem are.

After doing a little bit of searching, I narrowed things down to git lfs and git annex. Git lfs looks so nice and simple, except I'm not using github. Sure, you can set up your own central git lfs server yourself, but that suddenly sounded not so nice and simple.

The website for git annex immediately hits you with all the power and flexibility that it has, and I was turned off by that complexity. I did like that it doesn't require any kind of central server for me to set up, so I didn't reject it outright. After some digging I found out that it does actually support a usage that looks a lot like git lfs, where you can configure it to automatically manage certain sets of files and then you just use git commands like normal. This is very nice. Here's how you set it up in your git repo (hopefully before you have committed any binary files to git, see my next blog post about fixing that):

git annex init
git annex config --set annex.largefiles 'mimeencoding=binary and largerthan=1b'

That's it! Now just use git commands like normal and annex will take care of binary files for you. The only tricky part to setting this up was figuring out that empty files, like those files that Django creates, were considered binary files. That's why I had to add the and largerthan clause.

There are a couple of other things you might need to be aware of:

  • When you clone, none of the binary files will get copied to your clone until you run git annex get, and that will only copy over the files for the current commit. If you checkout an older commit or another branch, you might need to run git annex get again.
  • If you do start collaborating with others you'll have to make sure that their git annex get command can access the binary files. That's where you have many many options for setting up network communication that can work for your team. I have not delved into the details of that yet.
  • If you try to delete a clone, you'll discover that the annex files down under .git are read-only. Using sudo or changing permissions with chmod will fix that.

Aside from those things I haven't noticed any trickiness with git annex. It seems like a great tool.

Wednesday, January 22, 2020

Millennium Discount Code

I promise I'm not turning this into a spam blog for my business (see the business twitter account for all my self-promotion), but I want to get the word out about early-adopter discount codes that I have made available.  I'll start an MSD specific blog and continue promoting things there as well.

The discount code is for $99 off, which is a free Self-support subscription.  There are only 20 purchases available with this code:


Please give it a try, download and install the product, go through the Hello World tutorial, let me know how it all works. Thank you!

Tuesday, January 21, 2020

MSD: A Red Hat-like business for Open Source EDA (Verilog, VHDL, etc.)

I have been working with commercial EDA tools (Verilog, VHDL, etc.) for years and always found them to be quite overpriced and frustrating.  Whenever I bring up the idea of using open source tools I get responses just like I got when first bringing up Linux in the workplace back in 2001.  Things like, "you get what you pay for" and "it's only free if you don't value your time."  Red Hat (and SUSE/Novell) addressed those concerns for business people and made Linux mainstream (and put UNIX out of business and made a lot of money for themselves in the process).  Maybe a similar business could do the same for open source EDA.

To that end, I have quit my job and I've spent the past few weeks putting this business together.  What do you think? (AKA,