Wednesday, June 30, 2021

Traffic in Little Cottonwood Canyon

This is my comment on the Utah Department of Transportation's plans "to provide an integrated transportation system that improves the reliability, mobility and safety for residents, visitors, and commuters who use S.R. 210."

This is long, but I have tried to order it in such a way that the most important points come first, so don't give up now.  At least read the first 3 paragraphs, please.

First and foremost I'd like to ask, what problem are we really trying to solve?  Roughly 355 days a year there are no reliability, mobility, or safety problems on S.R. 210.  The weather is good, the roads are clean and clear, and traffic flows at or above the speed limit of the road.  We all need to understand that the problems with reliability, mobility, and safety only happen about 10 days a year, if the skiers are lucky and we get that many big snow storms.

Mobility

Congestion on roads is annoying, but we need to seek to understand it before we try to fix it.  Congestion on a road happens because it leads to a popular place.  Lots of people want to get to that place, so they get on that road.  The road gets congested and nobody can get to the popular place as fast as they could if there were no congestion.  This is what bothers us.  We have a road that could allow travel at a given speed, but because of the overcrowding on the road, we all have to go slower than that speed.

Solutions to congestion are all temporary.  When a road is congested, some number of people will simply choose not to go to the popular destination.  If you widen the road or add alternative means to get to the popular destination, at first the congestion will be alleviated, but before too long the people who were avoiding the popular place because of congestion will see that there is no congestion and they will start traveling to the popular place again.  Before too long you will have congestion again.  Anyone who has seen the progression of I-15 over the years here in Utah can understand this.  There will be more people getting to the popular destination than there were before, but there will still be congestion.

Understanding all that, we can better talk about what we are really doing.  We are not alleviating congestion (increasing mobility) long-term.  We are alleviating it short-term only, and we are providing the means for more people to reach the popular destination.  Is that really what we want in Little Cottonwood Canyon?  Can the ski resorts, hiking trails, picnic areas, climbing routes, etc. handle more people?  Or will they become congested too?

Reliability and Safety

These are essentially the same concern.  When it snows, cars and buses are less reliable because they might get stuck or slide off the road.  In extreme cases they might slide into each other or off the road, which is a safety issue.  This is where I would like to point out how strange it is that UDOT has recently stopped talking about these concerns in Big Cottonwood Canyon (S.R. 190) and is now only talking about Little Cottonwood Canyon (S.R. 210).  I would really like to see data on reliability and safety in both canyons because, from my following of the two, it appears that S.R. 190 has far more accidents and slide-offs than S.R. 210.  S.R. 190 is a much longer, windier road with areas of very steep drop-offs down to the creek.  I have noticed that S.R. 190 gets closed to deal with accidents (stranding skiers on the road or at the resorts for hours on end) far, far more often than S.R. 210.  Is any of this plan really concerned with reliability and safety?  If so, it should consider both canyons.

Bus Lanes vs. Gondola

Now, all that being said, let's address this specific plan, which seems to assume that yes, the canyon can and should accommodate more people and is in dire need of more reliability and safety.  Considering all the above, I believe neither solution is a good idea.  Both will be incredibly costly and have very real negative impacts on the environment.  Neither will make a difference on the 355 good traffic days a year, and in the long run, neither will solve the congestion problems on the 10 bad days a year.  The one thing the gondola plan has going for it is increased reliability and safety on those 10 bad days, but I see no data that justifies the extreme cost for what is likely to be only a very small increase in reliability and safety in the one canyon that doesn't have that big of a reliability and safety problem anyway, while we ignore the other canyon that does have real reliability and safety problems (on those 10 days a year).

My belief is we should look for more cost-effective ways to address the reliability and safety issues only, in both canyons(!), and not proceed with either a road widening or gondola project.

Friday, April 30, 2021

Fix for Cura's Ender 3 gcode

A child of mine finally asked for a 3D printer.  I knew that if I tried to push it, no kid would be interested, so I didn't.  But finally, one of them asked for one.  We ordered the Creality Ender 3 that night from their website and some filament from Amazon.  It all arrived a couple days later and we enjoyed the process of assembling it and then finally printing the gcode files that were on the SD card that came with the printer.  Including those was a really nice touch.  Once we got the bed close enough to the nozzle it all worked great.

After the initial success we found some models on thingiverse, sliced them with Cura, and then saw the printer do something like this (not my video) over and over.  Too much filament in the wrong place, no filament in the right place.  It was a strange and bewildering start-up sequence to watch.  I searched the internet for advice and didn't find much.  I finally just opened up the gcode file that Cura created and compared it to the gcode files that came with the printer.  There was definitely a more complicated start-up sequence in the Cura file.  I deleted it, replaced it with what came with the printer, and prints are all working again.

You can make this change permanent in Cura by clicking Settings->Printer->Manage Printers.  Then click on your printer and click the Machine Settings button.  In the text box on the left labeled "Start G-code", delete all the gcode there and replace it with this:


; Ender 3 Custom Start G-code
G28 ; home all axes
G29 ; probe the bed (bed leveling)
G92 E0 ; reset the extruder position to zero
G1 E-10.0000 F6000 ; retract 10mm of filament
G1 Z0.750 F1002 ; raise the nozzle slightly
; process Process1
; layer 1, Z = 0.450
T0 ; select the first (and only) extruder

Now your Cura-sliced prints will work nicely on your Ender 3.

Monday, March 15, 2021

Linux Environment Management: direnv does it all

A few years back I wrote about different options for linux environment management.  I recently learned about another option, direnv.  I think I'm convinced that it is the only tool you need.  Read this as if it's another section added to that previous post.

Use direnv

Straight from the direnv website: "direnv is an extension for your shell. It augments existing shells with a new feature that can load and unload environment variables depending on the current directory.  Before each prompt, direnv checks for the existence of a .envrc file in the current and parent directories. If the file exists (and is authorized), it is loaded"
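
For illustration, a .envrc is just ordinary shell; a minimal one might export a couple of variables (the names and values here are made up), and direnv will load it whenever you cd into that directory:

# example .envrc at the root of a project (hypothetical values)
export PROJECT_ROOT="$PWD"
export PATH="$PWD/scripts:$PATH"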

This happens automatically, so it solves the problem of the "Explicit Environment Files" solution above in a way that is much more convenient than the "Per-command Environment Files" solution. The .envrc files are in standard shell syntax and it properly unloads environments like the "Smart Environment Manager Tool" mentioned above as well.  It has the downside that it is not easy to share the same environment setup in multiple directories.

I'm not sure if there is a simple solution that gives us both of those things, but we have the option with direnv to choose any of the three discussed environment setup solutions in any given terminal.

Why not all three?

direnv is powerful enough to allow all three techniques for shell configuration described above.

Standard direnv

The default automatic direnv behavior is enabled by putting this in your .bashrc file:

eval "$(direnv hook bash)"

If you want to easily choose between standard direnv and the below option when you start a new shell, you could encapsulate this in a shell function named direnvenable.  When your terminal starts up, you would run that function if you want standard direnv behavior.
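
Here is a minimal sketch of that function, which you would put in your .bashrc in place of the bare eval:

# run direnvenable in a new terminal to turn on automatic direnv for that session
direnvenable() {
    eval "$(direnv hook bash)"
}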

The equivalent of Shell Initialization Files

To "source" a given .envrc file you can just spawn a subshell using the direnv exec command, passing it the path to a project and its .envrc file:

direnv exec $(readlink -f /path/to/git/clone) $SHELL -i

I would suggest wrapping this in a shell function to make it easier.
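
Something along these lines should work (the function name direnter is just an example):

# hypothetical wrapper: open an interactive subshell with a project's .envrc loaded
direnter() {
    direnv exec "$(readlink -f "$1")" "$SHELL" -i
}

Then direnter /path/to/git/clone drops you into an interactive shell with that project's environment applied.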

The equivalent of Per-command Environment Files

This can work in conjunction with the automatic direnv behavior (you can run direnvenable and still use this for commands outside of any project directory).  This is a good way to run commands in scripts using the correct shell environment.  It's the same direnv exec call above prefixing any shell command:

direnv exec $(readlink -f /path/to/git/clone) <command>

I would also suggest wrapping this in a shell function to make it easier.
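
For example (again, the name is arbitrary):

# hypothetical wrapper: run a single command inside a project's direnv environment
denv() {
    local project="$1"; shift
    direnv exec "$(readlink -f "$project")" "$@"
}

Something like denv /path/to/git/clone make test then runs that command with the project's .envrc applied.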


Conclusion

direnv gives you the power of never needing to manually source an environment setup script when you are working in a git clone of a project. It also gives you the ability to use project settings from a git clone in other directories if needed.

Saturday, October 17, 2020

Effective Internet Filtering in 2020

(To skip my rambling intro and get to the nitty gritties, search this page for "After that long introduction")

In college, back when the internet was young, I hated the clumsy ineffective internet filtering that was in place on campus. It often blocked sites that were perfectly fine, and did not catch all the sites of the type that they were trying to block. Fast forward 10 years or so and I saw my children stumbling upon some content that I didn't want them to see on my unfiltered home internet and my attitude changed a bit. Back then web filtering was pretty easy. Nothing was encrypted and DansGuardian was the go-to tool. You set up a transparent web proxy and DansGuardian would scan the entire content of every website that you downloaded in your home. Incriminating words and phrases would trigger its blocking and it would replace the website you were downloading with an explanatory message. The beauty was that there was no need to scour the web, categorize every website in the world, and maintain lists. It still had its false positives, and if a website had objectionable images but otherwise benign text there was nothing it could do, but it took the edge off the raw internet.

Today, it's not so easy. The HTTPS Everywhere campaign bothered me at first. It felt unnecessary, and it most definitely broke my DansGuardian filtering. I have since come to understand the importance and necessity of HTTPS and I'm very glad that Let's Encrypt has made it easy for all of us to use it. But I do still have kids.

DNS filtering came to the rescue, first with OpenDNS, and now I use CleanBrowsing. It's pretty good, but sometimes I want more control. One night our school had a parent-night presentation about internet safety for kids and they had invited some vendors to pitch their wares. One of them was RouterLimits. They had a small box that you simply connected to your network and it would filter internet traffic based on categories or individual sites you listed. No software or configuration of other hosts or your router required. It could also enforce time windows with no internet. "How is this possible when this box is just another client on the LAN?" I pressed their salesman. It was a small company and I think he was also an engineer because he realized I was one, and he slyly said to me, "ARP spoofing."

"That's evil!" I instinctively replied. And then I thought a little more about it and realized it was evil. Evil genius! I bought one right there. Their model was great. Pay $80 for $5 worth of hardware and you get their service for life. Plug the little box into your LAN and connect to their web service. The little box collects a list of hosts on the LAN by paying attention to broadcast traffic, then it floods each host with ARP replies to tell them all that it is the gateway and begins its Man in the Middle attack. If a kid tries to visit badsite.example.com, or any site during the time window when internet is configured to be off for their device, the RouterLimits box sees that and just drops the packet. If a kid tries to visit goodsite.example.com, the RouterLimits box simply forwards the packet along to the actual gateway. Simple and very effective.

The schedule was the thing I loved the most. I had never had that with DansGuardian or CleanBrowsing alone. Sadly, RouterLimits was bought by a bigger company that changed the business model to a yearly subscription. Also, right about the same time, the RouterLimits box lost its ability to block my Roku for some reason. Kids were watching Netflix late into the night on school nights again, dang it. I worked with the RouterLimits support team a bit, but they couldn't figure out what was going on. I wasn't super motivated to debug it myself, because I didn't want to start paying a regular fee for this service anyway.

I still wanted my kids kicked off the internet at a decent time on school nights, though, so I started looking for solutions. The first thing I tried was a pi-hole. It doesn't have scheduling built-in, but I was able to hack together a script that modified the pi-hole database directly to put my kids' devices into a group that had a blocklist that filtered everything. That mostly worked, but it was really a hack. And then my raspberry pi's SD card died and I didn't have a backup. I started looking for another solution. I remembered ARP spoofing and did a little research. Sure enough, there is a tool called ettercap that makes it pretty easy, especially if you just want to block everything.

After that long introduction, some nitty gritties. To run ettercap in text-mode and see what it can do, run this command:

sudo ettercap -Tq

Play around with it a bit, it's pretty cool.

To filter (perform a Man in the Middle Attack), you'll want to scan and save a list of hosts on the LAN, like so (change the 1000 to your user ID):

sudo env EC_UID=1000 ettercap -Tqk lan-hosts

To man-in-the-middle a host with IP 192.168.1.193, and if your gateway is 192.168.1.1, run this:

sudo ettercap -Tq -M arp:remote /192.168.1.1// /192.168.1.193//

For me that didn't really do anything because it simply forwarded the packets it was intercepting on to the gateway. To do something with the packets ettercap is intercepting, you need to create a filter. My filter is simple, just drop every packet:

drop();

Put that in a text file named drop-all.ecf and run this to compile the filter:

etterfilter drop-all.ecf -o drop-all.ef

You can read the etterfilter man page for more information about what you can do. I imagine the RouterLimits box had some more interesting filters (assuming they were using ettercap).
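
As a hypothetical example of something more targeted, a filter like this should drop only outbound DNS queries instead of all traffic (an untested sketch; the syntax follows the example filters in the etterfilter man page):

# hypothetical filter: drop only DNS queries, which blocks browsing by name
if (ip.proto == UDP && udp.dst == 53) {
    drop();
}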

Once you have your filter compiled, add it to the above ettercap command like so:

sudo ettercap -Tq --filter drop-all.ef -M arp:remote /192.168.1.1// /192.168.1.193//

You have successfully performed a Denial of Service attack against 192.168.1.193. If you have, for example, two kids' devices you want to block, you need the lan-hosts file you made earlier, and you do this:

sudo ettercap -Tz -j lan-hosts --filter drop-all.ef -M arp:remote /192.168.1.1// /192.168.1.193\;192.168.1.221//

You can add as many IP addresses as you like to the list, separated by semicolons. As far as I can tell, they all need to be listed in lan-hosts too. I believe you could use MAC addresses instead of IP addresses, but I have my router giving out fixed IP addresses to all my kids' devices (that was to make the pi-hole hack work), so I just use the IP addresses.

All that's left is to run ettercap with --daemon, make a cron job or systemd timer to start and stop it at the times you want to block your kids' internet access, and you are done! It just so happens that I have written an ansible playbook that does all this for you. You'll have to modify lan-hosts and the internet-stop.service to use your own devices' MAC and IP addresses, then run ansible-playbook to deploy this to a raspberry pi (or some other linux box on your LAN that you leave on all the time) and you are good to go.
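
If you go the cron route rather than a systemd timer, the entries might look something like this (a sketch only; the paths and times are made up, and I haven't verified the exact flag combination in daemon mode, so adjust it to your own setup):

# hypothetical /etc/crontab entries: block at 21:30 on school nights, unblock at 06:30
30 21 * * 0-4  root  ettercap --daemon -z -j /home/pi/lan-hosts --filter /home/pi/drop-all.ef -M arp:remote /192.168.1.1// /192.168.1.193//
30 6  * * 1-5  root  pkill ettercap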

P.S. This even blocks the Roku. ettercap couldn't detect the Roku on my LAN like it could other hosts for some reason, so that's probably why RouterLimits couldn't block it, but once I manually entered the Roku's IP and MAC into the lan-hosts file, ettercap was able to DoS it just like all the other hosts.

Tuesday, March 24, 2020

How To Retroactively Annex Files Already in a Git Repo


In my last post I talked about how surprisingly easy it is to use git annex to manage your large binary files (or even small ones). In this post, I'm going to show how hard it is to go back and fix the mistake you made when you decided not to learn and use git annex at the start of your project. Learn from my mistake!

When I started developing the website for my business, I figured that editing history in git is easy, and I could just check in binary files (like the images) for now and fix it later. Well, it was starting to get a little sluggish, and I had some bigger binary files that I wanted to start keeping with the website code, so I figured the time had come. Once I decided on git annex, it was time to go edit that history.

First Tries: filter-branch, filter-repo

There is a very old page of instructions for doing this using git filter-branch. The first thing I noticed when I tried that was this message from git:

WARNING: git-filter-branch has a glut of gotchas generating mangled history
         rewrites.  Hit Ctrl-C before proceeding to abort, then use an
         alternative filtering tool such as 'git filter-repo'
         (https://github.com/newren/git-filter-repo/) instead.  See the
         filter-branch manual page for more details; to squelch this warning,
         set FILTER_BRANCH_SQUELCH_WARNING=1.

Yikes! A warning like that from a tool (git) that is already known for its gotchas is one I decided to take seriously. Besides, I'm always down to try the new hotness, so I started reading about git-filter-repo. The more I read and experimented, even dug into the source code, the more I came to understand that it could not do what I needed, sadly. Maybe someone will read this and correct me.

Success with git rebase --interactive

Not seeing a nice pre-built tool or command that could do this for me, I set out to manually edit the repository history using good ol' git rebase --interactive. First, I had to find all the binary files that are in the repo (not just the ones in the current revision). Here's how I did it:

# The --stat=1000 is so it doesn't truncate anything
git log --stat=1000 | grep Bin | sort | uniq > binary-files

Note the comment. Isn't it cute that git log truncates long lines even when stdout is not connected to your terminal? There are lots of little annoying gotchas like that throughout this process. Makes me miss mercurial, but don't worry, I will try not to mention mercurial again.

Now, you'll still have duplicates in binary-files because of the other stuff that git log --stat spits out on each line. I personally used some emacs commands to remove everything but the filename from each line of the binary-files file, and then did a sort and uniq again.
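
If you'd rather not do that cleanup by hand, a one-liner along these lines should work, since the --stat lines look like " path/to/image.png | Bin 0 -> 12345 bytes" (a sketch; the output filename is arbitrary, and you would use the cleaned file in the next step):

# strip the " | Bin ..." suffix and any leading spaces, then de-duplicate
sed 's/ *|.*$//; s/^ *//' binary-files | sort -u > binary-files-clean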

Next, I had to find each commit that modified any of these binary files. Here's how I did that:

for file in $(cat binary-files); do
    git log --pretty=oneline --follow -- $file >> commits;
done

Then I did another sort and uniq on that. Luckily there were only about 15 commits. Phew.
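
For completeness, that pass is just another one-liner, something like this (the output filename is arbitrary):

sort commits | uniq > commits-to-edit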

Next I tried to find the earliest commit in the list I had, but that was a pain (don't…mention…mercurial…), so I just ran git rebase --interactive and gave it one of the first commits I made in the repository. I actually used emacs magit to start the rebase, but the surgery required throughout the process made me drop to the command-line for most of it. magit did make it really easy to mark the 15 commits from my commits file with an e though.

OK, once the rebase got rolling I ran into a few different scenarios: commits that added a new binary file, commits that deleted binary files, commits that modified binary files, and a commit that moved binary files.

Added binary files

When a binary file was added, git would act like I have always seen interactive rebase work; it would show the normal thing:

Stopped at 53fc550...  some commit message here
You can amend the commit now, with

  git commit --amend 

Once you are satisfied with your changes, run

  git rebase --continue

In that case I did this:

git show --stat=1000 # to see binary (Bin) files
git rm --cached <the-binary-files>
git add <the-binary-files> # git annex will annex them
git commit --amend
git rebase --continue

Easy peasy, as long as you have set up annex like my previous post explains so that annexing happens automatically.

Deleted binary files

When a binary file was deleted, git would throw up a message like this:

$ git rebase --continue
[detached HEAD 130bcc4] banner on each page now
 21 files changed, 190 insertions(+), 42 deletions(-)
 create mode 100644 msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg
 delete mode 100644 msd/webshop/static/webshop/img/common/file-icons.png
 create mode 100644 msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg
 create mode 100644 msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg
CONFLICT (modify/delete): msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg deleted in 90d71fb... refactored banners in pricing.css to reduce code duplication and modified in HEAD. Version HEAD of msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg left in tree.
error: could not apply 90d71fb... refactored banners in pricing.css to reduce code duplication
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 90d71fb... refactored banners in pricing.css to reduce code duplication

I guess in this case it was that I had added some new files too, so the message was extra verbose. The key message in all that was: "msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg deleted…" Here's what you do in this case:

git rm msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg
git diff --stat=1000 --staged # to find full paths for any Bin files
git restore --staged <binary-files>
git add <binary-files>
git diff --stat --staged # just to double check there are no Bin files now
git rebase --continue

Looks so simple (heh), but it took me a decent amount of web searching and experimentation to figure it out. All for you, dear reader, all for you.

Modified binary files

Here's one where I resized several images; git helpfully uttered:

$ git rebase --continue
[detached HEAD 7dfb28c] refactored banners in pricing.css to reduce code duplication
 4 files changed, 28 insertions(+), 75 deletions(-)
 create mode 100644 msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg
 delete mode 100644 msd/webshop/static/webshop/img/common/nick-fewings-ZJAnGFg-rM4-unsplash.jpg
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
warning: Cannot merge binary files: msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg (HEAD vs. a90710f... scaled images down to max width of 1920 pixels)
Auto-merging msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/yogesh-phuyal-mjwGKmwkDDA-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/umberto-jXd2FSvcRr8-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/max-duzij-qAjJk-un3BI-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/levi-saunders-1nz-KjRdg-s-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/kevin-ku-w7ZyuGYNpRQ-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/connor-betts-QK6Iwzd5MhE-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/alexandre-debieve-FO7JIlwjOtU-unsplash.jpg
Auto-merging msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg
CONFLICT (content): Merge conflict in msd/webshop/static/webshop/img/common/adi-goldstein-EUsVwEOsblE-unsplash.jpg
error: could not apply a90710f... scaled images down to max width of 1920 pixels
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply a90710f... scaled images down to max width of 1920 pixels

The trick to fixing this is to notice which commit it's trying to let you edit, which is in the last line of that message, and then checkout that version of each of the unmerged binary files it mentions, like so:

git status # to get the names of the unmerged binary files
git checkout a90710f <filenames>

Now you can do the same thing you did for the deleted file:

git restore --staged <filenames>
git add <filenames>
git diff --stat --staged # just to double check there are no Bin files now
git rebase --continue

Moved binary files

When I ran git log --follow to find all the commits that modified binary files, it flagged one where I had moved them. I'm not sure I actually had to edit that commit, and I wonder whether I would have avoided this weird situation by not editing it. But for completeness, here's what I saw. Git rebase stopped to let me edit the commit, and git annex printed out this message for every file that was moved:

git-annex: git status will show <filename> to be modified, since content availability has changed and git-annex was unable to update the index. This is only a cosmetic problem affecting git status; git add, git commit, etc won't be affected. To fix the git status display, you can run: git update-index -q --refresh <filename>

Sounds…quite weird. But git rebase would not continue until I did run the suggested command:

git update-index -q --refresh <filenames>
git rebase --continue

Dealing with Tags

Once the rebase was done I noticed that the tags I had all still pointed to the original commits. Oops. A quick internet search led me to this post about rebasing and moving tags to the new commits (written by a former co-worker, it just so happens). Too bad I didn't look for that before I rebased. I thought about redoing the whole rebase, but in the end I just wrote my own quick python script (using snippets from Nacho's) to take care of my specific situation. Here it is:

#! /usr/bin/env python
from subprocess import run, PIPE

# list every tag as a "<hash> refs/tags/<name>" line
tags = run(['git', 'show-ref', '--tags'],
           stdout=PIPE).stdout.decode('utf-8').splitlines()

# map each tag name to the subject line of the commit it points to
tags_with_comments = {}
for tag in tags:
    tag_hash, tag_name = tag.split(' ')
    tag_name = tag_name.split('/')[-1]
    comment = run(['git', '--no-pager', 'show', '-s',
                   '--format=%s', tag_hash],
                  stdout=PIPE).stdout.decode('utf-8').splitlines()[-1]
    print(f'{tag_name}: {comment}')
    tags_with_comments[tag_name] = comment

# every commit in the rewritten history as a "<hash> <subject>" line
commits = run(['git', 'log', '--oneline'],
              stdout=PIPE).stdout.decode('utf-8').splitlines()

# move each tag to the rewritten commit whose subject line matches
for tag_name in tags_with_comments:
    for c in commits:
        commit_hash = c.split(' ')[0]
        comment = c.split(' ')[1:]
        comment = ' '.join(comment)
        if comment == tags_with_comments[tag_name]:
            run(['git', 'tag', '--force', tag_name, commit_hash])

Clean Up and Results

Well, with all that done, it was time to see how it all turned out. My original git repo was sitting at about 1.4 GB. This new repo was…3 GB!? Something wasn't right. Here are some steps I took to clean it up after making sure there weren't any old branches or remotes laying around:

git clean -fdx                                 # remove untracked files and directories
git annex fsck                                 # verify annexed content
git fsck                                       # verify git objects
git reflog expire --verbose --expire=0 --all   # drop reflog entries that keep old objects alive
git gc --prune=0                               # garbage collect, pruning unreachable objects immediately

The git clean command showed that I had a weird leftover .git directory in another directory somehow, so I deleted that. I don't think the fsck commands really did anything, but the gc definitely did. Size was now down to 985 MB. Much better. Wait a minute, what if I did a git gc on the original repo? Its size went down to 984 MB. Oh shoot. I guess it makes sense though, if both git and git annex are storing full versions of each binary file they would end up the same size. The real win is the faster git operations, especially clones.

A local git clone now happens in the blink of an eye, and its size is only 153 MB. Now, that's a little unfair because it doesn't have any of the binary files. After a git annex get to get the binary files for the current checkout it goes up to 943 MB. Not a huge savings, but it only gets better as time goes on and more edits happen. Right? This was all worth it, wasn't it?!

Let me know in the comments if this is helpful, hurtful, or if I did this totally wrong.