Git Annex is Great


UPDATE: With current versions of git, I no longer recommend git annex or git LFS unless you really need to store your large files on a separate server from your git repository. For the simple use I describe here, just add your large files to git like any other file. When you clone, you can avoid downloading every old version of every file with git clone --filter=blob:none, which fetches file contents only when they are needed, and then use git as normal.
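
As a rough sketch of that workflow, the following builds a throwaway repository and makes a partial clone of it. The paths, the logo.png file name, and the demo identity are placeholders made up for illustration; with a real project you would only need the git clone line pointed at your own URL. Setting uploadpack.allowFilter is my assumption about what a self-hosted repository needs before it will honor the filter.

```shell
set -e
work=$(mktemp -d)

# Throwaway "server" repository with one stand-in for a large file.
git init -q "$work/origin"
printf 'pretend this is a large image\n' > "$work/origin/logo.png"
git -C "$work/origin" add logo.png
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com commit -q -m "Add logo"

# Allow clients to request filtered (partial) clones from this repository.
git -C "$work/origin" config uploadpack.allowFilter true

# Partial clone: file contents are fetched on demand rather than for all history.
git clone -q --filter=blob:none "file://$work/origin" "$work/clone"
cat "$work/clone/logo.png"
```

After the clone, everyday commands such as git checkout and git log work as usual; git fetches any missing file contents from the remote when a command needs them.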

I'm developing the website for my business and I have a mix of code and images in my git repository. Since everyone seems to know that you shouldn't keep large binary files in your git repo, I decided to see what the current solutions to that problem are.

After a little bit of searching, I narrowed things down to git lfs and git annex. Git lfs looks so nice and simple, except I'm not using github. Sure, you can set up your own central git lfs server, but that suddenly sounded not so nice and simple.

The website for git annex immediately hits you with all the power and flexibility that it has, and I was turned off by that complexity. I did like that it doesn't require any kind of central server for me to set up, so I didn't reject it outright. After some digging I found out that it does actually support a usage that looks a lot like git lfs, where you can configure it to automatically manage certain sets of files and then you just use git commands like normal. This is very nice. Here's how you set it up in your git repo (hopefully before you have committed any binary files to git, see my next blog post about fixing that):

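# Enable annex in the existing git repository
git annex init
# Send binary files larger than one byte to annex; everything else stays in plain git
git annex config --set annex.largefiles 'mimeencoding=binary and largerthan=1b'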

That's it! Now just use git commands like normal and annex will take care of binary files for you. The only tricky part of setting this up was figuring out that empty files, like those __init__.py files that Django creates, were considered binary files. That's why I had to add the "and largerthan=1b" clause.
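
To see why empty files trip this up, you can ask the file utility which encoding it detects. This is only an approximation: git annex does its mimeencoding matching through libmagic, the same library behind file, and I am assuming file is installed. The file names below are made up for the example.

```shell
set -e
tmp=$(mktemp -d)

# An empty file, like the __init__.py files Django creates.
: > "$tmp/__init__.py"
# An ordinary text file for comparison.
printf 'print("hello")\n' > "$tmp/views.py"

# Print only the detected encoding for each file.
file -b --mime-encoding "$tmp/__init__.py"
file -b --mime-encoding "$tmp/views.py"
```

I would expect the empty file to be reported as binary and the text file as a text encoding, which matches what annex did with my empty files.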

There are a couple of other things you might need to be aware of:

  • When you clone, none of the binary files will get copied to your clone until you run git annex get, and that will only copy over the files for the current commit. If you check out an older commit or another branch, you might need to run git annex get again.
  • If you do start collaborating with others, you'll have to make sure that their git annex get command can access the binary files. That's where you have many options for setting up network communication that can work for your team. I have not delved into the details of that yet.
  • If you try to delete a clone, you'll discover that the annex files down under .git are read-only. Using sudo or changing permissions with chmod will fix that.
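
The cleanup from the last bullet can be sketched as follows. So that it runs without git annex installed, it uses a stand-in directory that is made read-only by hand to mimic the annex object store; the directory layout and file name are invented for the example.

```shell
set -e
clone=$(mktemp -d)/clone

# Stand-in for a clone whose annexed content is write-protected.
mkdir -p "$clone/.git/annex/objects/example"
printf 'binary data\n' > "$clone/.git/annex/objects/example/file.bin"
chmod -R a-w "$clone/.git/annex/objects"

# Give the owner write permission again, then delete as usual.
chmod -R u+w "$clone"
rm -rf "$clone"
```

With a real clone, only the last two commands are needed, pointed at the clone's directory.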

Aside from those things I haven't noticed any trickiness with git annex. It seems like a great tool.
