pywb 2.0 - docker quickstart
Four years have passed since I first wrote about pywb: it was a young tool at the time, but already usable and extremely simple to deploy.
Since then a lot of work has been done by Ilya Kreymer (and others), resulting in all the new features available with the 2.0 release.
Also, some very big webarchiving initiatives have moved to pywb in these years: Webrecorder itself, Rhizome, Perma, Arquivo PT in Portugal, the Italian National Library in Florence (Italy), and others I'm missing.
For many years I've used pywb for my personal private webarchive on a shared host, with the setup described here. Nowadays shared hosts are largely defunct, and cloud virtual machines are even cheaper.
The simplest way to run your own pywb instance today is probably docker. Here is a quick tutorial:
pull the docker image
docker pull webrecorder/pywb
create a directory to keep the collection
mkdir ~/webarchive; cd ~/webarchive
initialise the collection (name it as you prefer; my-collection is used here)
docker run --rm -v ~/webarchive:/webarchive webrecorder/pywb wb-manager init my-collection
add archived contents, copying WARCs you have previously created
cp $file.warc.gz ~/webarchive/collections/my-collection/archive
index the collection
docker run --rm -v ~/webarchive:/webarchive webrecorder/pywb wb-manager reindex my-collection
a CDXJ index will be created in
start it: pywb will run on localhost:8080
docker run -d --name pywb -v ~/webarchive:/webarchive -p 8080:8080 webrecorder/pywb
open http://localhost:8080
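If you prefer to keep the steps above in one place, they can be sketched as a small shell script. This is only a sketch, not part of pywb: the variable names (ARCHIVE_DIR, COLLECTION, WARC_SRC, DRY_RUN) and the run and setup_archive helpers are my own assumptions. By default it only prints the commands it would run; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the quickstart steps in a single script.
# ARCHIVE_DIR, COLLECTION, WARC_SRC and DRY_RUN are illustrative names, not pywb options.
set -e

ARCHIVE_DIR="${ARCHIVE_DIR:-$HOME/webarchive}"
COLLECTION="${COLLECTION:-my-collection}"
WARC_SRC="${WARC_SRC:-.}"      # directory holding previously created *.warc.gz files
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_archive() {
    run docker pull webrecorder/pywb
    run mkdir -p "$ARCHIVE_DIR"
    run docker run --rm -v "$ARCHIVE_DIR:/webarchive" webrecorder/pywb wb-manager init "$COLLECTION"
    for warc in "$WARC_SRC"/*.warc.gz; do
        [ -e "$warc" ] || continue   # skip when the glob matches nothing
        run cp "$warc" "$ARCHIVE_DIR/collections/$COLLECTION/archive/"
    done
    run docker run --rm -v "$ARCHIVE_DIR:/webarchive" webrecorder/pywb wb-manager reindex "$COLLECTION"
    run docker run -d --name pywb -v "$ARCHIVE_DIR:/webarchive" -p 8080:8080 webrecorder/pywb
}

setup_archive
```

Running it once in dry-run mode is a cheap way to check the paths and the collection name before anything touches docker.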
Again, why has pywb been so important in the webarchiving scene? Because it focuses on individuals, and on the ease of creating, curating and maintaining personal web archives!