Pywb 2.0 - docker quickstart
2018-01-31 tags: webarchiving pywb docker
Four years have passed since I first wrote about pywb: it was a young tool at the time, but already usable and extremely simple to deploy. Since then a lot of work has been done by Ilya Kreymer (and others), resulting in all the new features available with the 2.0 release.
Also, some very big webarchiving initiatives have moved to pywb and used it in these years: Webrecorder itself, Rhizome, Perma, Arquivo PT in Portugal, the Italian National Library in Florence (Italy), and others I'm missing.
For many years I've used pywb for my personal private webarchive on a shared host, with the setup described here. Nowadays shared hosts are all but defunct, and cloud virtual machines are cheaper than ever.
The simplest way to run your own pywb instance today is probably Docker. Here is a quick tutorial:
-
pull the docker image
docker pull webrecorder/pywb
-
create a directory to keep the collection
mkdir ~/webarchive; cd ~/webarchive
-
initialise the collection (replace my-collection with any name you prefer)
docker run --rm -v ~/webarchive:/webarchive webrecorder/pywb wb-manager init my-collection
-
add archived contents, copying WARCs you have previously created
cp $file.warc.gz ~/webarchive/collections/my-collection/archive
-
index the collection
docker run --rm -v ~/webarchive:/webarchive webrecorder/pywb wb-manager reindex my-collection
a CDXJ index will be created in
~/webarchive/collections/my-collection/indexes/index.cdxj
-
start it: pywb will run on localhost:8080
docker run -d --name pywb -v ~/webarchive:/webarchive -p 8080:8080 webrecorder/pywb
open http://localhost:8080
Easy!
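If you repeat this often, the steps above could be collected into one small script. What follows is only a sketch under a few assumptions: the variable names (ARCHIVE_DIR, COLLECTION, DRY_RUN) and the helper functions (run, wb_manager, setup_archive) are my own and not part of pywb, and the script defaults to a dry run that only prints the commands it would execute. Set DRY_RUN=0 to really run them.

```shell
#!/bin/sh
# Sketch of the quickstart steps as one script.
# All names below are illustrative and not part of pywb.
ARCHIVE_DIR="${ARCHIVE_DIR:-$HOME/webarchive}"  # host directory mounted into the container
COLLECTION="${COLLECTION:-my-collection}"       # collection name, pick your own
IMAGE="webrecorder/pywb"
DRY_RUN="${DRY_RUN:-1}"                         # 1 = only print the commands

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Run wb-manager in a throwaway container with the archive directory mounted.
wb_manager() {
    run docker run --rm -v "$ARCHIVE_DIR:/webarchive" "$IMAGE" wb-manager "$@"
}

# Pull, init, copy the WARCs given as arguments, reindex, start.
setup_archive() {
    run docker pull "$IMAGE"
    run mkdir -p "$ARCHIVE_DIR"
    wb_manager init "$COLLECTION"
    for warc in "$@"; do
        run cp "$warc" "$ARCHIVE_DIR/collections/$COLLECTION/archive/"
    done
    wb_manager reindex "$COLLECTION"
    run docker run -d --name pywb -v "$ARCHIVE_DIR:/webarchive" -p 8080:8080 "$IMAGE"
}

setup_archive "$@"
```

Saved as, say, setup-pywb.sh (a made-up name), `sh setup-pywb.sh capture.warc.gz` prints the whole plan, and `DRY_RUN=0 sh setup-pywb.sh capture.warc.gz` executes it.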
Again, why has pywb been so important in the webarchiving scene? Because it focuses on individuals, making it easy to create, curate and maintain personal web archives!