A short guide to downloading digitized books from the Internet Archive and rehosting them on your own infrastructure using IIIF with full-text search.

pywb 2.0 - docker quickstart

Four years have passed since I first wrote about pywb: it was a young tool at the time, but already usable and extremely simple to deploy. Since then a lot of work has been done by Ilya Kreymer (and others), resulting in all the new features available with the 2.0 release.
Also, some very big webarchiving initiatives have moved to pywb in these years: Webrecorder itself, Rhizome, Perma, Arquivo PT in Portugal, the Italian National Library in Florence, and others I'm missing.

Anonymous webarchiving

Webarchiving activities, like any other activity where an HTTP client is involved, leave traces: the web server you are visiting or crawling will save your IP address in its logs (or, even worse, it may decide to ban your IP). This is usually not a problem; there are plenty of good reasons for a web server to keep logs of its visitors.
But sometimes you may need to protect your identity when you are visiting or saving something from a website, and there are a lot of sensitive roles that need this protection: activists, journalists, political dissidents.
Tor was invented for this, and today it offers good protection for browsing the web anonymously.
Can we also archive the web through Tor?

Open BNI

On 30 May 2016 the free release of the Bibliografia Nazionale Italiana (BNI, the Italian National Bibliography) was announced. The opening of this catalogue was appreciated (even with the limitation of being PDF only), and as a layman in library science I also asked a question about the actual use case of the BNI.
On 30 August 2016 it was announced that the 2015 and 2016 volumes had also been released in UNIMARC and MARCXML formats.
Curious about the catalogue, I started exploring it, thinking about possible transformations (RDF triples) or enrichment with/towards other data (Wikidata).

Epub linkrot

Linkrot also affects epub files (who would have thought! :)).
How to check the health of external links in epub books (required tools: a shell, atool, pup, GNU parallel).
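The first step of such a check, collecting the external links, can be sketched in Python instead of the shell tools listed above. This is only an illustration, not the method used in the post: it assumes the epub is a regular zip archive whose content documents end in .xhtml, .html or .htm, and the function name external_links is made up for this example.

```python
import zipfile
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href values of <a> tags that point to http(s) URLs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def external_links(epub_file):
    """Return the sorted, de-duplicated external links found in an epub.

    epub_file is a path or a binary file-like object (an epub is a zip archive).
    The name and the file-extension filter are assumptions of this sketch.
    """
    found = set()
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _LinkCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                found.update(parser.links)
    return sorted(found)
```

Each returned URL can then be requested (for example with urllib.request) and its HTTP status recorded to spot dead links.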

SKOS Nuovo Soggettario, API and autocomplete

How to build an API for an autocomplete form using the terms of the Nuovo Soggettario, with Redis Sorted Sets and Nginx+Lua.
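The idea behind sorted-set autocomplete can be illustrated without a Redis server. When all members of a sorted set share the same score, Redis orders them lexicographically, so a prefix lookup becomes a range query (in Redis, something like ZRANGEBYLEX key "[bibl" "[bibl\xff" LIMIT 0 10). The sketch below is an in-memory stand-in using a sorted list and bisect; the class name PrefixIndex is hypothetical, and the post itself may use a different scheme.

```python
import bisect


class PrefixIndex:
    """In-memory stand-in for a Redis sorted set whose members share one score."""

    def __init__(self, terms):
        # With equal scores Redis orders members lexicographically;
        # a sorted list of lowercased terms mimics that ordering here.
        self._terms = sorted({t.lower() for t in terms})

    def complete(self, prefix, limit=10):
        """Return up to `limit` terms starting with `prefix`."""
        prefix = prefix.lower()
        # First position whose term is >= prefix: all matches follow it.
        start = bisect.bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:start + limit]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches
```

Lowercasing at index time keeps the lookup case-insensitive; a real service would store the original label alongside the normalized term so it can be shown to the user.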

Serve deepzoom images from a zip archive with openseadragon

vips is a fast image processing system. Versions higher than 7.40 can generate static tiles of big images in deepzoom format, saving them directly into a zip archive.
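Serving from the archive then comes down to reading a single member of the zip on request, without unpacking it. The following is a minimal sketch, assuming the usual Deep Zoom layout (name_files/level/column_row.format), possibly nested under a top-level folder, and .jpeg tiles; the function name read_tile and its parameters are made up for this example and are not taken from the post.

```python
import zipfile


def read_tile(zip_file, dzi_name, level, column, row, fmt="jpeg"):
    """Return the bytes of one Deep Zoom tile stored in a zip archive, or None.

    Assumes tiles live at <dzi_name>_files/<level>/<column>_<row>.<fmt>,
    either at the archive root or under a parent folder.
    """
    wanted = f"{dzi_name}_files/{level}/{column}_{row}.{fmt}"
    with zipfile.ZipFile(zip_file) as archive:
        for member in archive.namelist():
            # Match the exact path, or the same path below a parent folder.
            if member == wanted or member.endswith("/" + wanted):
                return archive.read(member)
    return None  # tile not present in the archive
```

A small web handler could map a tile URL requested by openseadragon to these arguments and return the bytes with the matching image content type.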

a wayback machine (pywb) on a cheap, shared host

For a long time the only free implementation of web archive replay software (I'm unaware of commercial ones) has been the Wayback Machine (now OpenWayback). It's stable and mature software, with a strong community behind it.
To use it you need to be comfortable deploying a Java web application; that is not so difficult, and the documentation is exhaustive.
But there is a new player in the game, pywb, developed by Ilya Kreymer, a former Internet Archive developer.
It is built in Python, relatively simple compared with Wayback, and now used in a professional archiving project at Rhizome.

Open data from the Anagrafe Biblioteche

How to use the open data of the Anagrafe delle Biblioteche Italiane (the registry of Italian libraries) and draw the libraries' addresses on a web map.
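Most web map libraries accept GeoJSON, so one plausible intermediate step is converting the library records into a FeatureCollection. The sketch below assumes each record already carries coordinates; the keys name, address, lat and lon are hypothetical and do not reflect the actual field names of the Anagrafe dataset.

```python
def to_geojson(libraries):
    """Build a GeoJSON FeatureCollection from library records.

    Each record is a dict with hypothetical keys: name, address, lat, lon.
    """
    features = []
    for lib in libraries:
        if lib.get("lat") is None or lib.get("lon") is None:
            continue  # skip records without coordinates
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(lib["lon"]), float(lib["lat"])],
            },
            "properties": {"name": lib.get("name"), "address": lib.get("address")},
        })
    return {"type": "FeatureCollection", "features": features}
```

The result can be serialized with json.dumps and loaded by the map as a point layer.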

JSON API of the OPAC SBN

A few months ago ICCU released a mobile app for searching the OPAC SBN. Although it is not very attractive graphically, the app works well, and I find two features very useful: searching for a book by scanning its barcode with the phone camera, and bookmarking favourites.
Curious about how it works, I decided to analyze its HTTP traffic.