How to build and run in Docker
This document describes how to build OpenWayback from source and run it, all within Docker. This can be very handy for development and for testing in different environments. The OpenWayback source code includes a Dockerfile. The generated Docker image is kept minimal, which makes it suitable for running in production as well.
Docker version 17.05 or later is required for building the image.
Acquire the source code.
$ git clone https://github.com/iipc/openwayback.git
$ cd openwayback
Make any changes to the source code if needed. Then build the docker image.
$ docker image build -t openwayback .
This will download dependencies, compile the code, run tests, package, and place necessary components in appropriate places to build a minimal Docker image with the name openwayback.
This process may take a while (depending on the network bandwidth and processor speed).
The build uses Docker's multi-stage build feature to exclude the compile-time environment and dependencies from the final image, which makes the image both smaller and more secure.
By default, the source is built using the latest versions of Maven and JDK, and then the image is packaged with the latest versions of Tomcat and JRE.
However, it is possible to build and package with custom combinations of these dependencies using the MAVEN_TAG and TOMCAT_TAG build arguments.
These variations can be helpful for both testing and production needs without making any changes in the Dockerfile.
$ docker image build \
--build-arg=MAVEN_TAG=3.5-jdk-7 \
--build-arg=TOMCAT_TAG=7-jre7-alpine \
-t openwayback:custom .
The above command builds an image named openwayback with the tag custom. The source code is built using Maven 3.5 with JDK 7, and the built artifacts are then packaged in a small Alpine Linux image with Tomcat 7 and JRE 7.
See the available values of the MAVEN_TAG and TOMCAT_TAG build arguments.
Another build argument, SKIP_TEST, is available and is set to false by default. To skip tests, add the --build-arg=SKIP_TEST=true argument to the Docker build command.
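The three build arguments can be combined in a single command. The sketch below assembles such a command from shell variables and, by default, only prints it. The names IMAGE_TAG and DRY_RUN are conventions of this sketch, not part of Docker or of the OpenWayback Dockerfile, and the tag values simply repeat the ones used above.

```shell
# Values for the build arguments described above.
MAVEN_TAG="3.5-jdk-7"
TOMCAT_TAG="7-jre7-alpine"
SKIP_TEST="true"
IMAGE_TAG="openwayback:custom"   # hypothetical name:tag for the resulting image

# Store the command as positional parameters so that quoting is preserved.
set -- docker image build \
  "--build-arg=MAVEN_TAG=${MAVEN_TAG}" \
  "--build-arg=TOMCAT_TAG=${TOMCAT_TAG}" \
  "--build-arg=SKIP_TEST=${SKIP_TEST}" \
  -t "${IMAGE_TAG}" .

# DRY_RUN is an assumption of this sketch: 1 (the default) prints, 0 executes.
if [ "${DRY_RUN:-1}" = "1" ]; then
  BUILD_CMD="$*"
  echo "${BUILD_CMD}"
else
  "$@"
fi
```

Run it with DRY_RUN=0 from the root of the source tree to perform the actual build.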
The default configuration of OpenWayback uses the automatic BDB Indexer and expects WARC files at ${WAYBACK_BASEDIR}/files1/ or ${WAYBACK_BASEDIR}/files2/.
By default, WAYBACK_BASEDIR is set to the /data volume in the Docker image.
Create the necessary directory structure on the host machine for testing and populate it with some test files.
$ mkdir -p /tmp/owb/files1
$ cp /path/to/sample/*.warc /tmp/owb/files1/
Run a Docker container with appropriately mounted volumes and port mapping. By default, the container runs the Tomcat server.
$ docker container run -it --rm -v /tmp/owb:/data -p 8080:8080 openwayback
Once the WARC files are indexed, they should be ready for lookup at http://localhost:8080/.
OpenWayback allows certain configuration overrides through environment variables that can be set when running a container, but these customizations are very limited.
WAYBACK_HOME=/usr/local/tomcat/webapps/ROOT/WEB-INF
WAYBACK_BASEDIR=/data
WAYBACK_URL_SCHEME=http
WAYBACK_URL_HOST=localhost
WAYBACK_URL_PORT=8080
WAYBACK_URL_PREFIX=http://localhost:8080
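These variables can be overridden with the -e option of docker container run. The sketch below prepares a consistent set of URL variables for a hypothetical host name, archive.example.org, and prints the resulting command instead of running it. Whether OpenWayback derives WAYBACK_URL_PREFIX from the other three variables is not stated here, so the sketch sets all four explicitly. DRY_RUN is a convention of this sketch, not a Docker option.

```shell
# Hypothetical public host name; scheme and port repeat the values listed above.
WAYBACK_URL_SCHEME="http"
WAYBACK_URL_HOST="archive.example.org"
WAYBACK_URL_PORT="8080"
# Compose the prefix from its parts so the four values cannot drift apart.
WAYBACK_URL_PREFIX="${WAYBACK_URL_SCHEME}://${WAYBACK_URL_HOST}:${WAYBACK_URL_PORT}"

# Store the command as positional parameters so that quoting is preserved.
set -- docker container run -it --rm \
  -v /tmp/owb:/data \
  -p "${WAYBACK_URL_PORT}:8080" \
  -e "WAYBACK_URL_SCHEME=${WAYBACK_URL_SCHEME}" \
  -e "WAYBACK_URL_HOST=${WAYBACK_URL_HOST}" \
  -e "WAYBACK_URL_PORT=${WAYBACK_URL_PORT}" \
  -e "WAYBACK_URL_PREFIX=${WAYBACK_URL_PREFIX}" \
  openwayback

# 1 (the default) prints the command, 0 executes it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  RUN_CMD="$*"
  echo "${RUN_CMD}"
else
  "$@"
fi
```

Environment variables only cover the settings listed above; anything beyond them requires changes to the configuration files themselves.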
However, by strategically mounting certain volumes, it is possible to run the OpenWayback server with custom configuration files.
$ docker container run -it --rm -p 8080:8080 \
-v /tmp/owb:/data \
-v /path/to/custom/wayback.xml:/usr/local/tomcat/webapps/ROOT/WEB-INF/wayback.xml \
-v /path/to/custom/CDXCollection.xml:/usr/local/tomcat/webapps/ROOT/WEB-INF/CDXCollection.xml \
openwayback
This way of mounting configuration files can be handy for testing. However, for production purposes it is better to create a derived image and override the configuration files with custom files.
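A derived image of this kind could be described as follows. This is a minimal sketch, assuming that the custom wayback.xml and CDXCollection.xml sit in the current directory and that the base image was built under the name openwayback as shown earlier. The file name Dockerfile.custom and the tag openwayback:configured are illustrative.

```shell
# Write a Dockerfile for the derived image (Dockerfile.custom is a hypothetical name).
cat > Dockerfile.custom <<'EOF'
# Start from the image built earlier in this document.
FROM openwayback
# Replace the bundled configuration with custom files from the build context.
COPY wayback.xml /usr/local/tomcat/webapps/ROOT/WEB-INF/wayback.xml
COPY CDXCollection.xml /usr/local/tomcat/webapps/ROOT/WEB-INF/CDXCollection.xml
EOF

# Build the derived image afterwards with:
#   docker image build -f Dockerfile.custom -t openwayback:configured .
```

Because the configuration is baked into the image, containers started from it need only the /data volume and the port mapping.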
The Docker image contains various executable utilities with their necessary dependencies that can be used in one-off mode.
The following command illustrates one possible usage of cdx-indexer to index WARC files into CDX files on the host machine, using appropriate volume mounting and a one-off container.
$ docker container run --rm -v /tmp/owb:/data openwayback cdx-indexer /data/files1/sample1.warc > /tmp/owb/index1.cdx
Alternatively, access the bash prompt of the container to run utility scripts inside it or to debug.
$ docker container run -it --rm -v /tmp/owb:/data openwayback bash
[CONTAINER ID]# cdx-indexer /data/files1/sample1.warc > /data/index1.cdx
IMPORTANT: If you use the sort command to sort CDX files, you must set the environment variable LC_ALL=C. This tells sort to compare lines byte by byte, which matches how OpenWayback expects CDX indexes to be sorted.
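The effect of the locale can be checked on a small file. The sketch below writes four made-up, simplified lines (real CDX lines carry more fields) and sorts them with LC_ALL=C. The file names unsorted.cdx and sorted.cdx are illustrative.

```shell
# Made-up, simplified index lines in arbitrary order.
printf '%s\n' \
  'com,example,www)/ 20200101000000' \
  'com,example)/b 20200101000000' \
  'com,example)/A 20200101000000' \
  'com,example)/a 20190101000000' > unsorted.cdx

# In the C locale lines are compared by byte value:
# ')' (0x29) sorts before ',' (0x2C), and 'A' (0x41) before 'a' (0x61).
LC_ALL=C sort unsorted.cdx > sorted.cdx

# sort -c exits with a non-zero status if the file is not in sorted order.
LC_ALL=C sort -c sorted.cdx && echo "sorted.cdx is in byte order"
```

Running the same sort without LC_ALL=C under a locale such as en_US.UTF-8 may order these lines differently, which is why the variable must be set explicitly.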
Copyright © 2005-2022 [tonazol](http://netpreserve.org/). CC-BY. https://github.com/iipc/openwayback.wiki.git