
How to deploy Alfresco using Docker in 3 Minutes?

What is Alfresco?

Alfresco is an Enterprise Content Management (ECM) and Business Process Management (BPM) system. It is, however, better known as a Document Management System (DMS).

It runs on an Apache Tomcat server and supports a range of Database Management Systems (DBMS). In this post, we will use PostgreSQL, which is the default option.

It provides a range of enterprise-level features such as role-based user management and document version control, and it can be extended with an Optical Character Recognition (OCR) module. It can also be integrated with an Enterprise Resource Planning (ERP) system such as Odoo.

Before You Start

It’s usually good practice not to test new software directly on your machine. You could end up messing with system files or binding to the wrong ports. I’d start by installing 64-bit Ubuntu on a virtual machine using Oracle VM VirtualBox, with adequate storage assigned, such as 25 GB.

I’d also recommend changing the default directory that Docker uses to store images. Otherwise, the disk fills up and one runs into an issue similar to the one described in a previous blog post: Ubuntu Login Screen Loop.

Docker Container Linking

Prior to Container Networking, Docker provided a nifty feature to link containers via the --link argument passed to the run command.

$ docker run --link [name or id]:alias image

Docker Container Linking via an Ambassador

That is not all. Docker provided another nifty feature that worked out of the box to link containers running on different machines using an ambassador container [3]. This feature was also deprecated and superseded by overlay networks when Docker Networking was introduced [1].

Docker Container Networking

Docker provides a networking mechanism to orchestrate the communication between the containers and the host and between the containers and each other.

By default Docker creates three networks for your container:

  • None
  • Host
  • Bridge

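Briefly, a container on the None network gets no network interface other than loopback. A container on the Host network shares the host machine’s network stack directly. The Bridge network is the default one: it connects the containers to each other and to the host machine.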

Docker also offers options for networking containers running on the same host machine or on different host machines. The following figure shows an isolated bridge network with a set of selected ports exposed to the outside world. I’d suggest having a look at the documentation for a deeper understanding [1].

Docker Data Volumes

Since a container is, in one way or another, a virtualization of a platform, it doesn’t persist state by itself. That is, any data written inside the container and any changes made to it are lost once the container is removed.

Docker provides the ability to mount data volumes into containers. In a nutshell, it instructs the container to persist the data found in a specific directory to a permanent directory on the host machine. It’s all simple, and it’s done with the -v argument of the run command. For a deeper understanding, please refer to the Manage Data in Containers [2] section of the official documentation.

$ docker run \
-v [host directory]:[container directory] \
image

Docker Data Volume Containers

A more robust alternative for persisting data is the data volume container. This is a valid option because:

  • The data container is a new container created from the same image
  • The data container is easily shared among multiple containers
  • The persisted directory is not bound by name to a specific container

As argued in [7], containerization is mainly a shift of mind-set: let’s think in terms of containers, no matter where the data is physically stored.

For a deeper understanding, please refer to the Manage Data in Containers [2] section of the official documentation.
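A minimal sketch of the data volume container pattern follows, wrapped in two shell functions so that it can be pasted into a script. The container names pgdata and postgresql_dvc are hypothetical and are not used elsewhere in this post; docker create -v and --volumes-from are standard options of the Docker CLI.

```shell
# Sketch only: the container names "pgdata" and "postgresql_dvc" are
# hypothetical and chosen for illustration.

# Create (without starting) a container whose only job is to own the volume.
create_data_container() {
    docker create \
        -v /var/lib/postgresql/data \
        --name pgdata \
        postgres /bin/true
}

# Run PostgreSQL with the volumes of the data container attached.
run_with_data_container() {
    docker run -d \
        --name postgresql_dvc \
        --volumes-from pgdata \
        -e POSTGRES_PASSWORD=123456 \
        postgres
}
```

Calling create_data_container once and then run_with_data_container should leave the database files in the volume owned by pgdata, even if postgresql_dvc is later removed and recreated.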

What is our plan?

The plan is to set up a PostgreSQL server in one container and Alfresco in another. Then, we instruct Alfresco to connect to PostgreSQL via Docker networking. Both containers will persist their data to the host machine via Docker volumes. Finally, we write a script that runs on machine start-up, since we will have to start the containers each time our host machine reboots.

Just A Question

You may now wonder why we didn’t simply run PostgreSQL inside the Alfresco container and mount its data path to our host machine. It would have saved us the hassle of networking. Yes, you’re right. However, in a production environment, it’s very common for the database server to run on a different machine than the application. In that case, and if both are Docker-based environments, one can extend the concept introduced in this blog post to networking Docker containers running on different hosts with a few extra steps.

One More Question

Why haven’t we used Docker container linking as a ready, out-of-the-box alternative to wiring containers via networking? Because Docker Networking is the more recent feature that succeeded legacy container linking. I quote this answer from [4]:

Networking vs. Links

We know that some users love the simplicity of links and you can still use links in the docker0 network. But we’d recommend that you try out our new networking because unlike links, it allows you to:

  • Connect containers to each other across different physical or virtual hosts
  • Containers using Networking can be easily stopped, started and restarted without disrupting the connections to other containers
  • You don’t need to create a container before you can link to it. With Networking containers can be created in any order and discover each other using their container names

A Bridge Network

Running every Docker command with sudo is inconvenient; a cleaner solution is to add the current user to the docker group, as advised in the official documentation [5]. The change takes effect after logging out and back in.

$ sudo usermod -aG docker $USER

The official Docker documentation recommends using a user-defined bridge network for connecting containers running on the same host. The containers discover each other by name, and the network’s gateway can be pinged from the host machine. It’s also possible to publish ports for external communication.

It’s possible to list all the networks available using the following command.

$ docker network ls

Next, we create a user-defined bridge network using the following command.

$ docker network create --driver bridge nw_alfresco

If we run the network listing command now, the new user defined network will be listed.

If we run the network inspection command as follows, we will see that no containers are listed under this network yet. The output also shows the new network’s gateway IP.

$ docker network inspect nw_alfresco
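If only the gateway IP is of interest, the --format option of the inspection command can extract it from the JSON output. The following is a small sketch; the function name network_gateway is my own, and the template assumes the gateway is reported under IPAM.Config, as it is in the inspection output of a bridge network.

```shell
# Print the gateway IP of a Docker network.
# The first argument is the network name; it defaults to nw_alfresco.
network_gateway() {
    docker network inspect \
        --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}' \
        "${1:-nw_alfresco}"
}
```

Running network_gateway without arguments should print the gateway of nw_alfresco; passing another network name inspects that network instead.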

PostgreSQL Docker Image

To begin, I’d pull the official Docker image of PostgreSQL:

$ docker pull postgres

Now, the following command runs a container from the postgres image:

$ docker run --name 'postgresql' \
--net=nw_alfresco \
-e POSTGRES_USER=pg_alfresco \
-e POSTGRES_PASSWORD=123456 \
-v $HOME/alfresco/database:/var/lib/postgresql/data \
-d postgres

Let’s go through the meaning of each argument. Note that “\” only continues the command on the next line.

--name assigns a name to the container that can be used later to start, stop and attach to the container.

If one runs the container listing command, the container just created will be found under the name passed to --name:

$ docker ps

--net connects the container to the user-defined bridge network nw_alfresco, which we created in the previous section. At this point, if you inspect the nw_alfresco network, the container just created will appear under the containers listing. The following is the command and a screenshot of the output.

$ docker network inspect nw_alfresco

-e sets an environment variable inside the container as a key-value pair. For example, we assign the variable named POSTGRES_USER the value pg_alfresco. The start-up script of the image reads these variables when the container first runs and creates the database user accordingly.

-v mounts the default directory used to store the databases to a persistent directory on the host machine under the current user’s home directory. Note that a custom storage directory other than /var/lib/postgresql/data could have been assigned to the container using the environment variable PGDATA.

When one runs the container inspection command, the volumes declared in the Dockerfile can be found under the volumes listing as follows.

$ docker inspect postgresql

It’s also valid to check the Dockerfile directly or the official documentation of the image for such details. In our case, the postgres documentation is found under [11].

-d runs the container in detached mode, so that one can run more commands in the terminal on the host machine. An alternative is -it for interactive mode.

At this point, it’s all abstract, and one may doubt whether these commands really get the job done. Next, I will show you two ways to check the correctness of the above steps.

First, let’s run psql inside the container from the terminal and inspect the databases found so far.

$ docker exec -ti postgresql psql -U pg_alfresco
$ docker attach postgresql

The first command, exec, is the Docker way of running a command inside a running container from the host machine’s terminal [12]. Here it starts psql, the Command Line Interface (CLI) of PostgreSQL. If you have been following, we have set the PostgreSQL default user to pg_alfresco via an environment variable, so it has to be passed to the psql client using -U (case sensitive).

The second command, attach, connects the terminal to the standard input and output of the container’s main process, which in this case is the PostgreSQL server itself. It is an alternative way of looking into the container, but it is not needed for running psql.

It’s now as easy as running the following command to list the available databases:

# \l

It’s worth noting that PostgreSQL has created a database with the same name as the default username. We could have overridden this by setting the environment variable POSTGRES_DB when running the container.
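To illustrate the two optional variables mentioned so far, here is a hedged sketch of a run command that sets both POSTGRES_DB and PGDATA. The container name postgresql_custom and the database name alfresco are hypothetical; the PGDATA value is the subdirectory layout suggested in the documentation of the official image [11].

```shell
# Sketch only: the container name "postgresql_custom" and the database name
# "alfresco" are hypothetical. POSTGRES_DB and PGDATA are environment
# variables documented for the official postgres image.
run_postgres_custom() {
    docker run -d \
        --name postgresql_custom \
        --net=nw_alfresco \
        -e POSTGRES_USER=pg_alfresco \
        -e POSTGRES_PASSWORD=123456 \
        -e POSTGRES_DB=alfresco \
        -e PGDATA=/var/lib/postgresql/data/pgdata \
        -v "$HOME/alfresco/database:/var/lib/postgresql/data" \
        postgres
}
```

With these settings, listing the databases with \l should show alfresco instead of a database named after the user.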

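Note that in order to detach from an attached container without stopping it, one should press Ctrl+p followed by Ctrl+q. Pressing Ctrl+c or Ctrl+d in an attached session may stop the container, which is not the desired action. A psql session started with exec can simply be left with \q.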

Now let’s check the persistent storage passed to the -v argument. I expect to find the PostgreSQL files permanently stored on the host machine under $HOME/alfresco/database. This way, one can easily restore the database to any container in the future without losing a bit of the data. Here is a screenshot of the contents of that directory.
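Besides looking at the directory itself, one can ask Docker which host paths are mounted into a container. The sketch below uses the --format option of docker inspect; the function name container_mounts is my own.

```shell
# List "host path -> container path" for every mount of a container.
# The first argument is the container name; it defaults to postgresql.
container_mounts() {
    docker inspect \
        --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' \
        "${1:-postgresql}"
}
```

For the postgresql container started above, the output should contain a line that maps $HOME/alfresco/database to /var/lib/postgresql/data.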

Alfresco Docker Image

It’s time to pull the Alfresco image. In this step, I had some issues when I tried the gui81/alfresco image [15].

$ docker run --name='alfresco' -d \
-p 8080:8080 \
-v $HOME/alfresco_volumes/alf_data:/alfresco/alf_data \
-v $HOME/alfresco_volumes/tomcat_logs:/alfresco/tomcat/logs \
-v $HOME/alfresco_volumes/content:/content \
gui81/alfresco

If mounted correctly, the logs are found under $HOME/alfresco_volumes/tomcat_logs. Otherwise, one could simply open a shell in the container as before and inspect the Tomcat server log manually. I copy the error here for the sake of completeness.

$ docker exec -ti alfresco /bin/sh
# tail -f /alfresco/tomcat/logs/catalina.out

SEVERE: Failed to initialize end point associated with ProtocolHandler [“http-bio-8443”] java.io.FileNotFoundException: /alfresco/alf_data/keystore/ssl.keystore (No such file or directory)

I was terribly confused about whether I should run keytool inside the Docker container or on the host. A tutorial for running Jenkins with SSL [14] suggested that the issue is simply about signing a certificate and placing it in the mounted volume. keytool is found in $JAVA_HOME/bin; if it is not there, one may consult a Java installation manual.

After having signed the certificate and placed it in the mounted volume, I ran into even more errors, which I could not resolve.

SEVERE: Error listenerStart May 28, 2016 4:58:11 AM org.apache.catalina.core.StandardContext startInternal

ERROR [solr.tracker.AbstractTracker] [SolrTrackerScheduler_Worker-39] Tracking failed java.net.ConnectException: Connection refused

SEVERE: Failed to initialize connector [Connector[HTTP/1.1-8443]] org.apache.catalina.LifecycleException: Failed to initialize component [Connector[HTTP/1.1-8443]]

I found a similar open issue on GitHub for the gui81/docker-alfresco image. The full discussion is found in [8]. I therefore switched to the Rancher Alfresco image [9]. The commands to get up and running are:

$ git clone https://github.com/disaster37/rancher-alfresco.git
$ cd rancher-alfresco
$ docker build --tag="$USER/alfresco" .
$ docker run -d --name "alfresco" \
-p 7070:7070 \
-p 8080:8080 \
--net=nw_alfresco \
-e DATABASE_HOST=172.18.0.2 \
-e DATABASE_NAME=pg_alfresco \
-e DATABASE_PASSWORD=123456 \
-e DATABASE_USER=pg_alfresco \
$USER/alfresco

Note that no volume was specified here, as it would have mounted an empty folder over the image’s data directory and one would have run into the same errors once again. It’s also worth explaining how to figure out the IP address of the PostgreSQL server: one only has to run the network inspection command and read the IP address listed for the postgresql container.

$ docker network inspect nw_alfresco
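Instead of reading the address from the JSON output by eye, it can be extracted with the --format option of docker inspect. The following is a sketch; the function name container_ip is my own.

```shell
# Print the IP address of a container on the network(s) it is attached to.
# The first argument is the container name; it defaults to postgresql.
container_ip() {
    docker inspect \
        --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' \
        "${1:-postgresql}"
}
```

The result could then be passed on as -e DATABASE_HOST="$(container_ip postgresql)". Since containers on a user-defined bridge network can resolve each other by name, passing the name postgresql as DATABASE_HOST may work as well; I have not verified that this particular image accepts a host name there, so treat it as an assumption.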

Now if one returns to the PostgreSQL container and connects to the pg_alfresco database, all the tables created by Alfresco should be listed. These commands do just that:

$ docker exec -ti postgresql psql -U pg_alfresco
# \connect pg_alfresco
# \dt

One more thing: when the network nw_alfresco is inspected now, the new container is also connected to the network and listed under containers.

Start-up Script

If the host machine is rebooted, one simply has to start the containers once again. It’s crucial to the availability of the repository that this process is automated. With reference to [16, 17], the following commands are to be run:

$ sudo apt-get install upstart
$ sudo vim .config/upstart

Then, add the following lines:

start on startup
task
script
    docker start postgresql
    docker start alfresco
end script
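As an alternative to an Upstart job, Docker itself can restart containers through a restart policy, provided the Docker daemon is started at boot. The sketch below applies the unless-stopped policy to the two existing containers with docker update; the function name enable_auto_restart is my own, and this approach is not part of the original set-up.

```shell
# Ask the Docker daemon to restart both containers automatically,
# unless they were stopped on purpose.
enable_auto_restart() {
    docker update --restart unless-stopped postgresql alfresco
}
```

After calling enable_auto_restart once, both containers should come back after a reboot without any start-up script.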

After rebooting the host machine, the containers should be listed when one runs this command:

$ docker ps

If one navigates to http://localhost:8080/share/page, the repository should be accessible.

Backup & Restore

Since only two elements are essential to an Alfresco repository, the database and the root directory, we will now write a simple cron job to back up Alfresco every week, on Monday at 05:00 AM. The following commands do the trick:

$ crontab -e

Simply add the following lines in your favorite editor:

0 5 * * 1 docker cp alfresco:/opt/alfresco/alf_data $HOME/alfresco/alf_data
0 5 * * 1 docker cp postgresql:/var/lib/postgresql/data $HOME/alfresco/database
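Copying the data directory of a running database may capture it in an inconsistent state, so a logical dump with pg_dump is a useful complement. The sketch below writes a dated SQL file; the function name backup_database and the default directory $HOME/alfresco/backup are hypothetical.

```shell
# Write a plain-SQL dump of the pg_alfresco database to a dated file.
# The first argument is the backup directory; the default is hypothetical.
backup_database() {
    backup_dir="${1:-$HOME/alfresco/backup}"
    mkdir -p "$backup_dir" || return 1
    docker exec postgresql pg_dump -U pg_alfresco pg_alfresco \
        > "$backup_dir/pg_alfresco_$(date +%Y%m%d).sql"
}
```

If the function is saved in a script, the cron entry only needs to call that script. Keeping the date command inside the script also avoids the need to escape the % characters, which have a special meaning inside a crontab line.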

In order to restore an existing repository on a new machine, the following three commands are all that’s needed.

$ docker network create --driver bridge nw_alfresco

$ docker run --name 'postgresql' \
--net=nw_alfresco \
-e POSTGRES_PASSWORD=123456 \
-e POSTGRES_USER=pg_alfresco \
-v $HOME/alfresco/database:/var/lib/postgresql/data \
-d postgres

$ docker run -d --name "alfresco" \
-p 7070:7070 \
-p 8080:8080 \
--net=nw_alfresco \
-e DATABASE_HOST=172.18.0.2 \
-e DATABASE_NAME=pg_alfresco \
-e DATABASE_PASSWORD=123456 \
-e DATABASE_USER=pg_alfresco \
-v $HOME/alfresco/alf_data:/opt/alfresco/alf_data \
$USER/alfresco

At this point, if one navigates to http://localhost:8080/share/page, all the data should be there as expected. One should replace localhost with the IP address of the new host machine when browsing from another machine.
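When the two run commands are executed back to back, PostgreSQL may still be starting up while Alfresco already tries to connect. A small waiting loop between the two commands can help. The sketch below polls pg_isready, which ships with PostgreSQL; the function name wait_for_postgres and the default of 30 attempts are my own choices.

```shell
# Wait until PostgreSQL in the "postgresql" container accepts connections.
# The first argument is the maximum number of one-second attempts (default 30).
wait_for_postgres() {
    tries="${1:-30}"
    while [ "$tries" -gt 0 ]; do
        if docker exec postgresql pg_isready -U pg_alfresco >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

The function returns 0 as soon as the server is ready and 1 if it gives up, so it can guard the second command, for example wait_for_postgres && docker run ... for the Alfresco container.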

A Security Tip

It’s absolutely essential to change the example username and password (pg_alfresco and 123456) used throughout this post before exposing the deployment to anyone else.
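One simple way to obtain a stronger password is to generate it from the system’s random source. The sketch below uses only od and tr; the function name random_password is my own.

```shell
# Generate a 48-character hexadecimal password from 24 random bytes.
random_password() {
    od -An -N24 -tx1 /dev/urandom | tr -d ' \n'
}
```

The value can be stored once, for example with PG_PASSWORD="$(random_password)", and then passed to both containers as -e POSTGRES_PASSWORD="$PG_PASSWORD" and -e DATABASE_PASSWORD="$PG_PASSWORD", so that the two sides stay in sync.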

Final Thoughts

One can easily deploy Alfresco using the above strategy in less than three minutes. As long as the database and the alf_data directory are backed up, one can easily migrate from one host machine to another. In this post, we have walked through concepts related to Docker Networking and Data Volumes. It is worth noting that the above is only a proof of concept and should not be used in production. I’m looking forward to your comments and feedback.

Archaeological Alfresco

It’s very interesting to know that Alfresco is the art of painting on fresh stucco which was prominent in Roman times in ancient Egypt [18].

References:

  1. Docker Networks
  2. Docker Volumes
  3. Docker Linking using Ambassador Links
  4. Docker Tutorial Series
  5. Docker Installation: Create a Docker Group
  6. How to deal with persistent storage (e.g. databases) in docker?
  7. Why Docker Containers are Good?
  8. Alfresco error when using data volume for “/alfresco/alf_data” #7
  9. Rancher Alfresco Docker Image
  10. Docker CLI Volume Create
  11. Official Docker PostgreSql
  12. How to get to a psql from a running container?
  13. Alfresco Wiki: Backup & Restore
  14. Enable SSL in Jenkins in Docker
  15. Docker Hub: gui81/alfresco
  16. Ask Ubuntu: How to run scripts on start up?
  17. Getting Started Upstart
  18. Wikipedia: Fayum Mummy Portraits

