Docker-Compose¶

Docker-compose and the use of Docker allows running all the required INNUENDO Platform components in a controller environment (containers) in a very simple way.

Since it uses the docker-images as built using the developed Dockerfiles that act as a recipe for the installation of all components, it releases that burden from the user.

Installation¶

For the docker-compose version of the INNUENDO Platform you will need to install the following software.

Docker
Docker-Compose

On Ubuntu¶

First, add the GPG key for the official Docker repository to the system.

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Add the Docker repository to APT sources.

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

Next, update the package database with the Docker packages from the newly added repo.

sudo apt-get update

Install Docker.

sudo apt-get install -y docker-ce

Docker should now be installed, the daemon started, and the process enabled to start on boot. Check that it’s running.

sudo systemctl status docker

Next we will install docker-compose. We will check the current release and if necessary, update it in the command below.

sudo curl -L https://github.com/docker/compose/releases/download/1.18.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose

Next we will set the permissions.

sudo chmod +x /usr/local/bin/docker-compose

Then we can verify that the installation was successful by checking the version.

docker-compose --version

On Windows and Mac¶

Install the executables from the docker-compose page.

https://docs.docker.com/compose/install/

Configuration¶

Each component of the INNUENDO Platform can be configured by modifying its configuration file. Configuration files are located at configs/ and are files required for the Platform to work.

NOTE: Modifying these files might lead to corruption of the application. Proceed with care.

Each file belonging to each component is described bellow.

Frontend Server¶

The Frontend server has one configuration file located at configs/app/config_frontend.py that has a set of variables required for this module to work in cooperation with the process controller.

Below defaults are for the docker-compose version.

FRONTEND_IP: IP address of the machine, default: web
phyloviz_root: Root address of PHYLOViZ Online. default: http://web:82
AGRAPH_IP: AllegroGraph server IP adress. default: web
CURRENT_ROOT: Current address of the frontend application. default: http://’+FRONTEND_IP+’/app
JOBS_IP: INNUENDO Process Controller IP address. default: web
JOBS_ROOT: Job submission route. default: http://’+JOBS_IP+’/jobs/’
FILES_ROOT: Route to get information about fastq files. default:http://’+JOBS_IP+’/jobs/fastqs/’
REPORTS_URL: Reports application route. default: “http://localhost/reports”
SECRET_KEY: Secret key for flask-security hash.
SECURITY_PASSWORD_HASH: Flask-security type of hash used.
SECURITY_PASSWORD_SALT: Flaks-security salt used.
ADMIN_EMAIL: Email of the platform administrator. default: innuendo@admin.com
ADMIN_NAME: Administrator name. default: Admin
ADMIN_USERNAME: ADministrator username. default: innuendo_admin
ADMIN_PASS: Administrator password.
ADMIN_GID: Group identifier for admins. default: 501
REDIS_URL: Redis queue URL. default: redis://redis:6379
SECURITY_REGISTERABLE: Allow Flask-security view to register. default: False
SECURITY_RECOVERABLE: Allow Flask-security view to recover password. default: True
SECURITY_CHANGEABLE: Allow Flask-security view to change password. default: True
SECURITY_FLASH_MESSAGES: SHow Flask-security messages. default: True
FAST_MLST_PATH: Path for fast-mlst application used for profile classification and search. default: /Frontend/fast-mlst
NEXTFLOW_TAGS: Currently available FlowCraft tags. More information on FlowCraft documentation.
DATABASE_USER: User owner of the postgreSQL database. default: innuendo
DATABASE_PASS: Password of the postgreSQL user. default: innuendo_database
database_uri: URI for the wgMLST profile database. default: ‘postgresql://’+DATABASE_USER+’:’+DATABASE_PASS + ‘@db_mlst/mlst_database’
innuendo_database_uri: URI for the innuendo database. default: ‘postgresql://’+DATABASE_USER+’:’+DATABASE_PASS+’@db_innuendo/innuendo’
SQLALCHEMY_BINDS: Databases that bind to SQLAlchemy.
SQLALCHEMY_MIGRATE_REPO: Location to store and update database files. default: os.path.join(basedir, ‘db_repository’)
SQLALCHEMY_TRACK_MODIFICATIONS: Track database modification. default: True
WTF_CSRF_ENABLED: Enable CSRF. default: False
app_route: Application entry route. default: ‘/app’
LDAP_PROVIDER_URL: LDAP client IP definition. default: LDAP_IP
LDAP_PROTOCOL_VERSION: LDAP protocol version. default: 3
baseDN: Base repository reference. default: dc=innuendo,dc=com
LOGIN_METHOD: Platform login method. Used to distinguish between LDAP authentication and single user authentication used in the docker version. default: None
LOGIN_GID: Login group identifier. Used in case of docker version. default: 501
LOGIN_HOMEDIR: Single user home directory. Used in case of docker version. default: /INNUENDO/
LOGIN_USERNAME: Single user username. Used in case of docker version. default: innuendo_user
LOGIN_PASSWORD: Single user password. Used in case of docker version. default: innuendo_user
LOGIN_EMAIL: Single user email. Used in case of docker version. default: innuendo@innuendo.com
ALL_SPECIES: All supported species. default: [“E.coli”,”Yersinia”,”Campylobacter”,”Salmonella”]
allele_classes_to_ignore: chewBBACA report on profile to replace with 0.
wg_index_correspondece: Path to the wg index file used by fast-mlst for profile search up to x differences. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_wg”}
core_index_correspondece: Path to the core index file used by fast-mlst for profile search up to x differences. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_core”}
wg_headers_correspondece: Path to the list of the wg loci for each species. Example: {“E.coli”: “/INNUENDO/inputs/core_lists/ecoli_headers_wg.txt”}
core_headers_correspondece: Path to the list of the core loci for each species. Example: {“E.coli”: “/INNUENDO/inputs/core_lists/ecoli_headers_core.txt”}
core_increment_profile_file_correspondece: Location of the file with the core profiles for each species. Used to contruct the search index. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_core_profiles.tab”}
wg_increment_profile_file_correspondece: Location of the file with wg profiles for each species. Used to contruct the search index. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_wg_profiles.tab”}
classification_levels: Classification levels for each specie. Number of profile differences. Example: {“E.coli”: [8, 112, 793]}
AG_REPOSITORY: Name of the AllegroGraph repository. default: innuendo
AG_USER: AllegroGraph user. default: innuendo
AG_PASSWORD: AllegroGraph password. default: innuendo_allegro

Controller Server¶

The Controller server has one configuration file located at configs/app/config_process.py that has a set of variables required for this module to work in cooperation with the frontend and the workflow managers.

Below defaults are for the docker-compose version.

REDIS_URL: Redis queue URL. default: redis://redis:6379
ASPERAKEY: Aspera key location. default: ~/.aspera/connect/etc/asperaweb_id_dsa.openssh
FTP_FILES_FOLDER: Location of the files folder in relation to the user home directory. default: ftp/files
NEXTFLOW_RESOURCES: Specifications of each nextflow process. Can be used to specify each parameter of any given process. Example: { “integrity_coverage”:{“memory”: r“‘2GB’”,”cpus”: “1”}
SERVER_IP: IP address of the machine. default: web
FRONTEND_SERVER_IP: IP address of the frontend server. default: web
DEFAULT_SLURM_CPUS: Default SLURM CPUs used when a process is not specified. default: 8
NEXTFLOW_PROFILE: Nextflow profile to use. Those are specified in the FlowCraft software. default: desktop
NEXTFLOW_GENERATOR_PATH: Location of the FlowCraft software executable. default: /Controller/flowcraft/flowcraft/flowcraft.py
NEXTFLOW_GENERATOR_RECIPE: FlowCraft recipe to use. It defines the set of processes that can be used and their relationships. default: innuendo
FASTQPATH: Location of the fastq files in the user directory structure. Used by FlowCraft to search for paired end reads. default: “data/_{1,2}.”
JOBS_ROOT_SET_OUTPUT: Route used to set the output status of processes. Example: http://+SERVER_IP+/jobs/setoutput/
JOBS_ROOT_SET_REPORT: Route used to set the reports and store them on the database. Example: http://+FRONTEND_SERVER_IP+/app/api/v1.0/jobs/report/
CHEWBBACA_PARTITION: Partition name used by SLURM to launch chewBBACA processes. Can only run one chewBBACA at a time. default: chewBBACA
CHEWBBACA_SCHEMAS_PATH: Location of the chewBBACA schemas. default: /INNUENDO/inputs/schemas
CHEWBBACA_TRAINING_FILE: Location of prodigal training files for each specie. Example: { “E.coli”: “/INNUENDO/inputs/prodigal_training_files/prodigal_training_files/Escherichia_coli.trn”, }
SEQ_FILE_O: SeqTyping FILE_O location. default: {“E.coli”: “/INNUENDO/inputs/serotyping_files/escherichia_coli/1_O_type.fasta”}
SEQ_FILE_H: Seqtyping FILE_H location. default: {“E.coli”: “/INNUENDO/inputs/serotyping_files/escherichia_coli/2_H_type.fasta”}
wg_index_correspondece: Path to the wg index file used by fast-mlst for profile search up to x differences. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_wg”}
core_index_correspondece: Path to the core index file used by fast-mlst for profile search up to x differences. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_core”}
wg_headers_correspondece: Path to the list of the wg loci for each species. Example: {“E.coli”: “/INNUENDO/inputs/core_lists/ecoli_headers_wg.txt”}
core_headers_correspondece: Path to the list of the core loci for each species. Example: {“E.coli”: “/INNUENDO/inputs/core_lists/ecoli_headers_core.txt”}
core_increment_profile_file_correspondece: Location of the file with the core profiles for each species. Used to contruct the search index. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_core_profiles.tab”}
wg_increment_profile_file_correspondece: Location of the file with wg profiles for each species. Used to contruct the search index. Example: {“E.coli”: “/INNUENDO/inputs/indexes/ecoli_wg_profiles.tab”}
AG_REPOSITORY: AllegroGraph repository name. default: innuendo
AG_USER: AllegroGraph username. default: innuendo
AG_PASSWORD: AllegroGraph user password. default: innuendo_allegro

Flowcraft Configuration¶

The Flowcraft webapp application has two configuration files located at configs/flowcraft that has a set of variables required for this module to work in cooperation with the frontend.

Below are the defaults for the docker-compose version.

reportsRoute: Route location to fetch for reports. default: http://localhost/reports

Running the INNUENDO Platform¶

Retrieving the docker-compose version¶

To launch the docker-compose version of the INNUENDO Platform, first need to get: the INNUENDO_docker repository from github that has all the

required Dockerfiles and structures for communication between the containers and the user file system.

git clone https://github.com/bfrgoncalves/INNUENDO_docker.git

Launching the application¶

Running the INNUENDO Platform is very simple. You can lauch it with a single command.

# Access the INNUENDO docker repository
cd </path/to/INNUENDO_docker>

# Launch the application
docker-compose up

The last command will pull all the required images first then it will launch all the Docker containers. They will will communicate between each other by a docker network that is built by default with docker-compose.

Downloading legacy data and building profile databases¶

The application provides a script to download all the required files to perform comparisons with some already publicly available strains. This is made through the download of the following data available here:

chewBBACA schemas

Legacy strain metadata (for each species)

Legacy strain profiles (for each species)

Serotyping files

Prodigal training files

These data will be available under ./inputs and will be mapped to the docker containers running the application.

The script also build the required files for a rapid comparison between profiles using fast-mlst and populates the mlst_database.

To run the script, type the following command:

# Enter repository directory
cd <innuendo_docker_directory>/build_files

# Run script to get legacy input files
./get_inputs.sh

These steps might take up to 1h depending on the available internet connection and the host machine.

Mapping data into the Docker containers¶

To map data between the user filesystem and the containers, docker-compose already has a directive to deal with that action.

Inside the docker-compose.yml you got all the required attributes to launch the container and the interaction between other containers.

Below is described the directives used to launch a service in docker-compose.

# Service for the INNUENDO frontend. Requires the config files for the
# application and mapping of the fastq files
frontend:
    # this service uses the dockerfile inside the Frontend directory
    build: ./components/Frontend/
    # Allow run services inside as root
    privileged: true
    # Allow restart on failure
    restart: on-failure
    # Directive to map files and folders to the container. In this case,
    all files before : are files in the user file system. The files after
     : are the location of those files in the container.
    volumes:
      - ./configs/app/config_frontend.py:/Frontend/INNUENDO_REST_API/config.py
      - user_data:/INNUENDO
      - ./inputs/fastq:/INNUENDO/ftp/files
      - ./inputs/v1/classifications:/INNUENDO/inputs/v1/classifications
      - ./inputs/v1/core_lists:/INNUENDO/inputs/v1/core_lists
      - ./inputs/v1/indexes:/INNUENDO/inputs/v1/indexes
      - ./inputs/v1/legacy_metadata:/INNUENDO/inputs/v1/legacy_metadata
      - ./inputs/v1/legacy_profiles:/INNUENDO/inputs/v1/legacy_profiles
      - singularity_cache:/mnt/singularity_cache
    # Ports mapping between container and host
    ports:
      - "5000:5000"
    # Depends on other docker-compose services to work
    depends_on:
      - "allegro"
      - "db_innuendo"
      - "db_mlst"
      - "web"
    # Arguments to give to the docker-entrypoint.sh
    command: ["init_allegro", "build_db", "init_app"]

As seen above, the files can be mapped with the volumes directive.

Fastq files from the user must be placed into the inputs/fastq folder to be linked with the INNUENDO Platform docker version.

Backing up/ Build data¶

We provide a series of scripts to backup/build all the required databases used in the docker-compose version of the INNUENDO Platform. These files are located at inside the images and need to be triggered after the application is running. This is made using the docker exec command on an already running container.

Backing up/ Build postgreSQL databases¶

There are four postgreSQL databases used in the INNUENDO Platform that can be backed up: innuendo, mlst_database, assemblerflow, and phyloviz.

All databases backups can be made using a single command for each database.

# Execute script on frontend container to backup database
# Information on database, username and pass are located in the
# docker-compose.yml file
docker exec innuendo_docker_frontend_1 backup_dbs.sh backup <database> <username> <pass> <backup_file_name>

The build command to restore a database to a given backup state is very similar to the above.

# Execute script on frontend container to build database
docker exec innuendo_docker_frontend_1 backup_dbs.sh build <database> <username> <pass> <backup_file_name>

Backing up/ Build AllegroGraph databases¶

Other database type used in the INNUENDO Platform is a triplestore and it is also required for the application to retrive to a given state if required.

To backup AllegroGraph, it is only required to run a single command

# Execute script on frontend container to backup allegrograph
# Information on database, username and pass are located in the
# docker-compose.yml file
docker exec innuendo_docker_frontend_1 build_allegro.py backup <backup_file_name>

The build command is similar to the above and is required to move the application to a given state.

# Execute script on frontend container to backup allegrograph
docker exec innuendo_docker_frontend_1 build_allegro.py build <backup_file_name>

Customizing Entrypoints¶

Entrypoints are the files run on container creation with a series of predefined commands.

On each component/ folder of the application you have an entrypoint.sh file and a Dockerfile.

By modifying the commands inside the entrypoint.sh you can change the default behaviour when the container for that component launches.

Useful docker commands¶

Bellow are some docker commands that might be useful to interact with the containers.

Show active containers.

docker-compose ps

Enter container.

docker exec -it container_name bash

List virtual volumes.

docker volume ls

List images.

docker images

Remove images

docker rmi image_name