documentation: Add Docker installation method using a dedicated Docker network. Add scaling up chapter. Add S3 storage configuration section.

Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
Roberto Rosario
2018-10-14 03:47:41 -04:00
parent 5a922e2689
commit c9fb3814d9
5 changed files with 245 additions and 3 deletions


@@ -1,4 +1,4 @@
3.1.7 (2018-10-XX)
3.1.7 (2018-10-14)
==================
* Fix an issue with some browsers not firing the .load event on cached
images. Ref: http://api.jquery.com/load-event/
@@ -18,6 +18,11 @@
* Add a noop OCR backend that disables OCR and the check for the
Tesseract OCR binaries. Set the OCR_BACKEND setting or MAYAN_OCR_BACKEND
environment variable to ocr.backends.pyocr.PyOCR to use this.
* All tests pass on Python 3.
* documentation: Add Docker installation method using a dedicated
Docker network.
* documentation: Add scaling up chapter.
* documentation: Add S3 storage configuration section.
3.1.6 (2018-10-09)
==================


@@ -43,6 +43,7 @@ repository for electronic documents.
Docker image <topics/docker>
Direct deployments <topics/deploying>
Scaling up <topics/scaling_up>
Development <topics/development>
App creation <topics/app_creation>


@@ -2,7 +2,7 @@
Mayan EDMS v3.1.7 release notes
===============================
Released: October XX, 2018
Released: October 14, 2018
Changes
~~~~~~~
@@ -24,7 +24,7 @@ Changes
* Add a noop OCR backend that disables OCR and the check for the
Tesseract OCR binaries. Set the OCR_BACKEND setting or MAYAN_OCR_BACKEND
environment variable to ocr.backends.pyocr.PyOCR to use this.
* All tests pass on Python 3.
Removals
--------


@@ -71,6 +71,47 @@ If another web server is running on port 80 use a different port in the
``-p`` option. For example: ``-p 81:8000``.

Using a dedicated Docker network
--------------------------------

Use this method to avoid exposing the PostgreSQL port to the host's network,
or if you have other PostgreSQL instances but still want to use the default
port 5432 for this installation.

Create the network::

    docker network create mayan

Launch the PostgreSQL container with the network option and remove the port
binding (``-p 5432:5432``)::

    docker run -d \
    --name mayan-edms-postgres \
    --network=mayan \
    --restart=always \
    -e POSTGRES_USER=mayan \
    -e POSTGRES_DB=mayan \
    -e POSTGRES_PASSWORD=mayanuserpass \
    -v /docker-volumes/mayan-edms/postgres:/var/lib/postgresql/data \
    postgres:9.5

Launch the Mayan EDMS container with the network option and change the
database hostname to the PostgreSQL container name (``mayan-edms-postgres``)
instead of the IP address of the Docker host (``172.17.0.1``)::

    docker run -d \
    --name mayan-edms \
    --network=mayan \
    --restart=always \
    -p 80:8000 \
    -e MAYAN_DATABASE_ENGINE=django.db.backends.postgresql \
    -e MAYAN_DATABASE_HOST=mayan-edms-postgres \
    -e MAYAN_DATABASE_NAME=mayan \
    -e MAYAN_DATABASE_PASSWORD=mayanuserpass \
    -e MAYAN_DATABASE_USER=mayan \
    -e MAYAN_DATABASE_CONN_MAX_AGE=60 \
    -v /docker-volumes/mayan-edms/media:/var/lib/mayan \
    mayanedms/mayanedms:<version>

Stopping and starting the container
--------------------------------------

docs/topics/scaling_up.rst Normal file

@@ -0,0 +1,195 @@
.. _scaling_up:

==========
Scaling up
==========

The default installation method fits most use cases. If your use case requires
more speed or capacity, here are some suggestions that can help you improve
the performance of your installation.

Change the database manager
===========================

Use PostgreSQL or MySQL as the database manager.

Tweak the memory settings of the database manager to increase its memory
allocation. More PostgreSQL-specific examples are available on the PostgreSQL
wiki: https://wiki.postgresql.org/wiki/Performance_Optimization

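
As a starting point for the memory settings, the PostgreSQL documentation
suggests sizing ``shared_buffers`` at about 25% of the RAM of a dedicated
database server. The shell sketch below computes that value from
``/proc/meminfo`` (so it assumes a Linux host) and prints it as a ``-c``
server option; the official ``postgres`` Docker image passes such options to
the server when they are appended after the image name. The 25% ratio is a
rule of thumb, not a Mayan EDMS requirement; lower it if the host also runs
other services. ``suggest_shared_buffers_mb`` is a hypothetical helper.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: suggest shared_buffers as 25% of total RAM, the starting
    # point the PostgreSQL documentation gives for a dedicated server.
    # Reads /proc/meminfo, so it assumes a Linux host.
    set -euo pipefail

    # Convert a memory size in kB (as reported by /proc/meminfo) into
    # 25% of that size, expressed in MB.
    suggest_shared_buffers_mb() {
        local mem_kb="$1"
        echo $(( mem_kb / 1024 / 4 ))
    }

    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    shared_buffers_mb=$(suggest_shared_buffers_mb "$mem_kb")

    # Append this after the image name of the PostgreSQL "docker run"
    # command, for example: ... postgres:9.5 -c shared_buffers=2048MB
    printf '%s\n' "-c shared_buffers=${shared_buffers_mb}MB"

For a host with 8 GiB of RAM (8388608 kB), the helper returns 2048.
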
Increase the number of Gunicorn workers
=======================================

The Gunicorn workers process HTTP requests and affect the speed at which the
website responds.

If you are using the Docker image, change the value of the
``MAYAN_GUNICORN_WORKERS`` environment variable
(https://docs.mayan-edms.com/topics/docker.html#environment-variables). This
variable defaults to 2. Increase it to match the number of CPU cores + 1.

If you are using the direct deployment methods, change the line that reads::

    command = /opt/mayan-edms/bin/gunicorn -w 2 mayan.wsgi --max-requests 500 --max-requests-jitter 50 --worker-class gevent --bind 0.0.0.0:8000 --timeout 120

Increase the value of the ``-w 2`` argument. This line is found in the
``[program:mayan-gunicorn]`` section of the supervisor configuration file.

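
The worker count suggested above can be computed on the host itself. This
shell sketch counts the CPU cores available to it with ``nproc`` (GNU
coreutils) and shows both ways of applying the value.
``set_gunicorn_workers`` is a hypothetical helper that rewrites the ``-w``
argument with GNU ``sed``; the supervisor file path varies between
installations, so pass your own.

.. code-block:: bash

    #!/usr/bin/env bash
    set -euo pipefail

    # Suggested Gunicorn worker count: number of CPU cores + 1.
    workers=$(( $(nproc) + 1 ))

    # Docker image: pass the value as an environment variable, e.g.
    #   docker run ... -e MAYAN_GUNICORN_WORKERS="${workers}" ...
    echo "MAYAN_GUNICORN_WORKERS=${workers}"

    # Direct deployment (hypothetical helper): rewrite the "-w <n>"
    # argument of the gunicorn command in a supervisor file, in place.
    # Relies on GNU sed's -i and -E options.
    set_gunicorn_workers() {
        local conf_file="$1" count="$2"
        sed -i -E "s|(bin/gunicorn) -w [0-9]+ |\1 -w ${count} |" "$conf_file"
    }

    # Example: set_gunicorn_workers /path/to/your/supervisor.conf "$workers"

After editing the supervisor file, reload it, for example with
``supervisorctl reread`` followed by ``supervisorctl update``, so that
Gunicorn restarts with the new worker count.
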
Background task processing
==========================

The Celery workers are system processes that take care of the background
tasks requested by front-end interactions, like document image rendering, as
well as longer-running tasks like OCR. There are several dozen tasks defined
in the code. These tasks are divided into queues based on the app they belong
to or on the relationship between the tasks. By default, the queues are
divided into three groups based on the speed at which they need to be
processed. Document page image rendering, for example, is categorized as a
high volume, short duration task. OCR is a high volume, long duration task.
Email checking is a low volume, medium duration task.

It is not advisable to have the same worker process both OCR and image
rendering. If the worker is processing several OCR tasks, it will not be able
to provide images quickly when a user is browsing the user interface. This is
why, by default, the queues are split among three workers: fast, medium, and
slow.

The fast worker handles the queues:

* converter: Handles document page rendering.
* sources_fast: Renders staging file images.

The medium worker handles the queues:

* checkouts_periodic: Scheduled tasks that check if a document's checkout
  period has expired.
* documents_periodic:
* indexing: Reindexes documents in the background when their properties
  change.
* metadata:
* sources:
* sources_periodic: Checks email accounts and watch folders for new
  documents.
* uploads: Processes files to turn them into Mayan EDMS documents. Processing
  encompasses MIME type detection and page count detection.
* documents:

The slow worker handles the queues:

* mailing: Sends documents via email as requested by users through the
  mailing profiles.
* tools: Executes background maintenance requests from the options in the
  tools menu.
* statistics: Recalculates statistics and charts.
* parsing: Parses documents to extract their text content.
* ocr: Performs OCR to transcribe page images to text.

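
The default split above can be written down as a map from worker to queues.
This shell sketch stores it in a Bash 4 associative array and prints the
comma-separated list that Celery's ``-Q`` worker option takes.
``queues_for`` and ``worker_for_queue`` are hypothetical helpers; the array
only mirrors the lists in this section, and the assignments actually in
effect are the ones in your supervisor file or Docker image.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: default queue assignment, as a map from worker name to the
    # comma-separated queue list used by Celery's -Q option.
    # Requires Bash 4 or newer for associative arrays.
    declare -A WORKER_QUEUES=(
        [fast]="converter,sources_fast"
        [medium]="checkouts_periodic,documents_periodic,indexing,metadata,sources,sources_periodic,uploads,documents"
        [slow]="mailing,tools,statistics,parsing,ocr"
    )

    # Print the queue list of one worker; fail for an unknown worker.
    queues_for() {
        local worker="$1"
        if [[ -z "${WORKER_QUEUES[$worker]+set}" ]]; then
            echo "unknown worker: $worker" >&2
            return 1
        fi
        echo "${WORKER_QUEUES[$worker]}"
    }

    # Print the worker that services a given queue.
    worker_for_queue() {
        local queue="$1" worker
        for worker in "${!WORKER_QUEUES[@]}"; do
            # Surround both sides with commas so "sources" does not
            # match "sources_fast".
            if [[ ",${WORKER_QUEUES[$worker]}," == *",${queue},"* ]]; then
                echo "$worker"
                return 0
            fi
        done
        return 1
    }

    for worker in fast medium slow; do
        echo "${worker}: -Q $(queues_for "$worker")"
    done

For example, ``worker_for_queue ocr`` prints ``slow``, the worker to look at
when OCR falls behind.
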
Optimizations
-------------

* Increase the number of workers and redistribute the queues among them
  (only possible with direct deployments).
* Launch more workers to service a queue. For example, for faster document
  image generation, launch 2 workers to process the converter queue (only
  possible with direct deployments).
* By default each worker has a concurrency of 1, meaning it processes one
  task at a time. With the Docker image, you can increase the concurrency of
  each worker with the environment variables:

  * ``MAYAN_WORKER_FAST_CONCURRENCY``
  * ``MAYAN_WORKER_MEDIUM_CONCURRENCY``
  * ``MAYAN_WORKER_SLOW_CONCURRENCY``

* With direct deployments, increase the value of the ``--concurrency=1``
  argument of each worker in the supervisor file. You can also remove this
  argument and let Celery choose the concurrency, which defaults to the
  number of CPU cores.

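
For direct deployments, launching an extra worker for one queue means adding
another program section to the supervisor file. The shell sketch below
prints such a section for a worker dedicated to the ``converter`` queue with
a concurrency of 2. The program name is hypothetical, and the worker command
(``mayan-edms.py celery worker`` with Celery's ``-l``, ``-Q``, ``-n`` and
``--concurrency`` options) is an assumption modelled on a typical Mayan EDMS
supervisor file; copy the exact command from the existing worker sections of
your own file and change only the queue list, node name, and concurrency.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: print a supervisor [program:...] section for an additional
    # Celery worker. The mayan-edms.py path and command layout are
    # assumptions; match them to the worker sections already in your file.
    worker_section() {
        local name="$1" queues="$2" concurrency="$3"
        # "%%h" is supervisor's escape for a literal "%h", which Celery
        # expands to the host name so each node name stays unique.
        printf '%s\n' \
            "[program:${name}]" \
            "autorestart = true" \
            "autostart = true" \
            "command = /opt/mayan-edms/bin/mayan-edms.py celery worker -l ERROR -Q ${queues} -n ${name}.%%h --concurrency=${concurrency}"
    }

    # Hypothetical extra worker that only services the converter queue.
    worker_section mayan-worker-converter converter 2

Load the new section with ``supervisorctl reread`` followed by
``supervisorctl update``. With the Docker image, the concurrency variables
listed above, such as ``MAYAN_WORKER_FAST_CONCURRENCY``, are the available
setting instead.
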
Change the message broker
=========================

Messages are the method of communication between the front-end interactive
code and the background tasks. In this regard, messages can be thought of as
task requests. Increasing how many messages can be sent, stored, and sorted
increases the number of tasks the system can handle. To save on memory, the
basic deployment method and the Docker image default to using Redis as the
message broker. To increase capacity and reduce the volatility of messages
(pending tasks are not lost during a shutdown), use RabbitMQ as the message
broker.

For direct installs, refer to the Advanced deployment method
(https://docs.mayan-edms.com/topics/deploying.html#advanced-deployment) for
the required changes.

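For the Docker image, launch a separate RabbitMQ container
(https://hub.docker.com/_/rabbitmq/). This example attaches it to the
``mayan`` Docker network described in the "Using a dedicated Docker network"
section of the Docker image chapter::

    docker run -d \
    --name mayan-edms-rabbitmq \
    --network=mayan \
    -e RABBITMQ_DEFAULT_USER=mayan \
    -e RABBITMQ_DEFAULT_PASS=mayanrabbitmqpassword \
    -e RABBITMQ_DEFAULT_VHOST=mayan \
    rabbitmq:3

Pass the ``MAYAN_BROKER_URL`` environment variable
(https://kombu.readthedocs.io/en/latest/userguide/connections.html#connection-urls)
to the Mayan EDMS container, also attached to the ``mayan`` network, so that
it uses the RabbitMQ container as the message broker. Use the RabbitMQ
container name as the hostname; inside the Mayan EDMS container,
``localhost`` refers to that container itself::

    -e MAYAN_BROKER_URL="amqp://mayan:mayanrabbitmqpassword@mayan-edms-rabbitmq:5672/mayan"
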
When tasks finish, they leave behind a return status or the result of a
calculation. These are stored for a while so that whoever requested the
background task is able to retrieve the result. This storage is called the
result storage. By default, a Redis server is launched inside the Mayan EDMS
container. You can launch a separate Redis container and tell the Mayan EDMS
container to use it via the ``MAYAN_CELERY_RESULT_BACKEND`` environment
variable. The format of this variable is explained here:
http://docs.celeryproject.org/en/3.1/configuration.html#celery-result-backend

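
A broker URL follows kombu's
``transport://userid:password@hostname:port/virtual_host`` format, so a
password containing characters such as ``@`` or ``/`` must be
percent-encoded. The shell sketch below builds such a URL. ``urlencode`` and
``broker_url`` are hypothetical helpers; they handle ASCII input only and
insert the virtual host as-is. The user, password, and virtual host match the
RabbitMQ container settings in this section; the hostname assumes both
containers share a Docker network on which the RabbitMQ container is
reachable by its name.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: build a kombu-style broker URL with percent-encoded
    # credentials. Hypothetical helpers; ASCII input only.

    # Percent-encode every character outside the RFC 3986 unreserved set.
    urlencode() {
        local LC_ALL=C
        local s="$1" out="" c i
        for (( i = 0; i < ${#s}; i++ )); do
            c="${s:i:1}"
            case "$c" in
                [a-zA-Z0-9.~_-]) out+="$c" ;;
                *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
            esac
        done
        printf '%s\n' "$out"
    }

    # transport://userid:password@hostname:port/virtual_host
    broker_url() {
        local user="$1" password="$2" host="$3" port="$4" vhost="$5"
        printf 'amqp://%s:%s@%s:%s/%s\n' \
            "$(urlencode "$user")" "$(urlencode "$password")" \
            "$host" "$port" "$vhost"
    }

    # Example values; the hostname assumes a shared Docker network.
    echo "MAYAN_BROKER_URL=$(broker_url mayan mayanrabbitmqpassword mayan-edms-rabbitmq 5672 mayan)"

The result backend setting uses the same URL style; for Redis, the Celery 3.1
documentation linked above gives the form
``redis://:password@hostname:port/db_number``.
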
Deployment type
===============

Docker provides a faster deployment, and its overhead is low on modern
systems. However, if the container's access to memory or CPU is limited,
increase those limits. The settings that control container resource limits
are described here:
https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory

For the best possible performance, use the advanced deployment method on a
host dedicated to serving only Mayan EDMS.

Storage
=======

Mayan EDMS stores documents in their original file format, only changing the
filename to avoid collisions. For the best input and output speed, use a
block-based local filesystem for the ``/media`` subfolder of the path
specified by the ``MEDIA_ROOT`` setting. For increased storage capacity, use
an object storage system like S3.

To use an S3-compatible object storage, do the following:

* Install the Python packages ``django-storages`` and ``boto3``:

  * For direct deployments, using pip::

        pip install django-storages boto3

  * Using Docker::

        -e MAYAN_PIP_INSTALLS='django-storages boto3'

On the Mayan EDMS user interface, go to ``System``, ``Setup``, ``Settings``,
``Documents`` and change the following settings:

* ``DOCUMENTS_STORAGE_BACKEND`` to ``storages.backends.s3boto3.S3Boto3Storage``
* ``DOCUMENTS_STORAGE_BACKEND_ARGUMENTS`` to ``'{access_key: <your access key>, secret_key: <your secret key>, bucket_name: <bucket name>}'``

Restart Mayan EDMS for the changes to take effect.

Use additional hosts
====================

When one host is not enough, you can use multiple hosts to share the load.
Make sure that all hosts share the ``/media`` folder specified by the
``MEDIA_ROOT`` setting, as well as the database, the broker, and the result
storage. One setting that needs to be changed in this configuration is the
lock manager backend.

Resource locking is a technique that prevents two processes or tasks from
modifying the same resource at the same time, which would cause a race
condition. Mayan EDMS uses its own lock manager. By default, the lock manager
uses a simple file-based lock backend, ideal for single-host installations.
For multiple-host installations, the database backend must be used in order
to coordinate the resource locks between the hosts over a shared data medium.
This is done by setting the ``LOCK_MANAGER_BACKEND`` environment variable, in
both direct deployments and the Docker image. Use the value
``lock_manager.backends.model_lock.ModelLock`` to switch to the database
resource lock backend. You can also write your own lock manager backend for
other data-sharing mediums that perform better than a relational database,
like Redis, Memcached, or ZooKeeper.
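
To illustrate why a file-based lock only suits single-host installations, the
shell sketch below serializes a read-modify-write on a counter file with
``flock`` from util-linux. This is not the Mayan EDMS lock manager, only the
same idea in miniature: the lock works because every process opens the same
lock file on the same host, and a process on another host that cannot see
that file is not coordinated by it. ``with_file_lock`` and
``increment_counter`` are hypothetical helpers.

.. code-block:: bash

    #!/usr/bin/env bash
    # Sketch: run a command while holding an exclusive lock on a file,
    # using flock(1) from util-linux. Illustrative only; not Mayan EDMS code.
    with_file_lock() {
        local lock_file="$1"
        shift
        (
            # Wait up to 10 seconds for the lock on file descriptor 9.
            flock --exclusive --timeout 10 9 || exit 1
            "$@"
        ) 9>"$lock_file"
    }

    # A read-modify-write that can lose updates if run concurrently
    # without the lock.
    increment_counter() {
        local counter_file="$1" value
        value=$(cat "$counter_file")
        echo $(( value + 1 )) > "$counter_file"
    }

Running 20 ``with_file_lock /tmp/example.lock increment_counter /tmp/counter``
jobs in parallel leaves the counter at exactly 20, because each job waits for
the previous one to release the lock.
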