documentation: Add Docker installation method using a dedicated Docker network. Add scaling up chapter. Add S3 storage configuration section.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
@@ -71,6 +71,47 @@ If another web server is running on port 80 use a different port in the
``-p`` option. For example: ``-p 81:8000``.

Using a dedicated Docker network
--------------------------------
Use this method to avoid having to expose the PostgreSQL port to the host's
network, or if you have other PostgreSQL instances but still want to use the
default port of 5432 for this installation.

Create the network::

    docker network create mayan

Launch the PostgreSQL container with the network option and remove the port
binding (``-p 5432:5432``)::

    docker run -d \
    --name mayan-edms-postgres \
    --network=mayan \
    --restart=always \
    -e POSTGRES_USER=mayan \
    -e POSTGRES_DB=mayan \
    -e POSTGRES_PASSWORD=mayanuserpass \
    -v /docker-volumes/mayan-edms/postgres:/var/lib/postgresql/data \
    postgres:9.5

Launch the Mayan EDMS container with the network option and change the
database hostname to the PostgreSQL container name (``mayan-edms-postgres``)
instead of the IP address of the Docker host (``172.17.0.1``)::

    docker run -d \
    --name mayan-edms \
    --network=mayan \
    --restart=always \
    -p 80:8000 \
    -e MAYAN_DATABASE_ENGINE=django.db.backends.postgresql \
    -e MAYAN_DATABASE_HOST=mayan-edms-postgres \
    -e MAYAN_DATABASE_NAME=mayan \
    -e MAYAN_DATABASE_PASSWORD=mayanuserpass \
    -e MAYAN_DATABASE_USER=mayan \
    -e MAYAN_DATABASE_CONN_MAX_AGE=60 \
    -v /docker-volumes/mayan-edms/media:/var/lib/mayan \
    mayanedms/mayanedms:<version>
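
To verify that both containers are attached to the network and can resolve
each other by name, you can inspect it with::

    docker network inspect mayan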

Stopping and starting the container
-----------------------------------

docs/topics/scaling_up.rst (new file)
@@ -0,0 +1,195 @@
.. _scaling_up:


==========
Scaling up
==========

The default installation method fits most use cases. If your use case
requires more speed or capacity, here are some suggestions that can help you
improve the performance of your installation.

Change the database manager
===========================
Use PostgreSQL or MySQL as the database manager.
Tweak the memory settings of the database manager to increase memory
allocation. More PostgreSQL specific examples are available in their wiki
page: https://wiki.postgresql.org/wiki/Performance_Optimization
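
As an illustration, memory allocation is increased by editing
``postgresql.conf``. The values below are generic starting points for a host
with 4 GB of RAM (an assumption), not Mayan EDMS specific recommendations::

    # Illustrative values; adjust for your hardware and workload.
    shared_buffers = 1GB          # ~25% of RAM is a common rule of thumb
    effective_cache_size = 3GB    # estimate of memory available for caching
    work_mem = 16MB               # memory per sort or hash operation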

Increase the number of Gunicorn workers
=======================================
The Gunicorn workers process HTTP requests and affect the speed at which the
website responds.

If you are using the Docker image, change the value of the
MAYAN_GUNICORN_WORKERS (https://docs.mayan-edms.com/topics/docker.html#environment-variables)
environment variable. Normally this variable defaults to 2. Increase this
number to match the number of CPU cores + 1.
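
For example, on a host with 4 CPU cores (an assumed configuration) you would
add the following to the ``docker run`` command::

    -e MAYAN_GUNICORN_WORKERS=5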

If you are using a direct deployment method, change the line that reads::

    command = /opt/mayan-edms/bin/gunicorn -w 2 mayan.wsgi --max-requests 500 --max-requests-jitter 50 --worker-class gevent --bind 0.0.0.0:8000 --timeout 120

and increase the value of the ``-w 2`` argument. This line is found in the
``[program:mayan-gunicorn]`` section of the supervisor configuration file.

Background task processing
==========================
The Celery workers are system processes that take care of the background
tasks requested by frontend interactions, like document image rendering, and
periodic tasks, like OCR. There are several dozen tasks defined in the code.
These tasks are divided into queues based on their app or the relationship
between the tasks. By default the queues are divided into three groups based
on the speed at which they need to be processed. Document page image
rendering, for example, is categorized as a high volume, short duration task.
OCR is a high volume, long duration task. Email checking is a low volume,
medium duration task. It is not advisable to have the same worker that
processes OCR also process image rendering. If the worker is processing
several OCR tasks it will not be able to provide fast images when a user is
browsing the user interface. This is why by default the queues are split
among 3 workers: fast, medium, and slow.

The fast worker handles the queues:

* converter: Handles document page rendering
* sources_fast: Does staging file image rendering

The medium worker handles the queues:

* checkouts_periodic: Scheduled tasks that check if a document's checkout
  period has expired
* documents_periodic:
* indexing: Does reindexing of documents in the background when their
  properties change
* metadata:
* sources:
* sources_periodic: Checking email accounts and watch folders for new
  documents
* uploads: Processes files to turn them into Mayan documents. Processing
  encompasses MIME type detection and page count detection
* documents:

The slow worker handles the queues:

* mailing: Does the actual sending of documents via email as requested by
  users via the mailing profiles
* tools: Executes in the background maintenance requests from the options
  in the tools menu
* statistics: Recalculates statistics and charts
* parsing: Parses documents to extract actual text content
* ocr: Performs OCR to transcribe page images to text

Optimizations
-------------

* Increase the number of workers and redistribute the queues among them
  (only possible with direct deployments).
* Launch more workers to service a queue. For example, for faster document
  image generation, launch 2 workers to process the converter queue (only
  possible with direct deployments).
* By default each worker process uses 1 thread. You can increase the thread
  count of each worker process with the Docker environment options (see the
  example after this list):

  * MAYAN_WORKER_FAST_CONCURRENCY
  * MAYAN_WORKER_MEDIUM_CONCURRENCY
  * MAYAN_WORKER_SLOW_CONCURRENCY

* If using a direct deployment, increase the value of the ``--concurrency=1``
  argument of each worker in the supervisor file. You can also remove this
  argument and let the Celery algorithm choose the number of threads to
  launch. Usually this defaults to the number of CPU cores + 1.
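
For example, to give the fast worker 2 threads (an illustrative value) add
the following to the ``docker run`` command::

    -e MAYAN_WORKER_FAST_CONCURRENCY=2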

Change the message broker
=========================
Messages are the method of communication between frontend interactive code
and background tasks. In this regard, messages can be thought of as
homologous to task requests. Improving how many messages can be sent, stored
and sorted will impact the number of tasks the system can handle. To save on
memory, the basic deployment method and the Docker image default to using
Redis as the message broker. To increase capacity and reduce the volatility
of messages (pending tasks are not lost during shutdown) use RabbitMQ to
shuffle messages.

For direct installs refer to the Advanced deployment method
(https://docs.mayan-edms.com/topics/deploying.html#advanced-deployment) for
the required changes.

For the Docker image, launch a separate RabbitMQ container
(https://hub.docker.com/_/rabbitmq/) attached to the same Docker network as
the Mayan EDMS container::

    docker run -d \
    --name mayan-edms-rabbitmq \
    --network=mayan \
    -e RABBITMQ_DEFAULT_USER=mayan \
    -e RABBITMQ_DEFAULT_PASS=mayanrabbitmqpassword \
    -e RABBITMQ_DEFAULT_VHOST=mayan \
    rabbitmq:3

Pass the MAYAN_BROKER_URL environment variable
(https://kombu.readthedocs.io/en/latest/userguide/connections.html#connection-urls)
to the Mayan EDMS container so that it uses the RabbitMQ container as the
message broker::

    -e MAYAN_BROKER_URL="amqp://mayan:mayanrabbitmqpassword@mayan-edms-rabbitmq:5672/mayan"

When tasks finish, they leave behind a return status or the result of a
calculation. These are stored for a while so that whoever requested the
background task is able to retrieve the result. These results are stored in
the result storage. By default a Redis server is launched inside the Mayan
EDMS container. You can launch a separate Redis Docker container and tell the
Mayan EDMS container to use it via the MAYAN_CELERY_RESULT_BACKEND
environment variable. The format of this variable is explained here:
http://docs.celeryproject.org/en/3.1/configuration.html#celery-result-backend
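
A minimal sketch of this setup, reusing the dedicated ``mayan`` network from
above and an assumed container name of ``mayan-edms-redis``: launch the Redis
container, then pass the backend URL to the Mayan EDMS container::

    docker run -d --name mayan-edms-redis --network=mayan redis

    -e MAYAN_CELERY_RESULT_BACKEND="redis://mayan-edms-redis:6379/0"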

Deployment type
===============
Docker provides a faster deployment and the overhead is not high on modern
systems. It is, however, memory and CPU limited by default and you need to
increase these limits. The settings to change the container resource limits
are described here:
https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory

For the best performance possible, use the advanced deployment method on a
host dedicated to serving only Mayan EDMS.
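
For example, explicit limits (illustrative values) can be set by adding the
following flags to the ``docker run`` command::

    --memory=4g --cpus=2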

Storage
=======
Mayan EDMS stores documents in their original file format, only changing the
filename to avoid collisions. For the best input and output speed use a block
based local filesystem for the ``/media`` sub folder of the path specified by
the MEDIA_ROOT setting. For increased storage capacity use an object storage
filesystem like S3.

To use an S3 compatible object storage do the following:

* Install the Python packages ``django-storages`` and ``boto3``:

  * Using Python::

      pip install django-storages boto3

  * Using Docker::

      -e MAYAN_PIP_INSTALLS='django-storages boto3'

On the Mayan EDMS user interface, go to ``System``, ``Setup``, ``Settings``,
``Documents`` and change the following settings:

* ``DOCUMENTS_STORAGE_BACKEND`` to ``storages.backends.s3boto3.S3Boto3Storage``
* ``DOCUMENTS_STORAGE_BACKEND_ARGUMENTS`` to ``'{access_key: <your access key>, secret_key: <your secret key>, bucket_name: <bucket name>}'``.
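
For illustration, using AWS's published example credentials and a
hypothetical bucket name, the arguments setting would look like::

    '{access_key: AKIAIOSFODNN7EXAMPLE, secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY, bucket_name: mayan-documents}'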

Restart Mayan EDMS for the changes to take effect.

Use additional hosts
====================
When one host is not enough you can use multiple hosts and share the load.
Make sure that all hosts share the ``/media`` folder as specified by the
MEDIA_ROOT setting, as well as the database, the broker, and the result
storage. One setting that needs to be changed in this configuration is the
lock manager backend.

Resource locking is a technique to avoid two processes or tasks modifying
the same resource at the same time, causing a race condition. Mayan EDMS uses
its own lock manager. By default the lock manager will use a simple file
based lock backend, ideal for single host installations. For multiple host
installations the database backend must be used in order to coordinate the
resource locks between the different hosts over a shared data medium. This is
accomplished by modifying the LOCK_MANAGER_BACKEND environment variable in
both the direct deployment and the Docker image. Use the value
"lock_manager.backends.model_lock.ModelLock" to switch to the database
resource lock backend. You can also write your own lock manager backend for
other data sharing mediums with better performance than a relational
database, such as Redis, Memcached, or ZooKeeper.
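
For the Docker image, a sketch of this change, assuming the setting is passed
with the usual ``MAYAN_`` environment variable prefix::

    -e MAYAN_LOCK_MANAGER_BACKEND="lock_manager.backends.model_lock.ModelLock"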