documentation: Add Docker installation method using a dedicated Docker network. Add scaling up chapter. Add S3 storage configuration section.
Signed-off-by: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
@@ -1,4 +1,4 @@
-3.1.7 (2018-10-XX)
+3.1.7 (2018-10-14)
 ==================
 * Fix an issue with some browsers not firing the .load event on cached
   images. Ref: http://api.jquery.com/load-event/
@@ -18,6 +18,11 @@
 * Add a noop OCR backend that disables OCR and the check for the
   Tesseract OCR binaries. Set the OCR_BACKEND setting or MAYAN_OCR_BACKEND
   environment variable to ocr.backends.pyocr.PyOCR to use this.
+* All tests pass on Python 3.
+* documentation: Add Docker installation method using a dedicated
+  Docker network.
+* documentation: Add scaling up chapter.
+* documentation: Add S3 storage configuration section.
 
 3.1.6 (2018-10-09)
 ==================
@@ -43,6 +43,7 @@ repository for electronic documents.
 
 Docker image <topics/docker>
 Direct deployments <topics/deploying>
+Scaling up <topics/scaling_up>
 
 Development <topics/development>
 App creation <topics/app_creation>
@@ -2,7 +2,7 @@
 Mayan EDMS v3.1.7 release notes
 ===============================
 
-Released: October XX, 2018
+Released: October 14, 2018
 
 Changes
 ~~~~~~~
@@ -24,7 +24,7 @@ Changes
 * Add a noop OCR backend that disables OCR and the check for the
   Tesseract OCR binaries. Set the OCR_BACKEND setting or MAYAN_OCR_BACKEND
   environment variable to ocr.backends.pyocr.PyOCR to use this.
+* All tests pass on Python 3.
 
 Removals
 --------
@@ -71,6 +71,47 @@ If another web server is running on port 80 use a different port in the
 ``-p`` option. For example: ``-p 81:8000``.
 
 
+Using a dedicated Docker network
+--------------------------------
+Use this method to avoid having to expose the PostgreSQL port to the host's
+network, or if you have other PostgreSQL instances but still want to use the
+default port of 5432 for this installation.
+
+Create the network::
+
+    docker network create mayan
+
+Launch the PostgreSQL container with the network option and remove the port
+binding (``-p 5432:5432``)::
+
+    docker run -d \
+    --name mayan-edms-postgres \
+    --network=mayan \
+    --restart=always \
+    -e POSTGRES_USER=mayan \
+    -e POSTGRES_DB=mayan \
+    -e POSTGRES_PASSWORD=mayanuserpass \
+    -v /docker-volumes/mayan-edms/postgres:/var/lib/postgresql/data \
+    postgres:9.5
+
+Launch the Mayan EDMS container with the network option and change the
+database hostname to the PostgreSQL container name (``mayan-edms-postgres``)
+instead of the IP address of the Docker host (``172.17.0.1``)::
+
+    docker run -d \
+    --name mayan-edms \
+    --network=mayan \
+    --restart=always \
+    -p 80:8000 \
+    -e MAYAN_DATABASE_ENGINE=django.db.backends.postgresql \
+    -e MAYAN_DATABASE_HOST=mayan-edms-postgres \
+    -e MAYAN_DATABASE_NAME=mayan \
+    -e MAYAN_DATABASE_PASSWORD=mayanuserpass \
+    -e MAYAN_DATABASE_USER=mayan \
+    -e MAYAN_DATABASE_CONN_MAX_AGE=60 \
+    -v /docker-volumes/mayan-edms/media:/var/lib/mayan \
+    mayanedms/mayanedms:<version>
+
 Stopping and starting the container
 --------------------------------------
 
docs/topics/scaling_up.rst (new file, 195 lines)
@@ -0,0 +1,195 @@
+.. _scaling_up:
+
+
+==========
+Scaling up
+==========
+
+The default installation method fits most use cases. If your use case
+requires more speed or capacity, here are some suggestions that can help you
+improve the performance of your installation.
+
+Change the database manager
+===========================
+Use PostgreSQL or MySQL as the database manager. Tweak the memory settings
+of the database manager to increase memory allocation. More
+PostgreSQL-specific examples are available in their wiki page:
+https://wiki.postgresql.org/wiki/Performance_Optimization
+
+Increase the number of Gunicorn workers
+=======================================
+The Gunicorn workers process HTTP requests and affect the speed at which the
+website responds.
+
+If you are using the Docker image, change the value of the
+MAYAN_GUNICORN_WORKERS environment variable
+(https://docs.mayan-edms.com/topics/docker.html#environment-variables).
+This variable normally defaults to 2. Increase this number to match the
+number of CPU cores + 1.
+
+If you are using the direct deployment methods, change the line that reads::
+
+    command = /opt/mayan-edms/bin/gunicorn -w 2 mayan.wsgi --max-requests 500 --max-requests-jitter 50 --worker-class gevent --bind 0.0.0.0:8000 --timeout 120
+
+and increase the value of the ``-w 2`` argument. This line is found in the
+``[program:mayan-gunicorn]`` section of the supervisor configuration file.
+
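The "CPU cores + 1" heuristic above can be sketched in Python; the helper name is illustrative, not a Mayan EDMS API:

```python
# Sketch of the "CPU cores + 1" Gunicorn worker heuristic described above.
# recommended_workers() is an illustrative helper, not a Mayan EDMS API.
import multiprocessing


def recommended_workers(cpu_cores=None):
    """Suggested Gunicorn worker count: number of CPU cores + 1."""
    if cpu_cores is None:
        cpu_cores = multiprocessing.cpu_count()
    return cpu_cores + 1


# A 4-core host would get 5 workers:
print(recommended_workers(4))  # → 5
```

The resulting number is what you would place in the MAYAN_GUNICORN_WORKERS variable or the ``-w`` argument.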
+
+Background task processing
+==========================
+The Celery workers are system processes that take care of the background
+tasks requested by frontend interactions, like document image rendering, and
+of periodic tasks, like OCR. There are several dozen tasks defined in the
+code. These tasks are divided into queues based on the app or on the
+relationship between the tasks. By default the queues are divided into three
+groups based on the speed at which they need to be processed. Document page
+image rendering, for example, is categorized as a high volume, short
+duration task. OCR is a high volume, long duration task. Email checking is a
+low volume, medium duration task. It is not advisable to have the same
+worker that processes OCR also process image rendering: if the worker is
+busy with several OCR tasks it will not be able to provide images quickly
+while a user is browsing the user interface. This is why by default the
+queues are split among 3 workers: fast, medium, and slow.
+
+The fast worker handles the queues:
+
+* converter: Handles document page rendering
+* sources_fast: Does staging file image rendering
+
+The medium worker handles the queues:
+
+* checkouts_periodic: Scheduled tasks that check if a document's checkout
+  period has expired
+* documents_periodic:
+* indexing: Does reindexing of documents in the background when their
+  properties change
+* metadata:
+* sources:
+* sources_periodic: Checking email accounts and watch folders for new
+  documents.
+* uploads: Processes files to turn them into Mayan documents. Processing
+  encompasses MIME type detection and page count detection.
+* documents:
+
+The slow worker handles the queues:
+
+* mailing: Does the actual sending of documents via email as requested by
+  users via the mailing profiles
+* tools: Executes in the background the maintenance requests from the
+  options in the tools menu
+* statistics: Recalculates statistics and charts
+* parsing: Parses documents to extract actual text content
+* ocr: Performs OCR to transcribe page images to text
+
+Optimizations
+-------------
+
+* Increase the number of workers and redistribute the queues among them
+  (only possible with direct deployments).
+* Launch more workers to service a queue. For example, for faster document
+  image generation, launch 2 workers to process only the converter queue
+  (only possible with direct deployments).
+* By default each worker process uses 1 thread. You can increase the thread
+  count of each worker process with the Docker environment options:
+
+  * MAYAN_WORKER_FAST_CONCURRENCY
+  * MAYAN_WORKER_MEDIUM_CONCURRENCY
+  * MAYAN_WORKER_SLOW_CONCURRENCY
+
+* If using a direct deployment, increase the value of the ``--concurrency=1``
+  argument of each worker in the supervisor file. You can also remove this
+  argument and let the Celery algorithm choose the number of threads to
+  launch. Usually this defaults to the number of CPU cores + 1.
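Redistributing queues among dedicated workers can be sketched as below. The ``celery worker -Q`` invocation is the standard Celery command line, but the worker names, the queue split, and the ``-A mayan`` application module are illustrative assumptions, not Mayan defaults:

```python
# Illustrative sketch of redistributing queues among dedicated workers.
# Worker names and the queue split here are assumptions for the example.
WORKER_QUEUES = {
    "worker-converter": ["converter"],  # dedicated to fast image rendering
    "worker-slow": ["mailing", "tools", "statistics", "parsing", "ocr"],
}


def worker_command(name, queues, concurrency=1):
    """Build a Celery worker command line servicing the given queues."""
    return (
        "celery worker -A mayan -n {name} -Q {queues} "
        "--concurrency={concurrency}".format(
            name=name, queues=",".join(queues), concurrency=concurrency
        )
    )


for name, queues in sorted(WORKER_QUEUES.items()):
    print(worker_command(name, queues))
```

Each generated command would go in its own ``[program:...]`` section of the supervisor configuration file.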
+
+Change the message broker
+=========================
+Messages are the method of communication between the frontend interactive
+code and the background tasks. In this regard, messages can be thought of as
+analogous to task requests. Improving how many messages can be sent, stored,
+and sorted will impact the number of tasks the system can handle. To save
+memory, the basic deployment method and the Docker image default to using
+Redis as the message broker. To increase capacity and reduce the volatility
+of messages (pending tasks are not lost during shutdown), use RabbitMQ to
+shuffle messages.
+
+For direct installs, refer to the advanced deployment method
+(https://docs.mayan-edms.com/topics/deploying.html#advanced-deployment) for
+the required changes.
+
+For the Docker image, launch a separate RabbitMQ container
+(https://hub.docker.com/_/rabbitmq/)::
+
+    docker run -d --name mayan-edms-rabbitmq -e RABBITMQ_DEFAULT_USER=mayan -e RABBITMQ_DEFAULT_PASS=mayanrabbitmqpassword -e RABBITMQ_DEFAULT_VHOST=mayan rabbitmq:3
+
+Pass the MAYAN_BROKER_URL environment variable
+(https://kombu.readthedocs.io/en/latest/userguide/connections.html#connection-urls)
+to the Mayan EDMS container so that it uses the RabbitMQ container as the
+message broker::
+
+    -e MAYAN_BROKER_URL="amqp://mayan:mayanrabbitmqpassword@localhost:5672/mayan"
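The broker URL follows the kombu connection URL format, ``transport://user:password@hostname:port/virtual_host``. A small sketch to assemble it (the helper is illustrative):

```python
# Sketch: assembling a MAYAN_BROKER_URL value from its parts, following
# the kombu connection URL format. broker_url() is an illustrative helper.
def broker_url(user, password, host, vhost, port=5672, transport="amqp"):
    """Build a transport://user:password@host:port/vhost connection URL."""
    return "{t}://{u}:{p}@{h}:{port}/{v}".format(
        t=transport, u=user, p=password, h=host, port=port, v=vhost
    )


print(broker_url("mayan", "mayanrabbitmqpassword", "localhost", "mayan"))
# → amqp://mayan:mayanrabbitmqpassword@localhost:5672/mayan
```

The user, password, and virtual host must match the RABBITMQ_DEFAULT_* values given to the RabbitMQ container.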
+
+When tasks finish they leave behind a return status or the result of a
+calculation. These are stored for a while so that whoever requested the
+background task is able to retrieve the result. These results are stored in
+the result storage. By default a Redis server is launched inside the Mayan
+EDMS container. You can launch a separate Redis Docker container and tell
+the Mayan EDMS container to use it via the MAYAN_CELERY_RESULT_BACKEND
+environment variable. The format of this variable is explained here:
+http://docs.celeryproject.org/en/3.1/configuration.html#celery-result-backend
+
+Deployment type
+===============
+Docker provides a faster deployment and the overhead is not high on modern
+systems. It is however memory and CPU limited by default and you need to
+increase these limits. The settings to change the container resource limits
+are here:
+https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory
+
+For the best performance possible, use the advanced deployment method on a
+host dedicated to serving only Mayan EDMS.
+
+Storage
+=======
+Mayan EDMS stores documents in their original file format, changing only the
+filename to avoid collisions. For the best input and output speed, use a
+block based local filesystem for the ``/media`` sub folder of the path
+specified by the MEDIA_ROOT setting. For increased storage capacity, use an
+object storage filesystem like S3.
+
+To use an S3 compatible object storage, do the following:
+
+* Install the Python packages ``django-storages`` and ``boto3``:
+
+  * Using Python::
+
+        pip install django-storages boto3
+
+  * Using Docker::
+
+        -e MAYAN_PIP_INSTALLS='django-storages boto3'
+
+In the Mayan EDMS user interface, go to ``System``, ``Setup``, ``Settings``,
+``Documents`` and change the following settings:
+
+* ``DOCUMENTS_STORAGE_BACKEND`` to ``storages.backends.s3boto3.S3Boto3Storage``
+* ``DOCUMENTS_STORAGE_BACKEND_ARGUMENTS`` to ``'{access_key: <your access key>, secret_key: <your secret key>, bucket_name: <bucket name>}'``
+
+Restart Mayan EDMS for the changes to take effect.
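The DOCUMENTS_STORAGE_BACKEND_ARGUMENTS value is a small YAML-style mapping. A sketch to render it, where the helper name and the credential values are placeholders, not real keys or a Mayan API:

```python
# Sketch: building the DOCUMENTS_STORAGE_BACKEND_ARGUMENTS string.
# storage_arguments() and the placeholder credentials are illustrative.
def storage_arguments(access_key, secret_key, bucket_name):
    """Render the YAML-style mapping expected by the setting."""
    return (
        "{{access_key: {access_key}, secret_key: {secret_key}, "
        "bucket_name: {bucket_name}}}".format(
            access_key=access_key,
            secret_key=secret_key,
            bucket_name=bucket_name,
        )
    )


print(storage_arguments("<your access key>", "<your secret key>", "my-bucket"))
```

The rendered string is what gets pasted into the setting field in the user interface.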
+
+Use additional hosts
+====================
+When one host is not enough, you can use multiple hosts and share the load.
+Make sure that all hosts share the ``/media`` folder as specified by the
+MEDIA_ROOT setting, as well as the database, the broker, and the result
+storage. One setting that needs to be changed in this configuration is the
+lock manager backend.
+
+Resource locking is a technique to prevent two processes or tasks from
+modifying the same resource at the same time, causing a race condition.
+Mayan EDMS uses its own lock manager. By default the lock manager will use a
+simple file based lock backend, ideal for single host installations. For
+multiple host installations the database backend must be used in order to
+coordinate the resource locks between the different hosts over a shared
+data medium. This is accomplished by modifying the LOCK_MANAGER_BACKEND
+setting or environment variable in both the direct deployment and the
+Docker image. Use the value "lock_manager.backends.model_lock.ModelLock" to
+switch to the database resource lock backend. You can also write your own
+lock manager backend for other data sharing mediums with better performance
+than a relational database, like Redis, Memcached, or ZooKeeper.