From eff399612d0094cd0ead0f5e5a2b1e93e773965d Mon Sep 17 00:00:00 2001
From: Roberto Rosario <roberto.rosario.gonzalez@gmail.com>
Date: Wed, 1 Feb 2012 14:41:45 -0400
Subject: [PATCH] Documentation updates

---
 docs/faq/index.rst                     | 141 +++++++++++++++++++++++--
 docs/index.rst                         |  31 ++++--
 docs/topics/document_visualization.rst |  21 ++++
 docs/topics/file_storage.rst           |  25 +++++
 docs/topics/indexes.rst                |  12 +++
 docs/topics/ocr.rst                    |  19 ++++
 docs/topics/smart_links.rst            |   9 ++
 7 files changed, 238 insertions(+), 20 deletions(-)
 create mode 100644 docs/topics/document_visualization.rst
 create mode 100644 docs/topics/file_storage.rst
 create mode 100644 docs/topics/indexes.rst
 create mode 100644 docs/topics/ocr.rst
 create mode 100644 docs/topics/smart_links.rst

diff --git a/docs/faq/index.rst b/docs/faq/index.rst
index 2f836bcbdd..c92c72cc02 100644
--- a/docs/faq/index.rst
+++ b/docs/faq/index.rst
@@ -7,7 +7,19 @@ Frequently asked questions and solutions
 Database related
 ----------------
 
-Q: _mysql_exceptions.OperationalError: (1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
+Q: PostgreSQL vs. MySQL
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Since Django abstracts database operations from a functional point of view,
+**Mayan EDMS** will behave exactly the same with either database.  The only
+concern is that MySQL doesn't support transactions for schema-modifying
+commands.  This can only cause problems when running South migrations
+during upgrades: if a migration fails, the database structure is left in
+a transitory state and has to be reverted manually before trying the
+migration again.
+
+
+Q: _mysql_exceptions.OperationalError: (1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 * Solution::
@@ -66,8 +78,11 @@ Q: File system links not showing when serving content with ``Samba``
   - Ref: 1- http://www.samba.org/samba/docs/man/manpages-3/smb.conf.5.html
 
 
+Document handling
+-----------------
+
 How to store documents outside of **Mayan EDMS's** path
--------------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 * Sub class Django's ``FileSystemStorage`` class:
     
@@ -87,8 +102,8 @@ How to store documents outside of **Mayan EDMS's** path
       DOCUMENTS_STORAGE_BACKEND = CustomStorage
 
 
-How to enable the ``GridFS`` storage backend
---------------------------------------------
+Q: How to enable the ``GridFS`` storage backend
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 * Solution:
    
@@ -99,9 +114,19 @@ How to enable the ``GridFS`` storage backend
   - Filesystem metadata indexing will not work with this storage backend as
     the files are inside a ``MongoDB`` database and can't be linked (at least for now)
 
+Q: How do you upload a new version of an existing file? 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Site search is slow
--------------------
+* Solution:
+
+  - Choose a document and go to its versions tab.  In the right-hand menu,
+    under ``Other available action`` at the bottom, there is an
+    ``Upload new version`` link.  Clicking it opens a view very similar to
+    ``Upload new document``, but it also lets you specify the version
+    number and comments for the new version being uploaded.
+
+Q: Site search is slow
+~~~~~~~~~~~~~~~~~~~~~~
 
 * Add indexes to the following fields:
   
@@ -109,8 +134,12 @@ Site search is slow
   - ``documents_documentpage`` - content, recommended size: 3000
 
 
-How to enable x-sendile support for ``Apache``
-----------------------------------------------
+Webserver
+---------
+
+Q: How to enable x-sendile support for ``Apache``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 * If using Ubuntu execute the following::
  
   $ sudo apt-get install libapache2-mod-xsendfile
@@ -125,11 +154,103 @@ How to enable x-sendile support for ``Apache``
     XSendFileAllowAbove on
       
 
-The included version of ``unoconv`` in my distribution is too old
--------------------------------------------------------------
+OCR
+---
+
+Q: The included version of ``unoconv`` in my distribution is too old
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       
 * Only the file 'unoconv' file from https://github.com/dagwieers/unoconv is needed.  
   Put it in a user designated directory for binaries such as /usr/local/bin and 
   setup Mayan's configuration option in your settings_local.py file like this::
     
     CONVERTER_UNOCONV_PATH = '/usr/local/bin/unoconv'
+    
+    
+Deployments
+-----------
+
+Q: Is virtualenv required as specified in the documentation?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* It is not required; it is a strong recommendation, mainly to reduce
+  dependency conflicts by isolating Mayan from the system-wide Python install.
+  Without a virtualenv, pip installs Mayan's dependencies globally, where
+  they can conflict with the distribution's prepackaged Python libraries
+  and break other Django projects or Python programs.  Likewise, the
+  dependencies of a Python/Django project installed later could conflict
+  with Mayan's and cause it to stop working for no apparent reason.
+  
+  
+Q: Mayan EDMS installed correctly and works, but static files are not served
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Django's development server doesn't serve static files unless the ``DEBUG``
+option is set to ``True``; this mode of operation should only be used for
+development or testing.  For production deployments, the management command::
+
+  $ ./manage.py collectstatic
+
+should be used, and the resulting ``static`` folder served from a webserver.
+For more information, read https://docs.djangoproject.com/en/dev/howto/static-files/
+and https://docs.djangoproject.com/en/1.2/howto/static-files/, or
+http://mayan-edms-ru.blogspot.com/2011/11/blog-post_09.html
+
+  
+Other
+-----
+
+
+Q: How to connect Mayan EDMS to an Active Directory tree
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+I used these two libraries because, from the quick search I did, they seemed to be the best maintained:
+
+* http://www.python-ldap.org/
+* http://packages.python.org/django-auth-ldap/
+
+Next, figure out the corresponding OU, CN and so on (this took me quite a while, since I'm not well versed in LDAP).  For configuration, Mayan EDMS imports ``settings_local.py`` after importing ``settings.py``, allowing users to override the defaults without modifying any file tracked by Git; this makes upgrading with Git's ``pull`` command extremely easy.  My ``settings_local.py`` file is as follows:
+
+::
+
+    import ldap
+    from django_auth_ldap.config import LDAPSearch
+
+    # makes sure this works in Active Directory
+    ldap.set_option(ldap.OPT_REFERRALS, 0)
+
+    AUTH_LDAP_SERVER_URI = "ldap://172.16.XX.XX:389"
+    AUTH_LDAP_BIND_DN = 'cn=Roberto Rosario Gonzalez,ou=Aguadilla,ou=XX,ou=XX,dc=XX,dc=XX,dc=XX'
+    AUTH_LDAP_BIND_PASSWORD = 'XXXXXXXXXXXXXX'
+    AUTH_LDAP_USER_SEARCH = LDAPSearch('dc=XX,dc=XX,dc=XX', ldap.SCOPE_SUBTREE, '(SAMAccountName=%(user)s)')
+
+    # Populate the Django user from the LDAP directory.
+    AUTH_LDAP_USER_ATTR_MAP = {
+        "first_name": "givenName",
+        "last_name": "sn",
+        "email": "mail"
+    }
+
+    # This is the default, but I like to be explicit.
+    AUTH_LDAP_ALWAYS_UPDATE_USER = True
+
+    AUTHENTICATION_BACKENDS = (
+        'django_auth_ldap.backend.LDAPBackend',
+        'django.contrib.auth.backends.ModelBackend',
+    )
+
+
+
+If your organization's policies don't allow anonymous directory queries,
+create a dummy account and set the ``AUTH_LDAP_BIND_DN`` and
+``AUTH_LDAP_BIND_PASSWORD`` options to match that account.
+
+For a more advanced example, see this Stack Overflow question:
+http://stackoverflow.com/questions/6493985/django-auth-ldap
+
+
+Q: Can you change the display order of documents, i.e., can they be in alphabetical order?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the moment, no, but it is something being considered.
+
diff --git a/docs/index.rst b/docs/index.rst
index 3e90a4a9ae..2d59c73f8d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -5,20 +5,26 @@
 Mayan EDMS documentation
 ========================
 
-.. rubric:: Open source, Django_ based document manager with custom metadata indexing, file serving integration, OCR_ capabilities, document versioning and digital signature verification.
+.. rubric:: `Open source`_, Django_ based document manager with custom
+            metadata_ indexing_, file serving integration, OCR_ capabilities,
+            document versioning_ and `digital signature verification`_.
 
 .. _Django: http://www.djangoproject.com/
+.. _OCR: https://secure.wikimedia.org/wikipedia/en/wiki/Optical_character_recognition
+.. _digital signature verification: http://en.wikipedia.org/wiki/Digital_signature
+.. _versioning: http://en.wikipedia.org/wiki/Versioning
+.. _metadata: http://en.wikipedia.org/wiki/Metadata
+.. _indexing: http://en.wikipedia.org/wiki/Index_card
+.. _Open source: http://en.wikipedia.org/wiki/Open_source
 
-
-=================
 Links of interest
 =================
 
-:Website: http://www.mayan-edms.com
-:Source:  http://github.com/rosarior/mayan
-:Video:  http://bit.ly/pADNXv
-:Issue tracker:  http://github.com/rosarior/mayan/issues
-:Mailing list:  http://groups.google.com/group/mayan-edms 
+* Website: http://www.mayan-edms.com
+* Source: http://github.com/rosarior/mayan
+* Video: http://bit.ly/pADNXv
+* Issue tracker: http://github.com/rosarior/mayan/issues
+* Mailing list: http://groups.google.com/group/mayan-edms 
 
 
 First steps
@@ -33,7 +39,12 @@ First steps
 Understanding Mayan EDMS
 ========================
 
-  :doc:`Transformations <topics/transformations>`
+  :doc:`Transformations <topics/transformations>` |
+  :doc:`Indexes <topics/indexes>` |
+  :doc:`Smart links <topics/smart_links>` |
+  :doc:`Document visualization <topics/document_visualization>` |
+  :doc:`OCR <topics/ocr>` |
+  :doc:`File storage <topics/file_storage>`
 
 
 Between versions
@@ -61,7 +72,7 @@ Credits
 
   :doc:`Contributors <topics/contributors>` |
   :doc:`Software used <topics/software_used>` |
-  :doc:`Lincensing <license>`
+  :doc:`Licensing <license>`
     
 
 Getting help
diff --git a/docs/topics/document_visualization.rst b/docs/topics/document_visualization.rst
new file mode 100644
index 0000000000..9474f353df
--- /dev/null
+++ b/docs/topics/document_visualization.rst
@@ -0,0 +1,21 @@
+======================
+Document visualization
+======================
+
+
+Mayan EDMS tries to spare users from having to download a document and
+leave Mayan EDMS just to see it, in essence making Mayan EDMS a
+visualization tool too.  The conversion backend is a stack of functions.
+First the mimetype is evaluated; if the file is an office document, it is
+passed via unoconv to LibreOffice, running in headless mode (and managed
+by supervisor), for conversion to PDF.  The PDF is stored in a temporary
+cache alongside all the other files that were not office documents.
+From there the files are inspected to determine their page count, and
+the corresponding blank database entries are created.  After the
+database update, they all go to the conversion driver specified by the
+user (``python``, ``graphicsmagick`` or ``imagemagick``) and a high
+resolution master preview of each file is generated and stored in the
+persistent cache.  From the master previews in the persistent cache,
+volatile previews are then created on demand for the different sizes
+requested (thumbnail, page preview, full preview) and for interactive
+rotation in the details view.
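+
+The flow above can be modelled, very roughly, as a chain of stages.  The
+following Python 3 sketch is hypothetical and is not Mayan EDMS's actual
+converter code: the ``OFFICE_MIMETYPES`` set, the stage names and the
+pixel widths in ``PREVIEW_SIZES`` are assumptions made for illustration,
+and ``functools.lru_cache`` merely stands in for the volatile preview
+cache::
+
+    from functools import lru_cache
+
+    # Assumption: a few office mimetypes that would be routed through unoconv.
+    OFFICE_MIMETYPES = frozenset([
+        'application/msword',
+        'application/vnd.ms-excel',
+        'application/vnd.oasis.opendocument.text',
+    ])
+
+    # The conversion drivers named in this document.
+    CONVERSION_DRIVERS = ('python', 'graphicsmagick', 'imagemagick')
+
+    # Assumption: illustrative widths (in pixels) of the volatile previews.
+    PREVIEW_SIZES = {'thumbnail': 150, 'page_preview': 640, 'full_preview': 1600}
+
+
+    def master_preview_stages(mimetype, driver):
+        """Return the ordered stages a file passes through before its master
+        preview lands in the persistent cache (hypothetical model)."""
+        if driver not in CONVERSION_DRIVERS:
+            raise ValueError('Unknown conversion driver: %s' % driver)
+        stages = []
+        if mimetype in OFFICE_MIMETYPES:
+            # Office documents go to headless LibreOffice via unoconv -> PDF.
+            stages.append('unoconv')
+        # From here on every file follows the same path.
+        stages.extend(['temporary_cache', 'page_count', driver, 'persistent_cache'])
+        return stages
+
+
+    @lru_cache(maxsize=512)
+    def volatile_preview(document_id, page, size, rotation=0):
+        """Stand-in for deriving a preview from the master preview on demand.
+
+        Results are memoized, mimicking a cache that can be discarded and
+        rebuilt from the master previews at any time.
+        """
+        return (document_id, page, PREVIEW_SIZES[size], rotation % 360)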
diff --git a/docs/topics/file_storage.rst b/docs/topics/file_storage.rst
new file mode 100644
index 0000000000..054df85465
--- /dev/null
+++ b/docs/topics/file_storage.rst
@@ -0,0 +1,25 @@
+============
+File storage
+============
+
+Files are placed under Mayan EDMS "control" to avoid filename clashes:
+each file is renamed to its UUID plus an extension and stored in a
+simple flat arrangement in a single directory.  Nothing prevents direct
+access to these files, but it is not recommended, because moving,
+renaming or updating them directly would throw the database out of
+sync.  The recommended way to access the files is to create an index,
+which builds a directory-tree-like structure in the database, and then
+to turn on the index filesystem mirror option, which creates an actual
+directory tree with links to the stored files, named after the
+filenames of the documents as stored in the database.  This filesystem
+mirror of the index can then be shared across the network with Samba.
+This access is read-only; new versions of the files have to be
+uploaded from the web GUI using the new document versioning support.
+
+Mayan EDMS's components are as decoupled from each other as possible,
+and storage is especially so: its behavior is controlled not by the
+project but by the storage class in use.  Why this design?  Because none
+of the other parts make any assumptions about the actual file storage,
+Mayan EDMS can save files locally, over the network or even across the
+internet and still operate exactly the same.
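+
+The following hypothetical Python sketch illustrates the two ideas above:
+the flat, UUID-named store and the filesystem mirror of an index.  It is
+neither Django's ``Storage`` API nor Mayan EDMS's code; the class and
+function names are invented for illustration, and the mirror assumes a
+platform where ``os.symlink`` is available::
+
+    import os
+    import uuid
+
+
+    class FlatUUIDStorage(object):
+        """Hypothetical storage: files are saved flat, named <UUID><extension>."""
+
+        def __init__(self, location):
+            self.location = location
+            if not os.path.isdir(location):
+                os.makedirs(location)
+
+        def save(self, original_name, content):
+            # Keep only the extension; the UUID avoids filename clashes.
+            extension = os.path.splitext(original_name)[1]
+            stored_name = '%s%s' % (uuid.uuid4(), extension)
+            with open(self.path(stored_name), 'wb') as destination:
+                destination.write(content)
+            return stored_name
+
+        def path(self, stored_name):
+            return os.path.join(self.location, stored_name)
+
+
+    def mirror_index(storage, index, mirror_root):
+        """Build a directory tree of symlinks pointing at the stored files.
+
+        ``index`` maps a tuple of directory names to a list of
+        (original filename, stored name) pairs.
+        """
+        for branch, documents in index.items():
+            directory = os.path.join(mirror_root, *branch)
+            if not os.path.isdir(directory):
+                os.makedirs(directory)
+            for original_name, stored_name in documents:
+                link = os.path.join(directory, original_name)
+                if not os.path.lexists(link):
+                    # The link carries the document's filename, not the UUID.
+                    os.symlink(storage.path(stored_name), link)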
diff --git a/docs/topics/indexes.rst b/docs/topics/indexes.rst
new file mode 100644
index 0000000000..ed4ab46e51
--- /dev/null
+++ b/docs/topics/indexes.rst
@@ -0,0 +1,12 @@
+=======
+Indexes
+=======
+
+Administrators first define the template of an index; an instance of the
+index is then auto-populated with links to the documents, according to the
+rules of each branch of the index evaluated against the metadata of the
+documents.  The index cannot be edited manually; only changing the rules
+or the metadata of the documents causes the index to be regenerated.
+For manual organization of documents there are folders; their structure,
+however, is flat, and they have to be manually updated and curated.
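+
+As a rough illustration, an index template can be thought of as a list of
+rules, one per level of the tree, each mapping a document's metadata to a
+branch name.  In this hypothetical Python sketch (not Mayan EDMS's actual
+rule syntax) the metadata names ``client`` and ``year`` are invented, and
+the index is rebuilt from scratch on every call, reflecting the fact that
+it is regenerated rather than edited::
+
+    from collections import defaultdict
+
+
+    def client_rule(metadata):
+        """First level: one branch per client (None when the value is missing)."""
+        return metadata.get('client')
+
+
+    def year_rule(metadata):
+        """Second level: one branch per year."""
+        year = metadata.get('year')
+        return None if year is None else str(year)
+
+
+    # Hypothetical index template: one rule per level of the tree.
+    INDEX_TEMPLATE = [client_rule, year_rule]
+
+
+    def build_index(template, documents):
+        """Regenerate an index instance from the template and document metadata.
+
+        ``documents`` maps a document label to its metadata dictionary.  A
+        document is linked only if every rule in the template yields a value.
+        """
+        index = defaultdict(list)
+        for label, metadata in sorted(documents.items()):
+            branch = []
+            for rule in template:
+                value = rule(metadata)
+                if value is None:
+                    break
+                branch.append(value)
+            else:
+                index[tuple(branch)].append(label)
+        return dict(index)
+
+Changing a document's metadata and calling ``build_index`` again moves its
+link to the matching branch; there is no way to place a link by hand.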
diff --git a/docs/topics/ocr.rst b/docs/topics/ocr.rst
new file mode 100644
index 0000000000..ef17a2885c
--- /dev/null
+++ b/docs/topics/ocr.rst
@@ -0,0 +1,19 @@
+===
+OCR
+===
+
+Because OCR is an intensive operation, documents are queued for OCR and
+handled later; the number of documents processed in parallel is
+controlled by the ``OCR_NODE_CONCURRENT_EXECUTION`` configuration
+option.  Ideally, the machine serving **Mayan EDMS** should disable OCR
+processing by setting this option to 0, with other machines or cloud
+instances connected to the same database doing the OCR processing.
+
+Each document is first checked for an available text parser.  If no
+parser is available for that file type, the document is passed to
+tesseract page by page and the results are stored per page, to keep
+each page image in sync with its transcribed text.  When viewing the
+document in the details tab, however, the text of all pages is
+concatenated and shown to the user.  Setting the ``OCR_AUTOMATIC_OCR``
+option to ``True`` causes all newly uploaded documents to be queued
+automatically for OCR.
+ 
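+The queueing and per-page behavior described above can be sketched in
+Python 3 with a thread pool whose size plays the role of
+``OCR_NODE_CONCURRENT_EXECUTION``.  This is a hypothetical model, not
+Mayan EDMS's OCR code: ``TEXT_PARSERS`` and ``run_tesseract`` are
+placeholders, and the latter does not call tesseract at all::
+
+    from concurrent.futures import ThreadPoolExecutor
+
+    # Mirrors the OCR_NODE_CONCURRENT_EXECUTION option; 0 disables OCR here.
+    OCR_NODE_CONCURRENT_EXECUTION = 2
+
+    # Hypothetical text parsers keyed by mimetype; each returns one text per page.
+    TEXT_PARSERS = {
+        'text/plain': lambda pages: [page.strip() for page in pages],
+    }
+
+
+    def run_tesseract(page_image):
+        """Placeholder for running tesseract on a single page image."""
+        return 'OCR text of %s' % page_image
+
+
+    def process_document(mimetype, pages):
+        parser = TEXT_PARSERS.get(mimetype)
+        if parser is not None:
+            return parser(pages)
+        # No parser: OCR page by page, storing one result per page so each
+        # page image stays in sync with its transcribed text.
+        return [run_tesseract(page) for page in pages]
+
+
+    def process_queue(queue, concurrency=OCR_NODE_CONCURRENT_EXECUTION):
+        """Process queued documents; ``queue`` maps id -> (mimetype, pages)."""
+        if concurrency <= 0:
+            # This node leaves the queue to other nodes sharing the database.
+            return {}
+        with ThreadPoolExecutor(max_workers=concurrency) as executor:
+            futures = {
+                document_id: executor.submit(process_document, mimetype, pages)
+                for document_id, (mimetype, pages) in queue.items()
+            }
+            return {document_id: future.result()
+                    for document_id, future in futures.items()}
+
+
+    def details_view_text(page_texts):
+        """The details tab shows the text of all pages concatenated."""
+        return '\n'.join(page_texts)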
diff --git a/docs/topics/smart_links.rst b/docs/topics/smart_links.rst
new file mode 100644
index 0000000000..ea2559e965
--- /dev/null
+++ b/docs/topics/smart_links.rst
@@ -0,0 +1,9 @@
+===========
+Smart links
+===========
+
+Smart links are useful for navigating between documents.  They are
+rule based but don't create any organizational structure; they just show
+the documents that match the rules as evaluated against the metadata of
+the currently displayed document.  Indexes are global, whereas smart
+links depend on the document the user is currently viewing.
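+
+To make the contrast with indexes concrete, here is a hypothetical Python
+sketch in which a smart link is a rule evaluated against the metadata of
+the current document and of every other document.  The ``client``
+metadata name and the ``same_client`` rule are invented for
+illustration::
+
+    def same_client(current, candidate):
+        """Hypothetical rule: link documents that share the same client."""
+        client = current.get('client')
+        return client is not None and candidate.get('client') == client
+
+
+    def smart_link(rule, documents, current_label):
+        """Return the labels of documents matching ``rule`` for the current one.
+
+        Nothing is stored: the result is recomputed for whichever document
+        the user is viewing.
+        """
+        current = documents[current_label]
+        return sorted(
+            label for label, metadata in documents.items()
+            if label != current_label and rule(current, metadata)
+        )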