From eff399612d0094cd0ead0f5e5a2b1e93e773965d Mon Sep 17 00:00:00 2001
From: Roberto Rosario
Date: Wed, 1 Feb 2012 14:41:45 -0400
Subject: [PATCH] Documentation updates

---
 docs/faq/index.rst                     | 141 +++++++++++++++++++++++--
 docs/index.rst                         |  31 ++++--
 docs/topics/document_visualization.rst |  21 ++++
 docs/topics/file_storage.rst           |  25 +++++
 docs/topics/indexes.rst                |  12 +++
 docs/topics/ocr.rst                    |  19 ++++
 docs/topics/smart_links.rst            |   9 ++
 7 files changed, 238 insertions(+), 20 deletions(-)
 create mode 100644 docs/topics/document_visualization.rst
 create mode 100644 docs/topics/file_storage.rst
 create mode 100644 docs/topics/indexes.rst
 create mode 100644 docs/topics/ocr.rst
 create mode 100644 docs/topics/smart_links.rst

diff --git a/docs/faq/index.rst b/docs/faq/index.rst
index 2f836bcbdd..c92c72cc02 100644
--- a/docs/faq/index.rst
+++ b/docs/faq/index.rst
@@ -7,7 +7,19 @@ Frequently asked questions and solutions
 Database related
 ----------------
 
-Q: _mysql_exceptions.OperationalError: (1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
+Q: PostgreSQL vs. MySQL
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Since Django abstracts database operations, from a functional point of view
+**Mayan EDMS** will behave exactly the same either way. The only concern
+would be that MySQL doesn't support transactions for schema-modifying
+commands. The only time this could cause problems is when running
+South migrations during upgrades: if a migration fails, the database
+structure is left in a transitory state and has to be reverted manually
+before trying again.
+
+
+Q: _mysql_exceptions.OperationalError: (1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 * Solution::
@@ -66,8 +78,11 @@ Q: File system links not showing when serving content with ``Samba``
   - Ref: 1- http://www.samba.org/samba/docs/man/manpages-3/smb.conf.5.html
 
 
+Document handling
+-----------------
+
 How to store documents outside of **Mayan EDMS's** path
--------------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 * Sub class Django's ``FileSystemStorage`` class:
 
@@ -87,8 +102,8 @@ How to store documents outside of **Mayan EDMS's** path
         DOCUMENTS_STORAGE_BACKEND = CustomStorage
 
 
-How to enable the ``GridFS`` storage backend
---------------------------------------------
+Q: How to enable the ``GridFS`` storage backend
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 * Solution:
 
@@ -99,9 +114,19 @@ How to enable the ``GridFS`` storage backend
   - Filesystem metadata indexing will not work with this storage backend as
     the files are inside a ``MongoDB`` database and can't be linked (at least for now)
 
+Q: How do you upload a new version of an existing file?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Site search is slow
--------------------
+* Solution:
+
+  - Choose a document and go to the versions tab. On the right menu, at
+    the bottom under ``Other available action``, there is
+    ``Upload new version``. Clicking it will take you to a view very
+    similar to ``Upload new document``, but you will be able to specify a
+    version number and comments for the new version being uploaded.
+
+Q: Site search is slow
+~~~~~~~~~~~~~~~~~~~~~~
 
 * Add indexes to the following fields:
 
@@ -109,8 +134,12 @@ Site search is slow
   - ``documents_documentpage`` - content, recommended size: 3000
 
 
-How to enable x-sendile support for ``Apache``
-----------------------------------------------
+Webserver
+---------
+
+Q: How to enable x-sendfile support for ``Apache``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 * If using Ubuntu execute the following::
 
     $ sudo apt-get install libapache2-mod-xsendfile
@@ -125,11 +154,103 @@ How to enable x-sendile support for ``Apache``
     XSendFileAllowAbove on
 
 
-The included version of ``unoconv`` in my distribution is too old
--------------------------------------------------------------
+OCR
+---
+
+Q: The included version of ``unoconv`` in my distribution is too old
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 * Only the file 'unoconv' file from https://github.com/dagwieers/unoconv is needed.
   Put it in a user designated directory for binaries such as /usr/local/bin and
   setup Mayan's configuration option in your settings_local.py file like this::
 
     CONVERTER_UNOCONV_PATH = '/usr/local/bin/unoconv'
+
+
+Deployments
+-----------
+
+Q: Is virtualenv required as specified in the documentation?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* It is not required, but it is strongly recommended, mainly to reduce
+  dependency conflicts by isolating Mayan from the system Python install.
+  Without a virtualenv, pip would install Mayan's dependencies globally,
+  where they can conflict with the distribution's prepackaged Python
+  libraries and break other Django projects or Python programs. Likewise,
+  the dependencies of a later Python/Django project can conflict with
+  Mayan's, causing Mayan to stop working for no apparent reason.
+
+
+Q: Mayan EDMS installed correctly and works, but static files are not served
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Django's development server doesn't serve static files unless the ``DEBUG``
+option is set to ``True``; this mode of operation should only be used for
+development or testing. For production deployments the management command::
+
+    $ ./manage.py collectstatic
+
+should be used and the resulting ``static`` folder served from a webserver.
+For more information, read https://docs.djangoproject.com/en/dev/howto/static-files/
+and https://docs.djangoproject.com/en/1.2/howto/static-files/ or
+http://mayan-edms-ru.blogspot.com/2011/11/blog-post_09.html
+
+
+Other
+-----
+
+
+Q: How to connect Mayan EDMS to an Active Directory tree
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+I used these two libraries as they seemed the best maintained in the quick search I did.
+
+* http://www.python-ldap.org/
+* http://packages.python.org/django-auth-ldap/
+
+I first had to figure out the corresponding OU, CN and such (which took quite a while since I'm not well versed in LDAP). For configuration options, Mayan EDMS imports settings_local.py after importing settings.py to allow users to override the defaults without modifying any file tracked by Git; this makes upgrading by using Git's pull command extremely easy. My settings_local.py file is as follows:
+
+::
+
+    import ldap
+    from django_auth_ldap.config import LDAPSearch
+
+    # makes sure this works in Active Directory
+    ldap.set_option(ldap.OPT_REFERRALS, 0)
+
+    AUTH_LDAP_SERVER_URI = "ldap://172.16.XX.XX:389"
+    AUTH_LDAP_BIND_DN = 'cn=Roberto Rosario Gonzalez,ou=Aguadilla,ou=XX,ou=XX,dc=XX,dc=XX,dc=XX'
+    AUTH_LDAP_BIND_PASSWORD = 'XXXXXXXXXXXXXX'
+    AUTH_LDAP_USER_SEARCH = LDAPSearch('dc=XX,dc=XX,dc=XX', ldap.SCOPE_SUBTREE, '(SAMAccountName=%(user)s)')
+
+    # Populate the Django user from the LDAP directory.
+    AUTH_LDAP_USER_ATTR_MAP = {
+        "first_name": "givenName",
+        "last_name": "sn",
+        "email": "mail"
+    }
+
+    # This is the default, but I like to be explicit.
+    AUTH_LDAP_ALWAYS_UPDATE_USER = True
+
+    AUTHENTICATION_BACKENDS = (
+        'django_auth_ldap.backend.LDAPBackend',
+        'django.contrib.auth.backends.ModelBackend',
+    )
+
+
+
+If your organization's policies don't allow anonymous directory queries,
+create a dummy account and set the ``AUTH_LDAP_BIND_DN`` and
+``AUTH_LDAP_BIND_PASSWORD`` options to match the account.
+
+For a more advanced example check this StackOverflow question:
+http://stackoverflow.com/questions/6493985/django-auth-ldap
+
+
+Q: Can you change the display order of documents, i.e. can they be in alphabetical order?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the moment no, but it is something being considered.
+
diff --git a/docs/index.rst b/docs/index.rst
index 3e90a4a9ae..2d59c73f8d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -5,20 +5,26 @@
 Mayan EDMS documentation
 ========================
 
-.. rubric:: Open source, Django_ based document manager with custom metadata indexing, file serving integration, OCR_ capabilities, document versioning and digital signature verification.
+.. rubric:: `Open source`_, Django_ based document manager with custom
+  metadata_ indexing_, file serving integration, OCR_ capabilities,
+  document versioning_ and `digital signature verification`_.
 
 .. _Django: http://www.djangoproject.com/
+.. _OCR: https://secure.wikimedia.org/wikipedia/en/wiki/Optical_character_recognition
+.. _digital signature verification: http://en.wikipedia.org/wiki/Digital_signature
+.. _versioning: http://en.wikipedia.org/wiki/Versioning
+.. _metadata: http://en.wikipedia.org/wiki/Metadata
+.. _indexing: http://en.wikipedia.org/wiki/Index_card
+.. _Open source: http://en.wikipedia.org/wiki/Open_source
 
-
-=================
 Links of interest
 =================
 
-:Website: http://www.mayan-edms.com
-:Source: http://github.com/rosarior/mayan
-:Video: http://bit.ly/pADNXv
-:Issue tracker: http://github.com/rosarior/mayan/issues
-:Mailing list: http://groups.google.com/group/mayan-edms
+* Website: http://www.mayan-edms.com
+* Source: http://github.com/rosarior/mayan
+* Video: http://bit.ly/pADNXv
+* Issue tracker: http://github.com/rosarior/mayan/issues
+* Mailing list: http://groups.google.com/group/mayan-edms
 
 
 First steps
@@ -33,7 +39,12 @@ First steps
 Understanding Mayan EDMS
 ========================
 
-    :doc:`Transformations `
+    :doc:`Transformations ` |
+    :doc:`Indexes ` |
+    :doc:`Smart links ` |
+    :doc:`Document visualization ` |
+    :doc:`OCR ` |
+    :doc:`File storage `
 
 
 Between versions
@@ -61,7 +72,7 @@ Credits
 
     :doc:`Contributors ` |
     :doc:`Software used ` |
-    :doc:`Lincensing `
+    :doc:`Licensing `
 
 
 Getting help
diff --git a/docs/topics/document_visualization.rst b/docs/topics/document_visualization.rst
new file mode 100644
index 0000000000..9474f353df
--- /dev/null
+++ b/docs/topics/document_visualization.rst
@@ -0,0 +1,21 @@
+======================
+Document visualization
+======================
+
+
+Mayan EDMS tries to avoid making users download a document and leave
+Mayan EDMS in order to view it, in essence making Mayan EDMS a
+visualization tool too. The conversion backend is a stack of functions.
+First the mimetype is evaluated; if it is an office document it is passed
+to LibreOffice working in headless mode (and managed by supervisor)
+via unoconv for conversion to PDF. The PDF is stored in a temporary
+cache alongside all the other files that were not office documents.
+From there they are inspected to determine the page count, and the
+corresponding blank database entries are created. After the database
+update they all go to the conversion driver specified by the user
+(``python``, ``graphicsmagick``, ``imagemagick``) and a high resolution
+master preview of each file is generated and stored in the persistent
+cache. From the master previews in the persistent cache, volatile
+previews are then created on demand for the different sizes requested
+(thumbnail, page preview, full preview) and rotated interactively
+in the details view.
diff --git a/docs/topics/file_storage.rst b/docs/topics/file_storage.rst
new file mode 100644
index 0000000000..054df85465
--- /dev/null
+++ b/docs/topics/file_storage.rst
@@ -0,0 +1,25 @@
+============
+File storage
+============
+
+The files are stored and placed under Mayan EDMS "control" to avoid
+filename clashes (each file gets renamed to its UUID and with an extension)
+and stored in a simple flat arrangement in a directory. This doesn't
+prevent direct access to the files, but that is not recommended because
+moving, renaming or updating the files directly would throw the database
+out of sync. For access to the files the recommended way is to create an
+index, which would create a directory-tree-like structure in the database,
+and then turn on the index filesystem mirror options, which would create
+an actual directory tree and links to the actual stored files but using
+the filename of the documents as stored in the database. This
+filesystem mirror of the index can then be shared with Samba across the
+network. This access would be read-only, and new versions of the files
+would have to be uploaded from the web GUI using the new document
+versioning support.
+
+Mayan EDMS's components are as decoupled from each other as possible;
+storage in particular is very decoupled and its behavior is controlled
+not by the project but by the Storage programming class. Why this design?
+All the other parts don't make any assumptions about the actual file
+storage, so that Mayan EDMS can work saving files locally, over the
+network or even across the internet and still operate exactly the same.
diff --git a/docs/topics/indexes.rst b/docs/topics/indexes.rst
new file mode 100644
index 0000000000..ed4ab46e51
--- /dev/null
+++ b/docs/topics/indexes.rst
@@ -0,0 +1,12 @@
+=======
+Indexes
+=======
+
+Administrators first define the template of the index and an instance
+of the index is then auto-populated with links to the documents depending
+on the rules of each branch of the index evaluated against the metadata
+of the documents. The index cannot be edited manually; only changing
+the rules or the metadata of the documents would cause the index to be
+regenerated. For manual organization of documents there are the folders;
+their structure is flat, however, and they have to be manually updated and
+curated.
diff --git a/docs/topics/ocr.rst b/docs/topics/ocr.rst
new file mode 100644
index 0000000000..ef17a2885c
--- /dev/null
+++ b/docs/topics/ocr.rst
@@ -0,0 +1,19 @@
+===
+OCR
+===
+
+Because OCR is an intensive operation, documents are queued for OCR for
+later handling; the number of documents processed in parallel is
+controlled by the ``OCR_NODE_CONCURRENT_EXECUTION`` configuration
+option. Ideally the machine serving **Mayan EDMS** should disable OCR
+processing by setting this option to 0, with other machines or cloud
+instances then connected to the same database doing the OCR processing.
+The document is checked to see if there are text parsers available; if
+no parser is available for that file type then the document is passed
+to tesseract page by page and the results stored per page; this is to
+keep the page image in sync with the transcribed text. However, when
+viewing the document in the details tab, the text of all pages is
+concatenated and shown to the user. Setting the ``OCR_AUTOMATIC_OCR``
+option to ``True`` would cause all newly uploaded documents to be
+queued automatically for OCR.
+
diff --git a/docs/topics/smart_links.rst b/docs/topics/smart_links.rst
new file mode 100644
index 0000000000..ea2559e965
--- /dev/null
+++ b/docs/topics/smart_links.rst
@@ -0,0 +1,9 @@
+===========
+Smart links
+===========
+
+Smart links are useful for navigation between documents. They are rule
+based but don't create any organizational structure; they just show the documents
+that match the rules as evaluated against the metadata of the currently
+displayed document. The index is global; the smart links are dependent
+on the current document the user is viewing.
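
The smart links description in the last file of this patch can be illustrated with a small sketch. This is not Mayan EDMS code: the document dictionaries, the ``smart_link_matches`` function and the idea of a rule being a list of metadata keys are all assumptions made for illustration. It only shows the general idea that a smart link is evaluated relative to the document currently being viewed, rather than building a global structure the way an index does.

```python
# Hypothetical data model for illustration only; none of these names
# come from Mayan EDMS itself.
documents = [
    {"title": "Invoice 001", "metadata": {"client": "ACME", "year": "2011"}},
    {"title": "Invoice 002", "metadata": {"client": "ACME", "year": "2012"}},
    {"title": "Contract A", "metadata": {"client": "Globex", "year": "2011"}},
]


def smart_link_matches(current, candidates, metadata_keys):
    """Return the other documents whose metadata equals the current
    document's metadata for every key named in the rule."""
    matches = []
    for doc in candidates:
        if doc is current:
            continue  # a document does not link to itself
        if all(
            key in current["metadata"]
            and doc["metadata"].get(key) == current["metadata"][key]
            for key in metadata_keys
        ):
            matches.append(doc)
    return matches


# Assumed rule: "documents with the same client as the one being viewed".
same_client = ["client"]
for doc in smart_link_matches(documents[0], documents, same_client):
    print(doc["title"])
```

The same rule gives different results depending on which document is being viewed, which is the sense in which smart links depend on the current document while an index is global.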