Documentation updates

Roberto Rosario
2012-02-01 14:41:45 -04:00
parent a90896e774
commit eff399612d
7 changed files with 238 additions and 20 deletions


@@ -7,7 +7,19 @@ Frequently asked questions and solutions
Database related
----------------
Q: PostgreSQL vs. MySQL
~~~~~~~~~~~~~~~~~~~~~~~
Since Django abstracts database operations from a functional point of view,
**Mayan EDMS** will behave exactly the same with either database. The only
concern is that MySQL doesn't support transactions for schema-modifying
commands. This can only cause problems when running South migrations during
upgrades: if a migration fails, the database structure is left in an
intermediate state and has to be reverted manually before trying again.

Q: _mysql_exceptions.OperationalError: (1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Solution::
@@ -66,8 +78,11 @@ Q: File system links not showing when serving content with ``Samba``
- Ref: 1- http://www.samba.org/samba/docs/man/manpages-3/smb.conf.5.html
Document handling
-----------------
How to store documents outside of **Mayan EDMS's** path
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Subclass Django's ``FileSystemStorage`` class:
@@ -87,8 +102,8 @@ How to store documents outside of **Mayan EDMS's** path
DOCUMENTS_STORAGE_BACKEND = CustomStorage
Q: How to enable the ``GridFS`` storage backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Solution:
@@ -99,9 +114,19 @@ How to enable the ``GridFS`` storage backend
- Filesystem metadata indexing will not work with this storage backend, as
  the files are inside a ``MongoDB`` database and can't be linked (at least for now).
Q: How do you upload a new version of an existing file?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Solution:

  - Choose a document and go to its versions tab. In the menu on the right,
    at the bottom under ``Other available action``, there is an
    ``Upload new version`` link. Clicking it opens a view very similar to
    ``Upload new document``, but it also lets you specify the version
    number and a comment for the new version being uploaded.

Q: Site search is slow
~~~~~~~~~~~~~~~~~~~~~~
* Add indexes to the following fields:
@@ -109,8 +134,12 @@ Site search is slow
- ``documents_documentpage`` - content, recommended size: 3000
Webserver
---------
Q: How to enable x-sendile support for ``Apache``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* If using Ubuntu, execute the following::

      $ sudo apt-get install libapache2-mod-xsendfile
@@ -125,11 +154,103 @@ How to enable x-sendile support for ``Apache``
XSendFileAllowAbove on
OCR
---
Q: The included version of ``unoconv`` in my distribution is too old
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Only the ``unoconv`` file from https://github.com/dagwieers/unoconv is needed.
  Put it in a directory designated for user binaries, such as ``/usr/local/bin``,
  and set up Mayan's configuration option in your ``settings_local.py`` file
  like this::

      CONVERTER_UNOCONV_PATH = '/usr/local/bin/unoconv'
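
Before restarting Mayan EDMS, it can help to confirm that the configured path
points at a usable file. The check below is only a sketch:
``check_unoconv_path`` is a hypothetical helper, not part of Mayan EDMS, and it
assumes the converter runs the file directly, which requires execute
permission.

```python
import os


def check_unoconv_path(path):
    """Return a list of problems found with the configured unoconv path.

    Hypothetical helper: an empty list means the file exists and is
    executable by the current user.
    """
    problems = []
    if not os.path.isfile(path):
        problems.append('%s does not exist or is not a regular file' % path)
    elif not os.access(path, os.X_OK):
        problems.append('%s is not executable (try: chmod +x %s)' % (path, path))
    return problems


if __name__ == '__main__':
    # Same value as CONVERTER_UNOCONV_PATH in settings_local.py.
    for problem in check_unoconv_path('/usr/local/bin/unoconv'):
        print(problem)
```
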
Deployments
-----------
Q: Is virtualenv required as specified in the documentation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* It is not required; it is a strong recommendation, mainly to reduce
  dependency conflicts by isolating Mayan's dependencies from the system-wide
  Python installation. Without a virtualenv, pip installs Mayan's
  dependencies globally, where they can conflict with the distribution's
  prepackaged Python libraries and break other Django projects or Python
  programs. Likewise, the dependencies of a Python/Django project installed
  later could conflict with Mayan's and cause it to stop working for no
  apparent reason.
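
If you are unsure whether the interpreter you are about to run pip with
belongs to a virtualenv, a check like the following can tell. It is a sketch
that relies on two well-known conventions: older ``virtualenv`` releases set
``sys.real_prefix``, while the standard library ``venv`` module (and newer
``virtualenv`` releases) make ``sys.prefix`` differ from ``sys.base_prefix``.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Classic virtualenv releases record the original prefix here.
    if hasattr(sys, 'real_prefix'):
        return True
    # venv-style environments keep the original prefix in sys.base_prefix.
    return sys.prefix != getattr(sys, 'base_prefix', sys.prefix)


if __name__ == '__main__':
    print('virtualenv active: %s' % in_virtualenv())
```
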
Q: Mayan EDMS installed correctly and works, but static files are not served
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Django's development server doesn't serve static files unless the ``DEBUG``
option is set to ``True``, and this mode of operation should only be used for
development or testing. For production deployments, run the management
command::

    $ ./manage.py collectstatic

and serve the resulting ``static`` folder from a webserver. For more
information, read https://docs.djangoproject.com/en/dev/howto/static-files/,
https://docs.djangoproject.com/en/1.2/howto/static-files/ or
http://mayan-edms-ru.blogspot.com/2011/11/blog-post_09.html
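
``collectstatic`` copies files into the directory named by Django's
``STATIC_ROOT`` setting, and the webserver is expected to serve that
directory at the ``STATIC_URL`` prefix. If you want the files collected
somewhere other than the default ``static`` folder, a ``settings_local.py``
override along these lines should work; the path and the Apache ``Alias``
line are example values, not Mayan EDMS defaults.

```python
# settings_local.py fragment (example values, adjust to your deployment).

# Hypothetical location: collectstatic copies every app's static files here.
STATIC_ROOT = '/var/www/mayan/static/'

# URL prefix the webserver maps to STATIC_ROOT, for example in Apache:
#   Alias /static/ /var/www/mayan/static/
STATIC_URL = '/static/'
```
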
Other
-----
Q: How to connect Mayan EDMS to an Active Directory tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I used these two libraries because, from a quick search, they seemed the best maintained:
* http://www.python-ldap.org/
* http://packages.python.org/django-auth-ldap/
Figuring out the corresponding OU, CN and so on took quite a while, since I'm
not well versed in LDAP. As for configuration, Mayan EDMS imports
``settings_local.py`` after ``settings.py``, allowing users to override the
defaults without modifying any file tracked by Git, which makes upgrading
with ``git pull`` extremely easy. My ``settings_local.py`` file is as follows:
::

    import ldap
    from django_auth_ldap.config import LDAPSearch

    # Don't follow LDAP referrals; needed for searches to work against
    # Active Directory.
    ldap.set_option(ldap.OPT_REFERRALS, 0)

    AUTH_LDAP_SERVER_URI = "ldap://172.16.XX.XX:389"
    AUTH_LDAP_BIND_DN = 'cn=Roberto Rosario Gonzalez,ou=Aguadilla,ou=XX,ou=XX,dc=XX,dc=XX,dc=XX'
    AUTH_LDAP_BIND_PASSWORD = 'XXXXXXXXXXXXXX'
    AUTH_LDAP_USER_SEARCH = LDAPSearch('dc=XX,dc=XX,dc=XX', ldap.SCOPE_SUBTREE, '(SAMAccountName=%(user)s)')

    # Populate the Django user from the LDAP directory.
    AUTH_LDAP_USER_ATTR_MAP = {
        "first_name": "givenName",
        "last_name": "sn",
        "email": "mail"
    }

    # This is the default, but I like to be explicit.
    AUTH_LDAP_ALWAYS_UPDATE_USER = True

    AUTHENTICATION_BACKENDS = (
        'django_auth_ldap.backend.LDAPBackend',
        'django.contrib.auth.backends.ModelBackend',
    )

If your organization's policies don't allow anonymous directory queries,
create a dummy account and set the ``AUTH_LDAP_BIND_DN`` and
``AUTH_LDAP_BIND_PASSWORD`` options to match that account.
For a more advanced example check this StackOverflow question:
http://stackoverflow.com/questions/6493985/django-auth-ldap
Q: Can you change the display order of documents, i.e. can they be in alphabetical order?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At the moment, no, but it is being considered.


@@ -5,20 +5,26 @@
Mayan EDMS documentation
========================
.. rubric:: `Open source`_, Django_ based document manager with custom
   metadata_ indexing_, file serving integration, OCR_ capabilities,
   document versioning_ and `digital signature verification`_.
.. _Django: http://www.djangoproject.com/
.. _OCR: https://secure.wikimedia.org/wikipedia/en/wiki/Optical_character_recognition
.. _digital signature verification: http://en.wikipedia.org/wiki/Digital_signature
.. _versioning: http://en.wikipedia.org/wiki/Versioning
.. _metadata: http://en.wikipedia.org/wiki/Metadata
.. _indexing: http://en.wikipedia.org/wiki/Index_card
.. _Open source: http://en.wikipedia.org/wiki/Open_source
=================
Links of interest
=================
* Website: http://www.mayan-edms.com
* Source: http://github.com/rosarior/mayan
* Video: http://bit.ly/pADNXv
* Issue tracker: http://github.com/rosarior/mayan/issues
* Mailing list: http://groups.google.com/group/mayan-edms
First steps
@@ -33,7 +39,12 @@ First steps
Understanding Mayan EDMS
========================
:doc:`Transformations <topics/transformations>` |
:doc:`Indexes <topics/indexes>` |
:doc:`Smart links <topics/smart_links>` |
:doc:`Document visualization <topics/document_visualization>` |
:doc:`OCR <topics/ocr>` |
:doc:`File storage <topics/file_storage>`
Between versions
@@ -61,7 +72,7 @@ Credits
:doc:`Contributors <topics/contributors>` |
:doc:`Software used <topics/software_used>` |
:doc:`Licensing <license>`
Getting help


@@ -0,0 +1,21 @@
======================
Document visualization
======================
Mayan EDMS tries to spare users from having to download a document and leave
Mayan EDMS just to see it, which in essence makes Mayan EDMS a visualization
tool too. The conversion backend is a stack of functions. First the mimetype
is evaluated: if the file is an office document, it is passed via unoconv to
LibreOffice, running in headless mode (and managed by supervisor), for
conversion to PDF. The PDF is stored in a temporary cache alongside all the
other files that were not office documents. From there, the files are
inspected to determine the page count, and the corresponding blank database
entries are created. After the database update, they all go to the
conversion driver specified by the user (``python``, ``graphicsmagick`` or
``imagemagick``), and a high resolution master preview of each file is
generated and stored in the persistent cache. From the master previews in the
persistent cache, volatile previews are then created on demand for the
different sizes requested (thumbnail, page preview, full preview) and rotated
interactively in the details view.
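
The order of the stages described above can be summarized as a list of
steps. The following Python sketch only models that ordering: the mimetype
set, the step labels and ``conversion_plan`` are hypothetical, and no
conversion is actually performed.

```python
# Hypothetical mimetypes treated as "office documents"; the real list used by
# Mayan EDMS's converter is not reproduced here.
OFFICE_MIMETYPES = frozenset([
    'application/msword',
    'application/vnd.ms-excel',
    'application/vnd.oasis.opendocument.text',
])

PREVIEW_SIZES = ('thumbnail', 'page preview', 'full preview')


def conversion_plan(mimetype, driver='python'):
    """Return the ordered steps a newly uploaded file would go through."""
    steps = []
    if mimetype in OFFICE_MIMETYPES:
        # Office files are first turned into PDF by LibreOffice via unoconv.
        steps.append('unoconv: convert to PDF')
    steps.append('temporary cache: store file')
    steps.append('inspect page count and create blank page entries')
    steps.append('%s: render master previews into the persistent cache' % driver)
    # Smaller, volatile previews are only derived when requested.
    steps.extend('on demand: %s' % size for size in PREVIEW_SIZES)
    return steps


if __name__ == '__main__':
    for step in conversion_plan('application/msword', driver='graphicsmagick'):
        print(step)
```

Keeping the master preview in a persistent cache means the expensive
rendering happens once, while the cheap resizing and rotation can be
repeated per request.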


@@ -0,0 +1,25 @@
============
File storage
============
Files are stored and placed under Mayan EDMS "control" to avoid filename
clashes (each file is renamed to its UUID plus an extension) and kept in a
simple flat arrangement inside a single directory. This doesn't prevent
access to the files, but direct access is not recommended because moving,
renaming or updating the files directly would throw the database out of
sync. The recommended way to access the files is to create an index, which
builds a directory-tree-like structure in the database, and then turn on the
index filesystem mirror option, which creates an actual directory tree with
links to the stored files, named after the document filenames stored in the
database. This filesystem mirror of the index can then be shared across the
network with Samba. This access is read-only; new versions of the files have
to be uploaded from the web GUI using the new document versioning support.
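
The flat UUID-named storage and the link-based mirror can be sketched as
follows. This is not Mayan EDMS's actual code: ``store_file``,
``mirror_index`` and the ``index_tree`` structure are hypothetical, and
keeping the original extension after the UUID is an assumption.

```python
import os
import shutil
import uuid


def store_file(source_path, storage_dir):
    """Copy a file into the flat storage directory under a UUID-based name.

    Assumption: the original extension is kept after the UUID.
    """
    extension = os.path.splitext(source_path)[1]
    stored_name = '%s%s' % (uuid.uuid4(), extension)
    shutil.copy(source_path, os.path.join(storage_dir, stored_name))
    return stored_name


def mirror_index(index_tree, storage_dir, mirror_root):
    """Create a directory tree of symbolic links for an index.

    ``index_tree`` (hypothetical) maps a relative directory such as
    'ACME/2012' to a list of (document filename, stored name) pairs.
    """
    for directory, documents in index_tree.items():
        target_dir = os.path.join(mirror_root, directory)
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for filename, stored_name in documents:
            link_path = os.path.join(target_dir, filename)
            if not os.path.lexists(link_path):
                # The link carries the user-facing name; the target keeps the UUID name.
                os.symlink(os.path.join(storage_dir, stored_name), link_path)
```

Because the mirror only contains links, deleting or regenerating it never
touches the stored files themselves.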

Mayan EDMS's components are as decoupled from each other as possible.
Storage in particular is very decoupled, and its behavior is controlled not
by the project but by the configured storage class. Why this design? None of
the other parts make any assumptions about the actual file storage, so Mayan
EDMS can save files locally, over the network or even across the internet and
still operate exactly the same.
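
To illustrate the decoupling, the sketch below separates a toy storage class
from the code that uses it. ``InMemoryStorage`` and ``archive_document`` are
hypothetical; the class only loosely mirrors the ``save``/``open``/``exists``
shape of Django's ``Storage`` API and is not a real Django storage backend.

```python
import io


class InMemoryStorage(object):
    """Toy storage backend keeping file contents in a dictionary."""

    def __init__(self):
        self._files = {}

    def save(self, name, content):
        self._files[name] = content.read()
        return name

    def open(self, name):
        return io.BytesIO(self._files[name])

    def exists(self, name):
        return name in self._files


def archive_document(storage, name, data):
    """Document-handling code: it only talks to the storage interface,
    never to paths, sockets or databases directly."""
    saved_name = storage.save(name, io.BytesIO(data))
    with storage.open(saved_name) as stored:
        return stored.read() == data
```

Swapping ``InMemoryStorage`` for a class backed by the local filesystem,
GridFS or a remote service would leave ``archive_document`` unchanged, which
is the role the ``DOCUMENTS_STORAGE_BACKEND`` setting plays.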

docs/topics/indexes.rst

@@ -0,0 +1,12 @@
=======
Indexes
=======
Administrators first define the template of an index; an instance of the
index is then automatically populated with links to the documents, depending
on the rules of each branch of the index evaluated against the metadata of
the documents. The index cannot be edited manually; only changing the rules
or the metadata of the documents causes the index to be regenerated. For
manual organization of documents there are folders; their structure,
however, is flat, and they have to be updated and curated manually.
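
The template-plus-rules idea can be sketched in a few lines of Python. The
rule format here (one callable per tree level returning a branch name, or
``None`` when it doesn't apply) is a hypothetical simplification, not the
format Mayan EDMS uses for index templates.

```python
from collections import defaultdict

# Hypothetical index template: one rule per tree level. Each rule receives a
# document's metadata and returns the branch name, or None if it doesn't apply.
INDEX_TEMPLATE = [
    lambda metadata: metadata.get('client'),
    lambda metadata: metadata.get('year'),
]


def build_index(template, documents):
    """Regenerate an index instance from scratch.

    ``documents`` maps a document name to its metadata dictionary. The result
    maps a branch path such as 'ACME/2012' to the documents linked there.
    """
    index = defaultdict(list)
    for name in sorted(documents):
        path = []
        for rule in template:
            value = rule(documents[name])
            if value is None:
                break  # The document doesn't fit this branch of the index.
            path.append(str(value))
        else:
            index['/'.join(path)].append(name)
    return dict(index)
```

Rebuilding from the rules and metadata every time is what makes manual edits
pointless: the next regeneration would simply overwrite them.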

docs/topics/ocr.rst

@@ -0,0 +1,19 @@
===
OCR
===
Because OCR is an intensive operation, documents are queued for later
processing, and the number of documents processed in parallel is controlled
by the ``OCR_NODE_CONCURRENT_EXECUTION`` configuration option. Ideally, the
machine serving **Mayan EDMS** should disable OCR processing by setting this
option to 0, with other machines or cloud instances connected to the same
database doing the OCR processing. Each document is first checked to see if
there are text parsers available for it; if no parser is available for that
file type, the document is passed to tesseract page by page and the results
are stored per page, to keep each page image in sync with its transcribed
text. However, when viewing the document in the details tab, the text of all
pages is concatenated and shown to the user. Setting the
``OCR_AUTOMATIC_OCR`` option to ``True`` causes all newly uploaded documents
to be queued for OCR automatically.
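
The queue-and-concurrency behavior can be sketched with a thread pool. This
is an illustration under several assumptions: ``run_tesseract`` is a
placeholder that does not call tesseract, text parsers are modeled as
per-page callables keyed by mimetype, and a real deployment would more
likely spread the work across processes or machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Mirrors OCR_NODE_CONCURRENT_EXECUTION: how many queued documents this node
# works on at once; 0 means this node does no OCR at all.
OCR_NODE_CONCURRENT_EXECUTION = 2


def run_tesseract(page_image):
    """Placeholder for running tesseract on one page image (hypothetical)."""
    return 'ocr text of %s' % page_image


def transcribe(document, text_parsers):
    """Return one text entry per page, keeping pages and text in sync."""
    parser = text_parsers.get(document['mimetype'])
    if parser is None:
        # No parser for this file type: fall back to OCR, page by page.
        parser = run_tesseract
    return [parser(page) for page in document['pages']]


def process_queue(queue, text_parsers, concurrency=OCR_NODE_CONCURRENT_EXECUTION):
    """Process queued documents, at most ``concurrency`` at a time."""
    if concurrency <= 0:
        return {}  # OCR disabled on this node; other nodes drain the queue.
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = {
            doc['name']: executor.submit(transcribe, doc, text_parsers)
            for doc in queue
        }
    return {name: future.result() for name, future in futures.items()}


def details_view_text(page_texts):
    """The details tab shows the text of all pages concatenated."""
    return '\n'.join(page_texts)
```
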


@@ -0,0 +1,9 @@
===========
Smart links
===========
Smart links are useful for navigating between documents. They are rule
based, but they don't create any organizational structure; they just show the
documents that match the rules as evaluated against the metadata of the
currently displayed document. While an index is global, smart links depend
on the document the user is currently viewing.
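
The difference from an index can be shown with a small sketch in which the
result depends on which document is "current". The rule format (a list of
metadata field names whose values must match the current document's) and
``smart_link`` itself are hypothetical simplifications.

```python
def smart_link(current_name, documents, fields):
    """Return the documents whose metadata matches the current document's.

    ``documents`` maps a document name to its metadata dictionary; ``fields``
    lists the metadata fields that must have equal values.
    """
    current = documents[current_name]
    if not all(field in current for field in fields):
        return []  # The current document can't satisfy the rules.
    return [
        name for name, metadata in sorted(documents.items())
        if name != current_name
        and all(metadata.get(field) == current[field] for field in fields)
    ]
```

Nothing is stored: the list is computed each time a document is displayed,
so viewing a different document yields a different set of links.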