Documentation updates

2012-02-01 14:41:45 -04:00
parent a90896e774
commit eff399612d
7 changed files with 238 additions and 20 deletions
--- a/docs/topics/document_visualization.rst
+++ b/docs/topics/document_visualization.rst
@@ -0,0 +1,21 @@
+======================
+Document visualization
+======================
+
+
+Mayan EDMS tries to spare users from having to download a document and
+leave Mayan EDMS just to view it, which in essence makes Mayan EDMS a
+visualization tool too.  The conversion backend is a stack of functions.
+First the mimetype is evaluated; if the file is an office document it is
+passed via unoconv to LibreOffice running in headless mode (and managed
+by supervisor) for conversion to PDF.  The PDF is stored in a temporary
+cache alongside all the other files that were not office documents.
+From there the files are inspected to determine their page count and the
+corresponding blank database entries are created.  After the database
+update they all go to the conversion driver specified by the user
+(``python``, ``graphicsmagick``, ``imagemagick``), and a high resolution
+master preview of each file is generated and stored in the persistent
+cache.  From the master previews in the persistent cache, volatile
+previews are then created on demand for the different sizes requested
+(thumbnail, page preview, full preview) and can be rotated interactively
+in the details view.
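
The stack of functions described in this file can be sketched roughly as follows. This is an illustration only, not Mayan EDMS code: `OFFICE_MIMETYPES`, `plan_conversion` and the step names are hypothetical, and the list of office mimetypes is a guess, since the text does not enumerate them.

```python
# Hypothetical, abbreviated set; the text above does not list the actual
# office mimetypes that trigger the PDF conversion step.
OFFICE_MIMETYPES = {
    "application/msword",
    "application/vnd.oasis.opendocument.text",
}


def plan_conversion(mimetype, driver="python"):
    """Return the ordered steps a file would go through (illustration only)."""
    steps = []
    if mimetype in OFFICE_MIMETYPES:
        # Office documents take a detour through unoconv/LibreOffice first.
        steps.append("convert to PDF")
    steps.append("store in temporary cache")
    steps.append("count pages and create blank database entries")
    steps.append("render master preview with the %s driver" % driver)
    steps.append("store master preview in persistent cache")
    return steps


if __name__ == "__main__":
    for step in plan_conversion("application/msword", driver="graphicsmagick"):
        print(step)
```

Volatile previews (thumbnail, page preview, full preview) are left out of the plan on purpose, since they are created on demand from the master preview rather than at upload time.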
--- a/docs/topics/file_storage.rst
+++ b/docs/topics/file_storage.rst
@@ -0,0 +1,25 @@
+============
+File storage
+============
+
+Files are placed under Mayan EDMS "control" to avoid filename clashes:
+each file is renamed to its UUID (with an extension) and stored in a
+simple flat arrangement in a directory.  This doesn't prevent direct
+access to the files, but such access is not recommended because
+moving, renaming or updating the files directly would throw the database
+out of sync.  The recommended way to access the files is to create an
+index, which builds a directory-tree-like structure in the database,
+and then turn on the index filesystem mirror option, which creates an
+actual directory tree with links to the stored files, named after the
+document filenames as stored in the database.  This filesystem mirror
+of the index can then be shared with Samba across the
+network.  This access would be read-only, and new versions of the files
+would have to be uploaded from the web GUI using the new document
+versioning support.
+
+Mayan EDMS components are as decoupled from each other as possible.
+Storage in particular is highly decoupled: its behavior is controlled
+not by the project but by the Storage programming class.  Why this design?
+None of the other parts make any assumptions about the actual file
+storage, so Mayan EDMS can save files locally, over the
+network or even across the internet and still operate exactly the same.
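
To illustrate the decoupling described in this file, here is a minimal sketch in plain Python. It does not use the actual Storage class the text refers to: `LocalFlatStorage`, `MemoryStorage` and `store_document` are hypothetical names, and the extension handling mentioned in the text is left out for brevity.

```python
import os
import tempfile
import uuid


class LocalFlatStorage:
    """Hypothetical backend: one flat directory, files named by UUID."""

    def __init__(self, directory):
        self.directory = directory

    def save(self, content):
        name = str(uuid.uuid4())
        with open(os.path.join(self.directory, name), "wb") as handle:
            handle.write(content)
        return name

    def load(self, name):
        with open(os.path.join(self.directory, name), "rb") as handle:
            return handle.read()


class MemoryStorage:
    """Hypothetical backend with the same interface, backed by a dict."""

    def __init__(self):
        self.files = {}

    def save(self, content):
        name = str(uuid.uuid4())
        self.files[name] = content
        return name

    def load(self, name):
        return self.files[name]


def store_document(storage, content):
    # The caller never looks at where or how the bytes are kept.
    return storage.save(content)


if __name__ == "__main__":
    for backend in (LocalFlatStorage(tempfile.mkdtemp()), MemoryStorage()):
        stored_name = store_document(backend, b"example contents")
        print(type(backend).__name__, stored_name)
```

Because `store_document` only relies on `save`, swapping one backend for another requires no change in the calling code, which is the property the text argues for.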
--- a/docs/topics/indexes.rst
+++ b/docs/topics/indexes.rst
@@ -0,0 +1,12 @@
+=======
+Indexes
+=======
+
+Administrators first define the template of the index; an instance of
+the index is then auto-populated with links to the documents, according
+to the rules of each branch of the index evaluated against the metadata
+of the documents.  The index cannot be edited manually; only changing
+the rules or the metadata of the documents causes the index to be
+regenerated.  For manual organization of documents there are folders;
+their structure, however, is flat, and they have to be manually updated
+and curated.
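
The rule-driven population described in this file could look roughly like the sketch below. The data model is an assumption made for illustration: the template is a plain dict mapping a branch name to a predicate over a metadata dict, and `build_index`, the branch names and the metadata keys are all hypothetical.

```python
def build_index(template, documents):
    """Populate each branch with the names of the documents matching its rule."""
    return {
        branch: sorted(
            name for name, metadata in documents.items() if rule(metadata)
        )
        for branch, rule in template.items()
    }


# Hypothetical template: one rule per branch, evaluated against metadata.
template = {
    "invoices/2011": lambda m: m.get("type") == "invoice" and m.get("year") == 2011,
    "invoices/2012": lambda m: m.get("type") == "invoice" and m.get("year") == 2012,
}

documents = {
    "a.pdf": {"type": "invoice", "year": 2011},
    "b.pdf": {"type": "invoice", "year": 2012},
    "c.pdf": {"type": "memo", "year": 2012},
}

if __name__ == "__main__":
    print(build_index(template, documents))
    # Changing metadata and rebuilding is the only way to move a document.
    documents["c.pdf"]["type"] = "invoice"
    print(build_index(template, documents))
```

Nothing in the sketch edits the index directly; it is always recomputed from the rules and the metadata, mirroring the constraint stated in the text.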
--- a/docs/topics/ocr.rst
+++ b/docs/topics/ocr.rst
@@ -0,0 +1,19 @@
+===
+OCR
+===
+
+Because OCR is an intensive operation, documents are queued for later
+OCR handling; the number of documents processed in parallel is
+controlled by the ``OCR_NODE_CONCURRENT_EXECUTION`` configuration
+option.  Ideally the machine serving **Mayan EDMS** should disable OCR
+processing by setting this option to 0, with other machines or cloud
+instances connected to the same database doing the OCR processing.
+Each document is first checked for an available text parser; if
+no parser is available for that file type, the document is passed
+to tesseract page by page and the results are stored per page, to
+keep the page image in sync with the transcribed text.  However, when
+viewing the document in the details tab, the text of all pages is
+concatenated and shown to the user.  Setting the ``OCR_AUTOMATIC_OCR``
+option to ``True`` causes all newly uploaded documents to be
+queued automatically for OCR.
+ 
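
The parser-first, OCR-as-fallback flow and the per-page storage described in this file can be sketched as below. This is not the Mayan EDMS implementation and does not call tesseract: `transcribe`, `details_text` and the stand-in callables are hypothetical, and both the per-page granularity of the parser and the newline separator used for concatenation are assumptions.

```python
def transcribe(pages, parser=None, ocr_page=None):
    """Return one text entry per page, preferring a text parser over OCR."""
    # Assumption: whichever extractor is used is applied page by page so
    # that each text entry stays aligned with its page image.
    extract = parser if parser is not None else ocr_page
    return [extract(page) for page in pages]


def details_text(page_texts):
    # The details tab shows the text of all pages concatenated; the
    # separator used here is an assumption.
    return "\n".join(page_texts)


if __name__ == "__main__":
    # Stand-in for an OCR engine call; a real setup would invoke tesseract.
    fake_ocr = lambda page: "text of %s" % page
    per_page = transcribe(["page-1.png", "page-2.png"], ocr_page=fake_ocr)
    print(per_page)
    print(details_text(per_page))
```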
--- a/docs/topics/smart_links.rst
+++ b/docs/topics/smart_links.rst
@@ -0,0 +1,9 @@
+===========
+Smart links
+===========
+
+Smart links are useful for navigating between documents.  They are rule
+based but don't create any organizational structure; they just show the
+documents that match the rules as evaluated against the metadata of the
+currently displayed document.  The index is global, whereas smart links
+depend on the document the user is currently viewing.
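
The contrast with indexes can be made concrete with a small sketch. The rule semantics here are an assumption chosen for illustration (a rule is just a metadata key, and a document matches when it has the same value for that key as the current document); `smart_links` and the sample metadata are hypothetical.

```python
def smart_links(current_name, documents, rule_keys):
    """Return the other documents that match the current one on every rule key."""
    current = documents[current_name]
    return sorted(
        name
        for name, metadata in documents.items()
        if name != current_name
        and all(
            key in current and metadata.get(key) == current[key]
            for key in rule_keys
        )
    )


documents = {
    "invoice-1.pdf": {"client": "ACME", "year": 2011},
    "invoice-2.pdf": {"client": "ACME", "year": 2012},
    "memo.pdf": {"client": "Initech", "year": 2012},
}

if __name__ == "__main__":
    # The result depends on which document is currently being viewed.
    print(smart_links("invoice-1.pdf", documents, ["client"]))
    print(smart_links("memo.pdf", documents, ["year"]))
```

No structure is stored anywhere: the list is computed on each call from the current document's metadata, which is why the result changes as the user moves from one document to another.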