Documentation updates
21
docs/topics/document_visualization.rst
Normal file
@@ -0,0 +1,21 @@
======================
Document visualization
======================

Mayan EDMS tries to spare users from having to download a document and
leave Mayan EDMS just to view it, which in essence makes Mayan EDMS a
visualization tool as well. The conversion backend is a stack of
functions. First the MIME type is evaluated; if the file is an office
document, it is passed via unoconv to LibreOffice, which runs in headless
mode (managed by supervisor), for conversion to PDF. The PDF is stored in
a temporary cache alongside all the files that were not office
documents. From there the files are inspected to determine their page
count, and the corresponding blank database entries are created. After
the database update, all files go to the conversion driver specified by
the user (``python``, ``graphicsmagick``, ``imagemagick``), and a
high-resolution master preview of each file is generated and stored in
the persistent cache. From the master previews in the persistent cache,
volatile previews are then created on demand for the different sizes
requested (thumbnail, page preview, full preview) and rotated
interactively in the details view.
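
The order of the stages described above can be sketched in a few lines of
Python. This is only an illustration, not the actual conversion backend:
the function names, the ``OFFICE_MIMETYPES`` set and the stage labels are
all assumptions made for this example, and no conversion is performed.

```python
import mimetypes

# Hypothetical set of MIME types treated as office documents in this sketch.
OFFICE_MIMETYPES = {
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
}


def needs_office_conversion(filename):
    """Return True if the guessed MIME type is one of the office formats."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype in OFFICE_MIMETYPES


def plan_conversion(filename, driver="python"):
    """List, in order, the stages a file would pass through."""
    stages = ["evaluate MIME type"]
    if needs_office_conversion(filename):
        # Only office documents take the detour through LibreOffice.
        stages.append("convert to PDF via unoconv and headless LibreOffice")
    stages += [
        "store in temporary cache",
        "count pages and create blank database entries",
        "render master preview with driver '%s'" % driver,
        "store master preview in persistent cache",
    ]
    return stages
```

Calling ``plan_conversion("report.doc")`` yields one stage more than
``plan_conversion("scan.png")``, which mirrors the extra PDF conversion
step that only office documents go through.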
25
docs/topics/file_storage.rst
Normal file
@@ -0,0 +1,25 @@
============
File storage
============

Files are stored and placed under Mayan EDMS "control" to avoid
filename clashes (each file is renamed to its UUID, with an extension)
and are kept in a simple flat arrangement in a single directory. This
does not prevent direct access to the files, but direct access is not
recommended, because moving, renaming or updating the files directly
would throw the database out of sync. The recommended way to access the
files is to create an index, which builds a directory-tree-like
structure in the database, and then turn on the index filesystem mirror
option, which creates an actual directory tree with links to the stored
files, named after the document filenames stored in the database. This
filesystem mirror of the index can then be shared across the network
with Samba. This access would be read-only; new versions of the files
would have to be uploaded from the web GUI using the new document
versioning support.

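
As a rough sketch of the two ideas above (UUID-named flat storage, and a
mirror tree of links that carry the human-readable filenames), consider
the following. The helper names ``store_file`` and ``mirror_link`` are
hypothetical, and keeping the original extension on the stored name is
this example's reading of the text, not a statement about the real
implementation.

```python
import os
import shutil
import uuid


def store_file(source_path, storage_dir):
    """Copy a file into the flat storage directory under a UUID-based name."""
    extension = os.path.splitext(source_path)[1]
    stored_name = str(uuid.uuid4()) + extension
    shutil.copyfile(source_path, os.path.join(storage_dir, stored_name))
    return stored_name


def mirror_link(storage_dir, stored_name, mirror_dir, branch, label):
    """Create mirror_dir/branch/label as a symbolic link to the stored file."""
    branch_dir = os.path.join(mirror_dir, branch)
    os.makedirs(branch_dir, exist_ok=True)
    link_path = os.path.join(branch_dir, label)
    os.symlink(os.path.join(storage_dir, stored_name), link_path)
    return link_path
```

Because the mirror contains only links, the stored files keep their
UUID-based names, and the tree can be rebuilt at any time without
touching the stored files themselves.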
Mayan EDMS components are as decoupled from each other as possible.
Storage in particular is highly decoupled: its behavior is controlled
not by the project but by the Storage programming class. Why this
design? None of the other parts make any assumptions about the actual
file storage, so Mayan EDMS can save files locally, over the network or
even across the internet and still operate exactly the same.
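
The benefit of this decoupling can be illustrated with a small,
self-contained sketch. The ``Storage`` interface below is a hypothetical
two-method class written for this example; it is not the API of the
storage class the project actually uses. ``MemoryStorage`` stands in for
a remote backend.

```python
import os
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical minimal storage interface for this sketch."""

    @abstractmethod
    def save(self, name, data):
        """Persist the bytes in `data` under `name`."""

    @abstractmethod
    def load(self, name):
        """Return the bytes previously saved under `name`."""


class LocalStorage(Storage):
    """Keeps files in a local directory."""

    def __init__(self, directory):
        self.directory = directory

    def save(self, name, data):
        with open(os.path.join(self.directory, name), "wb") as handle:
            handle.write(data)

    def load(self, name):
        with open(os.path.join(self.directory, name), "rb") as handle:
            return handle.read()


class MemoryStorage(Storage):
    """Stand-in for a network backend; keeps the bytes in a dict."""

    def __init__(self):
        self._files = {}

    def save(self, name, data):
        self._files[name] = data

    def load(self, name):
        return self._files[name]


def archive_document(storage, name, data):
    """Caller code: it only sees the interface, never the backend."""
    storage.save(name, data)
    return storage.load(name)
```

``archive_document`` behaves the same whichever backend it is handed,
which is the property the paragraph above describes.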
12
docs/topics/indexes.rst
Normal file
@@ -0,0 +1,12 @@
=======
Indexes
=======

Administrators first define the template of the index. An instance of
the index is then auto-populated with links to the documents, according
to the rules of each branch of the index as evaluated against the
metadata of the documents. The index cannot be edited manually; only
changing the rules or the metadata of the documents causes the index to
be regenerated. For manual organization of documents there are folders;
their structure is flat, however, and they have to be updated and
curated manually.
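
A minimal sketch of this idea follows. It assumes, purely for
illustration, that a template is an ordered list of metadata names and
that each branch rule is "use the document's value for that name"; the
real rule language may be richer. ``build_index`` and the document
dictionaries are hypothetical.

```python
def build_index(template, documents):
    """Map each branch path to the labels of the documents linked there.

    `template` is an ordered list of metadata names, one per tree level.
    Documents that lack any of those names are left out of the index.
    """
    index = {}
    for document in documents:
        metadata = document["metadata"]
        if all(name in metadata for name in template):
            branch = tuple(metadata[name] for name in template)
            index.setdefault(branch, []).append(document["label"])
    return index
```

The index is never edited by hand in this sketch either: after a
document's metadata changes, calling ``build_index`` again produces the
updated tree.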
19
docs/topics/ocr.rst
Normal file
@@ -0,0 +1,19 @@
===
OCR
===

Because OCR is an intensive operation, documents are queued for OCR
and handled later. The number of documents processed in parallel is
controlled by the ``OCR_NODE_CONCURRENT_EXECUTION`` configuration
option. Ideally, the machine serving **Mayan EDMS** should disable OCR
processing by setting this option to 0, with other machines or cloud
instances connected to the same database doing the OCR processing.
Each document is first checked to see whether a text parser is
available; if no parser is available for that file type, the document
is passed to tesseract page by page and the results are stored per
page, which keeps each page image in sync with its transcribed text.
However, when the document is viewed in the details tab, the text of
all pages is concatenated and shown to the user. Setting the
``OCR_AUTOMATIC_OCR`` option to ``True`` causes all newly uploaded
documents to be queued for OCR automatically.
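
The flow described above (queueing on upload, a per-node limit, parser
or OCR fallback, per-page storage, concatenation for display) can be
sketched as follows. Only the two option names come from the text; their
values, the function names, the document dictionaries and the
``ocr_page`` callable are assumptions of this example. Documents in a
batch are handled one after another here, so real parallelism and the
call to tesseract are left out.

```python
from collections import deque

# Option names from the text; the values are examples only.
OCR_NODE_CONCURRENT_EXECUTION = 2
OCR_AUTOMATIC_OCR = True

ocr_queue = deque()


def upload(document):
    """Queue a newly uploaded document when automatic OCR is enabled."""
    if OCR_AUTOMATIC_OCR:
        ocr_queue.append(document)


def extract_text(document, parsers, ocr_page):
    """Store one text entry per page, preferring a parser over OCR."""
    parser = parsers.get(document["mimetype"])
    if parser is not None:
        document["page_text"] = parser(document)
    else:
        # No parser for this file type: fall back to OCR, page by page.
        document["page_text"] = [ocr_page(page) for page in document["pages"]]


def process_queue(parsers, ocr_page, limit=None):
    """Handle up to `limit` queued documents; a limit of 0 handles none."""
    if limit is None:
        limit = OCR_NODE_CONCURRENT_EXECUTION
    processed = []
    while ocr_queue and len(processed) < limit:
        document = ocr_queue.popleft()
        extract_text(document, parsers, ocr_page)
        processed.append(document)
    return processed


def full_text(document):
    """Concatenate the per-page text for display."""
    return "\n".join(document["page_text"])
```

A node whose limit is 0 leaves the queue untouched, which corresponds to
the recommended setup where the machine serving the web interface does
no OCR and other nodes drain the queue.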
9
docs/topics/smart_links.rst
Normal file
@@ -0,0 +1,9 @@
===========
Smart links
===========

Smart links are useful for navigation between documents. They are rule
based, but they don't create any organizational structure; they just
show the documents that match the rules as evaluated against the
metadata of the currently displayed document. The index is global,
whereas smart links depend on the document the user is currently
viewing.
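
To make the contrast with indexes concrete, here is a small sketch of a
smart link evaluated relative to one document. It assumes the simplest
possible rule, "the other document has the same value for these metadata
names"; the actual rules may support other comparisons. The function
name and the document dictionaries are hypothetical.

```python
def smart_link_matches(rule_fields, current, documents):
    """Return labels of the other documents that share the current
    document's values for every metadata name in `rule_fields`."""
    matches = []
    for document in documents:
        if document is current:
            continue
        if all(
            field in current["metadata"]
            and field in document["metadata"]
            and current["metadata"][field] == document["metadata"][field]
            for field in rule_fields
        ):
            matches.append(document["label"])
    return matches
```

Nothing is stored by this function: the result is computed from the
document being viewed and changes as soon as a different document is
passed as ``current``.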