Difference between revisions of "Technology at Shakerpedia"

From Shaker Pedia

 

Latest revision as of 16:28, 1 September 2018

Shakerpedia is built using MediaWiki, the same platform used by Wikipedia.

It is hosted on a Linux server, using MariaDB (a community-developed fork of MySQL).
Memoirs is custom coded using the CodeIgniter PHP framework.

There is also an Elasticsearch collection hosted on an Amazon cloud instance.

The real core of the Memoirs project is a set of more than 16 data tables (as of 9/1/2018) describing individuals, derived from many sources such as state and federal census records, individual researchers, Western Reserve records, and the Shaker Manifesto. The tables are merged into a summary table, keyed by the person's name, village, and birth or death dates. This allows a single common search page across all sources.
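The merge described above can be pictured with a small sketch. It is only an illustration, not the project's actual code: the table names, column names, and sample rows are made up, and using the birth date when present and the death date otherwise is a simplifying assumption about how "birth or death dates" enters the key.

```python
from collections import defaultdict


def person_key(row):
    """Build a merge key from name, village, and birth or death date.

    Column names ("name", "village", "birth", "death") are hypothetical.
    """
    name = " ".join(row["name"].lower().split())   # normalize case and spacing
    village = row["village"].strip().lower()
    date = row.get("birth") or row.get("death") or ""
    return (name, village, date)


def build_summary(tables):
    """Map each person key to the list of source tables that mention it."""
    summary = defaultdict(list)
    for table_name, rows in tables.items():
        for row in rows:
            summary[person_key(row)].append(table_name)
    return dict(summary)


# Made-up sample data standing in for two of the source tables.
tables = {
    "census_1850": [
        {"name": "Jane  Doe", "village": "North Union", "birth": "1820"},
    ],
    "manifesto": [
        {"name": "jane doe", "village": "north union ", "birth": "1820"},
        {"name": "John Roe", "village": "Watervliet", "death": "1881"},
    ],
}

summary = build_summary(tables)
for key, sources in summary.items():
    print(key, sources)
```

A search page can then look a person up once in the summary and follow the listed sources back to the detailed tables.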

In addition, various journals and research collections have been digitized and are displayed in a 'table of contents' (TOC) format. They are searchable using Elasticsearch, which shows search summaries and points back to the TOC entries. These can all be found at: Collections (http://memoirs.shakerpedia.com/collections/)
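As a rough sketch of what such a search request might look like, the function below builds an Elasticsearch query body that matches text and asks for highlighted snippets, which is one way to produce search summaries. The field names (`text`, `collection`, `toc_entry`) are assumptions for illustration; the real index mapping is not described here.

```python
import json


def toc_search_body(terms, size=10):
    """Build a query body for full-text search with highlighted snippets.

    Field names are hypothetical placeholders for the real index mapping.
    """
    return {
        "size": size,
        "query": {"match": {"text": terms}},
        # Highlighting returns short matching fragments usable as summaries.
        "highlight": {"fields": {"text": {}}},
        # Return only the fields needed to link back to the TOC entry.
        "_source": ["collection", "toc_entry"],
    }


body = toc_search_body("meeting house")
print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint; each hit's `toc_entry` value would then be used to link back to the table of contents.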

The various data sources are processed into database and search engine loading formats, largely using the Perl scripting language. Input sources range from simple text, HTML pages, spreadsheets (mostly Excel), and PDF documents to images of text. Printed documents are scanned using a multi-function scanner/printer or an Epson GT-S50 sheet scanner, with ABBYY OCR software. Various open source Linux tools, such as ImageMagick, Ghostscript (gs), pdftk, and Tesseract, are used to convert raw images into text.
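The image-to-text step can be sketched as a chain of command lines. This is an assumed pipeline, not the project's actual scripts (which are largely Perl): the file names, resolution, and cleanup options are illustrative, and only commonly documented options of gs, ImageMagick, and tesseract are used.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, out_dir, dpi=300):
    """Ghostscript command that renders each PDF page to a PNG image."""
    pattern = str(Path(out_dir) / "page-%03d.png")
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m",
            f"-r{dpi}", f"-sOutputFile={pattern}", str(pdf)]


def cleanup_cmd(png):
    """ImageMagick command that converts to grayscale and deskews a page.

    Uses the classic `convert` entry point; ImageMagick 7 also offers `magick`.
    """
    png = Path(png)
    cleaned = png.with_name(png.stem + "-clean.png")
    return ["convert", str(png), "-colorspace", "Gray",
            "-deskew", "40%", str(cleaned)]


def ocr_cmd(png, lang="eng"):
    """Tesseract command; it writes <output base>.txt next to the image."""
    png = Path(png)
    return ["tesseract", str(png), str(png.with_suffix("")), "-l", lang]


def run(cmd):
    """Run one step, raising if the tool exits with an error."""
    subprocess.run(cmd, check=True)


# Print the commands for a hypothetical scan instead of executing them.
for cmd in (rasterize_cmd("scan.pdf", "pages"),
            cleanup_cmd("page-001.png"),
            ocr_cmd("page-001-clean.png")):
    print(" ".join(cmd))
```

Building the commands as lists keeps file names with spaces safe and makes each step easy to inspect before calling `run` on it.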