Getting Started

In Consilience, a registered user can quickly read, understand, categorize, and derive insights from huge quantities of unstructured text.

Document Preparation

  • Each document should be a plain-text (.txt) file. Documents must be uploaded in a zip or tar archive, and each set must contain at least ten documents.
  • The metadata file must be in CSV format and, at minimum, must contain a column header named “Filename”. Do not include it in the document archive; upload it separately.

Sample of a Document file:

The translation of citizen votes into legislative seats is of central importance in democratic electoral systems. It has been a longstanding
concern among scholars in political science and in numerous other disciplines.

Sample of a Metadata file:

Date        Category        Filename
26-Jun-06   20Energy        26Jun202006Lautenberg20Energy.txt
30-Nov-04   153Environment  30Nov202004Lautenberg153Environment.txt
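
The preparation rules above can be checked locally before uploading. The following is a minimal sketch, not part of Consilience: the function name check_document_set is hypothetical, it handles zip archives only (tar is not covered), and the cross-check between the metadata “Filename” column and the archive contents is an assumption about what is useful to verify, not a documented Consilience rule.

```python
import csv
import io
import zipfile


def check_document_set(archive, metadata_text, min_docs=10):
    """Return a list of problems found; an empty list means every check passed.

    archive       -- path or file object of a zip archive of .txt documents
    metadata_text -- contents of the metadata CSV file, as a string
    """
    problems = []

    # Collect the base names of the .txt members; other entries are ignored.
    with zipfile.ZipFile(archive) as zf:
        documents = {
            name.rsplit("/", 1)[-1]
            for name in zf.namelist()
            if name.lower().endswith(".txt")
        }
    if len(documents) < min_docs:
        problems.append(
            f"archive has {len(documents)} documents, need at least {min_docs}"
        )

    # The metadata file needs a "Filename" column header.
    reader = csv.DictReader(io.StringIO(metadata_text))
    if "Filename" not in (reader.fieldnames or []):
        problems.append('metadata has no "Filename" column')
        return problems

    # Assumption: each metadata row should refer to a document in the archive.
    listed = {row["Filename"] for row in reader if row["Filename"]}
    unknown = sorted(listed - documents)
    if unknown:
        problems.append(
            "metadata lists files not in the archive: " + ", ".join(unknown)
        )
    return problems
```

For example, with hypothetical file names, check_document_set("documents.zip", open("metadata.csv", newline="").read()) returns an empty list when the archive and metadata pass these checks.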

Add Document Set

A Document Set is a collection of unstructured text organized in text files, together with file metadata.

  • Upload Documents:
    • Documents must be placed in a zip or tar archive containing at least ten documents before upload. Click the Upload Documents button to select the zip or tar file to upload. A single document set can receive multiple uploads.
  • Upload Metadata:
    • Click the Upload Metadata button to upload the CSV file containing document metadata. Uploading metadata is optional; when a metadata file is uploaded, it is validated.
  • Enter Set Name:
    • Enter the full name by which the document set is known. This is a required field.
  • Description:
    • A summary describing the purpose, nature, and scope of the document set.
  • Collection:
    • A group of document sets put together in order to study them.

Click Create Document Set to complete this step.

Compute Clustering Map

  • Stop words:
    • A file containing words that are filtered out during data processing. The SMART stop word list is selected by default; alternatively, you can upload a custom file.
  • Number of Clusters:
    • A number or range of clusters for processing.
  • Term Filter:
    • The percentage range of documents that contain a term. This field is validated with the rule “Value must be between Min Doc % to Max. Doc %”.
  • Stemmer Language:
    • The language used by the stemming algorithm. The default is English.
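
To illustrate how the Stop words and Term Filter settings interact, here is a minimal sketch of stop-word removal followed by a document-percentage filter. It is not Consilience's implementation: the function name filter_terms, the lowercase whitespace tokenization, and the inclusive bounds are all assumptions made for illustration, the SMART list is not reproduced, and stemming is omitted.

```python
from collections import Counter


def filter_terms(documents, stop_words, min_doc_pct, max_doc_pct):
    """Return the sorted terms whose document percentage lies in the given range.

    documents   -- list of document texts
    stop_words  -- set of lowercase words to discard before counting
    min_doc_pct -- lower bound, as a percentage of documents (inclusive)
    max_doc_pct -- upper bound, as a percentage of documents (inclusive)
    """
    if not 0 <= min_doc_pct <= max_doc_pct <= 100:
        raise ValueError("expected 0 <= min_doc_pct <= max_doc_pct <= 100")
    if not documents:
        return []

    # Count, for each term, how many documents contain it at least once.
    doc_frequency = Counter()
    for text in documents:
        terms = set(text.lower().split()) - stop_words
        doc_frequency.update(terms)

    total = len(documents)
    return sorted(
        term
        for term, count in doc_frequency.items()
        if min_doc_pct <= 100 * count / total <= max_doc_pct
    )


docs = ["the cat sat", "the dog sat", "the bird flew", "a cat flew"]
kept = filter_terms(docs, {"the", "a"}, min_doc_pct=50, max_doc_pct=100)
print(kept)  # ['cat', 'flew', 'sat']
```

In this toy input, "dog" and "bird" each appear in only one of four documents (25%), so a 50% lower bound drops them, while the stop words never reach the counting step.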

Data Exploration

Start New Workspace or explore the following areas.

  • Clustering Space:
    • A new or existing Workspace presents an initial clustering and four alternate clusterings to view, shown as red and gray dots, respectively, in the Clustering Space.
    • Click on different areas of the Clustering Space to view additional clusterings.
  • Document set Cluster of Clusterings Table:
    • For each clustering there is a table of clusters in that clustering.
    • Click the drop-down arrow next to # Documents to explore individual clusters, view the documents listed for a cluster, and preview individual documents.
    • Click the toggle icon for a better view of the table.
    • Choose to rename individual clusterings or the entire workspace.
  • Clustering History:
    • Exploring the Clustering Space records a history of the clusterings viewed. Clicking a history item highlights the corresponding clustering in the Clustering Space.

Selecting the “Show All User Workspaces” checkbox displays all public document sets and workspaces.