Topedia will allow users to train the web to semantically identify related documents using plain English phraseology. The system implements several Natural Language Processing algorithms to store, relate and retrieve documents based on semantic similarity.

1. Database Analysis and Indexing (DAI)

Topedia goes beyond purely statistical keyword search and employs linguistic theories and algorithms to synthesise meaning from documents through detailed language analysis and deep knowledge representation. To this end, document databases are first analysed and indexed by the system. The DAI process consists of three modules. The most complex of these is the DatabaseAnalyser, which employs Natural Language Processing to identify subject matter, lexical markers and semantic markers. A ‘fingerprint’ of the document is then built by the DatabaseIndexer. The DatabaseIndexer then employs several NLP algorithms to identify relationships between documents and encode these relationships in an object tree. Finally, the DatabaseIndexer performs an object-relational mapping of the semantic relationships between documents and persists them to a readily available database store.

  1. DatabaseReader
    1. Takes a feed of documents (journal articles and other academic works) from an academic document repository such as DSpace.
    2. Reads documents in OpenOffice, HTML, XML and PDF formats, and can be extended to read other formats.
    3. Imports documents from a local directory.
    4. Accepts document uploads via a web interface.
  2. DatabaseAnalyser
    1. Tokenise document
    2. Establish token scores
    3. Allow the user to weight token scores
    4. Implement sentence understanding
    5. Perform probabilistic parsing and tagging
    6. Perform word sense disambiguation
  3. DatabaseStorage
    1. Perform object-relational mapping of documents to the database via a data abstraction layer that can handle semantic tagging.
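
The first three DatabaseAnalyser steps (tokenise, score, apply user weights) can be illustrated with a minimal Python sketch. This is not Topedia's implementation: the function names, the stop-word list and the choice of relative frequency as the token score are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analyser would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def tokenise(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def token_scores(tokens, user_weights=None):
    """Score each token by relative frequency, scaled by an optional user weight.

    Relative frequency is an assumed scoring scheme; tokens without an
    explicit user weight keep a weight of 1.0.
    """
    user_weights = user_weights or {}
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: (n / total) * user_weights.get(tok, 1.0) for tok, n in counts.items()}


doc = "The parser tags the tokens and the parser scores the tokens."
# The user doubles the importance of "parser" for this document.
scores = token_scores(tokenise(doc), user_weights={"parser": 2.0})
```

Keeping the user weights as a separate multiplier means the underlying frequency scores can be recomputed without losing what the user has chosen to emphasise.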

2. Document Query Interface (DQI)

The Document Query Interface allows a user to retrieve related documents from Topedia via a Natural Language search query. The QueryParser employs computational linguistics to analyse the query and determine semantic links between documents. The QueryProfiler implements Computer-Aided Language Learning through historical analysis of queries to determine trends. The heart of the DQI is the QueryEngine, which traverses the semantic object tree stored in the database and uses entailment and contradiction logic to answer questions posed in natural language instead of simply returning documents that match keywords.

  1. QueryParser
    1. Performs Natural Language Processing on the query to identify semantic information. Longer queries will provide more contextual data and enable the QueryEngine to provide more semantically relevant results.
    2. Takes input via a web interface and in an XML query format.
    3. Pluggable grammar rules engine allows more accurate parsing analysis.
  2. QueryProfiler
    1. Records historical queries and analyses trends in querying.
    2. Allows query analysis comparisons between users.
    3. The most common and the most useful queries can be profiled.
  3. QueryEngine
    1. Performs ORM from the database to a semantically coded object tree.
    2. Enables searches across the object tree.
    3. Takes into account context, semantic, lexical and user weightings.
    4. Provides relevant documents in a side preview pane.
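
To make the ranking idea concrete, here is a small Python sketch of how a query could be matched against stored documents with user weightings applied. It uses a plain bag-of-words cosine similarity as a stand-in for traversing the semantic object tree, so it shows the shape of the ranking step only. The names `fingerprint`, `cosine` and `rank`, and the sample documents, are hypothetical.

```python
import math
import re
from collections import Counter


def fingerprint(text):
    """Bag-of-words vector; a simple stand-in for a semantic fingerprint."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, documents, user_weights=None):
    """Return ids of matching documents, best first.

    user_weights maps a document id to a multiplier (assumed default 1.0).
    Documents with no overlap with the query are left out.
    """
    user_weights = user_weights or {}
    q = fingerprint(query)
    scored = [
        (cosine(q, fingerprint(text)) * user_weights.get(doc_id, 1.0), doc_id)
        for doc_id, text in documents.items()
    ]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


documents = {
    "parsing": "probabilistic parsing and tagging of sentences",
    "storage": "object relational mapping to a database",
    "senses": "word sense disambiguation for ambiguous sentences",
}
plain = rank("parsing sentences", documents)
boosted = rank("parsing sentences", documents, user_weights={"senses": 3.0})
```

In this sketch a user weighting can reorder the results but cannot surface a document that shares nothing with the query, which keeps the preview pane limited to documents with at least some overlap.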

Common uses of Topedia

  1. Plagiarism Detection

Academic journals (online and scanned), primary source documents, secondary source documents and academic book reviews can be fed into the system and analysed. Student work can then be semantically assessed in Topedia to detect overt plagiarism.

An interface to the academic institution’s online submission process (e.g. Moodle) can detect plagiarism and automatically notify the examining body for follow-up action.
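
As a rough illustration of flagging overt plagiarism, the Python sketch below measures how many of a submission's word trigrams also occur in a source document. This is surface overlap only, swapped in for the semantic assessment described above, and the function names, the 0.5 threshold and the sample texts are assumptions for the example.

```python
import re


def shingles(text, n=3):
    """Set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(submission, source, n=3):
    """Fraction of the submission's n-grams that also occur in the source."""
    sub = shingles(submission, n)
    if not sub:
        return 0.0
    return len(sub & shingles(source, n)) / len(sub)


def flag(submission, sources, threshold=0.5):
    """Ids of sources whose overlap with the submission meets the threshold.

    The threshold is an assumed value and would need tuning on real data.
    """
    return sorted(
        source_id
        for source_id, text in sources.items()
        if containment(submission, text) >= threshold
    )


sources = {
    "journal-1": "semantic similarity is used to relate and retrieve documents",
    "review-7": "the book reviews early work on keyword search",
}
submission = "In my view semantic similarity is used to relate and retrieve documents"
flagged = flag(submission, sources)
```

The list returned by `flag` is the kind of result an interface to the submission process could pass on to the examining body.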

  2. Bibliographic Data Generation

Students can use Topedia to generate an initial bibliography for assignments.

  3. Content Writing For Academic Research

Key initial documents for academic research can be sourced within the Topedia website document platform based on semantic relevance.
As researchers type, suggested resources will appear in the right pane of the platform.
Any selected text will appear in the document as a quote, and its reference will be annotated in the footer of the document.

NLP Performance Objectives
