t3as: Text Analytics as a Service

Manifesto

Welcome to the NICTA Text Analytics As A Service (t3as) project page.

We want to make text analysis openly available to end users through open-source, web-based services. Text analytics is a significant bottleneck for data analysis: outside research projects, text data remains largely unused, so there is a real need for tools that can analyse unstructured text.

Text analytics is an artisanal (or cottage) industry that does not yet lend itself to engineered processes: it lacks the standardisation needed to build technology solutions from off-the-shelf components.

We want to build a framework that delivers the benefits of open-source text analytics, whilst overcoming the barriers of open data analysis.

Blog

You can read more about our latest projects on our blog.

GitHub

All our open-source projects are available from the NICTA GitHub space. Please contribute: we gladly accept pull requests for fixes and new features!

Projects

These are our published projects:

PDF Redaction

This project automates the removal of private or sensitive information from PDF documents before they are released in Freedom of Information responses or as Open Data.

Website: http://redact.t3as.org/
GitHub: https://github.com/NICTA/t3as-redact
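To illustrate the idea, here is a minimal sketch of pattern-based redaction applied to text that has already been extracted from a PDF. The patterns, the labels, and the `find_spans`/`redact` helpers are hypothetical and are not taken from t3as-redact. A real redaction tool would also need to remove the matched content from the PDF itself, and detecting names calls for an entity recogniser rather than regular expressions.

```python
import re

# Hypothetical detection rules for illustration; not the t3as-redact rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Assumes Australian-style numbers such as "(02) 9123 4567".
    "PHONE": re.compile(r"\(?\d{2}\)?[ -]?\d{4}[ -]?\d{4}"),
}


def find_spans(text):
    """Return (start, end, label) spans matched by any pattern, sorted by start."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def redact(text, spans):
    """Replace each span with a [LABEL] placeholder.

    Spans are applied from right to left so that earlier offsets stay valid.
    Overlapping spans are assumed not to occur.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + "[" + label + "]" + text[end:]
    return text


if __name__ == "__main__":
    sample = "Contact Jane on (02) 9123 4567 or jane.doe@example.org."
    print(redact(sample, find_spans(sample)))
```

Run on "Contact Jane on (02) 9123 4567 or jane.doe@example.org.", this produces "Contact Jane on [PHONE] or [EMAIL]."; the name "Jane" is left untouched, which is exactly the gap a pattern-only approach leaves.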


NICTA Named Entity Recogniser

The NICTA Named Entity Recogniser is a rule-based recogniser, written in Java, that extracts named entities such as organisation, location, and person names, as well as dates, from text.

Website: http://ner.t3as.org/
GitHub: https://github.com/NICTA/nicta-ner
Maven Central: org.t3as
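To show what "rule-based" means in practice, here is a toy recogniser built from a handful of hand-written rules: a date pattern, runs of capitalised words, and cue words for personal titles, organisation suffixes, and location prepositions. These rules and the `recognise` function are hypothetical and far simpler than those in the NICTA Named Entity Recogniser, which is written in Java rather than Python.

```python
import re

# Toy rules for illustration only; NOT the rules used by the NICTA recogniser.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")
CAPS_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")  # runs of capitalised words
ORG_SUFFIXES = {"University", "Ltd", "Corporation", "Institute"}
PERSON_TITLES = {"Mr", "Mrs", "Ms", "Dr"}
LOCATION_CUES = {"in", "at", "from"}


def recognise(text):
    """Return (phrase, label) pairs found by a few hand-written rules."""
    date_matches = list(DATE_RE.finditer(text))
    entities = [(m.group(), "DATE") for m in date_matches]
    date_spans = [m.span() for m in date_matches]

    for m in CAPS_RE.finditer(text):
        if any(s <= m.start() < e for s, e in date_spans):
            continue  # month names inside dates are already labelled
        words = m.group().split()
        preceding = text[:m.start()].split()
        prev = preceding[-1] if preceding else ""
        if words[0] in PERSON_TITLES and len(words) > 1:
            entities.append((" ".join(words[1:]), "PERSON"))  # "Dr Smith" -> "Smith"
        elif words[-1] in ORG_SUFFIXES:
            entities.append((m.group(), "ORGANISATION"))
        elif prev in LOCATION_CUES:
            entities.append((m.group(), "LOCATION"))
    return entities


if __name__ == "__main__":
    print(recognise("Dr Smith joined Macquarie University in Sydney on 12 March 2014."))
```

On "Dr Smith joined Macquarie University in Sydney on 12 March 2014." this returns the date "12 March 2014", the person "Smith", the organisation "Macquarie University", and the location "Sydney". Rules like these are transparent and need no training data, but every new pattern has to be written and maintained by hand.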


Patent Classifications web services

This project contains two web services, plus associated functionality, for searching and looking up entries in the CPC, IPC, and USPC patent classification systems.

Website: http://pat-clas.t3as.org/
GitHub: https://github.com/NICTA/t3as-pat-clas


SNOMED CT Text Analyser

A public web service (with a simple front-end) that analyses English clinical text and reports any SNOMED CT concepts it detects. The web service is deliberately simple to use, so that it is as easy as possible to integrate into third-party software. When analysing text, the web service makes use of NLM MetaMap and NLM UMLS.

Website: http://snomedct.t3as.org/
GitHub: https://github.com/NICTA/t3as-snomedct-service
Maven Central: org.t3as


Term Vector Visualization

Code for Term Vector Visualization of documents. This includes a modified version of Barnes-Hut-SNE from http://homepage.tudelft.nl/19j49/t-SNE.html, as well as code to process and display the results.

GitHub: https://github.com/NICTA/termvect-viz
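As a rough illustration of what a term vector is, the sketch below turns each document into a term-frequency vector over a shared vocabulary and compares documents by cosine similarity. The tokenizer and the function names are assumptions for illustration, not code from termvect-viz. High-dimensional vectors like these are the kind of input that t-SNE (and its Barnes-Hut approximation) projects into two dimensions for display.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphabetic tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def term_vectors(docs):
    """Return (vocabulary, one term-frequency vector per document)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # shared, alphabetically ordered terms
    vectors = [[c[t] for t in vocab] for c in counts]
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vocab, vecs = term_vectors(
        ["patent search", "patent classification search", "clinical text"])
    print(vocab)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With the three documents "patent search", "patent classification search", and "clinical text", the first two share two terms and have cosine similarity 2/√6 ≈ 0.82, while the third shares none and scores 0.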