The OPTIMA Action’s scientific and technical support is underpinned by the following R+D activities:
- Internet/Web Mining: A huge amount of information is harvested from the Internet. However, the way data is published on the Internet is constantly changing, so the techniques and tools needed to harvest it must change as well. As the Internet moves towards Web 3.0, the exchange between the client (in this case the harvesting software) and the server (where the data is located) is becoming increasingly interactive, and data can no longer be guaranteed to be available as a static resource such as an HTML page. Furthermore, the rapid growth of multi-lingual information in open sources and the growing variety and complexity of those sources (media, blogs, discussion forums, chat rooms, Twitter, etc.) call for different and novel techniques. OPTIMA’s R+D focuses on developing advanced techniques for automatically mining, extracting, analysing and monitoring relevant information and knowledge from traditional and emerging multi-lingual open sources.
- Computational Linguistics: The OPTIMA Action applies all of its solutions in a multi-lingual environment, which adds extra challenges to every problem OPTIMA attempts to solve. The algorithms and techniques used for recognising keywords, entities and place names have to be devised so that they are as language-agnostic as possible. Where semantic analysis is unavoidable, as is the case for event extraction, the algorithms developed by OPTIMA have to keep their dependency on language-specific resources to a minimum. OPTIMA’s other research activities include opinion mining/sentiment detection, machine translation, and text summarisation.
- Computer Science: For any of the solutions devised to be effective, the derived information has to be available in a timely manner. Especially in the fields of media monitoring, threat detection, and crisis situation awareness, the required analysis must be carried out immediately after the underlying information is published. Given the amount of information published and processed, this means that solutions developed by OPTIMA must be inherently capable of dealing with large quantities of data in near real-time. This leads OPTIMA to use fundamental computer science concepts not normally applied in this context.