Hey Mediasphere! Show me insects!

Jens Dobberthin

29.01.2019 | 10 minutes reading time

Wouldn’t it be great if our research portal obeyed our spoken commands? Our intern Douglas gave it a try.

What does it take?

Microphone

For this we decided to use the ReSpeaker Core v2.0. It is actually more than just a microphone: it is a mini-computer designed specifically for developing voice-controlled assistants. Six microphones are arranged in a ring on the circuit board (shown on the right in the video below), which makes it possible to detect the direction a voice is coming from. In addition, special algorithms work in the background to filter out speech even in noisy environments.

Software for speech recognition

Our commands have to be understood somehow. For example, the command ‘Show insects!’ should make the research portal show insects. To implement this, we use Zamia Speech, a collection of tools for automatic speech recognition that runs locally, without a cloud service.

How does it all work?

The minicomputer converts the audio signals recorded by the microphones into a continuous data stream and sends it to the laptop. There, the speech recognition software runs in the background, evaluating the data stream and translating it into text. This text can then be searched for key phrases such as ‘Show insects’ or ‘Show snails’. Finally, the research portal is instructed to show insects or snails.
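The last two steps, searching the recognized text for a key phrase and turning it into a request to the portal, could look roughly like the following. This is only a minimal sketch and not Douglas’s actual code: the function names, the list of known terms, and the portal URL with its `q` parameter are all assumptions made up for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical values for illustration only; the real portal URL and
# its query parameter are not given in this post.
PORTAL_SEARCH_URL = "https://portal.example.org/search"
KNOWN_TERMS = {"insects", "snails"}


def normalize(text: str) -> str:
    """Lower-case the transcript and strip punctuation and extra spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def extract_search_term(transcript: str) -> Optional[str]:
    """Return the first known term that follows 'show' (or 'show me')."""
    words = normalize(transcript)
    for match in re.finditer(r"\bshow (?:me )?(\w+)\b", words):
        if match.group(1) in KNOWN_TERMS:
            return match.group(1)
    return None


def build_portal_url(term: str) -> str:
    """Build the (assumed) search URL the portal would be pointed at."""
    return PORTAL_SEARCH_URL + "?" + urlencode({"q": term})


if __name__ == "__main__":
    term = extract_search_term("Hey Mediasphere! Show me insects!")
    if term is not None:
        print(build_portal_url(term))
```

Restricting matches to a small set of known terms keeps misrecognized words from triggering a search, which seems sensible as long as only a handful of commands are supported.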

The test

In the following video Douglas tests his hack. With a little patience and persistence (the laptop for speech processing is not very fast), insects and snails are finally displayed.

Douglas tests the speech recognition, License: CC-BY-SA

What happens next?

The little hack already gives an impressive idea of where this could lead. But it would be even better if we could search for the scientific names of individual species. ‘Hey Mediasphere! Show Diponthus dispar!’ or ‘Hey Mediasphere! Show Asperitas notabilis!’ sounds great, doesn’t it?

For that, however, we would need a special speech corpus containing the names of the individual species. Maybe the community could create it, similar to Mozilla’s Common Voice project. A speech recognition model that can recognize the corresponding requests would then have to be trained on this data. Released as a freely available model, it could be used in a variety of applications, e.g. a voice-controlled augmented reality application.

So there is a lot of potential! Again, many thanks to Douglas for the little hack.

P.S. In the background our 3D printer is humming away, printing a suitable case for the minicomputer.