Which hardware and software are required to develop the Historic Voicebot?

Creating an interactive installation with a voicebot requires a number of different components.

The technical architecture of the Historic Voicebot was created based on the functional analysis, which can be found in the appendix.

The Historic Voicebot consists of three main parts: the interactive installation, the frontend and the backend.

Architecture of the Historic Voicebot

Interactive Installation

Raspberry Pi – Google AIY Voicekit

A crucial aspect of the interactive installation is that it can capture speech, so visitors can ask questions. It also needs to reply out loud through a speaker.

A very practical and easy setup to achieve this is the Google AIY Voicekit. This is a DIY kit to play around with voice-controlled AI, containing a Raspberry Pi, a microphone and a speaker.

I’ve chosen the AIY Voicekit because it’s easy to use and it can be integrated into almost any type of physical installation. On top of this, it’s also a very affordable solution.

The Raspberry Pi will capture speech with the microphone, turn this into text and send it to the server in the backend. The answer returned by the server will be turned into speech and announced via the speaker attached to the Raspberry Pi.
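The round trip described above can be sketched roughly as follows. This is only a minimal sketch in Python, not the final implementation: the `listen`, `ask` and `speak` callables are hypothetical stand-ins for the speech-to-text, server call and text-to-speech steps, and the JSON field names (`source`, `text`, `answer`) are assumptions about a server API that has not been defined yet.

```python
import json
from typing import Callable
from urllib import request


def build_question_request(server_url: str, text: str) -> request.Request:
    """Package the recognised text as a JSON POST request for the backend."""
    # The field names "source" and "text" are assumptions, not a fixed API.
    body = json.dumps({"source": "microphone", "text": text}).encode("utf-8")
    return request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_server(server_url: str, text: str) -> str:
    """Send the question to the backend and return the answer text."""
    with request.urlopen(build_question_request(server_url, text)) as response:
        # Assumes the server replies with JSON containing an "answer" field.
        return json.loads(response.read().decode("utf-8"))["answer"]


def handle_one_question(
    listen: Callable[[], str],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one round trip: speech in, text to the server, answer out as speech."""
    question = listen()      # speech-to-text on the Raspberry Pi
    answer = ask(question)   # round trip to the backend server
    speak(answer)            # text-to-speech through the attached speaker
    return answer
```

Passing the three steps in as functions keeps the loop testable without a microphone or speaker attached; on the real device they would be backed by the Voicekit's audio hardware and by `ask_server`.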

The microphone and the speaker of the AIY Voicekit will be integrated into a vintage telephone, creating an interesting physical object that will catch the eye of museum visitors.


The second aspect of the interactive installation is a touchscreen. Museum visitors can use this screen to type their own questions or select a frequently asked one, to start a conversation with the historical figure.

This screen also shows the output of the Voicebot as text (subtitles), as well as an animated figure that represents the historic person.  



The main part of the Historic Voicebot, the brains of the operation, will be provided by the Node.js server. The server orchestrates everything, and is connected to Dialogflow, the frontend graphics, the fact extraction software, the database and a management console.


Dialogflow is Google’s conversational agent software. It integrates well with the AIY Voicekit, which is one of the reasons why I’ve chosen it. It also offers a lot of practical features, including built-in small talk and webhooks.

The server will send the text input from both the touchscreen and the microphone to Dialogflow, which will return the intent of the user. The server will then match this intent to the facts in the database and compose an answer. This answer will be sent both to the Raspberry Pi for voice output and to the HTML Canvas frontend for graphical output.
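To make this flow more concrete, here is a minimal sketch of the steps: detect the intent, look up a matching fact, and send the same answer to every output. It is written in Python purely for illustration; the actual server is planned in Node.js. The `detect_intent` function is a hypothetical placeholder for the Dialogflow call, and the facts table and fallback sentence are invented sample data.

```python
from typing import Callable, Dict, Iterable

# Hypothetical sample data: intent name -> answer text.
FACTS: Dict[str, str] = {
    "birthplace": "I was born in a small town by the river.",
}
FALLBACK = "I am afraid I do not know that."


def detect_intent(text: str) -> str:
    """Placeholder for the Dialogflow call, which would return the intent."""
    return "birthplace" if "born" in text.lower() else "unknown"


def answer_question(
    text: str,
    facts: Dict[str, str],
    outputs: Iterable[Callable[[str], None]],
) -> str:
    """Match the detected intent to a fact and send the answer to all outputs."""
    intent = detect_intent(text)
    answer = facts.get(intent, FALLBACK)
    for send in outputs:
        # One callable per output channel, e.g. voice and screen.
        send(answer)
    return answer
```

In this sketch the voice output and the graphical output are simply two callables in `outputs`, so both always receive the same answer text.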


Another important puzzle piece is fact-extraction software, to turn texts about historical figures into facts that can be used to answer questions.

There are several software packages available, and I would like to test a few of them to see which one works best.
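To illustrate what "turning text into facts" could mean, below is a toy sketch in Python. It is not one of the packages to be tested: it only uses two hand-written patterns, and the `Fact` structure (subject, relation, value) is my own assumption about a useful output format. The example person is fictional.

```python
import re
from typing import List, NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    value: str


# Two hand-written patterns; a real fact extractor covers far more cases.
PATTERNS = [
    ("born_in", re.compile(r"^(?P<subject>.+?) was born in (?P<value>.+?)\.?$")),
    ("worked_as", re.compile(r"^(?P<subject>.+?) worked as an? (?P<value>.+?)\.?$")),
]


def extract_facts(text: str) -> List[Fact]:
    """Split the text into sentences and collect every pattern match as a Fact."""
    facts = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for relation, pattern in PATTERNS:
            match = pattern.match(sentence)
            if match:
                facts.append(Fact(match["subject"], relation, match["value"]))
    return facts


if __name__ == "__main__":
    sample = "Jane Example was born in Ghent. Jane Example worked as a painter."
    for fact in extract_facts(sample):
        print(fact)
```

Whatever package is chosen in the end, having its output in a uniform structure like this would make it easier to compare candidates on the same source texts.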


The database of the Historic Voicebot will contain two main sets of data. First of all, it will store the facts about the historical person that were produced by the fact extractor.

Secondly, it will also contain data about the museum itself, for instance a number of interesting facts and the location of items and rooms within the museum.

This database will be connected to the Node.js backend so the facts can be used to create responses to the input of visitors.
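The two sets of data could be laid out along the following lines. This is a sketch under assumptions: the database engine has not been chosen yet, so SQLite is used here only as a stand-in, and all table and column names are hypothetical. The example is in Python for illustration, while the real queries would run from the Node.js backend.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: one table per data set described above.
SCHEMA = """
CREATE TABLE person_facts (
    id        INTEGER PRIMARY KEY,
    topic     TEXT NOT NULL,      -- e.g. the intent this fact answers
    statement TEXT NOT NULL,
    source    TEXT                -- text the fact was extracted from
);
CREATE TABLE museum_info (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL,       -- 'fact', 'item' or 'room'
    name     TEXT NOT NULL,
    location TEXT
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def facts_about(conn: sqlite3.Connection, topic: str) -> List[str]:
    """Return all stored statements for one topic, oldest first."""
    rows = conn.execute(
        "SELECT statement FROM person_facts WHERE topic = ? ORDER BY id",
        (topic,),
    )
    return [row[0] for row in rows]


def locate(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Return the location of a museum item or room, or None if unknown."""
    row = conn.execute(
        "SELECT location FROM museum_info WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the person facts and the museum data in separate tables means the museum information can be reused if the installation later features a different historical figure.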



The frontend consists of an HTML Canvas page that shows the output of the Voicebot, via an animated version of the historic person with subtitles.

Visitors can also use this screen to type their own question or select a frequently asked one, to start a conversation with the historical figure.

Management Console

The management console is a simple graphical user interface (GUI) for the administrator of the Historic Voicebot to access and manage certain functions. This could for instance include access to the database and an overview of the most recent interactions with the voicebot.
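As an illustration of the "most recent interactions" overview, the sketch below keeps a bounded in-memory log that the console could read from. It is an assumption about one possible approach, shown in Python; the class and field names are hypothetical, and a real console might read from the database instead.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Interaction:
    source: str      # 'microphone' or 'touchscreen'
    question: str
    answer: str


class InteractionLog:
    """Keeps only the latest interactions so memory use stays bounded."""

    def __init__(self, max_entries: int = 50) -> None:
        # A deque with maxlen drops the oldest entry automatically.
        self._entries: Deque[Interaction] = deque(maxlen=max_entries)

    def record(self, source: str, question: str, answer: str) -> None:
        self._entries.append(Interaction(source, question, answer))

    def most_recent(self, count: int = 10) -> List[Interaction]:
        """Return up to `count` interactions, newest first."""
        if count <= 0:
            return []
        return list(self._entries)[-count:][::-1]
```

The server would call `record` after every answered question, and the console would call `most_recent` to fill its overview.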