Education





Problem

In the past few years, the use of smart speakers has risen dramatically. We use these devices for purposes such as asking questions, streaming music, setting alarms, and checking the weather. In many cases, these devices perform simple tasks flawlessly but struggle with more complex requests. My research interest is to compare human intelligence and artificial intelligence in smart speakers, and to explore how the two can be integrated to get the best output.



Motivation

In the earlier literature, the smart speaker has been extensively tested in both personal and public environments. In one such study, human-powered (HPD) and machine-powered (MPI) devices were placed at different locations in Dharavi. My motivation for this project comes from that work: while conducting qualitative interviews with shopkeepers for [5], I felt that both HPD and MPI had potential for emergent users. Each had its pros and cons, however. HPD was better at understanding context and gave elaborate responses, but users did not appreciate the 10-minute waiting time. MPI answered instantly, but it could not handle complex or local questions, even when they were relatively simple.

Also, none of the studies conducted in India has performed a longitudinal evaluation with a conversational agent (CA). My aim in this project is to explore what happens when both systems are combined and deployed in the domestic settings of emergent users. A further aim is to understand the potential use cases for both systems. Lastly, the project will help uncover how well these smart speakers understand Indian languages, dialects, and accents.

This project could answer the following research questions:
I. Given a choice, which device would a user prefer?
II. Do HPD and MPI receive the same kinds of questions as in the previous study?
III. If users are not satisfied with an answer, do they switch to the other device? If yes, in which cases does this happen, and for which types of questions?
IV. After a period of use, does the user develop a preference for one system? On what parameters is this preference based?
V. How can these smart speakers be useful for emergent users?


Prototypes



Google Powered




The prototype shown in the image (left) is built using the Google Voice Kit version 2. The kit consists of a Raspberry Pi and a Voice HAT, and it is powered by Google Assistant.
The interaction with the prototype was as follows:
1. The user presses the button. The lights blink, which serves as a cue that the device is listening to the question.
2. The user asks a question.
3. After detecting silence, the prototype processes the question.
4. The prototype delivers the answer or an error.
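
The interaction steps above can be read as a small state machine. The following is a minimal sketch in Python under that assumption, not the prototype's actual code: the state names, the event strings, and the `step` and `run` functions are all hypothetical, and no Voice Kit or Google Assistant calls are made.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # waiting for a button press
    LISTENING = auto()   # lights blink, the user asks a question
    PROCESSING = auto()  # silence detected, question being processed
    ANSWERING = auto()   # answer or error being delivered


# Hypothetical event names; the real prototype reacts to hardware and audio signals.
TRANSITIONS = {
    (State.IDLE, "button_pressed"): State.LISTENING,
    (State.LISTENING, "silence_detected"): State.PROCESSING,
    (State.PROCESSING, "answer_ready"): State.ANSWERING,
    (State.PROCESSING, "error"): State.ANSWERING,
    (State.ANSWERING, "playback_done"): State.IDLE,
}


def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the machine and return every state visited."""
    visited = [state]
    for event in events:
        state = step(state, event)
        visited.append(state)
    return visited
```

Calling `run(["button_pressed", "silence_detected", "answer_ready", "playback_done"])` walks through one full question-and-answer cycle and ends back in the idle state. Both an answer and an error lead to the same delivery state, matching the last step of the list.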



I decided to use the human-powered delay (HPD) version from our earlier study (Streetwise). Although the HPD was developed for public places, its advantage is that the user can ask multiple questions at a time, with no need to retrieve an answer before asking the next question. To develop the HPD prototype, I built and assembled a replica of the previous one. The interaction steps for this prototype are shown in the image below.



Human Powered
Steps for interaction

Method

I decided to deploy each system separately for a period of 6 days and then both together for the same period. This method has an advantage over the previous one: because each system is deployed separately, the user gets a chance to explore each one and learn its pros and cons. It also reduces the chance of the user developing an inclination towards one system without sufficiently using the other. Using this method, I deployed the systems in two rounds.
Round 1 - Deployment of each prototype individually for 6 days.
Round 2 - Deployment of both prototypes together.
First, the Google-powered prototype was deployed, followed by the human-powered one. Later, both will be deployed together. After the deployment of each prototype was completed, the qualitative questions listed in Appendix A were asked.
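
The deployment plan can be laid out as a simple schedule. The sketch below is only an illustration, assuming the three phases run back to back with no gap between them, which the text does not state. The `build_schedule` function and any start date passed to it are hypothetical; the 6-day length and the phase order come from the method described here.

```python
from datetime import date, timedelta

DAYS_PER_PHASE = 6  # from the method: each phase lasts 6 days

# Phase order follows the text: Google powered, then human powered, then both.
PHASES = [
    ("Round 1", ["Google powered"]),
    ("Round 1", ["Human powered"]),
    ("Round 2", ["Google powered", "Human powered"]),
]


def build_schedule(start):
    """Return (round, devices, first_day, last_day) tuples for back-to-back phases."""
    schedule = []
    first_day = start
    for round_name, devices in PHASES:
        last_day = first_day + timedelta(days=DAYS_PER_PHASE - 1)
        schedule.append((round_name, devices, first_day, last_day))
        # Assumption: the next phase starts the day after this one ends.
        first_day = last_day + timedelta(days=1)
    return schedule
```

Under these assumptions, one household is covered in 18 days: 12 days for Round 1 and 6 days for Round 2.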


In progress

Quantitative and Qualitative Analysis
Final Prototype
Main Study