Use your computer with Voice and Joy-Con
Joy-Con Button: Video Demo
Joy-Con Stick: Video Demo
For those who want to:
- Avoid maintaining the typing posture.
- Play PC games with motion control.
How it works
1. The Mode concept
The idea comes from Vim: the tool can be configured to have multiple modes, such as normal, speech, and motion control. For example, hold down the shoulder button ZR to enter gyro mode and rotate the Joy-Con to move the mouse cursor, then release ZR to get back to normal mode.
The Joy-Con has a very limited number of buttons, but a button can trigger different actions in different modes. With more than 20 buttons across both sides, 20 buttons * 20 modes gives, in theory, 400 distinct bindings. Bind the most used actions in the default mode, or even bind the 26 letters of the alphabet across 2 modes so you can type with buttons alone, without using voice at all.
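The mode idea can be pictured as a lookup keyed by the pair (mode, button). The sketch below is only an illustration of that idea, not the project's actual data structure; the type names, mode names, and action strings are all made up for the example.

```go
package main

import "fmt"

// Binding identifies a slot by the active mode and the pressed button.
// (Hypothetical type, used only for this illustration.)
type Binding struct {
	Mode   string
	Button string
}

// Keymap maps (mode, button) pairs to an action name.
type Keymap map[Binding]string

// Lookup returns the action bound to a button in the given mode, if any.
func (k Keymap) Lookup(mode, button string) (string, bool) {
	action, ok := k[Binding{Mode: mode, Button: button}]
	return action, ok
}

func main() {
	// The same physical button "A" does different things in different modes.
	keymap := Keymap{
		{Mode: "normal", Button: "A"}: "hotkey enter",
		{Mode: "gyro", Button: "A"}:   "mouse left click",
	}

	for _, mode := range []string{"normal", "gyro", "speech"} {
		if action, ok := keymap.Lookup(mode, "A"); ok {
			fmt.Printf("%s + A -> %s\n", mode, action)
		} else {
			fmt.Printf("%s + A -> (unbound)\n", mode)
		}
	}

	// 20 buttons in each of 20 modes gives the theoretical number of slots.
	fmt.Println("theoretical slots:", 20*20)
}
```

Because the mode is part of the key, adding a mode adds a whole new layer of bindings without needing any extra physical buttons.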
2. Word Mapping
Speech words can be mapped to different actions like:
twenty twenty two->
snake hello world->
camel hello world->
control alt delete->
task explorer launched
- Run shell script:
brave-browser --no-sandbox www.test.com
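As an illustration of the case-related mappings, here is a minimal sketch of snake and camel conversion. It assumes that snake means words joined by underscores, that camel capitalizes every word except the first, and that combining camel with title also capitalizes the first letter. The function names are hypothetical and the project's real implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// capitalize upper-cases the first byte of a word.
// It assumes ASCII input, which is what an English speech model returns.
func capitalize(word string) string {
	if word == "" {
		return word
	}
	return strings.ToUpper(word[:1]) + word[1:]
}

// toSnake joins the lower-cased words with underscores.
func toSnake(text string) string {
	return strings.Join(strings.Fields(strings.ToLower(text)), "_")
}

// toCamel capitalizes every word except the first and joins them.
func toCamel(text string) string {
	words := strings.Fields(strings.ToLower(text))
	for i := 1; i < len(words); i++ {
		words[i] = capitalize(words[i])
	}
	return strings.Join(words, "")
}

// toCamelTitle is camel case with the first letter capitalized as well.
func toCamelTitle(text string) string {
	return capitalize(toCamel(text))
}

func main() {
	fmt.Println(toSnake("hello world"))
	fmt.Println(toCamel("hello world"))
	fmt.Println(toCamelTitle("hello world again"))
}
```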
3. Limit the dictionary for better accuracy
VOSK is used as the backend recognition engine. With its phrase_list parameter, it's possible to use a small dictionary. For instance, when the dictionary is limited to the alphabet, c will never be recognized as a similar-sounding word outside that list.
For programming, switch to a speech mode with a limited dictionary for typing keywords, punctuation, and numbers, and use another mode with an unlimited dictionary for variables and comments. When you find a conflict, solve it by adding a word mapping.
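To make the phrase_list idea concrete, the sketch below builds the JSON configuration message that a client would send before streaming audio. It assumes the server follows the stock vosk-server websocket protocol, where a "config" object carries "sample_rate" and "phrase_list"; whether the Docker image used here accepts exactly this message is an assumption. The Go type and function names are hypothetical, and the websocket transport is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// recognizerConfig mirrors the fields of the assumed "config" object.
type recognizerConfig struct {
	SampleRate int      `json:"sample_rate"`
	PhraseList []string `json:"phrase_list"`
}

// configMessage is the envelope sent to the server as a text frame.
type configMessage struct {
	Config recognizerConfig `json:"config"`
}

// buildConfigMessage returns the JSON text that restricts recognition
// to the given phrases.
func buildConfigMessage(sampleRate int, phrases []string) (string, error) {
	msg := configMessage{
		Config: recognizerConfig{SampleRate: sampleRate, PhraseList: phrases},
	}
	data, err := json.Marshal(msg)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	// A tiny dictionary: only these words can be recognized.
	msg, err := buildConfigMessage(16000, []string{"a", "b", "c"})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(msg)
}
```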
1. Connect Joy-Con to PC via Bluetooth
- From the system Bluetooth manager, click "scan new device".
- Hold the sync button for 2 seconds, until the lights start blinking.
- Find -> Pair -> Connect it in the BT manager.
2. Install Docker
Windows: Download and run the installer from https://docs.docker.com/desktop/install/windows-install/. If it prompts you with something like "download and install WSL 2 Linux kernel upgrade package", just install it.
Linux: apt install docker.io
Mac: I have zero experience with Mac, and there is no prebuilt binary for it, but the steps should be similar to Windows/Linux. All the dependency packages claim to support Mac, so check the next section to build it from source.
3. Run VOSK server
docker run -d -p 2701:2701 aj3423/vosk_lgraph:latest
4. Download the prebuilt binary from the release page
Build from Source
- Install the dependency:
  - Windows: Download the release file from https://github.com/libusb/hidapi/releases, which contains the header, lib and dll.
  - Linux: apt install libhidapi-dev
- Install golang from https://go.dev/dl/
- Clone this repo:
git clone https://github.com/aj3423/joy-typing
- Go to the 'main' directory and build:
go build .
- Keeps disconnecting right after connecting
The Joy-Con can ONLY be paired to one device at a time. Once you attach it back to the Switch console for charging, it is automatically re-paired to the console, so you have to remove it in the system BT manager and pair it again. I tried some hacky ways, like attaching it to the console while it was shutting down or powering up, to get it charged without being re-paired. I succeeded only once, by accident, and can't remember how, so I ended up using a dedicated charging cable.
- No sound input or inaccurate recognition
Diagnose with this tool: vosk-sound-test
It can play back your voice and save it to a .wav file, so you can verify that the sound quality is as expected.
1) Run the VOSK server: docker run -d -p 2701:2701 aj3423/vosk_lgraph:latest
2) Download the binary from the release page
3) Run it: ./vosk-sound-test -host "127.0.0.1:2701"
4) Say something, press enter to play it back, then press enter again to save it to a .wav file.
- Unexpected behavior of Joy-Con
When both side buttons (SL and SR) are pressed, the Joy-Con enters an interesting mode, maybe the mode it uses when attached. Check the lights: if the first light stays on and the other three are off, just reconnect it and make sure not to press both side buttons. BTW, in this mode the buttons are bound to some system operations, for example on the right Joy-Con:
- +: toggle on/off the events below
- R: mouse right click
- B: `Esc`
- Home: system volume down
- Stick button: system volume up
- Stick spin: mouse move
- ...
config.toml is generated at the first launch. The program monitors the file for modifications and applies new changes on the fly. The sections:
A mode id must be assigned with the -id parameter. It can be any string as long as it does not conflict with another mode id. The first mode in the list is used as the default mode.
| Mode | Description |
|---|---|
| [idle] | do nothing, normally used as default mode |
| [gyro] | enable/disable the gyroscope on enter/exit |
| [speech] | start/stop capturing audio input on enter/exit |
2. Mode Rule
A Mode does very little; the jobs are done by mode rules. There are two types of rules:
- trigger: A one-time event like a button press. When an input matches the trigger condition, the corresponding action is performed.
- switch: It can be turned on and off. While it is on, the modifier is applied to the input signal.
For example, `[switch] button -id R -> [boost] -speed 3` means that while the button R is down, the cursor moves 3 times faster.
- `[trigger] stick -side Right -> [cursor] -speed 40`: spin the right stick to move the mouse cursor.
- `[trigger] button -id R-SR -> [hotkey] -keys t control alt`: press button R-SR to trigger the hotkey "ctrl+alt+t".
- `[switch] button -id A -> [prefix] -prefix "[camel] [title] "`: while button A is down, speech text is decorated to camel+title case, e.g. "hello world again" -> "HelloWorldAgain".
- `[switch] button -id R -> [mode] -id MouseMode`: switch to MouseMode by holding R; release R to get back to the default mode.
Note: Most parameters are set with a single dash, e.g. -text hello. Use a double dash for boolean parameters, e.g. --number=false. Use space-separated strings for array types, e.g. -map a b c. A special character must be wrapped in double quotes, such as "-".
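To show how a rule line breaks down into its parts, here is a rough parsing sketch. It is not the project's parser: the Rule type and function names are hypothetical, and it deliberately ignores quoted values (such as the -prefix example), splitting only on whitespace and on the first "->".

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical in-memory form of one rule line.
type Rule struct {
	Kind   string   // "trigger" or "switch"
	Input  []string // e.g. ["button", "-id", "R-SR"]
	Action string   // e.g. "hotkey"
	Args   []string // e.g. ["-keys", "t", "control", "alt"]
}

// splitTag parses "[tag] rest..." into the tag name and the remaining fields.
func splitTag(s string) (string, []string, error) {
	fields := strings.Fields(s)
	if len(fields) == 0 ||
		!strings.HasPrefix(fields[0], "[") ||
		!strings.HasSuffix(fields[0], "]") {
		return "", nil, fmt.Errorf("expected a leading [tag] in %q", s)
	}
	return strings.Trim(fields[0], "[]"), fields[1:], nil
}

// parseRule splits a line at the first "->" and parses both halves.
func parseRule(line string) (Rule, error) {
	parts := strings.SplitN(line, "->", 2)
	if len(parts) != 2 {
		return Rule{}, fmt.Errorf("missing \"->\" in rule %q", line)
	}
	kind, input, err := splitTag(parts[0])
	if err != nil {
		return Rule{}, err
	}
	action, args, err := splitTag(parts[1])
	if err != nil {
		return Rule{}, err
	}
	return Rule{Kind: kind, Input: input, Action: action, Args: args}, nil
}

func main() {
	rule, err := parseRule("[trigger] button -id R-SR -> [hotkey] -keys t control alt")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", rule)
}
```

Note that the button id R-SR contains a dash but not the full "->" separator, so it survives the split intact.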
| Trigger input | Description |
|---|---|
| [button] | button down/up event |
| [stick] | stick spinning event |
| [gyro] | when gyroscope is enabled |
| [speech] | when the voice is recognized and returned as text |

| Trigger action | Description |
|---|---|
| [cursor] | move mouse cursor |
| [hotkey] | single key press or combination |
| [notify] | show a system notification |
| [speech] | execute a speech; words in a sentence can be executed in different ways, which can be configured in section WordMapping |
| [speak] | used for complex tasks that cannot be done in a single action; works by simulating a speech text which will be handled by the above [speech] action |
| [flush] | this currently works by sending a chunk of zero data to the speech engine; the engine may consider the zeroes as a long period of silence, so it stops waiting for more voice input and returns the result quicker. Only use this with a limited phrase list, otherwise it can get stuck, as it doesn't return a result until the next speech. |
| [repeat] | repeat last action |

| Switch input | Description |
|---|---|
| [button] | switched on when button down, off when button up |
| [stick] | switched on when stick moves to the edge, off when leaving that edge |

| Switch modifier | Description |
|---|---|
| [mode] | switch to another mode |
| [boost] | speed up/down cursor movement |
| [camel] [title] [snake] [upper] | convert speech text to different case by adding a prefix |
| [prefix] | add custom prefix to the speech text |
3. Phrase List
A dictionary for limiting the speech model: only the words in the list are recognized. For example:
alphabet = ['a', 'b', 'c', ...'zed']
golang = ['package', 'switch', ...'']
Different groups can be used together for different speech modes, e.g. -phrase alphabet golang java
If you find that some words conflict a lot, like `for`, remove `for` from the phrase list and map `4 loop` -> `for `, or `for` in the mapping section below. Then the conflict is avoided:
- when you say `4`, it types
- when you say `forever`, it types
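Combining several phrase groups for one speech mode amounts to a union of their word lists. The sketch below shows one way this could work, assuming groups are merged in the order given and duplicates are dropped; the function name and the sample group contents are illustrative, not taken from the project.

```go
package main

import "fmt"

// mergePhrases returns the union of the named groups, keeping first-seen
// order and dropping duplicate words. An unknown group name is an error.
func mergePhrases(groups map[string][]string, names ...string) ([]string, error) {
	seen := make(map[string]bool)
	var merged []string
	for _, name := range names {
		words, ok := groups[name]
		if !ok {
			return nil, fmt.Errorf("unknown phrase group %q", name)
		}
		for _, word := range words {
			if !seen[word] {
				seen[word] = true
				merged = append(merged, word)
			}
		}
	}
	return merged, nil
}

func main() {
	// Shortened sample groups; "switch" appears in both on purpose.
	groups := map[string][]string{
		"alphabet": {"a", "b", "c"},
		"golang":   {"package", "switch"},
		"java":     {"class", "switch"},
	}

	phrases, err := mergePhrases(groups, "alphabet", "golang", "java")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(phrases)
}
```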
4. Word Mapping
A sentence is split into words, and each word is executed by an executor registered in this WordMapping section. This makes it possible to handle complex tasks like:
run calc.exe (shell) ->
delay 1 second (delay) ->
type "1+1" (typing) ->
A list of executors:
| Word executor type | Description | Parameters |
|---|---|---|
| normal words | if there is no tag ([…]), it's simply a word replacement. An empty or special word should be wrapped in double quotes "". | |
| [hotkey] | trigger a hotkey | the key combination (see the full key list) |
| [shell] | execute a shell command | command and arguments |
| [delay] | delay for some period | duration string |
| [typing] | the word will be typed if not handled by other executors | |
| [repeat] | repeat last speech | |
Mappings are grouped and can be used together, like -map programming application go
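The executor chain can be illustrated with a dry-run expansion: a sentence is matched against the mapping, mapped phrases expand into a list of steps, and anything left over falls through to typing. This is only a sketch under assumptions: the Step and Mapping types are hypothetical, the spoken phrase "open calculator" is invented, and the greedy longest-match strategy is a choice made for this example rather than the project's documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// Step is one unit of work: which executor runs, and with what argument.
type Step struct {
	Executor string // e.g. "shell", "delay", "typing"
	Arg      string
}

// Mapping maps a spoken phrase to the steps it expands into.
type Mapping map[string][]Step

// expand turns a recognized sentence into steps. At each position it tries
// the longest phrase first; unmapped words fall through to "typing".
func expand(sentence string, mapping Mapping) []Step {
	words := strings.Fields(sentence)
	var steps []Step
	for i := 0; i < len(words); {
		matched := false
		for n := len(words) - i; n > 0; n-- {
			phrase := strings.Join(words[i:i+n], " ")
			if mapped, ok := mapping[phrase]; ok {
				steps = append(steps, mapped...)
				i += n
				matched = true
				break
			}
		}
		if !matched {
			steps = append(steps, Step{Executor: "typing", Arg: words[i]})
			i++
		}
	}
	return steps
}

func main() {
	mapping := Mapping{
		"open calculator": {
			{Executor: "shell", Arg: "calc.exe"},
			{Executor: "delay", Arg: "1s"},
			{Executor: "typing", Arg: "1+1"},
		},
	}

	// Print the plan instead of executing it.
	for _, step := range expand("open calculator hello", mapping) {
		fmt.Printf("[%s] %s\n", step.Executor, step.Arg)
	}
}
```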
- Auto change mode when switch between applications
- Show the speech text directly on screen
- This is greatly inspired by Talon Voice
- The awesome dekuNukem/Nintendo_Switch_Reverse_Engineering
- All the Joy-Con protocol implementations: riking/joycon, wazho/ns-joycon, Davidobot/BetterJoy, tomayac/joy-con-webhid, looking-glass/joyconlib