DwayneBot: Architecture
How it works.
Architecture
DwayneBot exists inside the Dwayne.xyz web app. I designed it to run as a service that:
- Reads messages from multiple endpoints. These include a WebSocket endpoint, a REST API endpoint, and internal synchronous and asynchronous functions.
- Parses these messages for Actions and adds them to one of a set of Action Queues.
- Continuously monitors and processes the Actions in these Queues.
- Returns text and HTML responses to rooms that have registered to receive them.
Instances
There's always one instance of the core DwayneBot service running at any given time, but it was designed to connect to or exist in multiple places at once, including:
- Twitch
- IRC (including multiple channels/rooms on multiple servers)
- The two different types of web chats on this server (group chats like on the Stream page, or any of the 1-1 video/text chat rooms I host here)
- Either of the DwayneBot web forms here on the website
- Mastodon
The first three run in asynchronous mode, meaning all the chat messages are sent to the Bot, and only a few responses are sent back. The rest are synchronous, meaning there's an immediate response for each message sent.
Inside the Bot process, an Instance is created for each room it connects to. Each one contains an Action Queue, a Context object that holds configuration options, and a list of destinations for responses.
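As a rough illustration of that layout, here's a minimal Python sketch of one Instance per room. All class and field names (Bot, Instance, Context, connect, and the option names) are hypothetical stand-ins, not DwayneBot's actual code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                                      # Action type, e.g. "weather"
    args: list[str] = field(default_factory=list)  # text Arguments


@dataclass
class Context:
    # Hypothetical option names based on the settings this page mentions.
    command_prefix: str = "!"
    response_name: str = "hey bot"
    link_preview: bool = False
    auto_translate: bool = False


@dataclass
class Instance:
    room: str
    context: Context = field(default_factory=Context)
    queue: deque = field(default_factory=deque)            # the Action Queue
    destinations: list[str] = field(default_factory=list)  # where responses go


class Bot:
    def __init__(self):
        self.instances: dict[str, Instance] = {}

    def connect(self, room, destination):
        """Create the Instance for a room on first use and register a destination."""
        instance = self.instances.setdefault(room, Instance(room=room))
        if destination not in instance.destinations:
            instance.destinations.append(destination)
        return instance
```

Connecting to the same room twice returns the same Instance, so several destinations can share one queue and one set of options.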
REST API and WebSockets
Inside the website process, there are both synchronous and asynchronous functions available for access to DwayneBot, which are used for both the web chats and web forms here on the website. For anything outside the website (like external chat rooms), I created REST API and WebSocket endpoints.
If another app opens a valid WebSocket connection to the WebSocket endpoint, the handler code uses the asynchronous functions to provide an easy interface for messages and responses. The REST API is relatively simple and just uses the synchronous function to get an immediate response.
Twitch/IRC Bridge App
DwayneBot connects to external services like Twitch or IRC through a separate app. This Bridge App was designed to connect to a configurable list of Twitch or IRC channels, and then open one WebSocket connection to the Bot for each of them.
Its job is to handle passing messages and responses back and forth for multiple Instances at the same time, while also monitoring the connections to the external services and the Bot itself and restarting them if necessary. It does this so that the website process can be updated without disconnecting the Bot from each server/channel.
Additionally, while it was mostly designed to be a "shell" App that only passes messages, it can report its status independently of the Bot, which is presented to users as an Analysis mode.
Parser
The Parser is the part of DwayneBot that reads all incoming messages and figures out what Actions to execute with which Arguments. Here's what it does for each message:
- Decide if the message should be logged, based on the Instance type (IRC, Twitch, etc), and whether the Bot was recently asked to either start or stop logging.
- Determine if this user had previously asked a question that needs additional Arguments. If so, return that existing pending Action object with this message appended to the Argument list.
- If not, figure out if the message is a command (if it begins with "!" or whatever the configured command prefix is). If so, send it to the Simple Parser function.
- If not, see if it has a link in it (if Link Preview is enabled). If so, return the Link Preview Action.
- If not, determine if any more parsing should happen (if it starts with "hey bot", "DwayneBot", or whatever the configured response name is).
- If so, and if Auto Translate is enabled, detect the language of the message and translate it if it's not English.
- Send it to the NLP Parser function.
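The steps above can be sketched as a single dispatch function. This is a simplified Python illustration, not the real Parser: the logging step is left out, and every name here (parse, the ctx keys, the pending dictionary, the injected simple_parser, nlp_parser, and translate callables) is an assumption. Removing the pending Action once it has been completed is also my guess.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str
    args: list[str] = field(default_factory=list)


LINK_RE = re.compile(r"https?://\S+")


def parse(message, user, ctx, pending, simple_parser, nlp_parser, translate=None):
    """Return the list of Actions for one message (logging decision omitted)."""
    # A pending Action from this user takes the message as an extra Argument.
    if user in pending:
        action = pending.pop(user)
        action.args.append(message)
        return [action]
    # Commands go to the Simple Parser, which returns one Action or None.
    if message.startswith(ctx["command_prefix"]):
        action = simple_parser(message)
        return [action] if action else []
    # Links become a Link Preview Action when the feature is enabled.
    if ctx.get("link_preview"):
        match = LINK_RE.search(message)
        if match:
            return [Action("link-preview", [match.group(0)])]
    # Anything else is ignored unless it is addressed to the Bot.
    lowered = message.lower()
    if not any(lowered.startswith(name) for name in ctx["response_names"]):
        return []
    if ctx.get("auto_translate") and translate is not None:
        message = translate(message)
    return nlp_parser(message)
```

Passing the two parser functions in as arguments keeps the ordering logic separate from the parsing rules, which makes each branch easy to test on its own.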
Simple Parser
The Simple Parser function reads a message and returns one Action object.
- Read the first word of the message and remove the command prefix.
- Check if the Bot was previously asked to respond to this command with a specific response. If so, respond with a Say Action object with the response.
- If not, check the list of built-in commands for a match. If there's a match, return an Action object for it.
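Here's a minimal Python sketch of those steps. The BUILT_IN set, the custom dictionary of remembered responses, and the choice to pass the remaining words along as Arguments are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str
    args: list[str] = field(default_factory=list)


# Hypothetical subset of the built-in command names.
BUILT_IN = {"latest-post", "weather"}


def simple_parse(message, prefix="!", custom=None):
    """Return one Action for a command message, or None if nothing matches."""
    words = message.split()
    if not words:
        return None
    # Read the first word and remove the command prefix.
    command = words[0].removeprefix(prefix).lower()
    # A response the Bot was asked to remember wins over built-in commands.
    custom = custom or {}
    if command in custom:
        return Action("say", [custom[command]])
    if command in BUILT_IN:
        # Assumption: any remaining words become the Action's Arguments.
        return Action(command, words[1:])
    return None
```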
NLP Parser
DwayneBot wasn't really designed to be generally intelligent or carry actual conversation, but it does try to understand if the message contains questions or statements that sound like Actions. The NLP Parser function reads a message and returns any number of Action objects. It does this using a system I designed that uses a mapping between regular expression strings and Actions.
This mapping of regular expression strings and Actions is maintained in the code. Helper functions make it easier to manage the many different ways a thing can be asked and how statements get translated into Action arguments.
Actions
Actions are the set of underlying things the Bot can respond to. Each Action object in the system contains an Action type and a list of text Arguments (Action options). Each Action type is mapped to a function that returns a set of Responses. So far, I've written 56 Actions. Here's an example of one:
Latest Post: This Action will retrieve 1 or more Posts from the main Dwayne.xyz database. It accepts 3 Arguments: The limit of the posts, the offset of the posts, and the tag of the posts.
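To make the mapping from Action type to function concrete, here's a small Python sketch. The HANDLERS registry, the handles decorator, and the in-memory POSTS list (standing in for the real database) are hypothetical, as are the default values used when an Argument is missing.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str
    args: list[str] = field(default_factory=list)


# Stand-in for the Posts in the database, newest first.
POSTS = [
    {"title": "Queues in Go", "tags": ["golang"]},
    {"title": "Site update", "tags": ["meta"]},
    {"title": "Go generics", "tags": ["golang"]},
]

HANDLERS = {}


def handles(kind):
    """Register the decorated function as the handler for one Action type."""
    def register(fn):
        HANDLERS[kind] = fn
        return fn
    return register


@handles("latest-post")
def latest_post(args):
    # Arguments arrive as text: limit, offset, tag. Missing ones use defaults.
    limit = int(args[0]) if len(args) > 0 else 1
    offset = int(args[1]) if len(args) > 1 else 0
    tag = args[2] if len(args) > 2 else ""
    posts = [p for p in POSTS if not tag or tag in p["tags"]]
    return [p["title"] for p in posts[offset:offset + limit]]


def run(action):
    """Look up the handler for an Action and return its Responses."""
    handler = HANDLERS.get(action.kind)
    return handler(action.args) if handler else []
```

For example, `run(Action("latest-post", ["1", "1", "golang"]))` skips the newest golang post and returns the one after it.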
Parsing
When the Parser processes a message, it reads the text and tries to return a list of Action objects. Here are 4 different messages that will result in the Latest Post Action:
- "!latest-post": Limit of 1, offset of 0, and no tag.
- "Hey bot, get me the three most recent posts.": Limit of 3, offset of 0, and no tag.
- "Hey bot, get me the 6th most recent post.": Limit of 1, offset of 5, and no tag.
- "Hey bot, get me the most recent post about golang.": Limit of 1, offset of 0, and the tag golang.
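The three conversational messages above could be handled by a regular expression mapping along these lines. This Python sketch only illustrates the idea: the patterns, the NUMBER_WORDS table, and the RULES list are my assumptions, and the real mapping covers far more phrasings. The "!latest-post" command would take the Simple Parser path instead.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str
    args: list[str] = field(default_factory=list)


NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
COUNT = r"\d+|" + "|".join(NUMBER_WORDS)


def to_int(text):
    """Convert "3" or "three" to 3."""
    return NUMBER_WORDS.get(text.lower()) or int(text)


# Each rule pairs a pattern with a function that builds [limit, offset, tag].
RULES = [
    # "the 6th most recent post [about X]": one post, offset by rank minus one
    (re.compile(r"the (\d+)(?:st|nd|rd|th) most recent post(?: about (\w+))?", re.I),
     lambda m: ["1", str(int(m.group(1)) - 1), m.group(2) or ""]),
    # "the three most recent posts [about X]": several posts from the top
    (re.compile(rf"the ({COUNT}) most recent posts(?: about (\w+))?", re.I),
     lambda m: [str(to_int(m.group(1))), "0", m.group(2) or ""]),
    # "the most recent post [about X]": the single newest post
    (re.compile(r"the most recent post(?: about (\w+))?", re.I),
     lambda m: ["1", "0", m.group(1) or ""]),
]


def nlp_parse(message):
    """Return a Latest Post Action for every rule match found in the message."""
    actions = []
    for pattern, build_args in RULES:
        for match in pattern.finditer(message):
            actions.append(Action("latest-post", build_args(match)))
    return actions
```

Because every rule is scanned with `finditer`, one message can yield any number of Actions, and a message that matches nothing yields an empty list.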
Action Queues
The Actions (and their Options) that come from parsing are added to the Action Queue for the specific Instance. Every two seconds, each Queue is processed, and the Actions are potentially combined and run. This enables asking something like "Hey bot, tell me the weather and the song that's playing" and getting one combined response. It also lets two users both ask for the same thing at similar times and get one response addressed to both.
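Here's one way the combining step could work, as a minimal Python sketch. The queue entry shape (user, kind, args), the drain and respond function names, and the "@user" formatting are assumptions. The two-second timer is left out, so something else would need to call drain on that interval.

```python
from collections import deque


def drain(queue):
    """Empty the queue, merging identical requests into (kind, args, users)."""
    merged = {}
    while queue:
        user, kind, args = queue.popleft()
        users = merged.setdefault((kind, tuple(args)), [])
        if user not in users:
            users.append(user)
    return [(kind, list(args), users) for (kind, args), users in merged.items()]


def respond(batch, handlers):
    """Run each merged Action once and join the results into one response."""
    parts = []
    for kind, args, users in batch:
        text = handlers[kind](args)
        parts.append(f"{', '.join('@' + u for u in users)}: {text}")
    return " | ".join(parts)
```

With two users asking for the weather and one of them also asking for the current song, the queue drains into two merged Actions, and the weather handler runs only once.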
Sometimes parsing results in an Action being sent to the Scheduler instead of directly to a Queue, like if the Bot is asked "Hey bot, tell us the weather in 20 minutes". The Scheduler is just a separate list of pending Actions and the times they're ready to be added to their Queues. It's checked every few seconds.
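A Scheduler like that can be sketched in a few lines of Python. The class, its method names, and the use of plain numbers for time are assumptions. The periodic check is represented by whoever calls release_due.

```python
class Scheduler:
    def __init__(self):
        # Each entry is (ready_at, instance_name, action).
        self.pending = []

    def schedule(self, delay, instance, action, now):
        """Hold an Action until `delay` seconds after `now`."""
        self.pending.append((now + delay, instance, action))

    def release_due(self, now, queues):
        """Move every Action whose time has come into its Instance's Queue."""
        due = [entry for entry in self.pending if entry[0] <= now]
        self.pending = [entry for entry in self.pending if entry[0] > now]
        for _, instance, action in sorted(due, key=lambda entry: entry[0]):
            queues[instance].append(action)
        return len(due)
```

Taking `now` as a parameter instead of reading the clock inside the class keeps the sketch deterministic and easy to test.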
Creating Responses
Once the Actions in a Queue are combined and processed, they're passed to functions that are primarily responsible for making requests for data and creating responses. Each one of these functions receives an Action and its Options, the current "mood" to respond in, the user(s) and their authorization, and the type of environment the instance is in (Twitch, IRC, etc).
These Actions might have any number of side effects, including storing data in Bot memory, changing the Bot's configuration, making authenticated API calls, and more. Sometimes, an Action might even pass a message to one of the Action Queues in order to get and modify a response that it itself came up with (this is mostly used with language translation).
All of the responses from the list of Actions are collected and then potentially combined again. Then they're sent back down to the room they came from to be displayed.
Memory
DwayneBot needs to store data about the things that happen while it's running. Some of that data includes:
- Its current mood (the variation of responses) for a specific Instance.
- Whether a specific user should be ignored.
- The specific text to respond to a command with (if it was previously asked to do so).
- Whether it's waiting for more input from a user to complete an Action.
- The current Trust level for a user.
- Chat logs.
All of the data the Bot uses is stored in a custom storage system I designed. It's backed by the main dwayne.xyz PostgreSQL database and heavily cached for performance. This data is stored as an immutable log of expiring entries, which is a great way to quickly store lots of small entries that can be easily and safely deleted after expiration.
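To illustrate the idea of an immutable log of expiring entries, here's a minimal in-memory Python sketch. It is not the real storage system: there's no PostgreSQL or caching here, the Memory and Entry names are hypothetical, and the rule that the newest unexpired entry wins on read is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str
    value: str
    written_at: float
    expires_at: float


class Memory:
    def __init__(self):
        self._log = []  # append-only; entries are never modified in place

    def remember(self, key, value, now, ttl):
        """Append a new entry that expires `ttl` seconds after `now`."""
        self._log.append(Entry(key, value, now, now + ttl))

    def recall(self, key, now, default=None):
        """Return the newest unexpired value for a key, or the default."""
        for entry in reversed(self._log):
            if entry.key == key and entry.expires_at > now:
                return entry.value
        return default

    def purge(self, now):
        """Delete expired entries and return how many were removed."""
        before = len(self._log)
        self._log = [e for e in self._log if e.expires_at > now]
        return before - len(self._log)
```

Since updates are new entries and reads skip anything expired, purging never changes what a read returns. That is what makes deletion after expiration safe.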