Doctor Droid
Doctor Droid is a platform that helps engineering teams manage and improve their alerting systems and incident response. It offers tools to reduce alert fatigue, speed up incident investigation, and improve overall efficiency.
Key Features
Doctor Droid introduces the AlertOps Bot, a tool that helps engineering teams identify and address poorly configured and noisy alerts. By adding the bot to a Slack channel, engineers can instantly receive reports about noisy alerts covering the last one, three, or six months. The bot provides immediate analysis of alerts that fire too often and extracts key tags from the alert strings, such as service names and database names, so engineers can drill down into sources of noise faster. This tool is particularly useful for engineering teams at startups of all sizes, from seed stage to unicorns.
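The kind of analysis described above can be illustrated with a minimal sketch. It is not the AlertOps Bot's actual implementation: the alert record shape, the "key=value" tag format, and the function names extract_tags and noisy_alert_report are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed tag format: "service=<name>" and "database=<name>" inside the alert text.
# Real alert strings will differ; adjust the pattern to match them.
TAG_PATTERN = re.compile(r"\b(service|database)=([\w.-]+)")


def extract_tags(text):
    """Pull service and database names out of an alert string."""
    return {key: value for key, value in TAG_PATTERN.findall(text)}


def noisy_alert_report(alerts, now, window_days=30, top_n=3):
    """Count alerts per (service, database) pair inside the lookback window.

    `alerts` is a list of (timestamp, text) tuples (a hypothetical shape).
    Returns the `top_n` most frequent pairs with their counts.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter()
    for timestamp, text in alerts:
        if timestamp < cutoff:
            continue  # outside the reporting window
        tags = extract_tags(text)
        key = (tags.get("service", "unknown"), tags.get("database", "unknown"))
        counts[key] += 1
    return counts.most_common(top_n)
```

Calling noisy_alert_report with window_days set to 30, 90, or 180 roughly corresponds to the one-, three-, and six-month reports mentioned above.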
The Doctor Droid AIOps Platform builds a knowledge graph using company data to provide investigation and remediation recommendations during on-call issues and incidents. The platform stands out by making AIOps accessible to teams of all sizes, offering value from day one without requiring significant upfront investment or enterprise-level decisions.
Knowledge Graph Technology
At the core of Doctor Droid's platform is its knowledge graph generator. This system ingests and analyzes various data sources to build a comprehensive understanding of the IT environment. Sources include past incident reports, issue tickets from systems like JIRA, on-call playbooks and SOPs, service documentation, and historical alert data. By analyzing these sources, Doctor Droid can understand the patterns of incidents, the relationships between different components, and the typical actions taken to resolve issues.
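To make the idea concrete, here is a small sketch of how past incidents could be turned into a graph linking services, alerts, and remediation actions. The incident record fields ("service", "alert", "actions") and the functions build_graph and suggest_actions are hypothetical; the source does not describe Doctor Droid's internal data model.

```python
from collections import defaultdict


def build_graph(incidents):
    """Link each service to its alerts, and each alert to the actions taken.

    Nodes are (kind, name) tuples; edges are stored as adjacency sets.
    """
    graph = defaultdict(set)
    for incident in incidents:
        service = ("service", incident["service"])
        alert = ("alert", incident["alert"])
        graph[service].add(alert)
        for action in incident["actions"]:
            graph[alert].add(("action", action))
    return graph


def suggest_actions(graph, alert_name):
    """Return the actions previously taken for an alert, sorted for stable output."""
    neighbours = graph.get(("alert", alert_name), ())
    return sorted(name for kind, name in neighbours if kind == "action")
```

During an incident, looking up the firing alert in such a graph returns the actions that resolved it before, which is the flavor of recommendation the platform is described as providing.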
Fine-Tuning AI Models with Doctor Droid
Doctor Droid also provides tools for fine-tuning AI models, such as Meta's LlaMa 3.1-8B. The platform helps with challenges related to deployment, reliability, accuracy, and latency. Fine-tuning a quantized model can be more effective in terms of quality, cost, and latency, especially for domain-oriented use cases.
Fine-Tuning Pipeline
The fine-tuning pipeline for a 4-bit quantized LlaMa 3.1-8B model involves several steps:
1. Extract attributes related to research papers using an 8-bit quantized LlaMa 3.1-8B model and store them in a database.
2. Generate a fine-tuning dataset from the information in the database.
3. Fine-tune the 4-bit quantized model in Unsloth.
4. Test the fine-tuned model and debug issues using Doctor Droid Playbooks.
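The dataset-generation step can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the papers table, its columns, the prompt text, and the instruction/input/output record layout are all hypothetical, and SQLite stands in for whatever database the pipeline really uses.

```python
import json
import sqlite3

# Hypothetical instruction shown to the model for every training example.
PROMPT = "Extract the key attributes of the following research paper."


def build_dataset(conn):
    """Turn stored extraction results into instruction-style training records.

    Assumes a table papers(id, title, abstract, attributes); rows whose
    attributes were never extracted (NULL) are skipped.
    """
    rows = conn.execute(
        "SELECT title, abstract, attributes FROM papers "
        "WHERE attributes IS NOT NULL ORDER BY id"
    )
    return [
        {"instruction": PROMPT, "input": f"{title}\n\n{abstract}", "output": attributes}
        for title, abstract, attributes in rows
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

The resulting JSON Lines text can be written to a file and loaded by the fine-tuning step; skipping rows with missing attributes is one simple form of the data integrity checks discussed below.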
Debugging Issues
Common issues during fine-tuning include pipeline failures, processing issues, and data integrity checks. Doctor Droid Playbooks help automate investigations and provide quick access to relevant information, enabling faster debugging.
Playbooks for Automated Investigations
Doctor Droid Playbooks is an open-source tool that codifies and automates investigations. It enables engineers to invoke workflows based on specified conditions, making the results of investigations readily available during incidents. This helps engineers debug issues faster and more efficiently.
Creating and Executing Playbooks
Playbooks can be created with various tasks, such as CloudWatch metric tasks, database queries, and bash commands. Conditions can be applied to these tasks to determine the next steps in the playbook. Time-based conditions can also be configured so that the time of day is factored into the final output.
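The interplay of tasks, conditions, and time-based conditions can be sketched in a few lines. This is a conceptual model only and does not use the Playbooks API: the Step class, its fields, and run_playbook are hypothetical names, and each task is reduced to a function returning a single metric value.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Optional


@dataclass
class Step:
    task: Callable[[], float]            # stand-in for a metric task, query, or command
    threshold: float                     # condition: task value must exceed this
    next_step: Optional[str] = None      # step to run when the condition holds
    active_from: time = time(0, 0)       # time-based condition: start of window
    active_until: time = time(23, 59, 59)  # time-based condition: end of window


def run_playbook(steps, start, now):
    """Follow steps from `start` and return the names of steps whose conditions held.

    A step counts only if `now` falls inside its time window and its task
    value exceeds the threshold; otherwise the playbook stops there.
    """
    fired = []
    current = start
    while current is not None and current not in fired:  # guard against cycles
        step = steps[current]
        in_window = step.active_from <= now <= step.active_until
        if not in_window or step.task() <= step.threshold:
            break
        fired.append(current)
        current = step.next_step
    return fired
```

For example, a CPU check limited to business hours would be skipped entirely when run_playbook is called with a late-night time, even if the metric is above its threshold.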
Installing Playbooks Using Docker Compose
To install Doctor Droid Playbooks using Docker Compose, follow these steps:
1. Clone the repository: git clone git@github.com:DrDroidLab/PlayBooks.git
2. Start the Docker containers: docker-compose -f deploy.docker-compose.yaml up -d
3. Verify that the containers are running, then sign up on the Playbooks platform.
4. Create a user with a name, email, and password, and then log in.
To use a custom Postgres database, create an env file in the root directory with the database credentials.
Machine Specifications
To run Playbooks, start with a machine that has 2 CPUs and 8 GB of RAM. Depending on the load, the number of playbooks, and how often they run, it may be necessary to upgrade the machine.
Doctor Droid offers a comprehensive suite of tools designed to enhance the efficiency and effectiveness of engineering teams. From the AlertOps Bot to the AIOps Platform and fine-tuning AI models, Doctor Droid provides valuable solutions for managing alerts, investigating incidents, and optimizing AI models.