Repository Guide

Warning

This document is out of date, originating from the 2022-23 team, and is currently being migrated. There are some broken links, and some unfinished content.

To keep the entire codebase in sync as it is developed, we have opted to organize every service into a single monorepo.

The top-level repository folder contains no projects of its own, only various configuration files, convenience scripts, and GitHub Actions workflows.

Next, we'll break down the role of each portion of the system, what technologies it uses, and why it exists.

This section is currently partly in the Project Plan document, and will be expanded upon here later when the details are finalized.

TODO embed image of system diagram

GitHub Projects Task Board

New: Needs clarification with teammates and/or client
Consult Jason: specifically awaiting info from or attention of client
Backlog: Stuff we could do but don't plan to work on soon
Ready: Stuff we could do and do plan to work on soon
In Progress: Stuff we're actively working on (move it back to Ready if you aren't)
In Review: Stuff we've finished but need to check over at the next client meeting
Done: Stuff we've finished and client has checked it over
Eventually: Stretch goals and/or stuff that won't happen for a while

We tried it out instead of Trello because it keeps the information tied to our repo.

We use the GitHub Milestones system to keep track of when tasks are supposed to be due. It kinda sucks because you can't say "I want it due this day" unless you have a milestone for that day. Access it from the GitHub issues page via a tiny button, or here: https://github.com/AutomatingSciencePipeline/Monorepo/milestones

We are using a monorepo partly because Milestones are repo-specific

GitHub Actions Continuous Integration

You can learn more about CI's relevance to our repo here: Understanding GitHub CI.

This section explains each of the files in our repo's /.github/workflows folder and why they exist.

node-test.yml

Success is required before PRs can be merged into main.

Ensures there are no compile errors
Runs ESLint (code style and early bug detection)
Performs TypeScript type checking (early bug detection)
Runs our unit tests

python-test.yml

Success is required before PRs can be merged into main.

Ensures there are no compile errors
Runs pylint (code style and early bug detection)
Runs our unit tests

docker.yml

Success is required before PRs can be merged into main.

Ensures that the production Docker images can be built successfully.

stale.yml

Posts a comment and applies a label to our GitHub issues if we haven't modified them in a while.

Repo Setup

TODO Talk about why the bash setup scripts exist and why devcontainers only half worked (the editor side needs ALL dependencies of every component installed for editor tooling to work)

Shellcheck linter for bash scripts

Check the .gitattributes file for various settings and explanations of why they are that way

GitHub Wiki as a git submodule, so it's easier to edit in your editor as you work (see the Contributing page "Local Copy of the Docs" section)

Markdownlint

VSCode with suggested extensions and workspace config as a middle ground

Docker

TODO

We've been using Windows host machines exclusively. It's important to note that Windows and Linux hosts have different capabilities on some levels, e.g. I (Rob) don't think Linux machines can run Windows container images, but I'm not certain.

The GLADOS server is a Linux host.

Frontend

TypeScript

JavaScript but better

Type safety can get kinda annoying, but it catches a lot of mistakes at editor time rather than at runtime
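As a rough illustration of what "editor time rather than runtime" means, here is a minimal sketch. The Experiment interface and describeExperiment function are hypothetical and not taken from our codebase. Each call marked with @ts-expect-error is one the type checker rejects, whereas plain JavaScript would accept it and only misbehave once the code runs.

```typescript
// Hypothetical shape for illustration only; not a type from this repo.
interface Experiment {
  name: string;
  trials: number;
}

function describeExperiment(experiment: Experiment): string {
  return `${experiment.name} (${experiment.trials} trials)`;
}

// A correct call type-checks and runs as expected.
const summary = describeExperiment({ name: "demo", trials: 3 });

// The calls below are flagged by the type checker before the code ever runs.
// Delete a @ts-expect-error comment to see the error it suppresses in your editor.

// @ts-expect-error 'trials' must be a number, not a string
describeExperiment({ name: "demo", trials: "3" });

// @ts-expect-error property 'trials' is missing
describeExperiment({ name: "demo" });
```

The @ts-expect-error comments are only there so the sketch compiles as a whole; in real code you would fix the call instead of suppressing the error.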

Next.js

We're using it because the past team was.

Framework built on top of React

Offers a bunch of optimization stuff like server-side components that we aren't really using

Node.js

Server-side JavaScript; the frontend web server is implemented using this

Required by Next.js

The version of Node.js installed by the dev install script is set in variables.sh

Node Version Manager (and its Windows variant)

Installs Node for you without conflicting with other installs on your machine.

Dev install script should handle it for you.

Node Package Manager (npm)

Installs node dependencies.

Unclear if using pnpm or yarn would have much of a performance benefit over npm because we already use Node Version Manager

Updating Project Node Dependencies

See the page Updating Project Node Dependencies for more details.

Mantine forms

This is currently a major version behind; we should™ update it

Tailwind CSS

We're using it because the past team was. Not super attached.

Sorta like Bootstrap, in that it has a bunch of CSS classes that are named after the CSS properties they apply
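To make the "classes named after the CSS they apply" idea concrete, here is a small sketch. The lookup table covers only a handful of Tailwind utility classes with the values they have under the default theme, and the explainClasses helper is hypothetical, written purely for explanation.

```typescript
// Illustrative subset of Tailwind utility classes and the CSS each one applies
// under the default theme. This table exists only for this example.
const utilityToCss: { [className: string]: string | undefined } = {
  "flex": "display: flex",
  "text-center": "text-align: center",
  "font-bold": "font-weight: 700",
  "p-4": "padding: 1rem",
};

// Hypothetical helper: expand a class attribute value into the CSS
// declarations that each listed utility class stands for.
function explainClasses(classNames: string): string[] {
  return classNames
    .split(" ")
    .filter((name) => name.length > 0)
    .map((name) => {
      const css = utilityToCss[name];
      return css !== undefined ? css : "(not in this table: " + name + ")";
    });
}

const declarations = explainClasses("flex p-4 font-bold");
```

In a component you would write the class names directly on the element; the table above just spells out what each one means.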

ESLint

Backend

It currently serves as the only experiment runner

Python (3.8, basic types support)

The version of Python installed by the dev install script is set in variables.sh

Pyenv

We use Pyenv to install Python on the host machine. The dev install script will handle it for you.

Pipenv and dependencies

We're using Pipenv to manage our python virtual environment and package installs because:

We don't always want builds of the containers to include the latest versions of packages, or else stuff like this happens.
We have a package that only needs to be installed on Windows (see the Pipfile for more info)

We're using it instead of pip freeze because that approach would need multiple freeze files (normal, windows-specific, etc).

Updating Project Python Dependencies

TODO As described above, we're using pipenv to manage python dependencies. Usage info can be found here.

To install new python dependencies, make sure you're in the backend app folder, then use pipenv install instead of pip install.

To install new dev dependencies use the --dev flag, for example, pipenv install --dev pytest to add pytest

The lockfile should be updated automatically when you use pipenv install, but in case you need to make one manually, use pipenv lock.

If you encounter a merge conflict on Pipfile.lock, regenerate a new copy via pipenv lock (see here and here)

pylint (linter)

The config file for this is currently in repo root

yapf (auto-formatter)

Python Runner

TODO

Currently the backend serves this role

Java Runner

TODO

Currently the backend serves this role

Firebase

Ideally this will be removed from the system eventually because it is not open source

Free plan

Our client is a Firebase console admin and can add/remove people on the project.

https://console.firebase.google.com/u/0/project/gladosbase/overview

Replaced the self-hosted Supabase used by the past team. We made the switch in the interest of getting a running prototype, since deciphering the past team’s Supabase configuration was blocking meaningful progress elsewhere.

Authentication

Not the end-goal solution for auth, but a great middle ground for now.

Firestore Database

We are using the Firestore 'subscribe to data' feature on the frontend

Can't keep everything in here because of storage + read/write usage costs

Some of the data here could be moved to MongoDB
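The 'subscribe to data' pattern mentioned above can be sketched as follows. This is not the Firebase SDK: FakeDocument and ExperimentStatus are hypothetical in-memory stand-ins so the example runs on its own. In the Firebase web SDK the corresponding call is onSnapshot, which likewise invokes your callback with the current data, calls it again on each change, and returns a function that stops listening.

```typescript
// Hypothetical in-memory stand-in for a subscribable document.
// It only mimics the shape of the pattern; it is NOT the Firebase SDK.
type Listener<T> = (value: T) => void;

class FakeDocument<T> {
  private value: T;
  private listeners: Listener<T>[] = [];

  constructor(initial: T) {
    this.value = initial;
  }

  // Register a listener. It fires immediately with the current value and
  // again on every later write. The returned function unsubscribes it.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.push(listener);
    listener(this.value);
    return () => {
      this.listeners = this.listeners.filter((other) => other !== listener);
    };
  }

  // Simulates a write, e.g. something updating an experiment's status.
  set(value: T): void {
    this.value = value;
    this.listeners.forEach((listener) => listener(value));
  }
}

// Hypothetical document shape for illustration only.
interface ExperimentStatus {
  name: string;
  finished: boolean;
}

const experiment = new FakeDocument<ExperimentStatus>({ name: "demo", finished: false });
const seen: ExperimentStatus[] = [];

// The UI would re-render inside this callback instead of polling for changes.
const unsubscribe = experiment.subscribe((status) => {
  seen.push(status);
});

experiment.set({ name: "demo", finished: true }); // listener fires again
unsubscribe();
experiment.set({ name: "demo", finished: false }); // no longer observed
```

Unsubscribing when a component goes away matters with the real service too, since every delivered update counts toward the read usage mentioned above.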

Firestore

In the process of being replaced with MongoDB

Holds experiment source code and experiment results/logs for download via the frontend

MongoDB Experiment Data Storage

TODO @Brian

Docker Swarm vs Kubernetes

TODO

Have to pick which distributed system platform will work best

We ran into issues with having both Linux and Windows host machines on Docker Swarm, which meant we couldn't have both the GLADOS Server and our own dev machines contributing computing power. It's unclear whether Kubernetes will allow this or not.

GLADOS Server

Linux machine that is the Docker host for the deployed system (nothing runs at system level, it's all inside Docker)

Hosted in the Rose CSSE department

See Connecting to the GLADOS Server