System Integrity & Uptime

A calm relay needs a calm platform: predictable uptime, honest status, and steady recovery when something breaks. This page explains how we monitor health, publish a simple Red/Amber/Green status, and protect trust without hype.

Why integrity matters here

In a practice-support project, “uptime” is not a bragging metric — it’s a form of care. People arrive at all hours. Sometimes they come for five minutes. Sometimes they come because the day is heavy. The platform can’t promise a perfect world, but it can promise something simpler: it will try to be steady.

“System integrity” means the relay behaves the way it claims to behave: totals are counted correctly, privacy boundaries remain intact, and outages are handled transparently — without blame, without drama, and without making users guess what’s happening.

What “system integrity” means

Integrity is wider than “the site is up.” It includes:

  • Correctness — sessions are counted accurately; no double-counting, no silent drops.
  • Consistency — the same action produces the same result; totals reconcile across devices.
  • Safety — abuse protections exist; admin actions are gated and audited.
  • Boundaries — privacy and retention rules match what we publish publicly.
  • Recoverability — if something breaks, we can restore service calmly and reliably.

If you want the data side of integrity, see Data Retention & Memory Ethics and Privacy & GDPR.

The RAG status model

We use a simple Red/Amber/Green (RAG) status because it’s readable. You shouldn’t need a technical background to know whether the lane is open.

  • Green — core features are working normally.
  • Amber — the service is operating, but some parts may be slow or degraded.
  • Red — major outage or critical feature unavailable; we’re working on recovery.

When status changes, we aim to publish a plain-English note: what’s affected, what we’re doing, and what you can expect next. Not perfect predictability — just honest communication.
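
As a minimal sketch of how the three levels might be derived from operational signals: the `Snapshot` fields, the `classify` function, and both thresholds below are hypothetical illustrations, not the platform's actual rules.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    GREEN = "Green"
    AMBER = "Amber"
    RED = "Red"


@dataclass(frozen=True)
class Snapshot:
    core_available: bool    # can the core features respond at all?
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # 95th-percentile response time in milliseconds


# Hypothetical thresholds, chosen only for illustration.
AMBER_ERROR_RATE = 0.02
AMBER_LATENCY_MS = 1500.0


def classify(snapshot: Snapshot) -> Status:
    """Map an operational snapshot to a Red/Amber/Green status."""
    if not snapshot.core_available:
        return Status.RED      # major outage or critical feature unavailable
    if (snapshot.error_rate > AMBER_ERROR_RATE
            or snapshot.p95_latency_ms > AMBER_LATENCY_MS):
        return Status.AMBER    # operating, but slow or degraded
    return Status.GREEN        # working normally
```

Keeping the rule this small is deliberate: a status that is easy to compute is also easy to explain in a plain-English note.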

What we monitor

Monitoring is stewardship, not surveillance. We watch systems to keep them stable — not to profile users. The goal is early detection: catch problems while they are small and recover before frustration spreads.

Typical signals include:

  • Availability — can the app and API respond successfully?
  • Error rate — are requests failing more than usual?
  • Latency — are pages or API calls becoming slow?
  • Dependency health — database connectivity, email delivery, and geo lookup availability.
  • Queue/backlog — are background tasks building up (a sign of stress)?
  • Resource pressure — saturation signals (timeouts, memory pressure, cold starts).

We keep these signals minimal and operational. For what we keep and how long, see: Data Retention & Memory Ethics.
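
Two of the signals above, error rate and latency, reduce to small calculations over a window of request outcomes. The sketch below assumes a hypothetical `RequestSample` record that carries no user identifier, and uses the nearest-rank method for the percentile; a real setup may use a different estimator.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RequestSample:
    ok: bool            # did the request succeed?
    latency_ms: float   # how long it took


def error_rate(samples: Sequence[RequestSample]) -> float:
    """Fraction of failed requests in the window (0.0 for an empty window)."""
    if not samples:
        return 0.0
    failures = sum(1 for s in samples if not s.ok)
    return failures / len(samples)


def percentile_latency(samples: Sequence[RequestSample], pct: float = 95.0) -> float:
    """Nearest-rank percentile of latency (0.0 for an empty window)."""
    if not samples:
        return 0.0
    ordered = sorted(s.latency_ms for s in samples)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]
```

Nothing here needs to know who made a request, only whether it succeeded and how long it took.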

Health checks explained calmly

A health check is a simple test the system runs to confirm it’s okay — like tapping a microphone before a talk. It doesn’t record your private practice; it checks whether core components are reachable and behaving.

A calm health-check approach usually tests:

  • Web — the site responds and key pages load.
  • API — core endpoints respond without error.
  • Database — reads/writes succeed within a normal time budget.
  • Email pipeline — if enabled, messages can be queued and unsubscribe headers behave as expected.
  • Geo lookup — if used, derived city/country/timezone can be obtained without blocking the system.

These checks let us detect breakage early — especially after deployments or dependency changes.
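
A health-check runner along these lines can be sketched in a few lines. The probe names, the `run_check` and `run_all` helpers, and the 500 ms budget are assumptions made for illustration; each probe is simply a function that raises an exception when its component is unhealthy.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CheckResult:
    name: str
    ok: bool
    elapsed_ms: float
    detail: str = ""


def run_check(name: str, probe: Callable[[], None], budget_ms: float) -> CheckResult:
    """Run one probe and fail it if it raises or exceeds its time budget.

    Note: this measures duration after the fact. It does not interrupt a
    probe that hangs, so probes should set their own network timeouts.
    """
    started = time.perf_counter()
    try:
        probe()
    except Exception as exc:
        elapsed = (time.perf_counter() - started) * 1000
        return CheckResult(name, False, elapsed, f"{type(exc).__name__}: {exc}")
    elapsed = (time.perf_counter() - started) * 1000
    if elapsed > budget_ms:
        return CheckResult(name, False, elapsed, "over time budget")
    return CheckResult(name, True, elapsed)


def run_all(probes: Dict[str, Callable[[], None]],
            budget_ms: float = 500.0) -> Dict[str, CheckResult]:
    """Run every probe independently so one failure does not hide the others."""
    return {name: run_check(name, probe, budget_ms) for name, probe in probes.items()}
```

In use, `run_all({"web": check_web, "database": check_db})` would return one result per component, where `check_web` and `check_db` stand in for whatever real probes exist.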

What uptime means and does not mean

Uptime means the service is reachable and functioning. It does not mean:

  • every feature is perfect all the time
  • every third-party dependency is always available
  • the internet is always stable between you and the server

Stewardship-first uptime is about being honest and resilient: graceful degradation when something upstream fails, and calm recovery when something breaks.

Degradation over disaster

The best outages are the ones you barely notice because the system degrades gracefully. That can look like:

  • the app continues to run with cached content if a dependency is slow
  • non-essential features temporarily pause while core chanting session tracking remains available
  • community totals may show “last known good” values while the system catches up
  • status flips to Amber with a clear note rather than silently failing

Degradation is a design choice: protect the core experience first.
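
The “last known good” idea can be illustrated with a small wrapper around a live lookup. This is a sketch under stated assumptions: `TotalsReader`, `TotalsView`, and the `fetch_live` callable are hypothetical names, and a production version would likely also record how old the cached value is.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TotalsView:
    minutes: int
    stale: bool   # True when this is the last known good value, not a live one


class TotalsReader:
    """Serve live totals when possible, the last known good value otherwise."""

    def __init__(self, fetch_live: Callable[[], int]) -> None:
        self._fetch_live = fetch_live
        self._last_good: Optional[int] = None

    def read(self) -> Optional[TotalsView]:
        try:
            minutes = self._fetch_live()
        except Exception:
            if self._last_good is None:
                return None   # nothing to fall back to; show a status note instead
            return TotalsView(self._last_good, stale=True)
        self._last_good = minutes
        return TotalsView(minutes, stale=False)
```

The `stale` flag matters as much as the fallback itself: it lets the page say honestly that a value is catching up rather than presenting it as current.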

Incident response without drama

When something goes wrong, the job is not to look impressive. The job is to restore safety and clarity. Our incident response aims to follow a calm sequence:

  • Detect — monitoring or reports indicate a problem.
  • Assess — is it Green/Amber/Red? What is impacted?
  • Stabilise — reduce blast radius; pause non-essential work if needed.
  • Recover — restore core features and verify totals/integrity.
  • Explain — plain-English summary: what happened, what we changed, what we learned.

The most important part is the last one: explanation. Silence feels like neglect. Clarity rebuilds trust.

Change management and deployments

Many incidents come from change: new code, new config, new dependencies. Stewardship-first change management means:

  • Small releases — easier to understand, easier to roll back.
  • Visible versioning — changes are named and dated.
  • Rollback paths — if something breaks, we can undo safely.
  • Post-change checks — verify core functions (sessions, totals, status).

If you’ve ever used an app that “changed overnight” and became noisy or manipulative, you’ve felt the opposite approach. We want changes to feel like maintenance on a bridge — not a marketing campaign.

Integrity of totals and counting

In a relay, totals matter because they represent continuity. Integrity means: if you chant for 20 minutes, it counts as 20 minutes — not 19, not 25, not doubled.

To protect counting integrity, systems typically rely on:

  • Idempotency — the same event processed twice should not double-count.
  • Server-side validation — totals are computed safely, not only on the client.
  • Audit markers — minimal markers to reconcile “start/stop/pause/resume” sequences.
  • Reconciliation — periodic checks to ensure rollups match raw totals within tolerance.

We keep these controls operational, not invasive. For how long any detail is kept, see: Data Retention & Memory Ethics.
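
Idempotency and reconciliation can be shown together in one small sketch. It assumes each session event carries a unique `event_id` (a hypothetical field), and it keeps everything in memory; a real system would persist both the raw events and the rollup.

```python
from typing import Dict


class SessionLedger:
    """Count session minutes once per event, and check the rollup against raw data."""

    def __init__(self) -> None:
        self._raw: Dict[str, int] = {}   # event_id -> minutes, the raw record
        self.rollup_minutes = 0          # running total shown to users

    def record(self, event_id: str, minutes: int) -> bool:
        """Apply an event once. Returns False if it was already counted."""
        if minutes < 0:
            raise ValueError("minutes must not be negative")
        if event_id in self._raw:
            return False                 # duplicate delivery: do not double-count
        self._raw[event_id] = minutes
        self.rollup_minutes += minutes
        return True

    def reconcile(self, tolerance: int = 0) -> bool:
        """True when the rollup matches the sum of raw events within tolerance."""
        return abs(sum(self._raw.values()) - self.rollup_minutes) <= tolerance
```

If the same 20-minute event arrives twice, for example after a retry on a weak connection, the second `record` call returns `False` and the total stays at 20.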

Privacy-compatible observability

Observability is the ability to answer: “What’s happening in the system?” without guessing. The risk is that observability tools can drift into surveillance.

Our boundary is: monitor systems, not people. We prefer aggregated operational metrics and time-bounded logs, and we avoid third-party ad-tech scripts entirely. See Why we don’t use ads and Privacy & GDPR.
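
One way to picture “aggregated operational metrics and time-bounded logs” is a counter keyed only by minute, endpoint, and outcome. The class name, the example endpoint path in the usage note, and the 60-minute retention default are illustrative assumptions; the published retention rules are the ones that apply.

```python
from collections import Counter
from typing import Tuple

Key = Tuple[int, str, bool]   # (minute bucket, endpoint, succeeded)


class AggregateMetrics:
    """Per-minute request counts with no user identifiers and bounded retention."""

    def __init__(self, retention_minutes: int = 60) -> None:
        self._retention = retention_minutes
        self._counts: Counter = Counter()

    def observe(self, endpoint: str, ok: bool, epoch_seconds: float) -> None:
        """Record one request outcome. Nothing about the caller is stored."""
        bucket = int(epoch_seconds // 60)
        self._counts[(bucket, endpoint, ok)] += 1

    def prune(self, now_epoch_seconds: float) -> None:
        """Drop buckets older than the retention window."""
        cutoff = int(now_epoch_seconds // 60) - self._retention
        for key in [k for k in self._counts if k[0] < cutoff]:
            del self._counts[key]

    def failures(self, endpoint: str) -> int:
        """Total failed requests currently retained for an endpoint."""
        return sum(n for (_, ep, ok), n in self._counts.items()
                   if ep == endpoint and not ok)
```

A call such as `metrics.observe("/api/session", ok=False, epoch_seconds=time.time())` answers “is this endpoint failing?” without ever answering “who was using it?”.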

What you can do when something feels off

If something feels wrong — totals not updating, pages not loading, sessions failing to stop — the most helpful thing you can do is:

  • check the status note (Green/Amber/Red) if available
  • try a refresh (PWA caches can occasionally need a nudge after an update)
  • if it persists, report it via Support with the approximate time and what you saw

You don’t need to write a technical report. A calm sentence is enough: “At 21:10 UTC, stop button didn’t save the session.” That helps us triangulate quickly.

Commitments without overpromising

We won’t promise impossible perfection. Instead we make practical commitments:

  • Transparency — clear RAG status and plain-English notes during issues.
  • Steady improvement — fix root causes where possible, not just symptoms.
  • Data boundaries — integrity monitoring stays privacy-compatible.
  • No dark patterns — we won’t use outages as excuses for manipulative monetisation.

This is what “Stewardship First” looks like in the infrastructure layer.

FAQ

Do you have a live status page?

If the status page is published separately, it will use the same RAG model with a short plain-English log.

Does monitoring mean you’re tracking me?

Monitoring is aimed at the system: availability, error rates, and performance. It’s not ad tracking and not behavioural profiling. The data boundaries are described in Privacy & GDPR and Data Retention & Memory Ethics.

What if my totals look wrong?

Report it via Support with the approximate time and what you expected vs. what you saw. We’ll treat it as an integrity issue and investigate calmly.