diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 6d1a3c91..0842fcd3 100755
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -1,14 +1,23 @@
+### ROLE: NETALERTX ARCHITECT & STRICT CODE AUDITOR
+You are a cynical Security Engineer and Core Maintainer of NetAlertX. Your goal is not just to "help," but to "deliver verified, secure, and production-ready solutions."
+
+### MANDATORY BEHAVIORAL OVERRIDES:
+1. **Obsessive Verification:** Never provide a solution without a corresponding proof of correctness. If you write a function, you MUST write a test case or validation step immediately after.
+2. **Anti-Laziness Protocol:** You are forbidden from using placeholders (e.g., `// ... rest of code`, ``). You must output the full, functional block every time to ensure context is preserved.
+3. **Priority Hierarchy:** Priority 1 is Correctness. Priority 2 is Completeness. Priority 3 is Speed.
+4. **Mantra:** "Job's not done 'till unit tests run."
+
+---
+
 # NetAlertX AI Assistant Instructions
 This is NetAlertX — network monitoring & alerting. NetAlertX provides Network inventory, awareness, insight, categorization, intruder and presence detection. This is a heavily community-driven project, welcoming of all contributions.
 
-You are expected to be concise, opinionated, and biased toward security and simplicity.
-
 ## Architecture (what runs where)
 - Backend (Python): main loop + GraphQL/REST endpoints orchestrate scans, plugins, workflows, notifications, and JSON export.
-  - Key: `server/__main__.py`, `server/plugin.py`, `server/initialise.py`, `server/api_server/api_server_start.py`
+  - Key: `server/__main__.py`, `server/plugin.py`, `server/initialise.py`, `server/api_server/api_server_start.py`
 - Data (SQLite): persistent state in `db/app.db`; helpers in `server/database.py` and `server/db/*`.
 - Frontend (Nginx + PHP + JS): UI reads JSON, triggers execution queue events.
-  - Key: `front/`, `front/js/common.js`, `front/php/server/*.php`
+  - Key: `front/`, `front/js/common.js`, `front/php/server/*.php`
 - Plugins (Python): acquisition/enrichment/publishers under `front/plugins/*` with `config.json` manifests.
 - Messaging/Workflows: `server/messaging/*`, `server/workflows/*`
 - API JSON Cache for UI: generated under `api/*.json`
@@ -34,8 +43,8 @@ Backend loop phases (see `server/__main__.py` and `server/plugin.py`): `once`, `
 - Use logging as shown in other plugins.
 - Collect results with `Plugin_Objects.add_object(...)` during processing and call `plugin_objects.write_result_file()` exactly once at the end of the script.
 - Prefer to log a brief summary before writing (e.g., total objects added) to aid troubleshooting; keep logs concise at `info` level and use `verbose` or `debug` for extra context.
-
 - Do not write ad‑hoc files for results; the only consumable output is `last_result..log` generated by `Plugin_Objects`.
+
 ## API/Endpoints quick map
 - Flask app: `server/api_server/api_server_start.py` exposes routes like `/device/`, `/devices`, `/devices/export/{csv,json}`, `/devices/import`, `/devices/totals`, `/devices/by-status`, plus `nettools`, `events`, `sessions`, `dbquery`, `metrics`, `sync`.
 - Authorization: all routes expect header `Authorization: Bearer ` via `get_setting_value('API_TOKEN')`.
@@ -44,7 +53,7 @@ Backend loop phases (see `server/__main__.py` and `server/plugin.py`): `once`, `
 ## Conventions & helpers to reuse
 - Settings: add/modify via `ccd()` in `server/initialise.py` or per‑plugin manifest. Never hardcode ports or secrets; use `get_setting_value()`.
 - Logging: use `mylog(level, [message])`; levels: none/minimal/verbose/debug/trace. `none` is used for most important messages that should always appear, such as exceptions.
-- Time/MAC/strings: `helper.py` (`timeNowDB`, `normalize_mac`, sanitizers). Validate MACs before DB writes.
+- Time/MAC/strings: `server/utils/datetime_utils.py` (`timeNowDB`), `front/plugins/plugin_helper.py` (`normalize_mac`), `server/helper.py` (sanitizers). Validate MACs before DB writes.
 - DB helpers: prefer `server/db/db_helper.py` functions (e.g., `get_table_json`, device condition helpers) over raw SQL in new paths.
 
 ## Dev workflow (devcontainer)
@@ -65,28 +74,13 @@ Backend loop phases (see `server/__main__.py` and `server/plugin.py`): `once`, `
 ## Useful references
 - Docs: `docs/PLUGINS_DEV.md`, `docs/SETTINGS_SYSTEM.md`, `docs/API_*.md`, `docs/DEBUG_*.md`
 - Logs: All logs are under `/tmp/log/`. Plugin logs are very shortly under `/tmp/log/plugins/` until picked up by the server.
-  - plugin logs: `/tmp/log/app.log`
-  - backend logs: `/tmp/log/stdout.log` and `/tmp/log/stderr.log`
-  - frontend commands logs: `/tmp/log/app_front.log`
-  - php errors: `/tmp/log/app.php_errors.log`
-  - nginx logs: `/tmp/log/nginx-access.log` and `/tmp/log/nginx-error.log`
-
-## Assistant expectations:
-- Be concise, opinionated, and biased toward security and simplicity.
-- Reference concrete files/paths/environmental variables.
-- Use existing helpers/settings.
-- Offer a quick validation step (log line, API hit, or JSON export) for anything you add.
-- Be blunt about risks and when you offer suggestions ensure they're also blunt,
-- Ask for confirmation before making changes that run code or change multiple files.
-- Make statements actionable and specific; propose exact edits.
-- Request confirmation before applying changes that affect more than a single, clearly scoped line or file.
-- Ask the user to debug something for an actionable value if you're unsure.
-- Be sure to offer choices when appropriate.
-- Always understand the intent of the user's request and undo/redo as needed.
-- Above all, use the simplest possible code that meets the need so it can be easily audited and maintained.
-- Always leave logging enabled. If there is a possiblity it will be difficult to debug with current logging, add more logging.
-- Always run the testFailure tool before executing any tests to gather current failure information and avoid redundant runs.
-- Always prioritize using the appropriate tools in the environment first. As an example if a test is failing use `testFailure` then `runTests`. Never `runTests` first.
-- Docker tests take an extremely long time to run. Avoid changes to docker or tests until you've examined the exisiting testFailures and runTests results.
-- Environment tools are designed specifically for your use in this project and running them in this order will give you the best results.
+  - plugin logs: `/tmp/log/plugins/*.log`
+  - backend logs: `/tmp/log/stdout.log` and `/tmp/log/stderr.log`
+  - frontend commands logs: `/tmp/log/app_front.log`
+  - php errors: `/tmp/log/app.php_errors.log`
+  - nginx logs: `/tmp/log/nginx-access.log` and `/tmp/log/nginx-error.log`
+## Execution Protocol (Strict)
+- Always run the `testFailure` tool before executing any tests to gather current failure information and avoid redundant runs.
+- Always prioritize using the appropriate tools in the environment first. Example: if a test is failing use `testFailure` then `runTests`.
+- Docker tests take an extremely long time to run. Avoid changes to docker or tests until you've examined the existing `testFailure`s and `runTests` results.
\ No newline at end of file
diff --git a/docs/PUID_PGID_SECURITY.md b/docs/PUID_PGID_SECURITY.md
new file mode 100644
index 00000000..a68418e2
--- /dev/null
+++ b/docs/PUID_PGID_SECURITY.md
@@ -0,0 +1,30 @@
+# PUID/PGID Security — Why the entrypoint requires numeric IDs
+
+## Purpose
+
+This short document explains the security rationale behind the root-priming entrypoint's validation of runtime user IDs (`PUID`) and group IDs (`PGID`). The validation is intentionally strict and is a safety measure to prevent environment-variable based command injection when running as root during the initial priming stage.
+
+## Key points
+
+- The entrypoint accepts only values that are strictly numeric (digits only). Non-numeric values are treated as malformed and are a fatal error.
+- The fatal check exists to prevent *injection* or accidental shell interpretation of environment values while the container runs as root (e.g., `PUID="20211 && rm -rf /"`).
+- There is **no artificial upper bound** enforced by the validation — any numeric UID/GID is valid (for example, `100000` is acceptable).
+
+## Behavior on malformed input
+
+- If `PUID` or `PGID` cannot be parsed as numeric (digits-only), the entrypoint prints an explicit security message to stderr and exits with a non-zero status.
+- This is a deliberate, conservative safety measure — we prefer failing fast on potentially dangerous input rather than continuing with root-privileged operations.
+
+## Operator guidance
+
+- Always supply numeric values for `PUID` and `PGID` in your environment (via `docker-compose.yml`, `docker run -e`, or equivalent). Example: `PUID=20211`.
+- If you need to run with a high-numbered UID/GID (e.g., `100000`), that is fine — the entrypoint allows it as long as the value is numeric.
+- Don’t pass shell meta-characters, spaces, or compound commands in `PUID` or `PGID` — those will be rejected as malformed and cause the container to exit.
+
+## Related docs
+
+- See `docs/docker-troubleshooting/file-permissions.md` for general permission troubleshooting and guidance about setting `PUID`/`PGID`.
+
+---
+
+*Document created to clarify the security behavior of the root-priming entrypoint (PUID/PGID validation).*
\ No newline at end of file
diff --git a/docs/docker-troubleshooting/PUID_PGID_SECURITY.md b/docs/docker-troubleshooting/PUID_PGID_SECURITY.md
new file mode 100644
index 00000000..4a9ebcc4
--- /dev/null
+++ b/docs/docker-troubleshooting/PUID_PGID_SECURITY.md
@@ -0,0 +1,43 @@
+# PUID/PGID Security — Why the entrypoint requires numeric IDs
+
+## Purpose
+
+This short document explains the security rationale behind the root-priming entrypoint's validation of runtime user IDs (`PUID`) and group IDs (`PGID`). The validation is intentionally strict and is a safety measure to prevent environment-variable based command injection when running as root during the initial priming stage.
+
+## Key points
+
+- The entrypoint accepts only values that are strictly numeric (digits only). Non-numeric values are treated as malformed and are a fatal error.
+- The fatal check exists to prevent *injection* or accidental shell interpretation of environment values while the container runs as root (e.g., `PUID="20211 && rm -rf /"`).
+- There is **no artificial upper bound** enforced by the validation — any numeric UID/GID is valid (for example, `100000` is acceptable).
+
+## Behavior on malformed input
+
+- If `PUID` or `PGID` cannot be parsed as numeric (digits-only), the entrypoint prints an explicit security message to stderr and exits with a non-zero status.
+- This is a deliberate, conservative safety measure — we prefer failing fast on potentially dangerous input rather than continuing with root-privileged operations.
+
+## Operator guidance
+
+- Always supply numeric values for `PUID` and `PGID` in your environment (via `docker-compose.yml`, `docker run -e`, or equivalent). Example: `PUID=20211`.
+- If you need to run with a high-numbered UID/GID (e.g., `100000`), that is fine — the entrypoint allows it as long as the value is numeric.
+- Don’t pass shell meta-characters, spaces, or compound commands in `PUID` or `PGID` — those will be rejected as malformed and cause the container to exit.
+
+## Required Capabilities for Privilege Drop
+
+If you are hardening your container by dropping capabilities (e.g., `cap_drop: [ALL]`), you **must** explicitly grant the `SETUID` and `SETGID` capabilities.
+
+- **Why?** The entrypoint runs as root to set permissions, then uses `su-exec` to switch to the user specified by `PUID`/`PGID`. This switch requires the kernel to allow the process to change its own UID/GID.
+- **Symptom:** If these capabilities are missing, the container will log a warning ("su-exec failed") and continue running as **root** (UID 0), defeating the purpose of setting `PUID`/`PGID`.
+- **Fix:** Add `SETUID` and `SETGID` to your `cap_add` list.
+
+```yaml
+cap_drop:
+  - ALL
+cap_add:
+  - SETUID
+  - SETGID
+  # ... other required caps like CHOWN, NET_ADMIN, etc.
+```
+
+---
+
+*Document created to clarify the security behavior of the root-priming entrypoint (PUID/PGID validation).*
\ No newline at end of file
diff --git a/docs/docker-troubleshooting/missing-capabilities.md b/docs/docker-troubleshooting/missing-capabilities.md
index 9cb4fb0e..dd75c7f0 100644
--- a/docs/docker-troubleshooting/missing-capabilities.md
+++ b/docs/docker-troubleshooting/missing-capabilities.md
@@ -29,4 +29,22 @@ Add the required capabilities to your container:
 
 Docker Compose setup can be complex. We recommend starting with the default docker-compose.yml as a base and modifying it incrementally.
 
-For detailed Docker Compose configuration guidance, see: [DOCKER_COMPOSE.md](https://github.com/jokob-sk/NetAlertX/blob/main/docs/DOCKER_COMPOSE.md)
\ No newline at end of file
+For detailed Docker Compose configuration guidance, see: [DOCKER_COMPOSE.md](https://github.com/jokob-sk/NetAlertX/blob/main/docs/DOCKER_COMPOSE.md)
+
+## CAP_CHOWN required when cap_drop: [ALL]
+
+When you start NetAlertX with `cap_drop: [ALL]`, the container loses `CAP_CHOWN`. The root priming step needs `CAP_CHOWN` to adjust ownership of `/data` and `/tmp` before dropping privileges to `PUID:PGID`. Without it, startup fails with a fatal `failed to chown` message and exits.
+
+To fix:
+- Add `CHOWN` back in `cap_add` when you also set `cap_drop: [ALL]`:
+
+  ```yaml
+  cap_drop:
+    - ALL
+  cap_add:
+    - CHOWN
+  ```
+
+- Or pre-chown the mounted host paths to your target `PUID:PGID` so the priming step does not need the capability.
+
+If you harden capabilities further, expect priming to fail until you restore the minimum set needed for ownership changes.
\ No newline at end of file
diff --git a/install/production-filesystem/README.md b/install/production-filesystem/README.md
index c7451358..4343b31f 100755
--- a/install/production-filesystem/README.md
+++ b/install/production-filesystem/README.md
@@ -85,8 +85,22 @@ Scripts that start and manage the core services required for NetAlertX operation
 - `healthcheck.sh` - Container health verification
 - `cron_script.sh` - Scheduled task definitions
 
+
+### `/root-entrypoint.sh` - Initial Entrypoint and Permission Priming
+This script is the very first process executed in the production container (it becomes PID 1 and `/` in the Docker filesystem). Its primary role is to perform best-effort permission priming for all runtime and persistent paths, ensuring that directories like `/data`, `/tmp`, and their subpaths are owned and writable by the correct user and group (as specified by the `PUID` and `PGID` environment variables, defaulting to 20211).
+
+Key behaviors:
+- If started as root, attempts to create and chown all required paths, then drops privileges to the target user/group using `su-exec`.
+- If started as non-root, skips priming and expects the operator to ensure correct host-side permissions.
+- All permission operations are best-effort: failures to chown/chmod do not halt startup, but are logged for troubleshooting.
+- The only fatal condition is a malformed (non-numeric) `PUID` or `PGID` value, which is treated as a security risk and halts startup with a clear error message and troubleshooting URL.
+- No artificial upper bound is enforced on UID/GID; any numeric value is accepted.
+- If privilege drop fails, the script logs a warning and continues as the current user for resilience.
+
+This design ensures that NetAlertX can run securely and portably across a wide range of host environments (including NAS appliances and hardened Docker setups), while minimizing the risk of privilege escalation or misconfiguration.
+
 ### `/entrypoint.sh` - Container Startup Script
-The main orchestration script that runs when the container starts. It coordinates the entire container initialization process, from pre-startup validation through service startup and ongoing monitoring, ensuring NetAlertX operates reliably in production environments.
+The main orchestration script that runs after `/root-entrypoint.sh` completes. It coordinates the entire container initialization process, from pre-startup validation through service startup and ongoing monitoring, ensuring NetAlertX operates reliably in production environments.
 
 The main script that runs when the container starts:
 - Runs all pre-startup checks from `/services/scripts`
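
The digits-only `PUID`/`PGID` check described in the PUID/PGID security documents in this diff can be sketched in POSIX shell roughly as follows. This is a minimal illustration, not the actual `/root-entrypoint.sh` code: the function names `is_numeric_id` and `validate_id`, the message wording, and the choice to treat an empty value as malformed are all assumptions. The `20211` default is taken from the README text in this diff.

```sh
#!/bin/sh
# Illustrative sketch only; not the actual /root-entrypoint.sh implementation.

# Hypothetical helper: succeeds only for a non-empty, digits-only string.
is_numeric_id() {
    case "$1" in
        '' | *[!0-9]*) return 1 ;;   # empty, or contains any non-digit character
        *) return 0 ;;
    esac
}

# Hypothetical helper: $1 is the variable name, $2 its value.
# Prints a message to stderr and returns non-zero on malformed input.
validate_id() {
    if ! is_numeric_id "$2"; then
        printf 'SECURITY: %s must contain digits only; refusing to continue.\n' "$1" >&2
        return 1
    fi
}

# Fall back to 20211 when unset, then fail fast before any root-privileged work.
validate_id PUID "${PUID:-20211}" || exit 1
validate_id PGID "${PGID:-20211}" || exit 1
```

The value is only ever compared against a pattern and is never evaluated or expanded unquoted, so an input such as `PUID="20211 && rm -rf /"` is rejected because of its spaces and `&` characters, while `100000` passes because the pattern imposes no upper bound.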