Part 3 — On-Prem Observability for Background Jobs with OpenTelemetry and SigNoz

Author: Konrad Bartecki

Bring every cron script, legacy EXE, PowerShell task, and Python job into one dashboard you host yourself.

Part 3 of 4 in a series on observing a .NET system with OpenTelemetry and SigNoz (series index). It reuses the Collector, Compose file, and SigNoz install from Part 1.

Web apps are the easy part of observability. The hard part is everything else that keeps your business running after hours: a nightly cron job you inherited, a Python ETL (extract, transform, load) job a data team owns, a PowerShell backup task, a 12-year-old exporter that writes nothing but export-20260529.txt. You can't dotnet add package your way out of those. And they're often exactly the jobs that fail silently at 3 a.m.

This post shows how to get all of them into SigNoz, running entirely on your own infrastructure. By the end you'll be able to observe a job no matter where it sits on the spectrum -- from "only writes text files" to "full OpenTelemetry SDK." Every job-specific script is included at the end; the shared Collector config, Compose file, and SigNoz install come from Part 1's appendix. There's no repo to clone.

What you'll learn

  • Why on-prem observability is the right fit for background jobs
  • How the OpenTelemetry Collector acts as a universal on-ramp for anything
  • Case A: collect a job that only writes .txt files (no code changes)
  • Case B: instrument a Python job with the OpenTelemetry SDK
  • Case C: get telemetry out of PowerShell, which has no official SDK
  • How they all show up side by side in SigNoz

First, the words you'll see

This post builds on the stack from Part 1, so a few terms come up again and again. If you read Part 1, this is review.

  • OTLP -- OpenTelemetry's wire protocol. Your apps and jobs speak OTLP to a Collector, which forwards it to a backend like SigNoz. (Part 1 has the longer explanation.)
  • The Collector -- a small service that takes telemetry in from many sources and forwards it to SigNoz. It is the one piece every job in this post funnels through.
  • The filelog receiver -- the part of the Collector that tails text files and turns each line into a log record. It is how you collect a job that can't speak OTLP at all.

Why on-prem?

SigNoz is fully open-source and self-hosted, which matters more for background jobs than for almost anything else:

  • Your data stays in your network. Batch jobs touch your most sensitive data -- financial exports, PII, backups. With self-hosted SigNoz, the telemetry about that work never leaves your infrastructure. That's a real advantage in regulated or air-gapped environments where shipping logs to a SaaS is a non-starter.
  • No per-GB surprise bill. Jobs are noisy -- verbose logs, high-frequency runs. On a usage-priced SaaS, job logs are where the bill explodes. Self-hosted, the cost is the box it runs on.
  • It works where the jobs work. Plenty of these jobs run on an on-prem server or a locked-down VM with no outbound internet. The Collector and SigNoz run right there next to them.

Everything in this post runs locally: the apps and the Collector in your network, and SigNoz storing data in its own ClickHouse database on your infrastructure.

The one idea: the Collector is the on-ramp

The OpenTelemetry Collector is a small service that ingests telemetry from many sources and forwards it to SigNoz. Apps you control speak OTLP to it directly. Apps you don't control get adapted at the Collector:

  in-solution .NET worker ──OTLP──┐
  Python job (OTEL SDK) ──OTLP────┤
  PowerShell (OTLP/HTTP) ─────────┤──▶  OpenTelemetry Collector ──▶  SigNoz (on your infra)
  legacy job (.txt files) ─filelog┘

Think of these jobs on a maturity ladder, from "emits no telemetry at all" at the bottom to "full OpenTelemetry SDK" at the top. The Collector handles every rung:

  1. No telemetry, only log files → the Collector reads the files (its filelog receiver) and turns each line into a log record. This is Case A below.
  2. Can write structured lines or POST a payload, but has no SDK → it writes JSON-lines files, or POSTs OTLP over HTTP. This is Case C, the PowerShell job.
  3. Has a real SDK (Python, Java, Go, Node) → it emits traces, metrics, and logs natively. This is Case B, the Python job.

Where a job sits on the ladder only changes how its signal gets in -- never where it lands. We start with Case A (text files), at the bottom of the ladder, because it is the hardest case and the most common. Then Case B (a real SDK) and Case C (no SDK, but it can POST).

The easy case (for contrast): an in-solution .NET worker

A job you own, written in .NET, is the easy case: observability is a one-liner.

A background worker calls the same AddObservability helper the web apps use, points it at its own WorkerTelemetry class of custom instruments, and turns off the web-server instrumentation. (Full source for AddObservability is in Part 1; for WorkerTelemetry, the appendix.)

builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false;            // not a web server
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});

That gives you custom spans, custom metrics, and -- because its HttpClient is instrumented -- automatic distributed traces (worker-jobs → backend-api → db). Everything below is what you do when you can't make that one call.

Case A -- a job that only writes .txt files

This is the job you'll meet most often: a legacy exporter, no SDK, no source you can change. It only appends human-readable lines:

2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset
2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer
    at LegacyExporter.Flush(batchId=7)
    at LegacyExporter.Run()
    at main()

You collect it with the Collector's filelog receiver, which tails the files and turns each entry into a log record. Here's the config, with each part explained:

receivers:
  filelog/legacy:
    include: [/var/log/legacy/*.txt] # 1. which files (glob handles daily rotation)
    start_at: beginning
    multiline: # 2. keep stack traces together
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    operators:
      - type: regex_parser # 3. split each entry into fields
        regex: '(?s)^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$'
        timestamp: { parse_from: attributes.ts, layout: '%Y-%m-%d %H:%M:%S' }
        severity: { parse_from: attributes.sev }
      - type: move # 4. put the message in the log body
        from: attributes.msg
        to: body

What each numbered part does:

  1. include is a glob, so export-20260529.txt, …30.txt, … are all picked up -- no config change when the date rolls over.
  2. multiline.line_start_pattern marks a new entry only on a line starting with a timestamp -- so a four-line stack trace becomes one record instead of four useless ones.
  3. regex_parser pulls the timestamp, severity, and message out of each line: the timestamp block uses the log's own time (not when the line was read), and severity maps INFO/WARN/ERROR to a level you can filter on. One gotcha: the leading (?s) lets . match across newlines in Go's regex engine -- without it the multiline errors fail to parse.
  4. move promotes the clean message into the log body.
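The two steps above can be replayed outside the Collector. Here is a minimal Python sketch, assuming Python's re module as a stand-in for Go's regex engine (re.DOTALL plays the role of the (?s) flag). group_entries and parse are hypothetical helpers written for illustration, not Collector code.

```python
import re

# Same start-of-entry rule as multiline.line_start_pattern.
LINE_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# Same fields as the regex_parser; re.DOTALL is Python's equivalent of Go's (?s).
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$",
    re.DOTALL,
)
# The identical pattern without the flag, to reproduce the gotcha.
ENTRY_NO_DOTALL = re.compile(ENTRY.pattern)


def group_entries(lines):
    """Attach lines that don't start with a timestamp to the entry before them."""
    entries = []
    for line in lines:
        if LINE_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries


def parse(entry, pattern=ENTRY):
    """Return (timestamp, severity, message), or None if the entry doesn't match."""
    m = pattern.match(entry)
    return (m["ts"], m["sev"], m["msg"]) if m else None


sample = [
    "2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset",
    "2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer",
    "    at LegacyExporter.Flush(batchId=7)",
    "    at LegacyExporter.Run()",
    "    at main()",
]
entries = group_entries(sample)
records = [parse(e) for e in entries]
print(len(entries))                        # 2 entries from 5 lines
print(records[1][1])                       # ERROR
print(parse(entries[1], ENTRY_NO_DOTALL))  # None: without DOTALL the stack trace breaks the match
```

Without the flag, `.*` stops at the first newline and `$` can't match there, so the whole multiline entry fails to parse. That is the same failure the (?s) note describes.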

One more step: a file has no service name, so we stamp one with a resource processor and route it through its own logs pipeline:

processors:
  resource/legacy:
    attributes:
      - { key: service.name, value: legacy-batch-job, action: upsert }
      - { key: service.namespace, value: blazor-signoz, action: upsert }

service:
  pipelines:
    logs/filelog:
      receivers: [filelog/legacy]
      processors: [resource/legacy, batch]
      exporters: [otlp/signoz]

(This receiver, processor, and pipeline are part of the full Collector config in Part 1's appendix.)
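Why upsert? The resource processor's actions differ only in how they treat a key that is, or isn't, already present. Here is a toy Python model that assumes records read from a file arrive with an empty resource. apply_action is a hypothetical helper, not Collector code.

```python
def apply_action(resource: dict, key: str, value, action: str) -> dict:
    """Toy model of the resource processor's insert / update / upsert / delete actions."""
    out = dict(resource)  # work on a copy
    if action == "insert":
        out.setdefault(key, value)       # only adds a key that is missing
    elif action == "update":
        if key in out:
            out[key] = value             # only overwrites a key that exists
    elif action == "upsert":
        out[key] = value                 # adds or overwrites
    elif action == "delete":
        out.pop(key, None)
    else:
        raise ValueError(f"unknown action: {action}")
    return out


file_resource = {}  # a .txt line carries no service.name
print(apply_action(file_resource, "service.name", "legacy-batch-job", "update"))  # {}
print(apply_action(file_resource, "service.name", "legacy-batch-job", "upsert"))  # {'service.name': 'legacy-batch-job'}
```

For file-sourced records, update would silently do nothing. upsert stamps the name either way, and it also overrides anything set upstream.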

Now the file-only job groups under legacy-batch-job in SigNoz, right next to your real services -- that multiline error as a single, parsed record:

A single log record in SigNoz's Logs Explorer, from a job that only writes .txt files. The [ERROR] line and its three at … frames are grouped into one record, with log.file.name pointing back to the source file -- all done in the Collector, with no change to the job.

On-prem note -- don't lose your place. The demo uses start_at: beginning on purpose, so you see the existing .txt lines the first time the Collector starts. But on its own that re-reads from the top on every restart (duplicates), while start_at: end skips anything written while the Collector was down (gaps). For production, add a file_storage extension, enable it under service.extensions, and point the receiver's storage at it, so the read offsets survive restarts:

extensions:
  file_storage: { directory: /var/lib/otelcol/storage }   # must exist and be writable by the Collector
receivers:
  filelog/legacy: { include: [/var/log/legacy/*.txt], start_at: end, storage: file_storage }   # keep multiline + operators from above
service:
  extensions: [file_storage]   # an extension only starts if it is listed here
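What the storage extension buys you is a read offset that survives a restart. Here is a minimal Python sketch of that idea. It assumes a single log file and uses a JSON file as the offset store. read_new_lines is hypothetical, and unlike the real receiver it ignores rotation and truncation (filelog tracks files by a fingerprint of their first bytes).

```python
import json
import os
import tempfile


def read_new_lines(log_path, state_path):
    """Return complete lines appended since the last call; the byte offset is kept on disk."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f).get(log_path, 0)

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a half-written last line waits for the next call.
    consumed = data.rfind(b"\n") + 1
    with open(state_path, "w") as f:
        json.dump({log_path: offset + consumed}, f)
    return data[:consumed].decode("utf-8").splitlines()


workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "export-20260529.txt")
state = os.path.join(workdir, "offsets.json")

with open(log, "w") as f:
    f.write("line 1\nline 2\n")
first = read_new_lines(log, state)    # ['line 1', 'line 2']
# A restart loses only in-memory state; the offset on disk means nothing is re-read.
second = read_new_lines(log, state)   # []
with open(log, "a") as f:
    f.write("line 3\nline 4 (still being writ")
third = read_new_lines(log, state)    # ['line 3']
```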

Case B -- Python, with the OpenTelemetry SDK

When the job's language has a real SDK, use it. The Python ETL job produces traces, metrics, and logs identical in shape to the C# services, from just three packages:

opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp-proto-grpc

You build one Resource (the job's identity) and share it across the three providers, so everything groups under one service:

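from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

ENDPOINT = "http://localhost:5317"  # the Collector's gRPC port on the host
resource = Resource.create({"service.name": "python-etl-job", "service.namespace": "blazor-signoz"})
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")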

You set up a meter_provider (for the python.etl.* metrics) and a logger_provider (for logs) in exactly the same shape -- the full script is in the appendix.

Then your work nests spans naturally, so SigNoz shows an extract → transform → load waterfall:

with tracer.start_as_current_span("python.etl.run"):
    with tracer.start_as_current_span("extract"):   ...
    with tracer.start_as_current_span("transform"): ...
    with tracer.start_as_current_span("load"):      ...

One Python ETL run as a trace in SigNoz. The root python.etl.run span (484 ms) nests extract, transform, and load; this is one of the ~15% of runs that fail at load, so that span is red and the header reads Errors: 1. You see the failing stage and its timing without opening a log file on the box.
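The waterfall comes from one mechanism. start_as_current_span makes the new span the current one for the duration of its with block, so any span started inside it records that span as its parent. Here is a toy Python model of that mechanism built on contextvars, which the Python SDK also uses by default to carry context. start_span is a hypothetical stand-in, not the OpenTelemetry API, and it skips trace IDs, attributes, and status.

```python
import contextvars
import time
from contextlib import contextmanager

# The "current span" lives in a context variable; each new span reads it as its parent.
_current = contextvars.ContextVar("current_span", default=None)
finished = []  # (name, parent_name, duration_ms), in completion order


@contextmanager
def start_span(name):
    parent = _current.get()
    token = _current.set(name)       # this span is now current for nested work
    start = time.perf_counter()
    try:
        yield
    finally:
        _current.reset(token)        # restore the parent as current
        finished.append((name, parent, (time.perf_counter() - start) * 1000))


with start_span("python.etl.run"):
    for stage in ("extract", "transform", "load"):
        with start_span(stage):
            time.sleep(0.01)

for name, parent, ms in finished:
    print(f"{name:<15} parent={parent}  {ms:.0f} ms")
```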

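The one line short-lived jobs must not skip: batch processors buffer telemetry and flush on a timer, so a job that exits before the next flush can lose whatever is still buffered -- your last spans and logs vanish. (The Python SDK registers an exit hook by default, but it won't run if the process dies from an unhandled signal, and not every SDK has one.) Always flush explicitly before exit: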

finally:
    tracer_provider.shutdown(); meter_provider.shutdown(); logger_provider.shutdown()

This is the #1 cause of "my cron job ran but I see nothing."
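The effect of skipping the flush is easiest to see in a toy model. The Python sketch below assumes a processor that exports only when its batch fills or when shutdown() runs (the real SDK processors also flush on a timer). It shows records sitting in memory until the final flush. BatchBuffer is hypothetical, and the 512 default echoes the SDK's default export batch size.

```python
class BatchBuffer:
    """Toy batch processor: buffers records and exports them when the batch fills or on shutdown()."""

    def __init__(self, export, max_batch=512):
        self._export = export        # callable that receives a list of records
        self._max_batch = max_batch
        self._buffer = []

    def add(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self):
        batch, self._buffer = self._buffer, []
        if batch:
            self._export(batch)

    def shutdown(self):
        self.flush()  # the step a short-lived job must not skip


exported = []
processor = BatchBuffer(exported.extend)
for stage in ("extract", "transform", "load"):
    processor.add(f"span:{stage}")

before_shutdown = len(exported)  # 0 -- a process that exits here loses all three
processor.shutdown()
after_shutdown = len(exported)   # 3
```

A cron job produces a handful of records per run, far short of a full batch, so nearly everything it emits is still sitting in the buffer when it finishes.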

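Don't want to touch the script? Use zero-code auto-instrumentation instead. Point it at a plain version of the job, without the manual setup above (the appendix's etl_job.py already configures its own providers, and the two setups would conflict):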

pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
OTEL_SERVICE_NAME=python-etl-job \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:5317 OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
  OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true \
  opentelemetry-instrument python etl_job.py

A note on ports. The demo Collector listens on two host ports: 5317 for OTLP over gRPC (a fast binary protocol) and 5318 for OTLP/HTTP (mapped to the standard 4317 and 4318 inside Docker). The protocol setting has to match the port: OTEL_EXPORTER_OTLP_PROTOCOL=grpc goes with 5317, and http/protobuf goes with 5318. Defaults differ between SDKs and distros, so set the protocol explicitly rather than relying on one.

For a job that calls instrumented libraries (HTTP clients, DB drivers, web frameworks), that gives you traces for free, and Python logging records too once OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true is set (log export is off by default). A pure-compute job like this ETL stand-in gets logs but few spans until you add them by hand -- which is what the manual SDK above is for.
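The exporter only discovers a protocol/port mismatch when it tries to send, so a pre-flight check in a job's wrapper can catch the mistake earlier. Here is a minimal Python sketch that assumes this demo's host ports (5317 for gRPC, 5318 for OTLP/HTTP). check_otlp_env is a hypothetical helper. The protocol names are the values the OpenTelemetry spec defines for OTEL_EXPORTER_OTLP_PROTOCOL.

```python
from urllib.parse import urlparse

# This demo's host ports; inside Docker the Collector listens on 4317 / 4318.
EXPECTED_PORT = {"grpc": 5317, "http/protobuf": 5318, "http/json": 5318}


def check_otlp_env(env):
    """Return a list of likely problems with the OTLP endpoint/protocol pair (empty if none)."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return ["OTEL_EXPORTER_OTLP_ENDPOINT is not set"]
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL")
    if not protocol:
        return ["OTEL_EXPORTER_OTLP_PROTOCOL is not set; the default differs between SDKs"]
    if protocol not in EXPECTED_PORT:
        return [f"unknown protocol {protocol!r}"]
    port = urlparse(endpoint).port
    if port != EXPECTED_PORT[protocol]:
        return [f"{protocol} expects port {EXPECTED_PORT[protocol]}, endpoint uses {port}"]
    return []


print(check_otlp_env({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:5317",
                      "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc"}))  # []
```

In a real wrapper you would pass os.environ, which supports the same .get() calls.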

Case C -- PowerShell, which has no official SDK

PowerShell has no official OpenTelemetry SDK, so a small dot-sourceable helper (OtelExport.ps1, in the appendix) offers two pragmatic paths.

Option A -- write JSON-lines to a file (most robust). The script just appends one JSON object per line; the Collector's filelog receiver picks it up with a json_parser:

Write-JobLog -Message "Backed up $files files" -Level INFO -Attributes @{ files = $files }
# -> {"time":"2026-05-29T12:00:00...","level":"INFO","msg":"Backed up 231 files","files":231}

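The Collector side is a second filelog receiver, just like the legacy one in Case A but with a json_parser instead of a regex. Add it next to filelog/legacy, but give it its own logs pipeline and resource processor: like the .txt files, these lines carry no service name, and routing them through logs/filelog would stamp them legacy-batch-job.

Two more details. Mount the folder the script writes to (external-jobs/powershell/out for the example job) into the Collector at /var/log/powershell. And the timestamp uses a Go layout (layout_type: gotime) because PowerShell's round-trip 'o' format writes seven fractional digits and a +hh:mm offset, which a strptime millisecond layout won't match:

receivers:
  filelog/powershell:
    include: [/var/log/powershell/*.log]
    start_at: beginning
    operators:
      - type: json_parser
        timestamp: { parse_from: attributes.time, layout_type: gotime, layout: '2006-01-02T15:04:05.999999999Z07:00' }
        severity: { parse_from: attributes.level }
      - type: move # promote the message to the log body
        from: attributes.msg
        to: body
processors:
  resource/powershell:
    attributes:
      - { key: service.name, value: powershell-backup-job, action: upsert }
      - { key: service.namespace, value: blazor-signoz, action: upsert }
service:
  pipelines:
    logs/powershell:
      receivers: [filelog/powershell]
      processors: [resource/powershell, batch]
      exporters: [otlp/signoz]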

This is the safest option: the script never blocks on the network, and if the Collector is down it catches up later. Best for batch jobs.
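Option A isn't specific to PowerShell: any language that can append to a file can feed the same receiver. Here is a minimal Python sketch of the Write-JobLog pattern. It assumes the same field names (time, level, msg, plus flat attributes) and writes an RFC 3339 timestamp with a UTC offset. write_job_log is hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def write_job_log(path, message, level="INFO", **attributes):
    """Append one JSON object per line, with the same fields Write-JobLog writes."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "level": level,
        "msg": message,
        **attributes,  # flat extra fields, e.g. files=231
    }
    # One compact object per line: a crash mid-write can damage at most the last line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


log_path = os.path.join(tempfile.mkdtemp(), "out", "python-job.log")
write_job_log(log_path, "Backed up 231 files", files=231)
write_job_log(log_path, "Backup failed: disk full", level="ERROR")

with open(log_path, encoding="utf-8") as f:
    lines = [json.loads(line) for line in f]
```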

Option B -- POST OTLP/HTTP directly (real-time). PowerShell has no SDK, so Send-OtelLog and Send-OtelTrace build the OTLP payload by hand and POST it with Invoke-RestMethod to the Collector's HTTP port (:5318 on your host, mapped to 4318 in Docker). This is OTLP/HTTP with a JSON body -- sometimes called OTLP/JSON. You get real spans as the job runs. Hand-built JSON has four traps, and the helper handles each one:

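  • 64-bit timestamps should be quoted strings. timeUnixNano is built as a string (([long]$ms * 1000000).ToString()), because OTLP/JSON follows the protobuf JSON mapping, which encodes 64-bit integers as decimal strings; JSON numbers are often parsed as doubles, which can't hold every int64 exactly.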
  • Trace/span IDs are hex, not base64 -- 32 hex chars for the trace, 16 for the span.
  • Enums are integers -- severityNumber (INFO = 9), span kind, status code.
  • ConvertTo-Json -Depth 12 -- the default depth of 2 truncates the nested resourceLogs → scopeLogs → logRecords structure (PowerShell 7.1+ at least prints a warning; Windows PowerShell 5.1 truncates silently).
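The traps are easier to see in a payload built outside PowerShell. Here is a minimal Python sketch of the body Send-OtelLog POSTs to /v1/logs. It assumes only a service.name resource attribute, and the scope name "python-sketch" is a placeholder. otlp_log_payload is hypothetical. Python's json.dumps has no depth limit, so the fourth trap doesn't apply, and this record carries no trace or span ID, so the second doesn't either.

```python
import json
import time

SEVERITY_NUMBER = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}


def otlp_log_payload(message, severity="INFO", service="powershell-job", attributes=None, now_ns=None):
    """Build an OTLP/JSON body for POST /v1/logs, following the same rules as Send-OtelLog."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    attrs = [{"key": k, "value": {"stringValue": str(v)}} for k, v in (attributes or {}).items()]
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeLogs": [{
                "scope": {"name": "python-sketch"},
                "logRecords": [{
                    "timeUnixNano": str(now_ns),                  # trap 1: int64 as a string
                    "severityNumber": SEVERITY_NUMBER[severity],  # trap 3: enum as an integer
                    "severityText": severity,
                    "body": {"stringValue": message},
                    "attributes": attrs,
                }],
            }],
        }]
    }


body = json.dumps(otlp_log_payload("Backed up 231 files", attributes={"files": 231},
                                   now_ns=1_780_000_000_000_000_000))
```

To send it, POST the body with Content-Type: application/json to the Collector's /v1/logs path (http://localhost:5318/v1/logs in this demo).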

Which to use? File (Option A) for batch jobs where durability beats latency; OTLP/HTTP (Option B) when you want spans in real time. Doing both costs almost nothing -- the example job does exactly that.

SigNoz's Logs Explorer, filtered to service.name = powershell-backup-job. Each "Backed up N files" line is a structured record POSTed over OTLP/HTTP, with a filterable INFO severity and a files attribute -- from a language with no SDK.

And it is not just logs. The hand-rolled OTLP/JSON from Option B produces a genuine span, so the same script shows up in Traces:

A single trace in SigNoz: one backup span (463 ms) on powershell-backup-job, built by hand in Send-OtelTrace and POSTed as OTLP/JSON. A language with no SDK lands in the same Traces view as the C# services.
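The span side follows the same rules. Here is a minimal Python sketch of the body Send-OtelTrace POSTs to /v1/traces. Like the helper, it assumes the span ends now and started DurationMs earlier. otlp_span_payload and the "python-sketch" scope name are hypothetical. secrets.token_hex produces the hex-encoded IDs that OTLP/JSON expects.

```python
import secrets
import time

STATUS_CODE = {"UNSET": 0, "OK": 1, "ERROR": 2}
SPAN_KIND_INTERNAL = 1


def otlp_span_payload(name, duration_ms, status="OK", service="powershell-job", end_ns=None):
    """Build an OTLP/JSON body for POST /v1/traces, the same shape Send-OtelTrace sends."""
    end_ns = time.time_ns() if end_ns is None else end_ns
    start_ns = end_ns - duration_ms * 1_000_000  # ends "now", started duration_ms earlier
    return {
        "resourceSpans": [{
            "resource": {"attributes": [{"key": "service.name", "value": {"stringValue": service}}]},
            "scopeSpans": [{
                "scope": {"name": "python-sketch"},
                "spans": [{
                    "traceId": secrets.token_hex(16),   # 16 random bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),     # 8 random bytes -> 16 hex chars
                    "name": name,
                    "kind": SPAN_KIND_INTERNAL,         # enums are integers
                    "startTimeUnixNano": str(start_ns), # int64 as strings
                    "endTimeUnixNano": str(end_ns),
                    "status": {"code": STATUS_CODE[status]},
                }],
            }],
        }]
    }


span = otlp_span_payload("backup", 463, end_ns=1_780_000_000_000_000_000)["resourceSpans"][0]["scopeSpans"][0]["spans"][0]
```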

See it all in SigNoz

Bring the stack up, generate some traffic, and run the jobs (commands below). The result: a C# worker, a Python script, and a PowerShell script all land in the same Services list, because each of them emits spans (they also share a service.namespace). The .txt-only legacy job emits no spans, so it shows up in Logs rather than the Services list:

SigNoz's Services page. python-etl-job and powershell-backup-job sit right next to backend-api, blazor-frontend, and worker-jobs. (The .txt-only legacy job appears in Logs, not here, since it emits no spans.)

Then:

  • Logs -- filter service.name = legacy-batch-job, then severity ERROR, and open one: the connection reset by peer message and its three at … frames are one grouped record.
  • Traces -- filter service.name = python-etl-job, open a python.etl.run trace, and see the extract → transform → load waterfall (about 15% of runs fail at load, on purpose).
  • Metrics -- chart python.etl.rows_processed for volume, and python.etl.runs grouped by success for the pass/fail split.

What it costs to keep

On-prem flips the cost model: there is no per-GB ingest bill, just disk you already own. The trade is that you decide how long data lives, and a chatty filelog pipeline can fill a disk if you let it.

The biggest lever is retention per signal. Each signal differs in value and volume, so give each its own lifetime instead of one blanket number -- in SigNoz, set it per signal under Settings → General:

  • Logs are the highest-volume signal -- keep them short (say 15 days). The legacy .txt pipeline alone can be noisy.
  • Traces are bursty; a week or two is usually plenty for incident forensics.
  • Metrics are tiny once aggregated -- keep them longest (a quarter or more) for capacity trends and year-over-year comparisons.

The other lever is volume at the source: sample the noisy producers before they reach the Collector (Part 2 shows how). The legacy .txt pipeline is usually the loudest, but since that job can't be changed, its volume has to be trimmed in the Collector instead. (Storage itself is rarely the constraint: ClickHouse compresses telemetry around 10x, so sizing is far cheaper than raw volume suggests.)
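To put rough numbers on a retention plan, multiply each signal's daily raw volume by the days you keep it and divide by the compression ratio. This back-of-the-envelope Python sketch uses made-up daily volumes and takes the rough 10x figure above as an assumption. disk_gb is hypothetical and ignores indexes, replicas, and free-space headroom.

```python
def disk_gb(daily_raw_gb, retention_days, compression_ratio=10.0):
    """Rough steady-state disk use for one signal: raw GB/day x days kept / compression."""
    return daily_raw_gb * retention_days / compression_ratio


# Hypothetical daily raw volumes; the retention periods follow the suggestions above.
plan = {
    "logs": disk_gb(daily_raw_gb=20.0, retention_days=15),    # 30.0 GB
    "traces": disk_gb(daily_raw_gb=5.0, retention_days=14),   # 7.0 GB
    "metrics": disk_gb(daily_raw_gb=0.5, retention_days=90),  # 4.5 GB
}
total_gb = sum(plan.values())  # 41.5 GB
```

With volumes shaped like these, logs dominate even at the shortest retention, which is why they are the first place to trim.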

Cheat sheet -- which path for which job

| The job…                                    | Use                             | You get                             |
|---------------------------------------------|---------------------------------|-------------------------------------|
| Only writes .txt/log files, can't change it | Collector filelog receiver      | Logs (with severity + multiline)    |
| Writes structured lines, no SDK             | JSON-lines file + json_parser   | Logs with parsed fields             |
| Can do an HTTP POST, no SDK                 | OTLP/HTTP via Invoke-RestMethod | Logs + real-time spans              |
| Has a real OTEL SDK (Python, etc.)          | The SDK + OTLP exporter         | Traces + metrics + logs             |
| Is your own .NET worker                     | AddObservability(...)           | Everything, plus distributed traces |
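The cheat sheet can be read as a decision procedure: take the highest rung a job can reach. The one exception comes from the advice above: a job that can both write files and POST should prefer the file for batch work unless it needs real-time spans. Here is a minimal Python encoding of that reading. choose_path and its flags are hypothetical.

```python
def choose_path(*, own_dotnet=False, has_sdk=False, can_http_post=False,
                writes_structured=False, needs_realtime=False):
    """Return the cheat-sheet 'Use' entry for a job, checking the highest rung first."""
    if own_dotnet:
        return "AddObservability(...)"
    if has_sdk:
        return "The SDK + OTLP exporter"
    # No SDK: files win for batch work (durability); POST when spans must arrive live.
    if can_http_post and (needs_realtime or not writes_structured):
        return "OTLP/HTTP via Invoke-RestMethod"
    if writes_structured:
        return "JSON-lines file + json_parser"
    return "Collector filelog receiver"
```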

The complete code

Everything for the jobs in this post. The shared AddObservability bootstrap, the full Collector config (which already includes the filelog/legacy receiver shown above), the docker-compose.yml, and the SigNoz install are in Part 1's appendix -- reuse them as-is.

Run it

# 1. Start SigNoz (one-time, self-hosted) -- full install in Part 1
git clone -b main https://github.com/SigNoz/signoz.git
cd signoz/deploy/docker && docker compose up -d        # UI at http://localhost:8080
cd -

# 2. Start the stack (docker-compose.yml + collector from Part 1).
#    The legacy .txt job runs by default and starts filling /var/log/legacy/*.txt
docker compose up -d --build
docker compose --profile jobs up -d --build            # add the Python job

# 3. Run the PowerShell job on your host (PowerShell 7+), pointed at the collector's HTTP port:
$env:OTEL_EXPORTER_OTLP_ENDPOINT='http://localhost:5318'
pwsh ./external-jobs/powershell/backup-job.ps1

docker-compose services for the jobs

Add these two services to the docker-compose.yml from Part 1 (the legacy job runs by default; the Python job is behind a jobs profile):

# A job with NO OpenTelemetry awareness -- only writes .txt files. The collector's filelog
# receiver (in Part 1's collector config) reads them. Shares the job-logs volume with the collector.
legacy-job:
  image: alpine:3.20
  command: ['sh', '/opt/job/run-batch.sh']
  environment: { LOG_DIR: /var/log/legacy, INTERVAL_SECONDS: '20' }
  volumes:
    - ./external-jobs/legacy-batch/run-batch.sh:/opt/job/run-batch.sh:ro
    - job-logs:/var/log/legacy
  networks: [blazorsignoz]

python-job:
  build: { context: ./external-jobs/python }
  profiles: ['jobs']
  environment:
    OTEL_SERVICE_NAME: python-etl-job
    OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
    JOB_INTERVAL_SECONDS: '30'
  depends_on: [otel-collector]
  networks: [blazorsignoz]

external-jobs/legacy-batch/run-batch.sh

The stand-in legacy job. No SDK -- just .txt lines, including occasional multi-line stack traces.

#!/usr/bin/env sh
# A stand-in for a legacy batch job that has NO telemetry SDK and cannot be changed:
# it only appends human-readable lines to a .txt log file.
#
# Run locally:   LOG_DIR=./out INTERVAL_SECONDS=5 ./run-batch.sh
set -eu

LOG_DIR="${LOG_DIR:-./out}"
INTERVAL="${INTERVAL_SECONDS:-20}"
mkdir -p "$LOG_DIR"

logfile() { echo "$LOG_DIR/export-$(date '+%Y%m%d').txt"; }

emit() {
  level="$1"; shift
  printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" >> "$(logfile)"
}

emit INFO "legacy batch job started (pid $$)"

count=0
while true; do
  count=$((count + 1))
  records=$(( (count * 37) % 500 + 50 ))

  emit INFO  "run #$count: exporting nightly inventory snapshot"
  emit INFO  "run #$count: wrote $records records to dataset"

  if [ $((count % 4)) -eq 0 ]; then
    emit WARN "run #$count: 3 records skipped (failed validation)"
  fi

  # A multi-line error. The continuation lines do not start with a timestamp, so the
  # collector's multiline rule attaches them to the [ERROR] entry as one record.
  if [ $((count % 7)) -eq 0 ]; then
    file="$(logfile)"
    {
      printf '%s [ERROR] run #%s: export failed: connection reset by peer\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$count"
      printf '    at LegacyExporter.Flush(batchId=%s)\n' "$count"
      printf '    at LegacyExporter.Run()\n'
      printf '    at main()\n'
    } >> "$file"
  fi

  sleep "$INTERVAL"
done

external-jobs/python/requirements.txt

opentelemetry-api>=1.29,<2
opentelemetry-sdk>=1.29,<2
opentelemetry-exporter-otlp-proto-grpc>=1.29,<2

external-jobs/python/etl_job.py

"""
A standalone Python ETL job instrumented with the OpenTelemetry SDK. It exports traces, metrics,
and logs over OTLP to the collector (which forwards to SigNoz).

Env vars:
  OTEL_SERVICE_NAME           default "python-etl-job"
  OTEL_EXPORTER_OTLP_ENDPOINT default "http://localhost:5317" (the demo collector's host port)
  JOB_INTERVAL_SECONDS        0 = run once and exit; >0 = loop forever
"""
import logging
import os
import random
import time

from opentelemetry import metrics, trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import Status, StatusCode

SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "python-etl-job")
ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:5317")
INTERVAL = int(os.getenv("JOB_INTERVAL_SECONDS", "0"))

resource = Resource.create(
    {
        "service.name": SERVICE_NAME,
        "service.namespace": "blazor-signoz",
        "service.instance.id": os.getenv("HOSTNAME", "local"),
    }
)

tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")

metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=ENDPOINT, insecure=True),
    export_interval_millis=5000,
)
meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
metrics.set_meter_provider(meter_provider)
meter = metrics.get_meter("etl_job")
rows_counter = meter.create_counter("python.etl.rows_processed", unit="{row}", description="Rows processed by the ETL job")
runs_counter = meter.create_counter("python.etl.runs", unit="{run}", description="ETL job executions, tagged by outcome")

logger_provider = LoggerProvider(resource=resource)
set_logger_provider(logger_provider)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter(endpoint=ENDPOINT, insecure=True)))
logging.basicConfig(
    level=logging.INFO,
    handlers=[LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider), logging.StreamHandler()],
)
log = logging.getLogger("etl_job")


def run_once(run_id: int) -> None:
    with tracer.start_as_current_span("python.etl.run") as span:
        span.set_attribute("etl.run", run_id)
        log.info("ETL run %s starting", run_id)

        with tracer.start_as_current_span("extract"):
            time.sleep(random.uniform(0.05, 0.25))
            rows = random.randint(100, 1000)

        with tracer.start_as_current_span("transform"):
            time.sleep(random.uniform(0.05, 0.25))

        with tracer.start_as_current_span("load") as load_span:
            time.sleep(random.uniform(0.05, 0.25))
            if random.random() < 0.15:
                load_span.set_status(Status(StatusCode.ERROR, "load failed"))
                log.error("ETL run %s: load step failed", run_id)
                runs_counter.add(1, {"success": "false"})
                return

        rows_counter.add(rows)
        runs_counter.add(1, {"success": "true"})
        span.set_attribute("etl.rows", rows)
        log.info("ETL run %s finished: %s rows processed", run_id, rows)


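def main() -> None:
    import signal

    def _exit_on_sigterm(signum, frame):
        # docker stop sends SIGTERM. Left unhandled, Python dies without running the finally
        # block below (and as a container's PID 1 it ignores the signal until Docker sends
        # SIGKILL). Raising SystemExit instead lets finally flush the batched telemetry.
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, _exit_on_sigterm)
    run_id = 0
    try: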
        if INTERVAL > 0:
            log.info("Looping every %ss; exporting to %s", INTERVAL, ENDPOINT)
            while True:
                run_id += 1
                run_once(run_id)
                time.sleep(INTERVAL)
        else:
            run_once(1)
    except KeyboardInterrupt:
        pass
    finally:
        # Critical for short-lived jobs: flush batched telemetry before the process exits.
        tracer_provider.shutdown()
        meter_provider.shutdown()
        logger_provider.shutdown()


if __name__ == "__main__":
    main()

external-jobs/python/Dockerfile

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl_job.py .
ENV JOB_INTERVAL_SECONDS=30
ENTRYPOINT ["python", "etl_job.py"]

external-jobs/powershell/OtelExport.ps1

Dot-source this; it provides both PowerShell telemetry paths.

<#
    OtelExport.ps1 -- minimal OpenTelemetry helpers for PowerShell (7+).
    Send-OtelLog / Send-OtelTrace : POST OTLP/HTTP+JSON straight to a collector (host port 5318, mapped to 4318 in Docker).
    Write-JobLog                  : append JSON-lines to a file for the collector's filelog receiver.
    OTLP/JSON gotchas handled: timeUnixNano as quoted strings, hex trace/span ids, integer enums.
#>

function Get-OtelNano {
    $ms = [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()
    return ([long]$ms * 1000000).ToString()
}

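function New-OtelId {
    param([int]$Bytes)
    $buffer = [byte[]]::new($Bytes)
    # Instance GetBytes(byte[]) avoids the Span<byte>-only Fill() overload, which PowerShell may not be able to bind.
    $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    try { $rng.GetBytes($buffer) } finally { $rng.Dispose() }
    return (($buffer | ForEach-Object { $_.ToString('x2') }) -join '')
}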

function ConvertTo-OtelAttributes {
    param([hashtable]$Attributes)
    $list = @()
    foreach ($key in $Attributes.Keys) {
        $list += @{ key = $key; value = @{ stringValue = [string]$Attributes[$key] } }
    }
    return , $list   # leading comma forces an array even for 0/1 elements
}

function Send-OtelLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [ValidateSet('TRACE', 'DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL')][string]$Severity = 'INFO',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $severityNumber = @{ TRACE = 1; DEBUG = 5; INFO = 9; WARN = 13; ERROR = 17; FATAL = 21 }[$Severity]

    $payload = @{
        resourceLogs = @(@{
            resource  = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeLogs = @(@{
                scope      = @{ name = 'powershell' }
                logRecords = @(@{
                    timeUnixNano   = (Get-OtelNano)
                    severityNumber = $severityNumber
                    severityText   = $Severity
                    body           = @{ stringValue = $Message }
                    attributes     = (ConvertTo-OtelAttributes $Attributes)
                })
            })
        })
    }

    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/logs" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP log export failed: $($_.Exception.Message)"
    }
}

function Send-OtelTrace {
    param(
        [Parameter(Mandatory)][string]$Name,
        [int]$DurationMs = 100,
        [ValidateSet('UNSET', 'OK', 'ERROR')][string]$Status = 'OK',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $endNano = [long]([DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()) * 1000000
    $startNano = $endNano - ([long]$DurationMs * 1000000)
    $statusCode = @{ UNSET = 0; OK = 1; ERROR = 2 }[$Status]

    $payload = @{
        resourceSpans = @(@{
            resource   = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeSpans = @(@{
                scope = @{ name = 'powershell' }
                spans = @(@{
                    traceId           = (New-OtelId 16)
                    spanId            = (New-OtelId 8)
                    name              = $Name
                    kind              = 1   # INTERNAL
                    startTimeUnixNano = $startNano.ToString()
                    endTimeUnixNano   = $endNano.ToString()
                    attributes        = (ConvertTo-OtelAttributes $Attributes)
                    status            = @{ code = $statusCode }
                })
            })
        })
    }

    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/traces" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP trace export failed: $($_.Exception.Message)"
    }
}

function Write-JobLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [string]$Level = 'INFO',
        [string]$Path = './out/powershell-job.log',
        [hashtable]$Attributes = @{}
    )
    $dir = Split-Path -Parent $Path
    if ($dir -and -not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir -Force | Out-Null }

    $entry = [ordered]@{ time = (Get-Date).ToString('o'); level = $Level; msg = $Message }
    foreach ($key in $Attributes.Keys) { $entry[$key] = $Attributes[$key] }
    ($entry | ConvertTo-Json -Compress) | Add-Content -Path $Path
}

external-jobs/powershell/backup-job.ps1

An example job that uses both paths -- belt and suspenders.

. "$PSScriptRoot/OtelExport.ps1"

$ErrorActionPreference = 'Stop'
$service = 'powershell-backup-job'
$logFile = Join-Path $PSScriptRoot 'out/powershell-job.log'
$start = Get-Date

Send-OtelLog -Service $service -Severity INFO -Message 'Backup job started' -Attributes @{ host = [Environment]::MachineName }  # $env:COMPUTERNAME is empty off Windows
Write-JobLog -Path $logFile -Level INFO -Message 'Backup job started (file path)'

try {
    Start-Sleep -Milliseconds 400
    $files = Get-Random -Minimum 50 -Maximum 500

    Send-OtelLog -Service $service -Severity INFO -Message "Backed up $files files" -Attributes @{ files = $files }
    Write-JobLog -Path $logFile -Level INFO -Message "Backed up $files files" -Attributes @{ files = $files }

    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status OK -Attributes @{ files = $files }
    Write-Host "Backup complete: $files files in ${duration}ms"
}
catch {
    $message = $_.Exception.Message
    Send-OtelLog -Service $service -Severity ERROR -Message "Backup failed: $message"
    Write-JobLog -Path $logFile -Level ERROR -Message "Backup failed: $message"
    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status ERROR
    throw
}
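The belt-and-suspenders shape of backup-job.ps1 is not PowerShell-specific. Here is a minimal Python sketch of the same pattern, assuming a hypothetical emit callable that stands in for the direct OTLP export. It appends one JSON object per line to a local file (the kind of line a filelog receiver with a json_parser operator can read), times the work, reports OK or ERROR, and re-raises failures so the scheduler still sees a non-zero exit. The names write_job_log and run_job are hypothetical.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path


def write_job_log(path, level, msg, **attrs):
    """Append one JSON object per line, like Write-JobLog does."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"time": datetime.now(timezone.utc).isoformat(), "level": level, "msg": msg, **attrs}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def run_job(service, work, log_path, emit):
    """Run work(), reporting through BOTH the log file and emit(); re-raise on failure."""
    start = time.monotonic()
    write_job_log(log_path, "INFO", f"{service} started")
    try:
        result = work()
    except Exception as exc:
        duration_ms = int((time.monotonic() - start) * 1000)
        write_job_log(log_path, "ERROR", f"{service} failed: {exc}")
        emit(service=service, status="ERROR", duration_ms=duration_ms)
        raise  # keep the non-zero exit code for cron / Task Scheduler
    duration_ms = int((time.monotonic() - start) * 1000)
    write_job_log(log_path, "INFO", f"{service} finished", duration_ms=duration_ms)
    emit(service=service, status="OK", duration_ms=duration_ms)
    return result
```

If the Collector is down, the file still has the story; if the disk is full, the OTLP path may still get through. That redundancy is the whole point of writing to both.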

The in-solution .NET worker (the easy case)

For completeness, here is the worker that gets traces, metrics, and logs from the same shared AddObservability bootstrap as the web apps. It's a plain Microsoft.NET.Sdk.Worker app with a <ProjectReference> to Shared.Telemetry.

src/Worker.Jobs/Program.cs

using Shared.Telemetry;
using Worker.Jobs.Jobs;
using Worker.Jobs.Telemetry;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddSingleton<WorkerTelemetry>();

var backendBaseUrl = builder.Configuration["Backend:BaseUrl"] ?? "http://localhost:5081";
builder.Services.AddHttpClient("backend", client => client.BaseAddress = new Uri(backendBaseUrl));

builder.Services.AddHostedService<InventoryReconciliationJob>();

// Same shared bootstrap as the web apps. A worker is not a web server, so ASP.NET Core
// instrumentation is off; HttpClient + runtime instrumentation stay on.
builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false;
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});

var host = builder.Build();
host.Run();

src/Worker.Jobs/Telemetry/WorkerTelemetry.cs

using System.Diagnostics;
using System.Diagnostics.Metrics;

namespace Worker.Jobs.Telemetry;

public sealed class WorkerTelemetry : IDisposable
{
    public const string ActivitySourceName = "Worker.Jobs";
    public const string MeterName = "Worker.Jobs";

    public static readonly ActivitySource ActivitySource = new(ActivitySourceName);

    private readonly Meter _meter = new(MeterName, "1.0.0");
    private readonly Counter<long> _jobRuns;
    private readonly Histogram<double> _jobDuration;
    private readonly Counter<long> _itemsProcessed;

    public WorkerTelemetry()
    {
        _jobRuns = _meter.CreateCounter<long>("worker.job.runs", unit: "{run}",
            description: "Number of background job executions, tagged by job name and outcome.");
        _jobDuration = _meter.CreateHistogram<double>("worker.job.duration", unit: "ms",
            description: "Duration of background job executions.");
        _itemsProcessed = _meter.CreateCounter<long>("worker.job.items_processed", unit: "{item}",
            description: "Items processed by background jobs.");
    }

    public Activity? StartActivity(string name) => ActivitySource.StartActivity(name, ActivityKind.Internal);

    public void RecordRun(string jobName, bool success, TimeSpan duration, int itemsProcessed)
    {
        var tags = new TagList { { "job.name", jobName }, { "success", success } };
        _jobRuns.Add(1, tags);
        _jobDuration.Record(duration.TotalMilliseconds, tags);
        if (itemsProcessed > 0) _itemsProcessed.Add(itemsProcessed, new TagList { { "job.name", jobName } });
    }

    public void Dispose() => _meter.Dispose();
}
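A note on the tags in RecordRun: a metrics backend keeps roughly one series per distinct attribute combination, so job.name × success yields at most two series per job. The toy Python aggregator below is a hypothetical stand-in for a Counter plus Histogram keyed by attribute set (the real OpenTelemetry SDK does much more), and it shows why a per-run tag such as a run ID or timestamp would blow up the series count.

```python
from collections import defaultdict


class RunMetrics:
    """Toy stand-in for a Counter + Histogram keyed by attribute set (hypothetical)."""

    def __init__(self):
        self.runs = defaultdict(int)         # attribute set -> run count
        self.durations = defaultdict(list)   # attribute set -> recorded durations (ms)

    def record_run(self, job_name, success, duration_ms):
        key = (("job.name", job_name), ("success", success))
        self.runs[key] += 1
        self.durations[key].append(duration_ms)

    def series_count(self):
        return len(self.runs)


m = RunMetrics()
for i in range(100):
    m.record_run("inventory-reconciliation", success=(i % 10 != 0), duration_ms=40.0 + i)
print(m.series_count())  # 2: one series per (job.name, success) pair, however many runs
```

Add a unique per-run tag to that key and the 2 becomes 100, which is exactly the cardinality you want to keep out of metrics and put on spans instead.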

src/Worker.Jobs/Jobs/InventoryReconciliationJob.cs

using System.Diagnostics;
using System.Net.Http;
using System.Net.Http.Json;
using Worker.Jobs.Telemetry;

namespace Worker.Jobs.Jobs;

/// <summary>A periodic job. Each run starts a root span, calls the backend over an instrumented
/// HttpClient (→ worker-jobs → backend-api → db in SigNoz), logs, and records run metrics.</summary>
public sealed class InventoryReconciliationJob(
    IHttpClientFactory httpClientFactory,
    WorkerTelemetry telemetry,
    IConfiguration configuration,
    ILogger<InventoryReconciliationJob> logger) : BackgroundService
{
    private const string JobName = "inventory-reconciliation";

    private readonly TimeSpan _interval =
        TimeSpan.FromSeconds(Math.Clamp(configuration.GetValue("Worker:IntervalSeconds", 15), 1, 3600));

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        logger.LogInformation("{JobName} starting; interval {IntervalSeconds}s", JobName, _interval.TotalSeconds);

        try { await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken); }   // let the backend come up
        catch (OperationCanceledException) { return; }

        // Run once immediately, then start the timer so its clock begins AFTER the warm-up run.
        await RunOnceAsync(stoppingToken);

        using var timer = new PeriodicTimer(_interval);
        while (await WaitForNextTickAsync(timer, stoppingToken))
        {
            await RunOnceAsync(stoppingToken);
        }
    }

    private async Task RunOnceAsync(CancellationToken ct)
    {
        using var activity = telemetry.StartActivity($"job.{JobName}");
        activity?.SetTag("job.name", JobName);
        var stopwatch = Stopwatch.StartNew();

        try
        {
            var client = httpClientFactory.CreateClient("backend");
            var stats = await client.GetFromJsonAsync<ProductStats>("/api/products/stats", ct);
            var count = stats?.TotalCount ?? 0;

            activity?.SetTag("job.items", count);
            logger.LogInformation("Reconciled {ProductCount} products (inventory value {InventoryValue})",
                count, stats?.InventoryValue);

            telemetry.RecordRun(JobName, success: true, stopwatch.Elapsed, itemsProcessed: count);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            return;   // graceful shutdown -- not a failure
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            logger.LogError(ex, "{JobName} run failed", JobName);
            telemetry.RecordRun(JobName, success: false, stopwatch.Elapsed, itemsProcessed: 0);
        }
    }

    private static async Task<bool> WaitForNextTickAsync(PeriodicTimer timer, CancellationToken ct)
    {
        try { return await timer.WaitForNextTickAsync(ct); }
        catch (OperationCanceledException) { return false; }
    }

    private sealed record ProductStats(int TotalCount, int TotalQuantity, decimal InventoryValue, decimal AveragePrice);
}
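The scheduling in ExecuteAsync (warm-up delay, one immediate run, then fixed ticks, with shutdown treated as a clean stop rather than a failure) is not .NET-specific. Here is a minimal asyncio sketch of the same loop, assuming a hypothetical run_once coroutine that, like RunOnceAsync, handles its own errors, and an asyncio.Event named stop that plays the role of the CancellationToken. It is not a PeriodicTimer port: it simply skips ticks that a long run overran instead of bursting to catch up.

```python
import asyncio


def clamp_interval(seconds, lo=1, hi=3600):
    """Same bounds as Math.Clamp(Worker:IntervalSeconds, 1, 3600) in the C# job."""
    return max(lo, min(seconds, hi))


async def _stopped_within(stop, seconds):
    """True if `stop` is set before `seconds` elapse (cf. WaitForNextTickAsync returning false)."""
    if stop.is_set():
        return True
    try:
        await asyncio.wait_for(stop.wait(), timeout=max(0.0, seconds))
        return True
    except asyncio.TimeoutError:
        return False


async def run_periodically(run_once, interval_s, stop, warmup_s=5.0):
    """Warm up, run once, then run on a fixed cadence until `stop` is set. Returns the run count."""
    loop = asyncio.get_running_loop()
    if await _stopped_within(stop, warmup_s):    # let dependencies come up
        return 0                                  # shutdown during warm-up is not a failure
    await run_once()
    runs = 1
    next_tick = loop.time() + interval_s         # the clock starts after the first run
    while not await _stopped_within(stop, next_tick - loop.time()):
        await run_once()
        runs += 1
        next_tick += interval_s                  # stay on the original cadence...
        while next_tick <= loop.time():          # ...skipping any ticks the run overran
            next_tick += interval_s
    return runs
```

Using an Event rather than swallowing asyncio.CancelledError keeps task cancellation semantics intact while still making "stop" a normal return, the same distinction the C# code draws with its `when (ct.IsCancellationRequested)` filter.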

Wrapping up

The lesson is the same across every case: the Collector is the seam. Apps you own export OTLP directly. Apps you don't own get adapted at the Collector, with whichever mechanism fits the job:

  • a filelog receiver for plain text,
  • a resource processor to give those files an identity,
  • a json_parser operator (inside the filelog receiver) for structured lines,
  • and the OTLP/HTTP receiver for anything that can POST.

Stamp a consistent service.name and service.namespace, and every job -- C#, Python, PowerShell, or a decade-old batch script -- shows up side by side in a dashboard you host yourself.
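To make "anything that can POST" concrete, here is a minimal standard-library Python sketch that sends one log record to a Collector's OTLP/HTTP receiver, stamping service.name and service.namespace on the resource. It assumes the receiver accepts JSON at the default OTLP/HTTP port 4318; the function names and the namespace value "background-jobs" are hypothetical, so substitute whatever your Collector config and naming scheme actually use.

```python
import json
import time
import urllib.request

SEVERITY_NUMBER = {"DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17}  # OTel log data model


def build_log_request(endpoint, service, namespace, level, message):
    """Build (but do not send) a POST to {endpoint}/v1/logs carrying one log record."""
    body = {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
                {"key": "service.namespace", "value": {"stringValue": namespace}},
            ]},
            "scopeLogs": [{"logRecords": [{
                "timeUnixNano": str(time.time_ns()),
                "severityNumber": SEVERITY_NUMBER[level],
                "severityText": level,
                "body": {"stringValue": message},
            }]}],
        }]
    }
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/v1/logs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_log(endpoint, service, namespace, level, message, timeout=5):
    """Send one record; a telemetry failure must never crash the job itself."""
    req = build_log_request(endpoint, service, namespace, level, message)
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            pass
    except OSError as exc:  # URLError, HTTPError, and socket timeouts are all OSError subclasses
        print(f"OTLP log export failed: {exc}")
```

A cron job would call something like `send_log("http://localhost:4318", "python-etl-job", "background-jobs", "INFO", "ETL started")`. Swallowing export errors here is deliberate, matching the PowerShell helpers that only warn when the Collector is unreachable.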

Next, Part 4 (instrument now, collect later) goes under the hood of the instrumentation every post here relies on (ActivitySource, Meter, and ILogger) and shows how to write it all, and read it back, before committing to any backend. Or revisit Part 1 (Blazor Server observability, which carries the shared foundation code) and Part 2 (the C# API with Postgres or SQL Server), or jump back to the series index.

💼 Open for consulting

I take on consulting and delivery work across .NET and React — on my own or alongside a trusted group of senior engineers I work with. Together we can build, untangle and modernize your software:

  • Building ASP.NET / Blazor / C# / WPF apps with Postgres / ClickHouse
  • Untangling, refactoring & modernizing legacy ASP.NET, C#, Blazor and WPF into a modern stack (modular monolith C# + React)
  • Cloud & on-premise DevOps: Azure DevOps, CI/CD pipelines and automation
  • Observability & analytics — in the cloud and on-premise
  • On-premise migrations
  • Scaling up delivery with experienced .NET, backend and React engineers, plus technical leadership