On-Prem Observability for Background Jobs with OpenTelemetry and SigNoz

Author: Konrad Bartecki

Bring every cron script, legacy EXE, PowerShell task, and Python job into one dashboard you host yourself -- without sending a single byte to a SaaS vendor.

Part 3 of 3 in a series on observing a .NET system with OpenTelemetry and SigNoz (series index). It reuses the Collector, Compose file, and SigNoz install from Part 1.

Web apps are the easy part of observability. The hard part is everything else that keeps your business running after hours: a nightly cron job you inherited, a Python ETL a data team owns, a PowerShell backup task, a 12-year-old exporter that writes nothing but export-20260529.txt. You can't dotnet add package your way out of those -- and they're often exactly the jobs that fail silently at 3 a.m.

This post shows how to get all of them into SigNoz, running entirely on your own infrastructure. By the end you'll be able to observe a job no matter where it sits on the spectrum -- from "only writes text files" to "full OpenTelemetry SDK." Every script and config is included at the end -- there's no repo to clone.

What you'll learn

  • Why on-prem observability is the right fit for background jobs
  • How the OpenTelemetry Collector acts as a universal on-ramp for any kind of job
  • Case A: collect a job that only writes .txt files (no code changes)
  • Case B: instrument a Python job with the OpenTelemetry SDK
  • Case C: get telemetry out of PowerShell, which has no official SDK
  • How they all show up side by side in SigNoz

Why on-prem?

SigNoz is fully open-source and self-hosted, which matters more for background jobs than for almost anything else:

  • Your data stays in your network. Batch jobs touch your most sensitive data -- financial exports, PII, backups. With self-hosted SigNoz, the telemetry about that work never leaves your infrastructure. That's a real advantage in regulated or air-gapped environments where shipping logs to a SaaS is a non-starter.
  • No per-GB surprise bill. Jobs are noisy -- verbose logs, high-frequency runs. On a usage-priced SaaS, job logs are where the bill explodes. Self-hosted, the cost is the box it runs on.
  • It works where the jobs work. Plenty of these jobs run on an on-prem server or a locked-down VM with no outbound internet. The Collector and SigNoz run right there next to them.

Everything in this post runs locally: the apps and the Collector in your network, and SigNoz storing data in its own ClickHouse database on your infrastructure. Nothing egresses.

The one idea: the Collector is the on-ramp

Here's the whole mental model. The OpenTelemetry Collector is a small service that ingests telemetry from many sources and forwards it to SigNoz. Apps you control speak OTLP to it directly. Apps you don't control get adapted at the Collector:

  in-solution .NET worker ──OTLP──┐
  Python job (OTEL SDK) ──OTLP────┤
  PowerShell (OTLP/HTTP) ─────────┤──▶  OpenTelemetry Collector ──▶  SigNoz (on your infra)
  legacy job (.txt files) ─(filelog reads the files)─┘

Think of jobs on a maturity ladder, and the Collector handles every rung:

  1. No telemetry, only log files → the Collector reads the files (filelog receiver) and turns each line into a log record.
  2. Can write structured lines or POST a payload, but no SDK → JSON-lines files, or OTLP/HTTP over the wire.
  3. Has a real SDK (Python, Java, Go, Node) → traces + metrics + logs natively.

Where a job sits only changes how its signal gets in -- never where it lands. After a quick look at the easy case for contrast, we'll start at the bottom of the ladder, which is both the hardest rung and the most common.

The easy case (for contrast): an in-solution .NET worker

If you own the code and it's .NET, it's a single call. A background worker calls the same AddObservability helper the web apps use (full source in Part 1), points it at its own WorkerTelemetry class of custom instruments (full source in the appendix), and turns off the web-server instrumentation:

builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false;            // not a web server
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});

That's the gold standard: custom spans, custom metrics, and -- because its HttpClient is instrumented -- automatic distributed traces (worker-jobs → backend-api → db). Everything below is what you do when you can't make that one call.

Case A -- a job that only writes .txt files

This is the job you'll meet most often: a legacy exporter, no SDK, no source you can change. It only appends human-readable lines:

2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset
2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer
    at LegacyExporter.Flush(batchId=7)
    at LegacyExporter.Run()
    at main()

You collect it with the Collector's filelog receiver, which tails the files and turns each entry into a log record. Here's the config, with each part explained:

receivers:
  filelog/legacy:
    include: [/var/log/legacy/*.txt] # 1. which files (glob handles daily rotation)
    start_at: beginning
    multiline: # 2. keep stack traces together
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    operators:
      - type: regex_parser # 3. split each entry into fields
        regex: '(?s)^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$'
        timestamp: { parse_from: attributes.ts, layout: '%Y-%m-%d %H:%M:%S' }
        severity: { parse_from: attributes.sev }
      - type: move # 4. put the message in the log body
        from: attributes.msg
        to: body

What each numbered part does:

  1. include is a glob, so export-20260529.txt, …30.txt, … are all picked up -- no config change when the date rolls over.
  2. multiline.line_start_pattern says "a new log entry begins only on a line that starts with a timestamp." Without it, a four-line stack trace becomes four useless records. With it, the [ERROR] line and its at … frames become one record. This is the single most important setting for legacy logs.
  3. regex_parser pulls out the timestamp, severity, and message. The leading (?s) matters: Go's regex engine needs it so . can span newlines -- without it the multiline error entries fail to parse. The timestamp block makes SigNoz use the log's own time, and severity maps INFO/WARN/ERROR so severity filtering works.
  4. move promotes the clean message into the log body.
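To make steps 2 and 3 concrete, here is a rough Python model of what the multiline rule and the regex do to the sample lines. It is a sketch, not the Collector's implementation: group_entries and parse_entry are hypothetical names, and Python's re module stands in for Go's regex engine (both treat (?s) the same way for this pattern).

```python
import re

# The same two patterns as the Collector config, written for Python's `re`.
LINE_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
ENTRY = re.compile(
    r"(?s)^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$"
)
# Identical pattern without (?s), to show why the flag matters.
ENTRY_NO_DOTALL = re.compile(ENTRY.pattern.replace("(?s)", "", 1))


def group_entries(lines):
    """Start a new entry only on a line that begins with a timestamp."""
    entries = []
    for line in lines:
        if LINE_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line  # continuation line, e.g. a stack frame
    return entries


def parse_entry(entry, pattern=ENTRY):
    """Split one grouped entry into timestamp, severity, and body."""
    match = pattern.match(entry)
    if match is None:
        return None
    return {"ts": match["ts"], "severity": match["sev"], "body": match["msg"]}


SAMPLE = [
    "2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset",
    "2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer",
    "    at LegacyExporter.Flush(batchId=7)",
    "    at LegacyExporter.Run()",
    "    at main()",
]

entries = group_entries(SAMPLE)                 # five lines become two entries
records = [parse_entry(e) for e in entries]     # both parse with (?s)
unparsed = parse_entry(entries[1], ENTRY_NO_DOTALL)  # None: "." stops at the newline
```

Five input lines collapse into two records, and the error record's body carries all three stack frames. Dropping (?s) leaves the INFO line parseable but makes the multi-line ERROR entry fail to match, which is the failure mode described in step 3.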

One more step: a file has no service name, so we stamp one with a resource processor and route it through its own logs pipeline:

processors:
  resource/legacy:
    attributes:
      - { key: service.name, value: legacy-batch-job, action: upsert }
      - { key: service.namespace, value: blazor-signoz, action: upsert }

service:
  pipelines:
    logs/filelog:
      receivers: [filelog/legacy]
      processors: [resource/legacy, batch]
      exporters: [otlp/signoz]

(This receiver, processor, and pipeline are part of the full Collector config in Part 1's appendix.)

Now the file-only job groups under legacy-batch-job in SigNoz, right next to your real services. Here's the payoff -- that multiline error as a single, parsed record:

A job that only writes .txt files, in SigNoz. The [ERROR] line and its three at … stack frames are one record body; severity is parsed to ERROR, the timestamp is the log's own, and log.file.name points back to the source file -- all done in the Collector, with zero changes to the job.

On-prem note -- don't lose your place. The demo uses start_at: beginning on purpose, so you see the existing .txt lines the first time the Collector starts. But on its own that re-reads from the top on every restart (duplicates), while start_at: end skips anything written while the Collector was down (gaps). For production, add a file_storage extension so the read offsets survive restarts:

extensions:
  file_storage: { directory: /var/lib/otelcol/storage }
receivers:
  filelog/legacy: { include: [/var/log/legacy/*.txt], start_at: end, storage: file_storage }
service:
  extensions: [file_storage] # an extension only loads when it is listed here
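The idea behind persisted offsets can be sketched in a few lines of Python. This is an illustration of the checkpointing technique only, not how the file_storage extension is implemented; read_new_lines and the JSON checkpoint format are assumptions made for the example.

```python
import json
import os


def read_new_lines(log_path, checkpoint_path):
    """Return lines appended since the last call, persisting the byte offset.

    Because the offset lives on disk, a restarted reader resumes where the
    previous one stopped: no duplicates, no gaps.
    """
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            offset = json.load(fh)["offset"]

    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()

    # Consume complete lines only; a partial trailing line waits for the next read.
    end = data.rfind(b"\n") + 1

    with open(checkpoint_path, "w", encoding="utf-8") as fh:
        json.dump({"offset": offset + end}, fh)

    return data[:end].decode("utf-8").splitlines()
```

Calling it twice in a row returns the new lines once and then an empty list. Deleting the checkpoint file reproduces the start_at: beginning behavior (everything is read again), which is exactly the duplication the storage setting avoids.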

Case B -- Python, with the OpenTelemetry SDK

When the job's language has a real SDK, use it. The Python ETL job produces traces, metrics, and logs identical in shape to the C# services, from just three packages:

opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp-proto-grpc

You build one Resource (the job's identity) and share it across the three providers, so everything groups under one service:

resource = Resource.create({"service.name": "python-etl-job", "service.namespace": "blazor-signoz"})

tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")

You set up a meter_provider (for the python.etl.* metrics) and a logger_provider (for logs) in exactly the same shape -- the full script is in the appendix.

Then your work nests spans naturally, so SigNoz shows an extract → transform → load waterfall:

with tracer.start_as_current_span("python.etl.run"):
    with tracer.start_as_current_span("extract"):   ...
    with tracer.start_as_current_span("transform"): ...
    with tracer.start_as_current_span("load"):      ...

One ETL run as a trace. The root python.etl.run span (484 ms) contains extract, transform, and load, exactly as the with blocks nest them. This run is one of the roughly 15% that fail at load -- the span is red and the header reads Errors: 1, so you can see the stage and the timing of the failure without opening a log file on the box. A Python script you can read top to bottom, shown the same way a distributed microservice trace would be.

The one line short-lived jobs must not skip: batch processors buffer telemetry and flush on a timer. A short-lived job can exit before that timer fires, and whatever is still buffered is dropped -- your last spans and logs vanish. Always flush before exit:

finally:
    tracer_provider.shutdown(); meter_provider.shutdown(); logger_provider.shutdown()

This is the #1 cause of "my cron job ran but I see nothing."
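One refinement worth considering: if the first shutdown() raises (say, the Collector is unreachable), the calls after it on the same line never run. A minimal sketch of a guard follows; shutdown_all is a hypothetical helper, not part of the OpenTelemetry SDK, and it only assumes each provider exposes a shutdown() method as the three providers above do.

```python
import logging

log = logging.getLogger("telemetry_shutdown")


def shutdown_all(*providers):
    """Call shutdown() on every provider, even if an earlier one raises.

    Returns the list of providers whose shutdown failed.
    """
    failed = []
    for provider in providers:
        try:
            provider.shutdown()  # flushes whatever is still buffered
        except Exception:
            log.exception("shutdown failed for %r", provider)
            failed.append(provider)
    return failed
```

In the job, the finally block would then read shutdown_all(tracer_provider, meter_provider, logger_provider), so a failing trace exporter cannot stop the metrics and logs from being flushed.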

Don't want to touch the script? Use zero-code auto-instrumentation instead:

pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
OTEL_SERVICE_NAME=python-etl-job \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:5317 OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
  opentelemetry-instrument python etl_job.py

(Set OTEL_EXPORTER_OTLP_PROTOCOL=grpc explicitly because 5317 is the Collector's gRPC port in this demo; an exporter speaking OTLP/HTTP would need port 5318 instead.)

That gives you traces and logs for free; drop to the manual SDK only when you want custom spans like extract/transform/load and custom metrics.

Case C -- PowerShell, which has no official SDK

PowerShell has no official OpenTelemetry SDK, so a small dot-sourceable helper (OtelExport.ps1, in the appendix) offers two pragmatic paths.

Option A -- write JSON-lines to a file (most robust). The script just appends one JSON object per line; the Collector's filelog receiver picks it up with a json_parser:

Write-JobLog -Message "Backed up $files files" -Level INFO -Attributes @{ files = $files }
# -> {"time":"2026-05-29T12:00:00...","level":"INFO","msg":"Backed up 231 files","files":231}

The Collector side is a second filelog receiver, just like the legacy one in Case A but with a json_parser instead of a regex -- add it next to filelog/legacy and wire it into the logs pipeline:

receivers:
  filelog/powershell:
    include: [/var/log/powershell/*.log]
    start_at: beginning
    operators:
      - type: json_parser
        timestamp: { parse_from: attributes.time, layout: '%Y-%m-%dT%H:%M:%S.%L%z' }
        severity: { parse_from: attributes.level }
      - type: move # promote the message to the log body
        from: attributes.msg
        to: body

This is the safest option: the script never blocks on the network, and if the Collector is down it catches up later. Best for batch jobs.
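As a rough mental model of the json_parser and move operators, here is what happens to one line of the file, sketched in Python. It is an approximation under stated assumptions: parse_json_line is a hypothetical name, timestamp parsing is left out, and whether time and level also remain as attributes after parsing is assumed here rather than verified.

```python
import json

# OTLP severity numbers for the level names the PowerShell helper writes.
SEVERITY_NUMBER = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}


def parse_json_line(line):
    """Turn one JSON-lines entry into a log-record-shaped dict."""
    attributes = json.loads(line)  # json_parser: every key becomes an attribute
    level = str(attributes.get("level", "")).upper()
    return {
        "timestamp": attributes.get("time"),               # the entry's own time
        "severity_text": level,
        "severity_number": SEVERITY_NUMBER.get(level, 0),  # 0 = unspecified
        "body": attributes.pop("msg", None),               # move: msg leaves attributes
        # Assumption: time and level stay behind as attributes.
        "attributes": attributes,
    }


record = parse_json_line(
    '{"time":"2026-05-29T12:00:00.0000000+00:00","level":"INFO",'
    '"msg":"Backed up 231 files","files":231}'
)
```

The point to notice is that files stays a number (231), not text inside a message, which is what makes it filterable and chartable later.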

Option B -- POST OTLP/HTTP directly (real-time). Send-OtelLog / Send-OtelTrace POST JSON straight to the Collector's OTLP/HTTP port with Invoke-RestMethod (4318 is the standard port; this demo publishes it on the host as 5318). You get real spans as the job runs -- but OTLP/JSON has sharp edges, and the helper handles each one:

  • 64-bit timestamps must be quoted strings. timeUnixNano is built as a string (([long]$ms * 1000000).ToString()), because many JSON parsers read numbers as doubles, which lose precision above 2^53.
  • Trace/span IDs are hex, not base64 -- 32 hex chars for the trace, 16 for the span.
  • Enums are integers -- severityNumber (INFO = 9), span kind, status code.
  • ConvertTo-Json -Depth 12 -- the default depth of 2 silently truncates the nested resourceLogs → scopeLogs → logRecords structure.
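The same encoding rules apply in any language that hand-rolls OTLP/JSON. As an illustration, here is a minimal Python sketch of a span payload that follows them; build_span_payload and the python-sketch scope name are made up for this example, and the payload mirrors the shape the PowerShell helper in the appendix builds.

```python
import json
import secrets
import time


def build_span_payload(service, name, duration_ms, status="OK"):
    """Build an OTLP/JSON body for a single span that just finished."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    status_code = {"UNSET": 0, "OK": 1, "ERROR": 2}[status]  # enums are integers
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "python-sketch"},
                "spans": [{
                    "traceId": secrets.token_hex(16),     # 16 bytes -> 32 hex chars
                    "spanId": secrets.token_hex(8),       # 8 bytes -> 16 hex chars
                    "name": name,
                    "kind": 1,                            # INTERNAL
                    "startTimeUnixNano": str(start_ns),   # quoted, not a JSON number
                    "endTimeUnixNano": str(end_ns),
                    "status": {"code": status_code},
                }],
            }],
        }]
    }


payload = build_span_payload("powershell-backup-job", "backup", 463)
body = json.dumps(payload)  # Python serializes the full depth by default

# Why the timestamps are strings: a nanosecond value is far above 2**53,
# so it does not survive a round trip through a double.
sample_ns = 1_780_056_000_123_456_789
survives_as_double = int(float(sample_ns)) == sample_ns  # False
```

The body would be POSTed to the Collector's /v1/traces path with Content-Type application/json, just as Send-OtelTrace does.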

Which to use? File (Option A) for batch jobs where durability beats latency; OTLP/HTTP (Option B) when you want spans in real time. Doing both costs almost nothing -- the example job does exactly that.

The backup job's logs in SigNoz, filtered to service.name = powershell-backup-job. Each "Backed up N files" line is a structured record POSTed straight from PowerShell over OTLP/HTTP, with severity (INFO) and the files attribute preserved -- not a flat text line. The service shows up in the left-hand filter right next to backend-api, blazor-frontend, and python-etl-job, even though PowerShell has no SDK.

And it is not just logs. The hand-rolled OTLP/JSON from Option B produces a genuine span, so the same script shows up in Traces:

A real distributed-tracing span emitted by a PowerShell script. The backup span (463 ms) is the whole job, on service powershell-backup-job, built by hand in Send-OtelTrace and POSTed as OTLP/JSON -- hex IDs, quoted nanosecond timestamps and all. It is a single span here because the job does one unit of work, but nothing stops you from nesting child spans the same way the Python ETL does. The point: even a language with no SDK lands in the same Traces view as the C# services.

See it all in SigNoz

Bring the stack up, generate some traffic, and run the jobs (commands below). The payoff: a C# worker, a Python script, and a PowerShell script all land in the same Services list (they share a service.namespace), and the .txt-only legacy job shows up in Logs:

python-etl-job and powershell-backup-job sit in the Services list right next to backend-api, blazor-frontend, and worker-jobs. (The .txt-only legacy job appears in Logs, not the APM list, since it emits no spans.)

Then:

  • Logs -- filter service.name = legacy-batch-job, then severity ERROR, and open one: the connection reset by peer message and its three at … frames are one grouped record.
  • Traces -- filter service.name = python-etl-job, open a python.etl.run trace, and see the extract → transform → load waterfall (about 15% of runs fail at load, on purpose).
  • Metrics -- chart python.etl.rows_processed and python.etl.runs grouped by success.

What it costs to keep

On-prem flips the cost model: there is no per-GB ingest bill, just disk you already own. The trade is that you decide how long data lives, and a chatty filelog pipeline can fill a disk if you let it.

Retention in SigNoz is set per signal under Settings → General (it becomes a ClickHouse TTL under the hood). Tune each signal to its value and volume:

  • Logs are the highest-volume signal -- keep them short (say 15 days). The legacy .txt pipeline alone can be noisy.
  • Traces are bursty; a week or two is usually plenty for incident forensics.
  • Metrics are tiny once aggregated -- keep them longest: a quarter or more for capacity trends, and a year or more if you want year-over-year comparisons.

ClickHouse compresses telemetry hard (often around 10x), so sizing is far cheaper than raw volume suggests, but the discipline is the same as any self-hosted store: set retention deliberately, watch the disk, and sample the noisy sources (see Part 2's note on sampling) before they become the problem.
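A back-of-the-envelope estimate makes the trade concrete. The sketch below is plain arithmetic, not a SigNoz or ClickHouse sizing tool: the daily volumes are invented for illustration, the 10x ratio is the rough figure quoted above, and real usage also depends on indexes, replication, and merge overhead.

```python
def estimate_disk_gb(raw_gb_per_day, retention_days, compression_ratio=10.0):
    """Approximate steady-state disk use for one signal."""
    return raw_gb_per_day * retention_days / compression_ratio


# Hypothetical daily volumes; retention follows the guidance in the list above.
logs_gb = estimate_disk_gb(raw_gb_per_day=20, retention_days=15)      # 30.0
traces_gb = estimate_disk_gb(raw_gb_per_day=5, retention_days=14)     # 7.0
metrics_gb = estimate_disk_gb(raw_gb_per_day=0.5, retention_days=90)  # 4.5

total_gb = logs_gb + traces_gb + metrics_gb                           # 41.5
```

Even with the shortest retention, logs dominate the total in this example, which is why they are the first signal to trim or sample.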

Cheat sheet -- which path for which job

The job…                                     | Use                              | You get
Only writes .txt/log files, can't change it  | Collector filelog receiver       | Logs (with severity + multiline)
Writes structured lines, no SDK              | JSON-lines file + json_parser    | Logs with parsed fields
Can do an HTTP POST, no SDK                  | OTLP/HTTP via Invoke-RestMethod  | Logs + real-time spans
Has a real OTEL SDK (Python, etc.)           | The SDK + OTLP exporter          | Traces + metrics + logs
Is your own .NET worker                      | AddObservability(...)            | Everything, plus distributed traces

The complete code

Everything for the jobs in this post. The shared AddObservability bootstrap, the full Collector config (which already includes the filelog/legacy receiver shown above), the docker-compose.yml, and the SigNoz install are in Part 1's appendix -- reuse them as-is.

Run it

# 1. Start SigNoz (one-time, self-hosted) -- full install in Part 1
git clone -b main https://github.com/SigNoz/signoz.git
cd signoz/deploy/docker && docker compose up -d        # UI at http://localhost:8080
cd -

# 2. Start the stack (docker-compose.yml + collector from Part 1).
#    The legacy .txt job runs by default and starts filling /var/log/legacy/*.txt
docker compose up -d --build
docker compose --profile jobs up -d --build            # add the Python job

# 3. Run the PowerShell job on your host (PowerShell 7+), pointed at the collector's HTTP port:
$env:OTEL_EXPORTER_OTLP_ENDPOINT='http://localhost:5318'
pwsh ./backup-job.ps1

docker-compose services for the jobs

Add these two services to the docker-compose.yml from Part 1 (the legacy job runs by default; the Python job is behind a jobs profile):

# A job with NO OpenTelemetry awareness -- only writes .txt files. The collector's filelog
# receiver (in Part 1's collector config) reads them. Shares the job-logs volume with the collector.
legacy-job:
  image: alpine:3.20
  command: ['sh', '/opt/job/run-batch.sh']
  environment: { LOG_DIR: /var/log/legacy, INTERVAL_SECONDS: '20' }
  volumes:
    - ./external-jobs/legacy-batch/run-batch.sh:/opt/job/run-batch.sh:ro
    - job-logs:/var/log/legacy
  networks: [blazorsignoz]

python-job:
  build: { context: ./external-jobs/python }
  profiles: ['jobs']
  environment:
    OTEL_SERVICE_NAME: python-etl-job
    OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
    JOB_INTERVAL_SECONDS: '30'
  depends_on: [otel-collector]
  networks: [blazorsignoz]

external-jobs/legacy-batch/run-batch.sh

The stand-in legacy job. No SDK -- just .txt lines, including occasional multi-line stack traces.

#!/usr/bin/env sh
# A stand-in for a legacy batch job that has NO telemetry SDK and cannot be changed:
# it only appends human-readable lines to a .txt log file.
#
# Run locally:   LOG_DIR=./out INTERVAL_SECONDS=5 ./run-batch.sh
set -eu

LOG_DIR="${LOG_DIR:-./out}"
INTERVAL="${INTERVAL_SECONDS:-20}"
mkdir -p "$LOG_DIR"

logfile() { echo "$LOG_DIR/export-$(date '+%Y%m%d').txt"; }

emit() {
  level="$1"; shift
  printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" >> "$(logfile)"
}

emit INFO "legacy batch job started (pid $$)"

count=0
while true; do
  count=$((count + 1))
  records=$(( (count * 37) % 500 + 50 ))

  emit INFO  "run #$count: exporting nightly inventory snapshot"
  emit INFO  "run #$count: wrote $records records to dataset"

  if [ $((count % 4)) -eq 0 ]; then
    emit WARN "run #$count: 3 records skipped (failed validation)"
  fi

  # A multi-line error. The continuation lines do not start with a timestamp, so the
  # collector's multiline rule attaches them to the [ERROR] entry as one record.
  if [ $((count % 7)) -eq 0 ]; then
    file="$(logfile)"
    {
      printf '%s [ERROR] run #%s: export failed: connection reset by peer\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$count"
      printf '    at LegacyExporter.Flush(batchId=%s)\n' "$count"
      printf '    at LegacyExporter.Run()\n'
      printf '    at main()\n'
    } >> "$file"
  fi

  sleep "$INTERVAL"
done

external-jobs/python/requirements.txt

opentelemetry-api>=1.29,<2
opentelemetry-sdk>=1.29,<2
opentelemetry-exporter-otlp-proto-grpc>=1.29,<2

external-jobs/python/etl_job.py

"""
A standalone Python ETL job instrumented with the OpenTelemetry SDK. It exports traces, metrics,
and logs over OTLP to the collector (which forwards to SigNoz).

Env vars:
  OTEL_SERVICE_NAME           default "python-etl-job"
  OTEL_EXPORTER_OTLP_ENDPOINT default "http://localhost:5317" (the demo collector's host port)
  JOB_INTERVAL_SECONDS        0 = run once and exit; >0 = loop forever
"""
import logging
import os
import random
import time

from opentelemetry import metrics, trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import Status, StatusCode

SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "python-etl-job")
ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:5317")
INTERVAL = int(os.getenv("JOB_INTERVAL_SECONDS", "0"))

resource = Resource.create(
    {
        "service.name": SERVICE_NAME,
        "service.namespace": "blazor-signoz",
        "service.instance.id": os.getenv("HOSTNAME", "local"),
    }
)

tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")

metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=ENDPOINT, insecure=True),
    export_interval_millis=5000,
)
meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
metrics.set_meter_provider(meter_provider)
meter = metrics.get_meter("etl_job")
rows_counter = meter.create_counter("python.etl.rows_processed", unit="{row}", description="Rows processed by the ETL job")
runs_counter = meter.create_counter("python.etl.runs", unit="{run}", description="ETL job executions, tagged by outcome")

logger_provider = LoggerProvider(resource=resource)
set_logger_provider(logger_provider)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter(endpoint=ENDPOINT, insecure=True)))
logging.basicConfig(
    level=logging.INFO,
    handlers=[LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider), logging.StreamHandler()],
)
log = logging.getLogger("etl_job")


def run_once(run_id: int) -> None:
    with tracer.start_as_current_span("python.etl.run") as span:
        span.set_attribute("etl.run", run_id)
        log.info("ETL run %s starting", run_id)

        with tracer.start_as_current_span("extract"):
            time.sleep(random.uniform(0.05, 0.25))
            rows = random.randint(100, 1000)

        with tracer.start_as_current_span("transform"):
            time.sleep(random.uniform(0.05, 0.25))

        with tracer.start_as_current_span("load") as load_span:
            time.sleep(random.uniform(0.05, 0.25))
            if random.random() < 0.15:
                load_span.set_status(Status(StatusCode.ERROR, "load failed"))
                log.error("ETL run %s: load step failed", run_id)
                runs_counter.add(1, {"success": "false"})
                return

        rows_counter.add(rows)
        runs_counter.add(1, {"success": "true"})
        span.set_attribute("etl.rows", rows)
        log.info("ETL run %s finished: %s rows processed", run_id, rows)


def main() -> None:
    run_id = 0
    try:
        if INTERVAL > 0:
            log.info("Looping every %ss; exporting to %s", INTERVAL, ENDPOINT)
            while True:
                run_id += 1
                run_once(run_id)
                time.sleep(INTERVAL)
        else:
            run_once(1)
    except KeyboardInterrupt:
        pass
    finally:
        # Critical for short-lived jobs: flush batched telemetry before the process exits.
        tracer_provider.shutdown()
        meter_provider.shutdown()
        logger_provider.shutdown()


if __name__ == "__main__":
    main()

external-jobs/python/Dockerfile

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl_job.py .
ENV JOB_INTERVAL_SECONDS=30
ENTRYPOINT ["python", "etl_job.py"]

external-jobs/powershell/OtelExport.ps1

Dot-source this; it provides both PowerShell telemetry paths.

<#
    OtelExport.ps1 -- minimal OpenTelemetry helpers for PowerShell (7+).
    Send-OtelLog / Send-OtelTrace : POST OTLP/HTTP+JSON straight to a collector (port 4318).
    Write-JobLog                  : append JSON-lines to a file for the collector's filelog receiver.
    OTLP/JSON gotchas handled: timeUnixNano as quoted strings, hex trace/span ids, integer enums.
#>

function Get-OtelNano {
    $ms = [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()
    return ([long]$ms * 1000000).ToString()
}

function New-OtelId {
    param([int]$Bytes)
    $buffer = New-Object byte[] $Bytes
    [System.Security.Cryptography.RandomNumberGenerator]::Fill($buffer)
    return (($buffer | ForEach-Object { $_.ToString('x2') }) -join '')
}

function ConvertTo-OtelAttributes {
    param([hashtable]$Attributes)
    $list = @()
    foreach ($key in $Attributes.Keys) {
        $list += @{ key = $key; value = @{ stringValue = [string]$Attributes[$key] } }
    }
    return , $list   # leading comma forces an array even for 0/1 elements
}

function Send-OtelLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [ValidateSet('TRACE', 'DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL')][string]$Severity = 'INFO',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $severityNumber = @{ TRACE = 1; DEBUG = 5; INFO = 9; WARN = 13; ERROR = 17; FATAL = 21 }[$Severity]

    $payload = @{
        resourceLogs = @(@{
            resource  = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeLogs = @(@{
                scope      = @{ name = 'powershell' }
                logRecords = @(@{
                    timeUnixNano   = (Get-OtelNano)
                    severityNumber = $severityNumber
                    severityText   = $Severity
                    body           = @{ stringValue = $Message }
                    attributes     = (ConvertTo-OtelAttributes $Attributes)
                })
            })
        })
    }

    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/logs" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP log export failed: $($_.Exception.Message)"
    }
}

function Send-OtelTrace {
    param(
        [Parameter(Mandatory)][string]$Name,
        [int]$DurationMs = 100,
        [ValidateSet('UNSET', 'OK', 'ERROR')][string]$Status = 'OK',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $endNano = [long]([DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()) * 1000000
    $startNano = $endNano - ([long]$DurationMs * 1000000)
    $statusCode = @{ UNSET = 0; OK = 1; ERROR = 2 }[$Status]

    $payload = @{
        resourceSpans = @(@{
            resource   = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeSpans = @(@{
                scope = @{ name = 'powershell' }
                spans = @(@{
                    traceId           = (New-OtelId 16)
                    spanId            = (New-OtelId 8)
                    name              = $Name
                    kind              = 1   # INTERNAL
                    startTimeUnixNano = $startNano.ToString()
                    endTimeUnixNano   = $endNano.ToString()
                    attributes        = (ConvertTo-OtelAttributes $Attributes)
                    status            = @{ code = $statusCode }
                })
            })
        })
    }

    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/traces" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP trace export failed: $($_.Exception.Message)"
    }
}

function Write-JobLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [string]$Level = 'INFO',
        [string]$Path = './out/powershell-job.log',
        [hashtable]$Attributes = @{}
    )
    $dir = Split-Path -Parent $Path
    if ($dir -and -not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir -Force | Out-Null }

    $entry = [ordered]@{ time = (Get-Date).ToString('o'); level = $Level; msg = $Message }
    foreach ($key in $Attributes.Keys) { $entry[$key] = $Attributes[$key] }
    ($entry | ConvertTo-Json -Compress) | Add-Content -Path $Path
}

external-jobs/powershell/backup-job.ps1

An example job that uses both paths -- belt and suspenders.

. "$PSScriptRoot/OtelExport.ps1"

$ErrorActionPreference = 'Stop'
$service = 'powershell-backup-job'
$logFile = Join-Path $PSScriptRoot 'out/powershell-job.log'
$start = Get-Date

Send-OtelLog -Service $service -Severity INFO -Message 'Backup job started' -Attributes @{ host = $env:COMPUTERNAME }
Write-JobLog -Path $logFile -Level INFO -Message 'Backup job started (file path)'

try {
    Start-Sleep -Milliseconds 400
    $files = Get-Random -Minimum 50 -Maximum 500

    Send-OtelLog -Service $service -Severity INFO -Message "Backed up $files files" -Attributes @{ files = $files }
    Write-JobLog -Path $logFile -Level INFO -Message "Backed up $files files" -Attributes @{ files = $files }

    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status OK -Attributes @{ files = $files }
    Write-Host "Backup complete: $files files in ${duration}ms"
}
catch {
    $message = $_.Exception.Message
    Send-OtelLog -Service $service -Severity ERROR -Message "Backup failed: $message"
    Write-JobLog -Path $logFile -Level ERROR -Message "Backup failed: $message"
    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status ERROR
    throw
}

The in-solution .NET worker (the easy case)

For completeness, here is the worker that gets full telemetry from the shared bootstrap. It's a plain Microsoft.NET.Sdk.Worker app with a <ProjectReference> to Shared.Telemetry.

src/Worker.Jobs/Program.cs

using Shared.Telemetry;
using Worker.Jobs.Jobs;
using Worker.Jobs.Telemetry;

var builder = Host.CreateApplicationBuilder(args);

builder.Services.AddSingleton<WorkerTelemetry>();

var backendBaseUrl = builder.Configuration["Backend:BaseUrl"] ?? "http://localhost:5081";
builder.Services.AddHttpClient("backend", client => client.BaseAddress = new Uri(backendBaseUrl));

builder.Services.AddHostedService<InventoryReconciliationJob>();

// Same shared bootstrap as the web apps. A worker is not a web server, so ASP.NET Core
// instrumentation is off; HttpClient + runtime instrumentation stay on.
builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false;
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});

var host = builder.Build();
host.Run();

src/Worker.Jobs/Telemetry/WorkerTelemetry.cs

using System.Diagnostics;
using System.Diagnostics.Metrics;

namespace Worker.Jobs.Telemetry;

public sealed class WorkerTelemetry : IDisposable
{
    public const string ActivitySourceName = "Worker.Jobs";
    public const string MeterName = "Worker.Jobs";

    public static readonly ActivitySource ActivitySource = new(ActivitySourceName);

    private readonly Meter _meter = new(MeterName, "1.0.0");
    private readonly Counter<long> _jobRuns;
    private readonly Histogram<double> _jobDuration;
    private readonly Counter<long> _itemsProcessed;

    public WorkerTelemetry()
    {
        _jobRuns = _meter.CreateCounter<long>("worker.job.runs", unit: "{run}",
            description: "Number of background job executions, tagged by job name and outcome.");
        _jobDuration = _meter.CreateHistogram<double>("worker.job.duration", unit: "ms",
            description: "Duration of background job executions.");
        _itemsProcessed = _meter.CreateCounter<long>("worker.job.items_processed", unit: "{item}",
            description: "Items processed by background jobs.");
    }

    public Activity? StartActivity(string name) => ActivitySource.StartActivity(name, ActivityKind.Internal);

    public void RecordRun(string jobName, bool success, TimeSpan duration, int itemsProcessed)
    {
        var tags = new TagList { { "job.name", jobName }, { "success", success } };
        _jobRuns.Add(1, tags);
        _jobDuration.Record(duration.TotalMilliseconds, tags);
        if (itemsProcessed > 0) _itemsProcessed.Add(itemsProcessed, new TagList { { "job.name", jobName } });
    }

    public void Dispose() => _meter.Dispose();
}

src/Worker.Jobs/Jobs/InventoryReconciliationJob.cs

using System.Diagnostics;
using System.Net.Http;
using System.Net.Http.Json;
using Worker.Jobs.Telemetry;

namespace Worker.Jobs.Jobs;

/// <summary>A periodic job. Each run starts a root span, calls the backend over an instrumented
/// HttpClient (→ worker-jobs → backend-api → db in SigNoz), logs, and records run metrics.</summary>
public sealed class InventoryReconciliationJob(
    IHttpClientFactory httpClientFactory,
    WorkerTelemetry telemetry,
    IConfiguration configuration,
    ILogger<InventoryReconciliationJob> logger) : BackgroundService
{
    private const string JobName = "inventory-reconciliation";

    private readonly TimeSpan _interval =
        TimeSpan.FromSeconds(Math.Clamp(configuration.GetValue("Worker:IntervalSeconds", 15), 1, 3600));

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        logger.LogInformation("{JobName} starting; interval {IntervalSeconds}s", JobName, _interval.TotalSeconds);

        try { await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken); }   // let the backend come up
        catch (OperationCanceledException) { return; }

        // Run once immediately, then start the timer so its clock begins AFTER the warm-up run.
        await RunOnceAsync(stoppingToken);

        using var timer = new PeriodicTimer(_interval);
        while (await WaitForNextTickAsync(timer, stoppingToken))
        {
            await RunOnceAsync(stoppingToken);
        }
    }

    private async Task RunOnceAsync(CancellationToken ct)
    {
        using var activity = telemetry.StartActivity($"job.{JobName}");
        activity?.SetTag("job.name", JobName);
        var stopwatch = Stopwatch.StartNew();

        try
        {
            var client = httpClientFactory.CreateClient("backend");
            var stats = await client.GetFromJsonAsync<ProductStats>("/api/products/stats", ct);
            var count = stats?.TotalCount ?? 0;

            activity?.SetTag("job.items", count);
            logger.LogInformation("Reconciled {ProductCount} products (inventory value {InventoryValue})",
                count, stats?.InventoryValue);

            telemetry.RecordRun(JobName, success: true, stopwatch.Elapsed, itemsProcessed: count);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            return;   // graceful shutdown -- not a failure
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            logger.LogError(ex, "{JobName} run failed", JobName);
            telemetry.RecordRun(JobName, success: false, stopwatch.Elapsed, itemsProcessed: 0);
        }
    }

    private static async Task<bool> WaitForNextTickAsync(PeriodicTimer timer, CancellationToken ct)
    {
        try { return await timer.WaitForNextTickAsync(ct); }
        catch (OperationCanceledException) { return false; }
    }

    private sealed record ProductStats(int TotalCount, int TotalQuantity, decimal InventoryValue, decimal AveragePrice);
}

Wrapping up

The lesson is the same across every case: the Collector is the seam. Apps you own export OTLP; apps you don't own get adapted at the Collector -- a filelog receiver for text, a resource processor to give files an identity, a json_parser for structured lines, OTLP/HTTP for anything that can POST. Stamp a consistent service.name / service.namespace, and every job -- C#, Python, PowerShell, or a decade-old batch script -- shows up side by side in a dashboard you host yourself.
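
To make "anything that can POST" concrete, here is a minimal sketch of the smallest possible on-ramp: a standard-library Python script that stamps `service.name` and `service.namespace` on a single log record and sends it to the Collector over OTLP/HTTP. The URL assumes the Collector's OTLP/HTTP receiver is reachable on the default port 4318. The namespace value `background-jobs`, the service name `legacy-exporter`, and the function names are placeholders for illustration, not values taken from the series.

```python
import json
import time
import urllib.request

# Assumption: the Collector exposes its OTLP/HTTP receiver on the default port.
COLLECTOR_LOGS_URL = "http://localhost:4318/v1/logs"

# Severity numbers defined by the OpenTelemetry logs data model.
SEVERITY_NUMBERS = {"INFO": 9, "WARN": 13, "ERROR": 17}


def build_log_payload(service, namespace, message, severity="INFO", now_ns=None):
    """Build an OTLP/HTTP JSON body carrying one log record."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    return {
        "resourceLogs": [{
            # The resource is the job's identity: this is what groups records in SigNoz.
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
                {"key": "service.namespace", "value": {"stringValue": namespace}},
            ]},
            "scopeLogs": [{"logRecords": [{
                "timeUnixNano": str(now_ns),
                "severityText": severity,
                "severityNumber": SEVERITY_NUMBERS.get(severity, 0),  # 0 = unspecified
                "body": {"stringValue": message},
            }]}],
        }]
    }


def send_log(payload, url=COLLECTOR_LOGS_URL, timeout=5):
    """POST the payload to the Collector and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A job would call `send_log(build_log_payload("legacy-exporter", "background-jobs", "Export finished"))` at the points it cares about. Keeping the payload builder separate from the network call means the identity stamping can be checked without a running Collector.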

For the rest of the series, see Part 1 -- Blazor Server observability (which also carries the shared foundation code) and Part 2 -- full-stack observability for a C# API with Postgres/SQL Server, or jump back to the series index.
