Part 3 — On-Prem Observability for Background Jobs with OpenTelemetry and SigNoz
Bring every cron script, legacy EXE, PowerShell task, and Python job into one dashboard you host yourself.
Part 3 of 4 in a series on observing a .NET system with OpenTelemetry and SigNoz (series index). It reuses the Collector, Compose file, and SigNoz install from Part 1.
Web apps are the easy part of observability. The hard part is everything else that keeps your business running after hours: a nightly cron job you inherited, a Python ETL (extract, transform, load) job a data team owns, a PowerShell backup task, a 12-year-old exporter that writes nothing but export-20260529.txt. You can't dotnet add package your way out of those. And they're often exactly the jobs that fail silently at 3 a.m.
This post shows how to get all of them into SigNoz, running entirely on your own infrastructure. By the end you'll be able to observe a job no matter where it sits on the spectrum -- from "only writes text files" to "full OpenTelemetry SDK." Every job-specific script is included at the end; the shared Collector config, Compose file, and SigNoz install come from Part 1's appendix. There's no repo to clone.
What you'll learn
- Why on-prem observability is the right fit for background jobs
- How the OpenTelemetry Collector acts as a universal on-ramp for anything
- Case A: collect a job that only writes .txt files (no code changes)
- Case B: instrument a Python job with the OpenTelemetry SDK
- Case C: get telemetry out of PowerShell, which has no official SDK
- How they all show up side by side in SigNoz
First, the words you'll see
This post builds on the stack from Part 1, so a few terms come up again and again. If you read Part 1, this is review.
- OTLP -- OpenTelemetry's wire protocol. Your apps and jobs speak OTLP to a Collector, which forwards it to a backend like SigNoz. (Part 1 has the longer explanation.)
- The Collector -- a small service that takes telemetry in from many sources and forwards it to SigNoz. It is the one piece every job in this post funnels through.
- The filelog receiver -- the part of the Collector that tails text files and turns each line into a log record. It is how you collect a job that can't speak OTLP at all.
Why on-prem?
SigNoz is fully open-source and self-hosted, which matters more for background jobs than for almost anything else:
- Your data stays in your network. Batch jobs touch your most sensitive data -- financial exports, PII, backups. With self-hosted SigNoz, the telemetry about that work never leaves your infrastructure. That's a real advantage in regulated or air-gapped environments where shipping logs to a SaaS is a non-starter.
- No per-GB surprise bill. Jobs are noisy -- verbose logs, high-frequency runs. On a usage-priced SaaS, job logs are where the bill explodes. Self-hosted, the cost is the box it runs on.
- It works where the jobs work. Plenty of these jobs run on an on-prem server or a locked-down VM with no outbound internet. The Collector and SigNoz run right there next to them.
Everything in this post runs locally: the apps and the Collector in your network, and SigNoz storing data in its own ClickHouse database on your infrastructure.
The one idea: the Collector is the on-ramp
The OpenTelemetry Collector is a small service that ingests telemetry from many sources and forwards it to SigNoz. Apps you control speak OTLP to it directly. Apps you don't control get adapted at the Collector:
in-solution .NET worker ──OTLP──┐
Python job (OTEL SDK) ──OTLP────┤
PowerShell (OTLP/HTTP) ─────────┤──▶ OpenTelemetry Collector ──▶ SigNoz (on your infra)
legacy job (.txt files) ─(filelog reads the files)─┘
Think of these jobs on a maturity ladder, from "emits no telemetry at all" at the bottom to "full OpenTelemetry SDK" at the top. The Collector handles every rung:
- No telemetry, only log files → the Collector reads the files (its filelog receiver) and turns each line into a log record. This is Case A below.
- Can write structured lines or POST a payload, but has no SDK → it writes JSON-lines files, or POSTs OTLP over HTTP. This is Case C, the PowerShell job.
- Has a real SDK (Python, Java, Go, Node) → it emits traces, metrics, and logs natively. This is Case B, the Python job.
Where a job sits on the ladder only changes how its signal gets in -- never where it lands. We start with Case A (text files), at the bottom of the ladder, because it is the hardest case and the most common. Then Case B (a real SDK) and Case C (no SDK, but it can POST).
The easy case (for contrast): an in-solution .NET worker
A job you own, written in .NET, is the easy case: observability is a one-liner.
A background worker calls the same AddObservability helper the web apps use, points it at its own WorkerTelemetry class of custom instruments, and turns off the web-server instrumentation. (Full source for AddObservability is in Part 1; for WorkerTelemetry, the appendix.)
builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false; // not a web server
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});
That gives you custom spans, custom metrics, and -- because its HttpClient is instrumented -- automatic distributed traces (worker-jobs → backend-api → db). Everything below is what you do when you can't make that one call.
Case A -- a job that only writes .txt files
This is the job you'll meet most often: a legacy exporter, no SDK, no source you can change. It only appends human-readable lines:
2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset
2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer
    at LegacyExporter.Flush(batchId=7)
    at LegacyExporter.Run()
    at main()
You collect it with the Collector's filelog receiver, which tails the files and turns each entry into a log record. Here's the config, with each part explained:
receivers:
  filelog/legacy:
    include: [/var/log/legacy/*.txt] # 1. which files (glob handles daily rotation)
    start_at: beginning
    multiline: # 2. keep stack traces together
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    operators:
      - type: regex_parser # 3. split each entry into fields
        regex: '(?s)^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$'
        timestamp: { parse_from: attributes.ts, layout: '%Y-%m-%d %H:%M:%S' }
        severity: { parse_from: attributes.sev }
      - type: move # 4. put the message in the log body
        from: attributes.msg
        to: body
What each numbered part does:
1. include is a glob, so export-20260529.txt, …30.txt, … are all picked up -- no config change when the date rolls over.
2. multiline.line_start_pattern marks a new entry only on a line starting with a timestamp -- so a four-line stack trace becomes one record instead of four useless ones.
3. regex_parser pulls the timestamp, severity, and message out of each line: the timestamp block uses the log's own time (not when the line was read), and severity maps INFO/WARN/ERROR to a level you can filter on. One gotcha: the leading (?s) lets . match across newlines in Go's regex engine -- without it the multiline errors fail to parse.
4. move promotes the clean message into the log body.
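Before deploying the receiver, it can help to try the two patterns against a few sample lines. The sketch below is a rough Python imitation of steps 2 and 3, not the Collector's implementation: group_entries and parse_entry are hypothetical helper names, and Python's re module stands in for Go's regex engine (both accept the (?P<name>…) and (?s) syntax used here).

```python
import re

# Same patterns as the filelog/legacy receiver config.
LINE_START = re.compile(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')
ENTRY = re.compile(
    r'(?s)^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<sev>\w+)\] (?P<msg>.*)$'
)


def group_entries(lines):
    """Imitate multiline.line_start_pattern: a timestamped line opens a new entry."""
    entries, current = [], []
    for line in lines:
        if LINE_START.match(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


def parse_entry(entry):
    """Imitate regex_parser: return the named fields, or None if the entry does not match."""
    match = ENTRY.match(entry)
    return match.groupdict() if match else None


sample = [
    "2026-05-29 12:00:00 [INFO] run #3: wrote 161 records to dataset",
    "2026-05-29 12:00:21 [ERROR] run #7: export failed: connection reset by peer",
    "    at LegacyExporter.Flush(batchId=7)",
    "    at LegacyExporter.Run()",
]

records = [parse_entry(entry) for entry in group_entries(sample)]
for record in records:
    print(record["sev"], repr(record["msg"]))
```

The four input lines come out as two records, and the ERROR record keeps its stack frames inside msg. Removing the leading (?s) from ENTRY makes the multi-line entry fail to match, which is the gotcha described in step 3.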
One more step: a file has no service name, so we stamp one with a resource processor and route it through its own logs pipeline:
processors:
  resource/legacy:
    attributes:
      - { key: service.name, value: legacy-batch-job, action: upsert }
      - { key: service.namespace, value: blazor-signoz, action: upsert }
service:
  pipelines:
    logs/filelog:
      receivers: [filelog/legacy]
      processors: [resource/legacy, batch]
      exporters: [otlp/signoz]
(This receiver, processor, and pipeline are part of the full Collector config in Part 1's appendix.)
Now the file-only job groups under legacy-batch-job in SigNoz, right next to your real services -- that multiline error as a single, parsed record:
A single log record in SigNoz's Logs Explorer, from a job that only writes .txt files. The [ERROR] line and its three at … frames are grouped into one record, with log.file.name pointing back to the source file -- all done in the Collector, with no change to the job.
On-prem note -- don't lose your place. The demo uses start_at: beginning on purpose, so you see the existing .txt lines the first time the Collector starts. But on its own that re-reads from the top on every restart (duplicates), while start_at: end skips anything written while the Collector was down (gaps). For production, add a file_storage extension so the read offsets survive restarts, and list it under service.extensions so the Collector loads it:
extensions:
  file_storage: { directory: /var/lib/otelcol/storage }
receivers:
  filelog/legacy: { include: [/var/log/legacy/*.txt], start_at: end, storage: file_storage }
service:
  extensions: [file_storage]
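To see why persisted offsets remove both the duplicates and the gaps, here is a toy Python reader that saves its byte offset between runs. It only illustrates the idea; it is not how the Collector's file_storage extension is implemented, and read_new_lines and offsets.json are made-up names.

```python
import json
import os
import tempfile


def read_new_lines(log_path, checkpoint_path):
    """Return lines appended since the last call, persisting the byte offset between calls."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            offset = json.load(fh)["offset"]
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
        offset = fh.tell()
    with open(checkpoint_path, "w", encoding="utf-8") as fh:
        json.dump({"offset": offset}, fh)
    return data.decode("utf-8").splitlines()


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "export-20260529.txt")
checkpoint_path = os.path.join(workdir, "offsets.json")

with open(log_path, "a", encoding="utf-8") as fh:
    fh.write("2026-05-29 12:00:00 [INFO] run #1: started\n")
first = read_new_lines(log_path, checkpoint_path)

# The job keeps writing while the reader is "down".
with open(log_path, "a", encoding="utf-8") as fh:
    fh.write("2026-05-29 12:00:20 [INFO] run #2: started\n")

# A "restarted" reader resumes from the saved offset: no duplicates, no gap.
second = read_new_lines(log_path, checkpoint_path)
third = read_new_lines(log_path, checkpoint_path)
print(first, second, third)
```

Each call stands in for one Collector start. The second call returns only the line written during the downtime, and the third returns nothing because there is nothing new to read.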
Case B -- Python, with the OpenTelemetry SDK
When the job's language has a real SDK, use it. The Python ETL job produces traces, metrics, and logs identical in shape to the C# services, from just three packages:
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-otlp-proto-grpc
You build one Resource (the job's identity) and share it across the three providers, so everything groups under one service:
resource = Resource.create({"service.name": "python-etl-job", "service.namespace": "blazor-signoz"})
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")
You set up a meter_provider (for the python.etl.* metrics) and a logger_provider (for logs) in exactly the same shape -- the full script is in the appendix.
Then your work nests spans naturally, so SigNoz shows an extract → transform → load waterfall:
with tracer.start_as_current_span("python.etl.run"):
    with tracer.start_as_current_span("extract"): ...
    with tracer.start_as_current_span("transform"): ...
    with tracer.start_as_current_span("load"): ...
One Python ETL run as a trace in SigNoz. The root python.etl.run span (484 ms) nests extract, transform, and load; this is one of the ~15% of runs that fail at load, so that span is red and the header reads Errors: 1. You see the failing stage and its timing without opening a log file on the box.
The one line short-lived jobs must not skip: batch processors buffer telemetry and flush on a timer. A job that exits before the next flush can drop whatever is still buffered -- your last spans and logs vanish. Always flush before exit:
finally:
    tracer_provider.shutdown()
    meter_provider.shutdown()
    logger_provider.shutdown()
A missing flush is the #1 cause of "my cron job ran but I see nothing."
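The failure mode is easy to reproduce without the SDK. The toy below is not the OpenTelemetry BatchSpanProcessor: ToyBatchProcessor is a made-up class that exports only when its batch is full or when shutdown() is called, which is enough to show what a missing flush costs. The real processors also flush on a timer, so the exact loss depends on timing.

```python
class ToyBatchProcessor:
    """Toy stand-in for a batching processor: export when the batch is full or on shutdown()."""

    def __init__(self, export, max_batch_size=3):
        self._export = export
        self._max_batch_size = max_batch_size
        self._buffer = []

    def on_end(self, item):
        self._buffer.append(item)
        if len(self._buffer) >= self._max_batch_size:
            self._flush()

    def shutdown(self):
        self._flush()

    def _flush(self):
        if self._buffer:
            self._export(list(self._buffer))
            self._buffer.clear()


def run_job(call_shutdown):
    """Finish four spans (children first, root last) and return what reached the exporter."""
    exported = []
    processor = ToyBatchProcessor(exported.extend)
    try:
        for name in ["extract", "transform", "load", "python.etl.run"]:
            processor.on_end(name)
    finally:
        if call_shutdown:
            processor.shutdown()
    return exported


print(run_job(call_shutdown=False))
print(run_job(call_shutdown=True))
```

Without the shutdown call, the last span to finish is the one left in the buffer. In a real job that is usually the root span, since it ends after its children.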
Don't want to touch the script? Use zero-code auto-instrumentation instead:
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
OTEL_SERVICE_NAME=python-etl-job \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:5317 OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
opentelemetry-instrument python etl_job.py
A note on ports. The Collector listens on two: 5317 for gRPC (a fast binary protocol) and 5318 for OTLP/HTTP. The protocol flag has to match the port. Set OTEL_EXPORTER_OTLP_PROTOCOL=grpc to go with 5317. Default protocols vary between SDKs and distro versions, so set the protocol explicitly rather than relying on the default; if you choose OTLP/HTTP (http/protobuf), point the endpoint at 5318 instead.
For a job that calls instrumented libraries (HTTP clients, DB drivers, web frameworks), that gives you traces and logs for free. A pure-compute job like this ETL stand-in gets logs but few spans until you add them by hand -- which is what the manual SDK above is for.
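The port note above can be turned into a small pre-flight check that a job runs before it starts exporting. This is a sketch under assumptions: check_otlp_settings is a hypothetical helper, and the port table reflects this series' host port mapping (5317 and 5318), which may differ in your setup.

```python
from urllib.parse import urlparse

# Host ports from this series' Compose file; adjust if your mapping differs.
PORT_FOR_PROTOCOL = {"grpc": 5317, "http/protobuf": 5318, "http/json": 5318}


def check_otlp_settings(env):
    """Return a list of human-readable problems; an empty list means the pair looks consistent."""
    problems = []
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL")
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    port = urlparse(endpoint).port
    if protocol is None:
        problems.append("OTEL_EXPORTER_OTLP_PROTOCOL is not set; the default depends on the SDK")
    elif protocol not in PORT_FOR_PROTOCOL:
        problems.append(f"unknown protocol {protocol!r}")
    elif port != PORT_FOR_PROTOCOL[protocol]:
        problems.append(f"{protocol} expects port {PORT_FOR_PROTOCOL[protocol]}, endpoint uses {port}")
    return problems


good = {"OTEL_EXPORTER_OTLP_PROTOCOL": "grpc", "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:5317"}
mismatch = {"OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf", "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:5317"}
print(check_otlp_settings(good))
print(check_otlp_settings(mismatch))
```

In a real job you would pass dict(os.environ) instead of the literal dictionaries and log any problems before the first export.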
Case C -- PowerShell, which has no official SDK
PowerShell has no official OpenTelemetry SDK, so a small dot-sourceable helper (OtelExport.ps1, in the appendix) offers two pragmatic paths.
Option A -- write JSON-lines to a file (most robust). The script just appends one JSON object per line; the Collector's filelog receiver picks it up with a json_parser:
Write-JobLog -Message "Backed up $files files" -Level INFO -Attributes @{ files = $files }
# -> {"time":"2026-05-29T12:00:00...","level":"INFO","msg":"Backed up 231 files","files":231}
The Collector side is a second filelog receiver, just like the legacy one in Case A but with a json_parser instead of a regex -- add it next to filelog/legacy and wire it into the logs pipeline:
receivers:
  filelog/powershell:
    include: [/var/log/powershell/*.log]
    start_at: beginning
    operators:
      - type: json_parser
        timestamp: { parse_from: attributes.time, layout: '%Y-%m-%dT%H:%M:%S.%L%z' }
        severity: { parse_from: attributes.level }
      - type: move # promote the message to the log body
        from: attributes.msg
        to: body
This is the safest option: the script never blocks on the network, and if the Collector is down it catches up later. Best for batch jobs.
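To make the Collector side concrete, here is a rough Python imitation of what json_parser plus move do to one JSON line. It is a simplification, not the Collector's code: parse_json_line is a hypothetical helper, the sample timestamp is a made-up complete value, and this sketch removes time and level from the attributes for readability, which the Collector may not do.

```python
import json
from datetime import datetime


def parse_json_line(line):
    """Rough stand-in for json_parser + move: lift time and level, promote msg to the body."""
    fields = json.loads(line)
    return {
        "timestamp": datetime.fromisoformat(fields.pop("time")),
        "severity": fields.pop("level"),
        "body": fields.pop("msg"),
        "attributes": fields,  # whatever is left, e.g. files
    }


line = '{"time":"2026-05-29T12:00:00+00:00","level":"INFO","msg":"Backed up 231 files","files":231}'
record = parse_json_line(line)
print(record["severity"], record["body"], record["attributes"])
```

Everything the script adds beyond time, level, and msg ends up as a filterable attribute, which is why structured lines are worth the small effort on the PowerShell side.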
Option B -- POST OTLP/HTTP directly (real-time). PowerShell has no SDK, so Send-OtelLog and Send-OtelTrace build the OTLP payload by hand and POST it with Invoke-RestMethod to the Collector's HTTP port (:5318 on your host, mapped to 4318 in Docker). This is OTLP/HTTP with a JSON body -- sometimes called OTLP/JSON. You get real spans as the job runs. Hand-built JSON has four traps, and the helper handles each one:
- 64-bit timestamps must be quoted strings. timeUnixNano is built as a string (([long]$ms * 1000000).ToString()), because JSON numbers can't safely hold int64.
- Trace/span IDs are hex, not base64 -- 32 hex chars for the trace, 16 for the span.
- Enums are integers -- severityNumber (INFO = 9), span kind, status code.
- ConvertTo-Json -Depth 12 -- the default depth of 2 silently truncates the nested resourceLogs → scopeLogs → logRecords structure.
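The same payload rules apply to any language that has to hand-build OTLP/JSON. As a cross-check, here is a minimal Python sketch that builds (but does not send) a log payload and a span payload with the first three traps handled. The function names and the "sketch" scope name are illustrative; the fourth trap is specific to ConvertTo-Json, since Python's json.dumps serializes the full nesting.

```python
import json
import secrets
import time

# OTLP enum values; these match the integers used by the PowerShell helper.
SEVERITY_NUMBER = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}
SPAN_KIND_INTERNAL = 1
STATUS_CODE = {"UNSET": 0, "OK": 1, "ERROR": 2}


def to_otlp_attributes(attributes):
    """Encode every value as a stringValue, as the PowerShell helper does."""
    return [{"key": k, "value": {"stringValue": str(v)}} for k, v in attributes.items()]


def build_log_payload(message, severity, service, attributes=None):
    return {
        "resourceLogs": [{
            "resource": {"attributes": to_otlp_attributes({"service.name": service})},
            "scopeLogs": [{
                "scope": {"name": "sketch"},
                "logRecords": [{
                    "timeUnixNano": str(time.time_ns()),          # trap 1: int64 as a quoted string
                    "severityNumber": SEVERITY_NUMBER[severity],  # trap 3: enum as an integer
                    "severityText": severity,
                    "body": {"stringValue": message},
                    "attributes": to_otlp_attributes(attributes or {}),
                }],
            }],
        }]
    }


def build_span_payload(name, duration_ms, status, service):
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": to_otlp_attributes({"service.name": service})},
            "scopeSpans": [{
                "scope": {"name": "sketch"},
                "spans": [{
                    "traceId": secrets.token_hex(16),  # trap 2: 32 hex chars, not base64
                    "spanId": secrets.token_hex(8),    # 16 hex chars
                    "name": name,
                    "kind": SPAN_KIND_INTERNAL,
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                    "status": {"code": STATUS_CODE[status]},
                }],
            }],
        }]
    }


log_json = json.dumps(build_log_payload("Backed up 231 files", "INFO", "powershell-backup-job", {"files": 231}))
span_json = json.dumps(build_span_payload("backup", 463, "OK", "powershell-backup-job"))
print(log_json)
print(span_json)
```

Printing the JSON and comparing it with what the PowerShell helper produces is a quick way to spot a truncated or mistyped field before blaming the Collector.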
Which to use? File (Option A) for batch jobs where durability beats latency; OTLP/HTTP (Option B) when you want spans in real time. Doing both costs almost nothing -- the example job does exactly that.
SigNoz's Logs Explorer, filtered to service.name = powershell-backup-job. Each "Backed up N files" line is a structured record POSTed over OTLP/HTTP, with a filterable INFO severity and a files attribute -- from a language with no SDK.
And it is not just logs. The hand-rolled OTLP/JSON from Option B produces a genuine span, so the same script shows up in Traces:
A single trace in SigNoz: one backup span (463 ms) on powershell-backup-job, built by hand in Send-OtelTrace and POSTed as OTLP/JSON. A language with no SDK lands in the same Traces view as the C# services.
See it all in SigNoz
Bring the stack up, generate some traffic, and run the jobs (commands below). The result: a C# worker, a Python script, and a PowerShell script all land in the same Services list, because they share a service.namespace. The .txt-only legacy job emits no spans, so it shows up in Logs rather than the Services list:
SigNoz's Services page. python-etl-job and powershell-backup-job sit right next to backend-api, blazor-frontend, and worker-jobs. (The .txt-only legacy job appears in Logs, not here, since it emits no spans.)
Then:
- Logs -- filter service.name = legacy-batch-job, then severity ERROR, and open one: the connection reset by peer message and its three at … frames are one grouped record.
- Traces -- filter service.name = python-etl-job, open a python.etl.run trace, and see the extract → transform → load waterfall (about 15% of runs fail at load, on purpose).
- Metrics -- chart python.etl.rows_processed for volume, and python.etl.runs grouped by success for the pass/fail split.
What it costs to keep
On-prem flips the cost model: there is no per-GB ingest bill, just disk you already own. The trade is that you decide how long data lives, and a chatty filelog pipeline can fill a disk if you let it.
The biggest lever is retention per signal. Each signal differs in value and volume, so give each its own lifetime instead of one blanket number -- in SigNoz, set it per signal under Settings → General:
- Logs are the highest-volume signal -- keep them short (say 15 days). The legacy .txt pipeline alone can be noisy.
- Traces are bursty; a week or two is usually plenty for incident forensics.
- Metrics are tiny once aggregated -- keep them longest (a quarter or more) for capacity trends and year-over-year comparisons.
The other lever is volume at the source: sample the noisy producers before they reach the Collector -- the legacy .txt pipeline is usually the loudest. Part 2 shows how. (Storage itself is rarely the constraint: ClickHouse compresses telemetry around 10x, so sizing is far cheaper than raw volume suggests.)
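A back-of-envelope calculation makes the retention trade-off tangible. In the sketch below the daily volumes are made-up placeholders, the retention periods follow the suggestions above, and the 10x compression ratio is the rough figure quoted for ClickHouse; measure your own numbers before sizing a disk.

```python
def disk_needed_gb(daily_raw_gb, retention_days, compression_ratio=10.0):
    """Steady-state disk for one signal: raw volume kept for N days, divided by compression."""
    return daily_raw_gb * retention_days / compression_ratio


# Daily volumes are placeholders; replace them with what your jobs actually emit.
signals = {
    "logs": {"daily_raw_gb": 20.0, "retention_days": 15},
    "traces": {"daily_raw_gb": 5.0, "retention_days": 14},
    "metrics": {"daily_raw_gb": 0.5, "retention_days": 90},
}

estimate = {name: disk_needed_gb(**cfg) for name, cfg in signals.items()}
total_gb = sum(estimate.values())
for name, gb in estimate.items():
    print(f"{name}: {gb:.1f} GB")
print(f"total: {total_gb:.1f} GB")
```

With these placeholder volumes, logs dominate the total even at the shortest retention, which is why log retention is the first number to tune.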
Cheat sheet -- which path for which job
| The job… | Use | You get |
|---|---|---|
| Only writes .txt/log files, can't change it | Collector filelog receiver | Logs (with severity + multiline) |
| Writes structured lines, no SDK | JSON-lines file + json_parser | Logs with parsed fields |
| Can do an HTTP POST, no SDK | OTLP/HTTP via Invoke-RestMethod | Logs + real-time spans |
| Has a real OTEL SDK (Python, etc.) | The SDK + OTLP exporter | Traces + metrics + logs |
| Is your own .NET worker | AddObservability(...) | Everything, plus distributed traces |
The complete code
Everything for the jobs in this post. The shared AddObservability bootstrap, the full Collector config (which already includes the filelog/legacy receiver shown above), the docker-compose.yml, and the SigNoz install are in Part 1's appendix -- reuse them as-is.
Run it
# 1. Start SigNoz (one-time, self-hosted) -- full install in Part 1
git clone -b main https://github.com/SigNoz/signoz.git
cd signoz/deploy/docker && docker compose up -d # UI at http://localhost:8080
cd -
# 2. Start the stack (docker-compose.yml + collector from Part 1).
# The legacy .txt job runs by default and starts filling /var/log/legacy/*.txt
docker compose up -d --build
docker compose --profile jobs up -d --build # add the Python job
# 3. Run the PowerShell job on your host (PowerShell 7+), pointed at the collector's HTTP port:
$env:OTEL_EXPORTER_OTLP_ENDPOINT='http://localhost:5318'
pwsh ./backup-job.ps1
docker-compose services for the jobs
Add these two services to the docker-compose.yml from Part 1 (the legacy job runs by default; the Python job is behind a jobs profile):
  # A job with NO OpenTelemetry awareness -- only writes .txt files. The collector's filelog
  # receiver (in Part 1's collector config) reads them. Shares the job-logs volume with the collector.
  legacy-job:
    image: alpine:3.20
    command: ['sh', '/opt/job/run-batch.sh']
    environment: { LOG_DIR: /var/log/legacy, INTERVAL_SECONDS: '20' }
    volumes:
      - ./external-jobs/legacy-batch/run-batch.sh:/opt/job/run-batch.sh:ro
      - job-logs:/var/log/legacy
    networks: [blazorsignoz]
  python-job:
    build: { context: ./external-jobs/python }
    profiles: ['jobs']
    environment:
      OTEL_SERVICE_NAME: python-etl-job
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317
      JOB_INTERVAL_SECONDS: '30'
    depends_on: [otel-collector]
    networks: [blazorsignoz]
external-jobs/legacy-batch/run-batch.sh
The stand-in legacy job. No SDK -- just .txt lines, including occasional multi-line stack traces.
#!/usr/bin/env sh
# A stand-in for a legacy batch job that has NO telemetry SDK and cannot be changed:
# it only appends human-readable lines to a .txt log file.
#
# Run locally: LOG_DIR=./out INTERVAL_SECONDS=5 ./run-batch.sh
set -eu
LOG_DIR="${LOG_DIR:-./out}"
INTERVAL="${INTERVAL_SECONDS:-20}"
mkdir -p "$LOG_DIR"
logfile() { echo "$LOG_DIR/export-$(date '+%Y%m%d').txt"; }
emit() {
  level="$1"; shift
  printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" >> "$(logfile)"
}
emit INFO "legacy batch job started (pid $$)"
count=0
while true; do
  count=$((count + 1))
  records=$(( (count * 37) % 500 + 50 ))
  emit INFO "run #$count: exporting nightly inventory snapshot"
  emit INFO "run #$count: wrote $records records to dataset"
  if [ $((count % 4)) -eq 0 ]; then
    emit WARN "run #$count: 3 records skipped (failed validation)"
  fi
  # A multi-line error. The continuation lines do not start with a timestamp, so the
  # collector's multiline rule attaches them to the [ERROR] entry as one record.
  if [ $((count % 7)) -eq 0 ]; then
    file="$(logfile)"
    {
      printf '%s [ERROR] run #%s: export failed: connection reset by peer\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$count"
      printf ' at LegacyExporter.Flush(batchId=%s)\n' "$count"
      printf ' at LegacyExporter.Run()\n'
      printf ' at main()\n'
    } >> "$file"
  fi
  sleep "$INTERVAL"
done
external-jobs/python/requirements.txt
opentelemetry-api>=1.29,<2
opentelemetry-sdk>=1.29,<2
opentelemetry-exporter-otlp-proto-grpc>=1.29,<2
external-jobs/python/etl_job.py
"""
A standalone Python ETL job instrumented with the OpenTelemetry SDK. It exports traces, metrics,
and logs over OTLP to the collector (which forwards to SigNoz).
Env vars:
OTEL_SERVICE_NAME default "python-etl-job"
OTEL_EXPORTER_OTLP_ENDPOINT default "http://localhost:5317" (the demo collector's host port)
JOB_INTERVAL_SECONDS 0 = run once and exit; >0 = loop forever
"""
import logging
import os
import random
import time
from opentelemetry import metrics, trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import Status, StatusCode
SERVICE_NAME = os.getenv("OTEL_SERVICE_NAME", "python-etl-job")
ENDPOINT = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:5317")
INTERVAL = int(os.getenv("JOB_INTERVAL_SECONDS", "0"))
resource = Resource.create(
    {
        "service.name": SERVICE_NAME,
        "service.namespace": "blazor-signoz",
        "service.instance.id": os.getenv("HOSTNAME", "local"),
    }
)
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=ENDPOINT, insecure=True)))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("etl_job")
metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=ENDPOINT, insecure=True),
    export_interval_millis=5000,
)
meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
metrics.set_meter_provider(meter_provider)
meter = metrics.get_meter("etl_job")
rows_counter = meter.create_counter("python.etl.rows_processed", unit="{row}", description="Rows processed by the ETL job")
runs_counter = meter.create_counter("python.etl.runs", unit="{run}", description="ETL job executions, tagged by outcome")
logger_provider = LoggerProvider(resource=resource)
set_logger_provider(logger_provider)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter(endpoint=ENDPOINT, insecure=True)))
logging.basicConfig(
    level=logging.INFO,
    handlers=[LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider), logging.StreamHandler()],
)
log = logging.getLogger("etl_job")
def run_once(run_id: int) -> None:
    with tracer.start_as_current_span("python.etl.run") as span:
        span.set_attribute("etl.run", run_id)
        log.info("ETL run %s starting", run_id)
        with tracer.start_as_current_span("extract"):
            time.sleep(random.uniform(0.05, 0.25))
            rows = random.randint(100, 1000)
        with tracer.start_as_current_span("transform"):
            time.sleep(random.uniform(0.05, 0.25))
        with tracer.start_as_current_span("load") as load_span:
            time.sleep(random.uniform(0.05, 0.25))
            if random.random() < 0.15:
                load_span.set_status(Status(StatusCode.ERROR, "load failed"))
                log.error("ETL run %s: load step failed", run_id)
                runs_counter.add(1, {"success": "false"})
                return
        rows_counter.add(rows)
        runs_counter.add(1, {"success": "true"})
        span.set_attribute("etl.rows", rows)
        log.info("ETL run %s finished: %s rows processed", run_id, rows)
def main() -> None:
    run_id = 0
    try:
        if INTERVAL > 0:
            log.info("Looping every %ss; exporting to %s", INTERVAL, ENDPOINT)
            while True:
                run_id += 1
                run_once(run_id)
                time.sleep(INTERVAL)
        else:
            run_once(1)
    except KeyboardInterrupt:
        pass
    finally:
        # Critical for short-lived jobs: flush batched telemetry before the process exits.
        tracer_provider.shutdown()
        meter_provider.shutdown()
        logger_provider.shutdown()
if __name__ == "__main__":
    main()
external-jobs/python/Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl_job.py .
ENV JOB_INTERVAL_SECONDS=30
ENTRYPOINT ["python", "etl_job.py"]
external-jobs/powershell/OtelExport.ps1
Dot-source this; it provides both PowerShell telemetry paths.
<#
OtelExport.ps1 -- minimal OpenTelemetry helpers for PowerShell (7+).
Send-OtelLog / Send-OtelTrace : POST OTLP/HTTP+JSON straight to a collector (host port 5318, mapped to 4318 in Docker).
Write-JobLog : append JSON-lines to a file for the collector's filelog receiver.
OTLP/JSON gotchas handled: timeUnixNano as quoted strings, hex trace/span ids, integer enums.
#>
function Get-OtelNano {
    $ms = [DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()
    return ([long]$ms * 1000000).ToString()
}
function New-OtelId {
    param([int]$Bytes)
    $buffer = New-Object byte[] $Bytes
    [System.Security.Cryptography.RandomNumberGenerator]::Fill($buffer)
    return (($buffer | ForEach-Object { $_.ToString('x2') }) -join '')
}
function ConvertTo-OtelAttributes {
    param([hashtable]$Attributes)
    $list = @()
    foreach ($key in $Attributes.Keys) {
        $list += @{ key = $key; value = @{ stringValue = [string]$Attributes[$key] } }
    }
    return , $list # leading comma forces an array even for 0/1 elements
}
function Send-OtelLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [ValidateSet('TRACE', 'DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL')][string]$Severity = 'INFO',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $severityNumber = @{ TRACE = 1; DEBUG = 5; INFO = 9; WARN = 13; ERROR = 17; FATAL = 21 }[$Severity]
    $payload = @{
        resourceLogs = @(@{
            resource = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeLogs = @(@{
                scope = @{ name = 'powershell' }
                logRecords = @(@{
                    timeUnixNano = (Get-OtelNano)
                    severityNumber = $severityNumber
                    severityText = $Severity
                    body = @{ stringValue = $Message }
                    attributes = (ConvertTo-OtelAttributes $Attributes)
                })
            })
        })
    }
    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/logs" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP log export failed: $($_.Exception.Message)"
    }
}
function Send-OtelTrace {
    param(
        [Parameter(Mandatory)][string]$Name,
        [int]$DurationMs = 100,
        [ValidateSet('UNSET', 'OK', 'ERROR')][string]$Status = 'OK',
        [string]$Service = 'powershell-job',
        [hashtable]$Attributes = @{},
        [string]$Endpoint = $env:OTEL_EXPORTER_OTLP_ENDPOINT
    )
    if (-not $Endpoint) { $Endpoint = 'http://localhost:5318' }
    $endNano = [long]([DateTimeOffset]::UtcNow.ToUnixTimeMilliseconds()) * 1000000
    $startNano = $endNano - ([long]$DurationMs * 1000000)
    $statusCode = @{ UNSET = 0; OK = 1; ERROR = 2 }[$Status]
    $payload = @{
        resourceSpans = @(@{
            resource = @{ attributes = @(
                @{ key = 'service.name'; value = @{ stringValue = $Service } },
                @{ key = 'service.namespace'; value = @{ stringValue = 'blazor-signoz' } }
            ) }
            scopeSpans = @(@{
                scope = @{ name = 'powershell' }
                spans = @(@{
                    traceId = (New-OtelId 16)
                    spanId = (New-OtelId 8)
                    name = $Name
                    kind = 1 # INTERNAL
                    startTimeUnixNano = $startNano.ToString()
                    endTimeUnixNano = $endNano.ToString()
                    attributes = (ConvertTo-OtelAttributes $Attributes)
                    status = @{ code = $statusCode }
                })
            })
        })
    }
    $json = $payload | ConvertTo-Json -Depth 12 -Compress
    try {
        Invoke-RestMethod -Uri "$Endpoint/v1/traces" -Method Post -ContentType 'application/json' -Body $json | Out-Null
    }
    catch {
        Write-Warning "OTLP trace export failed: $($_.Exception.Message)"
    }
}
function Write-JobLog {
    param(
        [Parameter(Mandatory)][string]$Message,
        [string]$Level = 'INFO',
        [string]$Path = './out/powershell-job.log',
        [hashtable]$Attributes = @{}
    )
    $dir = Split-Path -Parent $Path
    if ($dir -and -not (Test-Path $dir)) { New-Item -ItemType Directory -Path $dir -Force | Out-Null }
    $entry = [ordered]@{ time = (Get-Date).ToString('o'); level = $Level; msg = $Message }
    foreach ($key in $Attributes.Keys) { $entry[$key] = $Attributes[$key] }
    ($entry | ConvertTo-Json -Compress) | Add-Content -Path $Path
}
external-jobs/powershell/backup-job.ps1
An example job that uses both paths -- belt and suspenders.
. "$PSScriptRoot/OtelExport.ps1"
$ErrorActionPreference = 'Stop'
$service = 'powershell-backup-job'
$logFile = Join-Path $PSScriptRoot 'out/powershell-job.log'
$start = Get-Date
Send-OtelLog -Service $service -Severity INFO -Message 'Backup job started' -Attributes @{ host = $env:COMPUTERNAME }
Write-JobLog -Path $logFile -Level INFO -Message 'Backup job started (file path)'
try {
    Start-Sleep -Milliseconds 400
    $files = Get-Random -Minimum 50 -Maximum 500
    Send-OtelLog -Service $service -Severity INFO -Message "Backed up $files files" -Attributes @{ files = $files }
    Write-JobLog -Path $logFile -Level INFO -Message "Backed up $files files" -Attributes @{ files = $files }
    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status OK -Attributes @{ files = $files }
    Write-Host "Backup complete: $files files in ${duration}ms"
}
catch {
    $message = $_.Exception.Message
    Send-OtelLog -Service $service -Severity ERROR -Message "Backup failed: $message"
    Write-JobLog -Path $logFile -Level ERROR -Message "Backup failed: $message"
    $duration = [int]((Get-Date) - $start).TotalMilliseconds
    Send-OtelTrace -Service $service -Name 'backup' -DurationMs $duration -Status ERROR
    throw
}
The in-solution .NET worker (the easy case)
For completeness, here is the worker that gets full telemetry from the shared bootstrap. It's a plain Microsoft.NET.Sdk.Worker app with a <ProjectReference> to Shared.Telemetry.
src/Worker.Jobs/Program.cs
using Shared.Telemetry;
using Worker.Jobs.Jobs;
using Worker.Jobs.Telemetry;
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddSingleton<WorkerTelemetry>();
var backendBaseUrl = builder.Configuration["Backend:BaseUrl"] ?? "http://localhost:5081";
builder.Services.AddHttpClient("backend", client => client.BaseAddress = new Uri(backendBaseUrl));
builder.Services.AddHostedService<InventoryReconciliationJob>();
// Same shared bootstrap as the web apps. A worker is not a web server, so ASP.NET Core
// instrumentation is off; HttpClient + runtime instrumentation stay on.
builder.AddObservability("worker-jobs", options =>
{
    options.InstrumentAspNetCore = false;
    options.ActivitySources.Add(WorkerTelemetry.ActivitySourceName);
    options.Meters.Add(WorkerTelemetry.MeterName);
});
var host = builder.Build();
host.Run();
src/Worker.Jobs/Telemetry/WorkerTelemetry.cs
using System.Diagnostics;
using System.Diagnostics.Metrics;
namespace Worker.Jobs.Telemetry;
public sealed class WorkerTelemetry : IDisposable
{
    public const string ActivitySourceName = "Worker.Jobs";
    public const string MeterName = "Worker.Jobs";
    public static readonly ActivitySource ActivitySource = new(ActivitySourceName);
    private readonly Meter _meter = new(MeterName, "1.0.0");
    private readonly Counter<long> _jobRuns;
    private readonly Histogram<double> _jobDuration;
    private readonly Counter<long> _itemsProcessed;
    public WorkerTelemetry()
    {
        _jobRuns = _meter.CreateCounter<long>("worker.job.runs", unit: "{run}",
            description: "Number of background job executions, tagged by job name and outcome.");
        _jobDuration = _meter.CreateHistogram<double>("worker.job.duration", unit: "ms",
            description: "Duration of background job executions.");
        _itemsProcessed = _meter.CreateCounter<long>("worker.job.items_processed", unit: "{item}",
            description: "Items processed by background jobs.");
    }
    public Activity? StartActivity(string name) => ActivitySource.StartActivity(name, ActivityKind.Internal);
    public void RecordRun(string jobName, bool success, TimeSpan duration, int itemsProcessed)
    {
        var tags = new TagList { { "job.name", jobName }, { "success", success } };
        _jobRuns.Add(1, tags);
        _jobDuration.Record(duration.TotalMilliseconds, tags);
        if (itemsProcessed > 0) _itemsProcessed.Add(itemsProcessed, new TagList { { "job.name", jobName } });
    }
    public void Dispose() => _meter.Dispose();
}
src/Worker.Jobs/Jobs/InventoryReconciliationJob.cs
using System.Diagnostics;
using System.Net.Http;
using System.Net.Http.Json;
using Worker.Jobs.Telemetry;

namespace Worker.Jobs.Jobs;

/// <summary>A periodic job. Each run starts a root span, calls the backend over an instrumented
/// HttpClient (→ worker-jobs → backend-api → db in SigNoz), logs, and records run metrics.</summary>
public sealed class InventoryReconciliationJob(
    IHttpClientFactory httpClientFactory,
    WorkerTelemetry telemetry,
    IConfiguration configuration,
    ILogger<InventoryReconciliationJob> logger) : BackgroundService
{
    private const string JobName = "inventory-reconciliation";

    private readonly TimeSpan _interval =
        TimeSpan.FromSeconds(Math.Clamp(configuration.GetValue("Worker:IntervalSeconds", 15), 1, 3600));

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        logger.LogInformation("{JobName} starting; interval {IntervalSeconds}s", JobName, _interval.TotalSeconds);

        try { await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken); } // let the backend come up
        catch (OperationCanceledException) { return; }

        // Run once immediately, then start the timer so its clock begins AFTER the warm-up run.
        await RunOnceAsync(stoppingToken);

        using var timer = new PeriodicTimer(_interval);
        while (await WaitForNextTickAsync(timer, stoppingToken))
        {
            await RunOnceAsync(stoppingToken);
        }
    }

    private async Task RunOnceAsync(CancellationToken ct)
    {
        using var activity = telemetry.StartActivity($"job.{JobName}");
        activity?.SetTag("job.name", JobName);
        var stopwatch = Stopwatch.StartNew();

        try
        {
            var client = httpClientFactory.CreateClient("backend");
            var stats = await client.GetFromJsonAsync<ProductStats>("/api/products/stats", ct);
            var count = stats?.TotalCount ?? 0;

            activity?.SetTag("job.items", count);
            logger.LogInformation("Reconciled {ProductCount} products (inventory value {InventoryValue})",
                count, stats?.InventoryValue);
            telemetry.RecordRun(JobName, success: true, stopwatch.Elapsed, itemsProcessed: count);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            return; // graceful shutdown -- not a failure
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            logger.LogError(ex, "{JobName} run failed", JobName);
            telemetry.RecordRun(JobName, success: false, stopwatch.Elapsed, itemsProcessed: 0);
        }
    }

    private static async Task<bool> WaitForNextTickAsync(PeriodicTimer timer, CancellationToken ct)
    {
        try { return await timer.WaitForNextTickAsync(ct); }
        catch (OperationCanceledException) { return false; }
    }

    private sealed record ProductStats(int TotalCount, int TotalQuantity, decimal InventoryValue, decimal AveragePrice);
}
Wrapping up
The lesson is the same across every case: the Collector is the seam. Apps you own export OTLP directly. Apps you don't own get adapted at the Collector, with whichever mechanism fits the job:
- a filelog receiver for plain text,
- a resource processor to give those files an identity,
- a json_parser for structured lines,
- and OTLP/HTTP for anything that can POST.
Stamp a consistent service.name and service.namespace, and every job -- C#, Python, PowerShell, or a decade-old batch script -- shows up side by side in a dashboard you host yourself.
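To make "anything that can POST" concrete, here is a minimal sketch of a single span sent as OTLP/HTTP JSON using only the Python standard library. It is not the script from the cases above, just an illustration of the wire shape. The service name `nightly-export`, the namespace `background-jobs`, the scope name `manual-otlp`, and the endpoint `http://localhost:4318/v1/traces` (the default OTLP/HTTP port) are assumptions; substitute whatever your Collector and naming scheme actually use.

```python
import json
import secrets
import time
import urllib.request

# Assumption: the Collector accepts OTLP/HTTP on the default port 4318.
COLLECTOR_TRACES_URL = "http://localhost:4318/v1/traces"

STATUS_OK, STATUS_ERROR = 1, 2   # OTLP status codes
SPAN_KIND_INTERNAL = 1           # OTLP span kind for in-process work


def build_span_payload(service, namespace, name, start_ns, end_ns, ok=True):
    """Build an OTLP/JSON request body containing one root span."""

    def attr(key, value):
        return {"key": key, "value": {"stringValue": value}}

    span = {
        "traceId": secrets.token_hex(16),    # 32 hex characters
        "spanId": secrets.token_hex(8),      # 16 hex characters
        "name": name,
        "kind": SPAN_KIND_INTERNAL,
        "startTimeUnixNano": str(start_ns),  # 64-bit values travel as strings
        "endTimeUnixNano": str(end_ns),
        "status": {"code": STATUS_OK if ok else STATUS_ERROR},
    }
    return {
        "resourceSpans": [{
            # The resource attributes are what give the job its identity.
            "resource": {"attributes": [
                attr("service.name", service),
                attr("service.namespace", namespace),
            ]},
            "scopeSpans": [{"scope": {"name": "manual-otlp"}, "spans": [span]}],
        }]
    }


def send_span(payload, url=COLLECTOR_TRACES_URL, timeout=5):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    start = time.time_ns()
    # ... the job's real work would happen here ...
    end = time.time_ns()
    body = build_span_payload("nightly-export", "background-jobs", "export", start, end)
    print(json.dumps(body, indent=2))
    # send_span(body)  # uncomment once a Collector is reachable
```

Building the payload separately from sending it keeps the sketch easy to inspect without a running Collector, and the same two resource attributes are what line a hand-rolled job up next to the SDK-instrumented ones.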
Next, Part 4 -- instrument now, collect later -- goes under the hood of the instrumentation every post here relies on (ActivitySource, Meter, and ILogger) and shows that you can write it all, and read it back, before committing to any backend. Or revisit Part 1 -- Blazor Server observability (which carries the shared foundation code) and Part 2 -- the C# API with Postgres or SQL Server, or jump back to the series index.