Problem
I’ve noticed that on demo machines Telegraf sometimes doesn’t start on the first try. This doesn’t seem to happen on most of my production servers, but they have a lot more memory and CPU power. So I figured I would write a quick blog post showing one way to make sure the service starts when the machine is rebooted. This is a known issue, and a user has offered a bounty to get it fixed, so if you know some Go and have time, please check out the issue on GitHub.
Solution
The solution is relatively simple. I’ve created a PowerShell script that tries to start the service in a loop (it usually starts on the second try), sleeping for 90 seconds between attempts. I’ve edited the Install-Telegraf.ps1 file provided in the presentation Collecting Performance Metrics so that it also creates a folder to hold the startup script, copies a Start-Telegraf.ps1 file into it, and registers a job that runs that script when the server starts up and loops until the service is running. NOTE: The script below assumes you will be copying files from a network location to your servers. You will need to make some adjustments if that is not how you install it.
$servers = @( 'server1', 'server2' )

$servers | ForEach-Object {
    Write-Host "$($_)..."
    Write-Host "..Create folders and copy files..."
    New-Item -Path "\\$($_)\c$\Program Files\telegraf" -ItemType Directory -Force
    New-Item -Path "\\$($_)\c$\DBOps" -ItemType Directory -Force
    Copy-Item -Path "\\server\telegraf\telegraf.*" -Destination "\\$($_)\c$\Program Files\telegraf\" -Force
    Copy-Item -Path "\\server\telegraf\Start-Telegraf.ps1" -Destination "\\$($_)\c$\DBOps\Start-Telegraf.ps1" -Force

    Invoke-Command -ComputerName $_ -ScriptBlock {
        Write-Host '..Install service...'
        Stop-Service -Name telegraf -ErrorAction SilentlyContinue
        & "c:\program files\telegraf\telegraf.exe" --service install -config "c:\program files\telegraf\telegraf.conf"
        SC.EXE Config telegraf Start=Delayed-Auto
        Start-Service -Name telegraf
        Start-Sleep 90

        # Make sure it starts
        $service = Get-Service | Where-Object {$_.Status -eq "Running" -and $_.Name -eq "telegraf"}
        while ($service.count -eq 0) {
            Start-Service -Name "telegraf"
            Start-Sleep 90
            $service = Get-Service | Where-Object {$_.Status -eq "Running" -and $_.Name -eq "telegraf"}
        }

        Write-Host '..Setup job to make sure it autostarts...'
        # Create a scheduled job that runs Start-Telegraf.ps1 at startup
        $trigger = New-JobTrigger -AtStartup -RandomDelay 00:00:30
        Register-ScheduledJob -Trigger $trigger -FilePath C:\DBOps\Start-Telegraf.ps1 -Name Start-Telegraf
    }
}
# Start-Telegraf.ps1: keep trying to start the service until it is running
$service = Get-Service | Where-Object {$_.Status -eq "Running" -and $_.Name -eq "telegraf"}
while ($service.count -eq 0) {
    Start-Service -Name "telegraf"
    Start-Sleep 90
    $service = Get-Service | Where-Object {$_.Status -eq "Running" -and $_.Name -eq "telegraf"}
}
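One thing to be aware of is that the loop above will keep running forever if the service can never start (for example, if it was uninstalled or the config is broken). If that worries you, a bounded version is easy to sketch. The helper below is only an illustration and is not part of the original scripts: the function name Invoke-WithRetry and the parameters Action, Until, MaxAttempts and DelaySeconds are all names I made up, and the default of 10 attempts is an arbitrary assumption.

```powershell
# Hypothetical helper (not part of the original scripts): run an action
# repeatedly until a check passes or the attempt limit is reached.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,  # what to try, e.g. Start-Service
        [Parameter(Mandatory)] [scriptblock] $Until,   # returns $true once we can stop
        [int] $MaxAttempts  = 10,                      # assumed limit, adjust as needed
        [int] $DelaySeconds = 90                       # same pause the post uses
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # Stop as soon as the desired state is reached
        if (& $Until) { return $true }

        try {
            & $Action | Out-Null
        }
        catch {
            Write-Warning "Attempt $attempt failed: $_"
        }

        Start-Sleep -Seconds $DelaySeconds
    }

    # Final check after the last attempt
    return [bool](& $Until)
}

# Possible use inside Start-Telegraf.ps1 (left commented out here):
# Invoke-WithRetry -Action { Start-Service -Name 'telegraf' } `
#                  -Until  { (Get-Service -Name 'telegraf').Status -eq 'Running' }
```

The function returns $true if the check passed and $false if it gave up, so the calling script could log a message or raise an alert when the service never comes up.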
The developers of Telegraf are looking into this issue on Windows, but until the cause is identified, I needed a way to make sure my demo machines would start the service without me having to do it manually.