Sr. Dashboard Engineer (Grafana/Prometheus)
Project Location: 100% remote (must be in the US)
Project Duration: 6-12+ months
Visa:
Job Description:
NOTES FROM CLIENT: Candidates need strong communication skills because they will be working with several teams. I am running into candidates who lack the OpenStack and OpenShift experience needed for the exporter piece. We use Grafana for the dashboards and Prometheus for data acquisition.
Key Responsibilities:
Create and maintain Grafana dashboards to monitor OpenStack and Webscale environments.
Integrate data from various sources, including Prometheus, node-exporter, and custom exporters.
Collaborate with operations teams to understand requirements and develop custom dashboards.
Set up PromQL-based alerts and integrate with existing monitoring tools.
Optimize dashboard performance and ensure quick load times.
Generate daily reports on cluster issues and status using VictoriaMetrics.
Required Skillsets:
Primary: Grafana, PromQL (Prometheus)
Additional: OpenStack, Kubernetes, Linux systems performance, GitOps
Soft Skills: Strong communication and teamwork skills to work effectively with customer operations teams.
Technical Expertise:
Experience with data collectors like filebeat-exporter, lldp-exporter, ethtool-exporter, and redfish-exporter.
Ability to create detailed dashboards and alerts based on existing thresholds and custom requirements.
Proficiency in scripting languages like Python or shell scripting for automation tasks.
Work with the following in Grafana to build the required dashboards:
Create an operations summary dashboard with consolidated OpenStack/Webscale metrics and highlights
Logs from Elastic
Alerts
Key metrics outside thresholds
Use data from lldp, ethtool, node-exporter, OSP/OCP APIs, harvest
Edge: using SevOne today; convert Edge to node-exporter
Core: try to align with Edge, but accommodate custom requirements
Create a dashboard joining system metrics with storage metrics
Link to application metadata: namespace/pod/vm/etc.
OpenStack and WebScale:
openstack: libvirt-exporter, openstack-exporter, harvest
openshift: kubelet (namespace/pod/pvc-iscsi,nfs-qtree/flexvol), trident, harvest
Create PromQL-based alerts from existing thresholds, Icinga shell/Python scripts, and other requirements.
CoreDNS
Identify key metrics to combine in a single dashboard in order to quickly identify issues with a cluster.
From the above, allow the user to drill down further into the clusters with issues.
Create "sub-metrics" (recording rules, static data sources, etc.) from large datasets to enable faster dashboard load times
Create a report that is emailed daily with current cluster issues/status
VictoriaMetrics report generation capability (separate from Grafana)
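For illustration only, the daily cluster issues/status report could start from something like the sketch below. It assumes the report is fed by a Prometheus-compatible instant query such as count by (cluster) (ALERTS{alertstate="firing"}), returned in the standard /api/v1/query JSON shape. The cluster label and the summarize_issues helper are hypothetical names, not part of the client's setup, and the emailing step is left out.

```python
def summarize_issues(response, label="cluster"):
    """Turn a Prometheus-style instant-query response into a plain-text report.

    `label` is the (assumed) label that identifies a cluster in each sample.
    """
    if response.get("status") != "success":
        raise ValueError("query did not succeed")

    # Each vector sample looks like:
    #   {"metric": {<labels>}, "value": [<unix_ts>, "<value as string>"]}
    counts = {}
    for sample in response["data"]["result"]:
        name = sample["metric"].get(label, "unknown")
        counts[name] = counts.get(name, 0.0) + float(sample["value"][1])

    if not counts:
        return "No clusters reporting issues."

    # Worst clusters first; ties are broken alphabetically for stable output.
    lines = ["Clusters with firing alerts:"]
    for name, value in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"  {name}: {int(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written sample data standing in for a real query result.
    sample_response = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"cluster": "edge-1"}, "value": [1700000000, "3"]},
                {"metric": {"cluster": "core-1"}, "value": [1700000000, "5"]},
            ],
        },
    }
    print(summarize_issues(sample_response))
```

In practice the query would more likely target a precomputed recording rule than raw series, so the report stays fast on large datasets.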