Monitoring Pdftools Conversion Service Logs with ELK Stack

What is the Pdftools Conversion Service?

The Pdftools Conversion Service is a scalable, on-premise or cloud-deployable solution (EC2, Azure) for automated document conversion and processing. It supports multiple integration options, including file system, email, and REST API. Some of the key features include converting various document formats (DOCX, XLSX, PDF, images, HTML) into archive-ready PDF/A, support for Linux (with Docker) and Windows, and scalability with platforms like OpenShift and Kubernetes.


For more information, visit the Pdftools Conversion Service Product Summary or have a look at the Getting Started Guides. Feel free to contact us with any questions.

Important Note: The code samples provided here are for demonstration purposes and learning. While we strive to maintain these examples, they are not officially supported. For production use, please refer to our official documentation.

Introduction

This tutorial guides you through integrating the ELK stack (Elasticsearch, Logstash, and Kibana) to monitor logs from the Pdftools Conversion Service. It includes a quick setup for immediate use and a more detailed example tailored to the structure of the Conversion Service logs.

Prerequisites

To follow this tutorial, you need a running Pdftools Conversion Service installation (on Windows or in Docker) and an ELK stack (Elasticsearch, Logstash, and Kibana) that can reach the service's log files.

Quick Start Example

Step 1: Collect Logs from Conversion Service

Windows

# Example log location
C:\Program Files\Pdftools\ConversionService\Logs\ConversionService-Service.log
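On Windows, a quick way to follow the log file as the service writes to it is PowerShell's Get-Content (adjust the path to your installation):

# Follow the Conversion Service log in real time
Get-Content "C:\Program Files\Pdftools\ConversionService\Logs\ConversionService-Service.log" -Wait -Tail 50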

Docker

# Example log command
sudo docker exec <container_name> tail -f /var/log/convsrv/ConversionService-Service.log
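Logstash on the host needs a readable copy of the log file. If the log directory is not already mounted as a volume, one simple option (a sketch using the container name and paths from above) is to copy the file out of the container:

# Copy the current log file to a host path that Logstash can read
sudo docker cp <container_name>:/var/log/convsrv/ConversionService-Service.log /path/to/ConversionService-Service.log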

Step 2: Configure Logstash to Parse Conversion Service Logs

Create a Logstash configuration file conversion-service-logstash.conf:

input {
  file {
    # Path to the Conversion Service log file; on Windows, use forward slashes
    # (e.g. "C:/Program Files/Pdftools/ConversionService/Logs/ConversionService-Service.log")
    path => "/path/to/ConversionService-Service.log"
    # Read from the start of the file and don't remember the position between runs (handy while testing)
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # The Conversion Service writes comma-separated log entries
  csv {
    separator => ","
    columns => ["Timestamp", "Level", "ProcessName", "Message", "Exception", "JobID", "TaskID", "RemoteID"]
  }
  mutate {
    convert => { "Level" => "string" }
  }
}

output {
  # Index the parsed events into Elasticsearch
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "conversion-service-logs"
  }
  # Also print each event to the console for debugging
  stdout {
    codec => rubydebug
  }
}

Step 3: Start Logstash

Run Logstash with the above configuration:

bin/logstash -f conversion-service-logstash.conf
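If you want to verify the configuration before ingesting anything, Logstash can check the file and exit:

bin/logstash -f conversion-service-logstash.conf --config.test_and_exit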

Step 4: Visualize Logs in Kibana

  1. Open Kibana.
  2. Create an index pattern for conversion-service-logs.
  3. Visualize logs using Kibana dashboards (e.g., filter by Level to view ERROR or FATAL messages, as in the example query below).
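For example, in Kibana's Discover view you can narrow the index pattern to critical entries with a KQL filter like the following (the field name comes from the Logstash columns above):

Level : "ERROR" or Level : "FATAL"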

Detailed Example: Leveraging Conversion Service Log Structure

The Conversion Service logs include multiple properties, such as Timestamp, Level, ProcessName, Message, Exception, JobID, TaskID, and RemoteID (see Conversion Service Log Properties). These can be used to filter and visualize logs in Kibana.

Enhanced Logstash Configuration

For structured parsing and advanced use, include additional processing in your Logstash configuration:

filter {
  csv {
    separator => ","
    columns => ["Timestamp", "Level", "ProcessName", "Message", "Exception", "JobID", "TaskID", "RemoteID"]
  }

  # Flag critical entries so they are easy to spot and alert on
  if [Level] == "ERROR" or [Level] == "FATAL" {
    mutate {
      add_field => { "alert" => "Critical issue detected" }
    }
  }

  # Use the log's own timestamp as the event time; adjust the pattern if your
  # timestamps include fractional seconds (e.g. "yyyy-MM-dd HH:mm:ss.SSSS")
  date {
    match => ["Timestamp", "yyyy-MM-dd HH:mm:ss"]
  }
}
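If you also want the flagged events to be easy to find, one option (a sketch, not part of the standard setup; the conversion-service-alerts index name is just an example) is to route them to a dedicated index in the output section:

output {
  # Events flagged by the filter above also go to a separate alerts index
  if [alert] {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "conversion-service-alerts"
    }
  }
  # All events still go to the main index
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "conversion-service-logs"
  }
}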

Advanced Visualization in Kibana

For creating more advanced visualizations in Kibana, refer to Kibana Guide on Visualizations.

  1. Custom Dashboard:
    • Create a heatmap showing error frequency over time.
    • Add pie charts to show log distribution by ProcessName (an equivalent aggregation query is sketched after this list).
  2. Alerts:
    • Set alerts for ERROR or FATAL messages using Kibana’s alerting feature.
  3. Detailed Steps for Advanced Visualization:
    • Use the Kibana Lens feature to create a time-series visualization of error logs.
    • Combine filters to focus on critical severities like ERROR and FATAL.

Example for Debugging Specific Job

A “job” is a unit of work submitted to the Conversion Service and can contain one or more files. When jobs are submitted, the logs look something like this:

2024-12-11 16:40:53 2024-12-11 15:40:53.0595,INFO,job3_ekdh4fjuwiv,"Added file ""example.webp"" (id ""2dffplrzb0r-0-vyrbh5cy24q"") to job ""job3_ekdh4fjuwiv""",,::ffff:172.19.0.3
2024-12-11 16:40:53 2024-12-11 15:40:53.0668,INFO,job3_ekdh4fjuwiv,"Starting job ""job3_ekdh4fjuwiv""",,::ffff:172.19.0.3
2024-12-11 16:40:53 2024-12-11 15:40:53.6953,INFO,job3_ekdh4fjuwiv,"Job ""job3_ekdh4fjuwiv"" has finished",,::ffff:172.19.0.3

This means you can use a query like the following to find all logs related to a specific JobID:

GET /conversion-service-logs/_search
{
  "query": {
    "match": {
      "JobID": "12345"
    }
  }
}

Similar tracking can be done with the Conversion Service REST endpoint “Get Job Info”, although that only returns the current status of the job.