
Note

Do you have Virtual Private Cloud (VPC) logs or Load Balancer (ELB) logs being collected in AWS S3 and want a unified platform to visualize them? In this tutorial, you will configure a Lambda function to send any logs from your S3 bucket to the SigNoz otel-collector endpoint.

Send your AWS VPC, ELB, and Lambda logs to SigNoz HTTP Endpoint cover image

Here's a quick summary of what we'll do in this detailed article.

Have some data in your S3 bucket → create a Lambda function in AWS → add a trigger to the Lambda function (your S3 bucket, with the trigger set to PUT/All object create events) → add the required policy so the function can run → load the objects from S3 → parse the logs and convert them to JSON dictionaries → send them to the SigNoz otel-collector endpoint.

Assumptions

  1. You have already installed SigNoz using Helm (or Docker), or you are using the SigNoz Cloud edition.
  2. You have an AWS account with admin privileges.

Creating / Configuring your S3 bucket

Create an S3 bucket if you don't already have one and upload some data to it. It's fairly easy: create an S3 bucket and upload files in .json, .csv, .log, or any format you wish (even .log.gz or .log.zip).

Keep in mind that we'll be converting all logs to JSON before sending, so you might need to do some additional preprocessing of your logs.

If you don't already have an S3 bucket, here's how to make one.

  • Sign in to your AWS Management Console.
  • Navigate to your Amazon S3 service.
  • Type a unique bucket name, select your region, and add any additional settings if you need them (such as versioning, logging, etc.).
Create AWS S3 bucket

Give your bucket a name and keep the default settings; they work for most cases.

Fill up AWS S3 bucket details
Fill up AWS S3 bucket details and object ownership

Once your bucket is created, click on Upload in the UI and upload your files to the bucket.
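If you'd rather script the upload than use the console, here's a minimal boto3 sketch (the bucket, key, and file names below are placeholders, not values from this setup):

import boto3

s3 = boto3.client('s3')

# Upload a local log file to your bucket (names here are placeholders)
s3.upload_file('sample_vpc_flow.log.gz', 'my-example-bucket', 'vpc-logs/sample_vpc_flow.log.gz')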

If you have configured VPC logging, the VPC logs will automatically be saved in .gz format in your S3 bucket.

In this tutorial, I'll assume you already have some data in your S3 bucket and need to send your AWS VPC Flow Logs via HTTP to a backend service. In this case, we cover both the SigNoz self-hosted version and SigNoz Cloud.

The steps to send ELB logs and Lambda logs are the same; you just have to make the following changes:

  • Select the right S3 bucket.
  • Use the right headers for each log type.
Note

Since we'll be parsing data from S3, we assume all the data is in the same format.

Example: if we have an S3 bucket for VPC flow logs, then all the data inside that bucket will be in the same format (e.g., if one file is .csv, all files will be .csv). Can't you have different file formats? Sure you can; it just adds a couple of extra steps and different parsing functions, as sketched below.
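To illustrate what those "different parsing functions" could look like, here's a rough sketch; the parser functions and the dispatch-by-extension logic are hypothetical and not part of the Lambda code we'll write later.

import csv
import io
import json

def parse_csv(text):
    # Hypothetical: one dict per CSV row
    return list(csv.DictReader(io.StringIO(text)))

def parse_json_lines(text):
    # Hypothetical: one JSON object per line
    return [json.loads(line) for line in text.strip().split('\n') if line]

def parse_file(key, text):
    # Pick a parser based on the object key's extension
    if key.endswith('.csv'):
        return parse_csv(text)
    if key.endswith('.json') or key.endswith('.jsonl'):
        return parse_json_lines(text)
    raise ValueError(f"No parser configured for {key}")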

The general header (table) format of VPC logs is:

vpc_headers= ["version","account-id","interface-id","srcaddr","dstaddr","srcport",
"dstport","protocol","packets","bytes","start","end","action","log-status"]

Here's sample VPC log file content, with the file name 9016606xxxxx_vpcflowlogs_eu-west-1_fl-0dxxxxx9c1f82967d_2024061xxxxx0Z_b3xxx37d.log.gz

version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
2 9016606xxxxx eni-00xxxxx10cfc77a72 xx.82.47.48 10.0.12.216 34888 123 17 1 40 1718806751 1718806806 REJECT OK
2 9016606xxxxx eni-00xxxxx10cfc77a72 xx.75.212.75 10.0.12.216 34752 25836 6 1 40 1718806751 1718806806 REJECT OK
2 9016606xxxxx eni-00xxxxx10cfc77a72 xx2.70.149.34 10.0.12.216 51989 8099 6 1 40 1718806751 1718806806 REJECT OK
2 9016606xxxxx eni-00xxxxx10cfc77a72 xx.189.91.157 10.0.12.216 123 40995 17 1 76 1718806751 1718806806 ACCEPT OK
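To see how these headers line up with a raw line, here's a small standalone sketch (not part of the Lambda function yet) that splits one of the sample lines above and zips it with vpc_headers:

vpc_headers = ["version","account-id","interface-id","srcaddr","dstaddr","srcport",
"dstport","protocol","packets","bytes","start","end","action","log-status"]

# One of the sample lines above (values partially masked, as in the file)
line = "2 9016606xxxxx eni-00xxxxx10cfc77a72 xx.82.47.48 10.0.12.216 34888 123 17 1 40 1718806751 1718806806 REJECT OK"

# Whitespace-separated fields zipped with the headers give a JSON-ready dict
record = dict(zip(vpc_headers, line.split()))
print(record["srcaddr"], record["action"])  # xx.82.47.48 REJECT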

For ELB logs, the log format is of the following type:

elb_headers= ["type","time","elb","client_port","target_port",
"request_processing_time","target_processing_time","response_processing_time",
"elb_status_code","target_status_code","received_bytes","sent_bytes","request",
"user_agent","ssl_cipher","ssl_protocol","target_group_arn","trace_idd",
"domain_name","chosen_cert_arn","matched_rule_priority",
"request_creation_time","actions_executed","redirect_url",
"error_reason","target_port_list","target_status_code_list",
"classification","classification_reason"]

The general header(table) format of Lambda logs is:

lambda_headers= ["time","type","record", "tracing"]
// a single lambda log record
{
   "time": "2020-11-12T14:55:06.780Z",
   "type": "platform.report",
   "record": {
     "requestId": "49e64413-fd42-47ef-b130-6fd16f30148d",
     "metrics": {
       "durationMs": 4.96,
       "billedDurationMs": 100,
       "memorySizeMB": 128,
       "maxMemoryUsedMB": 87,
       "initDurationMs": 792.41
     },
     "tracing": {
       "type": "X-Amzn-Trace-Id",
       "value": "Root=1-5fad4cc9-70259536495de84a2a6282cd;Parent=67286c49275ac0ad;Sampled=1"
     }
   }
}

You can flatten the JSON to make it:

{
   "time": "2020-11-12T14:55:06.780Z",
   "type": "platform.report",
   "record_requestId": "49e64413-fd42-47ef-b130-6fd16f30148d",
   "record_metrics_durationMs": 4.96,
   "record_metrics_billedDurationMs": 100,
   "record_metrics_memorySizeMB": 128,
   "record_metrics_maxMemoryUsedMB": 87,
   "record_metrics_initDurationMs": 792.41,
   "record_tracing_type": "X-Amzn-Trace-Id",
   "record_tracing_value": "Root=1-5fad4cc9-70259536495de84a2a6282cd;Parent=67286c49275ac0ad;Sampled=1"
}
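A small recursive helper like the sketch below (my addition, not part of the article's Lambda code) produces that flattened form from the nested record shown above:

def flatten(obj, parent_key="", sep="_"):
    # Recursively flatten nested dicts, e.g. {"record": {"metrics": {"durationMs": 4.96}}}
    # becomes {"record_metrics_durationMs": 4.96}
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

Calling flatten(record) on the nested Lambda log record above yields the flattened JSON just shown.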
Note

These headers are just names; you can change them if you wish, though it's advisable to keep them as they are. You might have noticed that in elb_headers I have used trace_idd and not trace_id. This is not a typo: ELB logs do use trace_id as the header name, but the payload structure SigNoz recommends is shown below:

{
   "timestamp": "<uint64>",
   "trace_id": "<hex string>",
   "span_id": "<hex string>",
   "trace_flags": "<int>",
   "severity_text": "<string>",
   "severity_number": "<int>",
   "attributes": "<map>",
   "resources": "<map>",
   "body": "<string>",
}

Source - SigNoz Send Logs over HTTP

trace_id here is of the format <hex string>, but we'll send everything as string JSON for simplicity and less data processing and parsing, hence the change in field name from trace_id to trace_idd. (If you use trace_id and send its value as a plain string, you'll get a 400 error; this can be resolved with further log formatting.)
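If you later decide to send logs in the recommended structure rather than as flat string JSON, a hedged sketch of that mapping could look like the one below; the field choices are my assumptions, not the article's code.

def to_signoz_payload(parsed_line):
    # parsed_line is a dict produced by zipping headers with a split log line.
    # trace_id/span_id are omitted here, since they must be hex strings.
    attributes = {k: v for k, v in parsed_line.items() if k != "time"}
    return {
        "timestamp": parsed_line.get("time", ""),  # ideally converted to a uint64 epoch value
        "severity_text": "info",
        "severity_number": 4,
        "attributes": attributes,
        "resources": {"source": "aws-logs"},
        "body": " ".join(str(v) for v in parsed_line.values()),
    }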

Understanding how lambda functions work

For the scope of this article, understand that if you successfully attach your Lambda function to the S3 bucket and configure it correctly (we'll talk about that in a bit), any new addition/deletion/copy/PUT, etc. request made to the S3 bucket will trigger the Lambda function, and the code written in the Lambda function will get executed.
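For reference, the event S3 passes to the function tells you exactly which object changed. Here's a minimal sketch of reading it; these are standard S3 event notification fields, though this is not the handler we'll end up writing.

def lambda_handler(event, context):
    # Each S3 notification carries one or more records describing what changed
    for record in event.get("Records", []):
        bucket_name = record["s3"]["bucket"]["name"]
        object_key = record["s3"]["object"]["key"]
        print(f"Triggered by s3://{bucket_name}/{object_key}")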

Creating a lambda function

Follow along to create a Lambda function:

Step 1: Go to your AWS console and search for AWS Lambda, go to Functions as shown in the screenshot below, and click on Create Function.

Create Lambda Function

Step 2: Choose the Author from scratch checkbox and proceed to fill in the function name.

Configure Lambda function

Step 3: Choose Python 3.x as the Runtime version, x86_64 as Architecture (mostly), and keep other settings as default.

Choose runtime version and execution role

Step 4: Select Create a new role with basic Lambda permissions for now; we'll require more permissions down the line.

Lambda function created
Congrats, Lambda function created!

Once you are done configuring the Lambda function, you'll see it created here. Feel free to move around and see everything AWS Lambda offers. You can also check whether EKS is really better than ECS for deploying SigNoz here.

Configuring Policies for Lambda function

As mentioned in Step 4 above, we need extra permissions to access the S3 bucket for our Lambda code to run. Follow along to set them up.

Step 1: Scroll down on your Lambda page and you'll find a few tabs. Go to Configuration and select Permissions from the left sidebar.

Go to the Permissions tab and click the Execution Role name

Step 2: Click on the Execution Role name link; it will take you to the AWS IAM page. Here, we'll add a policy to grant full S3 access.

Step 3: Once here, click on the Add permissions button and select Attach policies from the drop-down list.

Attach Policy for the Lambda function access

Step 4: Search for “S3” and you'll find a policy named AmazonS3FullAccess. Select it and proceed.

Select AmazonS3FullAccess
Note

Giving full S3 access might be okay for testing purposes, but do consult your admin before running your Lambda function with full S3 access in production.

Congrats, you are just done with one of the major hurdles in running your code. Now, let's add a trigger.

Adding Triggers

You need to use the Lambda console to build a trigger so that your function can be called immediately by another AWS service (S3, in our case). A trigger is a resource you set up to enable your function to be called by another AWS service upon the occurrence of specific events or conditions.

A function may have more than one trigger. Each trigger acts as a client, invoking your function independently, and Lambda passes data from only a single trigger in each event it sends to your function.

To set up our trigger, follow along:-

Step 1: Click on the + Add trigger button in the Lambda console.

Add a trigger to Lambda function

Step 2: Select S3 from the first drop-down of the AWS services list. Pick your S3 bucket for the second field.

Step 3: For the Event types field, you can select any number of options you wish. The trigger will occur depending on what option(s) you choose here. By default, the All object create events will be selected.

Select Event Types

Verify the settings and click on the Add button at the bottom right to add this trigger. You could add more than one trigger, but we'll stick to just one for now.

Adding Request Layer

We will be using Python's requests module, and by default you do not get the requests module in Lambda. Why? See the attached image and visit this link to learn about the Upcoming changes to the Python SDK in AWS Lambda.

Upcoming changes to the Python SDK in AWS Lambda

So, to use the requests module, we need to add it explicitly as a layer; only then can we use it. There might be some alternatives, but we'll stick with what works and is tested.

Refer to the steps below to create a zip of the requests module and add it as a layer so that it works on AWS Lambda.

Stackoverflow answer to make Python's request package layer

Link to Stackoverflow answer to make Python's request package layer

The commands you'd need:

mkdir python
cd python

pip install --target . requests
zip -r dependencies.zip ../python

Here's the screenshot of the dependencies.zip file which we'll be uploading to AWS soon.

Local setup dependencies file photo
Local setup dependencies file

Step 1: To upload your zip file, go to AWS Lambda > Layers and click on Create Layer. [Not inside your specific Lambda function, just the landing page of AWS Lambda].

Create a new layer to be used later in our Lambda function

Step 2: You'll be redirected to the Layer Configurations page. Here, give a name to your layer, add an optional description, select Upload a .zip file, click on Upload, and locate the dependencies.zip file you created earlier.

Step 3: Select your desired architecture and pick Python 3.x as your runtime. Hit Create. Your layer has now been created. Now let's connect it to our Lambda function which we created to send logs to SigNoz.

Upload the zip file you created above and hit Create

Step 4: Go to your Lambda function and scroll down to the Layers section; on the right of it, you'll find an Add a layer button to click.

Click on Add a layer

Step 5: Pick Custom layers, select your custom layer from the drop-down below it, and then click the Add button.

Select layer configurations
Select layer configurations and click Add

Well, that's it. You have successfully added the requests module to your code environment. Without this layer, you'd get a 'requests module not found' error.

Setting up Endpoint URL for SigNoz Self Host

If you are using SigNoz Cloud, move to the next section on SigNoz Cloud.

Now, to see your logs in the SigNoz dashboard, we need to send logs to SigNoz's otel-collector endpoint and for that, we need to do a few configurations. Refer to the below documentation to do the configs.

Doc - Sending Logs to SigNoz over HTTP

Proceed with this article only after you are done with the changes described in the doc above. If you are using Helm as the installation method, I have added the exact configs in the next section. Still, I'd suggest Helm users give the doc a read before moving on.

Since a pod is the smallest deployable unit in Kubernetes and is considered easily replaceable, we should communicate with the controller that manages the pod rather than with the pod directly.

If you have self-hosted SigNoz, we'll need the pod IP of the running SigNoz otel-collector, and since that IP might change whenever the pod goes down, we need something static and permanent so that you are not changing the IP in the code URL every time the pod restarts. Read about it on Stack Overflow, as this is a standard Kubernetes problem. But we have a simple fix for it.

Here's a quick summary of what the Sending Logs to SigNoz over HTTP doc says.

  1. Expose Port
  2. Add and include receiver
  3. Construct the cURL request
Note

This doc above has instructions if you have installed SigNoz via Docker, and thus the file changes might be a bit different if you installed SigNoz via Helm. We'll cover them too here.

curl --location 'http://<IP>:8082' \
--header 'Content-Type: application/json' \
--data '[
{
   "trace_id": "000000000000000018c51935df0b93b9",
   "span_id": "18c51935df0b93b9",
   "trace_flags": 0,
   "severity_text": "info",
   "severity_number": 4,
       "attributes": {
       "method": "GET",
       "path": "/api/users"
       },
   "resources": {
       "host": "myhost",
       "namespace": "prod"
       },
   "message": "This is a log line"
   }
]'

<IP> is the IP of the system where your collector is running.

To learn more about <IP>, check out this troubleshooting guide.
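If you prefer to test from Python instead of cURL, here's a roughly equivalent sketch using the same sample body as the cURL command above (replace <IP> with your collector's address, just as before):

import requests

sample_log = [{
    "trace_id": "000000000000000018c51935df0b93b9",
    "span_id": "18c51935df0b93b9",
    "trace_flags": 0,
    "severity_text": "info",
    "severity_number": 4,
    "attributes": {"method": "GET", "path": "/api/users"},
    "resources": {"host": "myhost", "namespace": "prod"},
    "message": "This is a log line",
}]

response = requests.post(
    "http://<IP>:8082",  # replace <IP> with the address of your otel-collector
    headers={"Content-Type": "application/json"},
    json=sample_log,
)
print(response.status_code)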

We'll give the SigNoz otel-collector a stable address by enabling ingress for the signoz-otel-collector.

Changing values.yaml file of SigNoz installed via HELM

Feel free to follow the above-mentioned doc if you installed SigNoz via Docker. This section is specifically for users who installed SigNoz via Helm.

SigNoz Helm Repo Installation Docs

Docs to install SigNoz on AWS using Helm on an AWS Elastic Kubernetes Service (EKS) cluster

Step 1: Locate otelCollector in your values.yaml file

# ~ line 1213
otelCollector:
 name: "otel-collector"
 image:
   registry: docker.io
   repository: signoz/signoz-otel-collector
   tag: 0.88.4
   pullPolicy: IfNotPresent

Step 2: Under '# Configuration for ports', add the following config (if not already):

Configuration for ports
logsjson:
     # -- Whether to enable service port for logsjson
     enabled: true
     # -- Container port for logsjson
     containerPort: 8082
     # -- Service port for logsjson
     servicePort: 8082
     # -- Node port for logsjson
     nodePort: ""
     # -- Protocol to use for logsjson
     protocol: TCP

Step 3: Configure ingress for OtelCollector and add 8082 as the port. The host signoz-otelcol.example.us is a subdomain that you need to create in your AWS Route 53 service. We have covered that in the following sections.

Configure ingress for SigNoz
Note

We will not be using the port-forwarding method, which is what the SigNoz team recommends.

ingress:
   # -- Enable ingress for OtelCollector
 enabled: true
   # -- Ingress Class Name to be used to identify ingress controllers
 className: "alb"
   # -- Annotations to OtelCollector Ingress
 annotations:
   alb.ingress.kubernetes.io/scheme: internet-facing
   alb.ingress.kubernetes.io/target-type: ip
   kubernetes.io/ingress.class: alb
   alb.ingress.kubernetes.io/group.name: monitoring
   alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":8082}]'
   
     # cert-manager.io/cluster-issuer: letsencrypt-prod
     # nginx.ingress.kubernetes.io/ssl-redirect: "true"
     # nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
     # kubernetes.io/ingress.class: nginx
     # kubernetes.io/tls-acme: "true"
   # -- OtelCollector Ingress Host names with their path details
 hosts:
   - host: signoz-otelcol.example.us
     paths:
       - path: /*
         pathType: ImplementationSpecific
         port: 8082

Step 4: Add nodeSelectors and tolerations.

Add nodeSelectors and tolerations

Read more about nodeSelectors and tolerations

# -- Node selector for settings for OtelCollector pod
 nodeSelector:
   Purpose: "General-1B"
 # -- Toleration labels for OtelCollector pod assignment
 tolerations:
   - key: "Purpose"
     operator: "Equal"
     value: "General-1B"
     effect: "NoSchedule"

Step 5: Under '# Configurations for OtelCollector', add the following if not added already:

Configurations for OtelCollector
httplogreceiver/json:
 # endpoint specifies the network interface and port which will receive data
 endpoint: 0.0.0.0:8082
 source: json

Step 6: Check whether the pipelines are configured correctly.

Check the pipelines in values.yaml
logs:
 receivers: [otlp, httplogreceiver/heroku, httplogreceiver/json, syslog]
 processors: [batch]
 exporters: [clickhouselogsexporter]

These are all the changes you need to make to your values.yaml file. For the host domain, use a name for which you'll create a Route 53 record afterwards. After making these changes, upgrade the Helm release using the command:

helm upgrade <release_name> . -n <namespace_where_signoz_is_installed> -f values.yaml

or use an extended command suggested by the SigNoz team:

helm -n platform upgrade \
   --create-namespace --install \
   my-release signoz/signoz \
   -f override-values.yaml

Create a URL for Otel Collector.

Now let's create a record, a.k.a. a subdomain, to which we'll send the logs from our Lambda function via HTTP requests.

Step 1: Go to your AWS console and search for Route 53. Next, click on Hosted zones and select a zone of your choice (or create a zone if you don't have one).

Select a Zone to Host the application

Step 2: Click on the Create record button.

Click on create record
Click on Create record

Step 3: Specify the Record name of your choice.

Step 4: Select Record type as A - Routes traffic to an IPv4 address and some AWS resources.

Step 5: Enable Alias

Step 6: Select Alias to Application and Classic Load Balancer in the configs below, then choose a region as well as a Load Balancer from the respective drop-downs. Create a Load Balancer if you need a different one. You might want to know more about the different AWS Load Balancers.

Step 7: Hit Create records after verifying the details.

Create a record
Enter the record informations and record type

After the record is successfully created, copy the record address and ARN for later use.

(optional) Step 8: If you want to add an SSL certificate to your otel-collector record, feel free to use AWS Certificate Manager (ACM). This is an optional step. It does not impact any of the things we discussed above or will discuss further in this article.

(optional) Step 9: Go to AWS Certificate Manager and click on Request certificate. Select Request a public certificate and hit Next.

Request a public certificate

(optional) Step 10: Paste the record name which you just created in the above Route 53 service here. Fill in other fields as per your requirement and proceed.

Add your domain name
Add your domain name here

Your certificate will be issued. You might want to add the ARN to this certificate, so keep it handy.

Certificate status window

Once all this is done, now let's talk about the actual code which will send the logs to SigNoz.

SigNoz Cloud Setup and Config

If you are using the cloud version of SigNoz, you'll only need two things: the HTTP endpoint to which SigNoz accepts incoming JSON data, and your SigNoz ingestion key (sent as the signoz-access-token header).

The endpoint is of the form https://ingest.{REGION}.signoz.cloud:443/logs/json/ — the same URL used in the Lambda code below. Replace {REGION} with in / us / eu, etc., whichever region is shown on your SigNoz Cloud settings page.
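A quick way to verify the endpoint and key before touching the Lambda code is a short test post like the sketch below; the region and ingestion key are placeholders, while the URL pattern and signoz-access-token header are the same ones used in the Lambda code later in this article.

import requests

SIGNOZ_INGESTION_KEY = "<SIGNOZ_INGESTION_KEY>"  # from your SigNoz Cloud settings page
http_url = "https://ingest.in.signoz.cloud:443/logs/json/"  # replace "in" with your region

response = requests.post(
    http_url,
    headers={
        "signoz-access-token": SIGNOZ_INGESTION_KEY,
        "Content-Type": "application/json",
    },
    json=[{"body": "test log line from my laptop", "severity_text": "info"}],
)
print(response.status_code)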

The Lambda Function

Now finally we arrive at the most important section of this article, the code.

Here's the complete code with comments:

import json
import gzip
import boto3
import requests
import shlex

# Create an S3 client
s3 = boto3.client('s3')

# Function to convert a log line into a JSON object
def convert_log_line_to_json(line):
   # Define the headers to be used for the JSON keys
   headers = ["type", "time", "elb", "client:port", "target:port", "request_processing_time", "target_processing_time", "response_processing_time", "elb_status_code", "target_status_code", "received_bytes", "sent_bytes", "request","user_agent", "ssl_cipher", "ssl_protocol", "target_group_arn", "trace_idd", "domain_name", "chosen_cert_arn", "matched_rule_priority", "request_creation_time", "actions_executed", "redirect_url", "error_reason", "target:port_list", "target_status_code_list", "classification", "classification_reason"]

   # Split the log line using shell-like syntax (keeping quotes, etc.)
   res = shlex.split(line, posix=False)

   # Create a dictionary by zipping headers and log line parts
   return dict(zip(headers, res))

# Lambda function handler
def lambda_handler(event, context):
   # S3 bucket name
   bucket_name = 'abcd-loadbalancer-logs'

   # List all objects in the specified S3 bucket
   response = s3.list_objects_v2(Bucket=bucket_name)

   # Iterate through each object in the bucket
   for obj in response['Contents']:
       # Check if the object is a gzipped log file
       if obj['Key'].endswith('.log.gz'):
           file_key = obj['Key']

           # Download the gzipped file content
           file_obj = s3.get_object(Bucket=bucket_name, Key=file_key)
           file_content = file_obj['Body'].read()

           # Decompress the gzipped content
           decompressed_content = gzip.decompress(file_content)

           # Convert bytes to string
           json_data = str(decompressed_content, encoding='utf-8')

           # Split the string into lines
           lines = json_data.strip().split('\n')

           # Convert the list of strings into a JSON-formatted string
           result = json.dumps(lines, indent=2)

           # Load the JSON-formatted string into a list of strings
           list_of_strings = json.loads(result)

           # Convert each log line string into a JSON object
           json_data = [convert_log_line_to_json(line) for line in list_of_strings]
          
           # add req_headers to the code only if you are sending logs data to SigNoz cloud,
           # remove req_headers line if you have self-hosted SigNoz
           req_headers = {
                   # 'signoz-access-token': '<SIGNOZ_INGESTION_KEY>', # uncomment this line if you are using signoz cloud
                    'Content-Type': 'application/json'
               }

           # Specify the HTTP endpoint for sending the data to self-hosted signoz
           http_url = 'http://signoz-otelcol.company.us:8082'  # Replace with your actual URL

           # if using SigNoz cloud, use the below URL
           http_url = 'https://ingest.in.signoz.cloud:443/logs/json/'  # Replace with your actual URL

           # Send the JSON data to the specified HTTP endpoint
           response = requests.post(http_url, json=json_data, headers=req_headers)

           # Print information about the sent data and the response received
           print(f"Sent data to {http_url}. Response: {response.status_code}")

Let's break the code down into smaller chunks to make it easier to understand.

  1. Import Required Modules:

    import json
    import gzip
    import boto3
    import requests
    import shlex
    

    Import necessary Python modules:

    • json: for JSON encoding and decoding.
    • gzip: for decompressing gzipped content.
    • boto3: AWS SDK for Python, used for interacting with AWS services.
    • requests: for making HTTP requests.
    • shlex: for splitting shell-like syntax.
  2. Create S3 Client: Create an S3 client using the AWS SDK (boto3).

    s3 = boto3.client('s3')
    
  3. Define Function to Convert Log Line to JSON:

    def convert_log_line_to_json(line):
        # ...
    

    Define a function convert_log_line_to_json that takes a log line as input and converts it into a dictionary with predefined headers.

  4. Lambda Function Handler:

    def lambda_handler(event, context):
        # ...
    

    Define the Lambda function handler, which is the entry point for AWS Lambda execution.

  5. Specify S3 Bucket Name:

    bucket_name = 'abc-loadbalancer-logs'
    

    Set the S3 bucket name where the log files are stored.

  6. List Objects in the S3 Bucket:

    response = s3.list_objects_v2(Bucket=bucket_name)
    

    Use the S3 client to list all objects in the specified bucket.

  7. Iterate Through Objects:

    for obj in response['Contents']:
        # ...
    

    Iterate through each object in the S3 bucket.

  8. Check if Object is a Log File:

    if obj['Key'].endswith('.log.gz'):
        # ...
    

    Check if the object is a gzipped log file based on its file extension.

  9. Download and Decompress Log File:

    file_key = obj['Key']
    file_obj = s3.get_object(Bucket=bucket_name, Key=file_key)
    file_content = file_obj['Body'].read()
    decompressed_content = gzip.decompress(file_content)
    

    Retrieve the gzipped log file from S3, read its content, and decompress it.

  10. Convert Bytes to String:

    json_data = str(decompressed_content, encoding='utf-8')

    Convert the decompressed content from bytes to a UTF-8 encoded string.

  11. Split String into Lines:

    lines = json_data.strip().split('\n')

    Split the string into a list of lines based on the newline character.

  12. Convert List of Strings to JSON-formatted String:

    result = json.dumps(lines, indent=2)

    Convert the list of strings into a nicely formatted JSON string.

  13. Load JSON-formatted String into List of Strings:

    list_of_strings = json.loads(result)

    Load the JSON-formatted string back into a list of strings.

  14. Convert Each Log Line to JSON Object:

    json_data = [convert_log_line_to_json(line) for line in list_of_strings]

    Use the convert_log_line_to_json function to convert each log line string into a JSON object.

  15. Specify HTTP Endpoint:

    http_url = 'http://signoz-otelcol.example.us:8082'

    Set the HTTP endpoint where the JSON data will be sent.

  16. Send JSON Data to HTTP Endpoint:

    response = requests.post(http_url, json=json_data)

    Use the requests library to send a POST request to the specified HTTP endpoint with the JSON data.

  17. Print Information about the Sent Data:

    print(f"Sent data to {http_url}. Response: {response.status_code}")

    Print information about the sent data and the received HTTP response status code.

This code essentially downloads gzipped log files from an S3 bucket, decompresses them, converts the log lines into JSON objects, and sends the resulting JSON data to a specified HTTP endpoint.

Here's what a raw, unprocessed ELB log line looks like:

https 2024-01-01T23:58:03.391277Z app/abc-prod-alb/0b46e552ds5b44da 35.244.22.76:41802 192.1.0.114:80 0.000 1.077 0.000 200 200 1430 923 "POST https://api.abcs.com:4463/suporodv2/v1/get-result/ HTTP/1.1" "SFDC-Callout/59.0" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:ap-south-1:8429181216651:targetgroup/ecs-Sabc-P-Private-SU-API/02f1623fsddec2691ce "Root=1-65343518a-72123e913f71cb2e20213a3ea9" "api.example-sabcs.com" "session-reused" 98 2024-01-01T23:58:02.313000Z "forward" "-" "-" "192.1.1.114:80” "200" "-" "-"

You can match each field with the corresponding header in the code.

The above code is to send ELB logs to the SigNoz endpoint.

Note

This is working code, but be careful when copy-pasting it wholesale; it might lead to ingestion of huge amounts of data if not configured correctly.

Other than the above explanation and the code comments, in a nutshell, what this code does is:

It sends the parsable content of the ENTIRE S3 bucket whenever the Lambda function gets triggered. It gets triggered by the condition you set above. Let's mention that again here.

Step 3: For the Event types field, you can select any number of options you wish. The trigger will occur depending on what option(s) you choose here. By default, the All object create events will be selected.

Let's say you add something to your S3 bucket; depending on the event types you chose, it may or may not trigger this Lambda function. Or, if you have set up your S3 bucket to automatically store all your ELB/VPC logs, segregated into different folders, then whenever any new log gets added, the function will get triggered and send all of the S3 data.

You can learn more about the different kinds of storage AWS provides besides S3.

This is obviously not what everyone expects. The ideal case would be to do one mass log transfer when the first connection is made to the SigNoz otel-collector (the logs then get stored in the gp2/gp3 storageClass of EBS), and afterwards send only the recently added log lines.

To achieve this functionality, you need to add a few conditions to the code.

  1. Assuming all standard log lines have a timestamp field
  2. Parse and select the timestamp field from the log line and add an if-else condition before the response = requests.post(http_url, json=json_data) line so that you only send logs that are at most x days old (say 3 days).

So the function will first check the log timestamp and only send those logs that are less than 3 days old (say), or even only a few hours old.

Let's consider the below pseudo code for better understanding:

from datetime import datetime, timedelta, timezone

# Your given timestamp
given_timestamp_str = "2024-01-01T23:58:02.231919Z"
given_timestamp = datetime.fromisoformat(given_timestamp_str.replace('Z', '+00:00'))

# Current time (timezone-aware, so it can be compared with the parsed timestamp)
current_time = datetime.now(timezone.utc)

# Calculate the time difference
time_difference = current_time - given_timestamp

# Check if the time difference is less than 3 days
if time_difference < timedelta(days=3):
   # Run your specific function here
   print("Running the specific function.")
# ADD THE response = requests.post(http_url, json=json_data) LINE HERE

else:
   print("Time difference exceeds 3 days. Function will not run.")

Feel free to modify any part of the code if you wish to.

Running the code locally.

If you want to run the entire setup locally on your laptop for testing purposes, here's the reference code for you:

import os
import gzip
import json
import requests
import shlex

def convert_log_line_to_json(line):
   headers= ["type","time","elb","client_port","target_port","request_processing_time","target_processing_time","response_processing_time","elb_status_code","target_status_code","received_bytes","sent_bytes","request","user_agent","ssl_cipher","ssl_protocol","target_group_arn","trace_idd","domain_name","chosen_cert_arn","matched_rule_priority","request_creation_time","actions_executed","redirect_url","error_reason","target_port_list","target_status_code_list","classification","classification_reason"]
   res = shlex.split(line, posix = False)
  
   # data = line.strip().split()
   return dict(zip(headers, res))

def process_log_file(file_path):
   with gzip.open(file_path, 'r') as f:
       log_data = f.read().decode('utf-8')
       # print(log_data)
      
       lines = log_data.strip().split('\n')
       result = json.dumps(lines, indent=2)
      
       list_of_strings = json.loads(result)
       # print(log_data)
      
       json_data = [convert_log_line_to_json(line) for line in list_of_strings]
       # print(json_data)   
      
       print(json.dumps(json_data, indent=2))
       # print(json_data[0]["time"])

       req_headers = {
                   # 'signoz-access-token': '<SIGNOZ_INGESTION_KEY>', # uncomment this line if you are using signoz cloud
                    'Content-Type': 'application/json'
               }

       # Specify the HTTP endpoint for sending the data to self-hosted signoz
       http_url = 'http://signoz-otelcol.company.us:8082'  # Replace with your actual URL

       # if using SigNoz cloud, use the below URL
       http_url = 'https://ingest.in.signoz.cloud:443/logs/json/'  # Replace with your actual URL

       # Send the JSON data to the specified HTTP endpoint
       response = requests.post(http_url, json=json_data, headers=req_headers)

       # Print information about the sent data and the response received
       print(f"Sent data to {http_url}. Response: {response.status_code}")

def main():
   root_folder = '<folder_name>'

   for root, _, files in os.walk(root_folder):
       for file in files:
           if file.endswith('.log.gz'):
               file_path = os.path.join(root, file)
               process_log_file(file_path)

if __name__ == '__main__':
   main()
Note

Add some log files to the folder (it can contain nested subfolders as well); the code will walk all subfolders and find the log files ending with the .log.gz extension. You can change this to match whichever files you want.

Here are some local testing results:

Local testing results after running the code with print statements of log data

Testing your Lambda function

After you are done writing your code, it's time to deploy it and test whether it actually works. It's a good idea to uncomment the print(...) lines I added as comments in the code to see how the logs are formatted at each point.

But before that, you might want to increase the timeout setting for your Lambda function, as this data transfer from S3 to an external endpoint can take a few minutes, and by default a Lambda function times out in 3 seconds.

To increase the timeout, go to Configuration → General configuration → Edit and set it to under 10 minutes. Usually the code executes in 1-4 minutes at most.

Increasing Timeout Setting
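If you'd rather set the timeout from code than click through the console, boto3's update_function_configuration call can do it; the function name below is a placeholder.

import boto3

lambda_client = boto3.client("lambda")

# Set the function timeout to 9 minutes (540 seconds), i.e. under the 10-minute mark
lambda_client.update_function_configuration(
    FunctionName="send-s3-logs-to-signoz",  # placeholder; use your function's name
    Timeout=540,
)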

After increasing the timeout setting, go to your Lambda code editor and click on the Test button's drop-down option. Configure a new test event as S3 PUT and click Save.

Create an event to test the code
Creating an event to test the lambda code

Now you are all set. Whenever you make any change in the code and want to test it out, deploy the code first (this works like a Save button) and then click on the Test button once it's fully deployed.

Attached below is a screenshot of sending VPC logs (not ELB) to the SigNoz endpoint.

Screenshot of Lambda Code

Test Case and Output

If you are able to send the logs successfully, this is how they'll get sent (P.S. this output appears because I have printed the JSON-formatted data just to see what gets sent).

Execution Result on AWS Lambda Output Window

Visualize the logs in SigNoz

You'll immediately see a flood of logs in the SigNoz logs section. You can even switch to Live monitoring of Logs. Just click on any log line to see the details.

A sample log line of the logs sent from AWS Lambda
A sample log line of the logs sent from AWS Lambda to the SigNoz Dashboard

Conclusion

To sum it up, the code we looked at is like a handy tool for dealing with log files on Amazon's cloud storage (S3). Imagine it as a smart assistant that grabs zipped-up log files, opens them up, and turns the messy log lines into neat, organized bits of information in a language computers easily understand (JSON format).

You can visualize your logs, process them, set up log processing pipelines, and much more in SigNoz. Feel free to explore all the possibilities in the SigNoz dashboard.

This guide has been a very long one, and I have first-hand experience setting up and configuring everything described above on a production server, with SigNoz deployed on an EKS cluster. The idea was to get rid of Loki, Grafana, and the tons of resources they were consuming, and replace them all with SigNoz. It worked great in the end, after all the hassle.


Priyansh Khodiyar's profile

Written by Priyansh Khodiyar

Priyansh is the founder of UnYAML and a software engineer with a passion for writing. He has good experience with writing and working around DevOps tools and technologies, APMs, Kubernetes APIs, etc and loves to share his knowledge with others.
