opentelemetry-sqlcommenter

Specifications

Introduction

This section defines the SQL commenter algorithm which augments a SQL statement with a comment containing serialized key value pairs that are retrieved from the various ORMs and frameworks in your programming language and environment of choice.

A preview of the result can be seen as per exhibit

SELECT * FROM FOO /*action='%2Fparam*d',controller='index,'framework='spring',
traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7'*/

Read along to see how you can conform to the specification and produce similar output.

Format

The final comment SHOULD be affixed to the final SQL statement in the format

<SQL STATEMENT> /*<ATTRIBUTE_KEY_VALUE_PAIRS>*/

Comment escaping

Comments within SQL comments are of the format

If a comment already exists within a SQL statement, we MUST NOT mutate that statement.

Separator

Each key value pair MUST be separated by a comma “,” so for example, given

Values: [<FIELD_1>, <FIELD_2>, <FIELD_3>, ...]

Expected concatenation result: <FIELD_1>,<FIELD_2>,<FIELD_3>,<FIELD_N...>

Meta characters

Meta characters such as ' should be escaped with a slash \. That creates the following algorithm:

Algorithm

algorithm(value):
    escaped := value.escape_with_slash_any_of(')

    return escaped

Key Value Serialization

Key serialization

  1. URL encode the key e.g. given route parameter, that’ll become route%20parameter

Which produces the following algorithm:

Algorithm

key_serialization(key):
    encoded := url_encode(key)
    meta_escaped := escape_meta_characters(encoded)

    return meta_escaped

Value serialization

  1. URL encode the value e.g. given /param first, that SHOULD become %2Fparam%20first
  2. Escape meta-characters within the raw value; a single quote ' becomes \'
  3. SQL escape the value by placing it within two single quotes e.g.

    DROP should become 'DROP'

    FOO 'BAR should become 'FOO%20\'BAR'

And when generalized into an algorithm:

Algorithm

value_serialization(value):
    encoded := url_encode(value)
    meta_escaped := escape_meta_characters(encoded)
    final := sql_escape_with_single_quotes(meta_escaped)

    return final

and running the algorithm on the following table will produce

value url_encode(value) sql_escape_with_single_quotes
DROP TABLE FOO DROP%20TABLE%20FOO 'DROP%20TABLE%20FOO'
/param first %2Fparam%20first '%2Fparam%20first'
1234 1234 '1234'

Key Value format

Given a key value pair (key, value):

  1. Run the Key serialization algorithm on key
  2. Run the Value serialization algorithm on value
  3. Using an equals sign =, concatenate the result from 1. and 2. to give

    <SERIALIZED_KEY>=<SERIALIZED_VALUE> gotten from:

    serialize_key(key)=serialize_value(value)

Thus given for example the following key value pairs

key value pair serialized_key serialized_value Final
route=/polls 1000 route '%2Fpolls%201000' route='%2Fpolls%201000'
name='DROP TABLE FOO' route '%2Fpolls%201000' route='%2Fpolls%201000'
name''="DROP TABLE USERS'" name='' DROP%20TABLE%20USERS' name=''=’DROP%20TABLE%20USERS'’

Sorting

With a list of serialized key=value pairs, sort them by lexicographic order.

Algorithm

sort(key_value_pairs):
    sorted = lexicographically_sort(key_value_pairs)

    return sorted

Exhibit

Thus

    sort([
        traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
        tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7',
        route='%2Fparam*d',
        controller='index',
    ])

produces

 [
        controller='index',
        route='%2Fparam*d',
        traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
        tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7',
 ]

Concatenation

After all the keys and values have been serialized and sorted, they MUST be joined by a comma ,.

If no values are present, concatenate MUST return the empty value ''

Algorithm

concatenate(key_value_pairs):

    if len(key_value_pairs) == 0:
        return ''

    return ','.join(key_value_pairs)

Exhibit

Therefore

    concatenate([
        controller='index',
        route='%2Fparam*d',
        traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
        tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7',
    ])

produces

controller='index',route='%2Fparam*d',traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7'

Affix comment

After serialization, sorting, concatenation, the final form MUST be placed between /* and */

Algorithm

affix_comment(sql, concatenated):
    if is_empty(concatenated):
        return sql // Do NOT modify the SQL if concatenated is blank.

    affixed := sql + '/*' + concatenated + '*/'

    return affixed

Exhibit

for example given

affix_comment('SELECT * from FOO', '')

produces

SELECT * from FOO
affix_comment('SELECT * from FOO', "route='%2Fparam*d'")

produces

SELECT * from FOO /*route='%2Fparam*d'*/

SQL commenter

Wrapping all the steps together, we thus have the following algorithm

sql_commenter(sql, attributes):
    if contains_sql_comment(sql):
        return sql # DO NOT mutate a statement with an already present comment.

    serialized_key_value_pairs := []

    for each attribute in attributes:
        serialized := serialize_key_value_pair(attribute)
        if serialized:
            serialized_key_value_pairs.append(serialized)

    sorted := sort(serialized_key_value_pairs)
    concatenated := concatenate(sorted)
    final := affix_comment(sql, concatenated)

    return final

Exhibit

Running sql_commenter on an ORM integration that extracts the respective attributes:

sql_commenter('SELECT * FROM FOO', [
        tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7',
        traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
        framework='spring',
        action='%2Fparam*d',
        controller='index',
])

finally produces

SELECT * FROM FOO /*action='%2Fparam*d',controller='index,'framework='spring',
traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7'*/

Parsing

Parsing is the step to reverse sql-commenter and extract the key value attributes.

It’ll follow the following steps:

  1. Find the last comment so search for and strip out /* and */
  2. Split the comment by comma ,
  3. Split each key='value' pair so extract key and 'value'
    • 3.1. For key, unescape_meta_characters then url_decode
    • 3.2. For value, sql_unescape/trim the ' at the beginning and end of 'value' -> value
      • 3.2.1. Unescape the meta characters in value
      • 3.2.2. URL Decode the value

Algorithm

parse(sql_with_comment):
    if !contains_sql_comment(sql_with_comment):
        return sql_with_comment, null

    // Since we now have a SQL comment, let's extract the serialized attributes.
    sql_stmt, serialized_attrs := extract_sql_commenter(sql_with_comment)

    if is_empty(serialized_attrs):
        return sql_stmt, null

    attrs := {}
    kv_splits := split_by_comma(serialized_attrs)
    for kv in kv_splits:
        e_key, e_value := split_by_equals(kv)
        key := decode_key(e_key)
        value := decode_value(e_value)

        attrs[key] = value

    // Some attributes such as traceparent, tracestate, sampled
    // might need need some grouping and reconstruction.
    final := deconstruct_and_group_attributes(attrs)

    return sql_stmt, final

Exhibit

Given the value from SQLCommenter exhibit

SELECT * FROM FOO /*action='%2Fparam*d',controller='index,'framework='spring',
traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7'*/

Running parse on the value

sql, attributes = parse(`SELECT * FROM FOO
/*action='%2Fparam*d',controller='index,'framework='spring',
traceparent='00-5bd66ef5095369c7b0d1f8f4bd33716a-c532cb4098ac3dd2-01',
tracestate='congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7'*/`)

produces

sql: SELECT * FROM FOO
attributes: {
    controller: 'index',
    framework: 'spring',
    action: '/param*d',
    trace: {
        sampled: true,
        span_id: 'c532cb4098ac3dd2',
        trace_id: '5bd66ef5095369c7b0d1f8f4bd33716a',
        trace_state: [{'congo': 't61rcWkgMzE'}, {'rojo': '00f067aa0ba902b7'}],
    },
}

References

Resource URL
URL Encoding https://en.wikipedia.org/wiki/Percent-encoding
Comments within SQL comments https://docs.oracle.com/cd/B12037_01/server.101/b10759/sql_elements006.htm