To get some records you want from the database or to store data into the database, you should use some constant values (literals) when you write the SQL query. PostgreSQL does some works when it reads those constant values on query parsing and analyzing phase to process the query correctly. Oh, please don’t say that “What about prepared statements?” here. The data type of parameters in prepared statements is related to this subject but it is beyond the scope of this posting. In this posting, we will see what kinds of constant values in PostgreSQL offer and how PostgreSQL determines the data types of those values at the source code level.
Kinds of Constant Values
Although PostgreSQL offers various kinds of constant values, we will focus on some types frequently used in most cases. Those are integer, float, string, and boolean. You can see others at AexprConst rule in the gram.y file.
PostgreSQL’s scanner (scan.l file) recognizes many kinds of tokens but numbers, strings, and boolean values are our only concern. Suppose that the scanner returns <a, b> pair (here, a is token ID and b is C language type of the value) for each token, it returns <ICONST, long> for a series of digits, <FCONST, C-string> for decimals or reals, <SCONST, C-string> for arbitrary strings, and <TRUE_P/FALSE_P, none> for boolean values.
Here are some notable points.
- In ICONST case, the scanner tries to convert the digits (an integer) into a long If the size of the digits is not fit in long, it returns <FCONST, C-string> instead. This is what process_integer_literal() function does. We will see how this case is handled properly in a few sections.
- There is no notion of negative numbers in the scanner. Actually, a negative sign is treated as an operator and it will be applied later (doNegate() function) on grammar parsing phase.
So far, the scanner knows unsigned integer (ICONST), unsigned floats (FCONST), strings (SCONST), and boolean values (TRUE_P/FALSE_P).
If PostgreSQL’s parser meets one of the tokens above, it will generate an A_Const parse node for the token. The simplified version of A_Const looks like below;
Here, type field has T_Integer for ICONST, T_Float for FCONST and T_String for SCONST. And you can easily infer what ival or str field has.
What about boolean values? This is an intersting point. They are actually stored as a T_String and str field has “t” or “f” for TRUE_P or FALSE_P respectively. Then the parser wraps the A_Const with a typecast to “bool“. makeBoolAConst() function does all these processes. We will look more in depth later how those strings become true “bool” value.
At this point, there are T_Integer, T_Float, and T_String.
When a user requests an SQL query, PostgreSQL parses the query and makes a resulting parse tree. Then, PosrgreSQL analyzes the parse tree to check some semantic errors and transform it into a Query tree which is an input for the PostgreSQL’s query planner.
A_Const we’ve seen above is one of the possible nodes in the parse tree. And let’s assume that it is in the parse tree. When PostgreSQL analyzes the parse tree, A_Const is transformed into Const which is an expression node. The simplified version of Const looks like this;
consttype field has a PostgreSQL’s OID for data types and constvalue field has a datum (PostgreSQL’s binary format for a value). The actual transform of T_Integer, T_Float, and T_String is done by make_const() function. Let’s look closely at how they are transformed one by one.
T_Integer is transformed into a Const of INT4OID and the actual integer value. It’s super simple.
T_Float is little complicated but let’s break it down. Normally, it is transformed into a Const of NUMERICOID and a numeric datum through numeric_in() input function. For your reference, input function is a function which converts a C-string into a datum of the specific type. In this case, str field in A_Const is converted into a numeric datum through numeric_in(). Wait, do you remember that the scanner generates FCONST for an integer which is not fit in long instead of ICONST? It means that even if the type field of A_Const is T_Float, the actual value of it might be an integer. So, before transform a T_Float to a numeric Const, make_const() tries to convert str field to an integer first. If the conversion succeeds, the result of the transform will be a Const of INT8OID and the actual integer value.
T_String is the hardest case in here. To say the conclusion first, it is transformed into a Const of UNKNOWNOID and a C-string. What? We already know that there is a data type for this use in PostgreSQL, “text”(TEXTOID). Here is why PostgreSQL does this. A C-string can be converted into any data type through input functions. It means that if an unknown Const is given in an expression, PostgreSQL can try to determine the right data type of the Const considering the expression context. The example queries below show the evalueation of two expressions. One for an unknown Const, and another for a “text“.
You can see that the first query is successfully executed but the second one is not. Notice that the data type of the result of the expression ‘1’ + 1 is integer.
Finally, the case of boolean values is left. If you’re reading this line, I believe that there is no doubt you already know the answer. (Hint: typecast unknown Const to “bool“)
We’ve briefly looked how PostgreSQL treats some basic constant values. You can see the source codes for detailed implementation of those logics if you’d like to. I hope this posting can be a good starting point to do that and help you to write neat and clean SQL queries for PostgreSQL.
BITNINE GLOBAL INC., THE COMPANY SPECIALIZING IN GRAPH DATABASE
비트나인, 그래프 데이터베이스 전문 기업