Given the following SQL:
select distinct
next_account(5) as `rick and morty`,
count (*) /* WENEEDReplaceThis select */
from table_with t
where x>10 and x<100 or x == 0 and code = 'stat\'ic'
cluster by col1
The sql-beautify extension produces the following:
SELECT distinct next_acCOUNT(5) AS `rick AND morty`,
count (*) /* WE'stat\'This
SELECT */
FROM table_
WITH t
WHERE x > 10
AND x < 100 or x = = 0
AND code = undefinedic' cluster by col1
Which demonstrates the following issues:
- block-comments
/*..*/ are not supported
- quoted identifiers
`blah` are not supported
- escaping inside strings
"foo\"bar" is not supported
- identifiers containing certain reserved words get messed up
- using the special text
NEEDReplace anywhere in the SQL will mess up the formatting
- some operators (e.g.
==) get broken up with whitespace
- not all keywords get properly capitalized (e.g.
AND operator is capitalized but or is not).
count(*) gets capitalized (and aggressively so), but count (*) does not.
- some Hive-specific language constructs like
CLUSTER BY aren't recognized at all.
The implementation of this extension consist basically of a series of string-replace operations. This approach is unfortunately fundamentally flawed. No amount of additional fixes will save it.
At the very minimum one should tokenize the SQL (this is the approach that sql-formatter uses). Better yet would be to use a parser (an approach used by prettier-plugin-sql-cst and SQLFluff).
Given the following SQL:
The sql-beautify extension produces the following:
Which demonstrates the following issues:
/*..*/are not supported`blah`are not supported"foo\"bar"is not supportedNEEDReplaceanywhere in the SQL will mess up the formatting==) get broken up with whitespaceANDoperator is capitalized butoris not).count(*)gets capitalized (and aggressively so), butcount (*)does not.CLUSTER BYaren't recognized at all.The implementation of this extension consist basically of a series of string-replace operations. This approach is unfortunately fundamentally flawed. No amount of additional fixes will save it.
At the very minimum one should tokenize the SQL (this is the approach that sql-formatter uses). Better yet would be to use a parser (an approach used by prettier-plugin-sql-cst and SQLFluff).