Skip to content

Several fundamental problems #62

@nene

Description

@nene

Given the following SQL:

select distinct
  next_account(5) as `rick and morty`,
  count (*) /* WENEEDReplaceThis select */
from table_with t
where x>10 and x<100 or x == 0 and code = 'stat\'ic'
cluster by col1

The sql-beautify extension produces the following:

SELECT  distinct next_acCOUNT(5) AS `rick AND morty`,
        count (*) /* WE'stat\'This
SELECT  */
FROM table_
WITH t
WHERE x > 10
AND x < 100 or x =  = 0
AND code = undefinedic' cluster by col1

Which demonstrates the following issues:

  • block-comments /*..*/ are not supported
  • quoted identifiers `blah` are not supported
  • escaping inside strings "foo\"bar" is not supported
  • identifiers containing certain reserved words get messed up
  • using the special text NEEDReplace anywhere in the SQL will mess up the formatting
  • some operators (e.g. ==) get broken up with whitespace
  • not all keywords get properly capitalized (e.g. AND operator is capitalized but or is not).
  • count(*) gets capitalized (and aggressively so), but count (*) does not.
  • some Hive-specific language constructs like CLUSTER BY aren't recognized at all.

The implementation of this extension consist basically of a series of string-replace operations. This approach is unfortunately fundamentally flawed. No amount of additional fixes will save it.

At the very minimum one should tokenize the SQL (this is the approach that sql-formatter uses). Better yet would be to use a parser (an approach used by prettier-plugin-sql-cst and SQLFluff).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions