CASSANALYTICS-128: Add flag to allow bulk write to indexed tables #181
jmckenzie-dev wants to merge 4 commits into apache:trunk from
Conversation
Patch by Josh McKenzie; reviewed by YYYY for CASSANALYTICS-128
sarankk left a comment:
Thanks Josh, LGTM. One minor nit: would it be useful to test the writer with this flag in an integration test too?
```java
public final int commitThreadsPerInstance;
public final double importCoordinatorTimeoutMultiplier;
public boolean quoteIdentifiers;
public boolean skipSecondaryIndexCheck;
```

Suggested change:

```diff
-public boolean skipSecondaryIndexCheck;
+public final boolean skipSecondaryIndexCheck;
```
```java
 * with the understanding that the secondary indexes will NOT be updated by the bulk write and must be
 * rebuilt separately after the job completes.
 */
SKIP_SECONDARY_INDEX_CHECK,
```
How about defining it in the Spark conf instead of the write options?
The rationale is that the toggle is for an advanced use case and is not directly related to write behavior. There is another existing Spark conf, org.apache.cassandra.spark.bulkwriter.BulkSparkConf#SKIP_CLEAN, that skips cleaning up SSTables when a job fails.
Admittedly, there is some existing inconsistency in the project over what belongs in the conf and what belongs in the write options.
Would putting this in BulkSparkConf force it session-wide? i.e. a user would lose the ability to reason about and easily configure this setting on a per-table / per-operation basis vs. the public exposure of it via WriterOptions?
My intuition right now is that this is something that probably should have been enabled by default all this time, with a configurable guardrail to turn it off. So while I'm sympathetic to the idea of taking a small step from "don't allow it" to "allow it but make it hard to use", the risk of this being easily accessible is that users will bulk write to a table that then has a long-running index build happen in the background. Other than "load on node" and "applications might read a partial index if you don't have automation that clearly delineates when a bulk insert and reindex finish", that doesn't represent a structural or correctness risk to the data.
Eh, I didn't, because ultimately it's a validation error. The rest of the integration path would effectively be testing whether bulk import on Cassandra with a table that has legacy SecondaryIndexes on it works correctly. Which, to be completely honest, I'm not super confident of (dug around a bit). So maybe a follow-up JIRA to add test coverage in C* proper for legacy 2i on imports would be reasonable?
```java
this.ttl = MapUtils.getOrDefault(options, WriterOptions.TTL.name(), null);
this.timestamp = MapUtils.getOrDefault(options, WriterOptions.TIMESTAMP.name(), null);
this.quoteIdentifiers = MapUtils.getBoolean(options, WriterOptions.QUOTE_IDENTIFIERS.name(), false, "quote identifiers");
this.skipSecondaryIndexCheck = MapUtils.getBoolean(options, WriterOptions.SKIP_SECONDARY_INDEX_CHECK.name(), false, "skip secondary index check");
```
Should the default value be true?
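For context on how that default plays out, here is a minimal, self-contained sketch of opt-in boolean option parsing; `getBoolean` below is a stand-in modeled on the call above, not the project's actual MapUtils helper:

```java
import java.util.Map;

public class OptionDefaultSketch
{
    // Stand-in for a MapUtils.getBoolean-style helper: return the parsed
    // option value, or the supplied default when the key is absent.
    static boolean getBoolean(Map<String, String> options, String key, boolean defaultValue)
    {
        String raw = options.get(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw);
    }

    public static void main(String[] args)
    {
        // With a default of false, the secondary index check stays enabled
        // unless a user explicitly opts out via the writer option.
        boolean unset = getBoolean(Map.of(), "SKIP_SECONDARY_INDEX_CHECK", false);
        boolean set = getBoolean(Map.of("SKIP_SECONDARY_INDEX_CHECK", "true"),
                                 "SKIP_SECONDARY_INDEX_CHECK", false);
        System.out.println(unset + " " + set);
    }
}
```

A default of false keeps the existing (safe) behavior for all current users; a default of true would silently allow indexed-table bulk writes everywhere.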
```java
// process today. 2i and SAI have different ergonomics here regarding if stale data is served during index build;
// ultimately we want the bulk writer to also write native SAI index files alongside sstables but until
// then, this is allowable and fine for users who Know What They're Doing.
if (!skipSecondaryIndexCheck)
```
Can you please add a test, if possible, that checks this behavior (basically the if block)?
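A minimal sketch of what such a test could cover. The validator below is a simplified stand-in for the writer-side check, and the names (`validate`, the exception type and message) are hypothetical, not the project's real API:

```java
public class SkipIndexCheckSketch
{
    // Simplified stand-in for the writer-side validation: reject tables with
    // secondary indexes unless the user explicitly opted out of the check.
    static void validate(boolean tableHasSecondaryIndexes, boolean skipSecondaryIndexCheck)
    {
        if (!skipSecondaryIndexCheck && tableHasSecondaryIndexes)
        {
            throw new IllegalStateException(
                "Bulk write to a table with secondary indexes requires SKIP_SECONDARY_INDEX_CHECK");
        }
    }

    public static void main(String[] args)
    {
        validate(true, true);   // flag set: indexed table is allowed
        validate(false, false); // no indexes: nothing to check
        try
        {
            validate(true, false); // flag unset: indexed table should be rejected
            throw new AssertionError("expected validation failure");
        }
        catch (IllegalStateException expected)
        {
            System.out.println("rejected as expected");
        }
    }
}
```

This exercises both branches of the if block without needing a full integration run against a Cassandra cluster.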
Patch by Josh McKenzie; reviewed by TBD for CASSANALYTICS-128