📌 Complete Guide to Collectors (java.util.stream.Collectors)

Collectors are one of the most useful parts of the Streams API. This guide covers why they exist, how they work, the Collector interface anatomy, built-in collectors with examples, parallel collection behavior, custom collectors, and best practices.


🔹 Why Collectors Exist & What They Solve

  • Streams process sequences of elements, but usually you want a result container at the end (e.g., List, Map, String, aggregated value).
  • Collectors provide a reusable, composable way to perform mutable reduction:
    • Accumulate elements into a mutable container.
    • Combine partial results (for parallel streams).
    • Transform the container into the final result.
  • Compared to reduce(), collectors are:
    • More efficient for mutable accumulation (e.g., lists, maps).
    • Composable (downstream collectors for grouping, mapping, etc.).

🔹 Collector Interface Anatomy

public interface Collector<T, A, R> {
  Supplier<A> supplier();                 // Create an empty result container
  BiConsumer<A, T> accumulator();         // Add an element to the container
  BinaryOperator<A> combiner();           // Merge two containers (parallel)
  Function<A, R> finisher();              // Convert container to final result
  Set<Characteristics> characteristics(); // Hints: CONCURRENT, UNORDERED, IDENTITY_FINISH
}

Type Parameters:

  • T → Stream element type.
  • A → Mutable accumulator type (intermediate container).
  • R → Final result type.

➡️ Collector.of(...) is a factory for creating custom collectors.


🔹 How collect() Works (High-Level)

  1. supplier() → creates empty accumulator.
  2. Each element → accumulator() mutates the container.
  3. If parallel → multiple accumulators created, merged with combiner().
  4. finisher() converts accumulator → final result.
  • Sequential: one accumulator, one finisher.
  • Parallel: multiple accumulators, merged, then finished.
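
The steps above can be sketched by calling a collector's four functions by hand. This is only an illustration of the contract, not how the stream implementation is actually written; the class and method names (ManualCollectDemo, collectInTwoChunks) are made up for this example, and the two lists stand in for the chunks a parallel stream might produce.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class ManualCollectDemo {

    // Hypothetical helper: drives a collector manually over two chunks.
    static <T, A, R> R collectInTwoChunks(Collector<T, A, R> collector,
                                          List<? extends T> left,
                                          List<? extends T> right) {
        // Step 1: supplier() creates one empty accumulator per chunk.
        A leftAcc = collector.supplier().get();
        A rightAcc = collector.supplier().get();

        // Step 2: accumulator() folds each element into its chunk's container.
        for (T t : left) {
            collector.accumulator().accept(leftAcc, t);
        }
        for (T t : right) {
            collector.accumulator().accept(rightAcc, t);
        }

        // Step 3: combiner() merges the partial containers (left before right).
        A merged = collector.combiner().apply(leftAcc, rightAcc);

        // Step 4: finisher() converts the container into the final result.
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        String joined = collectInTwoChunks(
            Collectors.joining(", "), List.of("a", "b"), List.of("c"));
        System.out.println(joined); // prints "a, b, c"
    }
}
```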

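🔹 Collector Characteristics

  • IDENTITY_FINISH → The finisher is the identity function (A and R are the same type), so the finishing step can be skipped.
  • UNORDERED → The result does not depend on the stream's encounter order.
  • CONCURRENT → The accumulator can be called from multiple threads on the same result container.

➡️ These characteristics are hints that let the stream implementation optimize concurrency and ordering.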
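
To see these hints in practice, one option is to print the characteristics that a few built-in collectors report. This is a small sketch; the helper name characteristicsOf is invented here, and the exact sets printed are an implementation detail that may vary between JDK versions (only the unordered/concurrent nature of toSet() and groupingByConcurrent() is stated in the Javadoc).

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

final class CharacteristicsDemo {

    // Hypothetical helper: returns the hints a collector advertises.
    static Set<Characteristics> characteristicsOf(Collector<?, ?, ?> collector) {
        return collector.characteristics();
    }

    public static void main(String[] args) {
        // Output is implementation-dependent; run it to inspect your JDK.
        System.out.println("toList:  " + characteristicsOf(Collectors.toList()));
        System.out.println("toSet:   " + characteristicsOf(Collectors.toSet()));
        System.out.println("joining: " + characteristicsOf(Collectors.joining()));

        Collector<String, ?, ?> concurrent =
            Collectors.groupingByConcurrent(String::length);
        System.out.println("groupingByConcurrent: " + characteristicsOf(concurrent));
    }
}
```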


🔹 Mutable vs Immutable Reduction

  • reduce() → best for immutable reductions, where each step produces a new value (summing, min/max).
  • collect() → best for mutable reductions, where elements are added to a container (lists, maps, sets).
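
The contrast can be sketched with three small methods. The class and method names are hypothetical, and the third method exists only to show why reduce() is a poor fit for building containers: to stay correct it has to copy the list at every step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class ReduceVsCollectDemo {

    // Immutable reduction: each step yields a new Integer.
    static int totalLength(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    // Mutable reduction: elements are added to a container.
    static List<String> upperCased(List<String> words) {
        return words.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // Same result via reduce(), for contrast only: the identity must never
    // be mutated, so every step copies the list built so far.
    static List<String> upperCasedWithReduce(List<String> words) {
        List<String> empty = new ArrayList<>();
        return words.stream().map(String::toUpperCase).reduce(
            empty,
            (list, s) -> {
                List<String> copy = new ArrayList<>(list);
                copy.add(s);
                return copy;
            },
            (a, b) -> {
                List<String> copy = new ArrayList<>(a);
                copy.addAll(b);
                return copy;
            });
    }
}
```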

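🔹 Common Built-in Collectors with Examples

In the snippets below, stream stands for a fresh Stream<String> unless noted otherwise; a stream can be consumed only once, so each statement needs its own stream.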

1. toList()

List<String> list = stream.collect(Collectors.toList());
  • Produces a List (usually ArrayList).
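  • No guarantee on the type, mutability, serializability, or thread-safety of the returned List (use toCollection(ArrayList::new) for a mutable list, toUnmodifiableList() for an unmodifiable one).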

2. toSet()

Set<String> set = stream.collect(Collectors.toSet());
  • Produces a Set (usually HashSet).
  • No order guarantee.

3. toMap()

Map<Integer, String> map = stream.collect(Collectors.toMap(
    String::length, s -> s, (a, b) -> a + "," + b));
  • Duplicate keys throw IllegalStateException unless a merge function is provided.
  • A four-argument overload also accepts a custom map supplier (e.g., TreeMap::new).
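
Both points can be checked with a short sketch. The class and method names below are illustrative only; the collector calls themselves are the standard two-argument and four-argument toMap overloads.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ToMapDemo {

    // Two-argument toMap: a duplicate key makes collect() throw.
    static boolean throwsOnDuplicateKey(List<String> words) {
        try {
            words.stream().collect(
                Collectors.toMap(String::length, Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Four-argument toMap: the merge function resolves duplicates and
    // the supplier picks the map type (TreeMap keeps keys sorted).
    static Map<Integer, String> byLengthSorted(List<String> words) {
        return words.stream().collect(Collectors.toMap(
            String::length,
            Function.identity(),
            (a, b) -> a + "," + b,
            TreeMap::new));
    }
}
```

For example, byLengthSorted(List.of("ccc", "a", "bb", "dd")) yields a map whose keys iterate as 1, 2, 3, with "bb,dd" stored under key 2.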

4. joining()

String csv = stream.collect(Collectors.joining(", "));
String withBrackets = stream.collect(Collectors.joining(", ", "[", "]"));
  • Concatenates CharSequences efficiently using an internal StringBuilder or StringJoiner.

5. counting()

Long count = stream.collect(Collectors.counting());
  • Returns boxed Long.
  • Equivalent to stream.count() but fits into collector pipelines.

6. summingInt / summingLong / summingDouble

Integer total = stream.collect(Collectors.summingInt(String::length));
  • Returns the matching boxed type: Integer, Long, or Double.

7. averagingInt / averagingLong / averagingDouble

Double avg = stream.collect(Collectors.averagingInt(String::length));
  • Always returns Double (0.0 for an empty stream).

8. maxBy / minBy

Optional<String> max = stream.collect(Collectors.maxBy(Comparator.naturalOrder()));
  • Returns Optional<T>.

9. groupingBy

Map<Integer, List<String>> grouped =
    stream.collect(Collectors.groupingBy(String::length));

Map<Integer, Set<String>> groupedSet =
    stream.collect(Collectors.groupingBy(String::length, Collectors.toSet()));
  • Takes an optional downstream collector that decides what each group holds (the default is toList()).
  • groupingByConcurrent() is the concurrent, unordered variant and returns a ConcurrentMap.

10. partitioningBy

Map<Boolean, List<Integer>> parts =
    nums.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));
  • Specialized case of grouping → splits elements into true/false buckets; the returned map always contains both keys.
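
Like groupingBy, partitioningBy accepts a downstream collector. A minimal sketch, with a hypothetical class and method name, that counts evens and odds instead of listing them:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitionDemo {

    // Counts the elements in each bucket; both keys are present even
    // when one bucket receives no elements (its count is then 0).
    static Map<Boolean, Long> countEvenOdd(List<Integer> nums) {
        return nums.stream().collect(
            Collectors.partitioningBy(n -> n % 2 == 0, Collectors.counting()));
    }
}
```

With the input 1, 2, 3, 4, 5 this gives 2 under true and 3 under false.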

11. mapping

Set<Integer> lengths = stream.collect(
    Collectors.mapping(String::length, Collectors.toSet()));
  • Transforms before downstream collector.

12. reducing

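// Here stream is assumed to be a Stream<Integer>
Optional<Integer> sum = stream.collect(Collectors.reducing(Integer::sum));
// Here stream is assumed to be a Stream<String>: map to length, then sum from 0
Integer sum2 = stream.collect(Collectors.reducing(0, String::length, Integer::sum));
  • General-purpose reduction inside the collector framework, mainly useful as a downstream collector.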

13. collectingAndThen

List<String> immutable = stream.collect(
    Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
  • Post-process collector result (e.g., wrap in unmodifiable list).

🔹 Examples (Compact)

List<String> strings = List.of("a", "bb", "ccc");

// toList
List<String> list = strings.stream().collect(Collectors.toList());

// toMap with merge
Map<Integer, String> map = strings.stream()
    .collect(Collectors.toMap(String::length, s -> s, (a, b) -> a + "|" + b));

// groupingBy with mapping
Map<Integer, Set<String>> grouped = strings.stream()
    .collect(Collectors.groupingBy(String::length,
             Collectors.mapping(Function.identity(), Collectors.toSet())));

// counting
Long count = strings.stream().collect(Collectors.counting());

// joining
String joined = strings.stream().collect(Collectors.joining(", "));

🔹 Parallel Streams & Collectors (Simplified Internals)

  • Sequential: one accumulator → accumulate → finisher.
  • Parallel:
    • Stream splits into chunks.
    • Each subtask has its own accumulator.
    • Partial accumulators merged with combiner.
    • Finisher applied to result.
  • Concurrent collectors (e.g., groupingByConcurrent) may accumulate into one shared container.
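
The two parallel strategies above can be compared in a short sketch. The class and method names are invented for illustration, and how the work is actually split across threads is up to the runtime.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ParallelCollectDemo {

    // Non-concurrent collector: each subtask fills its own list and the
    // combiner merges them, so encounter order is preserved.
    static List<Integer> squaresInOrder(int n) {
        return IntStream.rangeClosed(1, n).parallel().boxed()
            .map(i -> i * i)
            .collect(Collectors.toList());
    }

    // Concurrent collector: threads may insert into one shared
    // ConcurrentMap; the counts are still exact.
    static ConcurrentMap<Boolean, Long> countByParity(int n) {
        return IntStream.rangeClosed(1, n).parallel().boxed()
            .collect(Collectors.groupingByConcurrent(
                i -> i % 2 == 0, Collectors.counting()));
    }
}
```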

🔹 Writing Your Own Collector

Example: Join strings with commas.

Collector<String, StringBuilder, String> joiner = Collector.of(
    StringBuilder::new,
    (sb, s) -> { if (sb.length() > 0) sb.append(","); sb.append(s); },
    (sb1, sb2) -> { if (sb1.length() > 0 && sb2.length() > 0) sb1.append(","); sb1.append(sb2); return sb1; },
    StringBuilder::toString
);

String result = Stream.of("a", "b", "c").collect(joiner);
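
One way to sanity-check a custom collector's combiner is to compare sequential and parallel results on the same input. The sketch below repeats the joiner's four functions so it is self-contained; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collector;

final class CustomJoinerCheck {

    // Same logic as the joiner above, wrapped in a factory method.
    static Collector<String, StringBuilder, String> commaJoiner() {
        return Collector.of(
            StringBuilder::new,
            (sb, s) -> {
                if (sb.length() > 0) sb.append(",");
                sb.append(s);
            },
            (sb1, sb2) -> {
                if (sb1.length() > 0 && sb2.length() > 0) sb1.append(",");
                return sb1.append(sb2);
            },
            StringBuilder::toString);
    }

    static String joinSequential(List<String> items) {
        return items.stream().collect(commaJoiner());
    }

    // Exercises the combiner whenever the runtime splits the work.
    static String joinParallel(List<String> items) {
        return items.parallelStream().collect(commaJoiner());
    }
}
```

Note that the length check cannot tell an empty builder from one holding only empty strings, so a leading "" element is joined without its comma; Collectors.joining(",") does not have this limitation.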

🔹 Downstream Collectors & Composition

  • groupingBy(classifier, downstream) → e.g., groupingBy(fn, counting()).
  • mapping(mapper, downstream) → transform before collecting.
  • collectingAndThen(downstream, finisher) → apply final transformation.
Map<Integer, Long> sizeCounts = stream.collect(
    Collectors.groupingBy(String::length, Collectors.counting()));
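
The three building blocks can also be stacked. The sketch below names each intermediate collector to keep the types readable; the class, method, and variable names are illustrative, not part of any API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class CompositionDemo {

    // For each word length (sorted), an unmodifiable list of the
    // upper-cased words of that length.
    static Map<Integer, List<String>> upperByLength(List<String> words) {
        // collectingAndThen: post-process the collected list.
        Collector<String, ?, List<String>> unmodifiableList =
            Collectors.collectingAndThen(
                Collectors.toList(), Collections::unmodifiableList);

        // mapping: transform each element before it reaches the list.
        Collector<String, ?, List<String>> upperCased =
            Collectors.mapping(String::toUpperCase, unmodifiableList);

        // groupingBy: classify, choose the map type, apply the downstream.
        return words.stream().collect(
            Collectors.groupingBy(String::length, TreeMap::new, upperCased));
    }
}
```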

🔹 Performance Tips & Best Practices

  • For parallel streams where encounter order does not matter → consider groupingByConcurrent / toConcurrentMap, which avoid merging per-thread maps.
  • Prefer built-in collectors over hand-written ones; they are well tested and cover most cases.
  • For predictable sizes, use toCollection(() -> new ArrayList<>(expectedSize)).
  • Use toUnmodifiableList() (Java 10+) for immutability.
  • Avoid shared mutable state unless using concurrent collectors.
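
Two of these tips in code form, as a sketch with hypothetical class and method names. Whether pre-sizing gives a measurable gain depends on the workload, so treat it as something to benchmark rather than a guaranteed win.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class CollectionChoiceDemo {

    // Pre-sized ArrayList: avoids internal array growth when the element
    // count is known. In a parallel stream the supplier may run once per
    // subtask, so each partial list would get this capacity.
    static List<Integer> lengthsPresized(List<String> words) {
        return words.stream().map(String::length)
            .collect(Collectors.toCollection(
                () -> new ArrayList<>(words.size())));
    }

    // Java 10+: the result rejects modification (and null elements).
    static List<Integer> lengthsUnmodifiable(List<String> words) {
        return words.stream().map(String::length)
            .collect(Collectors.toUnmodifiableList());
    }
}
```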

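🔹 Common Pitfalls

  • toMap() → throws IllegalStateException on duplicate keys unless a merge function is provided.
  • Not all collectors are thread-safe; the containers behind toList(), toSet(), and toMap() are not meant to be shared across threads.
  • Ordering depends on the container: toList() keeps encounter order, while toSet() and the default maps from toMap() / groupingBy() make no ordering guarantee.
  • Misusing CONCURRENT → declare it only when the accumulator's container is thread-safe.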
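
When key order matters, the map supplier overloads make it explicit. A minimal sketch (class and method names are hypothetical) that keeps groups in first-encounter order on a sequential stream:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class OrderingDemo {

    // LinkedHashMap keeps keys in the order they were first inserted;
    // the default map from groupingBy makes no such promise.
    static List<Integer> keysInEncounterOrder(List<String> words) {
        Map<Integer, List<String>> grouped = words.stream().collect(
            Collectors.groupingBy(
                String::length, LinkedHashMap::new, Collectors.toList()));
        return new ArrayList<>(grouped.keySet());
    }
}
```

For the input "ccc", "a", "bb", "dd" the keys come back as 3, 1, 2.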

🔹 Quick Cheat-Sheet

| Collector | Returns | Example |
| --- | --- | --- |
| toList() | List<T> | stream.collect(toList()) |
| toSet() | Set<T> | stream.collect(toSet()) |
| toMap() | Map<K,V> | toMap(String::length, s->s, (a,b)->a+","+b) |
| joining() | String | Collectors.joining(", ") |
| counting() | Long | Collectors.counting() |
| summingInt() | Integer | summingInt(String::length) |
| averagingInt() | Double | averagingInt(...) |
| maxBy/minBy | Optional<T> | maxBy(Comparator.naturalOrder()) |
| groupingBy | Map<K, List<T>> | groupingBy(String::length) |
| partitioningBy | Map<Boolean, List<T>> | partitioningBy(predicate) |
| mapping | Transform + downstream | mapping(String::length, toSet()) |
| reducing | Custom reduction | reducing(0, String::length, Integer::sum) |
| collectingAndThen | Post-process result | collectingAndThen(toList(), Collections::unmodifiableList) |

🔹 When to Use Which

  • Build collections → toList, toSet, toMap.
  • Grouping → groupingBy, with downstreams.
  • Partitioning → partitioningBy.
  • Statistics → counting, summingInt, averagingInt, maxBy.
  • String building → joining.
  • Post-process → collectingAndThen.
  • Parallel performance → groupingByConcurrent, toConcurrentMap.