Skip to content

ReadOnlySpan<char> Replace extensions #29758

@daveaglick

Description

@daveaglick

It would be useful to have Replace() methods for ReadOnlySpan<char> similar to the ones for string. This was briefly mentioned in #22434 but then the Replace API was removed from the proposal for needing more work. As far as I can tell, it hasn't been revisited yet.

Here's my proposal to get things started:

public static class MemoryExtensions
{
    public static ReadOnlySpan<char> Replace(
        this ReadOnlySpan<char> str,
        ReadOnlySpan<char> oldValue,
        ReadOnlySpan<char> newValue);

    // If buffer is null or too small a new internal one will be created with the correct size and bufferCount = 0, return value is the created buffer
    // If buffer is the correct size it will be filled and bufferCount = buffer.Length (how much was filled), return value is the filled buffer
    // If buffer is too big it will be filled and bufferCount = how much was filled, return value is a slice of the filled buffer
    public static ReadOnlySpan<char> Replace(
        this ReadOnlySpan<char> str,
        ReadOnlySpan<char> oldValue,
        ReadOnlySpan<char> newValue,
        Span<char> buffer,
        out int bytesWritten)
}

I think this covers almost all use cases including those where the user wants to provide their own refillable buffer to reduce allocations. These two APIs have the following features:

  • Provides an overload that works almost exactly like the string version.
  • The overload that takes a buffer can be used when the user doesn't want the Replace method to allocate an internal buffer.
  • If the provided buffer isn't large enough, a new one will be allocated ensuring success. I prefer this over a try pattern where the replace call might fail - I don't think failing string operations is a user-friendly option.
  • The bytesWritten value can be checked to see if the provided buffer was used or filled (and by how much) and the caller can act accordingly.

Question: would we also want a public static ReadOnlySpan<char> Replace(this ReadOnlySpan<char> str, char oldChar, char newChar)? If so, that's probably easy enough to implement separately since we don't need to calculate the final buffer size or look ahead to determine matching.

I've been working out a naive implementation alongside the API definition to clarify my thoughts on the API surface. Here's what I've got so far - I include it for reference and as a point of discussion (and also to get any tips on how to further reduce allocations or improve performance, it's not quite as light or fast as the OOTB string.Replace() though I suspect there are still some optimizations to be had):

Expand
public static class SpanExtensions
{
    public static ReadOnlySpan<char> Replace(
    this ReadOnlySpan<char> str,
    ReadOnlySpan<char> oldValue,
    ReadOnlySpan<char> newValue) =>
    Replace(str, oldValue, newValue, null, out int _);

    public static ReadOnlySpan<char> Replace(
        this ReadOnlySpan<char> str,
        ReadOnlySpan<char> oldValue,
        ReadOnlySpan<char> newValue,
        Span<char> buffer,
        out int bytesWritten)
    {
        if (str == null)
        {
            throw new ArgumentNullException(nameof(str));
        }
        if (oldValue == null)
        {
            throw new ArgumentNullException(nameof(oldValue));
        }
        if (oldValue.Length == 0)
        {
            throw new ArgumentException("String cannot be of zero length", nameof(oldValue));
        }

        // When they're both equal just return the original
        bytesWritten = 0;
        if (newValue != null && oldValue.Equals(newValue, StringComparison.Ordinal))
        {
            return str;
        }

        // Get the replacement positions// Figure out the replacement positions
        // Partly lifted from logic in StringBuilder.Replace()
        int[] replacements = null;
        int replacementsCount = 0;
        int strIndex = 0;
        for (int c = 0; c < str.Length; c++)
        {
            if (str[c] == oldValue[strIndex])
            {
                strIndex++;
            }
            if (strIndex == oldValue.Length)
            {
                if (replacements == null)
                {
                    replacements = new int[5];
                }
                else if (replacementsCount > replacements.Length)
                {
                    // From StringBuilder.Replace() - grow by 1.5x but more in the begining
                    int[] newReplacements = new int[replacements.Length * 3 / 2 + 4];
                    Array.Copy(replacements, newReplacements, replacements.Length);
                    replacements = newReplacements;
                }
                replacements[replacementsCount++] = c - strIndex + 1;
                strIndex = 0;
            }
        }

        // Return the original string if no replacements were performed
        if (replacementsCount == 0)
        {
            return str;
        }

        // Size the buffer
        if (buffer == null || buffer.Length < str.Length + ((newValue.Length - oldValue.Length) * replacementsCount))
        {
            // Create a new buffer because the passed in one wasn't the right size
            buffer = new char[str.Length + ((newValue.Length - oldValue.Length) * replacementsCount)];
        }
        else
        {
            // Use the existing buffer and set the count
            bytesWritten = str.Length + ((newValue.Length - oldValue.Length) * replacementsCount);
        }

        // Perform replacements
        strIndex = 0;
        int charsIndex = 0;
        for (int c = 0; c < replacementsCount; c++)
        {
            // Copy up to the next replacement
            while (strIndex < replacements[c])
            {
                buffer[charsIndex++] = str[strIndex++];
            }

            // Perform the replacement (use strIndex to temporarily hold value index)
            strIndex = 0;
            while (strIndex < newValue.Length)
            {
                buffer[charsIndex++] = newValue[strIndex++];
            }
            strIndex = replacements[c] + oldValue.Length;
        }

        // No more replacements, copy the rest of the string
        while (strIndex < str.Length)
        {
            buffer[charsIndex++] = str[strIndex++];
        }

        // Slice the buffer if we didn't use the whole thing
        return charsIndex == bytesWritten ? buffer : buffer.Slice(0, charsIndex);
    }
}

cc @davkean who expressed an interest in this API.

Metadata

Metadata

Assignees

No one assigned

    Labels

    api-suggestionEarly API idea and discussion, it is NOT ready for implementationarea-System.Memory

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions