-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Description
It would be useful to have Replace() methods for ReadOnlySpan<char> similar to the ones for string. This was briefly mentioned in #22434 but then the Replace API was removed from the proposal for needing more work. As far as I can tell, it hasn't been revisited yet.
Here's my proposal to get things started:
public static class MemoryExtensions
{
public static ReadOnlySpan<char> Replace(
this ReadOnlySpan<char> str,
ReadOnlySpan<char> oldValue,
ReadOnlySpan<char> newValue);
// If buffer is null or too small a new internal one will be created with the correct size and bufferCount = 0, return value is the created buffer
// If buffer is the correct size it will be filled and bufferCount = buffer.Length (how much was filled), return value is the filled buffer
// If buffer is too big it will be filled and bufferCount = how much was filled, return value is a slice of the filled buffer
public static ReadOnlySpan<char> Replace(
this ReadOnlySpan<char> str,
ReadOnlySpan<char> oldValue,
ReadOnlySpan<char> newValue,
Span<char> buffer,
out int bytesWritten)
}I think this covers almost all use cases including those where the user wants to provide their own refillable buffer to reduce allocations. These two APIs have the following features:
- Provides an overload that works almost exactly like the
stringversion. - The overload that takes a buffer can be used when the user doesn't want the
Replacemethod to allocate an internal buffer. - If the provided buffer isn't large enough, a new one will be allocated ensuring success. I prefer this over a try pattern where the replace call might fail - I don't think failing string operations is a user-friendly option.
- The
bytesWrittenvalue can be checked to see if the provided buffer was used or filled (and by how much) and the caller can act accordingly.
Question: would we also want a public static ReadOnlySpan<char> Replace(this ReadOnlySpan<char> str, char oldChar, char newChar)? If so, that's probably easy enough to implement separately since we don't need to calculate the final buffer size or look ahead to determine matching.
I've been working out a naive implementation alongside the API definition to clarify my thoughts on the API surface. Here's what I've got so far - I include it for reference and as a point of discussion (and also to get any tips on how to further reduce allocations or improve performance, it's not quite as light or fast as the OOTB string.Replace() though I suspect there are still some optimizations to be had):
Expand
public static class SpanExtensions
{
public static ReadOnlySpan<char> Replace(
this ReadOnlySpan<char> str,
ReadOnlySpan<char> oldValue,
ReadOnlySpan<char> newValue) =>
Replace(str, oldValue, newValue, null, out int _);
public static ReadOnlySpan<char> Replace(
this ReadOnlySpan<char> str,
ReadOnlySpan<char> oldValue,
ReadOnlySpan<char> newValue,
Span<char> buffer,
out int bytesWritten)
{
if (str == null)
{
throw new ArgumentNullException(nameof(str));
}
if (oldValue == null)
{
throw new ArgumentNullException(nameof(oldValue));
}
if (oldValue.Length == 0)
{
throw new ArgumentException("String cannot be of zero length", nameof(oldValue));
}
// When they're both equal just return the original
bytesWritten = 0;
if (newValue != null && oldValue.Equals(newValue, StringComparison.Ordinal))
{
return str;
}
// Get the replacement positions// Figure out the replacement positions
// Partly lifted from logic in StringBuilder.Replace()
int[] replacements = null;
int replacementsCount = 0;
int strIndex = 0;
for (int c = 0; c < str.Length; c++)
{
if (str[c] == oldValue[strIndex])
{
strIndex++;
}
if (strIndex == oldValue.Length)
{
if (replacements == null)
{
replacements = new int[5];
}
else if (replacementsCount > replacements.Length)
{
// From StringBuilder.Replace() - grow by 1.5x but more in the begining
int[] newReplacements = new int[replacements.Length * 3 / 2 + 4];
Array.Copy(replacements, newReplacements, replacements.Length);
replacements = newReplacements;
}
replacements[replacementsCount++] = c - strIndex + 1;
strIndex = 0;
}
}
// Return the original string if no replacements were performed
if (replacementsCount == 0)
{
return str;
}
// Size the buffer
if (buffer == null || buffer.Length < str.Length + ((newValue.Length - oldValue.Length) * replacementsCount))
{
// Create a new buffer because the passed in one wasn't the right size
buffer = new char[str.Length + ((newValue.Length - oldValue.Length) * replacementsCount)];
}
else
{
// Use the existing buffer and set the count
bytesWritten = str.Length + ((newValue.Length - oldValue.Length) * replacementsCount);
}
// Perform replacements
strIndex = 0;
int charsIndex = 0;
for (int c = 0; c < replacementsCount; c++)
{
// Copy up to the next replacement
while (strIndex < replacements[c])
{
buffer[charsIndex++] = str[strIndex++];
}
// Perform the replacement (use strIndex to temporarily hold value index)
strIndex = 0;
while (strIndex < newValue.Length)
{
buffer[charsIndex++] = newValue[strIndex++];
}
strIndex = replacements[c] + oldValue.Length;
}
// No more replacements, copy the rest of the string
while (strIndex < str.Length)
{
buffer[charsIndex++] = str[strIndex++];
}
// Slice the buffer if we didn't use the whole thing
return charsIndex == bytesWritten ? buffer : buffer.Slice(0, charsIndex);
}
}cc @davkean who expressed an interest in this API.