buffer: refactor buffer to be dynamically-sized by cataldor · Pull Request #53 · MusicPlayerDaemon/libmpdclient

cataldor · 2020-03-27T14:26:19Z

Allow the internal buffer of libmpdclient to grow dynamically in size
as needed by mpd_{async/sync}_send_command.
Previously, the buffer was limited to 4KiB.

Address issue reported downstream by Debian[1] and upstream by kaliko
(@mxjeff)[2]
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953110
[2] #51

Hello,
this is a solution to issue #51. It requires refactoring the buffer source code.
I will provide another solution in case you want to stick with the hardcoded limit.

@mxjeff

Allow the internal buffer of libmpdclient to grow dynamically in size as needed by mpd_{async/sync}_send_command. Previously, the buffer was limited to 4KiB. Address issue reported downstream by Debian[1] and upstream by kaliko (@mxjeff)[2] [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953110 [2] MusicPlayerDaemon#51

cataldor · 2020-03-27T14:27:39Z

src/sync.c

@@ -100,27 +100,7 @@ bool
 mpd_sync_send_command_v(struct mpd_async *async, const struct timeval *tv0,


PS: struct timeval is not used here because `mpd_sync_io' is not called anymore;
i have left it since all other functions of mpd_sync receive the timeval as well.
If you prefer that i remove it, let me know.

Since this is an internal API, we can remove the parameter easily.
But do we want this? Does this function really need dynamic buffer sizes? What is your reason to refactor the output buffer to be dynamic?

The refactoring is to allow libmpdclient to be limited in its message size only by MPD. If i'm right, currently MPD supports up to 8KiB messages.

You didn't answer my question.

The reasons were: (i) At least one user would like to send messages > 4KiB (issue #51), and (ii) this code would also adapt in case MPD changes the limit of received messages.
You could also just bump the data size to 8KiB and call it a day (that is possible with that other PR, although, today, the limit is still 4KiB)
Since with this patch we can make room for the message, i don't see the need anymore for mpd_sync_io; so i would remove the tv0 parameter.

#51 does not request being able to send large requests. He complains about the lousy error handling if he tries to send a huge malformed request.

From the debian bug report:
"Ideally, libmpdclient should support requests of arbitrary size (eventually
reaching the server limit), but if that is not feasible, at least a proper
error reporting would be nice."

I think for the majority of cases 4KiB is more than sufficient; The reason i submitted the other PR is in case the changes i'm proposing here are deemed undesirable

True, that's where the Debian bug report deviates from #51, but we were talking about #51.
But even the Debian bug report doesn't explain how such a large request might be useful. Just like #51, the Debian bug report describes a malformed request, and supporting malformed requests is not libmpdclient's goal.

He also mentions this feature request in #51. The title is about the hard coded limit.

True, the test seems to be about fuzzing, but from the MPD protocol page i didn't find any mention of a limit to the find/search command. In theory, you could concatenate multiple find requests in a single huge line like: ((artist == 'FOO') AND (album == 'BAR')) AND ...

MaxKellermann · 2020-04-01T15:08:59Z

src/async.c

-	mpd_buffer_init(&async->input);
-	mpd_buffer_init(&async->output);
+	if (mpd_buffer_init(&async->input) == NULL) {
+		async->output.data = NULL;


This is an obscure way of fake-initializing the buffer to allow mpd_async_free() (the mpd_buffer API doesn't document that this is legal). I don't like obscure code like that.
If you refactor this to never fail (like before) and do the first allocation lazily, you can avoid this fragile and cumbersome code.

In that case, we would need to do the allocation in mpd_buffer_room(); currently, it returns size_t which would make the OOM case awkward (return (size_t)-1 for instance). We could change that function to write the size in a pointer. Would you be okay with any of these solutions?

Why would mpd_buffer_room() need to do the allocation? It is a simple read-only function which gives information about the buffer state.

The idea was to initialize in a single place; i realize now that read functions does not use the mpd_buffer_room()
I will update the code to allocate in mpd_buffer_read() and mpd_buffer_write()

Updated code in commit 154d1a3.
This resulted in an increase of code complexity as the code checks for OOM events in more places.

MaxKellermann · 2020-04-01T15:09:49Z

src/async.c

 	assert(async->fd != MPD_INVALID_SOCKET);
 	assert(!mpd_error_is_defined(&async->error));

+	if (!mpd_buffer_make_room(&async->input, min_size)) {


What if you're receiving data from a server which pushes a terabyte to you? DoS bug.

This is my understanding, maybe i'm missing something.
I did not change the reading functions to read all received data. It can still fail for large messages. Details below:

mpd_async_read is only used when (i) you are going to read the buffer or (ii) writing/flushing output data. In addition, mpd_buffer_make_room will only increase the size if there wasn't enough space available.

For (i), you will increase the buffer by 256 bytes (if needed) and read data out. If the message was larger than 256 bytes, an error will occur in mpd_async_recv_line. Even if you divide the terabyte message in chunks of 256 bytes, mpd_async_recv_line will continually free buffer area, so more allocations will not happen.

For (ii), the flush/write will stop once output data is no longer available. Output data is generated locally. Assuming you wrote 32MB of output data [1] (13 loops on mpd_sync_send_command_v) and the input buffer was always full, the new input buffer size is only 7424 bytes.

You can increase the input buffer size by calling mpd_async_recv_raw; but this is done deliberately.

[1] MPD will not accept this, but lets say it does for this example.

If you prefer, we could provide in the meson configure the options for the first and limit allocation sizes for the buffer.

I think you're wrong. You assume that the message will be delivered in chunks of 256 bytes, but why do you believe that is true?
What if a rogue server sends a single line which is 1 TB long?

I see two cases if you send a single 1 TB line
(i) if you are expecting a line, mpd_async_recv_line will not be able to read the 1 TB. The maximum size it can increase the input buffer is min_size, which is currently 256 bytes. If it does not find \n in the input buffer, an error is set.

(ii) You can ask mpd_async_recv_raw to receive a 1TB. But this is the user choice. I know Linux does not normally respect this, but this would be the ENOMEM case.

~~Case (ii) is not currently doing this because i forgot to change it. It will only receive up to the current size available.~~ mpc_async_recv_raw() is actually working properly: it reads the available input buffer size.

MaxKellermann · 2020-04-01T15:10:06Z

src/async.c

-mpd_async_send_command_v(struct mpd_async *async, const char *command,
-			 va_list args)
+static bool
+quote_vargs(struct mpd_buffer *buffer, char *start, char **end_pos,


What does this function do?
Documentation missing!

fixed in 14a9563

MaxKellermann · 2020-04-01T15:10:32Z

src/async.c

-
-	memcpy(dest, command, length);
-	p = dest + length;
+	/**


Don't use Doxygen-style comments unless you're documenting an API.

fixed in 14a9563

MaxKellermann · 2020-04-01T15:11:25Z

src/buffer.h

+	/** size of buffer */
+	size_t data_len;


Badly named variable. This is not the "length" of "data". There is rarely ever that much data in the buffer!

You are right. I've renamed to data_size. Fixed in 3cf9285

But that's still a bad name. Your data is not this size. You don't have that much data.

Could we call it data_max_size then?

That's still ambiguous - reading such a name, I'd think that's the configured limit. What about capacity?

changed in ef706f9

MaxKellermann · 2020-04-01T15:13:10Z

src/buffer.h

+	buffer->data_len = std_data_len;
+	buffer->data = malloc(buffer->data_len);
+	if (buffer->data != NULL)
+		memset(buffer->data, 0, buffer->data_len);


Why memset()? It wasn't here before. Why is this suddenly necesarry?

It is not necessary; it is just a precaution to initialize the data to a known state
(removed in 5ee2021)

MaxKellermann · 2020-04-01T15:13:55Z

src/buffer.h

+ * Free the buffer area.
+ */
+static inline void
+mpd_buffer_end(struct mpd_buffer *buffer)


I'd prefer _deinit() to match with _init().
Not only does this not match, it can also suggest doing a different thing, e.g. obtain a pointer to the end of the buffer/data.

No problem. Fixed in e76558a

MaxKellermann · 2020-04-01T15:14:18Z

src/buffer.h

+		buffer->data = malloc(newsize);
+		if (buffer->data == NULL)
+			return false;
+		memset(buffer->data, 0, newsize);


Why memset()?

For the same reason it is used in mpd_buffer_init: initialize data to a known state
(removed in 5ee2021)

MaxKellermann · 2020-04-01T15:14:24Z

src/buffer.h

+		/* clear region not committed and new region */
+		memset(mpd_buffer_write(buffer), 0,
+		       newsize - (buffer->data_len - mpd_buffer_room(buffer)));


Remove stale (non committed) data from the buffer

I don't understand.

This is to keep the data in a known state.
We always initialize it to zero.
If the user called mpd_buffer_make_room (either directly or indirectly), it may be that it wrote some data and realized that it did not have enough space for it (function quote_vargs); thus, that area is reverted to the zero state. Otherwise, if no non-committed data is present, this memset will do nothing (last parameter is zero).

But what's the point of that? Why is "zero state" any more special than "random uncommitted data"? Zero is an arbitrary value just like any other.
I don't like that we discuss this here, in a PR and in a commit that has nothing to do with this. You're sneaking in an unrelated change.

okay, I will remove the code related to the known state
(removed in 5ee2021)

MaxKellermann · 2020-04-01T15:15:39Z

src/sync.c

@@ -100,27 +100,7 @@ bool
 mpd_sync_send_command_v(struct mpd_async *async, const struct timeval *tv0,


Since this is an internal API, we can remove the parameter easily.
But do we want this? Does this function really need dynamic buffer sizes? What is your reason to refactor the output buffer to be dynamic?

Describe the functionality of quote_vargs Remove doxygen-style comment from the same function

Rename the buffer size to data_size instead of data_len

Rename mpd_buffer_end() to mpd_buffer_deinit() so that mpd_buffer_init() matches with the latter

Change call from mpd_buffer_end() to mpd_buffer_deinit() due to API change

Do not initialize the buffer to zero; data in the buffer is tracked through the buffer API

Make sure the buffer is only initialized when it is used; in other words, when one of these two functions are called: mpd_buffer_read() or mpd_buffer_write()

Make sure data_allocated is initialized when calling malloc() for buffer

MaxKellermann · 2020-04-03T13:20:36Z

src/buffer.h

 	unsigned char *data;

+	/** buffer allocated */
+	bool data_allocated;


What's the point of this? And what's the point of setting data_size to a non-zero value when nothing has been allocated? I'm puzzled about how complicated you chose to make this commit.

So that we can do lazy allocation.
That variable can be removed if we compare buffer->data == NULL, but i thought you would not like that

the objective of setting data_size to non-zero was to avoid having two returns in mpd_buffer_room(), and having an additional if in mpd_buffer_make_room() for having or not allocated yet. The buffer will be allocated once the function calls mpd_buffer_write (or mpd_buffer_make_room() if the requested size is over DEFAULT_BUFFER_SIZE)

You can do lazy allocation by setting data_size (or capacity) to zero, and allocate whenever you need more than the current capacity. No special cases.

The argument in the second paragraph, I don't get it.

Here:

This resulted in an increase of code complexity as the code checks for OOM events in more places.

This design choice of yours is causing this complexity. Making special cases, adding more unnecessary states to the buffer object, all of which need to be checked by every caller.

I have refactor the code with your suggestions. Could you take another look please?
the commit is ef706f9

MaxKellermann · 2020-04-03T13:22:09Z

src/async.c


-	nbytes = recv(async->fd, mpd_buffer_write(&async->input), room,
-		      MSG_DONTWAIT);
+	write_p = mpd_buffer_write(&async->input);


Why can mpd_buffer_write() fail, after mpd_buffer_room() has already verified that buffer space is available?
I'm puzzled.

Because mpd_buffer_room() does not initialize the data, that is the responsibility of mpd_buffer_write()

This design was your idea. And I say it's overly complicated, complex and thus error prone.

Simplify the design of the lazy-initialized buffer The buffer may be initialized in three functions: mpd_async_send_command_v(), mpd_async_recv_line(), and mpd_async_recv_raw()

cataldor commented Mar 27, 2020

View reviewed changes

MaxKellermann requested changes Apr 1, 2020

View reviewed changes

cataldor added 7 commits April 3, 2020 12:41

async: describe the internal function quote_vargs + fix internal comment

14a9563

Describe the functionality of quote_vargs Remove doxygen-style comment from the same function

buffer: properly name the buffer size

3cf9285

Rename the buffer size to data_size instead of data_len

buffer: properly match deinitialize function

e76558a

Rename mpd_buffer_end() to mpd_buffer_deinit() so that mpd_buffer_init() matches with the latter

async: call mpd_buffer_deinit()

a36af8d

Change call from mpd_buffer_end() to mpd_buffer_deinit() due to API change

buffer: remove memset calls

5ee2021

Do not initialize the buffer to zero; data in the buffer is tracked through the buffer API

buffer: lazy initialize the buffer data

154d1a3

Make sure the buffer is only initialized when it is used; in other words, when one of these two functions are called: mpd_buffer_read() or mpd_buffer_write()

buffer: missing data_allocated initialization in mpd_buffer_make_room()

6e9c709

Make sure data_allocated is initialized when calling malloc() for buffer

MaxKellermann reviewed Apr 3, 2020

View reviewed changes

async: buffer: refactor buffer code to simplify lazy-initialized buffer

ef706f9

Simplify the design of the lazy-initialized buffer The buffer may be initialized in three functions: mpd_async_send_command_v(), mpd_async_recv_line(), and mpd_async_recv_raw()

This was referenced Feb 10, 2021

libmpdclient errors with MPD response lines longer than 4KiB joshkunz/ashuffle#89

Open

MPD_ERROR_MALFORMED on response lines longer than 4KiB #69

Open

MaxKellermann mentioned this pull request Aug 5, 2021

Make buffer size configurable at compile time and set it to 8192 #74

Closed

		@@ -100,27 +100,7 @@ bool
		mpd_sync_send_command_v(struct mpd_async async, const struct timeval tv0,

Conversation

cataldor commented Mar 27, 2020

Address issue reported downstream by Debian[1] and upstream by kaliko (@mxjeff)[2] [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953110 [2] #51

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cataldor Apr 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cataldor Apr 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cataldor Apr 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Address issue reported downstream by Debian[1] and upstream by kaliko
(@mxjeff)[2]
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953110
[2] #51

cataldor Apr 3, 2020 •

edited

Loading

cataldor Apr 3, 2020 •

edited

Loading

cataldor Apr 3, 2020 •

edited

Loading

cataldor Apr 3, 2020 •

edited

Loading