From 89a86a6f5bb51113c3f1909baf82fdda52262dcc Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:57:56 +0000 Subject: [PATCH 1/6] Initial plan From d2f424638ed21971dd2c189ffbe763ef369391ef Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 03:03:35 +0000 Subject: [PATCH 2/6] Move Cluster Concepts section under Solr Concepts in reference guide Co-authored-by: epugh <22395+epugh@users.noreply.github.com> --- .../configuration-guide/pages/configuration-files.adoc | 2 +- .../modules/configuration-guide/pages/coreadmin-api.adoc | 2 +- .../modules/deployment-guide/deployment-nav.adoc | 1 - .../modules/deployment-guide/pages/cloud-screens.adoc | 2 +- .../modules/deployment-guide/pages/installing-solr.adoc | 4 ++-- .../pages/solr-control-script-reference.adoc | 2 +- .../pages/user-managed-distributed-search.adoc | 2 +- .../pages/user-managed-index-replication.adoc | 2 +- .../modules/getting-started/getting-started-nav.adoc | 1 + .../pages/cluster-types.adoc | 0 .../modules/getting-started/pages/introduction.adoc | 2 +- .../modules/getting-started/pages/solr-admin-ui.adoc | 6 +++--- .../modules/getting-started/pages/solr-glossary.adoc | 2 +- .../modules/query-guide/pages/result-grouping.adoc | 2 +- .../modules/query-guide/pages/spell-checking.adoc | 2 +- 15 files changed, 16 insertions(+), 16 deletions(-) rename solr/solr-ref-guide/modules/{deployment-guide => getting-started}/pages/cluster-types.adoc (100%) diff --git a/solr/solr-ref-guide/modules/configuration-guide/pages/configuration-files.adoc b/solr/solr-ref-guide/modules/configuration-guide/pages/configuration-files.adoc index 63b5eb3f8f51..44c0e0c502b0 100644 --- a/solr/solr-ref-guide/modules/configuration-guide/pages/configuration-files.adoc +++ b/solr/solr-ref-guide/modules/configuration-guide/pages/configuration-files.adoc @@ -96,7 +96,7 @@ The Files screen in the Admin UI lets you browse & view configuration files (suc .The Files Screen image::configuration-files/files-screen.png[Files screen,height=400] -If you are using xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper. +If you are using xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud], the files displayed are the configuration files for this collection stored in ZooKeeper. In user-managed clusters or single-node installations, all files in the `conf` directory are displayed. The configuration files shown may or may not be used by the collection as use of the file depends on how they are referenced in either `solrconfig.xml` or your schema. diff --git a/solr/solr-ref-guide/modules/configuration-guide/pages/coreadmin-api.adoc b/solr/solr-ref-guide/modules/configuration-guide/pages/coreadmin-api.adoc index 261caa7370f5..7b6434d4a386 100644 --- a/solr/solr-ref-guide/modules/configuration-guide/pages/coreadmin-api.adoc +++ b/solr/solr-ref-guide/modules/configuration-guide/pages/coreadmin-api.adoc @@ -18,7 +18,7 @@ // specific language governing permissions and limitations // under the License. -The Core Admin API is primarily used under the covers by the xref:collections-api.adoc[] when running a xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud] cluster. +The Core Admin API is primarily used under the covers by the xref:collections-api.adoc[] when running a xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud] cluster. SolrCloud users should not typically use the CoreAdmin API directly, but the API may be useful for users of user-managed clusters or single-node installations for core maintenance operations. diff --git a/solr/solr-ref-guide/modules/deployment-guide/deployment-nav.adoc b/solr/solr-ref-guide/modules/deployment-guide/deployment-nav.adoc index 55301601e3e0..04dfeb766436 100644 --- a/solr/solr-ref-guide/modules/deployment-guide/deployment-nav.adoc +++ b/solr/solr-ref-guide/modules/deployment-guide/deployment-nav.adoc @@ -30,7 +30,6 @@ *** xref:docker-faq.adoc[] * Scaling Solr -** xref:cluster-types.adoc[] ** SolrCloud Clusters *** xref:solrcloud-shards-indexing.adoc[] *** xref:solrcloud-recoveries-and-write-tolerance.adoc[] diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/cloud-screens.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/cloud-screens.adoc index a4f23232d45f..70e1c12c08c4 100644 --- a/solr/solr-ref-guide/modules/deployment-guide/pages/cloud-screens.adoc +++ b/solr/solr-ref-guide/modules/deployment-guide/pages/cloud-screens.adoc @@ -21,7 +21,7 @@ This screen provides status information about each collection & node in your clu .Only Visible When using SolrCloud [NOTE] ==== -The "Cloud" menu option is only available when Solr is running xref:cluster-types.adoc#solrcloud-mode[SolrCloud]. +The "Cloud" menu option is only available when Solr is running xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud]. User-managed clusters or single-node installations will not display this option. ==== diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/installing-solr.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/installing-solr.adoc index 15cc9898a026..4fc9b6cea35d 100644 --- a/solr/solr-ref-guide/modules/deployment-guide/pages/installing-solr.adoc +++ b/solr/solr-ref-guide/modules/deployment-guide/pages/installing-solr.adoc @@ -59,7 +59,7 @@ A very good blog post that discusses the issues to consider is https://lucidwork One thing to note when planning your installation is that a hard limit exists in Lucene for the number of documents in a single index: approximately 2.14 billion documents (2,147,483,647 to be exact). In practice, it is highly unlikely that such a large number of documents would fit and perform well in a single index, and you will likely need to distribute your index across a cluster before you ever approach this number. -If you know you will exceed this number of documents in total before you've even started indexing, it's best to plan your installation with xref:cluster-types.adoc#solrcloud-mode[SolrCloud] as part of your design from the start. +If you know you will exceed this number of documents in total before you've even started indexing, it's best to plan your installation with xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud] as part of your design from the start. == Package Installation @@ -197,7 +197,7 @@ Currently, the available examples you can run are: techproducts, schemaless, and See the section xref:solr-control-script-reference.adoc#running-with-example-configurations[Running with Example Configurations] for details on each example. .Going deeper with SolrCloud -NOTE: Running the `cloud` example demonstrates running multiple nodes of Solr using xref:cluster-types.adoc#solrcloud-mode[SolrCloud] mode. +NOTE: Running the `cloud` example demonstrates running multiple nodes of Solr using xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud] mode. For more information on starting Solr in SolrCloud mode, see the section xref:getting-started:tutorial-solrcloud.adoc[]. === Check if Solr is Running diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/solr-control-script-reference.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/solr-control-script-reference.adoc index a7348059cc2e..25740a026258 100644 --- a/solr/solr-ref-guide/modules/deployment-guide/pages/solr-control-script-reference.adoc +++ b/solr/solr-ref-guide/modules/deployment-guide/pages/solr-control-script-reference.adoc @@ -339,7 +339,7 @@ For more information about starting Solr in SolrCloud mode, see also the section `bin/solr start --user-managed` starts Solr in User Managed mode (AKA Standalone mode). This was the default mode up until Solr 10x. -For more information about the different modes, see the section xref:deployment-guide:cluster-types.adoc[]. +For more information about the different modes, see the section xref:getting-started:cluster-types.adoc[]. ==== Running with Example Configurations diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-distributed-search.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-distributed-search.adoc index 460039639cab..cb59d1349081 100644 --- a/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-distributed-search.adoc +++ b/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-distributed-search.adoc @@ -18,7 +18,7 @@ When using traditional index sharding, you will need to consider how to query your documents. -It is highly recommended that you use xref:cluster-types.adoc#solrcloud-mode[SolrCloud] when needing to scale up or scale out. +It is highly recommended that you use xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud] when needing to scale up or scale out. The setup described below is legacy and was used prior to the existence of SolrCloud. SolrCloud provides for a truly distributed set of features with support for things like automatic routing, leader election, optimistic concurrency and other sanity checks that are expected out of a distributed system. diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-index-replication.adoc b/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-index-replication.adoc index ea3f0f376747..2b8a453694a5 100644 --- a/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-index-replication.adoc +++ b/solr/solr-ref-guide/modules/deployment-guide/pages/user-managed-index-replication.adoc @@ -43,7 +43,7 @@ Configuring replication is therefore similar to any normal request handler. .Replication In SolrCloud [NOTE] ==== -Although there is no explicit concept of leader or follower nodes in a xref:cluster-types.adoc#solrcloud-mode[SolrCloud cluster], the `ReplicationHandler` discussed on this page is still used by SolrCloud as needed to support "shard recovery" – but this is done in a peer to peer manner. +Although there is no explicit concept of leader or follower nodes in a xref:getting-started:cluster-types.adoc#solrcloud-mode[SolrCloud cluster], the `ReplicationHandler` discussed on this page is still used by SolrCloud as needed to support "shard recovery" – but this is done in a peer to peer manner. When using SolrCloud, the `ReplicationHandler` must be available via the `/replication` path. Solr does this implicitly unless overridden explicitly in your `solrconfig.xml`. diff --git a/solr/solr-ref-guide/modules/getting-started/getting-started-nav.adoc b/solr/solr-ref-guide/modules/getting-started/getting-started-nav.adoc index 095f679f93b6..562ba636558e 100644 --- a/solr/solr-ref-guide/modules/getting-started/getting-started-nav.adoc +++ b/solr/solr-ref-guide/modules/getting-started/getting-started-nav.adoc @@ -23,6 +23,7 @@ ** xref:solr-indexing.adoc[] ** xref:searching-in-solr.adoc[] ** xref:relevance.adoc[] +** xref:cluster-types.adoc[] ** xref:solr-glossary.adoc[] * xref:solr-tutorial.adoc[] diff --git a/solr/solr-ref-guide/modules/deployment-guide/pages/cluster-types.adoc b/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc similarity index 100% rename from solr/solr-ref-guide/modules/deployment-guide/pages/cluster-types.adoc rename to solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc diff --git a/solr/solr-ref-guide/modules/getting-started/pages/introduction.adoc b/solr/solr-ref-guide/modules/getting-started/pages/introduction.adoc index c29091ac2a6f..35b4183ba3ad 100644 --- a/solr/solr-ref-guide/modules/getting-started/pages/introduction.adoc +++ b/solr/solr-ref-guide/modules/getting-started/pages/introduction.adoc @@ -40,7 +40,7 @@ Any platform capable of HTTP can talk to Solr. Several xref:deployment-guide:client-apis.adoc[] are provided for use in common programming languages. In addition to providing a network accessible engine for Lucene based document retrieval, Solr provides the ability to scale beyond the limitations of a single machine. -Indexes can be sharded and replicated for performance and reliability, using either one of two xref:deployment-guide:cluster-types.adoc[]. +Indexes can be sharded and replicated for performance and reliability, using either one of two xref:cluster-types.adoc[]. The most scalable option uses https://zookeeper.apache.org/[Apache Zookeeper^TM^] to coordinate management activities across the cluster. The older approach requires no supporting infrastructure, however instances are managed directly by administrators. Solr scaling and high availability features are so effective that some of the largest and most famous internet sites use Solr. diff --git a/solr/solr-ref-guide/modules/getting-started/pages/solr-admin-ui.adoc b/solr/solr-ref-guide/modules/getting-started/pages/solr-admin-ui.adoc index d23d3c12192f..d635a1d545bd 100644 --- a/solr/solr-ref-guide/modules/getting-started/pages/solr-admin-ui.adoc +++ b/solr/solr-ref-guide/modules/getting-started/pages/solr-admin-ui.adoc @@ -33,7 +33,7 @@ The left-side of the screen is a menu under the Solr logo that provides the navi The first set of links are for system-level information and configuration and provide access to xref:deployment-guide:configuring-logging.adoc#logging-screen[Logging Screen], xref:deployment-guide:collections-core-admin.adoc[], and xref:deployment-guide:jvm-settings.adoc#java-properties-screen[Java Properties Screen], among other things. At the end of this information is at least one pulldown listing Solr cores configured for this instance. -On xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud] nodes, an additional pulldown list shows all collections in this cluster. +On xref:cluster-types.adoc#solrcloud-mode[SolrCloud] nodes, an additional pulldown list shows all collections in this cluster. Clicking on a collection or core name shows secondary menus of information for the specified collection or core, such as a xref:indexing-guide:schema-browser-screen.adoc[], xref:configuration-guide:configuration-files.adoc#files-screen[Files Screen], xref:deployment-guide:plugins-stats-screen.adoc[], and a xref:query-guide:query-screen.adoc[] on indexed data. The left-side navigation appears on every screen, while the center changes to the detail of the option selected. @@ -98,7 +98,7 @@ image::solr-admin-ui/schema-designer.png[image] .Only Visible When Using SolrCloud [NOTE] ==== -The Schema Designer is only available on Solr instances running xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud]. +The Schema Designer is only available on Solr instances running xref:cluster-types.adoc#solrcloud-mode[SolrCloud]. ==== == Collection-Specific Tools @@ -108,7 +108,7 @@ In the left-hand navigation bar, you will see a pull-down menu titled Collection .Only Visible When Using SolrCloud [NOTE] ==== -The Collection Selector pull-down menu is only available on Solr instances running xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud]. +The Collection Selector pull-down menu is only available on Solr instances running xref:cluster-types.adoc#solrcloud-mode[SolrCloud]. User-managed clusters or single-node installations will not display this menu, instead the Collection specific UI pages described in this section will be available in the <>. ==== diff --git a/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc b/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc index 9c5986f70d04..ae786a799a4e 100644 --- a/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc +++ b/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc @@ -184,7 +184,7 @@ In SolrCloud, a logical partition of a single <>. Every shard consists of at least one physical <>, but there may be multiple Replicas distributed across multiple <> for fault tolerance. See also <>. -[[solrclouddef]]xref:deployment-guide:cluster-types.adoc#solrcloud-mode[SolrCloud]:: +[[solrclouddef]]xref:cluster-types.adoc#solrcloud-mode[SolrCloud]:: Umbrella term for a suite of functionality in Solr which allows managing a <> of Solr <> for scalability, fault tolerance, and high availability. [[schema]]xref:indexing-guide:schema-elements.adoc[Solr Schema (managed-schema.xml or schema.xml)]:: diff --git a/solr/solr-ref-guide/modules/query-guide/pages/result-grouping.adoc b/solr/solr-ref-guide/modules/query-guide/pages/result-grouping.adoc index cc3b824fdce1..1c9192dabc82 100644 --- a/solr/solr-ref-guide/modules/query-guide/pages/result-grouping.adoc +++ b/solr/solr-ref-guide/modules/query-guide/pages/result-grouping.adoc @@ -383,7 +383,7 @@ This is because one result for "memory" did not have a price assigned to it. == Distributed Result Grouping Caveats -Grouping is supported for xref:deployment-guide:cluster-types.adoc#solrcloud-mode[distributed searches], with some caveats: +Grouping is supported for xref:getting-started:cluster-types.adoc#solrcloud-mode[distributed searches], with some caveats: * Currently `group.func` is not supported in any distributed searches * `group.ngroups` and `group.facet` require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. diff --git a/solr/solr-ref-guide/modules/query-guide/pages/spell-checking.adoc b/solr/solr-ref-guide/modules/query-guide/pages/spell-checking.adoc index fea8e94c5bfc..9b122bbfaf92 100644 --- a/solr/solr-ref-guide/modules/query-guide/pages/spell-checking.adoc +++ b/solr/solr-ref-guide/modules/query-guide/pages/spell-checking.adoc @@ -580,7 +580,7 @@ s|Required |Default: none |=== + Specifies the shards in your distributed indexing configuration. -For more information about distributed indexing, see xref:deployment-guide:cluster-types.adoc[]. +For more information about distributed indexing, see xref:getting-started:cluster-types.adoc[]. `shards.qt`:: + From c807443dd949c6934464fc55969ffae4948fe979 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Sun, 29 Mar 2026 08:35:19 -0400 Subject: [PATCH 3/6] place holder changelog, will update if we make any big moves --- .../SOLR-18179-move-cluster-concepts-section.yml | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 changelog/unreleased/SOLR-18179-move-cluster-concepts-section.yml diff --git a/changelog/unreleased/SOLR-18179-move-cluster-concepts-section.yml b/changelog/unreleased/SOLR-18179-move-cluster-concepts-section.yml new file mode 100644 index 000000000000..103e0aabb961 --- /dev/null +++ b/changelog/unreleased/SOLR-18179-move-cluster-concepts-section.yml @@ -0,0 +1,7 @@ +title: Move cluster concepts into Getting Started in Ref Guide +type: changed +authors: + - name: Eric Pugh +links: + - name: SOLR-18179 + url: https://issues.apache.org/jira/browse/SOLR-18179 From 6c87297cac3a816d4fb04e670f882eb2c5fd5593 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Sun, 29 Mar 2026 09:00:42 -0400 Subject: [PATCH 4/6] Bring in new concept of servers and nodes, rework definitions to try and be more explicit --- .../getting-started/pages/cluster-types.adoc | 119 ++++++++++++------ 1 file changed, 82 insertions(+), 37 deletions(-) diff --git a/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc b/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc index 44a03d3e805a..ca72dec85614 100644 --- a/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc +++ b/solr/solr-ref-guide/modules/getting-started/pages/cluster-types.adoc @@ -16,7 +16,7 @@ // specific language governing permissions and limitations // under the License. -A Solr cluster is a group of servers (_nodes_) that each run Solr. +A Solr cluster is a group of servers that each run one or more Solr _nodes_. There are two general modes of operating a cluster of Solr nodes. One mode provides central coordination of the Solr nodes (<>), while the other allows you to operate a cluster without this central coordination (<>). @@ -29,38 +29,83 @@ First let's cover a few general concepts and then outline the differences betwee == Cluster Concepts +=== Servers and Nodes + +A _server_ is the hardware or virtual machine that hosts Solr software. +A _node_ is an instance of a running Solr process that services search and indexing requests. +Large servers may run multiple Solr nodes, though typically one node per server is most common. + === Shards -In both cluster modes, a single logical index can be split across nodes as _shards_. -Each shard contains a subset of overall index. +In both cluster modes, a logical collection of documents can be divided across nodes as _shards_. +Each shard represents a logical slice of the overall collection and contains a subset of the documents. -The number of shards dictates the theoretical limit to the number of documents that can be indexed to Solr. -It also determines the amount of parallelization possible for an individual search request. +The number of shards determines the theoretical limit to the number of documents that can be stored. +It also dictates the amount of parallelization possible for an individual search request. === Replicas -In order to provide some failover for each shard, each shard can be copied as a _replica_. -A replica has the same configuration as the shard and any other replicas for the same index. +A shard is a logical concept—a slice of your collection. +A _replica_ is the physical manifestation of that logical shard. +It is the actual running instance that holds and serves the documents belonging to that shard. + +A shard must have at least one replica to exist physically. +If you have one shard with one physical copy, you have one replica. +If you add redundancy by creating additional copies of that shard, you have multiple replicas—each is equally a replica, including the first one. -It's possible to have replicas without having created shards. -In this case, each replica would be a full copy of the entire index, instead of being only a copy of a part of the entire index. +IMPORTANT: There is no "original shard" separate from its replicas. +The replicas ARE how the shard exists. +This is why we say "a shard with 2 replicas" has 2 total physical copies, not an original plus 2 additional copies. -The number of replicas determines the level of fault tolerance the entire cluster has in the event of a node failure. +All replicas of the same shard contain the same subset of documents and share the same configuration. + +The number of replicas determines the level of fault tolerance the cluster has in the event of a node failure. It also dictates the theoretical limit on the number of concurrent search requests that can be processed under heavy load. -=== Leaders +=== Leaders and Followers -Once replicas have been created, a _leader_ must be identified. -The responsibility of the leader is to be a source-of-truth for each replica. -When updates are made to the index, they are first processed by the leader and then by each replica (the exact mechanism for how this happens varies). +Among the replicas for a given shard, one replica is designated as the _leader_. +The leader serves as the source-of-truth for its shard. +When document updates are made, they are first processed by the leader replica and then propagated to the other replicas (the exact mechanism varies by cluster mode). -The replicas which are not leaders are _followers_. +The replicas which are not leaders are called _followers_. === Cores -Each replica, whether it is a leader or a follower, is called a _core_. +In Solr's implementation, each replica is represented as a _core_. +The term "core" is primarily an internal implementation detail—when you create a replica, Solr creates a core to represent it. Multiple cores can be hosted on any one node. +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will help clarify discussions about Solr's architecture. + +=== Collections and Indexes + +A _collection_ is the complete logical set of searchable documents that share a schema and configuration. +In SolrCloud mode (described below), a collection encompasses all the shards and their replicas. + +An _index_ refers to the physical data structures written to disk by Apache Lucene. +Each core (replica) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. + +This creates a clear hierarchy from logical concepts to physical storage: + +[source,text] +---- +Collection (logical grouping of all searchable documents) + └─> Shard 1 (logical partition) + │ └─> Replica 1 / Core 1 (physical instance) + │ │ └─> Lucene Index (disk structures) + │ └─> Replica 2 / Core 2 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Shard 2 (logical partition) + └─> Replica 1 / Core 3 (physical instance) + │ └─> Lucene Index (disk structures) + └─> Replica 2 / Core 4 (physical instance) + └─> Lucene Index (disk structures) +---- + +In this example, a collection is divided into 2 shards, each shard has 2 replicas for redundancy, and each replica maintains its own Lucene index on disk. + == SolrCloud Mode SolrCloud mode (also called "SolrCloud") uses Apache ZooKeeper to provide the centralized cluster management that is its main feature. @@ -69,45 +114,45 @@ ZooKeeper tracks each node of the cluster and the state of each core on each nod In this mode, configuration files are stored in ZooKeeper and not on the file system of each node. When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure each node knows changes have been made. -SolrCloud introduces an additional concept, a _collection_. -A collection is the entire group of cores that represent an index: the logical shards and the physical replicas for each shard. -Collections all share the same configurations (schema, `solrconfig.xml`, etc.). -This is an additional centralization of the cluster management, as operations can be performed on the entire collection at one time. +SolrCloud manages collections as first-class entities. +A collection represents the entire group of shards and replicas that together provide access to a corpus of documents. +Collections share the same configurations (schema, `solrconfig.xml`, etc.). +This centralization of cluster management means that operations can be performed on the entire collection at one time. -When changes are made to configurations, a single command to reload the collection would automatically reload each individual core that is a member of the collection. +When changes are made to configurations, a single command to reload the collection will automatically reload each individual core (replica) that is a member of the collection. Sharding is handled automatically, simply by telling Solr during collection creation how many shards you'd like the collection to have. -Index updates are then generally balanced between each shard automatically. +Document updates are then generally balanced between each shard automatically. Some degree of control over what documents are stored in which shards is also available, if needed. ZooKeeper also handles load balancing and failover. Incoming requests, either to index documents or for user queries, can be sent to any node of the cluster and ZooKeeper will route the request to an appropriate replica of each shard. -In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case of failure in the leader. -This means another core can become the leader, and from that point forward it is the source-of-truth for all replicas. +In SolrCloud, the leader is flexible, with built-in mechanisms for automatic leader election in case the current leader fails. +This means another replica can become the leader, and from that point forward it is the source-of-truth for all other replicas of that shard. As long as one replica of each relevant shard is available, a user query or indexing request can still be satisfied when running in SolrCloud mode. == User-Managed Mode -Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for to be performed manually or with local scripts. +Solr's user-managed mode requires that cluster coordination activities that SolrCloud normally uses ZooKeeper for be performed manually or with local scripts. -If the corpus of documents is too large for a single-sharded index, the logic to create shards is entirely left to the user. +If the corpus of documents is too large for a single shard, the logic to create multiple shards is entirely left to the user. There are no automated or programmatic ways for Solr to create shards during indexing. -Routing documents to shards are handled manually, either with a simple hashing system or a simple round-robin list of shards that sends each document to a different shard. +Routing documents to shards is handled manually, either with a simple hashing system or a simple round-robin list of shards that sends each document to a different shard. Document updates must be sent to the right shard or duplicate documents could result. -In user-managed mode, the concept of leader and follower becomes critical. -Identifying which node will host the leader replica and which host(s) will be replicas dictate how each node is configured. -In this mode, all index updates are sent to the leader only. -Once the leader has completed indexing, the replica will request the index updates and copy them from the leader. +In user-managed mode, the distinction between leader and follower replicas becomes critical. +Identifying which node will host the leader replica and which host(s) will have follower replicas dictates how each node is configured. +In this mode, all document updates are sent to the leader replica only. +Once the leader has completed indexing, each follower replica will request the index updates and copy them from the leader. -Load balancing is achieved with an external tool or process, unless request traffic can be managed by the leader or one of its replicas alone. +Load balancing is achieved with an external tool or process, unless request traffic can be managed by the leader or one of its follower replicas alone. -If the leader goes down, there is no built-in failover mechanism. -A replica could continue to serve queries if the queries were specifically directed to it. -Changing a replica to serve as the leader would require changing `solrconfig.xml` configurations on all replicas and reloading each core. +If the leader replica goes down, there is no built-in failover mechanism. +A follower replica could continue to serve queries if the queries were specifically directed to it. +Promoting a follower replica to serve as the leader would require changing `solrconfig.xml` configurations on all replicas and reloading each core. -User-managed mode has no concept of a collection, so for all intents and purposes each Solr node is distinct from other nodes. -Only some configuration parameters keep each node from behaving as independent entities. +User-managed mode has no concept of a collection as a managed entity, so for all intents and purposes each Solr core is configured and managed independently. +Only configuration parameters keep related cores from behaving as completely independent entities. \ No newline at end of file From a7735c70c71f2ea9bc9725fe62f2b71b2e0ccec1 Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Sun, 29 Mar 2026 09:04:58 -0400 Subject: [PATCH 5/6] expand/update/add to glossary based on the solr cluster concepts --- .../getting-started/pages/solr-glossary.adoc | 53 ++++++++++++++----- 1 file changed, 40 insertions(+), 13 deletions(-) diff --git a/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc b/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc index ae786a799a4e..a2c36dd8b5f1 100644 --- a/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc +++ b/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc @@ -50,22 +50,31 @@ A cluster may contain many collections. See also <>. [[collection]]Collection:: -In Solr, one or more <> grouped together in a single logical index using a single configuration and Schema. +The complete logical set of searchable documents that share a schema and configuration. + -In <> a collection may be divided up into multiple logical shards, which may in turn be distributed across many nodes. +In <>, a collection may be divided up into multiple logical <>, which may in turn be distributed across many <> for scalability and fault tolerance. +Each collection encompasses all the shards and their <>. + -Single-node installations and user-managed clusters use instead the concept of a <>. -"Collection" is most frequently used in the SolrCloud context, but as it represents a "logical index", the term may be used to refer to individual cores in a user-managed cluster as well. +Single-node installations and user-managed clusters do not manage collections as first-class entities; instead they work directly with individual <>. + [[defcommit]]Commit:: To make document changes permanent in the index. In the case of added documents, they would be searchable after a _commit_. [[core]]Core:: -An individual Solr instance (represents a logical index). -Multiple cores can run on a single node. +In Solr's implementation, a core is the physical instance that represents a <>. +When you create a replica, Solr creates a core to represent it. +Multiple cores can be hosted on a single <>. +Each core maintains exactly one Lucene <> on disk. ++ +NOTE: The term "core" can be confusing because in everyday English it implies something central and singular, but in Solr it actually refers to one of potentially many replicas distributed across the cluster. +In most contexts, thinking of "core" as synonymous with "replica" will clarify discussions about Solr's architecture. ++ See also <>. +[[corpus]]Corpus:: +The set of documents available for indexing, irrespective of whether or not they are currently indexed in Solr. + [[corereload]]Core reload:: To re-initialize a Solr core after changes to the schema file, `solrconfig.xml` or other configuration files. @@ -96,6 +105,11 @@ The arrangement of search results into categories based on indexed terms. [[field]]Field:: The content to be indexed/searched along with metadata defining how the content should be processed by Solr. +[[follower]]Follower:: +A <> that is not the <> for its <>. +Follower replicas receive index updates from the leader replica and serve queries. +See also <>. + [[SolrGlossary-I]] === I @@ -105,6 +119,10 @@ It is calculated as the number of total Documents divided by the number of Docum See http://en.wikipedia.org/wiki/Tf-idf and {lucene-javadocs}/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html[the Lucene TFIDFSimilarity javadocs] for more info on TF-IDF based scoring and Lucene scoring in particular. See also <>. +[[index]]Index:: +The physical data structures written to disk by Apache Lucene. +Each <> (<>) maintains exactly one Lucene index on disk, containing the actual inverted indexes, stored fields, and other data structures that enable search. + [[invertedindex]]Inverted index:: A way of creating a searchable index that lists every word and the documents that contain those words, similar to an index in the back of a book which lists words and the pages on which they can be found. When performing keyword searches, this method is considered more efficient than the alternative, which would be to create a list of documents paired with every word used in each document. @@ -114,8 +132,8 @@ Since users search using terms they expect to be in documents, finding the term === L [[leader]]Leader:: -A single <> for each <> that takes charge of coordinating index updates (document additions or deletions) to other replicas in the same shard. -This is a transient responsibility assigned to a node via an election, if the current Shard Leader goes down, a new node will automatically be elected to take its place. +A single <> for each <> that serves as the source-of-truth and coordinates index updates (document additions or deletions) to the <> replicas in the same shard. +This is a transient responsibility assigned to a replica via an election; if the current leader goes down, another replica will automatically be elected to take its place. See also <>. [[SolrGlossary-M]] @@ -132,8 +150,8 @@ Metadata is information about a document, such as its title, author, or location A search that is entered as a user would normally speak or write, as in, "What is aspirin?" [[node]]Node:: -A JVM instance running Solr. -Also known as a Solr server. +An instance of a running Solr process that services search and indexing requests. +A node is a JVM instance running Solr on a <>. [[SolrGlossary-O]] === O @@ -163,7 +181,11 @@ The ability of a search engine to retrieve _all_ of the possible matches to a us The appropriateness of a document to the search conducted by the user. [[replica]]Replica:: -A <> that acts as a physical copy of a <> in a <> <>. +The physical manifestation of a logical <>. +A replica is the actual running instance (represented as a <>) that holds and serves the documents belonging to that shard. +A shard must have at least one replica to exist physically, and may have multiple replicas for redundancy and fault tolerance. +All replicas of the same shard contain the same subset of documents. +See also <>. [[replication]]xref:deployment-guide:user-managed-index-replication.adoc[Replication]:: @@ -179,9 +201,14 @@ Logic and configuration parameters that tell Solr how to handle incoming "reques Logic and configuration parameters used by request handlers to process query requests. Examples of search components include faceting, highlighting, and "more like this" functionality. +[[server]]Server:: +The hardware or virtual machine that hosts Solr software. +A server may run one or more Solr <>. + [[shard]]Shard:: -In SolrCloud, a logical partition of a single <>. -Every shard consists of at least one physical <>, but there may be multiple Replicas distributed across multiple <> for fault tolerance. +A logical slice of a <>. +Each shard represents a logical partition containing a subset of the collection's documents. +A shard exists physically as one or more <>, which may be distributed across multiple <> for fault tolerance and scalability. See also <>. [[solrclouddef]]xref:cluster-types.adoc#solrcloud-mode[SolrCloud]:: From 48e1f9d3bd61179dcc93d47ffde25b483b03185b Mon Sep 17 00:00:00 2001 From: Eric Pugh Date: Sun, 29 Mar 2026 09:13:50 -0400 Subject: [PATCH 6/6] add in common terms standalone and user managed. --- .../getting-started/pages/solr-glossary.adoc | 22 ++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc b/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc index a2c36dd8b5f1..2216438699af 100644 --- a/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc +++ b/solr/solr-ref-guide/modules/getting-started/pages/solr-glossary.adoc @@ -25,7 +25,7 @@ Where possible, terms are linked to relevant parts of the Solr Reference Guide f *Jump to a letter:* -<> <> <> <> <> <> G H <> J K <> <> <> <> P <> <> <> <> U V <> X Y <> +<> <> <> <> <> <> G H <> J K <> <> <> <> P <> <> <> <> <> V <> X Y <> [[SolrGlossary-A]] @@ -44,10 +44,10 @@ These control the inclusion or exclusion of keywords in a query by using operato [[SolrGlossary-C]] === C -[[cluster]]Cluster:: +[[cluster]]xref:cluster-types.adoc[Cluster]:: In Solr, a cluster is a set of Solr nodes operating in coordination with each other via <>, and managed as a unit. A cluster may contain many collections. -See also <>. +See also xref:cluster-types.adoc[] and <>. [[collection]]Collection:: The complete logical set of searchable documents that share a schema and configuration. @@ -240,6 +240,12 @@ Synonyms generally are terms which are near to each other in meaning and may sub In a search engine implementation, synonyms may be abbreviations as well as words, or terms that are not consistently hyphenated. Examples of synonyms in this context would be "Inc." and "Incorporated" or "iPod" and "i-pod". +[[standalone]]Standalone:: +An informal term referring to Solr deployments that do not use <> mode. +This includes both single-node installations and <> clusters. +In source code and documentation, "Standalone" may refer to either <> or single-node deployments. +See also xref:cluster-types.adoc[] and <>. + [[SolrGlossary-T]] === T @@ -252,6 +258,16 @@ See also <>. An append-only log of write operations maintained by each <>. This log is required with SolrCloud implementations and is created and managed automatically by Solr. +[[SolrGlossary-U]] +=== U + +[[usermanaged]]xref:cluster-types.adoc#user-managed-mode[User-Managed Mode]:: +A mode of operating a Solr <> without the centralized coordination provided by <> in <> mode. +In user-managed mode, cluster coordination activities must be performed manually or with local scripts. +This includes shard creation, document routing, leader/follower configuration, and load balancing. +Also known as <> mode. +See also xref:cluster-types.adoc[]. + [[SolrGlossary-W]] === W