Can Magento really handle over 500,000 products? If yes, how?
When choosing an ecommerce platform for a large product catalog, merchants naturally want to be sure whether Magento is the right choice for them, so the question arises: How many products can Magento actually manage? It is hard to give a general answer but it is definitely possible to provide some practical guidelines.
In an effort to give some useful hints, let us try to find an answer to some more precise questions:
- How many products the different versions of Magento can manage easily by default, hosted on an average server, with and without extra optimizations?
- What makes the difference between Magento versions in terms of catalog performance?
- What are the main bottlenecks for a big catalog in Magento, and how can Magento be scaled to handle even more products?
- What are the common mistakes that one can make when handling a huge Magento catalog?
How many products the different versions of Magento can manage easily by default, hosted on an average server, with and without extra scaling?
- Magento CE 1.9.x safely manages ~10,000 – 25,000 products in most cases, without much extra care*. On the other hand it can manage 100,000 – 200,000 products, or even more via heavy, system wide scaling, resource tuning and code optimization.
- Magento EE 13.x. 14.x safely manages ~100,000 – 200,000 products in most cases, without much extra care*, and even 400,000 – 500,000 or even more with proper scaling, optimization and server resources.
- Magento 2 CE safely manages ~ 100,000 – 200,000 products in most cases, without much scaling and extra care, and even 400,000 – 500,000 or even more with proper scaling, optimization and server resources.
- Magento 2 EE is designed to be able to manage even more**, depending on some highly advanced enterprise features like database sharding, job queues, advanced mysql and web server topologies and proper resources.
All these are very rough numbers, of course, referring to a catalog with a few attributes, categories and simple products. The figures are based on Magento’s own performance tests and our own experiences. The figures may vary greatly by the server setup, software and resources, too.
What is also very important to note is that due to Magento’s database design, there are some especially massive aspects that act as multiplication factors to get to the real number of products Magento really works with in the background.
Some of most important elements that have a huge impact are:
- number of Magento shops/languages
- number of product attributes
- number of categories and depth of the category tree
- number of configurable/bundle products
- number of customer groups with different product prices
All this means that as far as catalog management is concerned, a Magento catalog with a few thousand products in a heavyweight catalog setup, for example ~20 languages, diverse attributes and category structure, mostly with configurable products and multiple customer price groups, may equal to a Magento catalog with 100,000+ simple products in a single store.
These are the features that make Magento really flexible, but they come at a cost.
What makes the difference between these Magento versions in term of catalog performance?
- Magento CE 1.x, including 1.9.x and
- – indexing, especially URL and search indexes are not optimized for large catalogs (this also applies to EE up to 1.12.x).
- – No Full Page Cache (FPC) is available by default
- Magento 1.x EE
- + FPC is available, which speeds up catalog browsing and saves a lot of server resources
- + From version 1.13+.x incremental indexing is introduced whereby products that were changed or added will be re-indexed in the background by cron jobs.
- + From version 1.13+.x full reindex processes are highly improved as well and work well even for large catalogs.
- + Solr search engine is available by default
- Magento 2 CE
- + Inherits incremental indexing feature from Magento EE 1.13+.x
- + Inherits FPC from EE 1.13+.x and Varnish frontend cache is added as a choice of FPC. The requests that are served by the Varnish cache never need to reach the Magento application servers, which reduces the load on the web nodes while dramatically improving the response time.
- + Browser cache is utilized for session data caching
- + Checkout process is improved greatly
- + Async order and product updates
- + Client side optimizations like minification, js resources bundling, caching static content, image compression
- + PHP 7 is supported by default. PHP 7 may have even 200% performance gain over PHP 5.x by itself.
- Magento 2 EE
- + Has all the features of Magento 2 CE
- + Solr (2.0) and Elasticsearch (2.1) for search
- + Database sharding separating catalog, checkout and order business domains is available
- + Mysql cluster and Multi-master mysql setup is supported
- + Job queues introduced for advanced background data processing (deferred stock update as the first implementation)
What are the main bottlenecks for a big catalog in Magento and how can Magento be scaled to handle even more products?
- Hosting – a big catalog obviously needs more resources, a VPS with plenty of memory and a multicore processor is a must, a multi-node server setup may be needed.
- Server software – Nginx as a web server and PHP 7 and Mysql 5.6 or equivalent (Percona/MariaDb) are highly recommended, even for Magento 1. Fine tuning of these pieces of server software for Magento is essential in this case.
- Product Import – optimized and tailored product import and update is one of the key elements in this case. Tools that enable batch database updates are extremely helpful. Any method or tool that uses single product updates is a huge potential bottleneck. Magmi is one of the best choices here.
- Some other rules of thumb:
- Save only what has changed
- Use dedicated resources for import processes
- Separate price, stock and basic product data import
- Re-index only products and product data that needs to be reindexed
- Indexing – Indexing in Magento is a second step of saving product data and it is the trickiest part when having a huge catalog. It contains of a series of processes to copy product data from database tables optimized for data storage to tables optimized for different aspects of frontend data access. Since indexing is “only” needed for Magento frontend features, it is possible to separate indexing from product save or import. In the newer versions of Magento 1 EE and Magento 2 CE and EE incremental background indexing is introduced, which speeds up working in the admin but may still not be ideal for large volume product updates.
- Indexing in Magento 1 CE is one of the greatest bottlenecks.
- URL indexing tends to provide the most issues, partly because it can be bloated to millions of records, partly because it runs for a long time and in Magento 1 CE it is not optimized for big catalogs.
- Beyond a certain catalog size and number of Magento stores the increased overhead of product flat indexes will outweigh its benefits so it tends to be better not to turn on this feature for a big catalog.
- Indexing in Magento 1 CE is one of the greatest bottlenecks.
- Catalog search
- Default MYSQL fulltext search is resource greedy, indexing and the actual search on the frontend tends to be slow and its accuracy and the relevance of the result set is fairly poor. So it has to be replaced, even in Magento 1. There are some good, even free alternatives for Magento 1 to replace MYSQL search with Solr, Elasticsearch or Sphinx. Magento 1 EE has Solr and Magento 2 from 2.1 on Elasticsearch by default. Some 3rd party Elasticsearch and Solr 3rd extensions can replace the default MYSQL layered navigation engine at the same time, which can be a huge benefit.
- Full Page Cache
- Full Page Cache (FPC) is a mechanism whereby html pages generated by the server softwares are cached as a whole. Next time the same web page is required, the cached version is returned without the need to regenerate the content. In Magento 1 CE has no FPC by default and in Magento 1 EE caching is managed by Magento code itself. It saves a lot of resources and results in greater speed, but the best way to do FPC is by implementing caching in a layer in front of Magento without the need to touch Magento at all when a cached content is served. This is accomplished by Varnish caching in Magento 2. Though Magento 1 CE has no FPC and Varnish, there are good extensions to implement these features, too.
- Application Cache
- Magento heavily relies on different types of config and application caches, Redis, a memory based, scalable application cache is highly recommended with tag management. The extension to handle Redis is built into Magento starting from the latest version of Magento 1 CE.
To sum it up for different Magento versions:
- Common bottlenecks are product import and indexing in the backend, search and layered navigation in the frontend. Checkout and Order management are also important factors, but these are more strongly related to the number of visitors and transactions.
- There is a big chance that batch product import is to be tailored and optimized in all Magento versions – feature rich and optimized product import is still to be solved even in Magento 2 EE.
- Magento 1 CE is the platform least prepared to handle big catalogs, but through heavy scaling, proper hosting and 3rd party extensions that provide better indexing, search and caching, it can be considerably upscaled. Indexing may still remain a bottleneck, however.
- Magento 1 EE has a bunch of optimizations already in place, to implement Varnish cache and fine tune the server architecture are the best candidates to scale it up even further.
- Magento 2 CE is designed in a way that it should be prepared to serve middle-sized businesses, too, just like Magento 1 EE. Functionality wise it lacks enterprise features, such as store credits, better CMS management etc., but as far as performance is concerned it is on pair with Magento 1 EE – or even better due to Varnish. An obvious way to scale it up is to utilize built in optimization options, fine tune server architecture and resources and implement Elasticsearch or Solr for catalog search.
- Magento 2 EE targets enterprises even beyond the middle-size range and aims to offer a highly scalable architecture that utilizes the benefits of cloud computing.
What are the common mistakes that one can make when having a huge Magento catalog?
In addition to failing to missing the points above, there are some additional mistakes that one can make when having a huge Magento catalog.
- Underscaling – Maybe the most important advice here is that it is very important to build a live Magento store in a way that it has plenty of system reserves and there is a proven way to scale it up quickly if needed. Performance tests throughout the development cycles are very useful so that the limits of the system are known.
- Too many / poor quality 3rd party or custom extensions – 3rd party extensions are not always built for and tested to be performant for large catalogs and even a small oversight in the extension design can result in performance disasters.
- Lack of proper monitoring – System monitoring is essential so that issues and bottlenecks can be discovered and eliminated in time.
Some technical details behind the hints
In the section below we try to provide some details on Magento concepts and terms for the better understanding of the subject.
Data Storage in Magento
One point is that Magento is very complex and its features are virtually maxing out the limits of a PHP/MYSQL based system. Products are really massive objects, with many aspects to them from its really versatile attribute scheme through prices, images, categories, product options to different product relations.
What is more, all these properties may have as many different layers of values as many stores and languages are used. On the top of that, these aspects are potentially extended by a number of 3rd party extensions. All this means that saving and updating products in the database is a complex action.
Indexes – Preparing for Magento’s Frontend Data Access
The other point is that the structure of the database tables where product data is primarily saved is optimized for flexible data storage but, due to this complexity and the nature of relational databases, not optimized for the different types of data retrieval at the same time. To save the day, Magento introduced the so-called index tables that are populated by the process called indexing.
Most indexes, like URL, category/product relation, price, and stock and attribute indexes, as well as search index are pretty much essential for a stock Magento to work. There are some other indexes that are optional such as catalog flat and product flat indexes, which flatten the EAV and multi store data to dedicated store tables and single product rows.
Indexing in Magento has two faces. On the one hand, it enables Magento’s most powerful features to work and optimizes data access. On the other hand, it makes even more complex, time consuming and resource greedy to store product data in a usable way for the frontend.
- Product Attributes Index – By default, it is essential so that layered navigation works. It copies product/attribute options data to a table structure that is optimal for finding products based on the different attribute options they have. The number of attribute options are multiplied by the number of stores/languages Magento has.
- Product Price Index – By default, It is essential so that sorting and filtering by product prices work. Any Tier prices, configurable, bundle and products complicate this calculation to a great extent since the minimal price of complex product types depends on the price of the constituent products. The number of Magento websites and price groups act as further complicating factors here.
- Catalog URL Rewrites Index – By default, it is essential so that SEO friendly URLs and redirection from old URLs to new ones work. The depth and size of the category structure, the number of old URLs and stores/languages has a huge affect on this table. The size of this table can be blown up to millions of records and keep continuously growing even in the case of a catalog with a few thousand products and few languages. In Magento 1 CE and Magento 1 EE up to 12.x this is definitely the single most problematic indexer as far as big catalogs are concerned.
- Category Products Index – optimizes product filtering based on categories by storing catalog-product relations in a separate table. Again, it is an essential index for the frontend. The fact whether layered navigation is used for a category (“is anchor”) or not has an effect on these product relations.
- Catalog Search Index (fulltext) – It is essential for default MYSQL-based search to work. It merges the text of product attributes and option labels of individual products so that it is searchable by the MYSQL fulltext engine. Here again, as many stores there are, so many index rows are created for a single product.
- Stock Status Index – This is for calculating the fact whether product is purchasable in Magento, which may be governed by the mixture of some global, website and product level values and settings.
- Product Flat Data Index – This index is optional and was added at a later point in the evolution of Magento to speed up product listing/sorting and spare server resources. The way it works is to copy product attribute values which normally could only be retrieved by huge queries joining multiple tables into a flat structure with only one record per product and store.
- Magento EAV
- EAV stands for Entity-Attribute-Value. Is a dynamic attribute management pattern which enables adding/removing/modifying product attributes like colour, manufacturer etc., without changing the structure of the database tables. This is a very powerful and user friendly feature and Magento has a number of configurational options for custom attributes out of the box.
- Magento multi store
- It is a unique Magento feature that a single Magento database can manage multiple stores in a way that attributes can have dedicated store level values that may override default values. This, again, is a highly user friendly approach and makes it easy, for example, to create different language versions of the same catalog only by changing the description of the products and label of the product options.
- Magento indexing
- Indexing in Magento is a process whereby data is copied from database tables optimized for data storage to tables optimized for frontend data access. Different indexes are used for different types of data access. Indexing processes are built to speed up the shop and generally provide a huge benefit, however make data storage more complex and resource greedy. Especially in the case of full reindexing, which is sometimes inevitable, and any Magento shop should be prepared to be able to perform a full index rebuild, it may take a long time and require huge resources to reindex all product data.