How to implement extremely fast custom indexing in Magento?

Nowadays users are quite demanding in terms of applications. It is important to them to find their way around on the user interface and to be able to anticipate what they are going to see if they click or tap somewhere. They also expect to find relevant information as quickly as possible, and not to wait long for page downloads. Therefore the developer needs to make compromises right in the planning phase of the application. To put it simply, the aim is to process, store and display the data as efficiently and as fast as possible.



We display a part of the data, located in the database, to the user unprocessed, i.e. in a raw format, other data we display processed, and there are also data that we do not show at all. The data should be stored decomposed into the smallest sections possible. This is one of the basic conditions of database normalization, which determines that one column of a database table represents one basic value.

A result of a normalized database is storing information efficiently, but it also slows down processing and displaying. In the case of a non-normalized database anomalies will occur, so normalization is simply a must. However, we’d still like to display data to the user fast.

Solution: we organize the data into a joint table, from which we can display the data without processing. The table containing the collective data is the flat table and the procedure which manages it is indexing. In this article we introduce the aspects with which, using Magento’s framework system, we can speed up processing and displaying without compromising on database normalization.

 

This article includes the following topics:

  • Anomalies of a non-normalized database

  • Requirements related to the data indexer

  • Event oriented indexing

  • Indexing in Magento

  • Summary

 

magento indexing flat table developer

 

Anomalies of a non-normalized database

 

We are going to touch on the reasons and consequences of anomalies without describing how to handle them.

A prerequisite of a normalized database is that all the columns of the table, i.e. every element of the record, represent one basic value each. There must not be identical rows and the order of rows should not refer to any information.

 

Three types of anomaly can occur in the case of a non-normalized database:

  1. Modification
  2. Insertion
  3. Deletion

 

  1. A Modification anomaly occurs when an attribute is present in multiple tables. In this case, a modification needs to take place in several locations. If it does not take place that way, our database will be inconsistent.
  2. Insertion anomaly occurs when, because of a missing piece of information, we cannot insert a row. The result of this is loss of information.
  3. If we delete data that we would still need, it is a case of Deletion anomaly. Here too, loss of information is a consequence.

In order to avoid anomalies, we need to normalize our database according to database normal forms.

 

Indexer requirements

 

The task of the indexer is to speed up data display. But, as we could see above, the indexed table does not contain basic data but processed data. Thus it will fail on a basic level in a normalization test. Therefore it needs to match certain requirements.

 

The main requirement is the following:

  • If we delete a flat table from the system, is should not cause any anomaly.
  • The system should work even without flat tables. -> Flat tables are not part of the system.

 

The first requirement can be met if only the indexing procedure manages the table and we delete the table before every indexing.

What happens if we have deleted such data that would be needed?

The answer is trivial. We cannot delete such data, the flat table is not part of the system, it only helps to display them. The system should operate even without it.

tips Example: We compose the flat table, but we delete a record in the meantime, which is not needed. In this case, the data in the flat table are not up-to-date anymore. When we reindex, there will be data that are not needed anymore. If we do not delete the flat table, then we should check if the given values are needed or not. This would be a lengthy and complex process. Deleting and then rebuilding the whole table is the best solution. This notion will be supported by other reasons to be mentioned later.

Larger speed will be a result because only those data are included in the table that are needed. But these data need to be managed and provide the most up-to-date state. The most important aspect of the system is speed and efficiency. However, there can be huge variances between the task management efficiencies of the sub-systems.

 

Common solution:

magento indexing process flat table

1. Illustration of indexing process

 

The graph above includes the following steps:

  1. PHP requests the records to be indexed, from several tables and through a number of models.
  2. Next, MySQL sends the requested data.
  3. PHP processes the records with the help of cycles, often with multi cycles.
  4. PHP sends back the records one by one.
  5. MySQL updates them.

 

Pro: Simple and clear logic.

Con: Very slow. It is not suitable for meeting requirements related to flat tables. There are certain systems where it provides satisfactory performance, but here it is not an option.

 

Optimal solution:

optimal magento index process flat table

2. Process of indexing

 

 

Optimized steps:

  1. With the help of models, PHP composes the SQL SELECT which is returning the flat table
  2. MySQL executes this SELECT and inserts the created table in the database

Pro: Very fast, in some cases a 30-fold increase in speed. Eliminates loss of communication between the two systems. Satisfies the requirements.

Con: The request process is often very complex. It can happen in rare cases that MySQL is not capable of creating the processed data. In this case an optimized hybrid solution is to be used which is case-dependant.

 

Event oriented indexing

 

We can initiate the indexing process manually possibly also with the help of cron. In this case a total reindexing is necessary since we do not know which records are not up-to-date. It also presents a dilemma that how often the reindexing should be run.

Total reindexing is unavoidable since it is important in terms of speed that only relevant data are present in the table. On the other hand, running it after every modification is costly and thus we lose more than what we’d gain. To avoid this, such processes should be created that index certain rows only. Implementing this is quite simple because we can filter the records in the WHERE condition of the SELECT request based on the given condition.

We structure the database tables in such a way that when inserting, MySQL will know automatically if a record is new one or an existing one that just needs to be modified.

 

Two things need to be defined:

  1. When should the index be run?
  2. Which records should be indexed?

 

Reindexing should be run if a data is changed in the source table and should be run in those records where the data is located.

Implementation: We define events, and when they occur, we know that some data have been changed in the source table. If the event occurs, then we run the indexing for the given record.

 

Indexing in Magento

 

Creating a Flat table

A flat table should be created in such a way that when inserting, MySQL can identify whether there is a new or an existing record. Unique indexing is a solution for that.

config.xml: We need to give a name to our table.

<entities>
    <index_table>
        <table>custom_index_table</table>
    </index_table>
</entities>

 

We register the indexer:

<global>
    ......
    <index>
        <indexer>
            <some_key>
                <model>module/model</model>
            </some_key>
        </indexer>
    </index>
    ......
</global>

 

Code: We add the unique indexes in the installer.

$table->addIndex(
    $installer->getIdxName(
        'your_namespace/your_table',
        array(
            'column1',
            'column2',
            'column3',
        ),
        Varien_Db_Adapter_Interface::INDEX_TYPE_UNIQUE
    ),
    array(
        'column1',
        'column2',
        'column3',
    ),
    array('type' => Varien_Db_Adapter_Interface::INDEX_TYPE_UNIQUE)
);

 

With this, we avoid to have records in multiple locations in the table.

 

Creating the indexing process

We create the actual functions in a helper.

We need to implement 3 functions:

  • runReindex($id) – private
  • reindexAll() – public
  • reindexById($id) – public

 

Runindex method

First we set the database adapter:

$resource = Mage::getSingleton('core/resource');
$adapter = $resource->getConnection('core_write');

 

Then we request the model to which we join the other tables:

$collection = Mage::getModel('namespace/model')
    ->getCollection()
    ->removeAllFieldsFromSelect()
    ->removeFieldFromSelect('id');

 

We remove all the columns of SELECT so that we can adjust these to the index table. Then we join the tables, from which we still need data.

Example: ORDER ITEM join:

$collection->getSelect()->joinLeft(
    array('order_item' => Mage::getSingleton('core/resource')->getTableName('sales/order_item')),
    'order_item.order_id = main_table.order_id',
    array()
);

 

Next we define the column names and column order structure equivalent to the flat table.

$columns = array(
    'column1',
    'column2',
    'column3',
);

$collection->getSelect()
    ->columns('wishlist_item.product_id AS column1')
    ->columns('GROUP_CONCAT(customer_id SEPARATOR ",") AS column2')    ->columns('SUM(wishlist_item.qty) AS column3');

 

We create the request providing the flat table:

$select = $collection->getSelect();

 

We run the request and insert it in the table:

$sql = $adapter->insertFromSelect($select,
    Mage::getSingleton('core/resource')->getTableName('namespace /custom_index_table'),
    $columns,
    Varien_Db_Adapter_Interface::INSERT_ON_DUPLICATE);

$adapter->query($sql);

 

As we can see, communication is minimal between the two systems. PHP sends the request which returns the flat table. MySQL runs this and inserts it in the database.

 

ReindexById method

We need to filter the records of SELECT:

$collection->getSelect()->where('id = '.$id); 

 

ReindexAll

We empty the index table. We request the identifiers of all the records and call the runReindex($id) method.

 

Event dispatch

<?php
class Namespace_Model_Model extends Mage_Sales_Model_Order_Item
{
    const ENTITY = 'namespace_model_model';

     /**
      * Before Delete
      */
     protected function _beforeDelete()
     {
         parent::_beforeDelete();

         Mage::getSingleton('index/indexer')->logEvent(
             $this, self::ENTITY, Mage_Index_Model_Event::TYPE_DELETE
         );
     }

     /**
      * Before Save
      */
     protected function _beforeSave()
     {
         parent::_beforeSave();

         Mage::getSingleton('index/indexer')->logEvent(
            $this, self::ENTITY, Mage_Index_Model_Event::TYPE_SAVE
         );
     }

     /*
      * After Save Commit
      */
     protected function _afterSaveCommit()
     {
         parent::_afterSaveCommit();

         Mage::getSingleton('index/indexer')->indexEvents(
             self::ENTITY, Mage_Index_Model_Event::TYPE_SAVE
         );
     }

     /*
      * After Delete Commit
      */
     protected function _afterDeleteCommit()
     {
         parent::_afterDeleteCommit();

         Mage::getSingleton('index/indexer')->indexEvents(
             self::ENTITY, Mage_Index_Model_Event::TYPE_DELETE
         );
     }

 }

The data can change in two cases: modification and deletion. Thus these events need be observed. As we can see, Magento differentiates between index events. It is up to the programmer which events should be monitored by the indexer.

If the indexer wants to observe an event which is not dispatched, and can be found in the Magento Core, then the original class should be overwritten. This class should be derived from the original class.

 

Implementing the Magento Indexer

The indexer class, which monitors and runs the indexing processes, should be created in the model directory of our module. This class should be extended from the Mage_Index_Model_Indexer_Abstract class.

class Namespace_Model_Indexer extends Mage_Index_Model_Indexer_Abstract

 

Next, observation of events is needed, this is declared in a class array:

/**
 * Index matched Entities array
 *
 * @var array
 */
protected $_matchedEntities = array(
    Namespace_Model_Model::ENTITY => array(
        Mage_Index_Model_Event::TYPE_SAVE,
        Mage_Index_Model_Event::TYPE_MASS_ACTION,
        Mage_Index_Model_Event::TYPE_DELETE
    ),
);

 

We have declared the model’s events a little earlier. We can see in the code above the function of the ENTITY constant value to be found in the class. With this we identify the model. The abstract methods need to be created:

/**
 * @return bool
 */
public function isVisible()
{
    return true;
} 

/**
 * Retrieve Indexer name
 *
 * @return string
 */
public function getName()
{
    return Mage::helper('namespace')->__('Custom indexer');
} 

/**
 * Retrieve Indexer description
 *
 * @return string
 */
public function getDescription()
{
    return Mage::helper('namespace')->__('Reorganize custom flat data');
} 

/**
 * Rebuild all index data
 */
public function reindexAll()
{
    Mage::helper('namespace/indexer')->reindexAll();
}

 

Event recognition and management

We can carry it out by creating the _registerEvent method.

/**
 * Register indexer required data inside event object
 *
 * @param   Mage_Index_Model_Event $event
 */
protected function _registerEvent(Mage_Index_Model_Event $event)
{
    $dataObj = $event->getDataObject();
    if($event->getType() == Mage_Index_Model_Event::TYPE_SAVE){        $event->addNewData('id, $dataObj->getId());
    }elseif($event->getType() == Mage_Index_Model_Event::TYPE_DELETE){
        $event->addNewData('id, $dataObj->getId());
    }
}

 

We detect what kind of event has happened and then add the data necessary for indexing. In our example, it is the model identifier since we index based on it. Managing events can be done uniquely,  however, and may need different data.

 

Running indexing

 The actual indexing takes place through the _proccessEvent method.

/**
 * Process event based on event state data
 *
 * @param   Mage_Index_Model_Event $event
 */
protected function _processEvent(Mage_Index_Model_Event $event)
{
    $data = $event->getNewData();
    if(!empty($data['id'])){
        Mage::helper('namespace/indexer')->reindexById((int)$data['id']);
    }
}

 

SUMMARY

Optimizing speed is usually among the most important aspects with any system. (An exception is banking applications where safety is the only important consideration.) Flat tables can provide fast display, which is normally very much appreciated by our users. Using them is recommended with complex entities where information is allocated to a huge number of tables.

Since a traffic or capacity bottleneck is caused by the database segment, the problem needs to be tackled here as well and communication between the segments has to be minimized. The advantage of using flat tables is speed which makes it possible for the users to be able to view our pages conveniently and to find quickly what they look for.

 


  • Naked Digital

    Hi Attila. We have a Mage 1.9.x store that has around 60K products and 500 customer groups. As a result price reindexing takes around 10 hrs. Is your solution here suitable to speed this up?

    • Adel

      We have just helped a client to optimise 200k product Magento database. The price index only takes 15 minutes. Happy to help out with your issue. Let me know if you guys need our input.

      • Rahul

        Hi
        Can you tell me how to implement this on live store or you can you share the step by step guide of implementation of this.

Do you need our support?
  • Magento Site Check
  • Magento Code Audit
  • Magento SEO Audit
  • Magento Project Rescue
Request help

NEED A RELIABLE, PROFESSIONAL MAGENTO DEVELOPMENT PARTNER?

Contact us if you have any question or requirement related to the preparation of a new or renewal of an existing online store.

Next