PHP's Symfony2 Solr Bundle Implementation


By Andy Thorne

on the 20th February 2017


Symfony2 Apache Solr Integration

As part of the ongoing development of LineStorm CMS, I've come to the stage of implementing page body and data searching. As you may know, MySQL's InnoDB doesn't support full text search pre 5.6, which is what the majority of people use. Therefore, I started looking for a range of possible generic implementations:

  • InnoDB Full Text (MySQL >= 5.6)
  • InnoDB with data mirroring to a MyISAM table (MySQL < 5.6)
  • MyISAM Full Text
  • Apache Solr
  • Sphinx
  • Trigraphs

I found a brilliant slide show presented by Bill Karwin at Percona that highlights the pros and cons of all the above methods. The Trigraph, InnoDB and MyISAM methods were very straightforward to implement. However, Solr, despite my knowing of it, was new to me, so I found a pre-made Symfony2 bundle by Florian Semm. After having a play around with it, I could see it needed changing for LineStorm CMS, so I put on my forking shoes (sorry, that was terrible) and ended up rebuilding huge chunks.

If you're interested in using the rebuilt bundle, below is what has changed. You can grab the code from the andythorne/solr-bundle fork on GitHub.

What has changed in the Solr Bundle

There are several major changes to the structure of the rebuilt fork and to how you interact with it. Here are the major ones, and why they were made:

Query Creation, Repositories, Mappers and Doctrine

My original need to modify the bundle came from its interaction with Doctrine. In my system, Solr is queried and ids are returned. The database is then hit to extract full entities. A lot of this is done for searching, so entity objects are not needed, and I needed a way to hydrate as an array. This in turn needed the query to be provided. Using the query builder provides greater control when modifying it within the bundle.

I then found out that, for each solr representation of an entity returned, the database was hit, resulting in potentially hundreds of almost duplicate queries being run.

I think Florian had the right idea in providing a solr Repository, so I have extended it to provide the needed functionality.

MetaInformation

I never really understood why the MetaInformation class contained actual entity objects. Maybe it's just a bad name for the class, but it seemed wrong, so they have now been de-coupled. It seemed to be creating new entities and reflection classes all over the place. The mapping method `extractSolrValues($entity)` has been added to convert an entity into solr-ready values.
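To make the idea concrete, here is a minimal sketch of what converting an entity into solr-ready values could look like. It is not the bundle's actual code: the `$fieldMap` array stands in for the parsed annotations, and the function name `extractSolrValuesSketch`, its signature and the `Article` class are all assumptions made for illustration.

```php
<?php

// Hypothetical entity, used only for this illustration.
class Article
{
    private $id;
    private $title;

    public function __construct(int $id, string $title)
    {
        $this->id = $id;
        $this->title = $title;
    }

    public function getId(): int
    {
        return $this->id;
    }

    public function getTitle(): string
    {
        return $this->title;
    }
}

/**
 * Sketch only: $fieldMap stands in for the parsed annotations and maps
 * entity property name => solr field name. Not the bundle's real signature.
 */
function extractSolrValuesSketch(object $entity, array $fieldMap): array
{
    $values = [];
    foreach ($fieldMap as $property => $solrName) {
        $getter = 'get' . ucfirst($property);
        if (!method_exists($entity, $getter)) {
            continue; // assumption: properties without a getter are skipped
        }
        $values[$solrName] = $entity->$getter();
    }

    return $values;
}

// The entity itself is only read; no copy or reflection object is stored.
$values = extractSolrValuesSketch(
    new Article(7, 'Hello Solr'),
    ['id' => 'id', 'title' => 'article_title']
);
```

The point of the sketch is that the mapping information and the entity stay separate: the map describes the class, and the entity is only passed in when values are needed.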

The MetaInformation has been integrated into the Repository class. Methods that were being provided by the Solr class have been de-coupled and moved into the Repository class. Any methods still used in Solr now require the meta information as well. That may sound bad, but as all manipulation happens via the Repository class, the meta is automatically passed through.

* `Solr->createQuery` - This method still exists, but now only returns the response from solr.
* `Solr->XXXDocument` - Direct Entity/Document manipulation has been moved into the Repository.

Mappers and Hydrators

The main change here is that instead of hydration being called for every solr entity representation, it is now called once on the whole array of entity representations. This allows Doctrine to query for a set of ids, as described next.
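The difference between per-document and per-set hydration can be sketched in plain PHP. This is an illustration under stated assumptions rather than the bundle's hydrator: `hydrateAll` and the `$fetchByIds` callable are hypothetical names, each solr row is assumed to carry an `id` key, and the fake data source simply counts how often it is hit.

```php
<?php

/**
 * Sketch only: hydrate a whole page of solr results with one lookup.
 *
 * $solrDocs   rows as returned by solr, each assumed to carry an 'id' key
 * $fetchByIds hypothetical callable that loads rows for a list of ids
 */
function hydrateAll(array $solrDocs, callable $fetchByIds): array
{
    $ids = array_column($solrDocs, 'id');
    if ($ids === []) {
        return [];
    }

    // One call for the whole set instead of one call per document.
    $rowsById = [];
    foreach ($fetchByIds($ids) as $row) {
        $rowsById[$row['id']] = $row;
    }

    // Restore solr's ordering and drop ids that the data source did not return.
    $ordered = [];
    foreach ($ids as $id) {
        if (isset($rowsById[$id])) {
            $ordered[] = $rowsById[$id];
        }
    }

    return $ordered;
}

// Fake data source that counts how often it is hit.
$calls = 0;
$fetch = function (array $ids) use (&$calls): array {
    $calls++;
    $table = [
        1 => ['id' => 1, 'title' => 'First'],
        2 => ['id' => 2, 'title' => 'Second'],
        3 => ['id' => 3, 'title' => 'Third'],
    ];

    return array_values(array_intersect_key($table, array_flip($ids)));
};

$result = hydrateAll([['id' => 3], ['id' => 1], ['id' => 99]], $fetch);
```

In a real application the callable would typically wrap a single `WHERE id IN (:ids)` query, so a page of results costs one database round trip however many documents solr returned.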

Doctrine

Doctrine is now queried with the QueryBuilder by all methods. This allows us to join on any EntityFields or CollectionFields (see the Annotations section below) and specify the Doctrine hydration mode.


Annotations

A few fairly big features were missing from the annotations:

  • Properties within Entity relationships
  • Properties within Collections of Entities
  • Fields that are not mapped in the entity, but are still needed. For example combination fields.
  • Ability to map a property to another name. For example `$tags` should map to `tag` within solr.

So, these changes have been made:

Modified Annotation: Field

There is a new "name" property. This allows you to map the entity property to a different name in solr.

    /**
     * @var string
     * @Solr\Field(name="article_title")
     */
    protected $title;

New Annotation: EntityField

This will map properties from within an entity into solr.

The example below maps `$category->name` into solr. You can specify multiple properties; however, they will all be mapped to the same `solr_category` field.

    /**
     * @var Category
     *
     * @ORM\ManyToOne(targetEntity="BlogCategory")
     *
     * @Solr\EntityField(name="solr_category", properties={"name"})
     */
    protected $category;

Property    Type    Required
name        string  optional
properties  array   required


New Annotation: CollectionField

This will map properties from entities within a collection into solr. Options are the same as `EntityField`.

    /**
     * @var Tag[]
     *
     * @ORM\ManyToMany(targetEntity="BlogTag")
     *
     * @Solr\CollectionField(name="solr_tag", properties={"name", "description"})
     */
    protected $tags;

Property    Type    Required
name        string  optional
properties  array   required
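As a rough illustration of what mapping several properties from a collection into one solr field means, the sketch below flattens the listed properties of each related entity into a single list. It is not taken from the bundle: `collectPropertyValues` and the `Tag` class are hypothetical, and it assumes `get<Property>()` getters and a multi-valued target field.

```php
<?php

// Hypothetical related entity, used only for this illustration.
class Tag
{
    private $name;
    private $description;

    public function __construct(string $name, ?string $description = null)
    {
        $this->name = $name;
        $this->description = $description;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function getDescription(): ?string
    {
        return $this->description;
    }
}

/**
 * Sketch only: collect the listed properties from each related entity into
 * one flat list of values destined for a single solr field.
 * Assumes getters named get<Property>(); empty values are skipped.
 */
function collectPropertyValues(iterable $entities, array $properties): array
{
    $values = [];
    foreach ($entities as $entity) {
        foreach ($properties as $property) {
            $getter = 'get' . ucfirst($property);
            $value = $entity->$getter();
            if ($value !== null && $value !== '') {
                $values[] = $value;
            }
        }
    }

    return $values;
}

$tags = [new Tag('php', 'A scripting language'), new Tag('solr')];

// All properties end up under the one field name.
$document = ['solr_tag' => collectPropertyValues($tags, ['name', 'description'])];
```

A single related entity can be handled the same way by wrapping it in a one-element array, which is why the two annotations can share the same options.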
 

New Annotation: MetaFields

This will map any fields that are in solr, but do not have a property. It is defined at the class level.

    /**
     * @Solr\Document
     * @Solr\MetaFields(fields={{"name"="text"}})
     *
     * @ORM\Entity
     */
    class Article
    {
        ...
    }

Property    Type    Description
fields      array   The array is a key/value pair of options you would usually pass to a Field annotation.