How to remove file documents from the SOLR index after they have been replaced in nodes in Drupal 7

By Eduardo García ● CTO | June 17th, 2015

If you have revisions enabled on your content types, Drupal keeps all your old files on the server (associated with the old revisions), so replacing a file is harder.

Even if you don’t have revisions enabled, removing a file from a node and adding it again updates the name/link, but a file with the old name is still kept on the server. To avoid the name collision, Drupal appends suffixes such as “_0”, “_1”, etc. to the names of later uploads of that file.

In terms of node rendering this is not a problem, but for SOLR indexing it is: all those files get indexed, so instead of one record for a match you can get N records, where N is the number of times you have overwritten the same file.

To resolve this annoying behavior, I will show you a solution that removes a file document from the SOLR index when a file is deleted from a node. This solution uses the Entity API module, so it should be added as a dependency in your module's .info file.

/**
 * Implements hook_node_update().
 */
function MYMODULE_node_update($node) {

  // Array of content types to act on.
  if (in_array($node->type, array('article', 'blog_post'))) {
    $wrapper = entity_metadata_wrapper('node', $node);
    $original_wrapper = entity_metadata_wrapper('node', $node->original);

    // Array of file fields to act on.
    foreach (array('field_public_files', 'field_private_files') as $field) {
      if (!isset($original_wrapper->{$field})) {
        continue;
      }
      $current_files = array();
      $original_files = array();

      // Get files that were attached to the original node
      // (before update).
      foreach ($original_wrapper->{$field}->value() as $file) {
        $original_files[] = $file['fid'];
      }
      // Stop if there were no files previously attached.
      if (empty($original_files)) {
        continue;
      }

      // Get files currently attached to the node (after update).
      foreach ($wrapper->{$field}->value() as $file) {
        $current_files[] = $file['fid'];
      }

      // Delete files that were in the original node but were removed
      // during this update.
      $deleted_files = array_diff($original_files, $current_files);
      if (!empty($deleted_files)) {
        $env_id = apachesolr_default_environment();
        $solr = apachesolr_get_solr($env_id);
        foreach ($deleted_files as $fid) {
          // Build the document ID for this file as attached to this node.
          $file_id = apachesolr_document_id($fid, 'file') . '-' . $node->nid;

          // Remove the file document from the SOLR index; a re-index is not required.
          $solr->deleteByQuery("id:$file_id");
        }
      }
    }
  }
}

The code above reacts when a node of one of the listed content types is updated, and it only looks at the listed file fields. Using the entity_metadata_wrapper function, we can read the node's field values as they were before the update and as they are after it.

After working out whether any files were deleted, the logic to delete the file documents from SOLR is applied. Let's look at that code in detail:
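The before/after comparison boils down to a single array_diff() call on two lists of file IDs. Here is a minimal standalone sketch of that step; the fid values are made up for illustration and do not come from a real site.

```php
<?php
// Hypothetical file IDs, for illustration only.
$original_files = array(12, 15, 27); // Attached before the update.
$current_files = array(12, 31);      // Attached after the update.

// array_diff() returns the values of the first array that are missing from
// the second one, i.e. the files removed during the update.
$deleted_files = array_diff($original_files, $current_files);

// Note that array_diff() preserves the keys of the first array.
print_r($deleted_files);
```

Here fid 12 is still attached and fid 31 is new, so only fids 15 and 27 end up in $deleted_files.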

$env_id = apachesolr_default_environment();
$solr = apachesolr_get_solr($env_id);
foreach ($deleted_files as $fid) {
  $file_id = apachesolr_document_id($fid, 'file') . '-' . $node->nid;

  // Remove the file document from the SOLR index; a re-index is not required.
  $solr->deleteByQuery("id:$file_id");
}

First, a SOLR object instance is obtained. In my example I use the default SOLR environment; if you have more than one SOLR instance, you need to change the logic to use the proper one.

Using the apachesolr_document_id function, the SOLR ID is built from the fid and the 'file' entity type. The node ID is then appended to tie the document to the particular node being updated, because in some Drupal installs a file can be associated with several nodes.

Finally, the deleteByQuery method requests the deletion from the SOLR index. The delete request is sent right away, so a SOLR re-index is not required.

I hope you found this article useful.
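If the same files are indexed in several SOLR environments, one possible adaptation is to loop over all of them. This is only a sketch: the helper name MYMODULE_delete_from_all_environments() is hypothetical, and it assumes every environment configured in the Apache Solr Search module should receive the delete, which may not match your setup.

```php
<?php
/**
 * Hypothetical helper: deletes one document ID from every SOLR environment.
 *
 * @param string $file_id
 *   The SOLR document ID to delete.
 */
function MYMODULE_delete_from_all_environments($file_id) {
  // apachesolr_load_all_environments() returns the configured environments,
  // keyed by environment ID.
  foreach (apachesolr_load_all_environments() as $env_id => $environment) {
    try {
      $solr = apachesolr_get_solr($env_id);
      $solr->deleteByQuery("id:$file_id");
    }
    catch (Exception $e) {
      // Log and continue, so one unreachable server does not block the rest.
      watchdog('MYMODULE', 'Could not delete @id from @env: @message', array(
        '@id' => $file_id,
        '@env' => $env_id,
        '@message' => $e->getMessage(),
      ), WATCHDOG_ERROR);
    }
  }
}
```

If only some environments index file attachments, filter the list by environment ID before the loop instead of visiting all of them.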
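To see why the node ID matters, here is a small standalone sketch of how such IDs end up looking. It assumes apachesolr_document_id() composes IDs as site hash, entity type and ID separated by slashes, which is worth checking against your installed module version. The function example_document_id() and the hash 'abc123' are stand-ins invented for this illustration.

```php
<?php
// Hypothetical stand-in for apachesolr_document_id(). The real function
// obtains the site hash from Drupal; 'abc123' is a made-up value.
function example_document_id($id, $entity_type, $site_hash = 'abc123') {
  return $site_hash . '/' . $entity_type . '/' . $id;
}

// The same file (fid 42) attached to two different nodes (nid 7 and nid 9).
$fid = 42;
$id_for_node_7 = example_document_id($fid, 'file') . '-' . 7;
$id_for_node_9 = example_document_id($fid, 'file') . '-' . 9;

echo $id_for_node_7, "\n"; // abc123/file/42-7
echo $id_for_node_9, "\n"; // abc123/file/42-9
```

Because the two IDs differ, removing the file from node 7 deletes only that document and leaves the one indexed for node 9 untouched.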
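As a possible variation, the Solr service object also offers deleteById(), which takes the document ID directly and avoids any query-syntax concerns with characters in the ID. The sketch below is not the code from this article: the helper name MYMODULE_delete_file_document() is hypothetical, and the explicit commit() is an assumption for setups where deletions should become visible in search results without waiting for SOLR's own commit schedule.

```php
<?php
/**
 * Hypothetical helper: deletes one file document by its exact ID.
 *
 * @param string $file_id
 *   The SOLR document ID to delete.
 */
function MYMODULE_delete_file_document($file_id) {
  try {
    $solr = apachesolr_get_solr(apachesolr_default_environment());

    // Delete by exact ID instead of building a query string.
    $solr->deleteById($file_id);

    // Assumption: an explicit commit is wanted here. On busy sites it may be
    // better to leave this out and rely on the server's commit settings.
    $solr->commit();
  }
  catch (Exception $e) {
    // Do not break the node save if SOLR is unreachable.
    watchdog('MYMODULE', 'SOLR delete failed for @id: @message', array(
      '@id' => $file_id,
      '@message' => $e->getMessage(),
    ), WATCHDOG_ERROR);
  }
}
```

Committing on every node update can be expensive, so treat that call as optional and measure before keeping it.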