Drupal: Importing Flexinodes and Search Indexing
Sort of a 'note to self', as well as something that may be useful for others, and an explanation as to why I haven't been writing as much. Bear in mind that this appears to be pushing things to the limits, and may be bleeding edge for a lot of people. I'm the guy with the surfboard. :-) This is all on Drupal 4.6.6, in anticipation of Drupal 4.7 coming out of beta.
Over the last week or so, as time permitted, I've been scratching my head over the way Drupal handles search indexing. After importing thousands of entries from a comma-separated values (CSV) file into an implementation of Flexinode, not everything was getting indexed.
I fiddled with the database. After all, I'd programmatically added the data and there weren't any guidelines for that - and node_import doesn't hack it for multiple field flexinodes (see my comments here, in this thread).
Basically, the data from a CSV file, say 10 fields per row, has to be populated into the flexinode_data table as well as the node table.
The flexinode_data table stores each field from the CSV as its own row in the MySQL table. Thus, a row of 10 columns in a CSV equates to 10 separate entries within the MySQL table, each identified by a field identifier from flexinode_field. Those identifiers are defined when one creates content types in a Drupal installation with the Flexinode module installed.
The node table requires that a body and teaser be generated in HTML/XML for every CSV row that is to be treated as a standalone entry. Thus, for the 10 fields above in the flexinode_data table, there is one entry in the node table.
What we have is this:
Number of flexinode_data Entries = n × (Number of Nodes)
Where 'n' is the number of fields in an implementation of flexinode - hackers call it a (1:n) relationship between the tables, for those unfamiliar. Simple, and not too hard to code - just some creative control loops in the script of your choice. Once it's all shoved in, you can browse the nodes, etc.
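For the curious, the mapping above can be sketched in a few lines of Python. This is only an illustration, not the script I actually ran: the column names (nid, field_id, textual_data), the 'flexinode-1' type string, the use of the first CSV column as the title, and the field ids 1 to 3 are all assumptions made for the example. Check them against flexinode_field and your own tables. It builds the rows in memory instead of writing to MySQL, so the 1:n bookkeeping is easy to check.

```python
import csv
import io
from html import escape


def build_rows(csv_text, field_ids, first_nid, node_type="flexinode-1"):
    """Turn CSV text into (node_rows, data_rows).

    field_ids: one flexinode field id per CSV column (assumed values;
    on a real site they would come from the flexinode_field table).
    """
    node_rows, data_rows = [], []
    reader = csv.reader(io.StringIO(csv_text))
    for offset, record in enumerate(reader):
        if len(record) != len(field_ids):
            raise ValueError("row %d has %d columns, expected %d"
                             % (offset + 1, len(record), len(field_ids)))
        nid = first_nid + offset
        # One flexinode_data entry per CSV column.
        for field_id, value in zip(field_ids, record):
            data_rows.append({"nid": nid, "field_id": field_id,
                              "textual_data": value})
        # One node entry per CSV row, with a generated HTML body.
        body = "".join("<div>%s</div>" % escape(v) for v in record)
        # Naive teaser: just the start of the body (may cut a tag in half).
        node_rows.append({"nid": nid, "type": node_type,
                          "title": record[0], "body": body,
                          "teaser": body[:600]})
    return node_rows, data_rows


sample = "Alpha,one,1\nBeta,two,2\n"
nodes, data = build_rows(sample, field_ids=[1, 2, 3], first_nid=100)
print(len(nodes), len(data))  # 2 nodes, 6 flexinode_data entries
```

Two CSV rows of three fields give two node entries and six flexinode_data entries, which is the n-to-one ratio described above.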
Search Indexing
But the node table is a bit more complicated for long term use. You have to insert appropriate timestamps as well - and if you import thousands of entries with the same timestamp, you might go slightly mad. It helps if you are already slightly mad. So I tossed in modules flexisearch and SQL Search (Trip Search). The new 4.7 version of Drupal will have a lot of code from Trip Search incorporated into the regular search for the site, but I can't wait on 4.7 to come out. The client needs the data searchable now. Or, more appropriately, last week.
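One way around a pile of identical timestamps is to hand each imported node its own value, spaced a second apart and ending at import time. A small Python sketch follows. The function name and the one-second step are my own choices for illustration, and it assumes the node table's created and changed columns hold Unix timestamps.

```python
import time


def staggered_timestamps(count, end=None, step=1):
    """Return `count` distinct Unix timestamps, oldest first, ending at `end`."""
    if end is None:
        end = int(time.time())
    start = end - step * (count - 1)
    return [start + step * i for i in range(count)]


stamps = staggered_timestamps(5, end=1140000000)
print(stamps[0], stamps[-1])  # 1139999996 1140000000
```

Each value can then go into both the created and changed columns of the matching node row.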
The problem was that the search_index table wasn't 'growing'. I checked in Drupal's administer->settings->search, and made sure that the report was getting to 100%. It was. And I would reindex (by going into the variable table and wiping node_cron_last/
I spelunked the code. I played with the code. I danced with the code. Then I went off looking for documentation on the problem itself. I even (gasp!) asked for help, but ended up talking to myself (a sign of impending mental disaster). Surf's up - I hit the IRC channel #drupal-support on the FreeNode IRC network and asked around. One fellow told me to upgrade to 4.7 beta, but didn't substantiate it - and I've been watching the code based on Drupal 4.7. Another person told me that it seemed like I found a bug. :-) Grabbing my surfboard, I headed in.
The hard part about this is the watching and waiting, so I started forcing the cron job manually (http://yoursite.com/cron.php ) and pushing the server to its limits. Whenever I hit 100%, I went to reindex, because I thought that at 100% the search_index table shouldn't be growing, and that the search_index table should be growing with each iteration of cron. To some extent, this is true.
So I queried the Drupal site again, and found 'Incorrect loop logic in node_update_index', which had a patch already. It didn't say there what version they were talking about. What version the patch was for. And so on, and so forth... so I applied the patch anyway (backing up appropriately beforehand) and tried again. Same problem.
But then I found that at 100% reported indexing, the search_index table was growing rapidly. Aha! So, like a nuclear reactor, the search index facility works best in Drupal when operating over 100%... Doing that and forcing cron last night got the search_index to grow about 300%.
So the lesson here is that when you want indexing to happen, allow a few more cron runs after the site's search settings report 100%. Depending on how 4.7 works out when it's released, I may have to hop in and fix this, at least for me.
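If you would rather not sit there reloading cron.php by hand, the forcing can be scripted. Here is a rough Python sketch. The function name, the pause length, and the injectable fetch argument are mine, added so the loop can be tried without touching a real server, and the URL is the same placeholder as above.

```python
import time
import urllib.request


def force_cron(cron_url, runs, pause=5.0, fetch=None):
    """Request cron.php `runs` times, pausing between requests.

    Returns whatever `fetch` returns for each request; the default
    fetch returns the HTTP status code.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as response:
                return response.status
    results = []
    for i in range(runs):
        results.append(fetch(cron_url))
        if i < runs - 1:
            time.sleep(pause)  # give the server a breather between runs
    return results


# Dry run with a stand-in fetch so nothing goes over the network.
print(force_cron("http://yoursite.com/cron.php", runs=3, pause=0,
                 fetch=lambda url: 200))  # [200, 200, 200]
```

Drop the fetch argument to hit the real cron.php, and keep an eye on the size of the search_index table between runs rather than trusting the percentage alone.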
