Earlier today a friend asked me how to go about running a certain SQL query using the NDB API. This tends to be a common question so I figured this would be a good opportunity to show an example. The query in question looks something like this:
select count(*),emailaddr from email_example group by emailaddr having count(emailaddr) > ?;
This is a fairly basic query as it does not involve any joins. There is also an ordered index on the "emailaddr" field which makes this quite simple. Essentially we perform a scan on the ordered index, and count the number of times we see an email address in a row. If this number is greater than the parameter, then we print the row. Here's the code:
NdbTransaction trans = ndb.startTransaction();
/* perform a scan on the ordered index on the email address */
NdbIndexScanOperation op = trans.getSelectIndexScanOperation("emailaddr",
"email_example", LockMode.LM_Read);
op.getValue("emailaddr"); /* read only the emailaddr field */
NdbResultSet rs = op.resultData();
trans.executeCommit();
int emailCount = 0;
String email = "";
/* iterate over addresses in index */
while(rs.next())
{
String curEmail = rs.getString("emailaddr");
/* if we have a new email, finish the previous one */
if(!curEmail.equals(email) && !email.equals(""))
{
/* print if it matches */
if(emailCount > x)
System.out.println("Email: " + email + ", count = " + emailCount);
/* reset the count for the next address */
emailCount = 0;
}
/* set the previous to the current value */
email = curEmail;
/* increment this email count */
emailCount++;
}
/* after the loop we still have to check if the last address matched */
if(emailCount > x)
System.out.println("Email: " + email + ", count = " + emailCount);
I've attached the full code example here including some sample data. To verify that our program performs correctly, we can check the results against the output of the SQL query. Here's the output from the example program:
Connected. Running query. Email: a@a.com, count = 2 Email: c@c.com, count = 2 Email: d@d.com, count = 2 Email: f@f.com, count = 3
And the same result through MySQL:
+----------+-----------+ | count(*) | emailaddr | +----------+-----------+ | 2 | a@a.com | | 2 | c@c.com | | 2 | d@d.com | | 3 | f@f.com | +----------+-----------+
This program should be pretty simple to understand if you're familiar with the NDB API. The basic element here is the index scan operation. If you find yourself wondering about how to implement an NDB API program for an existing SQL query, keep this operation in mind as it will often be used. In some future posts, I'll cover some examples of the other operations and how to combine them with the index scan.