PHP and MySQL - Chapter 7 Review questions

What are the two primary types of joins?

The two main types of joins are inner and outer joins, with subtypes within each type.

Why are aliases often used with joins?

Often an alias will be used as a shorthand way of referencing the same table multiple times within the same query.

Why is it considered often necessary and at least a best practice to use the table.column syntax in joins?

When selecting from multiple tables, you must use the dot syntax if the tables named in the query have columns with the same name. (In relational databases, it is common for the primary key in one table to have the same name as a foreign key in a related table.) If you are not explicit when referencing the columns, an error that reports an ambiguous column name will occur.

What impact does the order of tables used have on an outer join?

The left outer join is the most common, and in a left outer join, the first table named will be the one that has all of its records returned. It is important because if you are interested in finding the status or some condition related to all of the records in a table, including records that have no matches in the remainder of the join, that is the table that you want to list first. The example used in the book to illustrate this is that of finding the number of messages in each of the forums that exist in the forums table. In an inner join, if there were no messages in a particular forum, there would be nothing reported. In a left outer join, with the forum table listed first, the forum would be listed with a NULL value for messages if there were no messages that matched.

How do you create a self-join?

A self-join is a join of a table with itself. The trick with self-joins is to treat the two references to the same table as if they were single references to two different tables. The way to do that is to assign a different alias to each table reference.

The example that he gives in the book with the forum messages table, for finding messages that are replies to a parent message, is as follows (with a slight modification):

SELECT m1.message_id,m1.subject, m2.subject AS Reply FROM messages AS m1 LEFT JOIN messages AS m2 ON m1.message_id=m2.parent_id where m1.parent_id=0

This returns a result set like this:
+------------+-------------------------------+------------------------+
| message_id | subject                       | Reply                  |
+------------+-------------------------------+------------------------+
|          1 | Question about normalization. | NULL                   |
|          2 | Database Design               | Database Design        |
|          4 | Database Design               | NULL                   |
|          5 | PHP Errors                    | PHP Errors             |
|          9 | Chicago Bulls                 | Chicago Bulls          |
|         11 | CSS Resources                 | CSS Resources          |
|         13 | HTML vs. XHTML                | HTML vs. XHTML         |
|         15 | Why?                          | Why? Why? Why?         |
|         16 | Dynamic HTML using PHP        | Dynamic HTML using PHP |
+------------+-------------------------------+------------------------+
9 rows in set (0.00 sec)


The messages that have no replies have NULL in the Reply field. The state of the messages table at the time of this query was:

mysql> select message_id,parent_id,forum_id,subject from messages;
+------------+-----------+----------+-----------------------------------------+
| message_id | parent_id | forum_id | subject                                 |
+------------+-----------+----------+-----------------------------------------+
|          1 |         0 |        1 | Question about normalization.           |
|          2 |         0 |        1 | Database Design                         |
|          3 |         2 |        1 | Database Design                         |
|          4 |         0 |        1 | Database Design                         |
|          5 |         0 |        2 | PHP Errors                              |
|          6 |         5 |        2 | PHP Errors                              |
|          7 |         6 |        2 | PHP Errors                              |
|          8 |         7 |        2 | PHP Errors                              |
|          9 |         0 |        3 | Chicago Bulls                           |
|         10 |         9 |        3 | Chicago Bulls                           |
|         11 |         0 |        5 | CSS Resources                           |
|         12 |        11 |        5 | CSS Resources                           |
|         13 |         0 |        4 | HTML vs. XHTML                          |
|         14 |        13 |        4 | HTML vs. XHTML                          |
|         15 |         0 |        6 | Why?                                    |
|         16 |         0 |        2 | Dynamic HTML using PHP                  |
|         17 |        16 |        2 | Dynamic HTML using PHP                  |
|         18 |        17 |        2 | Dynamic HTML using PHP, still not clear |
|         19 |        18 |        2 | Dynamic HTML using PHP, clearer?        |
|         20 |        15 |        6 | Why? Why? Why?                          |
|         21 |        20 |        6 | Because                                 |
+------------+-----------+----------+-----------------------------------------+
21 rows in set (0.00 sec)

To pick a couple examples to explain the result of the join, the message with the message_id=1 has a parent_id of 0, so it is a root message. There are no other messages in the table that have a parent_id=1, therefore that message has no replies, it is listed with NULL in the "Reply" column of the result.
The message with message_id=2 has a parent_id of 0, so it is a root message. Scanning the table of messages, it can be seen that there is another message, with message_id=3, that has message_id=2 as its parent. So the reply column shows the subject line of message_id 3 as the Reply.

A slight modification of that query, showing the parent message_id as "P_id", and the child or reply's message_id as "R_id" might provide some additional clarification of the relationships:

mysql> SELECT m1.message_id AS P_id, m1.subject, m2.subject AS Reply, m2.message_id AS R_id FROM messages AS m1 LEFT JOIN messages AS m2 ON m1.message_id=m2.parent_id WHERE m1.parent_id=0;
+------+-------------------------------+------------------------+------+
| P_id | subject                       | Reply                  | R_id |
+------+-------------------------------+------------------------+------+
|    1 | Question about normalization. | NULL                   | NULL |
|    2 | Database Design               | Database Design        |    3 |
|    4 | Database Design               | NULL                   | NULL |
|    5 | PHP Errors                    | PHP Errors             |    6 |
|    9 | Chicago Bulls                 | Chicago Bulls          |   10 |
|   11 | CSS Resources                 | CSS Resources          |   12 |
|   13 | HTML vs. XHTML                | HTML vs. XHTML         |   14 |
|   15 | Why?                          | Why? Why? Why?         |   20 |
|   16 | Dynamic HTML using PHP        | Dynamic HTML using PHP |   17 |
+------+-------------------------------+------------------------+------+
9 rows in set (0.00 sec)


Examining the first row that is returned in the modified query, the P_id is "1", that is a message that has no "children", so the Reply is NULL, the R_id is NULL. The second row in the table shows that it is a parent message with the id of 2, and there is another message with a parent id of 2, that being message 3. So the second line in the result set shows that the "R_id" for parent 2 is message 3. (This could go on ad infinitum so it might be best to stop here!)

What are the aggregate functions?

The aggregate functions or grouping functions return a value based on a single column over a set of rows. The following table lists the grouping functions.
SQL Grouping Functions
Function Returns
AVG( ) The average of the values in a column.
COUNT( ) The number of values in a column.
GROUP_CONCAT( ) The concatenation of a column's values.
MAX( ) The largest value in a column.
MIN( ) The smallest value in a column.
SUM( ) The sum of all the values in a column.

What impact does the DISTINCT keyword have on an aggregate function? What impact does GROUP BY have on an aggregate function?

The DISTINCT keyword make the aggregation only apply to distinct (or unique) values. The example that is used in the book is that the query SELECT COUNT(customer_id) FROM accounts will return the number of accounts, regardless of who owns the accounts. The query SELECT COUNT(DISTINCT customer_id) FROM accounts will return the number of customers that have accounts.

What kind of index is required in order to perform FULLTEXT searches? What type of storage engine?

A FULLTEXT search requires a FULLTEXT index. This is only supported on the MyISAM table.

What impact does it have when you conclude a SELECT query with \G instead of a semicolon in the mysql client?

Instead of presenting the results of the query in a tabular format, it will display the results as a vertical list. This can make it easier to examine the results of queries that return voluminous information.

How do IN BOOLEAN MODE FULLTEXT searches differ from standard FULLTEXT searches?

An IN BOOLEAN MODE FULLTEXT search is more sophisticated than a normal FULLTEXT search in that it has a number of operators that tweak how each keyword is treated. You can use the '+' operator to indicate that a word must be present in a search. The '-' operator can indicate words that must NOT be present. The '>' operator will increase a word's importance, whereas the '<' operator will decrease the importance of a word, affecting the rankings of the matching records. The asterisk is used as a wildcard character

, the tilde '~' is used to reduce the ranking if a search term is present. Several other differences compared to a normal FULLTEXT search: Because the results are not sorted by relevance, that is an extra step that you need to add to the query.

An example query and returned results, sorted by relevance, is:

mysql> SELECT subject, body, MATCH(body, subject) AGAINST('*HTML >XHTML' IN BOOLEAN MODE) AS R FROM messages WHERE MATCH(body, subject) AGAINST('*HTML >XHTML' IN BOOLEAN MODE) ORDER BY R DESC\G
*************************** 1. row ***************************
subject: HTML vs. XHTML
   body: What are the differences between HTML and XHTML?
      R: 2.5
*************************** 2. row ***************************
subject: HTML vs. XHTML
   body: XHTML is a cross between HTML and XML. The differences are largely syntactic. Blah, blah, blah...
      R: 2.5
*************************** 3. row ***************************
subject: CSS Resources
   body: Read Elizabeth Castro's excellent book on (X)HTML and CSS. Or search Google on "CSS".
      R: 1
*************************** 4. row ***************************
subject: Dynamic HTML using PHP
   body: Can I use PHP to dynamically generate HTML on the fly? Thanks...
      R: 1
*************************** 5. row ***************************
subject: Dynamic HTML using PHP
   body: You most certainly can.
      R: 1
*************************** 6. row ***************************
subject: Dynamic HTML using PHP, still not clear
   body: Um, how?
      R: 1
*************************** 7. row ***************************
subject: Dynamic HTML using PHP, clearer?
   body: I think what Larry is trying to say is that you should buy and read his book.
      R: 1
7 rows in set (0.00 sec)

mysql>

Any record that has the exact phrase 'XHTML' is ranked higher than others, but 'XHTML' is not required in the match. (This is another answer that could go on and on, suffice it to say that this is a much more flexible and powerful search technique.)

What commands can you use to improve a table’s performance?

One command that helps to maintain the table and remove unnecessary overhead is the OPTIMIZE command. This should be used after a table has been changed with the ALTER command, or after many DELETE queries have been run.

A second command is the ANALYZE command, which updates the indexes on the table.

How do you examine the efficiency of a query?

You can use the EXPLAIN query command, which gives an indication of how MySQL will run a query. This gives details on the way a query will be conducted, perhaps indicating where an index should be added, where a table should be modified, etc.

Why doesn’t the forum database support transactions?

Transactions are only supported by the InnoDB storage engine. The forum database uses the MyISAM engine for the messages table, therefore it cannot have transaction support.

How do you begin a transaction? How do you undo the effects of a transaction in progress? How do you make the effects of the current transaction permanent?

A transaction is started with a START TRANSACTION; command. Queries that would modify the database are performed (primarily UPDATE queries - because queries that create, alter, truncate or delete tables cannot be undone.) While a transaction is in progress and not terminated, the ROLLBACK; command will return the database to its initial condition. The effects of the current transaction are made permanent with either the COMMIT command, or by executing one of the aforementioned actions such as CREATE TABLE, etc. COMMIT; is the normal way.

What kind of column type is required to store the output from the AES_ENCRYPT( ) function?

The AES_ENCRYPT() function requires a binary data type, for example VARBINARY or BLOB.

What are the important criteria for the salt used in the encryption process?

The 'salt' argument is a string that helps to randomize the encryption. The exact same salt value is needed for both the encryption and decryption processes, so it must also be stored somewhere (most likely in the database along with the encrypted value.) One final consideration is that each piece of stored data should use a unique salt, and the longer the salt value, the better.