
Tuning SQL II

Tuning SQL Queries

1. Analyze Statistics
2. EXISTS versus IN
3. Saving on Numbers of Queries:
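Topics 1 and 2 above can be sketched briefly (a minimal sketch; the orders table in the EXISTS example is hypothetical):

```sql
-- 1. Analyze Statistics: keep the planner's statistics fresh so it can
--    choose good plans for the tables involved
ANALYZE contact;

-- 2. EXISTS versus IN: an IN (subquery) condition such as
select * from contact c
 where c.id in (select contact_id from orders);
-- can often be rewritten with EXISTS, which lets the planner stop at the
-- first matching row for each contact
select * from contact c
 where exists (select 1 from orders o where o.contact_id = c.id);
```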

One of the "rules" of efficient client/server application design is to minimize the number of requests submitted and to maximize the amount of useful data processed per query. The usual way to apply this is to replace a series of SELECT statements that each pull in one row with a single query that pulls all the rows you need.

i) Instead of issuing SELECT * FROM contact WHERE id = 15001;, SELECT * FROM contact WHERE id = 15002;, and so forth, it is doubtless less expensive to pull all the rows at once:

SELECT * FROM contact WHERE id BETWEEN 15001 AND 15011;
-- or
SELECT * FROM contact WHERE id IN (15001, 15002, 15003, 15004, ... 15011);

Selecting 10,000 rows at once will be substantially cheaper than selecting them one at a time, and this should be completely unsurprising.

ii) Do 5 selects:

explain analyze select name from contact where id = 20004;
explain analyze select voice from contact where id = 311111;
explain analyze select fax from contact where id = 241111;
explain analyze select cell from contact where id = 32111;
explain analyze select postcode from contact where id = 271111;

-- Join this into one select
explain analyze select a.name, b.voice, c.fax, d.cell, e.postcode
  from contact a, contact b, contact c, contact d, contact e
 where a.id = 20004 and b.id = 311111 and c.id = 241111
   and d.id = 32111 and e.id = 271111;

The five queries each took about 0.15 seconds, for a total of 0.75s. Joining them together led to a runtime of about 0.56s.

That is certainly not an immense improvement. The JOIN introduces some additional costs in parsing and optimizing the more complex query. More likely to disqualify this approach is the fact that the more complex query is more expensive for programmers to manage, and that is a cost that should not be ignored.

4. The Trouble With Aggregates:

Discussions of aggregate functions and operators often describe the MAX and MIN aggregates thus:

...this simply prevents you from selecting everything from the database, iterating through each row one by one, and calculating your results by hand.

While it is strictly true that aggregates do avoid this, and allow you to avoid the work associated with:

* Marshalling and shipping all the data across the network interface;
* Processing the data, once it has been transferred;
* Writing code to process the data.

Aggregates are nonetheless not as efficient as we might like, since the postmaster still has to "select everything from the database, iterating through each row one by one". On a large table, this can be very expensive.

If you are looking for the maximum or minimum value for a particular field that has a useful index, there is a much more efficient query than this. Consider the following example:

create table names (
    id serial not null unique,
    name character varying not null unique
);
insert into names (name) values ('Tux the Penguin');
insert into names (name) values ('Puff the Friendly Giant');
insert into names (name) values ('Bill the Cat');
insert into names (name) values ('Schrodinger''s Cat');
-- And a further cast of tens of thousands ...

-- Aggregates - which require a Seq Scan
select min(name), max(name), min(id), max(id) from names;

-- Select via index
select name as min from names order by name limit 1;
select name as max from names order by name desc limit 1;
select id as min from names order by id limit 1;
select id as max from names order by id desc limit 1;

5. Using Cursors Versus Limit:

Fetching rows incrementally, whether via a cursor or via LIMIT, allows the system to potentially start returning data much earlier, instead of waiting for the entire result set to be computed and transferred.
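A brief sketch of the two approaches, assuming the names table defined in the previous section (the cursor name and batch size of 10 are illustrative):

```sql
-- Cursor: declared inside a transaction, fetched in batches; the server
-- keeps the query's position between fetches
BEGIN;
DECLARE names_cur CURSOR FOR
    select id, name from names order by name;
FETCH 10 FROM names_cur;  -- first 10 rows
FETCH 10 FROM names_cur;  -- next 10 rows
CLOSE names_cur;
COMMIT;

-- LIMIT/OFFSET: each batch is an independent query, so no transaction
-- needs to stay open, but the underlying scan and sort may be repeated
select id, name from names order by name limit 10 offset 0;
select id, name from names order by name limit 10 offset 10;
```

The trade-off is state versus rework: the cursor holds resources for the life of the transaction, while LIMIT/OFFSET is stateless but may redo work for each batch.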

6. Optimizing Data Loads
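A common starting point for this topic, sketched against the names table from section 4 (the file path is hypothetical), is that batching inserts into one transaction, or using COPY, is far cheaper than row-at-a-time autocommitted inserts:

```sql
-- Row-at-a-time autocommit: one commit, and thus one disk sync, per insert
insert into names (name) values ('Name 1');
insert into names (name) values ('Name 2');

-- Batching the inserts into a single transaction amortizes the commit cost
BEGIN;
insert into names (name) values ('Name 1');
insert into names (name) values ('Name 2');
COMMIT;

-- COPY loads many rows in a single statement, which is faster still
COPY names (name) FROM '/path/to/names.txt';
```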