Talk:Third normal form

	This article is within the scope of WikiProject Databases, a project which is currently considered to be inactive.DatabasesWikipedia:WikiProject DatabasesTemplate:WikiProject DatabasesDatabases articles
Start	This article has been rated as Start-class on Wikipedia's content assessment scale.

Archives: 1 2

Not Simple Enough[edit]

I have a BSc in Computer Science, although I shied away from databases, and I also have a British A-Level in Computing, where I was taught how to reduce data down to 3NF. This article is written using database algebra. I like the history of it, but I passed this as an article about how I designed a coherent naming scheme for about 1000 computers (consider the fields between the dots in a hostname to be atoms), and I didn't even understand the article myself, even though I know the method.

This is an encyclopaedia, not a scholarly textbook! — Preceding unsigned comment added by 93.97.48.95 (talk) 00:11, 17 September 2011 (UTC)[reply]

Agreed. I've added a plainer English version based on a definition at Technopedia (with citation) even though this may not be the best one MaryEFreeman (talk) 15:38, 24 April 2014 (UTC)[reply]

I believe that the definition from the Technopedia article is incorrect. For example, if R = (A,B,C) where AB is a primary key and A is dependent on C, then R is in 3NF (according to my professor and Codd's definition), even though A is not only dependent on the primary key. 142.58.43.110 (talk) 21:48, 1 December 2014 (UTC)[reply]

I still think that all of these articles on the forms of normalization are too wordy and not yet easy enough to understand for an encyclopedia. 156.34.11.225 (talk) 20:10, 14 January 2017 (UTC)[reply]

Example[edit]

I've added some text about why the candidate key is what it is. This is because we frequently get talk-page comments where beginners express confusion about the concept of a candidate key, which in turn leads to confusion about the examples in the NF articles. There is of course always a link to the "candidate key" article, but the candidate key article at present is too theoretical for beginners to understand. --Nabav (talk) 09:17, 1 February 2009 (UTC)[reply]

I think either something's wrong with the example or with the candidate key article. Why is "Winner Date of Birth" supposed to be a non-prime attribute? "Tournament, Winner Date of Birth" is a candidate key according to the definition in the candidate key article (there are no distinct tuples with the same values for Tournament and Winner Date of Birth (1) and (1) does not hold for either Tournament or Winner Date of Birth). -- But then "Winner Date of Birth" is contained in a candidate key and therefore prime by definition. Am I missing something? —Preceding unsigned comment added by 141.76.75.39 (talk) 16:42, 27 January 2010 (UTC)[reply]

I'm not surprised you're confused. There's an important point that isn't coming across clearly in the "candidate key" article. Namely: candidate keys are supposed to be unique identifiers not merely for rows that exist in the table right now, but for all rows that could ever exist in the table. Thus {Tournament, Winner} is a candidate key because it's guaranteed to identify rows uniquely no matter what. Whereas the fact that two people can share the same date of birth means that {Tournament, Winner Date of Birth} won't do the same job.--Nabav (talk) 08:40, 15 May 2010 (UTC)[reply]

Another problem with this example: Winner's name is duplicated already, so this does not follow "nothing but the key". Even if the name is PK, two people could have the same name. — Preceding unsigned comment added by 2.248.162.19 (talk) 09:17, 18 January 2023 (UTC)[reply]

The whole key[edit]

This post relates to this page as much as it does to BCNF: http://en.wikipedia.org/wiki/Talk:Boyce-Codd_normal_form#The_whole_key fogus (talk) 18:52, 18 April 2009 (UTC)[reply]

I've reworded the explanation of the BCNF version of the key-whole-key-and-nothing-but-the-key dictum in this article, in response to your criticism. --Nabav (talk) 21:01, 20 April 2009 (UTC)[reply]

X and A?[edit]

In the example "This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:" I am totally unclear as to whether X and A are keys or not, or whether it matters or not. Can somebody clarify? —CobraA1 19:54, 9 November 2009 (UTC)[reply]

i think this is a total false: " A-X, the set difference between A and X is a prime attribute "

from what i know each att of A, Ai should be prime. (and about the response if x is superkey it can be a key) — Preceding unsigned comment added by 46.117.228.13 (talk) 10:41, 7 September 2011 (UTC)[reply]

That's correct. I've changed the text accordingly. However, if I recall correctly, Zaniolo's paper was written using the canonlical definition of FDs, where the determined object is not an attribute set but an attribute (not even a singleton set of attributes) and I didn't make that change since most of the normalisation stuff in wikipedia uses the for where the determined object is a set of attributes. Logically, of course, the single attribute vs set of attributes makes no difference. Michealt (talk) 19:26, 12 October 2012 (UTC)[reply]

Odd comment[edit]

'Normalization is a process of reducing data redundencies.'Bold text —Preceding unsigned comment added by 112.135.20.34 (talk) 06:48, 8 March 2011 (UTC)[reply]

Example has two composite keys[edit]

I was editing the article to remove the "clarification" tag at the end of the tournament example, but I can't come up with a clarification that I like. I initially wrote this:

"Update anomalies cannot occur in these tables, since the only informational column (the date of birth) is not repeated across rows. This assumes that the winner name cannot change or be subject to variation, for example Al being called Alfred in a different competition, a problem which can be avoided through use of a surrogate key".

I don't like it because the date of birth is not purely informational, since in the single-table representation it may also serve to distinguish between two people with the same name. I don't like getting sidetracked with the name thing, but I feel that it is necessary and cannot bring myself to implicitly recommend using a real name as a primary key by not pointing out the problem.

Perhaps the example needs to be amended to use a "Competitor ID" column like the exmaple in first normal form? That doesn't really work very well either, since the data is clearly aggregated from multiple sources and does not have a consistent business key that can be used. Any attempt to identify a unique person by matching name and date of birth is an unreliable assumption in such a context.

— PhilHibbs | talk 09:05, 25 June 2015 (UTC)[reply]

I think everyone is taking the original example too literally. The Example uses the Winner field as an example only. You're supposed to assume--for the sake of simplicity--that the name is Unique. I'm quite certain the author knew that there could be two Al Fredricksons out there, even with the same birthday; same goes for the tournaments, there could be two Indiana Invitational tournaments in 1999--maybe one is in Carmel and one is in La Porte. You're supposed to look at that as an example and instantly realize that the DOB is the thing that doesn't belong. — Preceding unsigned comment added by 198.29.191.200 (talk) 16:53, 3 April 2017 (UTC)[reply]

Zaniolo[edit]

The section does not deliver a proof. First, it does not show equivalency of the two definitions but only one direction. Second, it even fails to deliver that promise. it shows nothing. It is not even wrong. — Preceding unsigned comment added by 93.207.206.126 (talk) 23:30, 28 January 2016 (UTC)[reply]

The entire proof is in the paper which is cited earlier in the article. --Clubguppy (talk) 17:07, 23 May 2018 (UTC)[reply]

(Non-transitively) dependent or Non-(transitively dependent)[edit]

When describing Codd's condition, it is written "Every non-prime attribute of R is non-transitively dependent on every key of R."

I originally read this to mean that every non-prime attribute is directly dependent on every key. Having read the rest of the article, I believe that it is saying every non-prime attribute is not dependent on any key -- not even transitively!

Could the text be edited to disambiguate this? — Preceding unsigned comment added by 92.28.127.164 (talk) 17:19, 6 June 2016 (UTC)[reply]

I think the problem is that the definition given on this 3NF page of 'non-transitively dependent' is a bit sloppy. Compare the definition given on the Transitive dependency article.

Specifically, the specialised article makes clear the three sets of attributes (`X, Y, Z` on the 3NF page confusingly `A, B, C` in the specialised page) must be "distinct" IOW disjoint sets of attributes. I suggest that sentence should read:

> A transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X), for disjoint sets of attributes X, Y, Z.

WRT the SineBot's point: a non-transitive dependency is a dependency that is not transitive. Yes it does mean that "every non-prime attribute is directly dependent on every key" (the reading you thought was wrong). If some attribute is not dependent on a key, then it's not 'non-prime'. If there's an attribute that's not dependent on a key, then that so-called key isn't a key. Note that attributes that are elements within a key are indeed dependent on the key: that's prime dependency (a trivial Functional Dependency), therefore not non-prime.

Everybody who knows the subject would 'understand' to exclude trivial Functional Dependencies (and indeed the Zaniolo 1982 equivalent explicitly excludes them). So the only people who might misunderstand are precisely those coming here to learn about Normal Forms.

AntC2 (talk) 05:23, 21 February 2019 (UTC)[reply]

Courtroom phrasing[edit]

There must be a way to work in my favorite database joke about the courtroom phrasing: "I promise to use the key, the whole key, and nothing but the key, so help me Codd". But I haven't figured out where it fits yet :-). Amillar (talk) 06:19, 8 July 2016 (UTC)[reply]

That was removed in January for irrelevance (not by me). Though it is mildly amusing, I agree with the removal. --Boson (talk) 06:55, 8 July 2016 (UTC)[reply]

Well, I see it is back, and I'd like to see it stay. The current article explains the use, and for me at least, it is much easier to understand than using standard industry jargon. Natcolley (talk) 00:43, 24 August 2016 (UTC)[reply]

Ubernname (talk) 23:45, 12 October 2018 (UTC) Ubername My first post on Wikipedia so sorry if I have got the rules for signing my post with four tildes wrong.[reply]

My observation is that whilst 3NF is a step on the way to getting to a database design, it is actually a method for identifying a data design, or perhaps less prescriptively, a particular way of viewing data. Ages ago I was taught (according to a process called SSADM - Structured Sytems analysis and Design Methodology) that you would create a TNF model from whatever data was already available, an Entity Relationship Diagram (ERD) from whatever business processes you needed to support (so adding new data as/if identified, in a fairly structured form). From those, you would create a composite logical data model (i.e. combining the results of your TNF and your ERD).

Note that this is still a logical data model - I don't think it is useful to get into whatever you need to do to turn that into a (physical) database design, because the world has moved on from relational databases being the only solution, but I thought it was worth trying to tease the differences between a data design and a database design.

The reason why I think this is important is that back when I was doing this stuff storage was expensive - I was involved in a procurement for £1M for a 14GB (that's right - less than you have on your phone) data storage solution. Now it is almost free, but the underlying principles of getting the data architecture right remain. So I am grateful that people have taken the time to create and curate this article, but I thought I could challenge the assertion that the article is about database design rather than data design.

Considerations for use in reporting environments[edit]

Why is this section here? There are many reasons NOT to put a data structure into 3NF and reporting is only one of them. It would be easy to swamp the topic with a discussion of "Considerations for use in .." matters. It adds nothing to an understanding of what 3NF is, which is explained by the 'Tournament Winners' example far better than the algebraic discussion. Varybit (talk) 21:55, 5 March 2019 (UTC)[reply]

Example vs 2NF example[edit]

Call me stupid, but I don't see the difference between the example on this page and the example on Second normal form.--Henke37 (talk) 20:06, 17 April 2022 (UTC)[reply]

Both examples end meeting the 3NF, but the example in this page starts from a table that meets 2NF and not 3NF.

In this example, the determinant Winner is not part of a candidate key, so it has no issues with 2NF.

In the 2NF article example, the determinant Manufacturer is part of a candidate key, so it's not in 2NF. Linkcisco (talk) 04:33, 23 August 2023 (UTC)[reply]