Talk:Data modeling

Wikipedia Ambassador Program assignment
This article is the subject of an educational assignment at University of Toronto supported by WikiProject Wikipedia and the Wikipedia Ambassador Program during the 2011 Q3 term. Further details are available on the course page.

Above message substituted from on 15:18, 7 January 2023 (UTC)

Remarks
This sounds like something copied and pasted from a dry, high-buzzword-quotient textbook. This definitely needs rewriting to be useful as an encyclopedia entry, instead of the BS it is. Although that may be, they have done good work.

I just revamped the definition, before I figured out how to identify myself to Wikipedia. Now that I have done so, I can admit, yes, it's my fault --Dave Hay.


 * Hi Dave, be sure to use the four tildes to name/timestamp your comments. I think this still needs further work - it's kind of like jumping into the deep end of the pool. How about something a little more accessible for those who are completely unfamiliar? Also, many data models might not have PERSON. You might also discuss the all-caps convention in your methodology. :Charles T. Betz 20:15, 26 February 2006 (UTC)


 * I added some clarification and am digging into the history of this article. The first cut wasn't half bad. It seems to have gone through some ups and downs. Charles T. Betz 20:41, 26 February 2006 (UTC)

Your clarification was good. It's looking good. Now, what do we do with "Data Model"? Dave Hay 26 February 2006.

So, what do four tildes do? DaveHay 23:34, 26 February 2006 (UTC)


 * The four tildes (DEddy 15:50, 10 April 2006 (UTC)) sign, & time/date stamp your edit.

This does not give an exact idea of data modeling. It needs more work.

Who entered the Zachman Framework description? Those categories have nothing to do with the Zachman levels. DavidCHay 21:49, 27 August 2007 (UTC)


 * Agreed. Revert. Charles T. Betz 23:15, 27 August 2007 (UTC)


 * Suggest that data modeling, a verb, be described as the activity of constructing a data model, etc., per the first sentence or so, and just refer to those pages. The sections on this page should be part of the sections under Data model (not data modeling), as the reference is to the artifact (data model), not the process (data modeling). This page should preserve only those techniques one uses to create the data model (i.e. analysis). Gorbag42 (talk) 18:28, 8 February 2008 (UTC)


 * The goal of this article is to give an overview of the fundamental data modeling skills that all developers should have, skills that can be applied both on traditional projects that take a serial approach and on agile projects that take an evolutionary approach. My personal philosophy is that every IT professional should have a basic understanding of data modeling. They don’t need to be experts at data modeling, but they should be prepared to be involved in the creation of such a model, be able to read an existing data model, understand when and when not to create a data model, and appreciate fundamental data design techniques. This article is a brief introduction to these skills. The primary audience for this article is application developers who need to gain an understanding of some of the critical activities performed by an Agile DBA. This understanding should lead to an appreciation of what Agile DBAs do and why they do them, and it should help to bridge the communication gap between these two roles.


 * Data modeling involves structuring and organizing data. These data structures are then typically implemented in a database management system. In addition to defining and organizing the data, data modeling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure. [added to article page 2008-09-01T06:20:21 Mbasuchit1]
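To make the point about structure plus constraints concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Person` entity, its attributes, the three rules, and the helper `is_valid_person` are all hypothetical and come from no particular methodology or from the article itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """Hypothetical entity; many real data models have no PERSON at all."""

    # Structure: the attributes the model says a person record has.
    # Note that Python does not enforce these type annotations at runtime.
    person_id: int
    name: str
    birth_date: date

    def __post_init__(self):
        # Explicit constraints: rules the model imposes on the data.
        if self.person_id <= 0:
            raise ValueError("person_id must be positive")
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if self.birth_date > date.today():
            raise ValueError("birth_date cannot be in the future")


def is_valid_person(person_id, name, birth_date):
    """Return True if the values satisfy the (assumed) Person constraints."""
    try:
        Person(person_id, name, birth_date)
        return True
    except ValueError:
        return False
```

In a database management system the same rules would more likely be declared as keys and check constraints; the sketch only shows that deciding on a structure and deciding what data it may hold go together.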

The opening sentence
The opening sentence is obtuse and of little use in introducing the topic. It is equivalent to saying, "Building a car is the process of building a car using car building materials and following car building techniques." There is no information in the statement that provides definition or clarity. I have seen that others have also taken note of the opening statement. Please seek to rework the introduction so that poor saps like me, even with extensive work on other modeling concepts, can glean some information to determine if this article will be relevant to them. In fact, I would suggest using the last three sentences from the comment by mbasuchit1 above with a minor re-write for readability. For instance, the opening can read, "Data modeling involves structuring and organizing data for the design and implementation of systems such as database management systems. In addition to defining and organizing the data, data modeling implicitly or explicitly imposes constraints or limitations on the data placed within the structure." I know the original text above has already been added to the article, but these statements provided enough information to determine if the contents of the article are likely to support my needs, which is why I recommend placing that portion of the text as the introduction for the article. Just my 2 pence. Daddyph (talk) 04:29, 15 September 2010 (UTC)


 * Suggested opening lines: "Data modeling is the process of identifying and arranging classes of data that are relevant to a domain of interest. The product of the effort, a data model, may be used to describe data requirements or as a starting design for a database component." Tee Owe (talk) 23:56, 15 September 2010 (UTC)


 * Two good things about the current opening sentence are that it shows the scope (software engineering) and links to the most related article (data model). Please find an alternative with those two elements. -- Mdd (talk) 10:03, 16 September 2010 (UTC)


 * I mentioned the term "data model" in the second sentence. I happen to disagree about software engineering being data modeling's "scope"; maybe it's information systems or systems engineering? I have used data models to describe business policy and strategic plans as well as hard-copy forms. Tee Owe (talk) 03:11, 17 September 2010 (UTC)

Hi Tee, I am following an interesting discussion about this at the Linked-In Business Architecture Community about [http://www.linkedin.com/groupItem?view=&gid=84758&type=member&item=28723529&qid=a20cc50e-d0c0-4691-97ca-fac7a75ab834&goback=.anp_84758_1284734352967_1.gmp_84758 UML is the modeling language of technologists. Should it be the modeling language for business also, in the interests of business-technology alignment?]. There seems to be a general understanding there that UML in general (and data modelling in particular) has little use in business. Based on that, my first impression is that your use of data models is exceptional. -- Mdd (talk) 14:44, 17 September 2010 (UTC)


 * Hey MDD. My opinion on UML is that it is not very good as a communication device. In my experience, it is cryptic, obscure, unoriginal, unwieldy, jargon-loaded and mostly unhelpful. UML Class diagrams are woefully lacking compared to Oracle/Barker-style data models or even IDEF1X models. UML Activity diagrams are much better, but can you say flowchart? And what the heck is the sequence chart really for anyhow? And why is there a UML 2.0? Not only are Data models <> UML but data models >>> UML. Tee Owe (talk) 03:58, 21 September 2010 (UTC)


 * I'd second your criticisms of UMLs. The biggest problem I've found is their lack of understanding by anyone other than the author, particularly the end users, product owners and those who understand the relevant business area more than coding techniques. I now favour text-based user stories, not because they're absolutely better for design or easier to produce, but because they communicate better. Andy Dingley (talk) 09:08, 21 September 2010 (UTC)


 * Thanks for your comments, which sort of confirm the opinions in that Linked-In discussion. There was one other argument in that discussion which made a lot of sense to me.
 * We have to remember that when business models, they are modeling value. Their flows are usually cash flows - not data flows. IT people tend to model data flows. (quote by Cliff Berg)
 * Now I don't think it is that simple. But the discussion suggests that general management isn't that interested in data modelling when it comes to making management choices. I have received a systems engineering education, and I have noticed there is a world of difference between software engineering modelling and systems engineering modelling. And there is a world of difference between these and, for example, systems dynamics modelling, or the many management models.
 * Back to the opening sentence: I do think we have to keep it simple (simplified), and relate data modelling to software engineering. -- Mdd (talk) 20:58, 26 September 2010 (UTC)


 * Mdd, I have little disagreement with most of what you wrote. But remember (logical) data models do not model "data flows" at all; they represent the structure of data, independent of procedural constraints. We can use data models to represent either system data requirements, or to specify a physical database design on a relational database management system. While both functions are clearly related to software engineering (especially the latter), they are both more specifically in the domain of systems engineering.  Software engineering is essentially about writing code modules and distributing them.  Data modeling is not.  One can successfully practice excellent software engineering, for example in controls systems, and know nothing about data modeling.
 * Anyone in "general management" who isn't that interested in "data modelling" when it "comes to making management choices" falls into one of two camps. They either implicitly understand enough about the meaning and context of their data resources to be confident in their decision-making or, much more likely, they under-appreciate the value of those data resources and remain blissfully ignorant of the understanding that data modeling can provide to their actual business model.
 * I still believe my proposal is simple enough and it does mention a data model as a product. Tee Owe (talk) 23:36, 26 September 2010 (UTC)


 * Ok, I am beginning to get an idea of what you are aiming at. For collecting data requirements you are better off with a systems engineer than with a software engineer!? However, last time I checked, data modeling was not part of the curriculum of systems engineering. But maybe I am mistaken. So I will check some things and get back on this.


 * For now we could consider adding your two lines as the second and third sentence? -- Mdd (talk) 23:57, 27 September 2010 (UTC)

Updated opening sentence, as it currently doesn’t seem to serve any purpose other than loosely relating data modeling to software engineering. Inserted a reference to information systems, as data models are usually related to some sort of IS. Also, removed ‘data model descriptions’, as I don’t think that adds much meaning to the opening of the article. Does anyone know what the author meant by applying formal data model descriptions using data modeling techniques? Sameen.r (talk) 16:50, 20 October 2011 (UTC)

Overview and Data Modeling Topics
I noticed that the Overview section as well as some of the Data Modeling Topics included irrelevant jargon that appeared to be sourced from an article about how data modeling can go wrong, and why it could end up being expensive. This information seemed more like opinions of the author, rather than facts (and even if they were facts, it didn't seem necessary given the scope of this article). Also, I edited some of the sentences that appeared to be very technical/bookish/unclear, so that people with little or no background information about data modeling could understand and follow. Sameen.r (talk) 02:29, 21 October 2011 (UTC)

Article expanded
I restructured the data model and data modelling articles recently. First I merged all content into the data model article about a month ago, leaving this article a stub. Now I have expanded this article again with the intention of focussing on data modelling. In the process I used some of the things already explained in the data model article. Now this is still far from perfect. I think I still miss some major perspectives here.

As Er-Yu Ding (2004) explained in his [http://software.nju.edu.cn/eryuding/course/RE2004fall/material--Brief%20history%20of%20data%20modeling.ppt. "Brief history of data modeling"], there have been all kinds of approaches to data modeling since the 1950s/60s:
 * Structured Programming and Design: started at code level (programming), with Edsger Dijkstra (1968)
 * Relational data modeling: based on Data Theory Normalization by E.F. Codd (1970)
 * Structured Analysis in the 1970s, which added Entity Relationship Diagrams (ERD) for static data modeling, e.g. by Chen
 * Structured Analysis and Design with CASE tools in the 1980s based on Information Engineering (Finkelstein & Martin)
 * Prototyping in the 1980s
 * Object-Orientation with C++ since 1984 and UML (Booch, Rumbaugh, Jacobson) in the late 1990s
 * In the 1990s RAD (Rapid Application Development), Data Warehouse, Business Rules, Middleware, Web
 * And Generic models (industry & subject area) and standards, Data transformation and transmission

New approaches in software development methodologies have affected data modelling as well. This article hasn't yet done a very good job of explaining these historical links, and the new services they brought to data modeling. -- Marcel Douwe Dekker (talk) 01:14, 31 October 2008 (UTC)

Data modeling and Zachman Framework
Currently the introductory section claims that "Data modeling is a technique for defining business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database." This is a very database-centric statement. There are applications of data models that have nothing to do with databases. The Zachman Framework provides a good foundation for reasoning about data (the "What" column) by distinguishing between 6 levels of abstraction: Scope, Business Model, System Model, Technology Model, Detailed Representation and Functional Enterprise. One can have useful data models at each level.

Some of these models may never become a requirement for a system (Scope and Business Model). The ones that do may not require a persistent datastore. The ones that do may use some technology other than a database, for example indexed sequential files, XML, semantic web, etc.

-- Equilibrioception (talk) 17:00, 10 March 2009 (UTC)


 * You claim the Zachman Framework provides a good foundation for reasoning about data. Then you might want to take a look at Talk:Data model. I just removed the section about the Zachman Framework in the data model article, because the text made little sense to me. I do think the data model article and/or this article should contain a section about the Zachman Framework. But it should have a clear explanation, and should rely on reliable sources. -- Marcel Douwe Dekker (talk) 18:56, 10 March 2009 (UTC)


 * I reviewed the removed section, and second your decision: the section is incorrect in several subtle ways.
 * I did claim that Zachman Framework can be useful; Do you know about any other way to introduce the idea of [several levels of abstraction at which a data model can be applied]? Zachman Framework is just one "school of thought" to address this idea. How big is the controversy in the definition of these levels? From the system science perspective, was it Zachman who first introduced them?
 * I am concerned about the excessively database-centric statement in the introduction section. Interestingly enough, the rest of the article does not describe a database-centric world.
 * -- Equilibrioception (talk) 19:28, 10 March 2009 (UTC)


 * Please rephrase your question about the Zachman Framework at Talk:Data model. I have (re)written the current Zachman Framework Wikipedia article, and I want to share my ideas, but I like to keep this discussion in one place.


 * As to the database-centric statement. I moved that statement from the intro to the end of the overview section, and added "also". I think (hope) this will solve the problem you have detected. -- Marcel Douwe Dekker (talk) 19:43, 10 March 2009 (UTC)

Other uses of the term data modeling
Some sources have a much more mathematical (less data-base oriented) definition of what data modeling is. I'm not sure whether this is best treated as a special case of what is covered here or as a separate article. Probably best to create a separate article because I think one represents how best to store data whereas the other looks for a model which would fit the data.

e.g. William H. Press (2007) "Numerical recipes: The art of scientific computing" - Chapter 15: Modelling of Data.

Yaris678 (talk) 21:12, 3 September 2009 (UTC)


 * I am not familiar with this mathematical theory, but it seems like a good idea to start a separate article first. And besides that, this article is not just database-oriented. There is a separate article about database models: the visualization of how to store data. Data modeling, I think, is more specifically about the visualization of the use of data in organizations. -- Marcel Douwe Dekker (talk) 22:24, 3 September 2009 (UTC)

Copy-paste registration
-- Mdd (talk) 22:57, 4 November 2009 (UTC)
 * In this edit text is copy/pasted here from the IDEF1X and Generic data model articles.
 * In this edit text is copy/pasted here from a PD source.
 * In this edit text is copy/pasted here from the database design article.
 * In this edit text is copy/pasted here from the Data model article.
 * In this edit text is copy/pasted here from a PD source.

Conceptual, logical and physical schemas
This paragraph opens with "In 1975 ANSI described three kinds of data-model instance". They did but in 1975 it wasn't Conceptual/Logical/Physical as the para claims. It was External/Conceptual/Physical, see the wiki article http://en.wikipedia.org/wiki/Three_schema_approach or look at the diagram alongside! 80.254.147.164 (talk) 14:45, 17 July 2014 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to one external link on Data modeling. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive https://web.archive.org/20061011024026/http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf to http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

Cheers.—cyberbot II  Talk to my owner :Online 02:00, 8 January 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to one external link on Data modeling. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive http://web.archive.org/web/20090320001015/http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf to http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.—cyberbot II  Talk to my owner :Online 06:02, 27 March 2016 (UTC)

Proposed merge of Database design into Data modeling
AKA database modeling fgnievinski (talk) 19:11, 18 June 2021 (UTC)

Horrified. I'm primarily concerned about the underlying sensibility behind this merge proposal, but sensibilities are not so easily discussed in concise terms and I don't want to break the thread, so I've formatted my response to indent the screedish burger between the buns.
Burger portion

Once upon a time, "database design" pretty much meant one thing only: relational database design on table-based data stores (i.e. RDBMSs), though from time to time in the 1990s you would also hear about object-oriented databases, which was usually some kind of objectified bolt-on to mostly the same foundation.

The center of the RDBMS world is the tabular join. All your properties in time, space, scale, and manageability derive from this. The tabular join was one of the few miraculous foundation stones in the history of computer science: for a long while, it was very nearly one-size-fits-all. You could even model graphs up to k degrees of separation, for small values of k. Then along came social media and the social graph at scale (nearly 8 billion nodes, and counting) and suddenly the bubble burst, circa 2005.
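As a rough illustration of the "small k" point, the sketch below uses Python's standard `sqlite3` module to find everyone within two degrees of separation via one self-join. The `knows` table, its columns, and the function `within_two_hops` are hypothetical names chosen for this example only; nothing here is taken from either article.

```python
import sqlite3


def within_two_hops(edges, start):
    """Return the sorted names reachable from `start` in one or two hops.

    `edges` is an iterable of (src, dst) pairs; the schema is an assumption
    made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE knows ("
        " src TEXT NOT NULL,"
        " dst TEXT NOT NULL,"
        " PRIMARY KEY (src, dst))"
    )
    conn.executemany("INSERT INTO knows VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        -- one hop: direct neighbours
        SELECT dst FROM knows WHERE src = ? AND dst <> ?
        UNION
        -- two hops: join the edge table to itself
        SELECT b.dst
        FROM knows AS a
        JOIN knows AS b ON a.dst = b.src
        WHERE a.src = ? AND b.dst <> ?
        """,
        (start, start, start, start),
    ).fetchall()
    conn.close()
    return sorted(row[0] for row in rows)
```

Each extra degree of separation needs one more self-join, so the query (and the intermediate result) grows with k. That is fine for k = 2 or 3 on a modest table and is the part that becomes painful on a social graph at scale.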

These days, your father's RDBMS is relegated to the dustbin of essential infrastructure (along with C++ and regular expressions) that keep the world humming along while the kids play with cool new technologies, such as the 50 shades of microservice key–value stores. Whether you call this web scale or big data, it's not data modeling as originally practiced.

Back when C++ was invented it didn't have the luxury of saying "well, if some task proves unsuited to C++, you can always use $HEAVY instead". It was actually the reverse: "if some task proves unsuited to $DSVM, you can always code down to the bare metal in C++ instead". DSVM is a neologism I just invented now for "domain specific virtual machine" such as JVM or BEAM.

These days, software is eating the world, and 90% of eyes and mouth are $SEXY_GC. Further down the digestive tract, 90% of the digestive process continues to rely upon the grungy plumbing of Unix pipes, C/C++, regular expressions, RDBs and TeX. Rust is pretty much the only sexy new thing that's actually making progress in the quadrant of zero-cost-abstraction minus foot-gun.

Viewed from anywhere further down the intestinal tract than the epiglottis, yesterday's data modeling (fixated on RDBs) was a vast universe, and it's not dead yet.


 * Top programming languages: Python still rules but old COBOL gets a pandemic bump — 27 July 2020; edit: COBOL rendered as God intended

This year's rankings are based on 11 metrics from eight sources, including CareerBuilder, GitHub, Google, Hacker News, the IEEE, Reddit, Stack Overflow, and Twitter. One standout from this year's ranking is the 60-year-old COBOL, which, based on the Twitter metric alone, is the seventh most popular language. IEEE Spectrum speculates this is because unemployment benefits systems in several US states that are written in COBOL were failing under the strain of higher volumes due to workers being laid off during the pandemic lockdowns.

My concern is that if you try to stuff this vast and deeply orthogonal cultural history into unified coverage, the sheer mass of overhang from not-half-so-far-back-as-COBOL will constantly verge on undue coverage. That would be completely unfair to history, which isn't even history yet by a proper standard, if you're not fixated on the bathroom mirror of the world lately. If we wanted to have an article on relational database design, that would alleviate the problem. But this is not a term it was ever previously known by. It was merely "database design" full stop. -- MaxEnt 20:51, 29 November 2021 (UTC)