data profiling

Data profiling is the process of examining the data available in an existing database and collecting statistics and information about that data. Typical kinds of metadata sought are:

  • Domain: whether the data in a column conforms to the defined values or range of values it is expected to take
    • for example: ages of children in kindergarten are expected to be between 4 and 5, so an age of 7 would be considered out of domain
    • likewise, a code for flammable materials is expected to be A, B or C, so a code of 3 would be considered out of domain
  • Type: alphabetic or numeric
  • Pattern: a North American phone number should have the form (999)999-9999
  • Frequency counts: if most of our customers are in California, the most frequent state code should be CA
  • Statistics:
    • minimum value
    • maximum value
    • mean value (average)
    • median value
    • modal value
    • standard deviation
  • Interdependency:
    • within a table: the zip code field always depends on the country code
    • between tables: the customer number on an order should always appear in the customer table
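
The column-level checks listed above can be sketched in a few lines of Python. This is only an illustration using the standard library, not any vendor's implementation; the function name `profile_column` and its parameters (`allowed`, `value_range`, `pattern`) are hypothetical, and the sample values simply mirror the examples in the list.

```python
import re
import statistics
from collections import Counter


def profile_column(values, allowed=None, value_range=None, pattern=None):
    """Collect basic profiling metadata for one column (hypothetical helper)."""
    report = {"count": len(values), "frequency": Counter(values)}

    # Domain: collect values outside the allowed set or the expected range.
    out_of_domain = []
    for v in values:
        if allowed is not None and v not in allowed:
            out_of_domain.append(v)
        elif value_range is not None and not (value_range[0] <= v <= value_range[1]):
            out_of_domain.append(v)
    report["out_of_domain"] = out_of_domain

    # Pattern: collect values that do not match the expected format.
    if pattern is not None:
        regex = re.compile(pattern)
        report["pattern_mismatches"] = [
            v for v in values if not regex.fullmatch(str(v))
        ]

    # Statistics: only computed when every value is numeric.
    if values and all(isinstance(v, (int, float)) for v in values):
        report["stats"] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "mode": statistics.mode(values),
            "stdev": statistics.pstdev(values),  # population standard deviation
        }
    return report


# Sample data mirroring the examples in the list above.
ages = profile_column([4, 5, 5, 4, 7, 5], value_range=(4, 5))
codes = profile_column(["A", "B", "3", "C"], allowed={"A", "B", "C"})
phones = profile_column(
    ["(415)555-0100", "415-555-0100"], pattern=r"\(\d{3}\)\d{3}-\d{4}"
)
states = profile_column(["CA", "CA", "NY", "CA", "TX"])
```

With this sample data, `ages["out_of_domain"]` holds the age of 7, `codes["out_of_domain"]` holds the code 3, `phones["pattern_mismatches"]` holds the number without parentheses, and `states["frequency"].most_common(1)` reports CA as the most frequent state code.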

Broadly speaking, most vendors who provide data profiling tools divide the functionality into three categories. The names of these categories differ from vendor to vendor, but the overall process has three steps, which must be executed in order:

  • Column profiling, which includes the statistics and domain examples given above.
  • Dependency profiling, which identifies intra-table dependencies. Dependency profiling is related to the normalization of a data source, and addresses whether there are non-key attributes that determine, or are dependent on, other non-key attributes. The existence of such transitive dependencies is evidence that the table is not in third normal form (it is at best in second normal form).
  • Redundancy profiling, which identifies overlapping values between tables. This is typically used to identify candidate foreign keys between tables, to validate attributes that should be foreign keys (but that may not have constraints to enforce integrity), and to identify other areas of data redundancy. Example: redundancy analysis could tell the analyst that the ZIP field in table A contains the same values as the ZIP_CODE field in table B 80% of the time.
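
As a rough sketch of the last two steps, the Python below tests whether one column appears to determine another within a table, and measures how often the values of a column in one table also occur in a column of another. The function names (`dependency_violations`, `overlap_ratio`) and the sample tables and columns are assumptions made for illustration; real profiling tools search for such relationships automatically rather than testing one pair at a time.

```python
from collections import defaultdict


def dependency_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value.

    An empty result is consistent with `determinant -> dependent` holding
    in this sample; it does not prove the dependency in general.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


def overlap_ratio(rows_a, col_a, rows_b, col_b):
    """Fraction of rows in A whose col_a value also occurs in col_b of B."""
    if not rows_a:
        return 0.0
    values_b = {row[col_b] for row in rows_b}
    hits = sum(1 for row in rows_a if row[col_a] in values_b)
    return hits / len(rows_a)


# Hypothetical sample tables.
customers = [
    {"customer_no": 1, "zip": "94105", "city": "San Francisco"},
    {"customer_no": 2, "zip": "94105", "city": "San Francisco"},
    {"customer_no": 3, "zip": "10001", "city": "New York"},
    {"customer_no": 4, "zip": "10001", "city": "Newark"},
]
orders = [
    {"order_no": 10, "customer_no": 1},
    {"order_no": 11, "customer_no": 2},
    {"order_no": 12, "customer_no": 3},
    {"order_no": 13, "customer_no": 4},
    {"order_no": 14, "customer_no": 9},  # no matching customer
]

# Dependency profiling: does zip determine city within the customer table?
violations = dependency_violations(customers, "zip", "city")

# Redundancy profiling: how often does an order's customer_no occur in customers?
ratio = overlap_ratio(orders, "customer_no", customers, "customer_no")
```

Here `violations` flags zip 10001, which maps to two different cities, and `ratio` is 0.8: four of the five orders reference a customer number present in the customer table, which makes `customer_no` a plausible but unenforced foreign key.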

Column profiling provides critical metadata that dependency profiling requires, and so it must be executed before dependency profiling. Similarly, dependency profiling must be performed before redundancy profiling. Even when the output of an earlier step is of no interest to the analyst's purpose, the analyst will most likely be obliged to move through these steps anyway.

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "data profiling".