Top 50 Database System Concepts Interview Questions You Must Prepare 19.Mar.2024

Reasons for not keeping several search indices include:

  1. Every index requires additional CPU time and disk I/O overhead during inserts and deletions.
  2. Indices on non-primary keys might have to be changed on updates, although an index on the primary key might not (this is because updates typically do not modify the primary key attributes).
  3. Each extra index requires additional storage space.
  4. For queries that involve conditions on several search keys, efficiency may still be acceptable even if only some of those keys have indices on them. Therefore, adding further indices improves database performance less and less once many indices already exist; the sketch following this list illustrates the per-insert cost.
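
As a brief sketch of the insert-time cost mentioned above (the account relation, its columns, and the index names are illustrative, and create index, while not part of the SQL standard, is supported by virtually all systems):

  create index branch_idx  on account (branch_name);
  create index balance_idx on account (balance);

  -- this single insert must now update the table plus both index structures
  insert into account (account_number, branch_name, balance)
  values ('A-101', 'Perryridge', 500);

Each additional index on account makes every insert and delete a little more expensive, which is exactly the trade-off the list describes.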

In a generalization–specialization hierarchy, it must be possible to decide which entities are members of which lower-level entity sets. In a condition-defined design constraint, membership in the lower-level entity sets is evaluated on the basis of whether or not an entity satisfies an explicit condition or predicate. User-defined lower-level entity sets are not constrained by a membership condition; rather, entities are assigned to a given entity set by the database user.

Condition-defined constraints alone can be handled automatically by the system. Whenever a tuple is inserted into the database, its membership in the various lower-level entity sets can be decided automatically by evaluating the respective membership predicates. Similarly, when a tuple is updated, its membership in the various entity sets can be re-evaluated automatically.

A general purpose database manager (DBM) has five responsibilities:

  1. interaction with the file manager.
  2. integrity enforcement.
  3. security enforcement.
  4. backup and recovery.
  5. concurrency control.

  • A distributed database gives a user convenient and transparent access to data that is not stored at the user's own site, while allowing each site control over its own local data. A distributed database can be made more reliable than a centralized system: if one site fails, the rest of the database can continue functioning, whereas if a centralized system fails, the database can no longer continue its normal operation at all. A distributed database also allows parallel execution of queries, and a single query can be split into many parts to increase throughput.
  • A centralized system is easier to design and implement, and it is cheaper to operate because messages do not have to be sent between sites.

Increasing contention for shared resources prevents linear scale-up with increasing parallelism. In a shared-memory system, contention for memory (which implies bus contention) causes scale-up to fall off as parallelism increases. In a shared-disk system, it is contention for disk and bus access that limits scale-up. In a shared-nothing system, inter-process communication overheads are the main impediment: since there is no shared memory, acquiring locks and other activities that require message passing between processes take more time as parallelism increases.

It is preferable to use a dense index instead of a sparse index when the file is not sorted on the indexed field (such as when the index is a secondary index) or when the index file is small compared to the size of memory.

4NF is more desirable than BCNF because it reduces the repetition of information. If we consider a BCNF schema that is not in 4NF (see Exercise 7.28), we observe that decomposition into 4NF does not lose information provided that a lossless-join decomposition is used, yet redundancy is reduced.

A physical OID needs to contain a unique identifier in addition to a pointer to a physical storage location. The unique identifier makes it possible to detect dereferences of dangling pointers: if the object has been deleted and its space reused, the identifier stored at that location no longer matches the identifier in the OID.

If an object gets forwarded multiple times, retrieval slows down because accessing the object requires following the chain of locations from which it has successively been forwarded to its current location. Multiple accesses can be avoided by always keeping the latest address of the object in its oldest location: while forwarding, check whether the object has already been forwarded and, if so, update the forwarding address at the oldest location. Thus, at most two accesses will be required.

Usually, a well-designed view and security mechanism can avoid conflicts between ease of access and security. However, as the following example shows, the two purposes do conflict in case the mechanisms are not designed carefully.

Suppose we have a database of employee data and a user whose view involves employee data for employees earning less than $10,000. If this user inserts employee Jones, whose salary is $9,000, but accidentally enters $90,000, several existing database systems will accept this update as a valid update through the view. However, the security mechanism will then deny the user access to delete this erroneous tuple.
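
A sketch of how the standard with check option clause closes this particular hole (the relation, view, and column names are illustrative):

  create view low_paid_emp as
    select employee_name, salary
    from employee
    where salary < 10000
  with check option;

  -- accepted: the new tuple satisfies the view predicate
  insert into low_paid_emp values ('Jones', 9000);

  -- rejected: 90000 violates salary < 10000, so the erroneous tuple never appears
  insert into low_paid_emp values ('Jones', 90000);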

When a transaction explicitly locks a node in shared or exclusive mode, it implicitly locks all the descendants of that node in the same mode. The transaction need not explicitly lock the descendant nodes. There is no difference in the functionality of these locks; the only difference is in the way they are acquired and in how their presence is tested.

The ACID properties, and the need for each of them, are:

  • Consistency: Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database. This is typically the responsibility of the application programmer who codes the transactions.
  • Atomicity: Either all operations of the transaction are reflected properly in the database, or none are. Clearly, lack of atomicity will lead to inconsistency in the database.
  • Isolation: When multiple transactions execute concurrently, it should be the case that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently with it. The user view of a transaction system requires the isolation property, and the property that concurrent schedules take the system from one consistent state to another. These requirements are satisfied by ensuring that only serializable schedules of individually consistency-preserving transactions are allowed.
  • Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Let tgrid be a two-dimensional integer array of size n × m.

    • The physical level would simply be m × n (probably consecutive) storage locations of whatever size is specified by the implementation (e.g., 32 bits each).
    • The conceptual level is a grid of boxes, each possibly containing an integer, which is n boxes high by m boxes wide.
    • There are 2^(m×n) possible views. For example, a view might be the entire array, a particular row of the array, or all n rows but only columns 1 through i.
    • Consider the following Pascal declarations: type tgrid = array[1..n, 1..m] of integer; var vgrid1, vgrid2 : tgrid; Then tgrid is a schema, whereas the values of the variables vgrid1 and vgrid2 are instances.
    • To illustrate further, consider the schema array[1..2, 1..2] of integer. Two instances of this schema are any two 2 × 2 arrays of integers, for example ((1, 2), (3, 4)) and ((17, 90), (412, 8)).

In a total design constraint, each higher-level entity must belong to a lower-level entity set. The same need not be true in a partial design constraint. For instance, some employees may belong to no work-team.

  1. Encrypted data allows authorized users to access data without worrying about other users or the system administrator gaining any information.
  2. Encryption of data may simplify or even strengthen other authorization mechanisms. For example, distributing the cryptographic key only among trusted users is both a simple way to control read access and an added layer of security above that offered by views.

Deadlock avoidance is preferable if the consequences of abort are serious (as in interactive transactions), and if there is high contention and a resulting high probability of deadlock.

  1. Security conditions may require that the entire logical database not be visible to all users.
  2. We may wish to create a personalized collection of relations that is better matched to a certain user’s intuition than is the actual logical model (both uses are sketched below).
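
A minimal sketch of both uses, assuming a hypothetical employee(employee_name, street, city, salary) relation and a clerk_role authorization role:

  -- security: hide salaries from clerical staff
  create view emp_address as
    select employee_name, street, city
    from employee;

  grant select on emp_address to clerk_role;

  -- personalization: present only the data a particular user cares about
  create view brooklyn_employees as
    select employee_name, salary
    from employee
    where city = 'Brooklyn';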

Open hashing may place keys with the same hash-function value in different buckets. Closed hashing always places such keys together in the same bucket. Thus, in this case, different buckets can be of different sizes, though the implementation may link together fixed-size buckets using overflow chains. Deletion is difficult with open hashing, as all the buckets may have to be inspected before we can ascertain that a key value has been deleted, whereas in closed hashing only the bucket whose address is obtained by hashing the key value needs to be inspected. Deletions are more common in databases, and hence closed hashing is more appropriate for them. For a small, static set of data, lookups may be more efficient using open hashing; the symbol table of a compiler would be a good example.

A schedule in which all the instructions belonging to any one transaction appear together is called a serial schedule. A serializable schedule has a weaker restriction: it need only be equivalent to some serial schedule. There are two standard definitions of schedule equivalence, conflict equivalence and view equivalence.

With a central server, the individual sites do not have to remember which site to contact when a particular data item is requested; the central server alone needs to remember this, so data items can be moved around easily depending on which sites access which items most frequently. Other house-keeping tasks are also centralized rather than distributed, making the system easier to develop and maintain. Of course, there is the disadvantage of a total shutdown if the central server becomes unavailable. Even when it is running, it may become a bottleneck because every request has to be routed through it.

Your answer will be based on the computers and storage media that you use. Typical examples would be hard disks, floppy disks, and CD-ROM drives.

Dangling tuples can arise when one tuple is inserted into a decomposed relation but no corresponding tuple is inserted into the other relations in the decomposition. They can cause incorrect values to be returned by queries that form the join of a decomposed relation, since the dangling tuple might not be included. Dangling tuples can be avoided by the specification of referential-integrity constraints.

In the validation-based concurrency control scheme, choosing Start(Ti) as the timestamp of Ti gives a subset of the schedules allowed by choosing Validation(Ti) as the timestamp. Using Start(Ti) means that whichever transaction started first must finish first. Transactions could certainly be made to enter the validation phase in the same order in which they began executing, but this is overly restrictive. Since choosing Validation(Ti) causes fewer non-conflicting transactions to restart, it gives better response times.

The phantom phenomenon arises when, due to an insertion or deletion, two transactions logically conflict despite not locking any data items in common. The insertion case is described in the book. Deletion can also lead to this phenomenon: suppose Ti deletes a tuple from a relation while Tj scans the relation. If Ti deletes the tuple and then Tj reads the relation, Ti should be serialized before Tj. Yet there is no tuple on which Ti and Tj conflict.

An interpretation of 2PL as locking just the accessed tuples in a relation is incorrect. There is also the index, or other relation metadata, that records which tuples are in the relation. This information is read by any transaction that scans the relation, and it is modified by transactions that update, insert into, or delete from the relation. Hence locking must also be performed on the index or relation metadata, and this avoids the phantom phenomenon.
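
The insertion case can be made concrete with a hypothetical account relation; neither statement below touches a tuple the other has locked, yet the two transactions conflict on the predicate branch_name = 'Perryridge':

  -- Tj: scans the relation
  select sum(balance)
  from account
  where branch_name = 'Perryridge';

  -- Ti: concurrently inserts a tuple that satisfies Tj's scan predicate
  insert into account (account_number, branch_name, balance)
  values ('A-999', 'Perryridge', 700);

If only individual tuples were locked, both transactions could proceed; locking the index or relation metadata that both must touch prevents this.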

Consider the balance in an account, replicated at N sites. Let the current balance be $100, consistent across all sites. Consider two transactions T1 and T2, each depositing $10 in the account; the balance should therefore be $120 after both transactions are executed. Let the transactions execute in sequence: T1 first and then T2. Suppose the copy of the balance at one of the sites, say s, is not consistent with the primary copy after transaction T1 is executed (due to the lazy replication strategy), and let transaction T2 read this stale copy of the balance. One can see that the balance at the primary site would then be $110 at the end instead of $120, so T1's deposit is lost.

An entity is simply a collection of variables or data items. An object is an encapsulation of data as well as the methods (code) to operate on the data. The data members of an object are directly visible only to its methods. The outside world can gain access to the object’s data only by passing pre-defined messages to it, and these messages are implemented by the methods.

RAID level 1 (mirroring) is the one which facilitates rebuilding of a failed disk with minimum interference with the on-going disk accesses. This is because rebuilding in this case involves copying data from just the failed disk’s mirror. In the other RAID levels, rebuilding involves reading the entire contents of all the other disks.

There are several steps in the creation of a file: a storage area is assigned to the file in the file system, a unique i-number is given to the file, and an i-node entry is inserted into the i-list. Deletion of a file involves exactly the opposite steps.

For the file-system user in UNIX, durability is important for obvious reasons, but atomicity is generally not relevant since the file system does not support transactions. To the file-system implementor, though, many of the internal file-system actions need to have transaction semantics: all the steps involved in the creation or deletion of a file must be atomic, otherwise there will be unreferenceable files or unusable areas in the file system.

The possible sequences of states are:

  1. active → partially committed → committed. This is the normal sequence a successful transaction follows. After executing all its statements, it enters the partially committed state. After enough recovery information has been written to disk, the transaction finally enters the committed state.
  2. active → partially committed → aborted. After executing the last statement of the transaction, it enters the partially committed state. But before enough recovery information is written to disk, a hardware failure may occur, destroying the memory contents. In this case the changes the transaction made to the database are undone, and it enters the aborted state.
  3. active → failed → aborted. After the transaction starts, if it is discovered at some point that normal execution cannot continue (either due to internal program errors or external errors), it enters the failed state. It is then rolled back, after which it enters the aborted state.

If the type of an attribute is x, then in each tuple of the table the corresponding value is an actual object of type x. If its type is ref(x), then in each tuple the corresponding value is a reference to some object of type x. We choose a reference type for an attribute if that attribute's intended purpose is to refer to an independent object.
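
A SQL:1999-style sketch of the distinction (the type, table, and column names are illustrative, and the exact ref syntax varies across systems):

  create type Department as (
    dept_name varchar(20),
    budget    numeric(12,2)
  ) not final;

  -- a typed table whose rows are objects of type Department;
  -- each row carries a system-generated reference
  create table departments of Department
    ref is dept_oid system generated;

  create table employees (
    emp_name varchar(20),
    manager  varchar(20),                       -- ordinary value attribute
    dept     ref(Department) scope departments  -- reference to an independent object
  );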

Autonomy is the amount of control a single site has over its local database. It is important because users at that site want quick and correct access to local data items, especially since local data will be the most frequently accessed data at a site. Transparency hides the distributed nature of the database. It is important because users should not be required to know about the location, replication, fragmentation, or other implementation aspects of the database.

If a transaction is very long, or if it fetches data from a slow disk, it takes a long time to complete. In the absence of concurrency, other transactions would have to wait for a long period of time, and average response time would increase. Moreover, while the transaction is reading data from disk, the CPU is idle, so resources are not properly utilized. Hence concurrent execution becomes important in this case. However, when the transactions are short or the data is available in memory, these problems do not occur.

The assertion name is arbitrary; we have chosen the name perry. Note that since the assertion applies only to the Perryridge branch, we must restrict attention to the Perryridge tuple of the branch relation rather than writing a constraint on the entire relation.

  create assertion perry check
    (not exists (select *
                 from branch
                 where branch-name = 'Perryridge' and
                       assets <> (select sum (amount)
                                  from loan
                                  where branch-name = 'Perryridge')))

Note that indices must operate on the encrypted data, or else someone could gain access to the index and use it to interpret the data; otherwise, the index would have to be restricted so that only certain users could access it. To keep the data in sorted order, the index scheme would have to decrypt the data at each level in a tree. Note that hash-based index schemes would not be affected.

The primary key of a weak entity set can be inferred from its relationship with the strong entity set. If we add primary key attributes to the weak entity set, they will be present in both the entity set and the relationship set and they have to be the same. Hence there will be redundancy.

BCNF is not always dependency preserving. Therefore, we may want to choose another normal form (specifically, 3NF) in order to make checking dependencies easier during updates. This would avoid joins to check dependencies and increase system performance.

We have weak entities for several reasons:

  • We want to avoid the data duplication and consequent possible inconsistencies caused by duplicating the key of the strong entity.
  • Weak entities reflect the logical structure of an entity being dependent on another entity.
  • Weak entities can be deleted automatically when their strong entity is deleted.
  • Weak entities can be stored physically with their strong entities.

  1. If a pair of entity sets are connected by a path in an E-R diagram, the entity sets are related, though perhaps indirectly. A disconnected graph implies that there are pairs of entity sets that are unrelated to each other. If we split the graph into connected components, we have, in effect, a separate database corresponding to each connected component.
  2. As indicated in the answer to the previous part, a path in the graph between a pair of entity sets indicates a (possibly indirect) relationship between the two entity sets. If there is a cycle in the graph, then every pair of entity sets on the cycle is related to each other in at least two distinct ways. If the E-R diagram is acyclic, then there is a unique path between every pair of entity sets and, thus, a unique relationship between every pair of entity sets.

Most of the concurrency control protocols (protocols for ensuring that only serializable schedules are generated) used in practice are based on conflict serializability—they actually permit only a subset of conflict serializable schedules. The general form of view serializability is very expensive to test, and only a very restricted form of it is used for concurrency control.

Let us consider a two-dimensional grid array. When a bucket overflows, we can split the ranges corresponding to that row and column into two, in both the linear scales. Thus the linear scales will get one additional entry each, and the bucket is split into four buckets. The ranges should be split in such a way as to ensure that the four resultant buckets have nearly the same number of values.

There can be several other heuristics for deciding how to reorganize the ranges, and hence the linear scales and grid array.

Some main differences between a database management system and a file-processing system are:

  • Both systems contain a collection of data and a set of programs which access that data. A database management system coordinates both the physical and the logical access to the data, whereas a file-processing system coordinates only the physical access.
  • A database management system reduces the amount of data duplication by ensuring that a physical piece of data is available to all programs authorized to have access to it, whereas data written by one program in a file-processing system may not be readable by another program.
  • A database management system is designed to allow flexible access to data (i.e., queries), whereas a file-processing system is designed to allow predetermined access to data (i.e., compiled programs).
  • A database management system is designed to coordinate multiple users accessing the same data at the same time. A file-processing system is usually designed to allow one or more programs to access different data files at the same time. In a file-processing system, a file can be accessed by two programs concurrently only if both programs have read-only access to the file.

An edge from class A to class B in the DAG representing inheritance means that an object of class B is also an object of class A. It has all the properties that objects of class A have, plus additional ones of its own. In particular, it inherits all the variables and methods of class A. It can, of course, provide its own implementations for the inherited methods. An edge from class A to class B in the object-containment DAG means that an object of class A contains an object of class B. There need not be any similarities in the properties of A and B. Neither B nor A inherits anything from the other. They function as independent types, to the extent that an object of class A can access the variables of the B object contained in it only via the B object's methods.

Certain functional dependencies are called trivial functional dependencies because they are satisfied by all relations.

  create view salinfo as
    select manager-name, avg(salary)
    from manages m, works w
    where m.employee-name = w.employee-name
    group by manager-name

Updates should not be allowed in this view because there is no way to determine how to change the underlying data. For example, suppose the request is “change the average salary of employees working for Smith to $200”. Should everybody who works for Smith have their salary changed to $200? Or should the first (or more, if necessary) employee found who works for Smith have their salary adjusted so that the average is $200? Neither approach really makes sense.

We define a trigger for each relation whose primary key is referred to by the foreign key of some other relation. The trigger would be activated whenever a tuple is deleted from the referred-to relation. The action performed by the trigger would be to visit all the referring relations and delete from them all the tuples whose foreign-key attribute value is the same as the primary-key attribute value of the deleted tuple in the referred-to relation. This set of triggers takes care of the on delete cascade operation.
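
A sketch of one such trigger, using SQL:1999-style trigger syntax (the branch and loan relations are illustrative, and real systems differ in details, for example Oracle's :old notation):

  create trigger branch_delete_cascade
  after delete on branch
  referencing old row as orow
  for each row
    delete from loan
    where loan.branch_name = orow.branch_name;

In practice the same effect is usually obtained declaratively, by declaring the foreign key with references branch on delete cascade.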

The primary index is on the field which specifies the sequential order of the file. There can be only one primary index while there can be many secondary indices.
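
A dialect-specific sketch of the distinction (the clustered keyword below follows SQL Server's syntax; other systems express the clustering property differently, for example with index-organized tables, and the account relation is illustrative):

  -- primary (clustering) index: at most one, on the field that orders the file
  create clustered index acct_idx on account (account_number);

  -- secondary indices: as many as needed, on other search keys
  create index branch_name_idx on account (branch_name);
  create index balance_amt_idx on account (balance);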

If a transaction needs to access a large set of items, multiple-granularity locking requires fewer locks, whereas if only one item needs to be accessed, single-granularity locking allows this with just one lock. Because all the desired data items are locked and unlocked together in the multiple-granularity scheme, the locking overhead is low, but concurrency is also reduced.
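
A rough illustration of the two extremes in SQL terms (the lock table statement follows PostgreSQL/Oracle syntax, select ... for update acquires row-level locks in most systems, and the account relation is illustrative):

  -- coarse granularity: one lock covers every tuple in the relation
  lock table account in share mode;

  -- fine granularity: one lock per accessed row
  select balance
  from account
  where account_number = 'A-101'
  for update;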

Index and resource authorization should be special categories to allow certain users to create relations (and the indices to operate on them) while preventing these time-consuming and schema-changing operations from being available to many users. Separating index and resource authorization allows a user to build an index on existing relations, say, for optimization purposes, but allows us to deny that user the right to create new relations.

Even in this case the recovery manager is needed to perform rollback of aborted transactions.