mysql – Is using strings as keys for reference to other tables bad in terms of memory usage?

Yes. MySQL, like every other RDBMS, will store the complete email as a VARCHAR and reserve space for that number of bytes.

An integer key, at most 8 bytes even for a BIGINT, will only ever use those bytes, and is therefore faster when referencing.

In terms of speed, use INTEGER keys; consider alternatives such as VARCHAR(36) for UUIDs only when the need arises, for example when different servers have to save data into the same table.

With emails being unique, and therefore indexed for referencing anyway, you should go the extra mile and use an integer key if you expect your tables to grow big.
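
For example, a minimal sketch of that setup (table and column names are purely illustrative):

CREATE TABLE users (
  `id`    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `email` VARCHAR(255) NOT NULL,
  UNIQUE KEY `uq_users_email` (`email`)  -- email stays unique and searchable
) ENGINE=InnoDB;

CREATE TABLE orders (
  `id`      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `user_id` BIGINT UNSIGNED NOT NULL,   -- 8-byte key instead of the full email
  FOREIGN KEY (`user_id`) REFERENCES users (`id`)
) ENGINE=InnoDB;

Every referencing table, and every index on the reference, now carries 8 bytes per row instead of the full email string.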

database design – Best practices when designing SQL DB with “redundant” tables

I have a design dilemma for a DB I’m creating for an e-commerce platform I want to develop.

I have 3 different actors interacting with the website:

  • Customer
  • Manager
  • Supplier

Those 3 actors have the same table structure: email, username, address…

My initial design for the DB was to create a single table (StoreUser), with an additional field to distinguish between the 3 different types of actors.

The issue I see with this design is that when the “Order” table references a Customer, for instance to assign an Order to a Customer, it would be technically possible to assign a “Manager” or a “Supplier” to the order, even though it isn’t wanted. The same goes for the “Product” table, which should carry a Supplier’s foreign key: a plain “StoreUser” FK would not distinguish between the 3 actors.
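
To make the problem concrete, here is a rough sketch of the single-table design (hypothetical MySQL-flavored DDL, just to illustrate):

CREATE TABLE store_user (
  `id`        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `email`     VARCHAR(255) NOT NULL,
  `username`  VARCHAR(255) NOT NULL,
  `address`   VARCHAR(255),
  `user_type` ENUM('customer', 'manager', 'supplier') NOT NULL
);

CREATE TABLE orders (
  `id`          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `customer_id` BIGINT UNSIGNED NOT NULL,
  -- nothing here prevents customer_id from pointing at a manager or supplier
  FOREIGN KEY (`customer_id`) REFERENCES store_user (`id`)
);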

On the other hand, creating 3 tables containing the exact same data fields seems really redundant, especially from the code perspective (I’m using Django for the website, and I really don’t like the idea of having 3 different classes with the same structure).

Which one seems the most logical to you? What’s the good practice here?

Normalization of database tables – Spring Data JPA / Hibernate

I created the tables as described in the Hibernate bidirectional @ManyToMany documentation, following the example exactly.

When I add a person, I get the following table:

[screenshot: the resulting tables, with “New York” and “Los Angeles” repeated several times]

If you look closely, you will notice a violation of one of the main principles of normalization: information redundancy. “New York” and “Los Angeles” are repeated several times.

How do I add a person correctly, so that the normalization of the database tables is not violated and there is no data redundancy?


Person.java

    @Entity(name = "Person")
    public class Person {

        @Id
        @GeneratedValue
        private Long id;

        private String name;

        @ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE})
        private List<Address> addresses = new ArrayList<>();

        public Person() {
        }

        public Person(String name) {
            this.name = name;
        }

        // Getters and setters are omitted for brevity

        public void addAddress(Address address) {
            addresses.add( address );
            address.getOwners().add( this );
        }

        public void removeAddress(Address address) {
            addresses.remove( address );
            address.getOwners().remove( this );
        }
    }

Address.java

    @Entity(name = "Address")
    public class Address {

        @Id
        @GeneratedValue
        private Long id;

        private String street;

        @ManyToMany(mappedBy = "addresses")
        private List<Person> owners = new ArrayList<>();

        public Address() {
        }

        public Address(String street) {
            this.street = street;
        }

        // Getters and setters are omitted for brevity
    }

LifecycleController.java

    @Controller
    public class LifecycleController {

        @Autowired
        ServiceJpa serviceJpa;

        @GetMapping(value = "/savePerson")
        public String savePersonAddress() {

            Person person1 = new Person("Jack");

            Address address1 = new Address( "New York" );
            Address address2 = new Address( "Los Angeles" );

            person1.addAddress( address1 );
            person1.addAddress( address2 );

            serviceJpa.savPerson( person1 );

            return "/savePerson";
        }
    }
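
For reference, what I want on the database side is the usual normalized many-to-many layout, where each street value is stored exactly once and only the join table repeats IDs (a sketch in DDL; Hibernate generates the actual table and column names):

CREATE TABLE person (
  `id`   BIGINT NOT NULL PRIMARY KEY,
  `name` VARCHAR(255)
);

CREATE TABLE address (
  `id`     BIGINT NOT NULL PRIMARY KEY,
  `street` VARCHAR(255),
  UNIQUE (`street`)                 -- "New York" stored only once
);

-- join table: redundancy lives here as (person_id, address_id) pairs
-- instead of repeated street strings
CREATE TABLE person_addresses (
  `person_id`  BIGINT NOT NULL,
  `address_id` BIGINT NOT NULL,
  PRIMARY KEY (`person_id`, `address_id`),
  FOREIGN KEY (`person_id`)  REFERENCES person (`id`),
  FOREIGN KEY (`address_id`) REFERENCES address (`id`)
);

On the Java side, that means looking up an existing Address by street and passing the managed entity to addAddress(), rather than constructing a new Address for every person.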

mysql – How to run multiple calculations from multiple tables in one query

I have a school management system that stores student marks and generates student reports.

For a student to pass, he or she has to have:

  1. An average of 60% or above
  2. 60% or above in English Language
  3. At least 60% in 5 subjects, including English Language

I do have a query that calculates the best 5 subjects and comes up with an average.

But I need the query to also check the mark in the passing subject, count the number of subjects the student has passed (English included), and present that info in one query.

SELECT student_id, ROUND(SUM(t.mark) / 5) AS average_mark
FROM (
    SELECT marks.student_id, ROUND(AVG(mark)) AS mark
    FROM marks
    INNER JOIN teaching_loads ON teaching_loads.id = marks.teaching_load_id
    INNER JOIN subjects ON subjects.id = teaching_loads.subject_id
    WHERE marks.student_id = 520 AND marks.assessement_id = 1
    GROUP BY subject_id
    ORDER BY (subject_id = 2) DESC, mark DESC
    LIMIT 5
) t
ORDER BY ROUND(SUM(t.mark) / 5) DESC

How can I build a query that checks the passing-subject mark, counts the subjects passed (English included), and returns everything in one row?

Something like

Student_id:89
passed_subjects:6
passing_subject_mark:60

In one query I want to be able to get all that data. How can I go about it?
Please help me.

Below is the schema for the database that stores student data/marks and its related tables.

Marks Table – stores student marks

CREATE TABLE `marks` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `teacher_id` bigint(20) UNSIGNED NOT NULL,
  `student_id` bigint(20) UNSIGNED NOT NULL,
  `teaching_load_id` bigint(20) UNSIGNED NOT NULL,
  `assessement_id` bigint(20) UNSIGNED NOT NULL,
  `mark` int(11) NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Teaching Loads

CREATE TABLE `teaching_loads` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `teacher_id` bigint(20) UNSIGNED NOT NULL,
  `subject_id` bigint(20) UNSIGNED NOT NULL,
  `class_id` bigint(20) UNSIGNED NOT NULL,
  `session_id` bigint(20) UNSIGNED NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Subjects Table

CREATE TABLE `subjects` (
  `id` bigint(20) UNSIGNED NOT NULL,
  `subject_name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `subject_type` enum('core','elective','non-value','passing_subject') COLLATE utf8mb4_unicode_ci NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

SQL Fiddle that has the database schema
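
A possible starting point, sketched against the schema above (untested; it assumes the passing subject is the one with subject_type = 'passing_subject' and counts passes among the best 5 subjects):

SELECT
  s.student_id,
  ROUND(SUM(s.mark) / 5)                                            AS average_mark,
  MAX(CASE WHEN s.subject_type = 'passing_subject' THEN s.mark END) AS passing_subject_mark,
  SUM(s.mark >= 60)                                                 AS passed_subjects
FROM (
  -- per-subject averages, best 5 with the passing subject forced in
  SELECT marks.student_id, subjects.subject_type, ROUND(AVG(mark)) AS mark
  FROM marks
  INNER JOIN teaching_loads ON teaching_loads.id = marks.teaching_load_id
  INNER JOIN subjects       ON subjects.id = teaching_loads.subject_id
  WHERE marks.student_id = 520 AND marks.assessement_id = 1
  GROUP BY teaching_loads.subject_id, subjects.subject_type
  ORDER BY (subjects.subject_type = 'passing_subject') DESC, mark DESC
  LIMIT 5
) s
GROUP BY s.student_id;

If passed_subjects should count all subjects rather than just the best 5, compute it from a second derived table without the LIMIT.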

Are there any good guidelines around the formatting and display of amounts and numbers in UI, especially for high-density tables?

This might include the alignment of numbers in tables based on the type of numerical data, how to show multi-currency amounts, and how to handle negative numbers on a variety of form factors.

tables – Best UI behavior for done actions in a list

I agree with @jhurley and @Stefano’s answers, but let me add a different angle:

Going back to first principles: what is the user’s context, and what task are they trying to accomplish?

I’ll use my own experience to illustrate, since I don’t know who you’re designing for. Loosely ranked by frequency of use, these are my personal scenarios:

  1. I have discretionary time and am looking for a task to take on. I want to quickly review all my options and make a selection.

  2. I have completed a task and want to update my list.

  3. I want to review what I have completed and get a sense of my
    accomplishment, as well as trigger any additional next actions.

  4. I want to revive a task that was marked completed earlier (either
    because it was checked off accidentally, or I want to use it for a
    slightly different task).

For each of the above tasks, I want to separate DONE from not-DONE tasks.
I’ve seen several implementations of this separation:

a) spatially separate them (usually not-DONE on top, DONE tasks below)

b) check mark in front of DONE tasks (and blank checkbox in front of non-DONE tasks)

c) strikethrough the task title for DONE tasks

d) a filter/toggle that shows or hides DONE tasks

ALMOST EVERY task manager implements (b) and (c) (e.g. Wunderlist, 2Do, Toodledo, Asana). SOME implement (a) and (d). I consider (d) a more advanced feature, and whether you do (a) depends on how important it is to preserve list order.


To answer your specific question, the only advantage I see of a label vs. strikethrough is a slight increase in readability.

The advantages of strikethrough design:

  • mimics what people do on paper, which makes it much less ambiguous
    and more learnable
  • easier to parse (since the user only needs to look at one place to determine status)
  • easier to distinguish DONE and not-DONE tasks – the strikethrough as a visual pattern is easier to pick up
  • you can reserve labels for other features, like “tags” or “categories”
  • this is what most task management apps do.

postgresql – How to get count of an object, through 3 different tables in postgres with ID’s stored in each table

I’m currently using Postgres 9.6.16.

I am using 3 different tables to store a hypothetical user’s details.

The first table, called contact, contains:

ID, Preferred_Contact_Method

The second table, called orders, contains:

ID, UserID, Contact_ID (the ID of the row in the contact table that relates to this order)

The third table, called order_details, contains:

ID, Orders_ID (the ID in the orders table that this order detail relates to)

The tables contain other data as well, but for minimal reproduction, these are the columns that are relevant to this question.

I am trying to return some data so that I can generate a graph. In this hypothetical store, there are only three ways we can contact a user: Email, SMS, or Physical Mail.

The graph is supposed to show 3 numbers: how many mails, emails, and SMS messages we’ve sent to the user. In this hypothetical store you get notified of the successful shipment whenever you purchase something, so these notifications are 1:1 with order_details: if there are 10 order_details rows for the same user, then we sent 10 tracking numbers. And since an order can contain multiple order_details (each item has its own row), we can get the counts by counting the order_details rows belonging to a single user/contact and attributing each to the contact method that user preferred at the time of making that order.

To represent this better: a new user makes a new order for 1 apple, 1 banana, and 1 orange. For the apple, the user set the preferred tracking-number delivery to SMS; for the banana, they set it to EMAIL; for the orange, they thought it would be funny to have the tracking number delivered via MAIL. Now I want to generate a graph of this user’s preferred delivery methods, so I’d like to query all those rows and obtain:

SMS, 1
EMAIL, 1
MAIL, 1

Here’s a SQL Fiddle link with the schema and test data: http://sqlfiddle.com/#!17/eb8c0

The response with the above dataset should look like this:

method | count
SMS    | 4
EMAIL  | 4
MAIL   | 4
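
Something along these lines should produce that result (a sketch based on the columns described above; I’m assuming lowercase identifiers, as Postgres folds unquoted names):

SELECT c.preferred_contact_method AS method,
       COUNT(od.id)               AS count
FROM contact c
JOIN orders o         ON o.contact_id = c.id   -- each order carries its contact row
JOIN order_details od ON od.orders_id = o.id   -- one row per item/tracking number
GROUP BY c.preferred_contact_method;

Each order_details row is counted once and attributed to the preferred contact method stored on the contact row that its order points to.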

databases – Insert operation for two tables in Relational Algebra

The given schema is:

Publisher(pid, name, location)
Book(bid, title, author, page, price)
Publish(bid, pid, publish_date)

The RA question/query is:

  1. Insert new record of book named “database” published by “pearson” in
    “2010-12-11” and written by “Patterson”.

What I have tried so far is this but I am not sure if it is correct:

Given:

Publisher.name = "pearson"
Publish.publish_date = "2010-12-11"

Also given:

Book.title = "database"
Book.author = "Patterson"

Assuming the entry for this book has already been made in the Publish table, I created a temporary relation r1.

r1 ← π bid(σ name="pearson"  ∧ publish_date="2010-12-11" (Publisher ⋈ Publish))

From r1, I can get bid of the same book.

So, now can I write the following?

Approach 1: Book ← Book U { π bid(r1), "database", "Patterson" }

OR

Approach 2: Book ← Book U { r1 X {"database", "Patterson"} }

Are the RA expressions in Approach 1 and 2 valid? Does either of them insert the data correctly? And is it fine to ignore the values of the other two attributes, page and price?

What would be the right Relational Algebra query for the following question? Thank you in advance for any help/suggestions.
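
For what it’s worth, the fully written-out version of Approach 2 that I am considering would be (assuming the dialect permits null padding for the unspecified attributes, and that r1 contains exactly one bid):

Book ← Book ∪ (r1 X {("database", "Patterson", null, null)})

The cross product pairs the single bid in r1 with the remaining attributes in schema order, so the result matches Book(bid, title, author, page, price).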

sql server – Table scan instead of index seeks happening when where clause filters across multiple tables in join using OR

Firstly, thank you for providing the actual execution plans for both cases, that is one of the best things for us to help troubleshoot performance problems.

Secondly, the issue you’re facing is due to the difference in Cardinality between the first query and second query, which in a few words is the number of records your query might return relative to how many records are in the tables themselves, for the predicates (conditions in the JOIN, WHERE, and HAVING clauses) specified.

When SQL Server analyzes your query, its Cardinality Estimator uses statistics the server stores about the tables involved to try to make a reasonable estimate on how many rows will be returned from each table in your query. Then the execution plan is generated based on this information, as different operations are more efficient in different situations with different amounts of rows being returned.

For example, if your query results in a high Cardinality (lot of records being returned), generally an index scan is a more performant operation than an index seek because there is a higher likelihood the index scan will encounter a majority of your records sooner (contiguously) than it would’ve trying to seek out each one individually.

Sometimes the Cardinality Estimator gets confused based on the conditions in your predicates causing it to misestimate the cardinality resulting in performance issues. One way to verify you have a cardinality estimate issue is by comparing the Estimated Number of Rows to the Actual Number of Rows in the actual execution plan. If they are off by a significant amount (e.g. a magnitude or more) then likely there’s a cardinality estimate issue.
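
For instance, one way to capture the actual plan, with both estimated and actual row counts per operator, directly from a query window is the SET STATISTICS XML option (the query below is just a placeholder for the one you are troubleshooting):

SET STATISTICS XML ON;

-- run the query you are investigating here, e.g.:
SELECT *
FROM T1617;

SET STATISTICS XML OFF;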

Finally, sorry to get your hopes up, but your execution plans don’t seem to be indicative of a cardinality estimate issue. Your second execution plan appears to be estimating the cardinality correctly, and this truly is a case where the conditions of your WHERE clause return enough rows for SQL Server’s engine to think an index scan operation will be more performant than an index seek. As you’ll notice, both the Estimated Number of Rows and the Actual Number of Rows in your second execution plan are now about 1.5 million.

That being said, even with accurate statistics and cardinality estimates, sometimes the engine is just plain wrong. You can test this by using the FORCESEEK index hint, which in your query’s case would be appended after the table, like FROM T1617 WITH (FORCESEEK), for example.
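
For example (a sketch; the column names and variables are placeholders for whatever your query actually uses):

SELECT t.SomeColumn
FROM T1617 AS t WITH (FORCESEEK)  -- forces seek operations on T1617's indexes
WHERE t.Column1 = @Value1
   OR t.Column2 = @Value2;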

Fair warning: index hints are only recommended for use in production code after extended testing, as when used incorrectly they can lead to worse performance. But FORCESEEK is a relatively benign one when used appropriately, and it can correct some uncommon cases where the engine is wrong about which operation will be faster. Alternatively, you can try rewriting the query in a more relationally efficient way, when applicable.

Chess Programming – Principal Variation and Transposition Tables

Right now I am trying to implement a principal variation search algorithm for my chess engine (C++), as well as a transposition table. I have looked at various sources online and have become confused about how to properly store the principal variation. I have seen some sources/implementations use a separate array of structs that can be indexed at a certain depth to extract the PV move, and I have also heard of people storing the principal variation moves within the transposition table. Therefore I have a few questions that I would like to ask to perhaps clear up my misunderstanding.

  1. Should I have a separate array to store the principal variation, or should I store it in the transposition table? Which is better?

  2. If I store it in the transposition table the same as other moves, how would I go about preventing that entry from getting overwritten by another move?

  3. Or am I thinking about this wrong? What then is the proper way to do this?