What must be true in RREF?

Which of these three answers apply? (More than one can apply.)
Options:
1’s on the diagonal
0’s below the diagonal
0’s above the diagonal
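
To make the conditions concrete, here is a small example (my own, not part of the original question) of a matrix in reduced row echelon form:

$$\begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$

Each leading entry is a 1 and is the only nonzero entry in its column; note that the pivots need not fill the whole diagonal.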

Diagonal movement for an enemy in Pygame

I spent a long time trying to write the diagonal movement code, but I don’t know how to do it. The code I have so far is almost complete; it is only missing the pixel art, which I will add later.

import pygame

class Game:
    def __init__(self):
        self.screen = pygame.display.set_mode((600, 600))
        self.square = pygame.Rect((250, 250), (100, 100))
        self.leftframe = pygame.Rect((40, 40), (10, 510))
        self.rightframe = pygame.Rect((550, 40), (10, 520))
        self.frameup = pygame.Rect((40, 40), (520, 10))
        self.framedown = pygame.Rect((40, 550), (510, 10))

        self.enemy = pygame.Rect((100, 100), (50, 50))
        self.enemy2 = pygame.Rect((450, 300), (50, 50))
        self.enemy3 = pygame.Rect((300, 500), (50, 50))
        self.directionenemy = 1
        self.directionenemy2 = 1
        self.directionenemy3 = 1

    def play(self):
        self.clean_screen()
        self.draw_borders()
        self.hear_events()
        self.draw_square()
        self.move_and_draw_enemy()
        self.move_and_draw_enemy2()
        self.move_and_draw_enemy3()
        self.check_for_contact()
        self.check_for_contact2()
        self.check_for_contact3()
        self.update_screen()

    def move_and_draw_enemy(self):
        self.enemy.move_ip(self.directionenemy, 0)

        if self.enemy.colliderect(self.rightframe):
            self.directionenemy = -1
        
        if self.enemy.colliderect(self.leftframe):
            self.directionenemy = 1

        pygame.draw.rect(self.screen, (255, 0, 0), self.enemy)
    
    def move_and_draw_enemy2(self):
        self.enemy2.move_ip(0, self.directionenemy2)

        if self.enemy2.colliderect(self.frameup):
            self.directionenemy2 = 1

        if self.enemy2.colliderect(self.framedown):
            self.directionenemy2 = -1
        
        pygame.draw.rect(self.screen, (10, 50, 255), self.enemy2)


    def move_and_draw_enemy3(self):

        pygame.draw.rect(self.screen, (50, 50, 50), self.enemy3)


    def check_for_contact(self):
        if self.square.colliderect(self.enemy):
            self.directionenemy = 0
        elif self.directionenemy == 0 and not self.square.colliderect(self.enemy):
            self.directionenemy = 1

    def check_for_contact2(self):
        if self.square.colliderect(self.enemy2):
            self.directionenemy2 = 0
        elif self.directionenemy2 == 0 and not self.square.colliderect(self.enemy2):
            self.directionenemy2 = 1

    def check_for_contact3(self):
        if self.square.colliderect(self.enemy3):
            self.directionenemy3 = 0
        elif self.directionenemy3 == 0 and not self.square.colliderect(self.enemy3):
            self.directionenemy3 = 1

    def clean_screen(self):
        self.screen.fill((255, 255, 255))

    def draw_borders(self):
        pygame.draw.rect(self.screen, (72, 47, 125), self.leftframe)
        pygame.draw.rect(self.screen, (72, 47, 125), self.rightframe)
        pygame.draw.rect(self.screen, (72, 47, 125), self.framedown)
        pygame.draw.rect(self.screen, (72, 47, 125), self.frameup)

    def hear_events(self):
        for e in pygame.event.get():
            if e.type == pygame.QUIT:
                pygame.quit()
                raise SystemExit
            if e.type == pygame.KEYDOWN:
                self.move_and_draw_square(e.key)

    def move_and_draw_square(self, key):
        if key == pygame.K_LEFT:
            self.move_square(-1, 0)
            if self.square.colliderect(self.leftframe):
                self.move_square(1, 0)

        elif key == pygame.K_RIGHT:
            self.move_square(1, 0)
            if self.square.colliderect(self.rightframe):
                self.move_square(-1, 0)

        elif key == pygame.K_UP:
            self.move_square(0, -1)
            if self.square.colliderect(self.frameup):
                self.move_square(0, 1)

        elif key == pygame.K_DOWN:
            self.move_square(0, 1)
            if self.square.colliderect(self.framedown):
                self.move_square(0, -1)

        elif key == pygame.K_ESCAPE:
            pygame.quit()
            raise SystemExit

    def move_square(self, x, y):
        self.square.move_ip(x * 10, y * 10)

    def draw_square(self):
        pygame.draw.rect(self.screen, (0, 204, 0), self.square)

    def update_screen(self):
        pygame.display.update()

pygame.init()
game = Game()
clock = pygame.time.Clock()

while True:
    game.play()
    clock.tick(60)  # cap the frame rate so the enemies move at a steady speed
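
For the missing diagonal movement, a common approach is to give the enemy one direction component per axis and flip each component independently when it hits a border, so the enemy bounces around the play field diagonally. A minimal sketch, assuming two hypothetical attributes dx3 and dy3 are initialized to 1 in __init__ in place of directionenemy3:

    def move_and_draw_enemy3(self):
        # Stepping one pixel on both axes each frame produces diagonal motion.
        self.enemy3.move_ip(self.dx3, self.dy3)

        # Bounce each axis independently off the borders.
        if self.enemy3.colliderect(self.leftframe):
            self.dx3 = 1
        if self.enemy3.colliderect(self.rightframe):
            self.dx3 = -1
        if self.enemy3.colliderect(self.frameup):
            self.dy3 = 1
        if self.enemy3.colliderect(self.framedown):
            self.dy3 = -1

        pygame.draw.rect(self.screen, (50, 50, 50), self.enemy3)

check_for_contact3 would then need to stop and restore both components rather than the single directionenemy3 value.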

c++ – Optimizing a diagonal matrix-vector multiplication (?diamv) kernel

For a (completely optional) assignment in an introductory C++ programming course, I am trying to implement a diagonal matrix-vector multiplication (?diamv) kernel, i.e. mathematically
$$\mathbf{y} \leftarrow \alpha\mathbf{y} + \beta\mathbf{M}\mathbf{x}$$
for a diagonally clustered matrix $\mathbf{M}$, dense vectors $\mathbf{x}$ and $\mathbf{y}$, and scalars $\alpha$ and $\beta$. I believe that I can reasonably motivate the following assumptions:

  1. The processors executing the compute threads are capable of executing the SSE4.2 instruction set extension (but not necessarily AVX2),
  2. The access scheme of the matrix $\mathbf{M}$ does not affect the computation, and therefore temporal cache locality between kernel calls does not need to be considered,
  3. The matrix $\mathbf{M}$ does not fit in cache, is very diagonally clustered with a diagonal pattern that is known at compile time, and is square,
  4. The matrix $\mathbf{M}$ does not contain regularly occurring sequences in its diagonals that would allow for compression along an axis,
  5. No reordering function exists for the structure of the matrix $\mathbf{M}$ that would lead to a cache-oblivious product with a lower cost than an ideal multilevel-memory optimized algorithm,
  6. The source data is aligned on an adequate boundary,
  7. OpenMP, chosen for its popularity, is available to enable shared-memory parallelism. No distributed memory parallelism is necessary as it is assumed that a domain decomposition algorithm, e.g. DP-FETI, will decompose processing to the node level due to the typical problem size.

Having done a literature review, I have come to the following conclusions on its design and implementation (this is a summary, in increasing granularity, with the extensive literature review being available upon request to save space):

  1. “In order to achieve high performance, a parallel implementation of a sparse matrix-vector multiplication must maintain scalability” per White and Sadayappan, 1997.
  2. The diagonal matrix storage scheme,
    $$\operatorname{vec}\left(\operatorname{val}(i,j) \equiv a_{i,i+j}\right)$$

    where $\operatorname{vec}$ is the matrix vectorization operator, which obtains a vector by stacking the columns of the operand matrix on top of one another. By storing the matrix in this format, I believe the cache locality to be as good as possible for row-wise parallelization. Checkerboard partitioning reduces to row-wise partitioning for diagonal matrices. Furthermore, this allows for source-vector re-use, which is necessary unless the matrix is re-used while still in cache (Frison 2016). A small worked example of this scheme follows the list.
  3. I believe that the above should hold before vectorization is even considered. The non-regular padded areas of the matrix, i.e. the top-left and bottom-right, can be handled separately without incurring extra cost in the asymptotic sense (because the matrix is diagonally clustered and very large).
  4. Because access to this matrix is linear, software prefetching should not be necessary. I have included it anyway, for code review, at the spot I considered the most logical.
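
To make the storage scheme in point 2 concrete, here is a small worked example (my own, assuming a tridiagonal $4 \times 4$ matrix with diagonal offsets $d = (-1, 0, 1)$):

$$\operatorname{val} = \begin{pmatrix} * & a_{11} & a_{12} \\ a_{21} & a_{22} & a_{23} \\ a_{32} & a_{33} & a_{34} \\ a_{43} & a_{44} & * \end{pmatrix}$$

where row $i$ holds $(a_{i,i+j})_{j \in d}$ and $*$ marks padding; $\operatorname{vec}$ then stacks the columns of $\operatorname{val}$, so each diagonal of $\mathbf{M}$ becomes one contiguous run in memory.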

The following snippet represents my best effort, taking the aforementioned into consideration:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <type_traits>

#include <xmmintrin.h>
#include <emmintrin.h>

#include <omp.h>

#include "tensors.hpp"


#define CEIL_INT_DIV(num, denom)        (((num) + (denom) - 1) / (denom))

/* Tokens of a #pragma directive are not macro-expanded, so emit the
 * pragmas through the _Pragma operator instead. */
#define PRAGMA(x)                       _Pragma(#x)
#if defined(__INTEL_COMPILER)
#define AGNOSTIC_UNROLL(N)              PRAGMA(unroll (N))
#elif defined(__clang__)
#define AGNOSTIC_UNROLL(N)              PRAGMA(clang loop unroll_count(N))
#elif defined(__GNUG__)
#define AGNOSTIC_UNROLL(N)              PRAGMA(GCC unroll N)
#else
#define AGNOSTIC_UNROLL(N)
#warning "Compiler not supported"
#endif

/* Computer-specific optimization parameters */
#define PREFETCH                        true
#define OMP_SIZE                        16
#define BLK_I                           8
#define SSE_REG_SIZE                    128
#define SSE_ALIGNMENT                   16
#define SSE_UNROLL_COEF                 3


namespace ranges = std::ranges;


/* Calculate the largest absolute value ..., TODO more elegant? */
template <typename T1, typename T2>
auto static inline largest_abs_val(T1 x, T2 y) {
    return std::abs(x) > std::abs(y) ? std::abs(x) : std::abs(y);
}


/* Define intrinsics agnostically; compiler errors thrown automatically */
namespace mm {
    /* _mm_load_px - (...) */
    inline auto load_px(float const *__p) { return _mm_load_ps(__p); };
    inline auto load_px(double const *__dp) { return _mm_load_pd(__dp); };

    /* _mm_store_px - (...) */
    inline auto store_px(float *__p, __m128 __a) { return _mm_store_ps(__p, __a); };
    inline auto store_px(double *__dp, __m128d __a) { return _mm_store_pd(__dp, __a); };

    /* _mm_set1_px - (...) */
    inline auto set_px1(float __w) { return _mm_set1_ps(__w);};
    inline auto set_px1(double __w) { return _mm_set1_pd(__w); };

    /* _mm_mul_px - (...) */
    inline auto mul_px(__m128 __a, __m128 __b) { return _mm_mul_ps(__a, __b);};
    inline auto mul_px(__m128d __a, __m128d __b) { return _mm_mul_pd(__a, __b); };
}


namespace tensors {
    template <typename T1, typename T2>
    int diamv(matrix<T1> const &M, 
              vector<T1> const &x,
              vector<T1> &y,
              vector<T2> const &d,
              T1 alpha, T1 beta) noexcept {
        /* Initializations */
        /* - Compute the size of an SSE vector */
        constexpr size_t sse_size =  SSE_REG_SIZE / (8*sizeof(T1));
        /* - Validation of arguments */
        static_assert((BLK_I >= sse_size && BLK_I % sse_size == 0), "Cache blocking is invalid");
        /* - Reinterpretation of the data as aligned */
        auto M_ = reinterpret_cast<T1 *>(__builtin_assume_aligned(M.data(), SSE_ALIGNMENT));
        auto x_ = reinterpret_cast<T1 *>(__builtin_assume_aligned(x.data(), SSE_ALIGNMENT));
        auto y_ = reinterpret_cast<T1 *>(__builtin_assume_aligned(y.data(), SSE_ALIGNMENT));
        auto d_ = reinterpret_cast<T2 *>(__builtin_assume_aligned(d.data(), SSE_ALIGNMENT));
        /* - Number of diagonals */
        auto n_diags = d.size();
        /* - Number of zeroes for padding TODO more elegant? */
        auto n_padding_zeroes = largest_abs_val(ranges::min(d), ranges::max(d));
        /* - No. of rows lower padding needs to be extended with */
        auto n_padding_ext = (y.size() - 2*n_padding_zeroes) % sse_size;
        /* - Broadcast α and β into vectors outside of the kernel loop */
        auto alpha_ = mm::set_px1(alpha);
        auto beta_ = mm::set_px1(beta);

        /* Compute y := αy + βMx in two steps */
        /* - Pre-compute the bounding areas of the two non-vectorizable and single vect. areas */
        size_t conds_begin[] = {0, M.size() - (n_padding_ext + n_padding_zeroes)*n_diags};
        size_t conds_end[] = {n_padding_zeroes*n_diags, M.size()};
        /* - Non-vectorizable areas (top-left and bottom-right resp.) */
AGNOSTIC_UNROLL(2)
        for (size_t NONVEC_LOOP = 0; NONVEC_LOOP < 2; NONVEC_LOOP++) {
            for (size_t index_M = conds_begin[NONVEC_LOOP]; index_M < conds_end[NONVEC_LOOP]; index_M++) {
                auto index_y = index_M / n_diags;
                /* Signed arithmetic: the diagonal offset may point before or past the vector */
                auto index_x = static_cast<std::ptrdiff_t>(index_y) + d_[index_M % n_diags];
                if (index_x >= 0 && index_x < static_cast<std::ptrdiff_t>(x.size()))
                    y_[index_y] = (alpha * y_[index_y]) + (beta * M_[index_M] * x_[index_x]);
            }
        }
        }
        /* - Vectorized area - (parallel) iteration over the x parallelization blocks */
#pragma omp parallel for shared(M_, x_, y_) schedule(static)
        for (size_t j_blk = conds_end[0]; j_blk < conds_begin[1]; j_blk += BLK_I*n_diags) {
            /* Iteration over the x cache blocks */
            for (size_t j_bare = 0; j_bare < CEIL_INT_DIV(BLK_I, sse_size); j_bare++) {
                size_t j = j_blk + (j_bare*n_diags*sse_size);
                /* Perform y = ... for this block, potentially with unrolling */
                /* *** microkernel goes here *** */
#if PREFETCH
                /* _mm_prefetch(...) */
#endif
            }
        }

        return 0;
    };
}
 

Some important notes:

  1. tensors.hpp is a simple header-only library that I’ve written for the occasion to act as a uniform abstraction layer to tensors of various orders (with the CRTP) having different storage schemes. It also contains aliases to e.g. vectors and dense matrices.

  2. For the microkernel, I believe there are two possibilities:

    a. Iterate linearly over the vectorized matrix within each cache block; this would amount to row-wise iteration over the matrix $\mathbf{M}$ within each cache block and therefore a dot product. To the best of my knowledge, dot products are inefficient in dense matrix-vector products due to both data dependencies and how the intrinsics decompose into μops.

    b. Iterate over rows in cache blocks in the vectorized matrix, amounting to iteration over diagonals in the matrix $\mathbf{M}$ within each cache block. Because of the way the matrix $\mathbf{M}$ is stored, i.e. in its vectorized form, this would incur the cost of broadcasting the floating-point numbers (which, to the best of my knowledge, is a complex matter) but would allow rows within blocks to be processed in parallel.

    I’m afraid that I’ve missed some other, better options; this is the primary reason for opening this question, as I’m completely stuck. Furthermore, I believe that the differences in how well the source/destination vectors are re-used are too close to call. Does anyone know how I would approach shedding more insight into this? (A sketch of the two access patterns follows this list.)

  3. Even if the cache hit rate is high, I’m afraid of the bottleneck shifting to e.g. inadequate instruction scheduling. Is there a way to check this in a machine-independent way other than having to rely on memory bandwidth?

  4. Is there a way to make the “ugly” non-vectorizable code more elegant?
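
To make the two candidate microkernels from point 2 concrete, here is a NumPy sketch of the two access patterns only (my own illustration, not the SSE kernel itself), assuming the storage scheme described above, i.e. vals[k, i] holds the matrix entry on diagonal offsets[k] in row i, with padding for out-of-range positions:

import numpy as np

def diamv_diagonalwise(vals, offsets, x, y, alpha, beta):
    # Option (b): iterate over diagonals; each diagonal is one contiguous,
    # easily vectorizable sweep through vals, x and y.
    n = y.shape[0]
    y *= alpha
    for k, d in enumerate(offsets):
        lo, hi = max(0, -d), min(n, n - d)   # rows for which column i+d exists
        y[lo:hi] += beta * vals[k, lo:hi] * x[lo + d:hi + d]
    return y

def diamv_rowwise(vals, offsets, x, y, alpha, beta):
    # Option (a): iterate over rows; each row is a short dot product
    # across the stored diagonals.
    n = y.shape[0]
    for i in range(n):
        acc = 0.0
        for k, d in enumerate(offsets):
            if 0 <= i + d < n:
                acc += vals[k, i] * x[i + d]
        y[i] = alpha * y[i] + beta * acc
    return y

Written out this way, the trade-off becomes visible: option (b) streams contiguously but makes one pass over y per diagonal, while option (a) touches y only once but gathers x non-contiguously and strides through vals.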

Proofreading the above, I feel like a total amateur; all feedback is (very) much appreciated. Thank you in advance.

java – Return the maximum value of a row of a matrix without considering the diagonal coefficients

I have the following two-dimensional array:

0. | 0.0 | -0.8980387129971331 | -0.8900869793075101 | -0.8906789098245378 | 1.0104911316104093 | -0.8816392828513628
1. | -0.8998803800424156 | 0.0 | -0.8894871457733221 | -0.8897044897987794 | 1.1079409359304297 | -0.7105118305961893
2. | -0.8889556072705933 | -0.8924868056899387 | 0.0 | 1.1083728720261286 | 1.0098247893112775 | 1.099113864022297
3. | -0.8808751963282109 | 0.9280169284175466 | -0.8891630366886065 | 0.0 | -0.69121432906078 | -0.7092216479617963
4. | -0.8986589499572509 | -0.8921590617526629 | -0.8891630366344203 | -0.7057342552186525 | 0.0 | -0.7075934709028173
5. | -0.8988751964282238 | -0.8981045503211356 | -0.8891659511135326 | 1.0907466603012215 | 1.1072644730546006 | 0.0

I want to get the maximum of a row without considering the diagonal coefficient.

I’m using the following function:

double maxQ(int s) {
    int[][] actionsFromState = actions(s);
    double maxValue = Double.MIN_VALUE;
    for (int i = 0; i < actionsFromState.length; i++) {
        int[] nextState = actionsFromState[i];
        int nexts = linearCheck(states, nextState);
        double value = Q[s][nexts];

        if (value > maxValue)
            maxValue = value;

    }
    return maxValue;
}

But, for example, for row 4 it returns the value 0.0, while I want it to return -0.7057342552186525.

It must return the maximum among all the values except the value in column 4 (the diagonal coefficient).
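
For what it’s worth, two details in the posted function can produce exactly this symptom: the loop never skips the diagonal entry, and Double.MIN_VALUE in Java is the smallest positive double, so every negative value in the row compares below it. A sketch of the intended logic, written here in Python for brevity (max_off_diagonal is a hypothetical name):

def max_off_diagonal(Q, s):
    # Seed with -infinity (not the smallest positive value) so that
    # rows containing only negative numbers are handled correctly.
    best = float("-inf")
    for j, value in enumerate(Q[s]):
        if j == s:  # skip the diagonal coefficient Q[s][s]
            continue
        best = max(best, value)
    return best

In Java the same two fixes apply: initialize with Double.NEGATIVE_INFINITY and skip the iteration whose state index equals s.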

computation models – Complexity of inverting a diagonal matrix

What is the complexity of inverting an $n \times n$ diagonal matrix?

From what I learned in algebra, the inverse of a diagonal matrix is obtained by replacing each element on the diagonal with its reciprocal.

So is it correct to say that the complexity of inverting a diagonal matrix is $\mathcal{O}(n)$?
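
For reference, a short sanity check (a sketch, assuming all diagonal entries are nonzero; otherwise the matrix is singular):

$$D = \operatorname{diag}(d_1, \dots, d_n) \quad\Longrightarrow\quad D^{-1} = \operatorname{diag}\left(d_1^{-1}, \dots, d_n^{-1}\right)$$

Computing the inverse therefore costs $n$ independent divisions, i.e. $\Theta(n)$ arithmetic operations, which is indeed $\mathcal{O}(n)$.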

dnd 5e – Is a character considered within 5 feet of another character if it is diagonal to it?

Yes.

Given the wording here, I’m assuming you’re using the (technically variant) rules for playing on a grid.

The basic rule for Space, found on p. 191 of the PHB, says (emphasis mine):

A creature’s space also reflects the area it needs to fight effectively. For that reason, there’s a limit to the number of creatures that can surround another creature in combat. Assuming Medium combatants, eight creatures can fit in a 5-foot radius around another one.

When playing on a grid, 8 enemies surrounding a single person is easily represented by a 3×3 square, and would include the ‘diagonal’ spots.

programming languages – Construct a function in Haskell that returns True if there are three equal consecutive elements on a diagonal

I need to construct an auxiliary function in Haskell. For example, let $A$ be a matrix such as:

(image of the example matrix $A$; it is the same matrix passed to the function below)

The function should return True if there is at least one run of three equal consecutive elements along a diagonal, thus:

*Main> [[10,2,3,4],[5,10,10,8],[9,10,10,12],[10,14,15,16]]

result: True

because the matrix above has three 10’s on the main diagonal and three 10’s on the other diagonal (in this case the two runs happen to lie on the main and secondary diagonals respectively, but it doesn’t have to be like that). This code calculates the main diagonal of a matrix:

diagonal :: [[Int]] -> [Int]
diagonal [] = []
diagonal (x:xs) = head x : diagonal (map tail xs)

but I need to solve it for any diagonal, and not necessarily for a square matrix.

I also made a function to group a list 3 by 3, but I don’t know how to “connect” it with the previous idea:

group3in3 :: Int -> [a] -> [[a]]
group3in3 _ [] = [[]]
group3in3 n xs
  | n > 0 = (take n xs) : (group3in3 n (drop n xs))
  | otherwise = error "Error"

I don’t have a lot of experience with Haskell. Any help would be appreciated.
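
One way to structure the algorithm, sketched here in Python rather than Haskell (has_diagonal_triple is a hypothetical name; my own illustration): slide a 3-cell window over every down-right and down-left diagonal of the possibly rectangular matrix and test the three cells for equality. A Haskell version can follow the same idea, e.g. by generating all diagonals of the matrix and reusing group3in3 on each of them.

def has_diagonal_triple(m):
    # m is a rectangular matrix given as a list of rows.
    rows, cols = len(m), len(m[0])
    for i in range(rows - 2):
        for j in range(cols):
            # Three consecutive cells going down-right.
            if j + 2 < cols and m[i][j] == m[i + 1][j + 1] == m[i + 2][j + 2]:
                return True
            # Three consecutive cells going down-left.
            if j - 2 >= 0 and m[i][j] == m[i + 1][j - 1] == m[i + 2][j - 2]:
                return True
    return False

On the example matrix above it returns True, matching the three 10’s on each diagonal.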

linear algebra – Prove that if A is diagonal, then Adj(A) is diagonal as well

I’m not sure if my proof is formal enough, and I’d appreciate a review or if someone could write his/her proof of the statement. Here’s how my proof goes:

Given a diagonal matrix $A \in M_n(F)$, we can say that the entry $a_{ij}$ of $A$, for $1 \le i, j \le n$, equals $0$ if $i \ne j$. According to the definition of $\operatorname{Adj}(A)$, we can find it using the following formula:
$$(\operatorname{Adj}(A))_{ij} = (-1)^{i}(-1)^{j}\,|A_{ji}|,$$
where $|A_{ji}|$ is the determinant of the minor matrix $A_{ji}$. Since only the entries $a_{ii}$, $1 \le i \le n$, can be nonzero, while all of the other entries are $0$, obtaining $A_{ji}$ ($j \ne i$) requires removing column $i$ and row $j$ – and thus removing the column containing the only potentially nonzero entry of row $i$, namely $a_{ii}$. This leaves $A_{ji}$ with a row of zeroes where $a_{ii}$ was removed – and, as is known, a square matrix with a row or column of zeroes has determinant $0$, i.e. $|A_{ji}| = 0$. This means
$$(\operatorname{Adj}(A))_{ij} = (-1)^{i}(-1)^{j} \cdot 0 = 0 \quad \forall\, i \ne j,$$
just as required.
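
As a quick sanity check of the statement (my own example, not part of the proof above): for $A = \operatorname{diag}(a, b, c)$, each off-diagonal minor has a zero row, and

$$\operatorname{Adj}(A) = \begin{pmatrix} bc & 0 & 0 \\ 0 & ac & 0 \\ 0 & 0 & ab \end{pmatrix},$$

consistent with $\operatorname{Adj}(A) = \det(A)\,A^{-1}$ when $A$ is invertible.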

keyboard – Is there a way to enforce straight mouse movement (horizontal, vertical, diagonal) in macOS?

I was wondering if there is a way to enforce straight pointer movement in macOS.

For instance, when you are annotating a PDF, you may want to make a straight stroke as an underline.

So what I’m looking for is a feature where, when moving the pointer with Shift pressed, the movement is restricted to a straight line (horizontal, vertical, or diagonal). Is there such a feature in macOS, or another application that makes this possible?

linear algebra – Inverse of a matrix and the inverse of its diagonal part 2

My question is very similar to this question, and in fact might be the “classic problem” referenced in that question.

Given a symmetric positive definite matrix $X$, I want to show that the matrix $\operatorname{diag}(X)^{-1} - X^{-1}$ is positive semidefinite. Or, equivalently, I want to show
$$
v^{T} X^{-1} v \leq v^{T} \operatorname{diag}(X)^{-1} v
$$

for all $v$.

It feels like it should be straightforward, but I’m not seeing it…