"Bigint" to a binary list of bytes in two's complement in Python

I am toying with the idea of writing a "bigint" library in 6502
assembler that would handle variable-length integers up to 255 bytes
long. To help me understand the representation issues, I came up
with a Python function that takes a decimal integer and deposits it
in (emulated) memory in its binary representation: a length
byte followed by the value bytes from least to most
significant. (The 6502 is little-endian.)

As background, the M or Machine object here is the emulated processor
and RAM, with the RAM as an array of unsigned integers 0-255. It
offers a deposit(addr, list) method to put values in memory,
byte(addr) to retrieve a single byte, and bytes(addr, len) to
retrieve multiple bytes. (It's actually a wrapper around a
py65.devices.mpu6502 from py65; all the code is in my 8bitdev
repo if you are curious.)

from    testmc.m6502 import  Machine, Registers as R, Instructions as I
import  pytest

@pytest.fixture
def M():
    M = Machine()
    return M

def depint(M, addr, value):
    ''' Deposit a bigint in locations starting at `addr`.
        `addr` contains the length of the following bytes,
        which hold the value from LSB to MSB.
    '''
    next = addr + 1             # Skip length byte; filled in at end

    if value >= 0:              # Positive number; fill byte by byte
        while next == addr+1 or value > 0:
            value, byte = divmod(value, 0x100)
            M.deposit(next, [byte])
            next += 1
        if byte >= 0x80:        # MSbit = 1; sign in additional byte
            M.deposit(next, [0x00])
            next += 1

    else:                       # Negative: fill with two's complement values
        value = abs(value+1)    # two's complement = -(n+1)
        while next == addr+1 or value > 0:
            value, byte = divmod(value, 0x100)
            byte = 0xFF - byte  # two's complement
            M.deposit(next, [byte])
            next += 1
        if byte < 0x80:         # MSbit = 0; sign in additional byte
            M.deposit(next, [0xFF])
            next += 1

    #   Store the length
    M.deposit(addr, [next - (addr + 1)])

@pytest.mark.parametrize('value, bytes', (
    (0,             [0x00]),
    (1,             [0x01]),
    (127,           [0x7F]),
    (128,           [0x80, 0x00]),
    (255,           [0xFF, 0x00]),
    (256,           [0x00, 0x01]),
    (0x40123456,    [0x56, 0x34, 0x12, 0x40]),
    (0xC0123456,    [0x56, 0x34, 0x12, 0xC0, 0x00]),
    (-1,            [0xFF]),
    (-128,          [0x80]),
    (-129,          [0x7F, 0xFF]),
    (-255,          [0x01, 0xFF]),
    (-256,          [0x00, 0xFF]),
    (-257,          [0xFF, 0xFE]),
    (0-0x40123456,  [0xFF-0x56+1, 0xFF-0x34, 0xFF-0x12, 0xFF-0x40]),
    (0-0xC0123456,  [0xFF-0x56+1, 0xFF-0x34, 0xFF-0x12, 0xFF-0xC0, 0xFF]),
))
def test_depint(M, value, bytes):
    print('DEPOSIT', value, 'expecting', bytes)
    addr = 30000                    # arbitrary location for deposit
    size = len(bytes) + 2           # length byte + value + guard byte
    M.deposit(addr, [222] * size)   # 222 ensures any 0s really were written
    depint(M, addr, value)
    bvalue = M.bytes(addr+1, len(bytes))
    assert (len(bytes),   bytes,  222) \
        == (M.byte(addr), bvalue, M.byte(addr+size-1))

    #   Test against Python's conversion
    assert list(value.to_bytes(len(bytes), 'little', signed=True)) \
        == bvalue

Things you might consider when reviewing:

  • Is this really correct? Do the tests provide sufficient coverage?
    (FWIW, in this particular implementation, I do not feel the need to
    test for overflow because all the input values will be entirely under my
    control.)
  • Is there a clearer way to express the tests?
  • The next == addr+1 condition in the while loops is a little
    awkward; is there a better way to handle this? The obvious thing
    would be to use a do loop (the first time I remember wanting one
    in a long time), but Python does not have them.
  • Maybe it would make more sense to just use Python's
    to_bytes() to do the conversion. But that would need a good way to
    determine the length.
  • As for how useful this is, I figure I'll probably
    want a read routine (in 6502 assembler) to do the same ASCII
    decimal → bigint conversion, which would mean that I could just unit-test
    that routine against Python's to_bytes() and then use it in the unit tests
    for the add, multiply, etc. routines. The only problem could be that
    it might be much slower in 6502 emulation than using the native Python
    library would be.
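
Two of the points above (the missing do loop and the length needed for to_bytes()) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the repo: the function names bigint_bytes and twos_complement_bytes are hypothetical, and both return plain lists instead of depositing into a Machine.

```python
def bigint_bytes(value):
    """Return [length, b0, b1, ...] with the value bytes LSB first (hypothetical helper)."""
    # ~value maps a negative n to -(n+1) >= 0, so one formula covers both signs.
    magnitude = ~value if value < 0 else value
    # Bits of magnitude plus one sign bit, rounded up to whole bytes.
    length = magnitude.bit_length() // 8 + 1
    return [length] + list(value.to_bytes(length, 'little', signed=True))


def twos_complement_bytes(value):
    """Same value bytes, produced by an emulated do-while loop (hypothetical helper)."""
    out = []
    while True:                                   # body always runs at least once
        value, byte = divmod(value, 0x100)        # floor division also handles negatives
        out.append(byte)
        # Stop once what remains is pure sign extension of the byte just emitted.
        if (value == 0 and byte < 0x80) or (value == -1 and byte >= 0x80):
            break
    return out


print(bigint_bytes(-129))           # length byte followed by value bytes
print(twos_complement_bytes(128))
```

The `while True: ... if ...: break` form is the usual Python stand-in for a do loop, and because divmod() floors toward negative infinity the same loop serves both signs, so the separate positive and negative branches are not needed in this variant.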

    What is the fastest way to parse binary files with variable record formats in Python?


    I am an electrical engineer by profession, but my job overlaps a lot with software engineering. I have recently been working on a Python parser for a binary format with variable record layouts. The format is STDF V4, a binary format created by Teradyne for storing semiconductor test data. My goal is to parse this data into a tabular format for data analysis and visualization (ideally CSV).

    Some libraries already exist in Python to parse this format, but they are quite slow for files larger than 100 MB. Specifically, I used pySTDF. Rather than continuing to fiddle with this rather complicated format, I decided to take a thorough look at general methods for parsing binary files in Python.

    This post is divided into what my research has turned up so far, an experiment I conducted on parsing binary files, and the questions I hope the community can answer for me.

    Pre-experimental research

    I found two great posts that seem to address the root of my problem directly.

    A post on Code Review Stack Exchange describes some methods for speeding up binary file reads in Python 3. Having dug into the pySTDF source code, this post starts to show why that library is so slow: although the library is very well organized and "pythonic", there are many classes and nested function calls that slow down parsing.

    Another good post here deals with Python file I/O performance at a higher level. It also helps to see things more clearly: Python does not necessarily struggle with reading binary data from disk, but when you loop millions of times, every little function call, operation, etc. starts to hurt a lot.

    With this in mind, I decided to try out a variety of general binary parsers in Python and compare their performance.

    The experiment

    I decided to compare the performance of 3 different parsing methods on binary files of 3 different levels of complexity.

    The methods are:

    • Construct
    • Kaitai Struct
    • a homebrew parser written in plain Python

    For the sake of brevity, I will not go into the details of how each parser works (including mine), but please see the GitHub link here for the full source code and the links above for information on these parsers.

    File generation

    To generate test files, I used the Construct library to build binary files in formats of my own.

    I first wanted a simple binary file with a constant record length and format:

    # Simple tabular binary record with fixed fields
    data_simple_fmt = Struct(
        "field1" / Int16ul,
        "field2" / Int8ul,
        "field3" / Float32l,
        "field4" / Int32sl
    )

    Format diagram:

    Simple data diagram

    Then, I wanted a format in which the number of fields is constant but some fields have a variable length:

    # Simple tabular binary record with 2 varying string fields, a record length header is included
    data_varying_fmt = Struct(
        "record_len" / Int16ul,
        "field1" / Int16ul,
        "field2" / Int8ul,
        "field3" / Int16ul,
        "field4" / PascalString(Int8ul, "utf8"),
        "field5" / PascalString(Int8ul, "utf8"),
        "field6" / Float32l
    )

    Format diagram:
    Variable data diagram

    Finally, in the spirit of STDF, I wanted a format in which there can be different record types depending on the record header:

    # Complex binary record with multiple tabular record types that have varying fields indicated by record header
    table1_fmt = Struct(
        "table1_field1" / Int16ul,
        "table1_field2" / Int8ul,
        "table1_field3" / PascalString(Int8ul, "utf8"),
        "table1_field4" / Float32l
    )
    table2_fmt = Struct(
        "table2_field1" / PascalString(Int8ul, "utf8"),
        "table2_field2" / PascalString(Int8ul, "utf8"),
        "table2_field3" / PascalString(Int8ul, "utf8")
    )
    table3_fmt = Struct(
        "table3_field1" / Int16ul,
        "table3_field2" / Int32ul,
        "table3_field3" / Int32ul,
        "table3_field4" / Int32ul,
        "table3_field5" / Int32ul,
        "table3_field6" / Int32ul,
        "table3_field7" / Int32ul,
        "table3_field8" / Int32ul,
        "table3_field9" / PascalString(Int8ul, "utf8"),
        "table3_field10" / Float32l,
        "table3_field11" / Float32l
    )
    data_complex_fmt = Struct(
        "record_len" / Int16ul,
        "record_typ" / Int8ul,
        "sub_table" / Switch(this.record_typ,
               {
                   1: Embedded(table1_fmt),
                   2: Embedded(table2_fmt),
                   3: Embedded(table3_fmt)
               }
        )
    )

    Format diagram:
    Complex data diagram

    From there, I wrote a rather clumsy script to generate 100 KB, 100 MB and 1 GB files of each format filled with random data:

    import os
    import random
    import string
    from construct_format_defs import data_simple_fmt, data_varying_fmt, data_complex_fmt
    # generation switches
    gen_simple = False
    gen_varying = True
    gen_complex = False
    # output directory
    output_dir = "./test_binary_files"
    # file names
    data_simple_fn = "data_simple"
    data_varying_fn = "data_varying"
    data_complex_fn = "data_complex"
    # desired file sizes
    file_size_small = 100*1024 # 100 kib
    file_size_medium = 100*1024**2 # 100 mib
    file_size_big = 1024**3 # 1 gib
    for fname in (data_simple_fn, data_varying_fn, data_complex_fn):
        if (fname in data_simple_fn and gen_simple) or (fname in data_varying_fn and gen_varying) or \
                (fname in data_complex_fn and gen_complex):
            # build each file type
            with open(os.path.join(output_dir, fname + "_small.bin"), "wb") as f_small, \
                 open(os.path.join(output_dir, fname + "_medium.bin"), "wb") as f_medium, \
                 open(os.path.join(output_dir, fname + "_large.bin"), "wb") as f_large:
                # initialize file size in bytes
                file_size = 0
                while file_size < file_size_big:
                    # Create randomized dictionary
                    if fname in data_simple_fn:
                        d_write = {
                            "field1": random.randint(0, 2**16-1),
                            "field2": random.randint(0, 2**8-1),
                            "field3": random.randint(0, 25 * 10000) / 10000,
                            "field4": random.randint(-(2**31), 2**31-1)
                        }
                        # write byte stream to files
                        bytes_data = data_simple_fmt.build(d_write)
                    elif fname in data_varying_fn:
                        d_write = {
                            "field1": random.randint(0, 2**16-1),
                            "field2": random.randint(0, 2**8-1),
                            "field3": random.randint(0, 2**16-1),
                            "field4": ''.join(str(chr(random.choice(range(0, 128)))) for x in range(random.randint(1, 20))),
                            "field5": ''.join(random.choice(string.ascii_letters) for x in range(random.randint(1, 20))),
                            "field6":  random.randint(-(2**31), 2**31-1)
                        }
                        # calculate what record length is with varying fields
                        d_write["record_len"] = 9 + len(d_write["field4"]) + len(d_write["field5"]) + 2
                        # write byte stream to files
                        bytes_data = data_varying_fmt.build(d_write)
                    else:
                        d_write = {
                            "record_typ": random.randint(1, 3)
                        }
                        d_sub_write = {}
                        if d_write["record_typ"] == 1:
                            d_sub_write["table1_field1"] = random.randint(0, 2**16-1)
                            d_sub_write["table1_field2"] = random.randint(0, 2**8-1)
                            d_sub_write["table1_field3"] = ''.join(
                                str(chr(random.choice(range(0, 128)))) for x in range(random.randint(1, 20)))
                            d_sub_write["table1_field4"] = random.randint(-(2**31), 2**31-1)
                            d_write["record_len"] = len(d_sub_write["table1_field3"]) + 7 + 1
                        elif d_write["record_typ"] == 2:
                            d_sub_write["table2_field1"] = ''.join(
                                random.choice(string.ascii_letters) for x in range(random.randint(1, 20)))
                            d_sub_write["table2_field2"] = ''.join(
                                str(chr(random.choice(range(0, 128)))) for x in range(random.randint(1, 20)))
                            d_sub_write["table2_field3"] = ''.join(
                                str(chr(random.choice(range(0, 128)))) for x in range(random.randint(1, 20)))
                            d_write["record_len"] = len(d_sub_write["table2_field1"]) + len(d_sub_write["table2_field2"]) + len(
                                d_sub_write["table2_field3"]) + 3
                        else:
                            d_sub_write["table3_field1"] = random.randint(0, 2**16-1)
                            d_sub_write["table3_field2"] = random.randint(0, 2**32-1)
                            d_sub_write["table3_field3"] = random.randint(0, 2**32-1)
                            d_sub_write["table3_field4"] = random.randint(0, 2**32-1)
                            d_sub_write["table3_field5"] = random.randint(0, 2**32-1)
                            d_sub_write["table3_field6"] = random.randint(0, 2**32-1)
                            d_sub_write["table3_field7"] = random.randint(0, 2**32-1)
                            d_sub_write["table3_field8"] = random.randint(0, 2**32-1)
                            d_sub_write["table3_field9"] = ''.join(
                                random.choice(string.ascii_letters) for x in range(random.randint(1, 20)))
                            d_sub_write["table3_field10"] = random.randint(-(2**31), 2**31-1)
                            d_sub_write["table3_field11"] = random.randint(-(2**31), 2**31-1)
                            d_write["record_len"] = 38 + len(d_sub_write["table3_field9"]) + 1
                        d_write["sub_table"] = d_sub_write
                        # write byte stream to files
                        bytes_data = data_complex_fmt.build(d_write)
                    # gate writing if we have surpassed desired file size
                    if file_size < file_size_small:
                        f_small.write(bytes_data)
                    if file_size < file_size_medium:
                        f_medium.write(bytes_data)
                    f_large.write(bytes_data)
                    # add bytes data to determine file size
                    file_size += len(bytes_data)

    File analysis

    Then I wrote a test suite to time the 3 different parsers. To keep things simple, the goal was to parse each of the 3 file formats into memory. I decided to skip the 1 GB file for now because of the time required.

    The complete test suite:

    import os
    import time
    import mmap
    import struct
    from construct import GreedyRange
    from construct_format_defs import data_simple_fmt, data_varying_fmt, data_complex_fmt
    from kaitai_export_simple import DataSimple
    from kaitai_export_varying import DataVarying
    from kaitai_export_complex import DataComplex
    # test directory
    test_dir = "./test_binary_files"
    # file names
    data_simple_fn = "data_simple"
    data_varying_fn = "data_varying"
    data_complex_fn = "data_complex"
    # sizes
    small_f = "_small.bin"
    med_f = "_medium.bin"
    big_f = "_large.bin"
    def parse_simple(bin_fh):
        # Can also use: Array.array or numpy from file read since this is fixed format
        # precompile struct object
        structobj = struct.Struct("<HBfi")
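
The rest of the test suite is not reproduced here. As a rough sketch of what a homebrew struct-based parser for the simple and varying formats could look like: the format strings below are assumptions derived from the Construct definitions above ("<HBfi" for the simple record, "<HHBH" for the fixed head of the varying record), and the helper names are hypothetical rather than taken from the author's repository.

```python
import struct

# Assumed layouts, little-endian with no padding.
SIMPLE = struct.Struct("<HBfi")         # field1, field2, field3, field4
VARYING_HEAD = struct.Struct("<HHBH")   # record_len, field1, field2, field3
FLOAT32 = struct.Struct("<f")           # trailing field6


def parse_simple(buf):
    """Parse back-to-back fixed-size records into a list of tuples."""
    return list(SIMPLE.iter_unpack(buf))


def read_pascal_string(buf, pos):
    """Read a 1-byte length prefix followed by that many UTF-8 bytes."""
    length = buf[pos]
    start = pos + 1
    end = start + length
    return bytes(buf[start:end]).decode("utf8"), end


def parse_varying(buf):
    """Walk the buffer record by record, tracking the offset by hand."""
    rows = []
    pos = 0
    while pos < len(buf):
        record_len, field1, field2, field3 = VARYING_HEAD.unpack_from(buf, pos)
        pos += VARYING_HEAD.size
        field4, pos = read_pascal_string(buf, pos)
        field5, pos = read_pascal_string(buf, pos)
        (field6,) = FLOAT32.unpack_from(buf, pos)
        pos += FLOAT32.size
        rows.append((field1, field2, field3, field4, field5, field6))
    return rows


sample = SIMPLE.pack(1, 2, 1.5, -3) + SIMPLE.pack(65535, 255, 0.25, 7)
print(parse_simple(sample))
```

Precompiling the Struct objects once and using unpack_from() with an explicit offset avoids slicing the buffer and building intermediate objects on every record, which is where the per-record overhead tends to accumulate.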

    Performance Analysis

    The results, in ascending order of time (seconds):

    • homebrew:
      • small size (100 KB):
        • simple format: 0.004
        • variable format: 0.788
        • complex format: 0.3748
      • medium size (100 MB):
        • simple format: 3.706
        • variable format: 5.626
        • complex format: 5.965
    • Kaitai:
      • small size (100 KB):
        • simple format: 0.606
        • variable format: 0.028
        • complex format: 0.030
      • medium size (100 MB):
        • simple format: 40.580
        • variable format: 43.815
        • complex format: 46.839
    • Construct:
      • small size (100 KB):
        • simple format: 0.252
        • variable format: 2.518
        • complex format: 1.343
      • medium size (100 MB):
        • simple format: 231.170
        • variable format: 107.381
        • complex format: 136.934

    I realized too late that creating files of the same size in different formats may not have been the best way to test. You can see that Construct really struggles with the simplest format, because that file has by far the largest number of records and therefore the largest number of loop iterations.

    Conclusions and questions

    From my experiments, I concluded that there is currently no generic binary parsing library that can beat hand-written pure Python for speed. The only exception is binary files with fixed-length records, as in the "simple format" case: numpy has a method, documented here, that imports a binary file directly into an array as long as the fields are of fixed length. This method is extremely fast, I believe because it is implemented in C.

    This brings me to my questions for the software engineering community:

    • Am I missing something obvious, or are my Python examples the fastest I can get?
    • Is there room to substitute functions written in C that I can call from Python for a considerable speedup?
    • Is there a C-based Python library that I have not discovered yet that can parse non-uniform binary files very quickly?

    You may say "well, ~6 seconds is not so bad for your complex example"; however, when I applied this same methodology to write as minimal an STDF parser as possible, it still took about 25 seconds for a 100 MB file, which is not acceptable for my use case. I am extremely interested to hear the community's comments on this. At the end of the day, I may just need to change languages, since a colleague and I could do the same job in about 1 second using Golang.

    Github for the project

    Please see my Github here for all the code described in this project.

    performance – reasonably fast Python decoding of binary data (AIS)

    I am currently working on an AIS message decoder written in pure Python. It's just a small project to learn things about binary data, decoding, and so on. Because I have only been programming for about a year, I am not quite sure if my approaches are reasonable. The complete source code can be found here: https://github.com/M0r13n/pyais

    Without going into too much detail, my question is: what is a reasonably fast way to decode binary content in pure Python?

    Suppose I have a bit sequence 168 bits long that contains encoded data. The data may not be a multiple of 2 and therefore will not fit into conventional data structures such as bytes.

    I have tried three approaches:

    1: Store the bit sequence as a normal string and convert each substring to an int individually:

    bit_vector = '000001000101011110010111000110100110000000100000000000000001010111001111101011010110110000010101101000100010000010011001100101001111111111110110000010001000100010001110'
    d = {
        'type': int(bit_vector[0:6], 2),
        'repeat': int(bit_vector[6:8], 2),
        'mmsi': int(bit_vector[8:38], 2),
        'ais_version': int(bit_vector[38:40], 2),
        'imo': int(bit_vector[40:70], 2),
        'callsign': ascii6(bit_vector[70:112]), # custom method of mine, ignore for now
    }

    2: Use BitArray (from the bitstring module) and slice:

    b = BitArray(bin=bit_vector)
    # access each piece of data like so
    type_ = b[0:6].uint  # unsigned, to match int(..., 2) in approach 1

    3: Using the bitarray module:

    -> The bitarray module does not have a good way to convert individual slices into integers, so I dropped it.

    Approach 1 (my current one) decodes 8000 messages in 1.132967184 seconds,
    and the second takes about 3 seconds.

    Overall, I'm pretty happy with my first idea, but I feel like I'm missing something.
    My main concern is readability, but the code should not be too slow. In C I would have used structs; is the ctypes module worth considering?

    I would like to receive comments and advice from you! 🙂
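
One alternative that is sometimes suggested for this kind of layout is to convert the whole bit string to a single Python int once and then pull each field out with a shift and a mask. The sketch below is a different technique from the three approaches above, the names field and LAYOUT are hypothetical, and whether it is actually faster than string slicing has not been measured here.

```python
def field(value, total_bits, start, stop):
    """Unsigned integer held in bits [start, stop), counted from the most significant bit."""
    width = stop - start
    return (value >> (total_bits - stop)) & ((1 << width) - 1)


bit_vector = '000001000101011110010111000110100110000000100000000000000001010111001111101011010110110000010101101000100010000010011001100101001111111111110110000010001000100010001110'
total_bits = len(bit_vector)
value = int(bit_vector, 2)            # one string-to-int conversion per message

LAYOUT = {                            # same offsets as in approach 1
    'type': (0, 6),
    'repeat': (6, 8),
    'mmsi': (8, 38),
    'ais_version': (38, 40),
    'imo': (40, 70),
}
decoded = {name: field(value, total_bits, a, b) for name, (a, b) in LAYOUT.items()}
print(decoded)
```

Keeping the offsets in a table like LAYOUT also addresses the readability concern: the message layout is declared in one place and the decoding loop stays the same for every message type.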


    Height of an epsilon-balanced binary search tree

    Take any path in the tree starting at the root, and consider the number of nodes in the subtree rooted at each vertex of the path. For the root, it is $n$ nodes. For the second vertex, it is at most $\epsilon n$ nodes. For the third vertex, it is at most $\epsilon^2 n$ nodes. For the $t$-th vertex, it is at most $\epsilon^{t-1} n$ nodes. If the path has length $\ell$ (edges), the subtree of the last vertex contains at most $\epsilon^\ell n$ nodes and at least one node, so $\epsilon^\ell n \geq 1$, or equivalently, $\ell \leq \log_{1/\epsilon} n$.
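
The final step can be spelled out as follows, assuming $0 < \epsilon < 1$ so that $\log(1/\epsilon) > 0$. The numbers in the last line are only an illustrative choice of $\epsilon = 2/3$ and $n = 1000$, not values from the question.

```latex
\begin{align*}
\epsilon^{\ell} n \geq 1
  &\iff n \geq (1/\epsilon)^{\ell} \\
  &\iff \log n \geq \ell \, \log(1/\epsilon) \\
  &\iff \ell \leq \frac{\log n}{\log(1/\epsilon)} = \log_{1/\epsilon} n, \\
\text{e.g. } \epsilon = \tfrac{2}{3},\ n = 1000:\quad
  \ell &\leq \log_{1.5} 1000 \approx 17.04 .
\end{align*}
```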

    computer science – Does the algorithm solve the decision problem: does this binary string (yy) have an exact factor of $M$?

    Forgive me, this is a growing interest of mine and I have only just started working on my first research project as a hobby. So please be lenient about any mistakes, and I will fix them.

    Decision problem

    Given a binary string whose length equals the input value (an integer):
    does this binary string $(yy)$ have an exact factor of $M$?

    Link to the working algorithm


    $M$ is basically a divisor. If there is no remainder, the answer is yes. If there is a remainder, the answer is no.

    In this case, the decision problem takes an integer as input and has to create a bit string whose length equals that value. This is equivalent to counting, so my algorithm will always run in $O(2^n)$ time.
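
One possible reading of the procedure described above, as a minimal sketch: the function name has_exact_factor and the choice of the symbol "y" are assumptions, since the linked implementation is not shown here.

```python
def has_exact_factor(n: int, m: int) -> bool:
    """Build a string of n symbols, then test whether its length is divisible by m."""
    # Building the string takes time proportional to n, which is about 2**k
    # when n is given as a k-bit binary number.
    string = "y" * n
    return len(string) % m == 0


print(has_exact_factor(6, 3))
print(has_exact_factor(7, 3))
```

Under this reading, the exponential cost comes entirely from writing out the string; the divisibility test itself only needs the integer.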


    It has been said that it is impossible to convert binary into unary in polynomial time.

    If this assumption is true, would my problem not simply be in $NP$?

    samsung – "Custom binary blocked by the FRP lock"

    This afternoon, while I was at university before the start of a lecture, I turned off my phone as usual. At the end of the lecture I tried to turn it back on to congratulate another student who had graduated today, but to my surprise, on the boot screen with the logo there is green text in the upper left corner that says "custom binary blocked by FRP lock", and the phone turns off immediately.

    I have searched for solutions on the web, and all the tutorials require either that I boot into recovery mode (impossible for me, because the error appears before I have time to press the required keys) or that I connect my phone to my PC. That is also impossible, because my computer does not recognize the phone and gives me the following error: "the request descriptor failed" (the same happens with another computer I tried). And no, it is not a charge-only USB cable: it is the same cable I use to play with my PS4 controller.

    So, basically, I am locked out of my phone. Is there any possible solution, or is it just garbage now?

    In case it helps (I don't think so), it is a Samsung J3 2016, I think.

    Thank you in advance.

    EDIT: I just asked my brother, who had my phone yesterday, and he said that he disabled the USB debugging setting. I'm screwed, right?

    Discrete Mathematics – Help Me Understand Hexadecimal to Binary Conversion

    binary data – How to make a tornado or butterfly chart in R?

    I am looking for a real tornado chart showing the frequency of yes and no in each category.

    I have meticulously scoured Reddit, the R documentation and YouTube videos to get a chart that shows the frequency of yes/no answers by category, and I have flipped the coordinates so that the bars extend horizontally. This is usually enough for most people looking for a tornado chart. However, since I am dealing with binary yes/no data, the bars are simply stacked flush against the left-hand axis and still do not produce a tornado chart.

    Because of some upcoming reports, I was hoping to find good guidance on making real tornado charts, but with this simple set of 150 observations, all I manage to produce is a bar chart.

    ggplot(data.frame, aes(x = GlobalRegion, y = No_Elected, fill = No_Elected, label = "")) + geom_bar(stat = "identity") + geom_text(size = 3, position = position_stack(vjust = 0.5)) + coord_flip()

    This produces an excellent bar chart showing a single horizontal bar for each category, broken down into yes and no.

    Now I just need to pull it away from the axis and center each break on a single midline to create a tornado chart.

    python – Flatten the binary tree to the linked list

    I'm trying to flatten a binary tree into a linked list.

    I have a correct, working solution:

    # Definition for a binary tree node.
    # class TreeNode:
    #     def __init__(self, x):
    #         self.val = x
    #         self.left = None
    #         self.right = None
    def insert(root, node):
        # Walk down the right spine and append node at its end
        if root.right is None:
            root.right = node
            return None
        else:
            insert(root.right, node)
    class Solution:
        def flatten(self, root: TreeNode) -> None:
            """
            Do not return anything, modify root in-place instead.
            """
            if root is None:
                return None
            if root.left is None and root.right is None:
                return root
            left = self.flatten(root.left)
            right = self.flatten(root.right)
            root.left = None
            if left:
                root.right = left
                insert(root.right, right)
            else:
                root.right = right
            return root

    I'm not sure, but I think the time complexity is O(n^2). How can I make it run faster? Specifically, how can I modify it so that it runs in optimal (linear) time?
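
One commonly used way to avoid the repeated walk done by insert() is to rewire the tree iteratively: whenever a node has a left child, splice that left subtree in between the node and its right subtree. The following is a sketch of that idea rather than a fix of the code above. The TreeNode class is a minimal stand-in with hypothetical optional constructor arguments, and the function is written as a free function instead of a Solution method.

```python
class TreeNode:
    """Minimal stand-in for the judge's node class (constructor extras are assumptions)."""
    def __init__(self, x, left=None, right=None):
        self.val = x
        self.left = left
        self.right = right


def flatten(root):
    """Flatten the tree in place into a right-linked list in preorder."""
    node = root
    while node:
        if node.left:
            # Find the rightmost node of the left subtree ...
            tail = node.left
            while tail.right:
                tail = tail.right
            # ... hang the old right subtree below it, then move left to right.
            tail.right = node.right
            node.right = node.left
            node.left = None
        node = node.right


tree = TreeNode(1, TreeNode(2, TreeNode(3), TreeNode(4)), TreeNode(5, None, TreeNode(6)))
flatten(tree)
values = []
cursor = tree
while cursor:
    values.append(cursor.val)
    cursor = cursor.right
print(values)
```

Each right link is followed by the inner loop at most once before it becomes part of the finished list, so the total work should be linear in the number of nodes, and no recursion stack is needed.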