probability or statistics – Fitting data to a distribution

If I have the following data:

data={{0, 7.83}, {0, 8.55}, {0, 9.04}, {0, 9.41}, {0, 10.23}, {0, 
  10.51}, {100, 7.83}, {100, 8.55}, {100, 9.04}, {100, 9.41}, {100, 
  10.23}, {100, 10.51}, {33, 21.61}, {33, 22.86}, {33, 23.16}, {33, 
  23.22}, {33, 23.34}, {33, 23.41}, {10, 18.71}, {10, 19.4}, {10, 
  20.}, {10, 20.56}, {10, 20.6}, {10, 20.99}, {45, 15.53}, {45, 
  16.72}, {45, 16.94}, {45, 17.19}, {45, 17.34}, {45, 17.87}, {88, 
  9.96}, {88, 10.49}, {88, 11.02}, {88, 11.12}, {88, 11.82}, {88, 
  11.93}, {69, 11.04}, {69, 12.04}, {69, 12.43}, {69, 12.89}, {69, 
  13.52}, {69, 13.94}}

Plotted, the data looks like this:

[plot of the data omitted]

How can I fit it to a distribution, and what kind of distribution fits this data? I am thinking that perhaps a skew normal distribution can describe it.
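As a rough illustration of one way to do such a fit (a Python/SciPy sketch rather than Mathematica, under the assumption that the goal is to fit the (x, y) pairs to an amplitude-scaled skew-normal curve; the parameter names and initial guesses below are my own assumptions):

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import skewnorm

data = np.array([
    [0, 7.83], [0, 8.55], [0, 9.04], [0, 9.41], [0, 10.23], [0, 10.51],
    [100, 7.83], [100, 8.55], [100, 9.04], [100, 9.41], [100, 10.23], [100, 10.51],
    [33, 21.61], [33, 22.86], [33, 23.16], [33, 23.22], [33, 23.34], [33, 23.41],
    [10, 18.71], [10, 19.4], [10, 20.0], [10, 20.56], [10, 20.6], [10, 20.99],
    [45, 15.53], [45, 16.72], [45, 16.94], [45, 17.19], [45, 17.34], [45, 17.87],
    [88, 9.96], [88, 10.49], [88, 11.02], [88, 11.12], [88, 11.82], [88, 11.93],
    [69, 11.04], [69, 12.04], [69, 12.43], [69, 12.89], [69, 13.52], [69, 13.94],
])
x, y = data[:, 0], data[:, 1]

def scaled_skewnorm(x, a, loc, scale, amp, offset):
    # amplitude-scaled skew-normal density plus a vertical offset
    return amp * skewnorm.pdf(x, a, loc=loc, scale=scale) + offset

# rough initial guesses based on the plotted shape of the data
p0 = [2.0, 20.0, 30.0, 500.0, 8.0]
params, _ = curve_fit(scaled_skewnorm, x, y, p0=p0, maxfev=10000)
print(params)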

Python code that has some divergences regarding logger, printing statistics, writing of files, file paths and docstrings

I think I have corrected most of the divergences, but I am still concerned about the docstrings in particular. For example, docstrings should always be placed directly under the function signature, within its body. Have I used the docstrings in an appropriate way? If you can still find any errors regarding file paths, the logger, printing statistics, writing of files, my comments, or the docstrings, please let me know.

Code & Output

All file paths need to be based upon the variable RESOURCES, and to provide platform independence all paths should be constructed using pathlib. I need to change the value of RESOURCES:

RESOURCES = Path(__file__).parent / '../_Resources/'
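With RESOURCES defined this way, individual file paths can then be joined with the / operator instead of string concatenation, for example (my own illustration, reusing the config file name that appears in the code below):

config_file = RESOURCES / 'ass3_log_conf.json'  # instead of str(RESOURCES) + "/ass3_log_conf.json"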

Logger

The name of the logger needs to be ass_3_logger. Check.

Printing Statistics

Duration headers need to be aligned with duration values, in a table-column fashion, and the names of the Fibonacci approaches need to be formatted in conformity with the specified requirements:

---------------------------------------------------------------------------
              DURATION FOR EACH APPROACH WITHIN INTERVAL: 30-0              
---------------------------------------------------------------------------
                       Seconds   Milliseconds   Microseconds    Nanoseconds
Fib Iteration          0.00047        0.47289          472.9         472892
Fib Recursion          0.74339      743.39246       743392.5      743392458
Fib Memory             0.00052        0.52040          520.4         520404

I think I have corrected this divergence, though I am not 100% sure.
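For comparison, here is a minimal sketch of how the header and rows could be right-aligned with fixed-width format fields instead of tab characters; this is my own illustration with guessed column widths, not the assignment's required code:

headers = ("Seconds", "Milliseconds", "Microseconds", "Nanoseconds")
print("{:<15}{:>10}{:>15}{:>15}{:>15}".format("", *headers))
print("{:<15}{:>10}{:>15}{:>15}{:>15}".format("Fib Iteration", "0.00047", "0.47289", "472.9", "472892"))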

Writing Files

This works as intended and mostly meets the stated requirements. However, the names of the produced files do not conform to the stated requirements.

The implementation would perhaps become cleaner using zip(), which lets us traverse two containers at the same time:

with open(file_path, 'w') as file:
    for idx, value in zip(range(len(details[1]) - 1, -1, -1), details[1].values()):
        file.write("{}: {}\n".format(idx, value))

Complement

  • Attend to the divergence regarding the logger!
  • Attend to the divergences regarding printing statistics!
  • Attend to the divergence regarding the writing of files!
  • Make sure all file paths are constructed properly using
    pathlib.Path().
  • I have to add more comments which describe my implementations, and make sure
    I also place docstrings within the body of the respective functions!

Complete solution:

#!/usr/bin/env python

""" LAB ASSIGNMENT 3
Below you find the inherent code, some of which fully defined. You add implementation
for those functions which are needed:

 - create_logger()
 - measurements_decorator(..)
 - fibonacci_memory(..)
 - print_statistics(..)
 - write_to_file(..)
"""

from pathlib import Path
from timeit import default_timer as timer
from functools import wraps
import argparse
import logging
import logging.config
import json
import codecs
import time

__version__ = '1.0'
__desc__ = "Program used for measuring execution time of various Fibonacci implementations!"

LINE = '\n' + ("---------------" * 5)
RESOURCES = Path.cwd() / "../_Resources/"
LOGGER = None  # declared at module level, will be defined from main()


def create_logger() -> logging.Logger:
    """Create and return logger object.
    Purpose: This method creates object for logger and return the object
    :param : None
    :return : Logger object."""

    logging.basicConfig()
    logger = logging.getLogger()

    # Load the configuration.
    config_file = str(RESOURCES) + "/ass3_log_conf.json"
    with codecs.open(config_file, "r", encoding="utf-8") as fd:
        config = json.load(fd)

    # Set up proper logging. This one disables the previously configured loggers.
    logging.config.dictConfig(config)
    logger = logging.getLogger('ass_3_logger')
    return logger


def measurements_decorator(func):
    """Function decorator, used for time measurements.
    Purpose: This is a decorator which is used for measurement
    of the functions execution and printing logs
    :param : func
    :return : tuple(float,dictionary)."""

    @wraps(func)
    def wrapper(nth_nmb: int) -> tuple:
        result = {}
        k = 5
        ts = time.time()
        LOGGER.info("Starting measurements...")
        for i in reversed(range(nth_nmb + 1)):
            result[i] = func(i)
            if k == 5:
                LOGGER.debug(str(i) + " : " + str(result[i]))
                k = 0
            k += 1
        te = time.time()
        return (te - ts), result

    return wrapper


@measurements_decorator
def fibonacci_iterative(nth_nmb: int) -> int:
    """An iterative approach to find Fibonacci sequence value.
    YOU MAY NOT MODIFY ANYTHING IN THIS FUNCTION!!  """

    """Purpose: This is function to calculate fibonacci series using iterative approach
    :param : int
    :return : int."""

    old, new = 0, 1
    if nth_nmb in (0, 1):
        return nth_nmb
    for __ in range(nth_nmb - 1):
        old, new = new, old + new
    return new


@measurements_decorator
def fibonacci_recursive(nth_nmb: int) -> int:
    """An recursive approach to find Fibonacci sequence value.
    YOU MAY NOT MODIFY ANYTHING IN THIS FUNCTION!!"""
    """Purpose: This is function to calculate fibonacci recursion using iterative approach 
    :param : int
    :return : int."""

    def fib(_n):
        return _n if _n <= 1 else fib(_n - 1) + fib(_n - 2)

    return fib(nth_nmb)


@measurements_decorator
def fibonacci_memory(nth_nmb: int) -> int:
    """An recursive approach to find Fibonacci sequence value, storing those already calculated.
    Purpose: This is function to calculate fibonacci series using memory approach
    :param : int
    :return : int."""

    memory_dict = {0: 0, 1: 1}

    def fib(_n):
        if _n not in memory_dict:
            memory_dict[_n] = fib(_n - 1) + fib(_n - 2)
        return memory_dict[_n]

    return fib(nth_nmb)


def duration_format(duration: float, precision: str) -> str:
    """Function to convert number into string. Switcher is dictionary type here.
        Purpose: This is a function to proper format for duration
        :param: float, str
        :return : str."""
    switcher = {
        'Seconds': "{:.5f}".format(duration),
        'Milliseconds': "{:.5f}".format(duration * 1_000),
        'Microseconds': "{:.1f}".format(duration * 1_000_000),
        'Nanoseconds': "{:d}".format(int(duration * 1_000_000_000))
    }

    # get() method of dictionary data type returns value of passed argument if it is present in
    # dictionary otherwise second argument will be assigned as default value of passed argument
    return switcher.get(precision, "nothing")


# purpose of this function is to display the statistics
def print_statistics(fib_details: dict, nth_value: int):
    """Function which handles printing to console."""
    print(LINE)
    print("nt  DURATION FOR EACH APPROACH WITHIN INTERVAL: " + str(nth_value) + "-0")
    print(LINE)
    print("{0}tttt {1:<7}t {2:<7}t {3:<7}t {4:<7}".format("", "Seconds", "Milliseconds", "Microseconds",
                                                             "Nanoseconds"))
    for function in fib_details:
        print("{0}t {1:<7}t {2:<13}t {3:<14}t {4:<7}".format(function,
                                                               duration_format(fib_details(function)(0), "Seconds"),
                                                               duration_format(fib_details(function)(0),
                                                                               "Milliseconds"),
                                                               duration_format(fib_details(function)(0),
                                                                               "Microseconds"),
                                                               duration_format(fib_details(function)(0),
                                                                               "Nanoseconds")))


# purpose of this function is to write results into file.
def write_to_file(fib_details: dict):
    """Function to write information to file."""
    for function in fib_details:
        with open(str(RESOURCES) + "//" + function + ".txt", "w") as file:
            for idx, value in zip(range(len(fib_details[function][1]) - 1, -1, -1), fib_details[function][1].values()):
                file.write("{}: {}\n".format(idx, value))


def main():
    """The main program execution. YOU MAY NOT MODIFY ANYTHING IN THIS FUNCTION!!"""
    epilog = "DT179G Assignment 3 v" + __version__
    parser = argparse.ArgumentParser(description=__desc__, epilog=epilog, add_help=True)
    parser.add_argument('nth', metavar='nth', type=int, nargs='?', default=30,
                        help="nth Fibonacci sequence to find.")

    global LOGGER  # ignore warnings raised from linters, such as PyLint!
    LOGGER = create_logger()

    args = parser.parse_args()
    nth_value = args.nth  # nth value to sequence. Will fallback on default value!

    fib_details = {  # store measurement information in a dictionary
        'fib iteration': fibonacci_iterative(nth_value),
        'fib recursion': fibonacci_recursive(nth_value),
        'fib memory   ': fibonacci_memory(nth_value)
    }

    print_statistics(fib_details, nth_value)  # print information in console
    write_to_file(fib_details)  # write data files


if __name__ == "__main__":
    main()

probability or statistics – Problem with parameters from fitting

I am trying to fit the following data (called TflogqIND):

TflogqIND = {{0., 36.4886}, {Log[2]/Log[10], 37.1485}, {Log[5]/Log[10], 38.3859},
  {1, 39.2263}, {Log[20]/Log[10], 39.8772}, {Log[30]/Log[10], 40.0107}}

to the following equation:

eqn = ((log10q - Log10[qref]) == 
    c1*(Tfp - Tfpref)/(c2 + (Tfp - Tfpref))); (*WLF equation*)
model = Tfp /. Solve[eqn, Tfp][[1]] // FullSimplify;
constIND = {Tfpref -> 39.2263, qref -> 10};
modelIND = model /. (constIND // Rationalize) // FullSimplify;

nlmIND = NonlinearModelFit[TflogqIND, {modelIND, c1 > 5, c2 > 5}, {c1, c2}, log10q];

As you can see from plotting:

Show[ListPlot[TflogqIND, PlotMarkers -> Style[\[FilledSquare], 18, Red], Frame -> True,
  Axes -> False, FrameStyle -> 16, LabelStyle -> {Black, Bold, 10}, ImageSize -> Large,
  GridLines -> Automatic, GridLinesStyle -> Lighter[Gray, .8], PlotRange -> All],
 Plot[nlmIND[log10q], {log10q, 0, 1.4}, PlotStyle -> {Red, Dashed}]]

The model seems to describe the data well. However, the parameters (obtained from nlmIND["BestFitParameters"] as {c1 -> 1331.87, c2 -> 3520.53}) don’t make any sense. Usually the c1 and c2 parameters range from 1 to around 100, or 200 at the most.

I think there must be a mistake somewhere but I cannot see it. How can I fix the model to give better c1 and c2 values and better describe the data using that equation?
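For reference, here is a rough cross-check of the same fit in Python/SciPy (my own sketch, not part of the original Mathematica workflow), with Tfp solved explicitly from the WLF equation as Tfp = Tfpref + c2*d/(c1 - d), where d = log10q - Log10[qref]:

import numpy as np
from scipy.optimize import curve_fit

# data: {log10 q, Tf} pairs from TflogqIND
log10q = np.array([0.0, np.log10(2), np.log10(5), 1.0, np.log10(20), np.log10(30)])
tf = np.array([36.4886, 37.1485, 38.3859, 39.2263, 39.8772, 40.0107])

TFPREF, LOG10_QREF = 39.2263, 1.0  # reference values taken from constIND

def wlf(x, c1, c2):
    d = x - LOG10_QREF
    return TFPREF + c2 * d / (c1 - d)

# the starting values are assumptions of mine; the fit can be sensitive to them,
# which may be related to why the reported c1 and c2 look unphysically large
params, _ = curve_fit(wlf, log10q, tf, p0=[10.0, 50.0], maxfev=10000)
print(dict(zip(["c1", "c2"], params)))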

statistics – When aiming to roll for a 50/50, does the die size matter?

I noticed how D&D 5e’s Hexblade Warlock subclass feature Armor of Hexes imposes a chance to miss regardless of the attacker’s roll. That chance is based on a d6: on a 4 or higher the attack misses, and on anything else it hits (provided the attack would otherwise have hit). To my understanding, this is simply a 50/50 roll on the d6 (success on 4, 5, or 6; failure on 1, 2, or 3).

Out of curiosity, does it matter whether the die is a d4, a d8, or even a d100, as long as it’s an even-sided die and the roll is still 50/50? (On a d4 it would be a success on a 3 or 4, on a d10 a 6 or higher, and so on.)
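A quick numerical check (a small illustrative sketch, not from the original question) that "succeed on the top half of the faces" is exactly 50% for any even-sided die:

from fractions import Fraction

for sides in (4, 6, 8, 10, 100):
    threshold = sides // 2 + 1  # e.g. 3+ on a d4, 4+ on a d6, 6+ on a d10
    p = Fraction(sides - threshold + 1, sides)
    print(f"d{sides}: succeed on {threshold}+ -> {p}")  # always 1/2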

statistics – Find principal components of $(X_1,X_2,X_1+X_2)$

I need to find the principal components of $(X_1,X_2,X_1+X_2)$ and the proportion of variance the first principal component explains. No information about the distribution or independence of the $X_i$ is given.
If we denote $v_1= Var(X_1)$, $v_2=Var(X_2)$ and $c=Cov(X_1,X_2)$, then the variance-covariance matrix is given by
$$\begin{pmatrix}
v_1 & c & v_1+c \\ c & v_2 & v_2+c \\ v_1+c & v_2+c & v_1+v_2+2c
\end{pmatrix}$$

Calculating the eigenvalues and eigenvectors of this matrix takes some effort. Is there a smarter way to calculate the principal components?
Thanks in advance
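As a quick numerical sanity check of the covariance matrix above (an illustration of mine with arbitrary example data, not part of the original question):

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100_000)
x2 = 0.5 * x1 + rng.normal(size=100_000)  # arbitrary correlated example
X = np.column_stack([x1, x2, x1 + x2])

cov = np.cov(X, rowvar=False)
print(np.round(cov, 3))  # matches [[v1, c, v1+c], [c, v2, v2+c], [v1+c, v2+c, v1+v2+2c]]
print(np.round(np.linalg.eigvalsh(cov), 6))  # smallest eigenvalue is (numerically) zero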

statistics and probability homework question

An experiment is run in the following manner. The colors red, yellow, and blue are each flashed on a screen for a short period of time. A subject views the colors and is asked to choose the one he feels was flashed for the longest amount of time. The experiment is repeated three times with the same subject. If all the colors were flashed for the same length of time, give the probability distribution for y, the number of times the subject chooses the color red. Assume that his three choices are independent.
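Under these assumptions each choice is independently "red" with probability 1/3, so y follows a Binomial(3, 1/3) distribution; here is a small sketch tabulating that pmf (my own illustration, not part of the original question):

from math import comb
from fractions import Fraction

n, p = 3, Fraction(1, 3)
for y in range(n + 1):
    prob = comb(n, y) * p**y * (1 - p)**(n - y)
    print(f"P(y = {y}) = {prob}")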

probability or statistics – Computation of 1- and 2-sided p-values for Fisher’s Exact test for 2×2 table based on Monte Carlo

I am looking for code to compute the 1- and 2-sided p-values for Fisher’s Exact test for 2×2 tables based on Monte Carlo (bootstrap) simulations.

For larger 2×2 crosstabulation tables, the exact p-value often cannot be computed due to the large number of computations required.

Asymptotic p-values are often quite inaccurate.

Therefore, I would like to compute the p-values based on Monte Carlo simulation.
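A minimal sketch of what such a Monte Carlo computation could look like (in Python/SciPy rather than Mathematica; conditioning on the table margins, the one-sided direction, and the example table are all my own assumptions):

import numpy as np
from scipy.stats import hypergeom

def fisher_mc(table, n_sim=100_000, seed=0):
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    rng = np.random.default_rng(seed)
    # draw the top-left cell of random tables with the same margins
    sims = rng.hypergeometric(col1, total - col1, row1, size=n_sim)
    p_obs = hypergeom.pmf(a, total, col1, row1)
    p_sims = hypergeom.pmf(sims, total, col1, row1)
    one_sided = np.mean(sims >= a)                # P(top-left cell >= observed)
    two_sided = np.mean(p_sims <= p_obs + 1e-12)  # tables no more probable than observed
    return one_sided, two_sided

print(fisher_mc([[8, 2], [1, 5]]))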

probability or statistics – Calculate variance for a lottery?

I’m trying to calculate the variance for a lottery using the Mathematica function Variance. The following is a simplified example with just a few values, and how I tried to solve it:

payouts = {7, 3, 1, 0};
probabilities = {1/6, 1/6, 1/3, 1/3};

Variance[WeightedData[payouts, probabilities]]
Mean[WeightedData[payouts, probabilities]]

The Mean is calculated correctly (= 2), but the output for Variance is 108/13 instead of 6. What am I doing wrong? Should I use another function?
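For comparison, the intended (population) variance E[X^2] - E[X]^2 for these payouts and probabilities can be checked directly (a quick illustrative computation of mine):

from fractions import Fraction

payouts = [7, 3, 1, 0]
probabilities = [Fraction(1, 6), Fraction(1, 6), Fraction(1, 3), Fraction(1, 3)]

mean = sum(p * x for x, p in zip(payouts, probabilities))
var = sum(p * (x - mean) ** 2 for x, p in zip(payouts, probabilities))
print(mean, var)  # 2 and 6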

postgresql – postgres is choosing wrong index even with STATISTICS 1000

I have a table test_monika with 300k records. A simple query is not using the index test_monika_rank_idx; instead it is using the primary key index test_monika_pkey. The actual number of rows returned is 1, yet the planner's execution plan estimates 299996 rows. I am using a statistics target of 1000, which is the maximum. How can I resolve this issue?

The current table is a dummy table, but we had a similar issue on a production server. All of our production servers run with a default_statistics_target of 100 and have never had a problem, except for one query a few days back.

The PostgreSQL engine version is 10.

The explain plan is:
postgres=# explain analyze select * from test_monika where rank<=300001 and  rank>=300000 order by id limit 1;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.46 rows=1 width=22) (actual time=0.012..0.012 rows=1 loops=1)
   ->  Index Scan using test_monika_pkey on test_monika  (cost=0.42..10631.42 rows=299996 width=22) (actual time=0.012..0.013 rows=1 loops=1)
         Filter: ((rank <= 300001) AND (rank >= 300000))
 Planning time: 0.227 ms
 Execution time: 0.033 ms
(5 rows)

Below is the schema:

Column |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id     | integer               |           | not null |         | plain    |              | 
name   | character varying(40) |           |          |         | extended |              | 
rank   | integer               |           |          |         | plain    | 1000         | 
Indexes:
   "test_monika_pkey" PRIMARY KEY, btree (id)
   "test_monika_rank_idx" btree (rank)

postgres-# 

ALTER TABLE test_monika ALTER rank SET STATISTICS 1000;

postgres=# select * from pg_stats WHERE tablename = 'test_monika';

schemaname | public
tablename | test_monika
attname | id
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {1,3000,6000,9000,12000,15000,18000,21000,24000,27000,30000,33000,36000,39000,42000,45000,48000,51000,54000,57000,60000,63000,66000,69000,72000,75000,78000,81000,84000,87000,90000,93000,96000,99000,102000,105000,108000,111000,114000,117000,120000,123000,126000,129000,132000,135000,138000,141000,144000,147000,150000,153000,156000,159000,162000,165000,168000,171000,174000,177000,180000,183000,186000,189000,192000,195000,198000,201000,204000,207000,210000,213000,216000,219000,222000,225000,228000,231000,234000,237000,240000,243000,246000,249000,252000,255000,258000,261000,264000,267000,270000,273000,276000,279000,282000,285000,288000,291000,294000,297000,300000}
correlation | 1
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |

-[ RECORD 2 ]-

schemaname | public
tablename | test_monika
attname | name
inherited | f
null_frac | 0.999913
avg_width | 14
n_distinct | -8.66652e-05
most_common_vals |
most_common_freqs |
histogram_bounds | {ABCDEFGHIJKLMNOPQRSTUVWXYZ,BCDEFGHIJKLMNOPQRSTUVWXYZ,CDEFGHIJKLMNOPQRSTUVWXYZ,DEFGHIJKLMNOPQRSTUVWXYZ,EFGHIJKLMNOPQRSTUVWXYZ,FGHIJKLMNOPQRSTUVWXYZ,GHIJKLMNOPQRSTUVWXYZ,HIJKLMNOPQRSTUVWXYZ,IJKLMNOPQRSTUVWXYZ,JKLMNOPQRSTUVWXYZ,KLMNOPQRSTUVWXYZ,LMNOPQRSTUVWXYZ,MNOPQRSTUVWXYZ,NOPQRSTUVWXYZ,OPQRSTUVWXYZ,PQRSTUVWXYZ,QRSTUVWXYZ,RSTUVWXYZ,STUVWXYZ,TUVWXYZ,UVWXYZ,VWXYZ,WXYZ,XYZ,YZ,Z}
correlation | 1
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |

-[ RECORD 3 ]-

schemaname | public
tablename | test_monika
attname | rank
inherited | f
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {6,305,605,905,1205,1505,1805,2105,2405,2705,3005,3305,3605,3905,4205,4505,4805,5105,5405,5705,6005,6305,6605,6905,7205,7505,7805,8105,8405,8705,9005,9305,9605,9905,10205,10505,10805,11105,11405,11705,12005,12305,12605,12905,13205,13505,13805,14105,14405,14705,15005,15305,15605,15905,16205,16505,16805,17105,17405,17705,18005,18305,18605,18905,19205,19505,19805,20105,20405,20705,21005,21305,21605,21905,22205,22505,22805,23105,23405,23705,24005,24305,24605,24905,25205,25505,25805,26105,26405,26705,27005,27305,27605,27905,28205,28505,28805,29105,29405,29705,30005,30305,30605,30905,31205,31505,31805,32105,32405,32705,33005,33305,33605,33905,34205,34505,34805,35105,35405,35705,36005,36305,36605,36905,37205,37505,37805,38105,38405,38705,39005,39305,39605,39905,40205,40505,40805,41105,41405,41705,42005,42305,42605,42905,43205,43505,43805,44105,44405,44705,45005,45305,45605,45905,46205,46505,46805,47105,47405,47705,48005,48305,48605,48905,49205,49505,49805,50105,50405,50705,51005,51305,51605,51905,52205,52505,52805,53105,53405,53705,54005,54305,54605,54905,55205,55505,55805,56105,56405,56705,57005,57305,57605,57905,58205,58505,58805,59105,59405,59705,60005,60305,60605,60905,61205,61505,61805,62105,62405,62705,63005,63305,63605,63905,64205,64505,64805,65105,65405,65705,66005,66305,66605,66905,67205,67505,67805,68105,68405,68705,69005,69305,69605,69905,70205,70505,70805,71105,71405,71705,72005,72305,72605,72905,73205,73505,73805,74105,74405,74705,75005,75305,75605,75905,76205,76505,76805,77105,77405,77705,78005,78305,78605,78905,79205,79505,79805,80105,80405,80705,81005,81305,81605,81905,82205,82505,82805,83105,83405,83705,84005,84305,84605,84905,85205,85505,85805,86105,86405,86705,87005,87305,87605,87905,88205,88505,88805,89105,89405,89705,90005,90305,90605,90905,91205,91505,91805,92105,92405,92705,93005,93305,93605,93905,94205,94505,94805,95105,95405,95705,96005,96305,96605,96905,97205,97505,97805,98105,98405,98705,99005,99305,99605,99905,100205,100505,100805,101105,101405,101705,102005,102305,102605,102905,103205,103505,103805,104105,104405,104705,105005,105305,105605,105905,106205,106505,106805,107105,107405,107705,108005,108305,108605,108905,109205,109505,109805,110105,110405,110705,111005,111305,111605,111905,112205,112505,112805,113105,113405,113705,114005,114305,114605,114905,115205,115505,115805,116105,116405,116705,117005,117305,117605,117905,118205,118505,118805,119105,119405,119705,120005,120305,120605,120905,121205,121505,121805,122105,122405,122705,123005,123305,123605,123905,124205,124505,124805,125105,125405,125705,126005,126305,126605,126905,127205,127505,127805,128105,128405,128705,129005,129305,129605,129905,130205,130505,130805,131105,131405,131705,132005,132305,132605,132905,133205,133505,133805,134105,134405,134705,135005,135305,135605,135905,136205,136505,136805,137105,137405,137705,138005,138305,138605,138905,139205,139505,139805,140105,140405,140705,141005,141305,141605,141905,142205,142505,142805,143105,143405,143705,144005,144305,144605,144905,145205,145505,145805,146105,146405,146705,147005,147305,147605,147905,148205,148505,148805,149105,149405,149705,150005,150305,150605,150905,151205,151505,151805,152105,152405,152705,153005,153305,153605,153905,154205,154505,154805,155105,155405,155705,156005,156305,156605,156905,157205,157505,157805,158105,158405,158705,159005,159305,159605,159905,160205,160505,160805,161105,161405,161705,162005,162305,162605,162905,163205,163505,163805,164105,164405,164705,165005,165305,165605,165905,166205,166505,166805,167105,167
405,167705,168005,168305,168605,168905,169205,169505,169805,170105,170405,170705,171005,171305,171605,171905,172205,172505,172805,173105,173405,173705,174005,174305,174605,174905,175205,175505,175805,176105,176405,176705,177005,177305,177605,177905,178205,178505,178805,179105,179405,179705,180005,180305,180605,180905,181205,181505,181805,182105,182405,182705,183005,183305,183605,183905,184205,184505,184805,185105,185405,185705,186005,186305,186605,186905,187205,187505,187805,188105,188405,188705,189005,189305,189605,189905,190205,190505,190805,191105,191405,191705,192005,192305,192605,192905,193205,193505,193805,194105,194405,194705,195005,195305,195605,195905,196205,196505,196805,197105,197405,197705,198005,198305,198605,198905,199205,199505,199805,200105,200405,200705,201005,201305,201605,201905,202205,202505,202805,203105,203405,203705,204005,204305,204605,204905,205205,205505,205805,206105,206405,206705,207005,207305,207605,207905,208205,208505,208805,209105,209405,209705,210005,210305,210605,210905,211205,211505,211805,212105,212405,212705,213005,213305,213605,213905,214205,214505,214805,215105,215405,215705,216005,216305,216605,216905,217205,217505,217805,218105,218405,218705,219005,219305,219605,219905,220205,220505,220805,221105,221405,221705,222005,222305,222605,222905,223205,223505,223805,224105,224405,224705,225005,225305,225605,225905,226205,226505,226805,227105,227405,227705,228005,228305,228605,228905,229205,229505,229805,230105,230405,230705,231005,231305,231605,231905,232205,232505,232805,233105,233405,233705,234005,234305,234605,234905,235205,235505,235805,236105,236405,236705,237005,237305,237605,237905,238205,238505,238805,239105,239405,239705,240005,240305,240605,240905,241205,241505,241805,242105,242405,242705,243005,243305,243605,243905,244205,244505,244805,245105,245405,245705,246005,246305,246605,246905,247205,247505,247805,248105,248405,248705,249005,249305,249605,249905,250205,250505,250805,251105,251405,251705,252005,252305,252605,252905,253205,253505,253805,254105,254405,254705,255005,255305,255605,255905,256205,256505,256805,257105,257405,257705,258005,258305,258605,258905,259205,259505,259805,260105,260405,260705,261005,261305,261605,261905,262205,262505,262805,263105,263405,263705,264005,264305,264605,264905,265205,265505,265805,266105,266405,266705,267005,267305,267605,267905,268205,268505,268805,269105,269405,269705,270005,270305,270605,270905,271205,271505,271805,272105,272405,272705,273005,273305,273605,273905,274205,274505,274805,275105,275405,275705,276005,276305,276605,276905,277205,277505,277805,278105,278405,278705,279005,279305,279605,279905,280205,280505,280805,281105,281405,281705,282005,282305,282605,282905,283205,283505,283805,284105,284405,284705,285005,285305,285605,285905,286205,286505,286805,287105,287405,287705,288005,288305,288605,288905,289205,289505,289805,290105,290405,290705,291005,291305,291605,291905,292205,292505,292805,293105,293405,293705,294005,294305,294605,294905,295205,295505,295805,296105,296405,296705,297005,297305,297605,297905,298205,298505,298805,299105,299405,299705,300005}
correlation | 1
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |

statistics – How to count duplicates in a mixed pool using AnyDice?

For exact duplicates, the program given in Carcer’s answer to the question you cite can be easily modified to take multiple dice pools:

function: dupes in A:s B:s C:s {
  DICE: [sort {A, B, C}]
  DUPES: 0
  loop X over {2..#DICE} {
    if (X-1)@DICE = X@DICE { DUPES: DUPES + 1 }
  }
  result: DUPES
}

output [dupes in 1d12 2d10 1d8]

Note that, like Carcer’s original code, this function will return 3 for (12, 12, 3, 3, 3, 1), not 5, because it counts a group of $n$ identical dice as $n-1$ duplicates! If you do want to count all dice that match at least one other die, here’s a version that will do that:

function: dupes in A:s B:s C:s {
  DICE: [sort {A, B, C}]
  DUPES: 0
  loop X over {1..#DICE} {
    PREV_MATCH: X > 1 & (X-1)@DICE = X@DICE
    NEXT_MATCH: X < #DICE & (X+1)@DICE = X@DICE
    if PREV_MATCH | NEXT_MATCH { DUPES: DUPES + 1 }
  }
  result: DUPES
}

output [dupes in 1d12 2d10 1d8]

Note that this version will never yield a result of “one duplicate” — there are always at least two, or none at all!

Alas, because these programs need to iterate through all possible results of all the rolls, they can be rather slow if there are many dice with many sides in the pool. For example, my test outputs above use 1d12, 2d10 and 1d8, because trying to run them for your example of 3d12, 2d10 and 1d8 times out. :/

For bigger dice pools, my Python dice roller from an earlier answer can do the trick instead. Here’s a solution using Carcer’s counting method:

from collections import defaultdict
summary = defaultdict(float)

for d12, p12 in dice_roll(12, count=3):
  for d10, p10 in dice_roll(10, count=2):
    for d8, p8 in dice_roll(8, count=1):
      prob = p12 * p10 * p8
      roll = sorted(d12 + d10 + d8)
      dupes = sum(roll[i] == roll[i+1] for i in range(len(roll) - 1))
      summary[dupes] += prob

for dupes, prob in sorted(summary.items()):
  print("%d duplicates: %.2f%%" % (dupes, 100 * prob))

and here’s one using yours:

from collections import defaultdict
summary = defaultdict(float)

for d12, p12 in dice_roll(12, count=3):
  for d10, p10 in dice_roll(10, count=2):
    for d8, p8 in dice_roll(8, count=1):
      prob = p12 * p10 * p8
      roll = sorted(d12 + d10 + d8)
      dupes = sum(
        (i > 0 and roll[i-1] == roll[i])
        or (i < len(roll)-1 and roll[i+1] == roll[i])
        for i in range(len(roll))
      )
      summary[dupes] += prob

for dupes, prob in sorted(summary.items()):
  print("%d duplicates: %.2f%%" % (dupes, 100 * prob))

(These are also brute force solutions, but Python is faster than AnyDice. And, if you run it on your own computer, it has no time limit.)
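These snippets rely on the dice_roll helper from the linked earlier answer. As a rough reconstruction of the interface they assume (each sorted combination of count dice together with its probability), something like the following would make them self-contained; the implementation details here are my own guess:

from collections import Counter
from itertools import combinations_with_replacement
from math import factorial

def dice_roll(sides, count=1):
    # Yield (list of die values, probability) for every multiset of `count` dice.
    total = sides ** count
    for combo in combinations_with_replacement(range(1, sides + 1), count):
        perms = factorial(count)          # orderings of this multiset...
        for reps in Counter(combo).values():
            perms //= factorial(reps)     # ...divided by repeats of each value
        yield list(combo), perms / total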


All of these programs can be fairly easily modified to also count “near misses”. Since you’ve now split that part into a separate question, I’ve answered it there.