Why do hash functions use prime numbers?

25-02-28 14

This article covers "Why do hash functions use prime numbers?" in detail, and also collects useful information on: 2. Member Functions; actionscript-3 – Detecting versionNumber from MyProject-App.xml? Adobe AIR; c# – What is the difference between the EntityFunctions.TruncateTime and DbFunctions.TruncateTime methods?; c# – Membership.GetUser() in a TransactionScope throws TransactionPromotionException.

Contents:

Why do hash functions use prime numbers?
2. Member Functions
actionscript-3 – Detecting versionNumber from MyProject-App.xml? Adobe AIR
c# – What is the difference between the EntityFunctions.TruncateTime and DbFunctions.TruncateTime methods?
c# – Membership.GetUser() in a TransactionScope throws TransactionPromotionException

Why do hash functions use prime numbers?

In a previous post I pointed out how questions posted on reward-based discussion sites like stackoverflow.com never get answered satisfactorily. This post looks at one such feeble answer and tries to explain in more detail a basic question about hashes.

The Question

In Java, the hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the i^th character of the string, n is the length of the string, and ^ indicates exponentiation. Why is 31 used as a multiplier? Why not 29, or 37, or even 97?
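The formula above can be sketched in a few lines. This is a minimal sketch in Python rather than Java, so the 32-bit int overflow that Java provides automatically is emulated by hand. The function name java_string_hash is hypothetical, and the sketch assumes characters in the Basic Multilingual Plane, since Java hashes UTF-16 code units.

```python
def java_string_hash(s: str, multiplier: int = 31) -> int:
    """Polynomial hash s[0]*m^(n-1) + ... + s[n-1] with 32-bit wraparound."""
    h = 0
    for ch in s:
        # Horner's rule: one multiply and one add per character,
        # masked to the low 32 bits to imitate Java int overflow.
        h = (multiplier * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hash("abc"))    # 96354
print(java_string_hash("hello"))  # 99162322
```

Evaluating the polynomial from left to right this way avoids computing any power of 31 explicitly.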

The partially wrong answer

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
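Both claims in this answer, the shift-and-subtract identity and the information loss with an even multiplier, can be checked with a short experiment. The following is only a sketch in Python with 32-bit overflow emulated by masking; poly_hash is a hypothetical helper, and 32 stands in for "an even multiplier".

```python
MASK = 0xFFFFFFFF  # keep only the low 32 bits, as Java int arithmetic does


def poly_hash(s: str, multiplier: int) -> int:
    """Polynomial string hash with the given multiplier, modulo 2**32."""
    h = 0
    for ch in s:
        h = (multiplier * h + ord(ch)) & MASK
    return h


# The shift-and-subtract identity holds for every integer tried here.
assert all(31 * i == (i << 5) - i for i in range(-1000, 1000))

# With the even multiplier 32, the first character of an 8-character string
# is multiplied by 32**7 == 2**35, which is 0 modulo 2**32: it is shifted out.
a, b = "Xbcdefgh", "Ybcdefgh"
print(poly_hash(a, 32) == poly_hash(b, 32))  # True: the first character is lost
print(poly_hash(a, 31) == poly_hash(b, 31))  # False: with odd 31 it still counts
```

The same loss happens more slowly with any even multiplier, because each multiplication pushes at least one more bit of the earlier characters off the top.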

NOTE: Primes are part of an older technique for hashing, and there are supposedly better hashes which use more sophisticated techniques. I do not know the mathematical foundations for these, so I can't vouch for them, but it is true that the prime number technique is quite old.

The correct but longer explanation

The reason 31 is chosen is that it is prime, not that it is odd. A prime is not divisible by any number other than 1 and itself. That includes 2, which makes 31 odd too, but the real reason is its primeness and one other factor, which we shall go into shortly.

So why a prime?

Hashing works by allocating a value into one of the many storage locations a hash has, for fast retrieval later on. These storage locations are also called buckets in computing literature, for reasons which will become clear later on.

Now, how does the hash identify which bucket it needs to store the value in? This is an important question, because a hash must be able to tell you in constant time (which is hopefully fast) which bucket a value is stored in.

The slooow hash find logic

for (i = 0; i < totalkeys; i++) {
    if (CheckForEquality(key[i], "Samuel"))
        return i;
}

This sort of sequential search would make the hash's performance worsen in direct proportion to the number of values it contains. In other words, you would have a linear performance cost (O(n)), which becomes progressively worse as the number of keys (n) grows. The other complication is the actual type of the value you are dealing with. If you are dealing with strings and other such complex types, the number of checks or comparisons itself becomes prohibitive in terms of cost.

Now we have 2 problems

Complex values which are difficult to compare
Sequential searches are not fast and cannot give constant time lookups

Solving the complex value problem

The easy way out of this issue is to find a way to decompose complex values into a key or a hash that is easy to work with. The easiest way to achieve this, of course, is to generate UNIQUE numbers from your value. The number has to be UNIQUE since we want to distinguish one value from another. This UNIQUE side of things is where primes come in handy.

Primes

Primes are unique numbers. They are unique in that a prime has no divisors other than 1 and itself, so the product of a prime with any other number has the best chance of being unique (not as unique as the prime itself, of course). This property is used in hashing functions.

Given a string “Samuel”, you can generate a reasonably unique hash by multiplying each of the constituent digits or letters by a prime number and adding them up. This is why primes are used.

However, using primes is an old technique. The key thing to understand is that as long as you can generate a sufficiently unique key, you can move to other hashing techniques too. Go here for more on this topic about hashes without primes.

Now why is 31 used?

OK, so now you have a unique identifier generated for each value. How do you allocate this value to a bucket or container?

Let's say your container is a fixed array of 16 items. The easiest way to decide where to stick the generated unique number, from now on referred to as the hash key, is to put it in the location whose index equals the key.

So if your hash is 3, you stick your key in location number 3. But of course, what happens if the hash number is bigger than your container size? For this reason the very first hashing algorithms divided the generated hash key by the number of storage locations and stuck the key in the location pointed to by the remainder. (This in turn required the number of storage locations in a hash to be a prime number, so as to generate unique remainder values.)

So assuming your hash has 11 locations, the hash key 12 would be stuck into location 1, since 12 % 11 leaves a remainder of 1.
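The remainder rule can be written down directly. This is a minimal sketch in Python; bucket_index is a hypothetical name, and the table size of 11 is taken from the example above.

```python
def bucket_index(hash_key: int, bucket_count: int) -> int:
    """Map a hash key to a storage location by taking the remainder."""
    return hash_key % bucket_count


print(bucket_index(3, 11))   # 3: a small key lands in the location of the same number
print(bucket_index(12, 11))  # 1: 12 wraps around to location 1
print(bucket_index(23, 11))  # 1: 23 lands in location 1 as well
```

One caveat when porting this to Java: the % operator there can return a negative result for a negative hash key, so an implementation has to guard against that before using the result as an array index.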

Why is 31 used?

Well, when you stick your keys in the containers, depending on the prime used, you would get a different key number and therefore a different distribution of the keys in your array.

So the same key “abc” would be a*31^2 + b*31 + c for the prime 31, and it would generate a different key with 29: a*29^2 + b*29 + c.

Since the key produced is different, the data would go into a different location depending on the prime used.
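To make this concrete, here is a small sketch in Python that hashes "abc" with both multipliers, using the polynomial form from the Java formula and the 16-item container mentioned earlier. poly_hash is a hypothetical helper, and the mask imitates 32-bit int arithmetic.

```python
def poly_hash(s: str, multiplier: int) -> int:
    """Polynomial string hash with the given multiplier, modulo 2**32."""
    h = 0
    for ch in s:
        h = (multiplier * h + ord(ch)) & 0xFFFFFFFF
    return h


for multiplier in (31, 29):
    key = poly_hash("abc", multiplier)
    print(multiplier, key, key % 16)
# 31 96354 2
# 29 84518 6
```

The same string ends up in location 2 with 31 and in location 6 with 29, which is all that "a different distribution" means here.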

Researchers found that using 31 as the prime gives a better distribution of keys and fewer collisions. No one knows why, as far as I know, and I had this question answered by Chris Torek himself, who is generally credited with coming up with the 31 hash, on the C++ or C mailing list a while back.

Collisions

There is a chance that different strings will cause the same key to be generated. In such cases, the individual hash storage location is turned into a linked list or some other type of storage that can hold all the duplicate keys. This is why an individual hash storage location is called a bucket.
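A bucket that holds several entries can be sketched as follows. This is an illustrative Python sketch only: ChainedHashTable is a hypothetical class, each bucket is a Python list rather than a linked list, and the 31-based hash and 11 buckets are assumptions carried over from the earlier discussion. The strings "Aa" and "BB" are used because both hash to 2112 under the 31-based formula.

```python
class ChainedHashTable:
    """Hash table where each bucket is a list of (key, value) pairs."""

    def __init__(self, bucket_count: int = 11):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key: str) -> int:
        h = 0
        for ch in key:
            h = (31 * h + ord(ch)) & 0xFFFFFFFF
        return h % len(self.buckets)

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))       # colliding keys share the bucket

    def get(self, key: str, default=None):
        # Only the entries in one bucket are compared, not the whole table.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put("Aa", 1)
table.put("BB", 2)  # same hash key as "Aa", so it lands in the same bucket
print(table.get("Aa"), table.get("BB"))        # 1 2
print(len(table.buckets[table._index("Aa")]))  # 2
```

Lookups stay fast as long as the buckets stay short, which is why the distribution of keys matters so much.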

Better hashing algorithms

But the modulo hashing algorithm we just looked at is extremely simplistic. There are much more complex and faster hash algorithms that can be used. There are even tools that will generate the algorithm for you, given the type of keys you use.

Here are some fast and better-distributing hash functions / modules if you wish to improve on the ones you have:

Paul Hsieh's hash

Bob Jenkins' hash and his Dr. Dobb's article on the same

One from Google

3 cheers for better performance !!!

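2. Member Functions

A member function always belongs to a class and cannot exist on its own; this is the key difference between it and an ordinary function. So when we define a member function outside the class definition body, we must prefix the function name with the class name, as in Date::isLeapYear(). When a member function is defined inside the class definition body, no class-name prefix is needed.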

//=============================================
// Date class application
//=============================================

#include <iostream>
#include <iomanip>

using namespace std;

/**
 * Class definition body
 */
class Date{
private:
    int year,month,day;
public:
    // A member function defined inside the class body needs no class-name prefix
    void set(int y,int m,int d)
    {
        year = y;
        month = m;
        day = d;
    }
    bool isLeapYear();
    void print();
};

// Member functions defined outside the class body

bool Date::isLeapYear()
{
    return (year%4==0 && year%100!=0)||(year%400==0);
}

void Date::print()
{
    cout<<setfill('0');
    cout<<setw(4)<<year<<'-'<<setw(2)<<month<<'-'<<setw(2)<<day<<'\n';
    cout<<setfill(' ');
}

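Note that there is no semicolon after the closing brace of a function definition, whereas the closing brace of a class definition must be followed by a semicolon. This is a legacy of C: the class mechanism is built on top of the struct mechanism, so it has to match struct.

Member functions defined inside the class are implicitly inline, so short member functions (no more than about 3 lines) are well suited to being defined inside the class. We can also define a member function outside the class and ask the compiler to inline it explicitly.

The code is as follows: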

//=============================================
// Date class application
//=============================================

#include <iostream>
#include <iomanip>

using namespace std;

/**
 * Class definition body
 */
class Date{
private:
    int year,month,day;
public:
    // A member function defined inside the class body needs no class-name prefix
    void set(int y,int m,int d)
    {
        year = y;
        month = m;
        day = d;
    }
    bool isLeapYear();
    void print();
};

// Member functions defined outside the class body

inline bool Date::isLeapYear() // explicitly inline
{
    return (year%4==0 && year%100!=0)||(year%400==0);
}

void Date::print()
{
    cout<<setfill('0');
    cout<<setw(4)<<year<<'-'<<setw(2)<<month<<'-'<<setw(2)<<day<<'\n';
    cout<<setfill(' ');
}

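actionscript-3 – Detecting versionNumber from MyProject-App.xml? Adobe AIR

The desktop application I am working on is being developed in Adobe AIR, Flex, AS3. Now I want to detect the versionNumber parameter in the XML descriptor file that ships with the AIR application.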

<!-- A string value of the format <0-999>.<0-999>.<0-999> that represents application version which can be used to check for application upgrade. 
Values can also be 1-part or 2-part. It is not necessary to have a 3-part value.
An updated version of application must have a versionNumber value higher than the previous version. Required for namespace >= 2.5 . -->
<versionNumber>1.0.0</versionNumber>

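Is there a variable or property that can be used to access this value? For example, in the application's "About" box I would like the displayed version to depend on this value rather than being edited by hand.

Solution

var appXML:XML = NativeApplication.nativeApplication.applicationDescriptor;

You can parse the version number out of this XML like this:

var ns:Namespace = appXML.namespace();
trace(appXML.ns::versionNumber);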

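c# – What is the difference between the EntityFunctions.TruncateTime and DbFunctions.TruncateTime methods?

What is the difference between the

EntityFunctions.TruncateTime

and

DbFunctions.TruncateTime

methods?

Solution

There is no difference. EntityFunctions is the class that appears in the versions of Entity Framework built into the .NET Framework (through .NET 4.5). The DbFunctions class was introduced in Entity Framework 6, which ships separately from the .NET Framework.
For any new application using EF 6.0 or later, you should use the DbFunctions class, because the other class (and most of the built-in EF library) has been deprecated in favor of the separately deployed version.

Both functions are just proxy calls; they are translated into the underlying canonical function in the Entity Framework model and ultimately into a SQL call.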

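c# – Membership.GetUser() in a TransactionScope throws TransactionPromotionException

The following code throws a TransactionAbortedException with the message "The transaction has aborted" and an inner TransactionPromotionException with the message "Failure while attempting to promote transaction":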

using ( TransactionScope transactionScope = new TransactionScope() )
    {
        try
        {
            using ( MyDataContext context = new MyDataContext() )
            {
                Guid accountID = new Guid( Request.QueryString[ "aid" ] );
                Account account = ( from a in context.Accounts where a.UniqueID.Equals( accountID ) select a ).SingleOrDefault();
                IQueryable < My_Data_Access_Layer.Login > loginList = from l in context.Logins where l.AccountID == account.AccountID select l;

                foreach ( My_Data_Access_Layer.Login login in loginList )
                {
                    MembershipUser membershipUser = Membership.GetUser( login.UniqueID );
                }

                [... lots of DeleteAllOnSubmit() calls]

                context.SubmitChanges();
                transactionScope.Complete();
            }   
        }

        catch ( Exception E )
        {
        [... reports the exception ...]
        }
    }

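The error occurs on the call to Membership.GetUser().

My connection string is:

<add name="MyConnectionString" connectionString="Data Source=localhost\SQLEXPRESS;Initial Catalog=MyDatabase;Integrated Security=True"
   providerName="System.Data.SqlClient" />

Everything I have read tells me that TransactionScope should just magically apply to the Membership calls. The user exists (otherwise I would expect null to be returned).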

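Solution

The TransactionScope class masks exceptions. Most likely something inside the scope is failing (throwing an exception), and the TransactionAbortedException is just a side effect that occurs when control leaves the using block.

Try wrapping everything inside the TransactionScope in a try-catch block, rethrow in the catch, and set a breakpoint there; you should be able to see what the real error is.

One other thing: TransactionScope.Complete should be the last statement executed before the end of the using block that contains the TransactionScope. In this case you are probably fine, since you do not actually do any work afterwards, but calling Complete inside an inner scope leads to more error-prone code.

Update:

Now that we know what the inner exception is (failure to promote the transaction), it is clearer what is going on.

The problem is that inside the TransactionScope you are actually opening another database connection with GetUser. The membership provider does not know how to reuse the DataContext you already have open; it has to open its own connection, and when the TransactionScope sees this, it tries to promote to a distributed transaction.

It fails because you probably have MSDTC disabled on the web server, the database server, or both.

There is no way to avoid a distributed transaction if you open two separate connections, so there are really a few ways around this problem:

1. Move the GetUser calls outside the TransactionScope. That is, first "read" the users from the membership provider into a list, and then start the transaction when you actually need to begin making modifications.
2. Remove the GetUser calls altogether and read the user information directly from the database, using the same DataContext or at least the same connection.
3. Enable DTC on all servers participating in the transaction (performance will suffer when the transaction is promoted).

I think option #1 would be best in this case; it is very unlikely that the data you need to read from the membership provider will change between the time you read it and the time you begin the transaction.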
