字符串常量池是如何实现的？

问题

字符串常量池是如何实现的？

答案

核心实现

字符串常量池（String Pool/String Table）在JVM中使用固定桶数的哈希表（HashTable）实现，采用开放地址法或链地址法解决哈希冲突。

底层数据结构：

// HotSpot JVM源码（简化版）
class StringTable : public Hashtable {
private:
    static StringTable* _the_table;  // 全局唯一实例
    HashtableEntry** _buckets;       // 哈希桶数组
    int _number_of_entries;          // 当前条目数
};

数据结构详解

1. StringTable结构

StringTable (HashTable)
│
├─ buckets[0] → Entry("hello") → Entry("world") → null
├─ buckets[1] → Entry("java") → null
├─ buckets[2] → null
├─ buckets[3] → Entry("jvm") → Entry("gc") → Entry("heap") → null
└─ ...
└─ buckets[60012] → Entry("string") → null

默认桶数：60013（JDK 7/8）

2. Entry结构

// HashtableEntry结构（简化）
class HashtableEntry {
    unsigned int _hash;     // 字符串的哈希值
    oop _literal;          // 指向堆中String对象的指针（oop = ordinary object pointer）
    HashtableEntry* _next; // 链表指针（解决哈希冲突）
};

关键点：

StringTable存储的是String对象的引用（oop），而非字符串本身
String对象实际存储在堆内存中（JDK 7+）

哈希算法

1. 哈希值计算

// String.hashCode()实现
public int hashCode() {
    int h = hash; // 缓存的哈希值
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

算法：h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

示例：

"hello".hashCode()
= 'h'*31^4 + 'e'*31^3 + 'l'*31^2 + 'l'*31^1 + 'o'
= 104*923521 + 101*29791 + 108*961 + 108*31 + 111
= 99162322

2. 桶索引计算

// 简化的桶索引计算
int bucket_index(String* str, int table_size) {
    unsigned int hash = str->hashCode();
    return hash % table_size; // 取模运算
}

示例：

"hello".hashCode() = 99162322
bucket_index = 99162322 % 60013 = 12296
→ 存储在 buckets[12296]

intern()方法实现

核心流程

// StringTable::intern()简化实现
oop StringTable::intern(String* str) {
    // 1. 计算哈希值
    unsigned int hash = str->hashCode();
    int index = hash % _table_size;

    // 2. 查找桶中是否已存在
    HashtableEntry* entry = _buckets[index];
    while (entry != NULL) {
        if (entry->_hash == hash &&
            java_lang_String::equals(str, entry->_literal)) {
            // 找到：返回已有对象
            return entry->_literal;
        }
        entry = entry->_next;
    }

    // 3. 不存在：创建新entry并插入
    HashtableEntry* new_entry = new HashtableEntry(hash, str);
    new_entry->_next = _buckets[index];
    _buckets[index] = new_entry;
    _number_of_entries++;

    // 4. 检查是否需要扩容（实际不扩容，桶数固定）
    // JDK实现中StringTable的桶数是固定的

    return str;
}

Java层面的intern()调用

public class InternDemo {
    public static void main(String[] args) {
        String s1 = new String("hello"); // 堆中对象
        String s2 = s1.intern(); // 调用native方法

        // intern()查找StringTable
        // 1. 如果"hello"已存在 → 返回已有引用
        // 2. 如果不存在 → 将s1引用加入StringTable → 返回s1
    }
}

性能特性

1. 时间复杂度

操作	平均时间复杂度	最坏时间复杂度
查找（lookup）	O(1)	O(n)（链表长度）
插入（intern）	O(1)	O(n)
桶数固定	-	可能导致链表过长

2. 空间复杂度

总空间 = 桶数组空间 + Entry链表空间
       = table_size * sizeof(pointer) + entry_count * sizeof(Entry)
       = 60013 * 8 bytes + N * 24 bytes（64位JVM）

桶数配置与优化

查看当前配置

# 查看StringTable统计信息
java -XX:+PrintStringTableStatistics -version

# 输出示例
StringTable statistics:
Number of buckets       :     60013 =    480104 bytes
Number of entries       :      1523 =     36552 bytes
Number of literals      :      1523 =    107880 bytes
Average bucket size     :     0.025
Variance of bucket size :     0.025
Maximum bucket size     :         3

调优参数

# 设置StringTable桶数（必须是质数）
-XX:StringTableSize=1000003

# 推荐值（根据应用字符串数量）
# 小应用：60013（默认）
# 中等应用：500009
# 大型应用：1000003

计算公式：

理想桶数 ≈ 预期字符串数量 * 1.5

原因：保持负载因子（load factor）在0.7以下，减少哈希冲突。

JDK版本演进

JDK 6及之前

永久代（PermGen）
├─ StringTable（哈希表）
└─ String对象本身（字符数组）

问题：
- 永久代空间有限（默认64MB）
- 容易发生 PermGen OOM
- Full GC才能回收，效率低

JDK 7

元空间（Metaspace）
└─ StringTable（哈希表结构）

堆（Heap）
└─ String对象（字符数组）

改进：
- String对象移到堆，可被Young GC回收
- 减少PermGen压力

JDK 8+

元空间（Metaspace）
└─ StringTable（哈希表结构）

堆（Heap）
└─ String对象（byte[]数组，Compact Strings优化）

优化：
- 使用byte[]替代char[]（节省50%内存）
- Latin1编码的字符串只占用1字节/字符

实战示例

示例1：查看StringTable负载

public class StringTableLoadDemo {
    public static void main(String[] args) {
        // 添加10万个字符串
        for (int i = 0; i < 100000; i++) {
            String.valueOf(i).intern();
        }

        // 触发统计（需添加JVM参数）
        // -XX:+PrintStringTableStatistics
    }
}

输出：

Number of buckets       :     60013
Number of entries       :    100523  ← 10万+原有字符串
Average bucket size     :     1.675  ← 平均每桶1.675个entry
Maximum bucket size     :        12  ← 最长链表12个节点

分析：

平均桶大小>1，说明存在大量哈希冲突
最长链表12个节点，查找效率降低
建议：增大-XX:StringTableSize

示例2：优化前后对比

# 优化前（默认桶数）
java -XX:StringTableSize=60013 \
     -XX:+PrintStringTableStatistics \
     StringTableLoadDemo

# 输出
Average bucket size     :     1.675
Maximum bucket size     :        12

# 优化后（增大桶数）
java -XX:StringTableSize=500009 \
     -XX:+PrintStringTableStatistics \
     StringTableLoadDemo

# 输出
Average bucket size     :     0.201  ← 明显改善
Maximum bucket size     :         3  ← 冲突减少

内存布局

堆内存（Heap）
│
├─ String对象1: "hello"
│   ├─ hash: 99162322
│   ├─ value: byte[]{104, 101, 108, 108, 111}
│   └─ coder: LATIN1
│
├─ String对象2: "world"
│   └─ ...
└─ ...

StringTable（Metaspace/Native Memory）
│
buckets[12296] → Entry{hash:99162322, literal:→String对象1} → null
buckets[45678] → Entry{hash:12345678, literal:→String对象2} → null

GC与StringTable

Young GC的影响

public class StringGCDemo {
    public static void main(String[] args) {
        // 字符串对象在堆中，可被Young GC回收
        for (int i = 0; i < 1000000; i++) {
            String s = new String("temp" + i);
            // s未调用intern()，不在StringTable中
            // Young GC可以回收
        }

        // 调用intern()后
        for (int i = 0; i < 1000000; i++) {
            String s = new String("persistent" + i).intern();
            // StringTable持有引用，无法被GC回收
        }
    }
}

StringTable Cleaning

JDK 8引入StringTable Cleaning机制：

# 在Young GC时清理死亡的StringTable entry
-XX:+UseStringDeduplication  # 字符串去重（JDK 8u20+）

性能优化建议

合理设置桶数：

-XX:StringTableSize=<质数>
# 常用：60013, 120011, 500009, 1000003

避免过度intern()：

// 错误：大量不重复字符串intern()
for (String log : logs) {
    cache.put(log.intern(), data); // 导致StringTable膨胀
}

// 正确：只对重复字符串intern()
if (isCommonString(str)) {
    return str.intern();
}

监控StringTable：
```
-XX:+PrintStringTableStatistics
```

面试要点总结

数据结构：固定桶数的哈希表（HashTable），链地址法解决冲突
存储内容：String对象的引用（oop），对象本身在堆中（JDK 7+）
哈希算法：String.hashCode()（31进制多项式）
桶数固定：不会自动扩容，需通过-XX:StringTableSize调整
时间复杂度：平均O(1)，最坏O(n)（链表长度）
intern()实现：查找或插入哈希表，返回常量池引用
性能优化：增大桶数减少冲突，避免过度intern()
GC影响：StringTable引用的对象不易被回收，需谨慎使用