NoriKV Java Client Troubleshooting Guide¶
Solutions to common issues and debugging tips.
Table of Contents¶
- Connection Issues
- Performance Problems
- Error Messages
- Configuration Issues
- Debugging Tips
- Common Pitfalls
Connection Issues¶
Problem: ConnectionException: UNAVAILABLE¶
Symptoms:
Possible Causes:
1. Server is not running
2. Incorrect node addresses
3. Network connectivity issues
4. Firewall blocking connections
Solutions:
# 1. Verify server is running
curl http://localhost:9001/health
# 2. Check network connectivity
telnet localhost 9001
// 3. Verify the node addresses in the client config (Java)
ClientConfig config = ClientConfig.builder()
    .nodes(Arrays.asList("localhost:9001")) // Check port!
    .build();
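A malformed entry in the node list is easy to miss by eye. The sketch below checks each `host:port` string before it is handed to `ClientConfig`. `NodeAddressCheck` and `findProblems` are hypothetical helpers, not part of the NoriKV SDK, and the check assumes plain `host:port` addresses without IPv6 literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the NoriKV SDK): sanity-checks "host:port" strings.
final class NodeAddressCheck {
    private NodeAddressCheck() {}

    /** Returns the problems found; an empty list means every address looks well-formed. */
    static List<String> findProblems(List<String> nodes) {
        List<String> problems = new ArrayList<>();
        for (String node : nodes) {
            if (node == null || node.trim().isEmpty()) {
                problems.add("empty address");
                continue;
            }
            int colon = node.lastIndexOf(':');
            // Reject a missing colon, an empty host, or an empty port.
            if (colon <= 0 || colon == node.length() - 1) {
                problems.add(node + ": expected host:port");
                continue;
            }
            try {
                int port = Integer.parseInt(node.substring(colon + 1));
                if (port < 1 || port > 65535) {
                    problems.add(node + ": port out of range");
                }
            } catch (NumberFormatException e) {
                problems.add(node + ": port is not a number");
            }
        }
        return problems;
    }
}
```

This only catches syntactic mistakes. An address can be well-formed and still point at the wrong host or port, so the connectivity checks above remain necessary.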
Problem: ConnectionException: DEADLINE_EXCEEDED¶
Symptoms:
Cause: Request timeout exceeded
Solutions:
// 1. Increase timeout
ClientConfig config = ClientConfig.builder()
.nodes(nodes)
.totalShards(1024)
.timeoutMs(10000) // Increase from 5s to 10s
.build();
// 2. Check server performance
// - Is server overloaded?
// - Are operations taking too long?
// - Network latency issues?
// 3. Optimize operations
// - Reduce value sizes
// - Use batch operations
// - Check consistency level (STALE_OK is fastest)
Problem: Cannot Connect to Cluster¶
Symptoms:
Debug Steps:
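// 1. Make java.util.logging output easier to read
// (this only sets the line format; see "Enable Detailed Logging" to raise the log level)
System.setProperty("java.util.logging.SimpleFormatter.format",
    "[%1$tF %1$tT] [%4$-7s] %5$s %n");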
// 2. Test connectivity manually
try (Socket socket = new Socket("localhost", 9001)) {
System.out.println("Connection successful");
} catch (IOException e) {
System.err.println("Cannot connect: " + e.getMessage());
}
// 3. Verify gRPC works
ManagedChannel channel = ManagedChannelBuilder
    .forAddress("localhost", 9001)
    .usePlaintext()
    .build();
try {
    // getState(true) asks an idle channel to start connecting; the state
    // printed right after may still be CONNECTING rather than READY
    channel.getState(true);
    System.out.println("Channel state: " + channel.getState(false));
} finally {
    channel.shutdown();
}
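The raw socket test in step 2 can block for a long time when a firewall silently drops packets. As a minimal sketch, the helper below performs the same TCP check with an explicit connect timeout so it can be run against every configured node. `TcpProbe` is a hypothetical name, and a successful probe only shows that the port accepts TCP connections, not that the gRPC service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks plain TCP reachability only.
final class TcpProbe {
    private TcpProbe() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }
}
```

For example, `TcpProbe.canConnect("localhost", 9001, 2000)` gives up after two seconds instead of waiting for the operating system's default connect timeout.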
Performance Problems¶
Problem: Slow Operations¶
Symptoms:
- Operations taking > 100ms
- High p95/p99 latencies
- Timeouts under load
Diagnosis:
// Measure operation latency
long start = System.nanoTime();
try {
client.get(key, null);
} finally {
long latency = System.nanoTime() - start;
System.out.println("Latency: " + (latency / 1_000_000.0) + " ms");
}
// Check client stats
ClientStats stats = client.getStats();
System.out.println("Active channels: " + stats.getPoolStats().getActiveChannels());
System.out.println("Cached leaders: " + stats.getTopologyStats().getCachedLeaders());
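A single timing says little about tail latency. The sketch below computes nearest-rank percentiles from a batch of measured latencies, which is one simple way to obtain the p95/p99 figures mentioned under Symptoms. `LatencyPercentiles` is a hypothetical helper, not an SDK class.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentiles over recorded latency samples.
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    /** Nearest-rank percentile (p in (0, 100]) of the given samples in milliseconds. */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samplesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

To use it, store `latency / 1_000_000.0` from the measurement loop above into an array for a few hundred calls, then print `percentile(samples, 95)` and `percentile(samples, 99)`.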
Solutions:
- Use an appropriate consistency level (see "6. Using Wrong Consistency Level" below)
- Check value sizes
- Verify network latency
- Check server load:
  - Monitor server CPU/memory
  - Check server logs for errors
  - Verify server is not overloaded
Problem: High Memory Usage¶
Symptoms:
- OutOfMemoryError
- Frequent garbage collection
- Growing heap size
Causes & Solutions:
- Large values
- Not closing clients:
  // Bad: Leaking clients
  public void handleRequest() {
      NoriKVClient client = new NoriKVClient(config);
      client.get(key, null);
      // Never closed!
  }
  // Good: Use try-with-resources
  public void handleRequest() {
      try (NoriKVClient client = new NoriKVClient(config)) {
          client.get(key, null);
      } // Automatically closed
  }
- Creating too many clients (see "3. Creating Client Per Request" below)
Problem: Version Conflicts / CAS Failures¶
Symptoms:
Cause: High contention on hot keys
Solutions:
- Implement retry with backoff:
  int maxRetries = 20;
  for (int attempt = 0; attempt < maxRetries; attempt++) {
      try {
          GetResult current = client.get(key, null);
          int value = Integer.parseInt(new String(current.getValue(), StandardCharsets.UTF_8));
          PutOptions options = PutOptions.builder()
              .ifMatchVersion(current.getVersion())
              .build();
          client.put(key, String.valueOf(value + 1).getBytes(StandardCharsets.UTF_8), options);
          break; // Success
      } catch (VersionMismatchException e) {
          if (attempt == maxRetries - 1) throw e;
          // Exponential backoff (capped at 1s) with up to 100ms of jitter
          long backoff = Math.min(1L << attempt, 1000);
          // Thread.sleep throws InterruptedException; declare or handle it in the caller
          Thread.sleep(backoff + ThreadLocalRandom.current().nextInt(100));
      }
  }
- Reduce contention
- Use idempotency for writes that don't need CAS
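The backoff arithmetic is easier to test and reuse when it is separated from the retry loop. The following is a minimal sketch of capped exponential backoff with jitter. `Backoff`, `cappedDelayMs`, and `withJitterMs` are hypothetical names, and the doubling schedule is an assumption modeled on the retry example in this section, not a documented SDK policy.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: exponential backoff with an upper bound and random jitter.
final class Backoff {
    private Backoff() {}

    /** Delay for a 0-based attempt: baseMs doubled `attempt` times, capped at maxMs. */
    static long cappedDelayMs(int attempt, long baseMs, long maxMs) {
        if (baseMs <= 0 || maxMs < baseMs) {
            throw new IllegalArgumentException("need 0 < baseMs <= maxMs");
        }
        long delay = baseMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            // Compare against maxMs / 2 so the doubling can never overflow.
            delay = (delay > maxMs / 2) ? maxMs : delay * 2;
        }
        return delay;
    }

    /** Adds a random extra wait in [0, maxJitterMs] so contending clients do not retry in lockstep. */
    static long withJitterMs(long delayMs, long maxJitterMs) {
        return delayMs + ThreadLocalRandom.current().nextLong(maxJitterMs + 1);
    }
}
```

With a 1 ms base and a 1000 ms cap, attempt 3 waits 8 ms and attempt 10 onward waits the full 1000 ms, before jitter is added.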
Error Messages¶
KeyNotFoundException: Key not found¶
Meaning: Key does not exist in the store
Solutions:
// 1. Handle gracefully
try {
GetResult result = client.get(key, null);
} catch (KeyNotFoundException e) {
// Use default value or create key
client.put(key, defaultValue, null);
}
// 2. Check if key was deleted
// Keys with TTL expire automatically
// 3. Verify key encoding
byte[] key = "user:123".getBytes(StandardCharsets.UTF_8);
// Not: "user:123".getBytes() // Uses platform default!
VersionMismatchException: Version mismatch¶
Meaning: CAS operation failed - version changed
Normal behavior: Expected under concurrent access
Handling:
// This is not an error - it's how CAS works!
// Just retry the operation
for (int retry = 0; retry < 10; retry++) {
    try {
        GetResult current = client.get(key, null);
        // ... compute new value ...
        PutOptions options = PutOptions.builder()
            .ifMatchVersion(current.getVersion())
            .build();
        client.put(key, newValue, options);
        break; // Success
    } catch (VersionMismatchException e) {
        // Expected - someone else modified the key
        if (retry == 9) throw e; // Out of retries: surface the failure instead of silently giving up
        // Otherwise the loop retries with a fresh read
    }
}
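To see why a version mismatch is routine under concurrency, the sketch below runs the same read-modify-write pattern against a tiny in-memory stand-in. `VersionedCell` and `CasDemo` are hypothetical classes that only mimic the compare-on-version behavior described above; they are not the NoriKV client, and the demo retries without a bound or backoff purely to stay short.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for one versioned key (not the NoriKV client).
final class VersionedCell {
    private long version = 0;
    private int value = 0;

    /** Returns {version, value} as one consistent snapshot. */
    synchronized long[] read() {
        return new long[] {version, value};
    }

    /** Writes only if the caller saw the latest version; returns false on a mismatch. */
    synchronized boolean putIfVersion(long expectedVersion, int newValue) {
        if (expectedVersion != version) return false;
        value = newValue;
        version++;
        return true;
    }
}

final class CasDemo {
    private CasDemo() {}

    /** Each worker increments the cell perThread times; returns {finalValue, mismatchCount}. */
    static long[] run(int threads, int perThread) throws InterruptedException {
        VersionedCell cell = new VersionedCell();
        AtomicLong mismatches = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    while (true) { // retry until this increment lands
                        long[] seen = cell.read();
                        if (cell.putIfVersion(seen[0], (int) seen[1] + 1)) break;
                        mismatches.incrementAndGet(); // another writer won the race
                    }
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return new long[] {cell.read()[1], mismatches.get()};
    }
}
```

Because every failed write is retried from a fresh read, no increment is lost: `CasDemo.run(4, 1000)` ends with a value of 4000 however many mismatches occurred along the way.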
IllegalArgumentException: key cannot be null¶
Meaning: Validation failed - key/value is null or empty
Fix:
// Bad
byte[] key = null;
client.put(key, value, null); // IllegalArgumentException
// Good
byte[] key = "user:123".getBytes(StandardCharsets.UTF_8);
if (key.length > 0) {
client.put(key, value, null);
}
NoriKVException: RETRY_EXHAUSTED¶
Meaning: All retry attempts failed
Causes:
- Server is down
- Network issues
- Persistent errors
Diagnosis:
try {
client.put(key, value, null);
} catch (NoriKVException e) {
System.err.println("Error code: " + e.getCode());
System.err.println("Message: " + e.getMessage());
if (e.getCause() != null) {
System.err.println("Underlying cause:");
e.getCause().printStackTrace();
}
}
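When retries are exhausted, the exception that matters is often buried several `getCause()` levels deep. As a small sketch, the hypothetical `RootCause` helper below walks the cause chain to the innermost exception using only standard `Throwable` methods.

```java
// Hypothetical helper: follows getCause() links to the innermost exception.
final class RootCause {
    private RootCause() {}

    static Throwable of(Throwable error) {
        Throwable current = error;
        int hops = 0;
        // Bound the walk to guard against a cyclic cause chain.
        while (current.getCause() != null && current.getCause() != current && hops < 32) {
            current = current.getCause();
            hops++;
        }
        return current;
    }
}
```

In the catch block above, logging `RootCause.of(e)` next to `e.getMessage()` usually points more directly at the failing layer, for example a refused connection.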
Configuration Issues¶
Problem: Wrong Total Shards¶
Symptoms:
- Keys not found even though they exist
- Inconsistent behavior
Cause: totalShards doesn't match cluster configuration
Fix:
// Wrong
ClientConfig config = ClientConfig.builder()
.nodes(nodes)
.totalShards(512) // Cluster has 1024!
.build();
// Correct
ClientConfig config = ClientConfig.builder()
.nodes(nodes)
.totalShards(1024) // Match cluster config
.build();
// Query server for actual shard count if unsure
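The sketch below illustrates why a mismatched shard count makes existing keys look missing. It assumes a routing rule of CRC32(key) modulo `totalShards`, which is an illustration only and not necessarily the hash NoriKV uses. `ShardRouting` is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustration only: CRC32 modulo totalShards is an assumed routing rule,
// not necessarily the one NoriKV implements.
final class ShardRouting {
    private ShardRouting() {}

    static int shardFor(byte[] key, int totalShards) {
        CRC32 crc = new CRC32();
        crc.update(key);
        return (int) (crc.getValue() % totalShards); // getValue() is never negative
    }

    /** Counts keys "user:0" .. "user:(n-1)" that land on different shards under the two settings. */
    static int countMismatches(int n, int shardsA, int shardsB) {
        int mismatches = 0;
        for (int i = 0; i < n; i++) {
            byte[] key = ("user:" + i).getBytes(StandardCharsets.UTF_8);
            if (shardFor(key, shardsA) != shardFor(key, shardsB)) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

Under this assumed rule, a client configured with 512 shards would be expected to route roughly half of the keys to a different shard than a 1024-shard cluster does, which is consistent with the "keys not found even though they exist" symptom.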
Problem: Missing Nodes¶
Symptoms:
- Some operations fail
- Uneven load distribution
Fix:
// Bad: Missing nodes
ClientConfig config = ClientConfig.builder()
.nodes(Arrays.asList("node1:9001")) // Only 1 of 3 nodes!
.build();
// Good: All nodes
ClientConfig config = ClientConfig.builder()
.nodes(Arrays.asList(
"node1:9001",
"node2:9001",
"node3:9001"
))
.build();
Problem: Retry Configuration Too Aggressive¶
Symptoms:
- Operations take very long to fail
- High latency on errors
Fix:
// Too many retries
RetryConfig config = RetryConfig.builder()
.maxAttempts(100) // Way too many!
.maxDelayMs(60000) // 60 second backoff!
.build();
// Reasonable
RetryConfig config = RetryConfig.builder()
.maxAttempts(5)
.initialDelayMs(100)
.maxDelayMs(2000)
.build();
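A quick calculation shows how long a retry policy can keep a failing call alive. The sketch below sums the waits between attempts, assuming the delay doubles after each failure up to `maxDelayMs`. It ignores jitter and the time spent in each request, and the real `RetryConfig` policy may differ. `RetryBudget` is a hypothetical helper.

```java
// Hypothetical helper: assumes delays double after each failed attempt and
// ignores jitter and per-request time; the real RetryConfig policy may differ.
final class RetryBudget {
    private RetryBudget() {}

    /** Sum of the waits between attempts (maxAttempts - 1 waits), in milliseconds. */
    static long worstCaseBackoffMs(int maxAttempts, long initialDelayMs, long maxDelayMs) {
        long total = 0;
        long delay = Math.min(initialDelayMs, maxDelayMs);
        for (int wait = 1; wait < maxAttempts; wait++) {
            total += delay;
            // Compare against maxDelayMs / 2 so the doubling can never overflow.
            delay = (delay > maxDelayMs / 2) ? maxDelayMs : delay * 2;
        }
        return total;
    }
}
```

Under these assumptions the "reasonable" settings above spend at most 1500 ms in backoff (100 + 200 + 400 + 800). With 100 attempts and a 60 second cap, and assuming the same 100 ms initial delay, the total is 5,442,300 ms, or roughly 90 minutes.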
Debugging Tips¶
Enable Detailed Logging¶
// Enable gRPC logging
import java.util.logging.*;
Logger grpcLogger = Logger.getLogger("io.grpc");
grpcLogger.setLevel(Level.FINE);
ConsoleHandler handler = new ConsoleHandler();
handler.setLevel(Level.FINE);
grpcLogger.addHandler(handler);
Capture Network Traffic¶
# Use tcpdump to capture gRPC traffic
sudo tcpdump -i any -w norikv.pcap port 9001
# Analyze with Wireshark
wireshark norikv.pcap
Inspect Protobuf Messages¶
// Log protobuf messages
Norikv.PutRequest request = ProtoConverters.buildPutRequest(key, value, options);
System.out.println("Request: " + request);
Monitor Client Stats¶
// Periodic stats monitoring
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
scheduler.scheduleAtFixedRate(() -> {
ClientStats stats = client.getStats();
System.out.println("=== Client Stats ===");
System.out.println("Closed: " + stats.isClosed());
System.out.println("Channels: " + stats.getPoolStats().getActiveChannels());
System.out.println("Epoch: " + stats.getTopologyStats().getCurrentEpoch());
}, 0, 10, TimeUnit.SECONDS);
Test with Ephemeral Server¶
// Use ephemeral server for testing
import com.norikv.client.testing.EphemeralServer;
EphemeralServer server = EphemeralServer.start(9001);
try {
ClientConfig config = ClientConfig.builder()
.nodes(Arrays.asList(server.getAddress()))
.totalShards(1024)
.build();
try (NoriKVClient client = new NoriKVClient(config)) {
// Test operations
client.put("test".getBytes(StandardCharsets.UTF_8), "value".getBytes(StandardCharsets.UTF_8), null);
}
} finally {
server.stop();
}
Measure Operation Latency¶
public class LatencyMonitor {
public static void measureOperation(String name, Runnable operation) {
long start = System.nanoTime();
try {
operation.run();
} finally {
long latency = System.nanoTime() - start;
double ms = latency / 1_000_000.0;
System.out.printf("%s: %.3f ms%n", name, ms);
}
}
}
// Usage
LatencyMonitor.measureOperation("PUT", () -> {
try {
client.put(key, value, null);
} catch (NoriKVException e) {
// Handle error
}
});
Common Pitfalls¶
1. Not Using UTF-8 Encoding¶
// Bad: Platform default encoding
byte[] key = "user:123".getBytes();
// Good: Explicit UTF-8
byte[] key = "user:123".getBytes(StandardCharsets.UTF_8);
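The difference is invisible for pure ASCII keys and only shows up once a key contains other characters. As a minimal sketch, the hypothetical `KeyEncodingDemo` below compares the bytes produced by UTF-8 and ISO-8859-1, with ISO-8859-1 standing in for a platform default that is not UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: ISO-8859-1 stands in for a non-UTF-8 platform default.
final class KeyEncodingDemo {
    private KeyEncodingDemo() {}

    /** True when UTF-8 and ISO-8859-1 encode the key to identical bytes. */
    static boolean sameBytes(String key) {
        byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = key.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }
}
```

`sameBytes("user:123")` is true, while `sameBytes("user:jos\u00e9")` is false because the accented character takes two bytes in UTF-8 and one in ISO-8859-1. Two services that encode the same string differently would therefore read and write different keys.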
2. Forgetting to Close Client¶
// Bad: Resource leak
NoriKVClient client = new NoriKVClient(config);
client.get(key, null);
// Never closed!
// Good: Always close
try (NoriKVClient client = new NoriKVClient(config)) {
client.get(key, null);
} // Automatically closed
3. Creating Client Per Request¶
// Bad: Expensive
public void handleRequest() {
try (NoriKVClient client = new NoriKVClient(config)) {
client.get(key, null);
}
}
// Good: Reuse client
private final NoriKVClient client = new NoriKVClient(config);
public void handleRequest() {
client.get(key, null);
}
4. Not Handling Version Conflicts¶
// Bad: No retry
try {
client.put(key, newValue, casOptions);
} catch (VersionMismatchException e) {
// Just fail? Should retry!
}
// Good: Retry with backoff
for (int i = 0; i < maxRetries; i++) {
try {
GetResult current = client.get(key, null);
PutOptions opts = PutOptions.builder()
.ifMatchVersion(current.getVersion())
.build();
client.put(key, newValue, opts);
break;
} catch (VersionMismatchException e) {
if (i == maxRetries - 1) throw e;
Thread.sleep(backoff);
}
}
5. Ignoring Client Statistics¶
// Monitor health
ClientStats stats = client.getStats();
if (stats.isClosed()) {
throw new IllegalStateException("Client is closed!");
}
if (stats.getPoolStats().getActiveChannels() == 0) {
logger.warn("No active connections!");
}
6. Using Wrong Consistency Level¶
// Bad: Always linearizable (slow)
GetOptions options = GetOptions.builder()
.consistency(ConsistencyLevel.LINEARIZABLE)
.build();
// Good: Use appropriate level
// - LEASE (default): Most operations
// - LINEARIZABLE: Critical reads only
// - STALE_OK: Caching, read-heavy
7. Not Setting Timeouts¶
// Bad: Default might be too short/long
ClientConfig config = ClientConfig.builder()
.nodes(nodes)
.totalShards(1024)
.build(); // Uses default 5s
// Good: Set appropriate timeout
ClientConfig config = ClientConfig.builder()
.nodes(nodes)
.totalShards(1024)
.timeoutMs(10000) // 10s for slow operations
.build();
8. Large Keys¶
// Bad: Huge keys
String hugeKey = "x".repeat(1000000); // 1MB key!
client.put(hugeKey.getBytes(), value, null);
// Good: Reasonable key sizes
String key = "user:123"; // < 1KB
client.put(key.getBytes(StandardCharsets.UTF_8), value, null);
Getting Help¶
If you're still stuck:
- Check Examples: See the code examples in src/main/java/com/norikv/client/examples/
- Read API Guide: API_GUIDE.md
- Check Architecture: ARCHITECTURE.md
- Run Tests: Verify SDK works in your environment
- Enable Debug Logging: See detailed error messages
- File an Issue: GitHub Issues
  - Include: Java version, SDK version, error messages, configuration
  - Provide: Minimal reproducible example
See Also¶
- API Guide - Complete API reference
- Architecture Guide - Internal design
- Advanced Patterns - Complex use cases